Fix ACE-Step audio sample saving (bf16 dtype + waveform shape) by SanDiegoDude · Pull Request #910 · ostris/ai-toolkit

SanDiegoDude · 2026-06-24T15:28:46Z

Summary

Saving generated audio samples for ACE-Step (1.5 and 1.5 XL) currently fails before any sample is written, due to two issues in GenerateImageConfig.save_image in toolkit/config_modules.py:

bf16 dtype — with the default train.dtype: bf16, the generated waveform is bfloat16, which torchaudio/ffmpeg cannot encode:

```
ValueError: No format found for dtype torch.bfloat16; dtype must be one of
[torch.uint8, torch.int16, torch.int32, torch.int64, torch.float32, torch.float64].
```

Waveform shape — the ACE-Step pipeline already returns a [channels, time] tensor (it squeezes the batch dim internally), but save_image indexes image[0] again, collapsing it to 1D. ffmpeg then fails with:
```
RuntimeError: Failed to create input filter:
"time_base=1/48000:sample_rate=48000:sample_fmt=flt:channel_layout=0x0" (Invalid argument)
```
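The shape collapse is easy to reproduce in isolation. The snippet below is only an illustration, not the toolkit's code: the variable names are hypothetical and the tensor is a silent stand-in for a real stereo waveform at 48 kHz.

```python
import torch

# Stand-in for what the pipeline returns: already [channels, time],
# with no batch dimension left to strip.
returned = torch.zeros(2, 48000)

# Indexing [0] a second time selects the first channel instead of the
# first batch item, so the channel dimension disappears.
collapsed = returned[0]

print(returned.shape)   # torch.Size([2, 48000])
print(collapsed.shape)  # torch.Size([48000])
```

A 1D tensor carries no channel count, which is consistent with the `channel_layout=0x0` in the ffmpeg error above.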

The fix casts the waveform to float32 and normalizes it to a 2D [channels, time] tensor (handling 1D/2D/3D inputs) before calling torchaudio.save.
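A minimal sketch of that normalization, assuming a standalone helper; `prepare_waveform_for_save` is a hypothetical name, and the actual change lives inline in `save_image` and may differ in detail (for example in how a 3D input is reduced).

```python
import torch


def prepare_waveform_for_save(waveform: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return a float32 CPU tensor shaped [channels, time]."""
    # bf16/fp16 are not encodable by torchaudio/ffmpeg, so cast to float32.
    waveform = waveform.detach().cpu().float()

    if waveform.dim() == 1:
        # [time] -> [1, time] (treat as mono)
        waveform = waveform.unsqueeze(0)
    elif waveform.dim() == 3:
        # [batch, channels, time] -> first item in the batch (an assumption
        # of this sketch about which item to keep)
        waveform = waveform[0]
    elif waveform.dim() != 2:
        raise ValueError(f"Unexpected waveform shape: {tuple(waveform.shape)}")

    return waveform.contiguous()


# Example: a bf16 stereo waveform, as produced with train.dtype: bf16.
example = prepare_waveform_for_save(torch.zeros(2, 48000, dtype=torch.bfloat16))
print(example.dtype, tuple(example.shape))
```

The result can then be passed to `torchaudio.save(path, waveform, sample_rate)`, which expects a 2D tensor in a supported dtype.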

Test plan

Train ACE-Step 1.5 XL with low_vram: true, train.dtype: bf16, and sampling enabled; baseline samples now write as valid 180s MP3s and training proceeds.
Sanity-check a non-audio (image) model still saves correctly (this branch only touches the audio output_ext path).

Made with Cursor

When saving generated audio samples, torchaudio/ffmpeg cannot encode bfloat16/float16 waveforms (raises "No format found for dtype ..."); cast the waveform to float32 first.

Also, the ACE-Step pipeline already returns a [channels, time] tensor, so indexing image[0] dropped the channel dimension and produced a 1D tensor, causing ffmpeg to fail with "channel_layout=0x0". Normalize the waveform to a 2D [channels, time] tensor before saving.

With the default bf16 training dtype these two issues prevented any ACE-Step sample from being written.

Co-authored-by: Cursor <cursoragent@cursor.com>

SanDiegoDude wants to merge 1 commit into ostris:main from SanDiegoDude:fix/acestep-audio-sample-save
