
Fix CUDA OOM on long scripts by splitting input into chunks#158

Open
rachana192837 wants to merge 1 commit into microsoft:main from rachana192837:fix-long-script-oom

Conversation

@rachana192837

Summary
Fixes #157 (CUDA OOM on long scripts).
Long-form scripts (longer than 10 minutes) crashed the 1.5B multi-speaker model with a CUDA out-of-memory error because the entire script was processed in a single inference pass, exceeding GPU memory limits.

Changes

  • Added long_script_inference.py to split long scripts into smaller chunks.
  • Generates audio for each chunk sequentially and concatenates the outputs.
  • Allows the 1.5B multi-speaker model to handle long scripts without crashing.

How to Test

  1. Run the 1.5B model with a script longer than 10 minutes using long_script_inference.py.
  2. Confirm audio is generated without CUDA OOM errors.
  3. Adjust CHUNK_SIZE if GPU memory is low.

Notes

  • This is a workaround to reduce memory usage, not a fix for the underlying memory requirements of long-form inference.
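
The chunk-split-and-concatenate approach described above can be sketched as follows. This is a minimal illustration, not the actual contents of long_script_inference.py: `synthesize` is a hypothetical stand-in for the model's inference call, and CHUNK_SIZE here counts script lines per chunk (the real script may chunk by tokens or characters).

```python
CHUNK_SIZE = 4  # max script lines per chunk; lower this if GPU memory is low


def split_script(lines, chunk_size=CHUNK_SIZE):
    """Split a list of script lines into consecutive fixed-size chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def run_long_script(lines, synthesize):
    """Generate audio one chunk at a time and concatenate the outputs.

    `synthesize` is a placeholder for the 1.5B multi-speaker model's
    inference function; each call processes only one chunk, which keeps
    peak GPU memory bounded by the chunk size rather than script length.
    """
    audio = []
    for chunk in split_script(lines):
        audio.extend(synthesize("\n".join(chunk)))  # one forward pass per chunk
    return audio
```

Because each chunk is synthesized independently, peak memory scales with CHUNK_SIZE instead of total script length; the trade-off is that prosody continuity across chunk boundaries depends on how the model handles each chunk in isolation.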

@rachana192837
Author

@microsoft-github-policy-service agree


@codeCraft-Ritik codeCraft-Ritik left a comment


Great fix! This effectively addresses the CUDA OOM issue for long-form scripts.



Development

Successfully merging this pull request may close these issues.

Inference fails with CUDA out of memory on long scripts

3 participants