
Fix CUDA OOM on long scripts by splitting input into chunks#158

Open
rachana192837 wants to merge 1 commit into microsoft:main from rachana192837:fix-long-script-oom

Conversation

@rachana192837

Summary
Fixes #157 (CUDA OOM on long scripts).
Long-form scripts (longer than 10 minutes) crashed the 1.5B multi-speaker model with a CUDA out-of-memory error because the entire script was processed in a single inference pass, exceeding GPU memory limits.

Changes

  • Added long_script_inference.py to split long scripts into smaller chunks.
  • Generates audio for each chunk sequentially and concatenates the outputs.
  • Allows the 1.5B multi-speaker model to handle long scripts without crashing.

How to Test

  1. Run the 1.5B model with a script longer than 10 minutes using long_script_inference.py.
  2. Confirm audio is generated without CUDA OOM errors.
  3. Adjust CHUNK_SIZE if GPU memory is low.

Notes

  • This is a workaround to reduce memory usage, not a fix for the underlying memory requirements of long-form inference.
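
The chunk-split-and-concatenate approach described above can be sketched as follows. This is a minimal illustration, not the actual contents of long_script_inference.py: `synthesize` is a hypothetical stand-in for the model's inference call, and CHUNK_SIZE here counts script lines per chunk (the real script may chunk by tokens or characters).

```python
CHUNK_SIZE = 4  # max script lines per chunk; lower this if GPU memory is low


def split_script(lines, chunk_size=CHUNK_SIZE):
    """Split a list of script lines into consecutive fixed-size chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def run_long_script(lines, synthesize):
    """Generate audio one chunk at a time and concatenate the outputs.

    `synthesize` is a placeholder for the 1.5B multi-speaker model's
    inference function; each call processes only one chunk, which keeps
    peak GPU memory bounded by the chunk size rather than script length.
    """
    audio = []
    for chunk in split_script(lines):
        audio.extend(synthesize("\n".join(chunk)))  # one forward pass per chunk
    return audio
```

Because each chunk is synthesized independently, peak memory scales with CHUNK_SIZE instead of total script length; the trade-off is that prosody continuity across chunk boundaries depends on how the model handles each chunk in isolation.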

@rachana192837
Author

@microsoft-github-policy-service agree


@codeCraft-Ritik codeCraft-Ritik left a comment


Great fix! This effectively addresses the CUDA OOM issue for long-form scripts.



Development

Successfully merging this pull request may close these issues.

Inference fails with CUDA out of memory on long scripts

3 participants