[draft] TTS eval #1237

Draft

rfejgin wants to merge 26 commits into main from rfejgin/2512_tts_eval_merge

Conversation

@rfejgin (Collaborator) commented Feb 12, 2026

Creating this PR just to easily view diffs.

karpnv and others added 26 commits December 20, 2025 07:00
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>
Create a small dummy context wav for requests without context_audio_filepath to prevent dataloader failures (missing d*.wav) and 500s from the unified server.
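
A minimal sketch of that idea, assuming soundfile for the write; the file name, duration, and sample rate are illustrative, not taken from the PR:

```python
# Sketch only: write a short silent WAV to stand in for a missing context clip.
import numpy as np
import soundfile as sf

SAMPLE_RATE = 22050   # assumed codec sample rate
DURATION_SEC = 0.1    # a tiny clip is enough to keep the dataloader from failing

silence = np.zeros(int(SAMPLE_RATE * DURATION_SEC), dtype=np.float32)
sf.write("dummy_context.wav", silence, SAMPLE_RATE)
```

Requests that arrive without a context_audio_filepath can then be pointed at this file instead of failing in the dataloader.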
Avoid KV-cache shape mismatches when batch sizes vary between requests in the unified server.
Route HuggingFace resolve URLs used by NeMo audio codec checkpoints through huggingface_hub download so multi-rank server startup avoids repeated downloads and 429s.
Longform decoding with the transformer cache path can produce sequence-length mismatches; disable cache per request batch to prevent 500s in serve_unified.
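
The two cache-related commits above boil down to not reusing a decoder cache across request batches. A generic sketch of that pattern, with hypothetical names (`model.infer`, `use_kv_cache`) that are not NeMo's actual API:

```python
# Hypothetical illustration, not NeMo code: decode each request batch without a
# persistent KV cache, so varying batch sizes or longform sequence lengths never
# hit a cache that was allocated for a previous request's shapes.
def run_request_batch(model, batch):
    return model.infer(batch, use_kv_cache=False)
```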
Correct HuggingFace resolve URL matching so downloads go through hf_hub_download() and avoid multi-rank 429s.
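
A sketch of that routing, assuming the standard resolve-URL layout; the regex and helper are illustrative, and only `hf_hub_download()` itself is the real API:

```python
import re
from huggingface_hub import hf_hub_download

# https://huggingface.co/<repo_id>/resolve/<revision>/<filename>
_RESOLVE_RE = re.compile(
    r"^https://huggingface\.co/(?P<repo_id>[^/]+/[^/]+)/resolve/(?P<revision>[^/]+)/(?P<filename>.+)$"
)

def fetch(url: str) -> str:
    """Return a local path for `url`, going through the shared Hugging Face cache."""
    match = _RESOLVE_RE.match(url)
    if match is None:
        raise ValueError(f"not a Hugging Face resolve URL: {url}")
    return hf_hub_download(
        repo_id=match["repo_id"],
        filename=match["filename"],
        revision=match["revision"],
    )
```

Because hf_hub_download() reuses the local cache, repeated server startups do not have to re-download the checkpoint, which is what the commit relies on to avoid the 429s.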
Stop setting srun --wait by default; allow opt-in via cluster_config.srun_wait_seconds.
Add a large srun --wait for multi-instance runs to override nemo_run's default --wait=60, preventing premature termination when some ranks finish earlier.
Lower Magpie inference runner batch size to reduce memory/latency spikes under multi-instance load.
Use a 1-hour default srun --wait for multi-instance runs to avoid premature task termination when chunk runtimes differ.
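
The three srun-related commits above add up to one knob. A sketch of the intended behavior, where the config key comes from the commit messages but the helper itself is illustrative:

```python
def srun_wait_args(cluster_config: dict, multi_instance: bool) -> list[str]:
    """Build the `--wait` part of an srun command line (sketch only).

    Slurm's `srun --wait=<seconds>` sets how long srun waits after the first
    task exits before killing the rest, so a large value keeps fast-finishing
    ranks from tearing down ranks still working through longer chunks.
    """
    wait_seconds = cluster_config.get("srun_wait_seconds")
    if wait_seconds is None and multi_instance:
        wait_seconds = 3600  # 1-hour default for multi-instance runs
    return [f"--wait={wait_seconds}"] if wait_seconds is not None else []
```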
Introduce the emergent_tts dataset package with prepare/generate/score helpers and default configs to run EmergentTTS evaluation via NeMo-Skills.

Co-authored-by: Cursor <cursoragent@cursor.com>
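
As a rough illustration of what the prepare side of such a dataset package does, here is a sketch; the repo id, field names, and output layout are placeholders rather than the actual emergent_tts package contents:

```python
import json
from pathlib import Path

from datasets import load_dataset

def prepare(output_dir: str, repo_id: str = "org/EmergentTTS-Eval") -> Path:
    """Download the eval set and flatten it into a JSONL manifest (sketch only)."""
    output_path = Path(output_dir) / "test.jsonl"
    output_path.parent.mkdir(parents=True, exist_ok=True)
    dataset = load_dataset(repo_id, split="test")
    with output_path.open("w") as f:
        for example in dataset:
            # Keep only the fields later generation/scoring steps would need.
            f.write(json.dumps({"id": example.get("id"), "text": example.get("text")}) + "\n")
    return output_path
```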
Install google-genai for EmergentTTS-Eval, run scoring from the dataset base dir so relative paths resolve, and avoid shipping large local caches/data. Document EmergentTTS-Eval usage in nv_tts guide.

Co-authored-by: Cursor <cursoragent@cursor.com>
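
The "run scoring from the dataset base dir" point is a working-directory concern. A hedged sketch of it, where the script name and flag are illustrative rather than the real EmergentTTS-Eval CLI:

```python
import subprocess
from pathlib import Path

def run_scoring(base_dir: str, predictions: str) -> None:
    # With cwd set to the dataset base dir, the scorer's relative paths
    # (prompt files, cached judgments, etc.) resolve the way its authors intended.
    subprocess.run(
        ["python", "score.py", "--predictions", predictions],
        cwd=Path(base_dir),
        check=True,
    )
```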
Document dataset preparation (HF_TOKEN) and evaluation workflow, including cloning and patching EmergentTTS-Eval for NVIDIA Inference API judging.

Co-authored-by: Cursor <cursoragent@cursor.com>
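
On the HF_TOKEN point, the authentication step presumably looks like the standard Hugging Face login; this is a sketch of that assumption, not a copy of the guide:

```python
import os
from huggingface_hub import login

# Gated dataset assets require an authenticated session before preparation runs.
login(token=os.environ["HF_TOKEN"])
```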
@rfejgin changed the title from "[draft, please ignore] TTS eval" to "[draft] TTS eval" on Feb 12, 2026
