[MagpieTTS][bugfix] reset kv cache for longform inference and add missing utmosv2 score #15385
Closed
XuesongYang wants to merge 2 commits into NVIDIA-NeMo:main
…nt stale cache from prior batch or datasets Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
… display MOS. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
blisc
reviewed
Feb 11, 2026
'gt_audio_filepath',
'pred_audio_filepath',
'context_audio_filepath',
'utmosv2',
Collaborator
This will be added in #15381, please remove from yours
Collaborator
Author
I've reviewed the other PR and don't anticipate any conflicts during a rebase. I suggest we avoid reverting the commit here. Instead, let's simply merge whichever PR is ready first, and then rebase the remaining one.
Collaborator
@subhankar-ghosh please review
Collaborator
Drafting since we plan to add this to #15375
Collaborator
Author
Let's close this PR and move our discussion to that PR.
Summary
Two inference bugfixes for MagpieTTS.
1. Reset KV cache at start of longform inference batch
`generate_long_form_speech` never reset the decoder KV cache. When the inference script processes multiple datasets sequentially (e.g., a non-longform dataset followed by a longform dataset), the prior `generate_speech` call leaves `use_cache=True` with populated tensors. The longform path then inherits this stale cache, causing a `RuntimeError: Sizes of tensors must match` in `torch.cat` during self-attention KV concatenation.
Fix: call `reset_cache(use_cache=self.model.use_kv_cache_for_inference)` at the start of each longform batch in `_run_longform_inference`, matching the pattern used by `infer_batch`.
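The failure mode can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyKVCache` and `run_batch` are hypothetical names, plain Python lists stand in for tensors, and the size check imitates the mismatch that `torch.cat` would report. It is not the NeMo implementation; only the `reset_cache(use_cache=...)` call shape is taken from the description above.

```python
class ToyKVCache:
    """Hypothetical stand-in for a decoder self-attention KV cache (not the NeMo API)."""

    def __init__(self):
        self.use_cache = False
        self.keys = None  # list of time steps; each step holds one value per batch item

    def reset_cache(self, use_cache):
        # Drop any cached entries and set the caching mode for the next batch.
        self.use_cache = use_cache
        self.keys = None

    def append(self, new_keys):
        # Concatenate one new time step onto the cached keys.
        if not self.use_cache:
            return [new_keys]
        if self.keys is None:
            self.keys = [new_keys]
        else:
            # Toy version of the batch-dimension check that concatenation performs.
            if len(self.keys[0]) != len(new_keys):
                raise RuntimeError("Sizes of tensors must match (toy check)")
            self.keys.append(new_keys)
        return self.keys


def run_batch(cache, batch_size, steps, reset_first):
    """Hypothetical decoding loop; returns the number of cached time steps."""
    if reset_first:
        cache.reset_cache(use_cache=True)
    cached = []
    for t in range(steps):
        cached = cache.append([float(t)] * batch_size)
    return len(cached)


cache = ToyKVCache()

# First dataset: a regular batch of 4 items that resets the cache before decoding.
first_len = run_batch(cache, batch_size=4, steps=3, reset_first=True)

# Second dataset: a longform batch of 2 items that inherits the stale cache.
try:
    run_batch(cache, batch_size=2, steps=2, reset_first=False)
    stale_error = None
except RuntimeError as err:
    stale_error = str(err)

# With the reset at the start of the batch, the same call succeeds.
fixed_len = run_batch(cache, batch_size=2, steps=2, reset_first=True)
```

In this toy, the second call fails only because the cached entries from the first batch have a different batch size, which is why a reset at the start of each batch is enough to avoid it.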
2. Save filewise utmosv2 score in evaluation output
The `utmosv2` metric was computed per file but not included in the saved filewise metrics JSON, so downstream visualization (box plots) could not display MOS scores.
Fix: add `'utmosv2'` to `filewise_metrics_keys_to_save` in `evaluate_generated_audio.py`.
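A minimal sketch of how such a key list might gate what reaches the saved JSON. The helper `select_filewise_metrics`, the sample record, and its values are hypothetical; the list shows only the keys visible in the reviewed diff excerpt, and the real list in `evaluate_generated_audio.py` may contain more.

```python
import json

# Only the keys visible in the diff excerpt; the actual list may be longer.
filewise_metrics_keys_to_save = [
    'gt_audio_filepath',
    'pred_audio_filepath',
    'context_audio_filepath',
    'utmosv2',
]


def select_filewise_metrics(filewise_metrics, keys_to_save):
    """Hypothetical helper: keep only the whitelisted keys of each per-file record."""
    return [
        {key: record[key] for key in keys_to_save if key in record}
        for record in filewise_metrics
    ]


# Made-up per-file record for illustration; 'internal_tmp' is not whitelisted.
example_metrics = [
    {
        'gt_audio_filepath': 'gt/0.wav',
        'pred_audio_filepath': 'pred/0.wav',
        'context_audio_filepath': 'ctx/0.wav',
        'utmosv2': 3.5,
        'internal_tmp': 'dropped',
    }
]

saved = select_filewise_metrics(example_metrics, filewise_metrics_keys_to_save)
serialized = json.dumps(saved, indent=2)
```

Under this pattern, a metric that is computed but missing from the key list is silently omitted from the output, which matches the symptom described above.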
