[TTS] Infrastructure for parallelization of evaluation (scoring)#15417
Merged
rfejgin merged 28 commits into NVIDIA-NeMo:main on Feb 24, 2026
Conversation
For debugging NeMo Skills deployment Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
When running the project from a directory different from the repo root (which happens in NeMo Skills), these paths need to be converted to absolute paths, which is done in this commit. Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
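The relative-to-absolute conversion described in this commit might look something like the sketch below; the function name and signature are illustrative assumptions, not the PR's actual code:

```python
from pathlib import Path

def to_absolute(path_str: str, base_dir: str) -> str:
    """Resolve a possibly-relative path against a known base directory.

    When the project runs from a directory other than the repo root (as in
    NeMo Skills), relative paths would otherwise resolve against the wrong
    working directory. (Hypothetical helper, for illustration only.)
    """
    p = Path(path_str)
    return str(p if p.is_absolute() else (Path(base_dir) / p).resolve())
```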
… in Nemo Skills) Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py
It has been moved outside of evaluate() since in the NeMo Skills use case we need the full metrics for chunk-wise scoring and aggregation at the end. Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
Break evaluation into two steps:
- evaluate_dir() for directory-level evaluation. Can be run in parallel across multiple directories (e.g. in NeMo Skills).
- compute_global_metrics() for global metrics aggregation.
Also: move model loading to a separate function; cleanup.
Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
rfejgin commented on Feb 23, 2026
logging.warning(f"Metric '{key}' not found in any measurements")
results[key] = "N/A"
continue
This check no longer makes sense since the metric names are now inferred from the metrics themselves.
rfejgin commented on Feb 23, 2026
# Define the standard metric keys used in evaluation
STANDARD_METRIC_KEYS = [
Removed for better maintainability - we can infer these names from the metrics themselves.
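A minimal sketch of inferring the metric names from the measurements themselves rather than keeping a hard-coded list; the function name and data shape are assumptions for illustration:

```python
def infer_metric_keys(per_file_metrics: list[dict]) -> list[str]:
    """Collect the union of metric names present in the per-file
    measurements, preserving first-seen order. Replaces a hard-coded
    STANDARD_METRIC_KEYS list. (Hypothetical helper.)"""
    seen = {}
    for record in per_file_metrics:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)
```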
rfejgin commented on Feb 23, 2026
import torch
from threadpoolctl import threadpool_limits

# If UTMOSv2 cache is not set but HF_HOME is, use an area under HF_HOME for the cache location
As part of making evaluation more efficient, we want to ensure UTMOS models don't get re-downloaded each time.
blisc reviewed on Feb 24, 2026
gt_audio_paths = [_resolve_path(audio_dir, r.get('audio_filepath')) for r in records]
context_audio_paths = [_resolve_path(audio_dir, r.get('context_audio_filepath')) for r in records]

device = "cuda"
Hmm, guess we always hard-coded this, but should we remove this hardcode?
rfejgin (Author) replied:
Done in latest commit. It's part of EvaluationConfig now, still defaulting to "cuda".
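A hedged sketch of what moving the device into EvaluationConfig might look like; the field set is illustrative, and only the "cuda" default is stated in the thread:

```python
from dataclasses import dataclass

@dataclass
class EvaluationConfig:
    # Device is now configurable rather than hard-coded,
    # still defaulting to "cuda". (Sketch; other fields omitted.)
    device: str = "cuda"
```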
blisc previously approved these changes on Feb 24, 2026
blisc approved these changes on Feb 24, 2026
This PR makes MagpieTTS's evaluation a little faster and much more parallelizable.
Changes:
- Removed STANDARD_METRIC_KEYS, since the set of metrics on which to compute CIs can be inferred from the metrics themselves, which should be easier to maintain.
- evaluate() has been broken into:
  1. evaluate_dir(): computes metrics for all audios in a given directory + manifest and outputs per-file metrics.
  2. compute_global_metrics(): takes the per-file metrics collected in (1) and computes global metrics from them. This mostly amounts to computing averages, but it also includes the FCD computation, since the statefulness of the FCD metric means it cannot easily be broken down into directory-wise chunks.
  3. evaluate(): a wrapper that chains (1) and (2) for easy use in NeMo. NeMo Skills would call (1) and (2) directly.

Running on a single GPU (local machine), these changes yield a ~20% speedup of evaluation (more for larger sets, less for small ones, due to the overhead of loading models). The benefit is much larger when evaluating on multiple GPUs in parallel, which we have prototyped in NeMo Skills (and will merge later on).
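The two-step structure described above can be sketched as follows; the signatures and placeholder bodies are illustrative, not the PR's actual implementation:

```python
def evaluate_dir(audio_dir, manifest):
    # Placeholder: the real code scores every file listed in `manifest`
    # under `audio_dir` and returns one metrics dict per file.
    return [{"dir": audio_dir, "cer": 0.1}]

def compute_global_metrics(per_file_metrics):
    # Aggregate per-file metrics, mostly by averaging. (In the real code
    # FCD is also computed here, because its statefulness prevents
    # chunk-wise computation.)
    keys = {k for m in per_file_metrics for k in m
            if isinstance(m[k], (int, float))}
    n = len(per_file_metrics)
    return {k: sum(m.get(k, 0.0) for m in per_file_metrics) / n for k in keys}

def evaluate(dirs_and_manifests):
    # Convenience wrapper chaining (1) and (2). A parallel caller such as
    # NeMo Skills would run evaluate_dir() on each chunk concurrently and
    # call compute_global_metrics() once at the end.
    per_file = []
    for audio_dir, manifest in dirs_and_manifests:
        per_file.extend(evaluate_dir(audio_dir, manifest))
    return compute_global_metrics(per_file)
```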
@vmendelev: adding you just as an FYI. I think your existing Skills integration should work as-is after this PR, since the evaluate() API hasn't changed. Later, we can break down how Skills does Magpie scoring into the parallelizable part (evaluate_dir()) and the aggregation step (compute_global_metrics()) – I experimented with that and it seemed to work well.