
[TTS] Infrastructure for parallelization of evaluation (scoring) #15417

Merged
rfejgin merged 28 commits into NVIDIA-NeMo:main from rfejgin:magpietts_evaluation_parallelization
Feb 24, 2026
Conversation


@rfejgin rfejgin commented Feb 20, 2026

This PR makes MagpieTTS's evaluation a little faster and much more parallelizable.

Changes:

  • Made the ASR step batched, rather than batch-size-1. We first collect all audios that need ASR and run ASR on them in a batched manner before the main per-sample evaluation loop. This substantially speeds up the ASR part of evaluation.
  • Infrastructure towards multi-GPU evaluation (scoring). That is something we will do (and have prototyped) with NeMo Skills later on. To enable that, evaluation was broken down into two steps: a first step where evaluation of each utterance is independent of other utterances, and a second step that focuses on parts that require global state.
  • During refactoring I also removed the constant STANDARD_METRIC_KEYS since the set of metrics on which to compute CIs can be inferred from the metrics themselves, which should be easier to maintain.
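
The batched-ASR change described in the first bullet can be sketched as follows. This is an illustrative outline, not the actual NeMo code: `transcribe_batch` stands in for the real batched ASR model call, and the record/field names are assumptions.

```python
# Hypothetical sketch: gather all audio paths up front, transcribe them in
# batches, and cache the transcripts before the per-sample evaluation loop.

def transcribe_batch(audio_paths):
    # Placeholder for a real batched ASR call (e.g. asr_model.transcribe(paths)).
    return [f"transcript of {p}" for p in audio_paths]

def batched_asr(records, batch_size=16):
    """Run ASR once, in batches, instead of per-sample with batch size 1."""
    paths = [r["audio_filepath"] for r in records]
    transcripts = {}
    for i in range(0, len(paths), batch_size):
        batch = paths[i : i + batch_size]
        for path, text in zip(batch, transcribe_batch(batch)):
            transcripts[path] = text
    return transcripts
```

The per-sample loop then looks up transcripts from the cache instead of invoking the ASR model one file at a time.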

So evaluate() has been broken into:

  1. evaluate_dir(): computes metrics for all audios in a given directory+manifest and outputs per-file metrics.
  2. compute_global_metrics(): takes the per-file metrics collected in (1) and computes global metrics from them. This mostly amounts to computing averages, but it also includes the FCD computation, since the statefulness of the FCD metric means it cannot easily be broken down into directory-wise chunks.
  3. evaluate(): wrapper that chains (1) and (2) for easy use in NeMo. NeMo skills would call (1) and (2) directly.
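
The three-function split above can be sketched like this. The function names match the PR, but the bodies, signatures, and metric fields are illustrative assumptions, not the actual NeMo implementation.

```python
# Hedged sketch of the evaluate() split: a parallelizable per-directory step
# and a global aggregation step, chained by a convenience wrapper.

def evaluate_dir(audio_dir):
    # Per-file metrics for one directory; independent across directories,
    # so a driver (e.g. NeMo Skills) can run one call per GPU/chunk.
    return [{"file": f"{audio_dir}/utt{i}.wav", "cer": 0.1 * i} for i in range(3)]

def compute_global_metrics(per_file_metrics):
    # Aggregate per-file metrics (mostly averaging; stateful metrics like
    # FCD would also live here, since they need the full set of files).
    cers = [m["cer"] for m in per_file_metrics]
    return {"cer": sum(cers) / len(cers)}

def evaluate(audio_dir):
    # Wrapper chaining the two steps, preserving the old single-call API.
    return compute_global_metrics(evaluate_dir(audio_dir))
```

A parallel driver would call evaluate_dir() once per chunk, concatenate the per-file lists, and call compute_global_metrics() once at the end.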

Running on a single GPU (local machine), these changes yield ~20% speedup of evaluation (more for larger sets, less for small ones, due to overhead of loading models). The benefit is much larger when evaluating on multiple GPUs in parallel, which we have prototyped in NeMo Skills (and will merge later on).

@vmendelev : adding you just as FYI. I think your existing Skills integration should work as-is after this PR, since the evaluate() API hasn't changed. Later, we can break down how Skills does Magpie scoring into the parallelizable part (evaluate_dir()) and the aggregation step (compute_global_metrics()) – I experimented with that and it seemed to work well.

For debugging NeMo Skills deployment

Signed-off-by: Fejgin, Roy <rfejgin@nvidia.com>
When running the project from a directory different from the repo root
(which happens in NeMo Skills), these paths need to be converted to absolute
paths, which is done in this commit.
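
A minimal sketch of the conversion this commit message describes, assuming a known repo root; the helper name is hypothetical.

```python
# Resolve repo-relative paths against a fixed root so they remain valid when
# the process runs from a different working directory (as in NeMo Skills).
from pathlib import Path

def to_absolute(path, root):
    p = Path(path)
    return p if p.is_absolute() else (Path(root) / p).resolve()
```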

… in Nemo Skills)

@github-actions github-actions bot added the TTS label Feb 20, 2026
- remove g2p path handling (unrelated to parallelization)
- update a comment

It has been moved outside of evaluate() since in the NeMo Skills use
case we need the full metrics for chunk-wise scoring and aggregation
at the end.

@rfejgin rfejgin changed the title [TTS] Infrastructure for parallel evalution [TTS] Parallel evolution infrastructure Feb 21, 2026
@rfejgin rfejgin changed the title [TTS] Parallel evolution infrastructure [Draft] [TTS] Parallel evolution infrastructure Feb 21, 2026
@rfejgin rfejgin changed the title [Draft] [TTS] Parallel evolution infrastructure [Draft] [TTS] Parallel evaluation infrastructure Feb 21, 2026
- Break evaluation into two steps:
  - evaluate_dir() for directory-level evaluation. Can be run in parallel across multiple directories (e.g. in NeMo Skills)
  - compute_global_metrics() for global metrics aggregation.
- Move model loading to separate function
- Cleanup

@rfejgin rfejgin changed the title [Draft] [TTS] Parallel evaluation infrastructure [Draft] [TTS] Evaluation: Batch the ASR; refactor for parallelization Feb 21, 2026
@rfejgin rfejgin marked this pull request as ready for review February 23, 2026 19:28
…jgin/NeMo into magpietts_evaluation_parallelization
logging.warning(f"Metric '{key}' not found in any measurements")
results[key] = "N/A"
continue

Collaborator Author:
This check no longer makes sense since the metric names are now inferred from the metrics themselves.



# Define the standard metric keys used in evaluation
STANDARD_METRIC_KEYS = [
@rfejgin rfejgin Feb 23, 2026
Removed for better maintainability - we can infer these names from the metrics themselves.

import torch
from threadpoolctl import threadpool_limits

# If UTMOSv2 cache is not set but HF_HOME is, use an area under HF_HOME for the cache location
Collaborator Author:
As part of making evaluation more efficient, we want to ensure UTMOS models don't get re-downloaded each time.
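
The cache-fallback logic behind that comment might look like the sketch below. The environment-variable name for the UTMOSv2 cache is an assumption for illustration; only the HF_HOME fallback idea comes from the source.

```python
# If no dedicated UTMOSv2 cache dir is set but HF_HOME is, place the cache
# under HF_HOME so models aren't re-downloaded on every evaluation run.
# "UTMOSV2_CACHE" is a hypothetical variable name, not the real one.
import os

def utmos_cache_dir(env):
    if env.get("UTMOSV2_CACHE"):
        return env["UTMOSV2_CACHE"]
    if env.get("HF_HOME"):
        return os.path.join(env["HF_HOME"], "utmosv2")
    return None
```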

Remove unnecessary NaN default values for metrics.

@rfejgin rfejgin marked this pull request as ready for review February 23, 2026 22:34
gt_audio_paths = [_resolve_path(audio_dir, r.get('audio_filepath')) for r in records]
context_audio_paths = [_resolve_path(audio_dir, r.get('context_audio_filepath')) for r in records]

device = "cuda"
Collaborator:
Hmm, guess we always hard-coded this, but should we remove this hardcode?

Collaborator Author:
Done in latest commit. It's part of EvaluationConfig now, still defaulting to "cuda".
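
Per this thread, the device moved into the evaluation config with "cuda" as the default. A minimal sketch, assuming a dataclass-style config; only the `device` field and its default come from the discussion, the rest is illustrative.

```python
# Sketch: device is configurable rather than hard-coded, defaulting to "cuda".
from dataclasses import dataclass

@dataclass
class EvaluationConfig:
    device: str = "cuda"
    batch_size: int = 16  # hypothetical companion field for the batched ASR
```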

blisc previously approved these changes Feb 24, 2026
@rfejgin rfejgin merged commit f0e64ea into NVIDIA-NeMo:main Feb 24, 2026
131 checks passed