
feat: [EXPERIMENTAL FEATURE] Add ASR Support to Nemo Automodel #1263

Open

rylativity wants to merge 12 commits into NVIDIA-NeMo:main from rylativity:asr

Conversation

@rylativity

What does this PR do?

[EXPERIMENTAL FEATURE] Adds comprehensive ASR support for Whisper (5 variants) and Parakeet CTC (2 variants), with distributed training and PEFT, and lays the groundwork for incorporating additional ASR models into NeMo Automodel.

Changelog

New Model Support:

  • Add NeMoAutoModelForSpeechSeq2Seq for encoder-decoder ASR models (Whisper family: tiny/base/small/medium/large-v3)
  • Add NeMoAutoModelForCTC for CTC-based ASR models (Parakeet CTC: 0.6B/1.1B)
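As a rough illustration of how these two auto classes partition the supported checkpoints (the class names come from this PR, but the selection helper below is hypothetical, not the repo's actual dispatch logic):

```python
# Sketch only: class names are from the PR; this helper is illustrative.
ASR_AUTOCLASS_BY_ARCH = {
    "whisper": "NeMoAutoModelForSpeechSeq2Seq",  # encoder-decoder (seq2seq)
    "parakeet_ctc": "NeMoAutoModelForCTC",       # CTC head
}

def pick_asr_autoclass(model_id: str) -> str:
    """Map a checkpoint id to the auto class that would load it (illustrative)."""
    mid = model_id.lower()
    if "whisper" in mid:
        return ASR_AUTOCLASS_BY_ARCH["whisper"]
    if "parakeet" in mid and "ctc" in mid:
        return ASR_AUTOCLASS_BY_ARCH["parakeet_ctc"]
    raise ValueError(f"Unsupported ASR checkpoint: {model_id}")

print(pick_asr_autoclass("openai/whisper-large-v3"))   # NeMoAutoModelForSpeechSeq2Seq
print(pick_asr_autoclass("nvidia/parakeet-ctc-1.1b"))  # NeMoAutoModelForCTC
```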

New Components:

  • Add ASR dataset component with LibriSpeech, Common Voice, and custom dataset loaders (nemo_automodel/components/datasets/asr/)
  • Add processor-specific collate functions with automatic mel-spectrogram extraction and tokenization (nemo_automodel/components/datasets/asr/collate_fns.py)
  • Implement collate function registry for automatic processor selection
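The registry idea above can be sketched as a processor-keyed mapping; every name below is illustrative, not the actual API of collate_fns.py:

```python
# Hypothetical sketch of a processor-keyed collate-function registry.
from typing import Any, Callable, Dict, List

COLLATE_REGISTRY: Dict[str, Callable] = {}

def register_collate(processor_type: str):
    """Decorator mapping a processor class name to a collate function."""
    def deco(fn: Callable) -> Callable:
        COLLATE_REGISTRY[processor_type] = fn
        return fn
    return deco

@register_collate("WhisperProcessor")
def whisper_collate(batch: List[Any]) -> Dict[str, int]:
    # the real version would extract log-mel features and tokenize transcripts
    return {"n_samples": len(batch)}

def get_collate_fn(processor: Any) -> Callable:
    """Select the collate function automatically from the processor's type."""
    name = type(processor).__name__
    if name not in COLLATE_REGISTRY:
        raise KeyError(f"No collate function registered for {name}")
    return COLLATE_REGISTRY[name]

class WhisperProcessor:  # stand-in for transformers.WhisperProcessor
    pass

fn = get_collate_fn(WhisperProcessor())
print(fn([{"audio": None}, {"audio": None}]))  # {'n_samples': 2}
```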

New Recipe:

  • Add ASR fine-tuning recipe with support for both CTC and Seq2Seq loss computation (nemo_automodel/recipes/asr/finetune.py)
  • Implement validation loop with loss tracking and metrics logging
  • Add pipeline parallelism support via AutoPipeline
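The validation loop's loss tracking can be sketched as a running average over batches; the batch format and "loss" field below are stand-ins, not the recipe's actual API:

```python
# Minimal sketch of validation-loop loss tracking (illustrative names).
from typing import Dict, Iterable

def run_validation(batches: Iterable[Dict[str, float]]) -> float:
    """Accumulate per-batch loss and return the running average."""
    total, count = 0.0, 0
    for batch in batches:
        # a real loop would run a forward pass here (CTC or seq2seq loss)
        total += batch["loss"]
        count += 1
    avg = total / max(count, 1)
    print(f"val_loss: {avg:.4f}")  # stand-in for metrics logging
    return avg

run_validation([{"loss": 2.0}, {"loss": 1.0}])  # val_loss: 1.5000
```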

Example Configurations:

  • Add 8 YAML configs for Whisper and Parakeet models with full and PEFT fine-tuning examples
  • Add finetune.py entry point script for ASR examples (examples/asr_finetune/finetune.py)
  • Include distributed training configurations with device mesh setup

Testing:

  • Add 4 functional tests covering Whisper and Parakeet fine-tuning (full and PEFT) (tests/functional_tests/asr_finetune/)
  • Add comprehensive unit tests for dataset loaders and collate functions (tests/unit_tests/datasets/asr/)
  • Include pytest test class with parameterized model/PEFT configurations
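The four functional tests pair each model family with a tuning mode; this sketch enumerates the same grid, with itertools standing in for pytest.mark.parametrize and illustrative model ids:

```python
# Illustrative parameter grid for the functional tests (names are assumptions).
import itertools

MODELS = ["whisper-small", "parakeet-ctc-0.6b"]
MODES = ["full", "peft"]

cases = list(itertools.product(MODELS, MODES))
print(len(cases))  # 4, matching the four functional tests
for model, mode in cases:
    print(f"finetune[{model}-{mode}]")
```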

Documentation:

  • Add comprehensive README for ASR fine-tuning with quick start examples, PEFT guide, and troubleshooting (examples/asr_finetune/README.md)
  • Update root README with ASR examples and usage
  • Add inline documentation for ASR model classes and dataset utilities

Dependencies:

  • Add librosa and torchcodec as ASR extras in pyproject.toml
  • Update Docker build with ASR-specific dependencies

Other:

  • Update model exports in nemo_automodel/__init__.py and _transformers/__init__.py
  • Ensure component independence (no cross-component imports, verified by lint-imports)
  • Add copyright year 2026 across new files

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Linting/formatting passed
  • Commits DCO signed
  • Confirmed documentation builds successfully

Ryan Stewart added 8 commits February 12, 2026 15:31
…eech dataset

Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
…xample config

Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…set and model functionality

Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
@rylativity rylativity requested a review from jgerh as a code owner February 12, 2026 21:45
@rylativity rylativity changed the title [EXPERIMENTAL FEATURE] Add ASR Support to Nemo Automodel feat: [EXPERIMENTAL FEATURE] Add ASR Support to Nemo Automodel Feb 13, 2026
@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Feb 14, 2026
@jgerh (Contributor) left a comment

Completed tech pubs review and provided a few copyedits.

git submodule init && git submodule update && \
pip install nvidia-mathdx==25.1.1 && \
env NVTE_CUDA_ARCHS="80;90;100;120" NVTE_BUILD_THREADS_PER_JOB=8 pip install --no-cache-dir --no-build-isolation -v . && \
uv pip install nvidia-mathdx==25.1.1 && \
Contributor


Hi @rylativity, I'm not sure about this one.

@thomasdhc can you provide guidance?

@thomasdhc (Contributor) commented Feb 19, 2026

Please remove all instances of the uv pip install changes here; pip is used by design.

# NeMoAutoModel handles infrastructure internally
model = cfg_model.instantiate(**kwargs)
else:
raise ValueError(
Contributor

ok but this won't allow anyone to bring their own model
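One hedged way to keep bring-your-own-model support is to fall back to a dotted _target_ path when the config does not name a known auto class; the config shape and function name below are hypothetical, not the repo's actual config machinery:

```python
# Hypothetical fallback: import and construct whatever class _target_ names.
import importlib
from typing import Any, Dict

def instantiate_model(cfg: Dict[str, Any], **kwargs: Any) -> Any:
    """Instantiate the class named by the config's dotted _target_ path."""
    target = cfg.get("_target_")
    if target is None:
        raise ValueError("model config must define _target_")
    module_name, _, attr = target.rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(**kwargs)

# Works for any importable class, not just the registered auto classes:
counter = instantiate_model({"_target_": "collections.Counter"}, a=2)
print(counter)  # Counter({'a': 2})
```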


# Build pipeline config if PP enabled
self.pipeline_config = None
if self.pp_enabled:
Contributor

Do we need PP for ASR models? I would review the models we want to support; if they're <40B, I'd skip PP to simplify the train loop.

)

train_ctx, batch = make_cp_batch_and_ctx(self.device_mesh, batch, labels)
with train_ctx():
Contributor

I would rename this to something else, since train_ctx is inside a validation function.

Also, why not use _forward_backward_step here?

@akoumpa (Contributor) commented Feb 19, 2026

Thanks a lot @rylativity for adding this feature!

Since this is a new model category, I'd like your help with adding a bit more testing. For example, for the dataset and data preprocessing, can we add a functional test with a cached dataset and a cached preprocessor to ensure data is correctly transformed? I also see a few functional tests for fine-tuning; if it's not too much trouble, would you mind adding some loss-matching tests too (to ensure we avoid convergence regressions over time)?

We also have the ability to add longer-running tests to our nightly suite, so I would encourage adding coverage there as well once this PR is merged.

Next steps:

  • Evaluate whether we want to keep pipeline parallelism or not, and proceed accordingly (simpler is better if functionality is not affected).
  • Please consult with @thomasdhc whether the Dockerfile changes are ok.
  • Please include additional dataset, data-preprocessing, and loss-reproducibility tests.
  • Please ping me when ready.
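The loss-reproducibility check requested above can be sketched as comparing a run's losses against stored golden values within a tolerance; the golden numbers below are invented placeholders, not measured baselines:

```python
# Hedged sketch of a loss-matching regression test (golden values invented).
GOLDEN_LOSSES = [3.21, 2.87, 2.64]

def losses_match(observed, golden=GOLDEN_LOSSES, rtol=0.05):
    """True if every observed loss is within rtol of its golden value."""
    return all(abs(o - g) <= rtol * abs(g) for o, g in zip(observed, golden))

print(losses_match([3.20, 2.90, 2.60]))  # True: within 5% of each golden value
print(losses_match([4.00, 2.90, 2.60]))  # False: first loss regressed
```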

akoumpa and others added 3 commits February 18, 2026 23:40
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
@akoumpa (Contributor) commented Feb 19, 2026

/ok to test 74bc2db


Development

Successfully merging this pull request may close these issues.

plan to support Qwen3-ASR ?
