feat: [EXPERIMENTAL FEATURE] Add ASR Support to NeMo Automodel #1263
rylativity wants to merge 12 commits into NVIDIA-NeMo:main from
Conversation
…eech dataset Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
…xample config Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
…set and model functionality Signed-off-by: Ryan Stewart <rystewart@nvidia.com>
jgerh left a comment:
Completed tech pubs review and provided a few copyedits.
git submodule init && git submodule update && \
pip install nvidia-mathdx==25.1.1 && \
env NVTE_CUDA_ARCHS="80;90;100;120" NVTE_BUILD_THREADS_PER_JOB=8 pip install --no-cache-dir --no-build-isolation -v . && \
uv pip install nvidia-mathdx==25.1.1 && \
Hi @rylativity, I'm not sure about this one.
@thomasdhc, can you provide guidance?
Please remove all instances of the uv pip install changes here; pip is used by design.
# NeMoAutoModel handles infrastructure internally
model = cfg_model.instantiate(**kwargs)
else:
    raise ValueError(
OK, but this won't allow anyone to bring their own model.
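One possible shape for such a fallback, as a hedged sketch only: it reuses the cfg_model / kwargs names from the diff above, and the is_nemo_automodel flag is a hypothetical stand-in for whatever check the recipe actually performs.

from typing import Any

def build_model(cfg_model: Any, **kwargs: Any) -> Any:
    """Sketch only: keep the NeMoAutoModel path, but fall back to the user's
    own _target_ instead of raising, so custom models stay usable."""
    if getattr(cfg_model, "is_nemo_automodel", False):  # hypothetical flag
        # NeMoAutoModel handles infrastructure internally
        return cfg_model.instantiate(**kwargs)
    if hasattr(cfg_model, "instantiate"):
        # Bring-your-own-model path: instantiate whatever the config targets.
        return cfg_model.instantiate(**kwargs)
    raise ValueError("model config must expose an instantiate() entry point")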
# Build pipeline config if PP enabled
self.pipeline_config = None
if self.pp_enabled:
Do we need PP for ASR models? I would review the models we want to support; if they're under ~40B parameters, I'd skip PP to simplify the train loop.
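If PP is dropped, the branch above could collapse to an explicit guard. A minimal sketch, assuming the pp_enabled / pipeline_config attributes shown in the diff; this is not the recipe's actual class:

from typing import Optional

class ASRRecipeParallelismSketch:
    """Sketch only: the ASR checkpoints targeted here are well under 40B
    params, so pipeline parallelism is rejected up front instead of wired in."""

    def __init__(self, pp_enabled: bool = False) -> None:
        self.pp_enabled = pp_enabled
        self.pipeline_config: Optional[dict] = None
        if self.pp_enabled:
            raise NotImplementedError(
                "Pipeline parallelism is not supported by the ASR recipe; "
                "use data parallelism / FSDP instead."
            )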
)
train_ctx, batch = make_cp_batch_and_ctx(self.device_mesh, batch, labels)
with train_ctx():
I would rename this to something else, since train_ctx is inside a validation function.
Also, why not use _forward_backward_step here?
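One way the validation step could read after such a rename, as a hedged sketch; the context builder and shared step are passed in as parameters because they stand in for make_cp_batch_and_ctx and a no-backward variant of _forward_backward_step from the diff, and recipe stands in for the surrounding class:

import torch

def validation_step(recipe, batch, labels, make_ctx, forward_step):
    """Sketch only: the context is named val_ctx rather than train_ctx, and
    the forward logic is shared with training via forward_step."""
    val_ctx, batch = make_ctx(recipe.device_mesh, batch, labels)
    with val_ctx(), torch.no_grad():
        loss = forward_step(batch, labels)
    return loss.detach()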
Thanks a lot @rylativity for adding this feature! Since this is a new model category, I'd like your help with adding a bit more testing. For the dataset and data preprocessing, can we add a functional test for a cached dataset plus a cached preprocessor, to ensure the data is correctly transformed (a rough sketch follows below)? I also see a few functional tests for finetuning; if it's not too much trouble, would you mind adding some loss-matching tests as well, to ensure we avoid convergence regressions over time? We also have the ability to add longer-running tests to our nightly suite, so I would encourage including some there too once this PR is merged. Next steps:
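As referenced above, a rough sketch of the dataset/preprocessing functional test; cached_dataset and cached_preprocessor are hypothetical pytest fixtures, and the expected output keys are illustrative assumptions, not the PR's actual API:

def test_cached_asr_dataset_preprocessing(cached_dataset, cached_preprocessor):
    """Sketch only: cached_dataset / cached_preprocessor would be fixtures
    that load the cached artifacts from disk."""
    sample = cached_preprocessor(cached_dataset[0])
    # The transformed sample should expose model-ready features and labels.
    assert "input_features" in sample
    assert "labels" in sample
    assert sample["input_features"].shape[-1] > 0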
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
/ok to test 74bc2db
What does this PR do?
[EXPERIMENTAL FEATURE] Adds comprehensive ASR support for Whisper (5 variants) and Parakeet CTC (2 variants), with distributed training and PEFT, and lays the groundwork for incorporating additional ASR models into NeMo Automodel.
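For context only, and not the recipe added in this PR: the underlying Whisper + LoRA (PEFT) pattern such a recipe builds on looks roughly like the snippet below using Hugging Face transformers and peft. The checkpoint name and LoRA hyperparameters are illustrative assumptions.

from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load a base Whisper checkpoint (any supported variant would work here).
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Wrap it with LoRA adapters so only a small set of weights is trained.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()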
Changelog
New Model Support:
New Components:
New Recipe:
Example Configurations:
Testing:
Documentation:
Dependencies:
Other:
Pre checks: