Add secondary Nemotron perf/diagnostic patches (Liger, TP CE, FLOPs, benchmark) #1307

Open
jasont314 wants to merge 1 commit into NVIDIA-NeMo:main from jasont314:pr2-nemotron-secondary-separate

Adds auxiliary optimization and reporting updates separated from the core PP/EP/SQuAD correctness path.

What does this PR do?

Introduces secondary Nemotron performance/diagnostic improvements (Liger integration path, TP CE handling, FLOPs accounting updates, and benchmark instrumentation) without changing the scope of the core PP/EP/SQuAD integration PR.

Changelog

  • _transformers/kernel_patches.py
    • Add NemotronH-specific Liger patch path with optional RMSNorm and CE patch toggles.
  • components/loss/masked_ce.py
    • Improve TP-aware masked CE behavior used in Nemotron runs.
  • components/utils/flops_utils.py
    • Update FLOPs computation/reporting for better diagnostic fidelity.
  • recipes/llm/benchmark.py
    • Add benchmark-side instrumentation/reporting improvements for run analysis.
  • components/distributed/init_utils.py
    • Minor distributed init/reporting update used by benchmark diagnostics.
  • NEMOTRON_SECONDARY_PATCH_NOTES.md
    • Add implementation notes and usage context for this secondary patch set.
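For readers unfamiliar with the "TP-aware masked CE" entry above, the following is a minimal, standard-library-only sketch of the general idea, not the code in `components/loss/masked_ce.py`. It assumes masked positions are marked with a label of `-100` (the usual PyTorch convention) and that tensor parallelism splits the vocabulary dimension across ranks. The function names and the list-of-shards layout are hypothetical, and the `max`/`sum` over shards only stand in for the all-reduce collectives a real implementation would use.

```python
import math

IGNORE_INDEX = -100  # assumed mask value, mirroring the common PyTorch convention


def masked_ce(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position contributes neither loss nor count
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0


def sharded_masked_ce(logit_shards, labels, ignore_index=IGNORE_INDEX):
    """Same loss, with the vocabulary split across simulated TP ranks.

    logit_shards[r][t] is rank r's slice of the logits for token t.
    """
    shard_sizes = [len(shard[0]) for shard in logit_shards]
    offsets = [sum(shard_sizes[:r]) for r in range(len(shard_sizes))]
    total, count = 0.0, 0
    for t, label in enumerate(labels):
        if label == ignore_index:
            continue
        # Stand-in for all-reduce(max) across ranks.
        global_max = max(max(shard[t]) for shard in logit_shards)
        # Stand-in for all-reduce(sum) of the local exponential sums.
        sum_exp = sum(
            sum(math.exp(x - global_max) for x in shard[t])
            for shard in logit_shards
        )
        # Only the rank that owns the label's vocab slice supplies the target logit.
        target = 0.0
        for shard, off, size in zip(logit_shards, offsets, shard_sizes):
            if off <= label < off + size:
                target = shard[t][label - off]
        total += global_max + math.log(sum_exp) - target
        count += 1
    return total / count if count else 0.0
```

Splitting each row of logits into per-rank slices and calling `sharded_masked_ce` should give the same value as `masked_ce` on the unsplit logits, up to floating-point rounding, which is the property a TP-aware loss needs to preserve.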

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)
  • Validation run:
    • python -m pytest -q tests/unit_tests/_transformers/test_auto_model.py tests/unit_tests/loss/test_masked_ce.py tests/unit_tests/utils/test_flops_utils.py tests/unit_tests/recipes/llm/test_benchmark.py
    • Result: 83 passed, 7 warnings

@copy-pr-bot
copy-pr-bot bot commented Feb 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
