
Nemotron Nano-v3 pipeline parallelism #1298

Open
prestonfu wants to merge 4 commits into NVIDIA-NeMo:zhiyul/llm-optimization-workshop from prestonfu:prestonfu/a1

Conversation


@prestonfu prestonfu commented Feb 16, 2026

What does this PR do?

Single-node pipeline parallelism for Nemotron NanoV3 30B.

Changelog

  • parallelizer.py: Unpack ModuleList/ModuleDict containers during layer extraction (sketched after this list).
  • functional.py:
    • Support the backbone.* model structure (vs. model.*).
    • Add stage_model.to_empty(device=device) to allocate device storage for buffers such as e_score_correction_bias in MoE, which otherwise remain on CPU (sketched after this list).
  • hf_utils.py: Support backbone and backbone.embeddings (vs. embed_tokens); sketched after this list.
  • flops_utils.py: An (incorrect) attempt to calibrate Mamba2 SSM FLOPs.
  • train_ft.py (sketched after this list):
    • Pass trust_remote_code to AutoConfig.
    • Add checkpoint.enabled.
    • Add MFU and nsys support.
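
A minimal sketch of the parallelizer.py change, assuming layer extraction walks a module's immediate children (the helper name and structure here are hypothetical, not the PR's actual code): nn.ModuleList / nn.ModuleDict containers are unpacked into their individual sub-layers so each decoder layer can be assigned to its own pipeline stage.

```python
import torch.nn as nn

def extract_layers(module: nn.Module) -> list[nn.Module]:
    """Flatten immediate children into pipeline-schedulable units."""
    layers = []
    for child in module.children():
        if isinstance(child, nn.ModuleDict):
            # Unpack the dict container instead of treating it as one layer.
            layers.extend(child.values())
        elif isinstance(child, nn.ModuleList):
            # Unpack the list container (e.g. the decoder-layer stack).
            layers.extend(child)
        else:
            layers.append(child)
    return layers
```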
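The functional.py change, sketched under the assumption that each pipeline stage is first built with empty/meta tensors and then populated from the checkpoint; the wrapper function is illustrative, only the to_empty call reflects the changelog item.

```python
import torch
import torch.nn as nn

def materialize_stage(stage_model: nn.Module, device: torch.device) -> nn.Module:
    # Reallocate every parameter *and* buffer on the target device without
    # copying data. This also covers non-parameter buffers such as the MoE
    # router's e_score_correction_bias, which would otherwise stay on CPU.
    stage_model.to_empty(device=device)
    # Real weights are expected to be loaded into these tensors afterwards,
    # e.g. via stage_model.load_state_dict(...) from the checkpoint shards.
    return stage_model
```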
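A hedged sketch of the hf_utils.py lookup, assuming the goal is simply to tolerate both Hugging Face layouts: Llama-style models expose model.embed_tokens, while Nemotron Nano-v3 exposes backbone.embeddings. The function name is illustrative.

```python
import torch.nn as nn

def resolve_decoder_and_embeddings(model: nn.Module) -> tuple[nn.Module, nn.Module]:
    # Nemotron Nano-v3 nests the decoder under `backbone`; Llama-style models
    # nest it under `model`.
    decoder = getattr(model, "backbone", None) or getattr(model, "model", None)
    if decoder is None:
        raise AttributeError("expected a `backbone` or `model` submodule")
    # Embedding attribute differs in the same way: `embeddings` vs `embed_tokens`.
    embeddings = getattr(decoder, "embeddings", None) or getattr(decoder, "embed_tokens", None)
    if embeddings is None:
        raise AttributeError("expected `embeddings` or `embed_tokens` on the decoder")
    return decoder, embeddings
```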
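The train_ft.py additions, roughly: AutoConfig must be loaded with trust_remote_code because Nemotron Nano-v3 ships custom modeling code, and MFU is achieved model FLOP/s divided by aggregate peak FLOP/s. The model ID and peak-FLOPs default below are placeholders, not the script's actual arguments.

```python
from transformers import AutoConfig

# Placeholder model ID; the real script takes the checkpoint path/name from its config.
config = AutoConfig.from_pretrained("nvidia/nemotron-nano-v3-30b", trust_remote_code=True)

def mfu(model_flops_per_step: float, step_time_s: float,
        num_gpus: int, peak_flops_per_gpu: float = 989e12) -> float:
    """Model FLOPs utilization: achieved FLOP/s over aggregate peak FLOP/s.

    989e12 is the commonly cited H100 BF16 dense peak; swap in the right
    number for the actual hardware.
    """
    achieved = model_flops_per_step / step_time_s
    return achieved / (num_gpus * peak_flops_per_gpu)
```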

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)


copy-pr-bot bot commented Feb 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@prestonfu changed the title from Prestonfu/a1 to Nemotron pipeline parallelism on Feb 16, 2026
@prestonfu changed the title from Nemotron pipeline parallelism to Nemotron Nano-v3 pipeline parallelism on Feb 16, 2026
@chtruong814 added the needs-follow-up label on Feb 18, 2026
Contributor

akoumpa commented Feb 19, 2026

@ZhiyuLi-Nvidia can you take a look? Thank you

@akoumpa removed the needs-follow-up label on Feb 19, 2026
@ZhiyuLi-Nvidia
Contributor

Hi @prestonfu, thanks a lot for the contribution. I am just curious why you want to merge into this dev branch, NVIDIA-NeMo:zhiyul/llm-optimization-workshop, which is for UCB homework only.
Are you interested in contributing to the main branch instead?
