
feat: add MoE LoRA rank scaling and torch_mm to MoE LoRA #1300

Open
hemildesai wants to merge 4 commits into main from hemil/lora-torch-mm

Conversation

@hemildesai (Contributor) commented Feb 17, 2026

wandb - https://wandb.ai/Nemo-automodel/automodel-moe-lora

Implement per-expert LoRA rank scaling based on LoRA Without Regret: when moe_rank_scaling=True, MoE expert modules receive a LoRA rank of dim // n_activated_experts while standard Linear modules keep the full rank. This allows training a separate LoRA per expert with appropriately reduced rank, matching the paper's recommendation. Default is False, preserving existing behavior.
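As a rough illustration of the rule, here is a minimal sketch of the scaled-rank computation together with the validation and floor-division warning described below. The function name and exact messages are placeholders, not the actual helper in lora.py.

```python
import warnings


def scaled_moe_lora_rank(dim: int, n_activated_experts: int) -> int:
    """Per-expert LoRA rank per the 'LoRA Without Regret' recipe: dim // n_activated_experts.

    Illustrative sketch only; the real logic lives in
    nemo_automodel/components/_peft/lora.py and may differ in detail.
    """
    rank = dim // n_activated_experts
    if rank < 1:
        raise ValueError(
            f"dim={dim} is too small for n_activated_experts={n_activated_experts}; "
            "the scaled LoRA rank would be < 1"
        )
    if dim % n_activated_experts != 0:
        warnings.warn(
            f"dim={dim} is not divisible by n_activated_experts={n_activated_experts}; "
            f"using floor division -> rank={rank}"
        )
    return rank


# Values matching the test cases described below:
# scaled_moe_lora_rank(16, 2) -> 8
# scaled_moe_lora_rank(16, 3) -> 5 (with a warning)
# scaled_moe_lora_rank(1, 2)  -> ValueError
```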

Also renames lora_moe.py to lora_experts.py for clarity, updates Qwen3 MoE configs for torch_mm experts backend, and adds a new TE packed-sequence LoRA config.

  • nemo_automodel/components/_peft/lora.py

    • Add moe_rank_scaling: bool = False to PeftConfig
    • Wire through from_dict deserialization
    • In apply_lora_to_linear_modules, compute the scaled rank for MoE modules when enabled, with validation (ValueError if rank < 1) and a warning on non-exact division (see the rank-selection sketch after this list)
  • nemo_automodel/components/_peft/lora_moe.py → lora_experts.py

    • Rename for clarity; update all imports
  • tests/unit_tests/_peft/test_lora_moe.py

    • Add 5 new tests for moe_rank_scaling:
      • Basic scaling (dim=16, n_activated=2 gives MoE rank=8, Linear rank=16)
      • Default off (both keep full rank)
      • Floor-division with warning (dim=16, n_activated=3 gives rank=5)
      • dim too small raises ValueError
      • Output equivalence with zero-init B matrices
  • examples/llm_finetune/qwen/

    • Update qwen3_moe_30b_lora.yaml (switch to the torch_mm experts backend, disable activation checkpointing)
    • Update qwen3_moe_30b_te_packed_sequence.yaml (safetensors format)
    • Add qwen3_moe_30b_te_packed_sequence_lora.yaml
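For orientation, a hedged sketch of the per-module rank selection the lora.py bullets describe. PeftConfig here is a stripped-down stand-in with only the relevant fields, and select_lora_rank / is_moe_expert_module are illustrative names, not the real nemo_automodel API.

```python
from dataclasses import dataclass


@dataclass
class PeftConfig:
    dim: int = 16                   # full LoRA rank used for standard Linear modules
    moe_rank_scaling: bool = False  # flag added by this PR; False preserves existing behavior


def select_lora_rank(cfg: PeftConfig, is_moe_expert_module: bool, n_activated_experts: int) -> int:
    """Return the LoRA rank to use for one module.

    MoE expert modules get cfg.dim // n_activated_experts when scaling is enabled
    (validation and the floor-division warning are shown in the earlier sketch);
    all other modules keep the full rank.
    """
    if cfg.moe_rank_scaling and is_moe_expert_module:
        return cfg.dim // n_activated_experts
    return cfg.dim


# With PeftConfig(dim=16, moe_rank_scaling=True) and 2 activated experts:
#   expert modules -> rank 8, standard Linear modules -> rank 16,
# matching the "basic scaling" test case above.
```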

copy-pr-bot (bot) commented Feb 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hemildesai (Contributor, Author)

/ok to test 987d4e9

@hemildesai (Contributor, Author)

/ok to test 1fb50e5

Signed-off-by: Hemil Desai <hemild@nvidia.com>
@hemildesai (Contributor, Author)

/ok to test 51a8283

