v3.2.2
New Features
- Megatron-SWIFT Released: Supports parallelism techniques such as TP (tensor parallelism), PP (pipeline parallelism), SP (sequence parallelism), and CP (context parallelism) for pre-training and fine-tuning 100+ models, including the Qwen series, Llama series, and DeepSeek-R1 distilled series. Streaming datasets and sequence packing are also supported, enabling ultra-large datasets while improving training efficiency. For more details, refer to the Megatron-SWIFT Training Documentation.
- Multi-turn GRPO Training: Adapts GRPO to multi-turn agent tool-calling scenarios such as Deep Search. Example code can be found here.
- GRPO Mini-batch Training: Supports splitting each batch into mini-batches to reduce GPU memory consumption during training; a sketch of the idea follows this list. Refer to the GRPO Training Documentation.
- Embedding Training for Multimodal Models: Supports embedding training for multimodal models such as iic/gme-Qwen2-VL-2B-Instruct; a generic contrastive-loss sketch follows this list. For more information, refer to the Embedding Model Training Documentation.
- Multi-label Classification and Regression for Large and Multimodal Models: Supports end-to-end training and deployment for both task types; see the head-and-loss sketch after this list. Example scripts can be found here.
- Model Evaluation with EvalScope During Training: Supports model evaluation using EvalScope during training to monitor training performance in real time. Example scripts can be found in the Evaluation Documentation.
- External Plugin for ViT-Full + LLM-LoRA Training: Provides an external plugin that trains the LLM with LoRA while training the ViT with full parameters, each at its own learning rate. This avoids the precision error that merging LoRA into the ViT would introduce; a parameter-group sketch follows this list. Example code can be found here.
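
The GRPO mini-batch feature amounts to gradient accumulation over a large rollout batch. As a framework-agnostic illustration of why this lowers peak memory (all names here are hypothetical, not ms-swift's implementation), a minimal PyTorch sketch:

```python
import torch
from torch import nn

def minibatch_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                   loss_fn, inputs: torch.Tensor, targets: torch.Tensor,
                   mini_batch_size: int) -> float:
    """Illustrative only: one optimizer step over a large batch,
    processed in mini-batches. Gradients are accumulated chunk by chunk,
    so peak activation memory scales with mini_batch_size rather than
    the full batch size."""
    optimizer.zero_grad()
    n = inputs.shape[0]
    total_loss = 0.0
    for start in range(0, n, mini_batch_size):
        xb = inputs[start:start + mini_batch_size]
        yb = targets[start:start + mini_batch_size]
        # Weight each chunk by its share of the batch so the accumulated
        # gradient equals that of a single full-batch backward pass.
        loss = loss_fn(model(xb), yb) * (xb.shape[0] / n)
        loss.backward()
        total_loss += loss.item()
    optimizer.step()
    return total_loss
```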
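For embedding training, the objective is typically an InfoNCE-style contrastive loss over one positive and a set of hard negatives per query. A minimal sketch of that loss, assuming pre-computed (B, D) query/positive and (B, K, D) negative embeddings (illustrative, not the exact loss ms-swift uses):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query: torch.Tensor, positive: torch.Tensor,
                  negatives: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """Illustrative InfoNCE with one positive and K hard negatives per query.

    query:     (B, D) query embeddings
    positive:  (B, D) embeddings of the matching documents
    negatives: (B, K, D) embeddings of hard negatives
    """
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_sim = (query * positive).sum(-1, keepdim=True)           # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", query, negatives)       # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature  # (B, 1+K)
    # The positive always sits at index 0 of the logits.
    labels = torch.zeros(query.shape[0], dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)
```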
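Multi-label classification differs from the usual single-label setup mainly in the head and loss: one independent logit per label, trained with binary cross-entropy rather than softmax cross-entropy (regression uses the same head shape with an MSE loss instead). A generic sketch, not the exact head ms-swift builds:

```python
import torch
from torch import nn

class MultiLabelHead(nn.Module):
    """Illustrative multi-label classification head: one logit per label,
    trained with BCEWithLogitsLoss on multi-hot targets."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, num_labels)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, pooled: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # pooled: (B, hidden_size) pooled sequence representation
        # labels: (B, num_labels) multi-hot targets, e.g. [[1, 0, 1], ...]
        logits = self.proj(pooled)
        return self.loss_fn(logits, labels.float())
```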
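The ViT-full + LLM-LoRA plugin boils down to optimizer parameter groups: full-parameter ViT weights in one group, LoRA weights in another, each with its own learning rate. A minimal sketch, assuming the vision tower's parameters are prefixed with "visual." and LoRA parameters contain "lora_" (naming varies by model; this is not ms-swift's plugin code):

```python
import torch
from torch import nn

def build_optimizer(model: nn.Module, vit_lr: float = 1e-5,
                    lora_lr: float = 1e-4) -> torch.optim.AdamW:
    """Illustrative only: full-parameter ViT training plus LoRA on the
    LLM, with a separate learning rate per group."""
    vit_params, lora_params = [], []
    for name, param in model.named_parameters():
        if name.startswith("visual."):      # assumed vision-tower prefix
            vit_params.append(param)        # train the whole vision tower
        elif "lora_" in name:
            lora_params.append(param)       # LoRA adapters on the LLM
        else:
            param.requires_grad_(False)     # freeze the remaining base weights
    return torch.optim.AdamW([
        {"params": vit_params, "lr": vit_lr},
        {"params": lora_params, "lr": lora_lr},
    ])
```

Keeping the ViT in full parameters also means no merge-lora step is needed for the vision tower, which is exactly the precision concern the plugin avoids.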
New Models
- iic/gme-Qwen2-VL-2B-Instruct series
- Qwen/Qwen2.5-VL-32B-Instruct
- LLM-Research/gemma-3-4b-it series
- deepseek-ai/DeepSeek-V3-0324
- mistralai/Mistral-Small-3.1-24B-Instruct-2503 series
What's Changed
- update code doc by @hjh0119 in #3498
- fix readme by @Jintao-Huang in #3499
- feat: swanlab config add ms-swift by @Zeyi-Lin in #3500
- Support GME models by @tastelikefeet in #3513
- fix docs by @tastelikefeet in #3514
- Fix docs links by @tastelikefeet in #3516
- fix vllm memory leak by @hjh0119 in #3515
- [Docs] Easy .[all] install from git by @xihuai18 in #3518
- Fix bugs by @tastelikefeet in #3520
- support megatron by @Jintao-Huang in #2885
- fix megatron by @Jintao-Huang in #3527
- support gemma3 by @hjh0119 in #3492
- fix megatron pipeline parallel by @Jintao-Huang in #3529
- fix megatron tie_weight by @Jintao-Huang in #3530
- support megatron llama by @Jintao-Huang in #3532
- Support megatron llama3.1 3.2 by @Jintao-Huang in #3537
- Update LlavaHfTemplate to match the changed image-token handling logic for LLaVA and LLaVA-Next models in transformers versions above 4.47 by @zsxm1998 in #3521
- refactor llava-hf by @Jintao-Huang in #3538
- fix docs by @Jintao-Huang in #3539
- refactor get_megatron_model_meta by @Jintao-Huang in #3542
- Gather infonce loss and support hard negative samples by @tastelikefeet in #3548
- fix docs by @tastelikefeet in #3553
- fix unsloth by @tastelikefeet in #3554
- fix grpo mllm split modules by @hjh0119 in #3552
- grpo embedding layer lora by @hjh0119 in #3531
- update arguments by @Jintao-Huang in #3556
- update doc by @hjh0119 in #3557
- Support all models' embedding and mask fake negative by @tastelikefeet in #3563
- skip grpo first wake up by @hjh0119 in #3562
- move grpovllmengine import by @hjh0119 in #3568
- fix bugs & support dataset_name by @Jintao-Huang in #3565
- fix wrap by @tastelikefeet in #3572
- Feature: add train-eval loop by @Yunnglin in #3569
- compat vllm>=0.8 by @Jintao-Huang in #3574
- [grpo] Fix Incorrect Placement of Data in eval_queue During async_generate by @hjh0119 in #3573
- Fix lmdeploy 0.7.3 by @tastelikefeet in #3584
- support vit full llm lora by @Jintao-Huang in #3575
- support Mistral3.1-2503 by @hjh0119 in #3588
- Support megatron packing by @Jintao-Huang in #3595
- [megatron] support streaming by @Jintao-Huang in #3609
- fix rft by @lxline in #3602
- [template] refactor replace media tokens by @Jintao-Huang in #3614
- fix top_logprobs by @Jintao-Huang in #3616
- Fix bugs by @Jintao-Huang in #3619
- Support multi turn grpo by @tastelikefeet in #3615
- fix grpo npu context by @hjh0119 in #3597
- support regression multi-label by @Jintao-Huang in #3621
- Support peft 0.15 by @tastelikefeet in #3623
- update grpo warning by @hjh0119 in #3598
- fix grpo rm zero3 by @hjh0119 in #3626
- GRPO mini batch by @hjh0119 in #3205
- fix grpo warning with pt backend by @hjh0119 in #3629
- compat transformers 4.50 by @Jintao-Huang in #3625
- support train_sampler_random by @Jintao-Huang in #3631
- fix grpo multi turn by @tastelikefeet in #3632
- update docs by @Jintao-Huang in #3633
- Support deepseek v3 0324 by @Jintao-Huang in #3637
- fix grpo cosine reward by @hjh0119 in #3638
- fix grpo lora split module by @hjh0119 in #3635
- fix reward model by @Jintao-Huang in #3641
- support qwen2_5_vl_32b by @Jintao-Huang in #3642
- fix grpo warning by @hjh0119 in #3630
- grpo reset prefix cache by @hjh0119 in #3640
- fix prm by @Jintao-Huang in #3647
- fix grpo pt ddp by @Jintao-Huang in #3648
- [grpo] separate the epsilon by @hjh0119 in #3599
- Fix template torch_dtype by @Jintao-Huang in #3651
- fix grpo epsilon by @hjh0119 in #3652
- update docs by @Jintao-Huang in #3653
- set grpo multi turn max tokens by @hjh0119 in #3655
- fix label_names by @Jintao-Huang in #3657
- fix grpo vllm tp by @Jintao-Huang in #3658
- compat vllm0.8.1 by @Jintao-Huang in #3656
- Fix evaluation of embedding by @tastelikefeet in #3661
- update readme by @Jintao-Huang in #3663
- fix npu context by @Jintao-Huang in #3664
New Contributors
Full Changelog: v3.2.1...v3.2.2