
v3.2.2

@Jintao-Huang Jintao-Huang released this 26 Mar 02:59
· 77 commits to main since this release


New Features

  1. Megatron-SWIFT Released: supports parallelism techniques such as TP (Tensor Parallelism), PP (Pipeline Parallelism), SP (Sequence Parallelism), and CP (Context Parallelism) for pre-training and fine-tuning 100+ models, including the Qwen, Llama, and DeepSeek-R1-distilled series. It also supports streaming datasets and sequence packing, enabling ultra-large datasets to be handled with improved training efficiency. For more details, refer to the Megatron-SWIFT Training Documentation.
  2. Multi-turn GRPO Training: supports multi-turn GRPO training for multi-turn agent tool-calling scenarios such as Deep Search. Example code can be found here.
  3. Embedding Training for Multimodal Models: supports embedding training for multimodal models such as iic/gme-Qwen2-VL-2B-Instruct. For more information, refer to the Embedding Model Training Documentation.
  4. Multi-label Classification and Regression: supports end-to-end training and deployment of multi-label classification and regression tasks for both LLMs and multimodal LLMs. Example scripts can be found here.
  5. Evaluation with EvalScope During Training: supports evaluating the model with EvalScope during training to monitor training performance in real time. Example scripts can be found in the Evaluation Documentation.
  6. External Plugin for LoRA + Full-Parameter ViT Training: provides an external plugin that trains the LLM with LoRA while training the ViT (Vision Transformer) with full parameters, each at its own learning rate. This avoids the precision loss caused by merging LoRA weights into the ViT. Example code can be found here.
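To give a feel for the sequence-packing idea in feature 1, here is an illustrative sketch of greedy first-fit packing: short samples are grouped into fixed-length blocks so fewer pad tokens are wasted. This is not the Megatron-SWIFT implementation; the function name `pack_sequences` and the first-fit strategy are assumptions for illustration only.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit packing: group sample indices into bins whose
    total token length fits within max_len, reducing padding waste."""
    bins = []  # each bin: [remaining_capacity, [sample indices]]
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"sample {idx} is longer than max_len")
        for b in bins:
            if b[0] >= n:          # first bin with enough room
                b[0] -= n
                b[1].append(idx)
                break
        else:                      # no bin fits: open a new one
            bins.append([max_len - n, [idx]])
    return [indices for _, indices in bins]
```

For example, packing samples of length 300, 900, 200, and 500 into 1024-token blocks yields two blocks instead of four padded ones.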
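Feature 4's multi-label setting differs from ordinary classification in how predictions are made: each label is decided independently via a sigmoid (typically trained with binary cross-entropy), rather than a softmax picking exactly one class. A minimal sketch of the inference side, with `multilabel_predict` and the 0.5 threshold as illustrative assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits, threshold=0.5):
    """Decide each label independently: a sample may carry zero, one,
    or several labels, unlike single-label softmax classification."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]
```

Regression works analogously with a linear head and an MSE-style loss instead of thresholded sigmoids.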
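The plugin in feature 6 boils down to building optimizer parameter groups with different learning rates: LoRA adapters on the LLM at one rate, full ViT parameters at another, everything else frozen. The sketch below shows only that grouping logic; the `visual.` prefix, PEFT-style `lora_` naming, and the default learning rates are assumptions, not the plugin's actual code.

```python
def build_param_groups(named_params, vit_lr=1e-5, lora_lr=1e-4):
    """Split (name, param) pairs into two optimizer groups:
    full-parameter ViT weights and LoRA adapter weights on the LLM.
    Remaining LLM weights are left out, i.e. frozen."""
    vit_params, lora_params = [], []
    for name, param in named_params:
        if name.startswith("visual."):   # assumed ViT module prefix
            vit_params.append(param)
        elif "lora_" in name:            # PEFT-style adapter naming
            lora_params.append(param)
    return [
        {"params": vit_params, "lr": vit_lr},
        {"params": lora_params, "lr": lora_lr},
    ]
```

The resulting list is exactly the per-parameter-group format a PyTorch optimizer accepts, e.g. `torch.optim.AdamW(build_param_groups(model.named_parameters()))`. Training the ViT in full precision this way sidesteps the error introduced by merging LoRA weights into it afterwards.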

New Models

  1. iic/gme-Qwen2-VL-2B-Instruct series
  2. Qwen/Qwen2.5-VL-32B-Instruct
  3. LLM-Research/gemma-3-4b-it series
  4. deepseek-ai/DeepSeek-V3-0324
  5. mistralai/Mistral-Small-3.1-24B-Instruct-2503 series

What's Changed

New Contributors

Full Changelog: v3.2.1...v3.2.2