
Commit f33cf50

Support megatron llama3.1/3.2 (#3537)
1 parent 436ed07 commit f33cf50

17 files changed, +160 −87 lines

docs/source/Instruction/Megatron-SWIFT训练.md

+1
@@ -214,6 +214,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
- hidden_dropout: Defaults to 0.
- transformer_impl: Which transformer implementation to use; options are 'local' and 'transformer_engine'. Defaults to transformer_engine.
- padded_vocab_size: Full vocabulary size; defaults to None.
+ - rope_scaling: Parameters related to rope_scaling; defaults to None. For the format, refer to [llama3.1 config.json](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct/file/view/master?fileName=config.json&status=1); pass the value as a JSON string.

### Megatron Training Parameters

docs/source/Instruction/支持的模型和数据集.md

+38 −38
Large diffs are not rendered by default.

docs/source/Instruction/预训练与微调.md

+1
@@ -56,6 +56,7 @@ ms-swift使用了分层式的设计思想,用户可以使用命令行界面、
- Quantization training: Supports QLoRA training with the GPTQ, AWQ, AQLM, BNB, HQQ, and EETQ quantization techniques. Fine-tuning a 7B model requires only 9GB of GPU memory. For details, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora)
- Multi-modal training: SWIFT supports pre-training, fine-tuning, and RLHF for multi-modal models, covering Caption, VQA, OCR, and [Grounding](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-vl-grounding/zh.ipynb) tasks across three modalities: images, videos, and audio. For details, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal). For the custom multi-modal dataset format, see the [custom dataset documentation](../Customization/自定义数据集.md)
- RLHF training: See [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rlhf). For multi-modal models, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/rlhf). For GRPO training, see [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/grpo_zero2.sh). For reinforcement fine-tuning, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rft)
+ - Megatron training: Supports Megatron's parallelism techniques to accelerate training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and context parallelism. See the [Megatron-SWIFT training documentation](./Megatron-SWIFT训练.md)
- Sequence classification model training: See [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls)
- Embedding model training: See [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding)
- Agent training: See [here](https://github.com/modelscope/swift/blob/main/examples/train/agent)

docs/source_en/Instruction/Megatron-SWIFT-Training.md

+1
@@ -227,6 +227,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
- hidden_dropout: Default is 0.
- transformer_impl: Which transformer implementation to use, options are 'local' and 'transformer_engine'. Default is transformer_engine.
- padded_vocab_size: Full vocabulary size, default is None.
+ - rope_scaling: Related parameters for rope_scaling, default is None. Refer to the format in [llama3.1 config.json](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct/file/view/master?fileName=config.json&status=1). Pass the value as a JSON string.

### Megatron Training Parameters
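The rope_scaling entry above only says to pass a JSON string in the llama3.1 config.json format. Below is a minimal, hedged sketch of what that can look like: the command layout and the surrounding flags are assumptions modeled on the Megatron-SWIFT quick-start style (the checkpoint name and dataset are illustrative placeholders), and the JSON keys/values mirror the rope_scaling block of the linked llama3.1 config.json, so verify them against that file.

```bash
# Hedged sketch (not part of this commit): overriding rope_scaling with a JSON
# string when launching Megatron-SWIFT training. Only --rope_scaling is the
# point here; the checkpoint path and dataset are placeholders.
megatron sft \
    --load Meta-Llama-3.1-8B-Instruct-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en#500' \
    --rope_scaling '{"rope_type": "llama3", "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192}'
```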

docs/source_en/Instruction/Pre-training-and-Fine-tuning.md

+1
@@ -59,6 +59,7 @@ Additionally, we offer a series of scripts to help you understand the training c
- Quantization Training: Supports QLoRA training using quantization techniques such as GPTQ, AWQ, AQLM, BNB, HQQ, and EETQ. Fine-tuning a 7B model only requires 9GB of memory. For more details, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora).
- Multi-modal Training: SWIFT supports pre-training, fine-tuning, and RLHF for multi-modal models. It supports tasks such as Captioning, VQA, OCR, and [Grounding](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-vl-grounding/zh.ipynb). It supports three modalities: images, videos, and audio. For more details, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal). The format for custom multi-modal datasets can be found in the [Custom Dataset Documentation](../Customization/Custom-dataset.md).
- RLHF Training: Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rlhf). For multi-modal models, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/rlhf). For GRPO training, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/grpo_zero2.sh). For reinforcement fine-tuning, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rft).
+ - Megatron Training: Supports the use of Megatron's parallelization techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and context parallelism. Refer to the [Megatron-SWIFT Training Documentation](./Megatron-SWIFT-Training.md).
- Sequence Classification Model Training: Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls).
- Embedding Model Training: Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).
- Agent Training: Refer to [here](https://github.com/modelscope/swift/tree/main/examples/train/agent).
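As a hedged companion to the Megatron Training bullet above, the sketch below shows how the listed parallelism dimensions might be combined in a single launch. The flag names follow the Megatron-style argument names used in the Megatron-SWIFT training documentation; the checkpoint, dataset, and parallel sizes are illustrative assumptions, not values taken from this commit.

```bash
# Hedged sketch: combining data, tensor, pipeline, sequence, and context
# parallelism. With 8 processes and tensor*pipeline*context = 2*2*2 = 8,
# the data-parallel size works out to 1; adjust the sizes to your hardware.
NPROC_PER_NODE=8 \
megatron sft \
    --load Meta-Llama-3.1-8B-Instruct-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en#2000' \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 2 \
    --context_parallel_size 2 \
    --sequence_parallel true \
    --micro_batch_size 1 \
    --global_batch_size 16
```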
