
Commit f33cf50

Support megatron llama3.1/3.2 (#3537)
1 parent 436ed07 commit f33cf50

17 files changed, +160 −87 lines

docs/source/Instruction/Megatron-SWIFT训练.md

+1
@@ -214,6 +214,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
- hidden_dropout: Defaults to 0.
- transformer_impl: Which transformer implementation to use; options are 'local' and 'transformer_engine'. Defaults to transformer_engine.
- padded_vocab_size: Full vocabulary size; defaults to None.
+ - rope_scaling: Parameters related to rope_scaling; defaults to None. For the format, refer to [llama3.1 config.json](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct/file/view/master?fileName=config.json&status=1); pass the value as a JSON string.

### Megatron Training Parameters

docs/source/Instruction/支持的模型和数据集.md

+38 −38
Large diffs are not rendered by default.

docs/source/Instruction/预训练与微调.md

+1
@@ -56,6 +56,7 @@ ms-swift使用了分层式的设计思想,用户可以使用命令行界面、
- Quantization training: Supports QLoRA training with the GPTQ, AWQ, AQLM, BNB, HQQ, and EETQ quantization techniques. Fine-tuning a 7B model requires only 9GB of GPU memory. For details, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora)
- Multi-modal training: SWIFT supports pre-training, fine-tuning, and RLHF for multi-modal models, covering Caption, VQA, OCR, and [Grounding](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-vl-grounding/zh.ipynb) tasks across three modalities: images, videos, and audio. For details, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal). For the custom multi-modal dataset format, see the [custom dataset documentation](../Customization/自定义数据集.md)
- RLHF training: See [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rlhf). For multi-modal models, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/rlhf). For GRPO training, see [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/grpo_zero2.sh). For reinforcement fine-tuning, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rft)
+ - Megatron training: Supports Megatron's parallelism techniques to accelerate training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and context parallelism. See the [Megatron-SWIFT training documentation](./Megatron-SWIFT训练.md)
- Sequence classification model training: See [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls)
- Embedding model training: See [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding)
- Agent training: See [here](https://github.com/modelscope/swift/blob/main/examples/train/agent)

docs/source_en/Instruction/Megatron-SWIFT-Training.md

+1
@@ -227,6 +227,7 @@ I am a language model developed by swift, you can call me swift-robot. How can I
- hidden_dropout: Default is 0.
- transformer_impl: Which transformer implementation to use, options are 'local' and 'transformer_engine'. Default is transformer_engine.
- padded_vocab_size: Full vocabulary size, default is None.
+ - rope_scaling: Related parameters for rope_scaling, default is None. Refer to the format in [llama3.1 config.json](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct/file/view/master?fileName=config.json&status=1). Pass the value as a JSON string.

### Megatron Training Parameters
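The rope_scaling entry above only says to pass a JSON string in the llama3.1 config.json format. Below is a minimal, hedged sketch of what that can look like: the command layout and the surrounding flags are assumptions modeled on the Megatron-SWIFT quick-start style (the checkpoint name and dataset are illustrative placeholders), and the JSON keys/values mirror the rope_scaling block of the linked llama3.1 config.json, so verify them against that file.

```bash
# Hedged sketch (not part of this commit): overriding rope_scaling with a JSON
# string when launching Megatron-SWIFT training. Only --rope_scaling is the
# point here; the checkpoint path and dataset are placeholders.
megatron sft \
    --load Meta-Llama-3.1-8B-Instruct-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en#500' \
    --rope_scaling '{"rope_type": "llama3", "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192}'
```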

docs/source_en/Instruction/Pre-training-and-Fine-tuning.md

+1
@@ -59,6 +59,7 @@ Additionally, we offer a series of scripts to help you understand the training c
- Quantization Training: Supports QLoRA training using quantization techniques such as GPTQ, AWQ, AQLM, BNB, HQQ, and EETQ. Fine-tuning a 7B model only requires 9GB of memory. For more details, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/qlora).
- Multi-modal Training: SWIFT supports pre-training, fine-tuning, and RLHF for multi-modal models. It supports tasks such as Captioning, VQA, OCR, and [Grounding](https://github.com/modelscope/ms-swift/blob/main/examples/notebook/qwen2_5-vl-grounding/zh.ipynb). It supports three modalities: images, videos, and audio. For more details, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal). The format for custom multi-modal datasets can be found in the [Custom Dataset Documentation](../Customization/Custom-dataset.md).
- RLHF Training: Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rlhf). For multi-modal models, refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal/rlhf). For GRPO training, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/grpo_zero2.sh). For reinforcement fine-tuning, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/rft).
+ - Megatron Training: Supports the use of Megatron's parallelization techniques to accelerate the training of large models, including data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and context parallelism. Refer to the [Megatron-SWIFT Training Documentation](./Megatron-SWIFT-Training.md).
- Sequence Classification Model Training: Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls).
- Embedding Model Training: Refer to [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding).
- Agent Training: Refer to [here](https://github.com/modelscope/swift/tree/main/examples/train/agent).
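As a hedged companion to the Megatron Training bullet above, the sketch below shows how the listed parallelism dimensions might be combined in a single launch. The flag names follow the Megatron-style argument names used in the Megatron-SWIFT training documentation; the checkpoint, dataset, and parallel sizes are illustrative assumptions, not values taken from this commit.

```bash
# Hedged sketch: combining data, tensor, pipeline, sequence, and context
# parallelism. With 8 processes and tensor*pipeline*context = 2*2*2 = 8,
# the data-parallel size works out to 1; adjust the sizes to your hardware.
NPROC_PER_NODE=8 \
megatron sft \
    --load Meta-Llama-3.1-8B-Instruct-mcore \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en#2000' \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 2 \
    --context_parallel_size 2 \
    --sequence_parallel true \
    --micro_batch_size 1 \
    --global_batch_size 16
```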
