Commit 8d8a462

Merge branch 'main' into glm_4_5
2 parents 9828d8f + 5ff8d5b commit 8d8a462

164 files changed, +2744 -1218 lines changed

README.md

Lines changed: 4 additions & 3 deletions
@@ -51,7 +51,7 @@ You can contact us and communicate with us by adding our group:


 ## 📝 Introduction
-🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 500+ large models and 200+ multi-modal large models. These large language models (LLMs) include models such as Qwen3, Qwen3-MoE, Qwen2.5, InternLM3, GLM4, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama4, Llava, InternVL3, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
+🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 500+ large models and 200+ multi-modal large models. These large language models (LLMs) include models such as Qwen3, Qwen3-MoE, Qwen2.5, InternLM3, GLM4.5, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama4, Llava, InternVL3, MiniCPM-V-4, Ovis2.5, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.

 🍔 Additionally, ms-swift incorporates the latest training technologies, including lightweight techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods like DPO, GRPO, RM, PPO, GKD, KTO, CPO, SimPO, and ORPO. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM, SGLang and LMDeploy, and it supports model quantization with technologies like GPTQ, AWQ, and BNB. Furthermore, ms-swift offers a Gradio-based Web UI and a wealth of best practices.

@@ -75,6 +75,7 @@ You can contact us and communicate with us by adding our group:


 ## 🎉 News
+- 🎁 2025.08.12: Support [Dynamic Fine-Tuning](https://arxiv.org/abs/2508.05629)(DFT) in SFT training, use parameter `--enable_dft_loss true`. Training scripts can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/full/dft.sh).
 - 🎁 2025.07.12: Deployment(pt/vLLM/SGLang) of Embedding models is supported, check [here](examples/deploy/embedding/client.py).
 - 🎁 2025.07.09: Megatron-SWIFT supports LoRA training. Compared to ms-swift, it achieves significant speedup on MoE models. Training scripts can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora).
 - 🎁 2025.06.23: Fine-tuning of reranker models is supported. Training scripts can be found here: [Reranker](https://github.com/modelscope/ms-swift/blob/main/examples/train/reranker/train_reranker.sh).
@@ -125,13 +126,13 @@ Running Environment:
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |
 | vllm | >=0.5.1 | 0.10 | Inference/Deployment |
 | sglang | >=0.4.6 | 0.4.9.post6 | Inference/Deployment |
-| lmdeploy | >=0.5,<0.9 | 0.8 | Inference/Deployment |
+| lmdeploy | >=0.5 | 0.9.2 | Inference/Deployment |
 | evalscope | >=0.11 | | Evaluation |
 | gradio | | 5.32.1 | Web-UI/App |

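The DFT news entry only names the flag `--enable_dft_loss true`. As we read the cited paper (arXiv 2508.05629), Dynamic Fine-Tuning rescales each target token's cross-entropy term by that token's predicted probability, and treats that probability as a constant (stop-gradient). The sketch below compares plain cross-entropy with this reweighted loss, using pure Python. It is an illustration under that reading, not ms-swift's implementation. The function names, the `-100` ignore label, and averaging over tokens are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # assumed label value for positions excluded from the loss


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def ce_and_dft_loss(token_logits: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> Tuple[float, float]:
    """Return (mean cross-entropy, mean DFT-style loss) over non-ignored tokens.

    DFT-style term per token: -p(y_t) * log p(y_t). In an autograd framework
    p(y_t) would be detached (no gradient); plain Python has no gradients,
    so here it is just a number.
    """
    ce_terms: List[float] = []
    dft_terms: List[float] = []
    for logits, label in zip(token_logits, labels):
        if label == IGNORE_INDEX:
            continue
        log_p = log_softmax(logits)[label]
        p = math.exp(log_p)  # per-token weight, a stop-gradient constant in DFT
        ce_terms.append(-log_p)
        dft_terms.append(-p * log_p)
    if not ce_terms:
        return 0.0, 0.0
    n = len(ce_terms)
    return sum(ce_terms) / n, sum(dft_terms) / n


# Two uniform 2-way predictions; the second position is masked out.
ce, dft = ce_and_dft_loss([[0.0, 0.0], [0.0, 0.0]], [0, IGNORE_INDEX])
print(f"CE={ce:.4f}  DFT={dft:.4f}")
```

Because every weight is at most 1, the DFT-style loss never exceeds cross-entropy. Tokens the model already finds unlikely are down-weighted the most.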
README_CN.md

Lines changed: 4 additions & 3 deletions
@@ -49,7 +49,7 @@
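 <img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

 ## 📝 Introduction
-🍲 ms-swift is a fine-tuning and deployment framework for large models and multimodal large models provided by the ModelScope community. It currently supports training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment for 500+ large models and 200+ multimodal large models. The large models include Qwen3, Qwen3-MoE, Qwen2.5, InternLM3, GLM4, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, Gemma2, and others; the multimodal large models include Qwen2.5-VL, Qwen2-Audio, Llama4, Llava, InternVL3, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, and others.
+🍲 ms-swift is a fine-tuning and deployment framework for large models and multimodal large models provided by the ModelScope community. It currently supports training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment for 500+ large models and 200+ multimodal large models. The large models include Qwen3, Qwen3-MoE, Qwen2.5, InternLM3, GLM4.5, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, Gemma2, and others; the multimodal large models include Qwen2.5-VL, Qwen2-Audio, Llama4, Llava, InternVL3, MiniCPM-V-4, Ovis2.5, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, and others.

 🍔 In addition, ms-swift brings together the latest training techniques, including lightweight training techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods such as DPO, GRPO, RM, PPO, GKD, KTO, CPO, SimPO, and ORPO. ms-swift supports accelerating the inference, evaluation, and deployment modules with vLLM, SGLang, and LMDeploy, and supports quantizing large models with techniques such as GPTQ, AWQ, and BNB. ms-swift also provides a Gradio-based Web-UI and a wealth of best practices.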

@@ -71,6 +71,7 @@
 - **Model quantization**: Supports quantized export with AWQ, GPTQ, FP8, and BNB. Exported models can use vLLM/SGLang/LmDeploy for accelerated inference and support continued training.

 ## 🎉 News
+- 🎁 2025.08.12: Supports [Dynamic Fine-Tuning](https://arxiv.org/abs/2508.05629)(DFT) in SFT training via the parameter `--enable_dft_loss true`. See the training script [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/full/dft.sh).
 - 🎁 2025.07.12: Supports deploying Embedding models (pt/vLLM/SGLang); see [here](examples/deploy/embedding/client.py).
 - 🎁 2025.07.09: Megatron-SWIFT supports LoRA training, with a significant speedup on MoE models compared to ms-swift. See the training scripts [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/megatron/lora).
 - 🎁 2025.06.23: Supports training Reranker models; see the training script [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/reranker/train_reranker.sh).
@@ -121,13 +122,13 @@ pip install -e .
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |
 | vllm | >=0.5.1 | 0.10 | Inference/Deployment |
 | sglang | >=0.4.6 | 0.4.9.post6 | Inference/Deployment |
-| lmdeploy | >=0.5,<0.9 | 0.8 | Inference/Deployment |
+| lmdeploy | >=0.5 | 0.9.2 | Inference/Deployment |
 | evalscope | >=0.11 | | Evaluation |
 | gradio | | 5.32.1 | Web-UI/App |

docs/source/BestPractices/Qwen3最佳实践.md

Lines changed: 0 additions & 1 deletion
@@ -333,7 +333,6 @@ ms-swift introduces Megatron parallelism to accelerate CPT/SFT/DPO of large models. Supp

 ```bash
 # https://help.aliyun.com/zh/pai/user-guide/general-environment-variables
-# Make sure the checkpoint save path `--save` and the packing cache path `--packing_cache` are identical and shared on both nodes.
 PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
 NNODES=$WORLD_SIZE \
 NODE_RANK=$RANK \

docs/source/Customization/插件化.md

Lines changed: 4 additions & 3 deletions
@@ -32,10 +32,11 @@ example [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin
 SWIFT supports customizing the loss in a plugin. If this feature is not used, cross-entropy loss (CE Loss) is used by default. Developers can write code in this file; once it is registered, the trainer automatically uses the custom loss method.
 For example, add the following code to plugin/loss.py:
 ```python
-@register_loss_func("custom_loss")
-def loss_scale_func(outputs, labels, loss_scale=None, num_items_in_batch=None) -> torch.Tensor:
+def custom_loss_func(outputs, labels, loss_scale=None, num_items_in_batch=None) -> torch.Tensor:
     # Write your own loss calculating here
     return loss
+
+loss_mapping['custom_loss'] = custom_loss_func
 ```
 Note that the loss is closely tied to the task the trainer runs. Loss customization currently targets pt and sft tasks; for human alignment tasks (e.g., DPO, PPO) or classification tasks (seq_cls), the loss cannot be customized through the plugin.

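The plugin hunk above replaces the `@register_loss_func` decorator with a plain assignment into `loss_mapping`. The sketch below mimics that name-to-function registration with a local dictionary and a pure-Python token-level cross-entropy, so it runs without ms-swift. It makes several assumptions, all for illustration only:

- `outputs` is reduced to a list of per-token logit lists; the real trainer passes model outputs and tensors.
- Labels are already aligned with the logits, so there is no causal shift.
- `-100` marks ignored positions.
- `loss_scale` is a list of per-token weights.

```python
import math
from typing import Callable, Dict, Optional, Sequence

# Local stand-in for the plugin's loss registry (loss name -> callable).
loss_mapping: Dict[str, Callable[..., float]] = {}


def custom_loss_func(outputs: Sequence[Sequence[float]], labels: Sequence[int],
                     loss_scale: Optional[Sequence[float]] = None,
                     num_items_in_batch: Optional[int] = None) -> float:
    """Weighted token-level cross-entropy (illustrative, not ms-swift's code)."""
    total = 0.0
    count = 0
    for i, (logits, label) in enumerate(zip(outputs, labels)):
        if label == -100:  # ignored position
            continue
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        weight = 1.0 if loss_scale is None else loss_scale[i]
        total += weight * (log_z - logits[label])  # -log softmax(logits)[label]
        count += 1
    # Normalize by the caller-provided item count if given, else by counted tokens.
    denom = num_items_in_batch if num_items_in_batch else count
    return total / denom if denom else 0.0


# Same registration style as the diff: a plain dictionary assignment.
loss_mapping['custom_loss'] = custom_loss_func

# A trainer would look the function up by name; here it is called directly.
loss = loss_mapping['custom_loss']([[0.0, 0.0], [3.0, 0.0]], [0, -100])
print(loss)
```

A plain dictionary keeps registration explicit and easy to inspect. The trade-off is that nothing stops two plugins from silently overwriting the same key.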
@@ -120,7 +121,7 @@ class IA3(Tuner):

     @staticmethod
     def prepare_model(args: 'TrainArguments', model: torch.nn.Module) -> torch.nn.Module:
-        model_arch: ModelKeys = MODEL_ARCH_MAPPING[model.model_meta.model_arch]
+        model_arch: ModelKeys = model.model_meta.model_arch
         ia3_config = IA3Config(
             target_modules=find_all_linears(model), feedforward_modules='.*' + model_arch.mlp.split('{}.')[1] + '.*')
         return get_peft_model(model, ia3_config)

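In the IA3 hunk, `model_arch` is now read directly from `model.model_meta.model_arch` instead of being looked up in `MODEL_ARCH_MAPPING`. The feed-forward pattern is still derived from its `mlp` template. The sketch below reproduces only that string manipulation, using two assumptions:

- A hypothetical stand-in class replaces `ModelKeys`.
- The `mlp` template is Llama-style, `'model.layers.{}.mlp'`; real templates depend on the model.

It then checks the pattern against a few module names with `re.fullmatch`. Whether PEFT matches `feedforward_modules` strings in exactly this full-match way should be confirmed in the PEFT documentation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ModelKeysSketch:
    """Hypothetical stand-in for ModelKeys; only the `mlp` template is modeled."""
    mlp: str


def feedforward_pattern(model_arch: ModelKeysSketch) -> str:
    # 'model.layers.{}.mlp'.split('{}.') -> ['model.layers.', 'mlp'], so [1] == 'mlp'.
    # A template without '{}.' would make [1] raise IndexError.
    return '.*' + model_arch.mlp.split('{}.')[1] + '.*'


def feedforward_modules(names: List[str], pattern: str) -> List[str]:
    # Keep module names that the whole regex matches.
    return [name for name in names if re.fullmatch(pattern, name)]


arch = ModelKeysSketch(mlp='model.layers.{}.mlp')
pattern = feedforward_pattern(arch)
names = [
    'model.layers.0.self_attn.q_proj',
    'model.layers.0.mlp.gate_proj',
    'model.layers.0.mlp.down_proj',
]
print(pattern, feedforward_modules(names, pattern))
```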
docs/source/Customization/自定义数据集.md

Lines changed: 19 additions & 2 deletions
@@ -157,12 +157,13 @@ alpaca format:
 - images: image, images.
 - videos: video, videos.
 - audios: audio, audios.
+- To pass base64-encoded data instead of file paths, use entries like these: `"videos": ['data:video/mp4;base64,{base64_encoded}']`, `"images": ['data:image/jpg;base64,{base64_encoded}']`

 For RLHF and sequence classification with multimodal models, the data formats follow those of text-only LLMs, with fields such as `images` added on top.

 #### grounding

-For grounding (object detection) tasks, SWIFT supports two approaches:
+For grounding (object detection) tasks, ms-swift supports two approaches:
 1. Directly use the grounding dataset format of the corresponding model. For example, the qwen2-vl format is as follows:

 ```jsonl
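The new base64 bullet allows data URIs in place of file paths. A minimal sketch for producing such entries with Python's standard `base64` module follows. The helper names `to_data_uri` and `file_to_data_uri` are hypothetical, and the bytes are a placeholder rather than a real image. The `image/jpg` MIME string follows the sample in the bullet; the registered type is `image/jpeg`, and which spellings ms-swift accepts is not stated here.

```python
import base64
import json
from pathlib import Path


def to_data_uri(raw: bytes, mime: str) -> str:
    """Wrap raw bytes as 'data:<mime>;base64,<payload>'."""
    return f"data:{mime};base64," + base64.b64encode(raw).decode("ascii")


def file_to_data_uri(path: str, mime: str) -> str:
    """Read a local file (e.g. a real image or video) and encode it."""
    return to_data_uri(Path(path).read_bytes(), mime)


# Placeholder bytes stand in for a real JPEG file.
image_uri = to_data_uri(b"placeholder-jpeg-bytes", "image/jpg")
sample = {
    "messages": [
        {"role": "user", "content": "<image>Describe the image"},
        {"role": "assistant", "content": "A placeholder answer."},
    ],
    "images": [image_uri],
}
line = json.dumps(sample, ensure_ascii=False)  # one JSONL line
print(line[:80])
```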
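@@ -175,7 +176,7 @@ alpaca format:
 - Models differ in whether bboxes are normalized. For example, qwen2.5-vl uses absolute coordinates, while qwen2-vl and internvl2.5 require bbox coordinates normalized to a per-mille (0-1000) scale.
 - Note: Qwen2.5-VL uses absolute coordinates, so be careful whenever images are resized. If you use the dataset format of approach 1, you must resize the images in advance (H and W must be multiples of 28) and scale the coordinates to match that size. If you use the dataset format of approach 2, ms-swift handles image resizing for you, and you can still use `MAX_PIXELS` or `--max_pixels` to rescale images. This applies to training only; for inference you still need to handle image resizing yourself.

-2. Use SWIFT's grounding data format
+2. Use ms-swift's grounding data format

 ```jsonl
 {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<image>Describe the image"}, {"role": "assistant", "content": "<ref-object><bbox> and <ref-object><bbox> are playing on the beach"}], "images": ["/xxx/x.jpg"], "objects": {"ref": ["a dog", "a woman"], "bbox": [[331.5, 761.4, 853.5, 1594.8], [676.5, 685.8, 1099.5, 1427.4]]}}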
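The notes above make two requirements. qwen2-vl and internvl2.5 expect per-mille (0-1000) coordinates. With approach 1, Qwen2.5-VL images must be resized in advance so that H and W are multiples of 28, with bboxes scaled to match. A minimal sketch of these computations follows, under these assumptions:

- Bboxes are `[x1, y1, x2, y2]` in pixels, as in the sample above.
- Each side is rounded to the nearest multiple of 28. Qwen's own resizing also bounds the total pixel count, which this sketch ignores.
- The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def round_to_multiple(size: int, factor: int = 28) -> int:
    """Round a side length to the nearest multiple of `factor` (at least one factor)."""
    return max(factor, round(size / factor) * factor)


def scale_bbox(bbox: Sequence[float], old_wh: Tuple[int, int],
               new_wh: Tuple[int, int]) -> List[float]:
    """Rescale absolute [x1, y1, x2, y2] coordinates after resizing the image."""
    sx, sy = new_wh[0] / old_wh[0], new_wh[1] / old_wh[1]
    x1, y1, x2, y2 = bbox
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]


def to_permille(bbox: Sequence[float], wh: Tuple[int, int]) -> List[int]:
    """Normalize absolute coordinates to integers in [0, 1000]."""
    w, h = wh
    x1, y1, x2, y2 = bbox
    return [round(x1 / w * 1000), round(y1 / h * 1000),
            round(x2 / w * 1000), round(y2 / h * 1000)]


orig_wh = (1000, 500)
new_wh = (round_to_multiple(orig_wh[0]), round_to_multiple(orig_wh[1]))
bbox = [100, 50, 500, 250]
print(new_wh, scale_bbox(bbox, orig_wh, new_wh), to_permille(bbox, orig_wh))
```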
@@ -189,6 +190,22 @@ alpaca format:
 - bbox_type: Options are 'real' and 'norm1'. Defaults to 'real', meaning the bbox holds actual (absolute) coordinates. If 'norm1', the bbox has already been normalized to 0~1.
 - image_id: Only takes effect when bbox_type is 'real'. It indicates which image the bbox belongs to and is used to rescale the bbox. Indexing starts at 0; by default all bboxes refer to image 0.

+To check the final encoded form of grounding data in the ms-swift format:
+```python
+import os
+os.environ["MAX_PIXELS"] = "1003520"
+from swift.llm import get_model_tokenizer, get_template
+
+_, tokenizer = get_model_tokenizer('Qwen/Qwen2.5-VL-7B-Instruct', load_model=False)
+template = get_template(tokenizer.model_meta.template, tokenizer)
+data = {...}
+template.set_mode('train')
+encoded = template.encode(data, return_template_inputs=True)
+print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n')
+print(f'[LABELS] {template.safe_decode(encoded["labels"])}')
+print(f'images: {encoded["template_inputs"].images}')
+```
+
 ### Text-to-image format

 ```jsonl

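The `bbox_type` and `image_id` semantics described above work as follows:

- 'real' boxes are pixel coordinates of one particular image. When that image is resized, they must be rescaled by that image's ratio, and `image_id` says which image to use (default 0).
- 'norm1' boxes lie in 0~1 and need no rescaling.

This is a hypothetical sketch of those semantics with a made-up helper name, not ms-swift's code.

```python
from typing import List, Optional, Sequence, Tuple

Size = Tuple[int, int]  # (width, height)


def rescale_objects(bboxes: Sequence[Sequence[float]], bbox_type: str,
                    orig_sizes: Sequence[Size], new_sizes: Sequence[Size],
                    image_id: Optional[Sequence[int]] = None) -> List[List[float]]:
    """Map [x1, y1, x2, y2] boxes onto resized images according to bbox_type."""
    if bbox_type == 'norm1':
        # Normalized boxes are resolution-independent; image_id is not needed.
        return [list(b) for b in bboxes]
    if bbox_type != 'real':
        raise ValueError(f"unknown bbox_type: {bbox_type!r}")
    if image_id is None:
        image_id = [0] * len(bboxes)  # default: every box refers to image 0
    out = []
    for (x1, y1, x2, y2), idx in zip(bboxes, image_id):
        (ow, oh), (nw, nh) = orig_sizes[idx], new_sizes[idx]
        sx, sy = nw / ow, nh / oh  # resize ratio of the referenced image
        out.append([x1 * sx, y1 * sy, x2 * sx, y2 * sy])
    return out


orig = [(2000, 1000), (400, 400)]
new = [(1000, 500), (400, 400)]
real = rescale_objects([[100, 100, 300, 300], [10, 10, 20, 20]], 'real', orig, new, image_id=[0, 1])
norm = rescale_objects([[0.1, 0.1, 0.5, 0.5]], 'norm1', orig, new)
print(real, norm)
```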
docs/source/GetStarted/SWIFT安装.md

Lines changed: 12 additions & 6 deletions
@@ -38,7 +38,13 @@ pip install ms-swift==2.*

 ## Images

+For Docker, see [here](https://github.com/modelscope/modelscope/blob/master/docker/build_image.py#L345)
 ```
+# swift3.7.1
+modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.6.3-py311-torch2.7.1-vllm0.10.0-modelscope1.28.2-swift3.7.1
+modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.6.3-py311-torch2.7.1-vllm0.10.0-modelscope1.28.2-swift3.7.1
+modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.6.3-py311-torch2.7.1-vllm0.10.0-modelscope1.28.2-swift3.7.1
+
 # swift3.6.4
 modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py310-torch2.6.0-vllm0.8.5.post1-modelscope1.28.1-swift3.6.4
 modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py310-torch2.6.0-vllm0.8.5.post1-modelscope1.28.1-swift3.6.4
@@ -48,16 +54,16 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py310-torch2.6.0-vllm0.8.5.post1-modelscope1.27.1-swift3.5.3
 modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py310-torch2.6.0-vllm0.8.5.post1-modelscope1.27.1-swift3.5.3
 modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py310-torch2.6.0-vllm0.8.5.post1-modelscope1.27.1-swift3.5.3
+```

+<details><summary>Historical images</summary>
+
+```
 # swift3.4.1.post1
 modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.5.post1-modelscope1.26.0-swift3.4.1.post1
 modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.5.post1-modelscope1.26.0-swift3.4.1.post1
 modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.5.post1-modelscope1.26.0-swift3.4.1.post1
-```
-
-<details><summary>Historical images</summary>

-```
 # swift3.3.0.post1
 modelscope-registry.cn-hangzhou.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.3-modelscope1.25.0-swift3.3.0.post1
 modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.6.0-vllm0.8.3-modelscope1.25.0-swift3.3.0.post1
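Each image tag in the lists above encodes its software stack as dash-separated `name+version` pieces: Ubuntu, CUDA, Python, torch, vLLM, ModelScope, and swift. The hypothetical helper below splits a tag into those pieces, for example to compare a registry image against the version table. The naming pattern is inferred from the tags listed here, not from a documented format.

```python
import re
from typing import Dict


def parse_image_tag(image: str) -> Dict[str, str]:
    """Map component name -> version for tags like 'ubuntu22.04-cuda12.6.3-...'."""
    tag = image.rsplit(':', 1)[1]  # text after the last ':' is the tag
    components: Dict[str, str] = {}
    for piece in tag.split('-'):
        # Letters for the name, then a version starting with a digit (e.g. '0.8.5.post1').
        match = re.fullmatch(r'([a-z]+)(\d[\w.]*)', piece)
        if match:
            components[match.group(1)] = match.group(2)
    return components


image = ('modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:'
         'ubuntu22.04-cuda12.6.3-py311-torch2.7.1-vllm0.10.0-modelscope1.28.2-swift3.7.1')
print(parse_image_tag(image))
```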
@@ -90,13 +96,13 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 | torch | >=2.0 | 2.7.1 | |
 | transformers | >=4.33 | 4.54.1 | |
 | modelscope | >=1.23 | | |
-| peft | >=0.11,<0.17 | | |
+| peft | >=0.11,<0.18 | | |
 | flash_attn | | 2.7.4.post1/3.0.0b1 | |
 | trl | >=0.15,<0.21 | 0.20.0 | RLHF |
 | deepspeed | >=0.14 | 0.16.9 | Training |
 | vllm | >=0.5.1 | 0.10 | Inference/Deployment |
 | sglang | >=0.4.6 | 0.4.9.post6 | Inference/Deployment |
-| lmdeploy | >=0.5,<0.9 | 0.8 | Inference/Deployment |
+| lmdeploy | >=0.5 | 0.9.2 | Inference/Deployment |
 | evalscope | >=0.11 | | Evaluation |
 | gradio | | 5.32.1 | Web-UI/App |

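This commit changes two version ranges in the table above: the `peft` upper bound rises from `<0.17` to `<0.18`, and the `lmdeploy` upper bound is dropped (now `>=0.5`). Below is a minimal, standard-library-only sketch for checking installed packages against a subset of these ranges. It compares only the leading numeric release components and ignores PEP 440 details such as pre-release ordering, so `packaging.specifiers.SpecifierSet` is the more robust choice for real use. The names `REQUIREMENTS`, `satisfies`, and `check_installed` are hypothetical.

```python
import operator
import re
from importlib import metadata
from typing import Dict, Tuple

# A subset of the ranges from the table above.
REQUIREMENTS: Dict[str, str] = {
    'torch': '>=2.0',
    'transformers': '>=4.33',
    'peft': '>=0.11,<0.18',
    'trl': '>=0.15,<0.21',
    'vllm': '>=0.5.1',
    'lmdeploy': '>=0.5',
}

_OPS = {'>=': operator.ge, '<=': operator.le, '>': operator.gt,
        '<': operator.lt, '==': operator.eq}


def _release(version: str) -> Tuple[int, ...]:
    """Leading numeric components: '2.7.1+cu126' -> (2, 7, 1), '0.9.2' -> (0, 9, 2)."""
    nums = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if not match:
            break
        nums.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '1+cu126': stop after the number
            break
    return tuple(nums)


def satisfies(version: str, spec: str) -> bool:
    """True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(','):
        op, bound = re.fullmatch(r'\s*(>=|<=|==|>|<)\s*([\w.]+)\s*', clause).groups()
        a, b = _release(version), _release(bound)
        width = max(len(a), len(b))
        a += (0,) * (width - len(a))  # pad so 0.18 compares equal to 0.18.0
        b += (0,) * (width - len(b))
        if not _OPS[op](a, b):
            return False
    return True


def check_installed() -> Dict[str, str]:
    """Report each package's installed version and whether it is in range."""
    report = {}
    for name, spec in REQUIREMENTS.items():
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = 'not installed'
            continue
        status = 'ok' if satisfies(version, spec) else 'outside ' + spec
        report[name] = f'{version} ({status})'
    return report


print(check_installed())
```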