
Commit 54cd730

support log completions (#3110)
1 parent 29d2bb0 commit 54cd730

20 files changed: +69 -92 lines changed

.dev_scripts/ci_container_test.sh

+1 -1

@@ -26,7 +26,7 @@ if [ "$MODELSCOPE_SDK_DEBUG" == "True" ]; then
 
 # test with install
 pip install .
-pip install auto_gptq bitsandbytes deepspeed==0.14.* -U -i https://mirrors.aliyun.com/pypi/simple/
+pip install auto_gptq bitsandbytes deepspeed -U -i https://mirrors.aliyun.com/pypi/simple/
 else
 echo "Running case in release image, run case directly!"
 fi

README.md

+4 -3

@@ -57,7 +57,7 @@ You can contact us and communicate with us by adding our group:
 ## 📝 Introduction
 🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 450+ large models and 150+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
 
-🍔 In addition, ms-swift gathers the latest training technologies, including LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and supports the quantization of large models and multi-modal large models using technologies such as GPTQ, AWQ, and BNB. To help researchers and developers fine-tune and apply large models more easily, ms-swift also provides a Gradio-based Web-UI interface and a wealth of best practices.
+🍔 Additionally, ms-swift incorporates the latest training technologies, including lightweight techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, and ORPO. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and it supports model quantization with technologies like GPTQ, AWQ, and BNB. Furthermore, ms-swift offers a Gradio-based Web UI and a wealth of best practices.
 
 **Why choose ms-swift?**
 
@@ -67,7 +67,7 @@ You can contact us and communicate with us by adding our group:
 - 🍊 **Lightweight Training**: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel.
 - **Distributed Training**: Supports distributed data parallel (DDP), device_map simple model parallelism, DeepSpeed ZeRO2/ZeRO3, FSDP, and other distributed training techniques.
 - **Quantization Training**: Supports training quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
-- **RLHF Training**: Supports human alignment training methods such as DPO, CPO, SimPO, ORPO, KTO, RM, PPO, GRPO for both pure text and multi-modal large models.
+- **RLHF Training**: Supports human alignment training methods such as DPO, GRPO, RM, PPO, KTO, CPO, SimPO, ORPO for both pure text and multi-modal large models.
 - 🍓 **Multi-Modal Training**: Supports training on different modalities like images, videos, and audio, for tasks like VQA, captioning, OCR, and grounding.
 - **Interface Training**: Provides capabilities for training, inference, evaluation, quantization through an interface, completing the whole large model pipeline.
 - **Plugin and Extension**: Supports custom model and dataset extensions, as well as customization of components like loss, metric, trainer, loss-scale, callback, optimizer.
@@ -115,9 +115,10 @@ Running Environment:
 | modelscope | >=1.19 | | |
 | peft | >=0.11.0,<0.15.0 | | |
 | trl | >=0.13,<0.16 | 0.14.0 | RLHF |
+| deepspeed | >=0.14 | | Training |
 | vllm | >=0.5.1 | 0.6.5 | Inference/Deployment/Evaluation |
 | lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
-| deepspeed | >=0.14 | | Training |
+| evalscope | | >=0.11 | Evaluation |
 
 For more optional dependencies, you can refer to [here](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh).

README_CN.md

+4 -3

@@ -55,7 +55,7 @@
 ## 📝 简介
 🍲 ms-swift是魔搭社区提供的大模型与多模态大模型微调部署框架,现已支持450+大模型与150+多模态大模型的训练(预训练、微调、人类对齐)、推理、评测、量化与部署。其中大模型包括:Qwen2.5、InternLM3、GLM4、Llama3.3、Mistral、DeepSeek-R1、Yi1.5、TeleChat2、Baichuan2、Gemma2等模型,多模态大模型包括:Qwen2.5-VL、Qwen2-Audio、Llama3.2-Vision、Llava、InternVL2.5、MiniCPM-V-2.6、GLM4v、Xcomposer2.5、Yi-VL、DeepSeek-VL2、Phi3.5-Vision、GOT-OCR2等模型。
 
-🍔 除此之外,ms-swift汇集了最新的训练技术,包括LoRA、QLoRA、Llama-Pro、LongLoRA、GaLore、Q-GaLore、LoRA+、LISA、DoRA、FourierFt、ReFT、UnSloth、和Liger等。ms-swift支持使用vLLM和LMDeploy对推理、评测和部署模块进行加速,并支持使用GPTQ、AWQ、BNB等技术对大模型和多模态大模型进行量化。为了帮助研究者和开发者更轻松地微调和应用大模型,ms-swift还提供了基于Gradio的Web-UI界面及丰富的最佳实践。
+🍔 除此之外,ms-swift汇集了最新的训练技术,包括LoRA、QLoRA、Llama-Pro、LongLoRA、GaLore、Q-GaLore、LoRA+、LISA、DoRA、FourierFt、ReFT、UnSloth、和Liger等轻量化训练技术,以及DPO、GRPO、RM、PPO、KTO、CPO、SimPO、ORPO等人类对齐训练方法。ms-swift支持使用vLLM和LMDeploy对推理、评测和部署模块进行加速,并支持使用GPTQ、AWQ、BNB等技术对大模型进行量化。ms-swift还提供了基于Gradio的Web-UI界面及丰富的最佳实践。
 
 **为什么选择ms-swift?**
 - 🍎 **模型类型**:支持450+纯文本大模型、**150+多模态大模型**以及All-to-All全模态模型、序列分类模型、Embedding模型**训练到部署全流程**
@@ -64,7 +64,7 @@
 - 🍊 **轻量训练**:支持了LoRA、QLoRA、DoRA、LoRA+、ReFT、RS-LoRA、LLaMAPro、Adapter、GaLore、Q-Galore、LISA、UnSloth、Liger-Kernel等轻量微调方式。
 - **分布式训练**:支持分布式数据并行(DDP)、device_map简易模型并行、DeepSpeed ZeRO2 ZeRO3、FSDP等分布式训练技术。
 - **量化训练**:支持对BNB、AWQ、GPTQ、AQLM、HQQ、EETQ量化模型进行训练。
-- **RLHF训练**:支持纯文本大模型和多模态大模型的DPO、CPO、SimPO、ORPO、KTO、RM、PPO、GRPO等人类对齐训练方法
+- **RLHF训练**:支持纯文本大模型和多模态大模型的DPO、GRPO、RM、PPO、KTO、CPO、SimPO、ORPO等人类对齐训练方法
 - 🍓 **多模态训练**:支持对图像、视频和语音不同模态模型进行训练,支持VQA、Caption、OCR、Grounding任务的训练。
 - **界面训练**:以界面的方式提供训练、推理、评测、量化的能力,完成大模型的全链路。
 - **插件化与拓展**:支持自定义模型和数据集拓展,支持对loss、metric、trainer、loss-scale、callback、optimizer等组件进行自定义。
@@ -110,9 +110,10 @@ pip install -e .
 | modelscope | >=1.19 | ||
 | peft | >=0.11.0,<0.15.0 | ||
 | trl | >=0.13,<0.16 | 0.14.0 |RLHF|
+| deepspeed | >=0.14 | |训练|
 | vllm | >=0.5.1 | 0.6.5 |推理/部署/评测|
 | lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 |推理/部署/评测|
-| deepspeed | >=0.14 | |训练|
+| evalscope | | >=0.11 |评测|
 
 更多可选依赖可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh)
118119

docs/source/GetStarted/SWIFT安装.md

+2 -1

@@ -61,9 +61,10 @@ pip install ms-swift==2.*
 | modelscope | >=1.19 | ||
 | peft | >=0.11.0,<0.15.0 | ||
 | trl | >=0.13,<0.16 | 0.14.0 |RLHF|
+| deepspeed | >=0.14 | |训练|
 | vllm | >=0.5.1 | 0.6.5 |推理/部署/评测|
 | lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 |推理/部署/评测|
-| deepspeed | >=0.14 | |训练|
+| evalscope | | >=0.11 |评测|
 
 更多可选依赖可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh)

docs/source/GetStarted/快速开始.md

+1 -1

@@ -8,7 +8,7 @@ ms-swift是魔搭社区提供的大模型与多模态大模型训练部署框架
 - 🍊 轻量训练:支持了LoRA、QLoRA、DoRA、LoRA+、ReFT、RS-LoRA、LLaMAPro、Adapter、GaLore、Q-Galore、LISA、UnSloth、Liger-Kernel等轻量微调方式。
 - 分布式训练:支持分布式数据并行(DDP)、device_map简易模型并行、DeepSpeed ZeRO2 ZeRO3、FSDP等分布式训练技术。
 - 量化训练:支持对BNB、AWQ、GPTQ、AQLM、HQQ、EETQ量化模型进行训练。
-- RLHF训练:支持纯文本大模型和多模态大模型的DPO、CPO、SimPO、ORPO、KTO、RM、PPO、GRPO等人类对齐训练方法
+- RLHF训练:支持纯文本大模型和多模态大模型的DPO、GRPO、RM、PPO、KTO、CPO、SimPO、ORPO等人类对齐训练方法
 - 🍓 多模态训练:支持对图像、视频和语音不同模态模型进行训练,支持VQA、Caption、OCR、Grounding任务的训练。
 - 界面训练:以界面的方式提供训练、推理、评测、量化的能力,完成大模型的全链路。
 - 插件化与拓展:支持自定义模型和数据集拓展,支持对loss、metric、trainer、loss-scale、callback、optimizer等组件进行自定义。

docs/source/Instruction/GRPO.md

+2 -1

@@ -88,7 +88,8 @@ A conversation between User and Assistant. The user asks a question, and the Ass
 - reward_funcs: 奖励函数,根据模型生成结果进行打分,内置accuracy、format、cosine和repetition四个rule-based函数,详细见 swift/plugin/orm.py 文件
 - reward_weights: 每个奖励函数的权重。必须与奖励函数的数量匹配。如果为 None,则所有奖励的权重都相等,为`1.0`
   - 提示:如果GRPO训练中包含`--reward_model`,则其加在奖励函数的最后位置
-- log_completions: 是否记录训练中的模型生成内容,搭配 `--report_to wandb` 使用,默认为False
+- log_completions: 是否记录训练中的模型生成内容,搭配 `--report_to wandb` 使用。默认为False
+  - 提示:若没有设置`--report_to wandb`,则会在checkpoint中创建`completions.jsonl`来存储生成内容
 - use_vllm: 是否使用vLLM作为采样的生成后端,默认为False,建议使用加快训练速度
 - vllm_device: 设置vLLM部署的设备,默认为`auto`, 即未被使用的第一张显卡,使用`cuda:x`来设置特定的卡。
 - vllm_gpu_memory_utilization: vLLM透传参数

docs/source/Instruction/命令行参数.md

+3 -2

@@ -369,7 +369,8 @@ reward模型参数将在PPO、GRPO中使用。
 - reward_funcs: GRPO算法奖励函数,可选项为`accuracy`、`format`、`cosine`、`repetition`,见swift/plugin/orm.py。你也可以在plugin中自定义自己的奖励函数。默认为`[]`
 - reward_weights: 每个奖励函数的权重。必须与奖励函数的数量匹配。如果为 None,则所有奖励的权重都相等,为`1.0`
   - 提示:如果GRPO训练中包含`--reward_model`,则其加在奖励函数的最后位置
-- log_completions: 是否记录训练中的模型生成内容,搭配 `--report_to wandb` 使用,默认为False
+- log_completions: 是否记录训练中的模型生成内容,搭配 `--report_to wandb` 使用。默认为False
+  - 提示:若没有设置`--report_to wandb`,则会在checkpoint中创建`completions.jsonl`来存储生成内容
 - use_vllm: 是否使用vLLM作为GRPO生成的infer_backend,默认为False
 - vllm_device: 设置vLLM部署的设备,比如部署在卡0上,则`cuda:1`, 默认为`auto`, 即使用最后一张卡
 - vllm_gpu_memory_utilization: vllm透传参数,默认为0.9
@@ -446,7 +447,7 @@ App参数继承于[部署参数](#部署参数), [Web-UI参数](#Web-UI参数)
 - 注意:默认评测会使用`~/.cache/opencompass`下的数据集,在指定本参数后会直接使用当前目录下的data文件夹
 - temperature: 覆盖生成参数,默认为0
 - verbose: 该参数在本地拉起部署并评估时传入DeployArguments中,默认`False`
-- eval_num_proc: 评测时客户端最大并发数,文本评测默认256,多模态默认16
+- eval_num_proc: 评测时客户端最大并发数,默认为16
 - 🔥eval_url: 评测url,例如`http://localhost:8000/v1`。例子可以查看[这里](https://github.com/modelscope/ms-swift/tree/main/examples/eval/eval_url)。默认为None,采用本地部署评估
 
 ### 导出参数

docs/source_en/GetStarted/Quick-start.md

+1 -1

@@ -8,7 +8,7 @@ ms-swift is a comprehensive training and deployment framework for large language
 - 🍊 Lightweight Training: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel, and more.
 - Distributed Training: Supports distributed data parallel (DDP), simple model parallelism via device_map, DeepSpeed ZeRO2 ZeRO3, FSDP, and other distributed training technologies.
 - Quantization Training: Provides training for quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
-- RLHF Training: Supports human alignment training methods like DPO, CPO, SimPO, ORPO, KTO, RM, PPO, GRPO for both text-based and multimodal large models.
+- RLHF Training: Supports human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, ORPO for both text-based and multimodal large models.
 - 🍓 Multimodal Training: Capable of training models for different modalities such as images, videos, and audios; supports tasks like VQA (Visual Question Answering), Captioning, OCR (Optical Character Recognition), and Grounding.
 - Interface-driven Training: Offers training, inference, evaluation, and quantization capabilities through an interface, enabling a complete workflow for large models.
 - Plugins and Extensions: Allows customization and extension of models and datasets, and supports customizations for components like loss, metric, trainer, loss-scale, callback, optimizer, etc.

docs/source_en/GetStarted/SWIFT-installation.md

+2 -1

@@ -62,9 +62,10 @@ You can view the image [here](https://modelscope.cn/docs/intro/environment-setup
 | modelscope | >=1.19 | | |
 | peft | >=0.11.0,<0.15.0 | | |
 | trl | >=0.13,<0.16 | 0.14.0 | RLHF |
+| deepspeed | >=0.14 | | Training |
 | vllm | >=0.5.1 | 0.6.5 | Inference/Deployment/Evaluation |
 | lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
-| deepspeed | >=0.14 | | Training |
+| evalscope | | >=0.11 | Evaluation |
 
 For more optional dependencies, you can refer to [here](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh).

docs/source_en/Instruction/Command-line-parameters.md

+2 -1

@@ -381,6 +381,7 @@ The meanings of the following parameters can be referenced [here](https://huggin
 - reward_weights: Weights for each reward function. Must match the number of reward functions. If `None`, all rewards are weighted equally with weight `1.0`.
   - Note: If `--reward_model` is included in GRPO training, it is added to the end of the reward functions.
 - log_completions: Whether to log the model-generated content during training, to be used in conjunction with `--report_to wandb`, default is False.
+  - Note: If `--report_to wandb` is not set, a `completions.jsonl` will be created in the checkpoint to store the generated content.
 - use_vllm: Whether to use vLLM as the infer_backend for GRPO generation, default is False.
 - vllm_device: Set the device for vLLM deployment. For example, if deployed on card 0, use `cuda:0`; default is `auto`, which means using the last available GPU.
 - vllm_gpu_memory_utilization: vLLM passthrough parameter, default is 0.9.
@@ -456,7 +457,7 @@ Evaluation Arguments inherit from the [deployment arguments](#deployment-argumen
 - Note: By default, the evaluation will use datasets from `~/.cache/opencompass`. Specifying this parameter will directly use the data folder in the current directory.
 - temperature: Overrides the generation arguments, with a default value of 0.
 - verbose: This parameter is passed into DeployArguments when setting up local deployment and evaluation, and defaults to `False`.
-- eval_num_proc: Maximum concurrency for clients during evaluation. The default for text evaluation is 256, while for multimodal it is 16.
+- eval_num_proc: Maximum number of concurrent clients during evaluation, default is 16.
 - 🔥eval_url: The evaluation URL, for example, `http://localhost:8000/v1`. Examples can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/eval/eval_url). The default value is None, which means using local deployment for evaluation.

docs/source_en/Instruction/GRPO.md

+1

@@ -91,6 +91,7 @@ Hyperparameters
 - reward_weights: Weights for each reward function. Must match the number of reward functions. If `None`, all rewards are weighted equally with weight `1.0`.
   - Note: If `--reward_model` is included in GRPO training, it is added to the end of the reward functions.
 - log_completions: Whether to log the model-generated content during training, to be used in conjunction with `--report_to wandb`, default is False.
+  - Note: If `--report_to wandb` is not set, a `completions.jsonl` will be created in the checkpoint to store the generated content.
 - use_vllm: Whether to use vLLM as the back-end for sampling generation; default is False, using it is recommended to speed up training.
 - vllm_device: Device for deploying vLLM, default is auto, meaning the first unused GPU. Use cuda:x to specify a particular card.
 - vllm_gpu_memory_utilization: vLLM pass-through parameter.
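The documentation added above only states that generated samples are written to a `completions.jsonl` inside the checkpoint when `--report_to wandb` is not set. As a quick way to inspect that output, here is a minimal sketch assuming one JSON object per line and a hypothetical checkpoint path; neither the path nor the record schema is specified by this commit.

```python
import json
from pathlib import Path

# Hypothetical checkpoint directory; substitute the checkpoint under your actual --output_dir.
completions_file = Path("output/checkpoint-100/completions.jsonl")

with completions_file.open(encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)  # assumed: one JSON object per line
        print(record)
```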

requirements/install_all.sh

+1 -1

@@ -6,6 +6,6 @@ pip install autoawq -U --no-deps
 pip install auto_gptq optimum bitsandbytes -U
 pip install git+https://github.com/modelscope/ms-swift.git#egg=ms-swift[all]
 pip install timm -U
-pip install deepspeed==0.14.* -U
+pip install deepspeed -U
 pip install qwen_vl_utils decord librosa pyav icecream -U
 # flash-attn: https://github.com/Dao-AILab/flash-attention/releases

swift/llm/argument/eval_args.py

+2 -8

@@ -4,8 +4,6 @@
 from dataclasses import dataclass, field
 from typing import Dict, List, Literal, Optional, Union
 
-import json
-
 from swift.utils import get_logger
 from .base_args import to_abspath
 from .deploy_args import DeployArguments
@@ -37,7 +35,7 @@ class EvalArguments(DeployArguments):
 
     temperature: Optional[float] = 0.
     verbose: bool = False
-    eval_num_proc: Optional[int] = None
+    eval_num_proc: int = 16
     # If eval_url is set, ms-swift will not perform deployment operations and
     # will directly use the URL for evaluation.
     eval_url: Optional[str] = None
@@ -47,15 +45,11 @@ def _init_eval_url(self):
         if self.eval_url and 'chat/completions' in self.eval_url:
             self.eval_url = self.eval_url.split('/chat/completions', 1)[0]
 
-    def _init_dataset_args(self):
-        if isinstance(self.dataset_args, str):
-            self.dataset_args = json.loads(self.dataset_args)
-
     def __post_init__(self):
         super().__post_init__()
         self._init_eval_url()
         self._init_eval_dataset()
-        self._init_dataset_args()
+        self.dataset_args = self.parse_to_dict(self.dataset_args)
         self.eval_output_dir = to_abspath(self.eval_output_dir)
         logger.info(f'eval_output_dir: {self.eval_output_dir}')

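The switch from `json.loads` to `parse_to_dict` above lets `dataset_args` arrive either as a JSON string from the command line or as an already-parsed dict. The helper below is a hypothetical stand-in written for illustration only; the real `parse_to_dict` lives on the ms-swift argument base class and its exact behaviour is not shown in this commit.

```python
import json
from typing import Dict, Union


def parse_to_dict(value: Union[str, Dict, None]) -> Dict:
    """Hypothetical stand-in: normalize a JSON string or dict into a dict."""
    if value is None:
        return {}
    if isinstance(value, dict):
        return value
    return json.loads(value)  # assumes the string is a valid JSON object


# Illustrative only: the kind of value --dataset_args might carry.
print(parse_to_dict('{"local_path": "./custom_eval_data"}'))
```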