Commit 3f3fa7e

Fix code format and docs (#847)
1 parent d393bb4 commit 3f3fa7e

167 files changed (+4541 -10747 lines)


docs/source/LLM/LLM微调文档.md (+5 -5)

@@ -49,7 +49,7 @@ import torch

 from swift.llm import (
     DatasetName, InferArguments, ModelType, SftArguments,
-    infer_main, sft_main, app_ui_main, merge_lora
+    infer_main, sft_main, app_ui_main
 )

 model_type = ModelType.qwen_7b_chat
@@ -182,10 +182,10 @@ CUDA_VISIBLE_DEVICES=0 swift export \
 To quantize the fine-tuned model, see the [LLM量化文档](LLM量化文档.md#微调后模型)

 ## Inference
-If you want to use VLLM for inference acceleration, see [VLLM推理加速与部署](VLLM推理加速与部署.md调后的模型)
+If you want to use VLLM for inference acceleration, see [VLLM推理加速与部署](VLLM推理加速与部署.md#微调后的模型)

 ### Original Model
-For **single-sample inference**, see the [LLM推理文档](LLM推理文档.md推理)
+For **single-sample inference**, see the [LLM推理文档](LLM推理文档.md#推理)

 To evaluate with a **dataset**:
 ```bash
@@ -271,10 +271,10 @@ CUDA_VISIBLE_DEVICES=0 swift infer --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged'
 ```

 ## Web-UI
-If you want to use VLLM for deployment and to provide an **API** interface, see [VLLM推理加速与部署](VLLM推理加速与部署.md署)
+If you want to use VLLM for deployment and to provide an **API** interface, see [VLLM推理加速与部署](VLLM推理加速与部署.md#部署)

 ### Original Model
-For the web-ui of the original model, see the [LLM推理文档](LLM推理文档.mdWeb-UI)
+For the web-ui of the original model, see the [LLM推理文档](LLM推理文档.md#Web-UI)

 ### Fine-tuned Model
 ```bash
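
The trimmed import list above is everything the quick-start flow of this document needs. Below is a minimal end-to-end sketch of how those symbols fit together, assuming the `swift.llm` API as documented here; the dataset choice and paths are illustrative only.

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    DatasetName, InferArguments, ModelType, SftArguments,
    infer_main, sft_main, app_ui_main
)

# LoRA fine-tune on an illustrative dataset, then reuse the saved checkpoint.
sft_args = SftArguments(
    model_type=ModelType.qwen_7b_chat,
    dataset=[DatasetName.ms_bench_mini],
    output_dir='output')
result = sft_main(sft_args)
best_model_checkpoint = result['best_model_checkpoint']

# Inference and Web-UI on the fine-tuned checkpoint.
infer_args = InferArguments(ckpt_dir=best_model_checkpoint)
infer_main(infer_args)
app_ui_main(infer_args)
```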

docs/source/LLM/LLM推理文档.md (+2 -2)

@@ -396,7 +396,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-6b-chat
 ```

 ### Fine-tuned Model
-If you want to run inference with a fine-tuned model, see the [LLM微调文档](LLM微调文档.md调后模型)
+If you want to run inference with a fine-tuned model, see the [LLM微调文档](LLM微调文档.md#微调后模型)


 ## Web-UI
@@ -446,4 +446,4 @@ app_ui_main(app_ui_args)
 ```

 ### Fine-tuned Model
-For the web-ui with a fine-tuned model, see the [LLM微调文档](LLM微调文档.md调后模型-1)
+For the web-ui with a fine-tuned model, see the [LLM微调文档](LLM微调文档.md#微调后模型)

docs/source/LLM/VLLM推理加速与部署.md (+1 -2)

@@ -186,7 +186,6 @@ from swift.llm import (
     ModelType, get_vllm_engine, get_default_template_type,
     get_template, inference_vllm
 )
-from swift.tuners import Swift

 ckpt_dir = 'vx-xxx/checkpoint-100-merged'
 model_type = ModelType.qwen_7b_chat
@@ -240,7 +239,7 @@ CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged
 ## Deployment
 swift uses VLLM as the inference backend and is compatible with openai's API style.

-For the server-side deployment command-line parameters, see: [deploy命令行参数](命令行参数.md#deploy-命令行参数).
+For the server-side deployment command-line parameters, see: [deploy命令行参数](命令行参数.md#deploy-参数).

 For the client-side openai API parameters, see: https://platform.openai.com/docs/api-reference/introduction.
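
Because the deployed service is openai-API compatible, the stock openai SDK can act as the client. Below is a minimal sketch, assuming the server started with `swift deploy` is reachable at `http://localhost:8000/v1` and serves a model named `qwen-7b-chat`; host, port, and model name depend on your deployment.

```python
from openai import OpenAI

# The endpoint follows openai's API style, so no swift-specific client is needed.
# base_url and model below are assumptions for illustration only.
client = OpenAI(api_key='EMPTY', base_url='http://localhost:8000/v1')

resp = client.chat.completions.create(
    model='qwen-7b-chat',
    messages=[{'role': 'user', 'content': 'Where is the capital of Zhejiang?'}],
    temperature=0.3)
print(resp.choices[0].message.content)
```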

docs/source/LLM/命令行参数.md (+2 -2)

@@ -85,7 +85,7 @@
 - `--disable_tqdm`: Whether to disable tqdm; this is useful when launching scripts with `nohup`. Default is `False`, i.e. tqdm is enabled.
 - `--lazy_tokenize`: If set to False, all text is preprocessed before `trainer.train()`. If set to True, encoding of the text is deferred, which reduces the preprocessing wait and memory usage; this is useful for large datasets. Default is `None`, i.e. we choose intelligently based on the template type: usually False for LLM models and True for multimodal models (to avoid excessive memory usage from loading images and audio).
 - `--preprocess_num_proc`: Use multiple processes when preprocessing the dataset (tokenizing the text). Default is `1`. Like the `lazy_tokenize` command-line argument, it addresses slow preprocessing, but it cannot reduce memory usage, so for huge datasets `lazy_tokenize` is recommended instead. Recommended values: 4, 8. Note: when using qwen-audio this argument is forced to 1, because qwen-audio's preprocessing function uses torch multiprocessing, which would cause an incompatibility.
-- `--use_flash_attn`: Whether to use flash attn. Default is `None`. Installation steps for flash_attn can be found at [https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention). Models that support flash_attn are listed in [LLM支持的模型](支持的模型和数据集.md型).
+- `--use_flash_attn`: Whether to use flash attn. Default is `None`. Installation steps for flash_attn can be found at [https://github.com/Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention). Models that support flash_attn are listed in [LLM支持的模型](支持的模型和数据集.md#模型).
 - `--ignore_args_error`: Whether to ignore the Error raised by command-line argument mistakes. Default is `False`. Set it to True if you need to copy the code into a notebook to run it.
 - `--check_model_is_latest`: Whether to check that the model is up to date. Default is `True`. If you need to train offline, set this argument to `False`.
 - `--logging_dir`: Default is `None`, i.e. it is set to `f'{self.output_dir}/runs'`, the directory where tensorboard files are stored.
@@ -189,7 +189,7 @@ dpo parameters inherit the sft parameters, with the following additions:
 - `--model_revision`: Default is `None`. A detailed description can be found under `sft.sh命令行参数`. If `model_id_or_path` is None or a local model directory, this argument has no effect.
 - `--sft_type`: Default is `'lora'`. A detailed description can be found under `sft.sh命令行参数`.
 - `--template_type`: Default is `'AUTO'`. A detailed description can be found under `sft.sh命令行参数`.
-- `--infer_backend`: You can choose 'AUTO', 'vllm' or 'pt'. The default is 'AUTO', which chooses intelligently: if no `ckpt_dir` is passed in or full-parameter fine-tuning was used, and vllm is installed and the model supports vllm, the vllm engine is used; otherwise native torch is used for inference. For vllm environment setup see [VLLM推理加速与部署](VLLM推理加速与部署.md境准备); models supported by vllm are listed in [支持的模型](支持的模型和数据集.md型).
+- `--infer_backend`: You can choose 'AUTO', 'vllm' or 'pt'. The default is 'AUTO', which chooses intelligently: if no `ckpt_dir` is passed in or full-parameter fine-tuning was used, and vllm is installed and the model supports vllm, the vllm engine is used; otherwise native torch is used for inference. For vllm environment setup see [VLLM推理加速与部署](VLLM推理加速与部署.md#环境准备); models supported by vllm are listed in [支持的模型](支持的模型和数据集.md#模型).
 - `--ckpt_dir`: Required. The path to the checkpoint saved in the SFT stage, e.g. `'/path/to/your/vx-xxx/checkpoint-xxx'`.
 - `--load_args_from_ckpt_dir`: Whether to read the model configuration from the `sft_args.json` file in `ckpt_dir`. Default is `True`.
 - `--load_dataset_config`: This argument only takes effect when `--load_args_from_ckpt_dir true`. It controls whether the dataset-related configuration is also read from the `sft_args.json` file in `ckpt_dir`. Default is `False`.
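
For readers using the Python API instead of the CLI, these flags map onto the corresponding argument dataclasses. The sketch below is under the assumption that the Python field names mirror the CLI flags documented above; the values are illustrative only.

```python
from swift.llm import DatasetName, InferArguments, ModelType, SftArguments

# Preprocessing-related sft flags described above (illustrative values).
sft_args = SftArguments(
    model_type=ModelType.qwen_7b_chat,
    dataset=[DatasetName.ms_bench_mini],
    lazy_tokenize=None,        # None: chosen automatically from the template type
    preprocess_num_proc=4,     # multi-process tokenization; does not reduce memory
    use_flash_attn=True)       # requires flash-attn to be installed

# Inference flags described above (illustrative values).
infer_args = InferArguments(
    ckpt_dir='/path/to/your/vx-xxx/checkpoint-xxx',
    infer_backend='AUTO',      # 'AUTO' | 'vllm' | 'pt'
    load_args_from_ckpt_dir=True,
    load_dataset_config=False)
```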

docs/source/LLM/自我认知微调最佳实践.md (-1)

@@ -8,7 +8,6 @@
 - [微调](#微调)
 - [微调后推理](#微调后推理)
 - [Web-UI](#web-ui)
-- [了解更多](#了解更多)

 ## Environment Setup
 ```bash

docs/source/conf.py (+1 -4)

@@ -83,10 +83,7 @@ def get_version():
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 # This pattern also affects html_static_path and html_extra_path.
-exclude_patterns = [
-    'build', 'source/.ipynb_checkpoints', 'source/api/generated', 'Thumbs.db',
-    '.DS_Store'
-]
+exclude_patterns = ['build', 'source/.ipynb_checkpoints', 'source/api/generated', 'Thumbs.db', '.DS_Store']
 # A list of glob-style patterns [1] that are used to find source files.
 # They are matched against the source file names relative to the source directory,
 # using slashes as directory separators on all platforms.

docs/source_en/LLM/LLM-fine-tuning.md (+6 -6)

@@ -46,7 +46,7 @@ import torch

 from swift.llm import (
     DatasetName, InferArguments, ModelType, SftArguments,
-    infer_main, sft_main, app_ui_main, merge_lora
+    infer_main, sft_main, app_ui_main
 )

 model_type = ModelType.qwen_7b_chat
@@ -133,12 +133,12 @@ cd examples/pytorch/llm
 - We default to setting `--gradient_checkpointing true` during training to **save memory**, which may slightly reduce training speed.
 - If you want to use quantization parameters `--quantization_bit 4`, you need to first install [bnb](https://github.com/TimDettmers/bitsandbytes): `pip install bitsandbytes -U`. This will reduce memory usage but usually slows down the training speed.
 - If you want to use quantization based on **auto_gptq**, you need to install the corresponding cuda version of [auto_gptq](https://github.com/PanQiWei/AutoGPTQ): `pip install auto_gptq -U`.
-  > Models that can use auto_gptq can be viewed in [LLM Supported Models](supported-models-and-datasets.md#models). It is recommended to use auto_gptq instead of bnb.
+  > Models that can use auto_gptq can be viewed in [LLM Supported Models](Supported-models-datasets.md#models). It is recommended to use auto_gptq instead of bnb.
 - If you want to use deepspeed, you need `pip install deepspeed -U`. Using deepspeed can **save memory**, but may slightly reduce training speed.
-- If your training involves **knowledge editing**, such as: [Self-aware Fine-tuning](self-aware-fine-tuning-best-practices.md), you need to add LoRA to MLP as well, otherwise, the results might be poor. You can simply pass the argument `--lora_target_modules ALL` to add lora to all linear(qkvo, mlp), **this is usually the best result**.
+- If your training involves **knowledge editing**, such as: [Self-aware Fine-tuning](Self-cognition-best-practice.md), you need to add LoRA to MLP as well, otherwise, the results might be poor. You can simply pass the argument `--lora_target_modules ALL` to add lora to all linear(qkvo, mlp), **this is usually the best result**.
 - If you are using older GPUs like **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, as they do not support bf16.
-- If your machine has high-performance graphics cards like A100 and the model supports flash-attn, it is recommended to install [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (A10, 3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](supported-models-and-datasets.md#models)
-- If you are doing **second pre-training** or **multi-turn dialogue**, you can refer to [Customization and Extension](customization-and-extension.md#ways-to-register-datasets)
+- If your machine has high-performance graphics cards like A100 and the model supports flash-attn, it is recommended to install [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (A10, 3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](Supported-models-datasets.md#models)
+- If you are doing **second pre-training** or **multi-turn dialogue**, you can refer to [Customization and Extension](Customization.md#Registering-Datasets)
 - If you need to train **offline**, please use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For specific parameter meanings, please check [Command-line Parameters](Command-line-parameters.md).
 - If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.
 - If you want to merge LoRA weights and save them during inference, you need to set `--merge_lora true`. **It is not recommended to merge** for models trained with qlora, as this will result in precision loss. Therefore **it is not recommended to fine-tune** with qlora, as the deployment ecology is not good.
@@ -175,7 +175,7 @@ CUDA_VISIBLE_DEVICES=0 swift export \

 ## Quantization

-For quantization of the fine-tuned model, you can check [LLM Quantization Documentation](LLM-quantization.md#post-fine-tuning-model)
+For quantization of the fine-tuned model, you can check [LLM Quantization Documentation](LLM-quantization.md#fine-tuned-model)

 ## Inference
 If you want to use VLLM for accelerated inference, you can check [VLLM Inference Acceleration and Deployment](VLLM-inference-acceleration-and-deployment.md)
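
The flags in the notes above can be combined into a single argument set. Below is a minimal sketch of a memory-saving LoRA configuration, assuming the SftArguments fields mirror the CLI flags; the values are illustrative, not a recommendation for every model.

```python
from swift.llm import DatasetName, ModelType, SftArguments, sft_main

# Memory-saving options discussed in the notes above (illustrative values).
sft_args = SftArguments(
    model_type=ModelType.qwen_7b_chat,
    dataset=[DatasetName.ms_bench_mini],
    sft_type='lora',
    lora_target_modules=['ALL'],   # add LoRA to all linear layers (qkvo, mlp)
    quantization_bit=4,            # qlora; requires bitsandbytes
    dtype='fp16',                  # for V100-class GPUs without bf16 support
    gradient_checkpointing=True,   # the default; trades some speed for memory
    output_dir='output')
output = sft_main(sft_args)
print(output['best_model_checkpoint'])
```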

docs/source_en/LLM/LLM-inference.md (+3 -3)

@@ -1,5 +1,5 @@
 # LLM Inference Documentation
-If you want to use vllm for inference acceleration, you can check out [VLLM Inference Acceleration and Deployment](VLLM-inference-acceleration-and-deployment.md#Inference Acceleration)
+If you want to use vllm for inference acceleration, you can check out [VLLM Inference Acceleration and Deployment](VLLM-inference-acceleration-and-deployment.md#inference-acceleration)

 ## Table of Contents
 - [Environment Preparation](#Environment-Preparation)
@@ -394,7 +394,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type yi-6b-chat
 ```

 ### Fine-tuned Models
-If you want to perform inference using fine-tuned models, you can check out the [LLM Fine-tuning Documentation](LLM-fine-tuning.md#Fine-tuned Model)
+If you want to perform inference using fine-tuned models, you can check out the [LLM Fine-tuning Documentation](LLM-fine-tuning.md#Fine-tuned-Model)


 ## Web-UI
@@ -444,4 +444,4 @@ app_ui_main(app_ui_args)
 ```

 ### Fine-tuned Models
-To use the web-ui with fine-tuned models, you can check out the [LLM Fine-tuning Documentation](LLM-fine-tuning#Fine-tuned Model)
+To use the web-ui with fine-tuned models, you can check out the [LLM Fine-tuning Documentation](LLM-fine-tuning#fine-tuned-model)

docs/source_en/LLM/RLHF.md (+2 -2)

@@ -82,8 +82,8 @@ cd examples/pytorch/llm

 - We default to setting `--gradient_checkpointing true` during training to **save memory**, which will slightly reduce training speed.
 - If you are using older GPUs such as **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, because they do not support bf16.
-- If your machine has high-performance graphics cards like A100 and you are using the qwen series models, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (A10, 3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](supported-models-and-datasets.md#models)
-- If you need to train offline, please use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For specific parameter meanings, please see [Command Line Arguments](command-line-arguments.md).
+- If your machine has high-performance graphics cards like A100 and you are using the qwen series models, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (A10, 3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](Supported-models-datasets.md#models)
+- If you need to train offline, please use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For specific parameter meanings, please see [Command Line Arguments](Command-line-parameters.md).
 - If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.

 ```bash

docs/source_en/LLM/Self-cognition-best-practice.md (-109)

@@ -7,7 +7,6 @@ Fine-tune your own large model in just 10 minutes!
 - [Fine-Tuning](#fine-tuning)
 - [Inference After Fine-Tuning](#inference-after-fine-tuning)
 - [Web-UI](#web-ui)
-- [Learn More](#learn-more)

 ## Environment Setup
 ```bash
@@ -69,114 +68,6 @@ Using CLI:
 CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-4b-chat
 ```

-## Fine-Tuning
-Note: Since self-cognition training involves knowledge editing, it's suggested to add lora_target_modules to **MLP**. You can specify `--lora_target_modules ALL` to add LoRA to all the linear layers (including qkvo and mlp), which **usually yields the best results**.
-
-Using python:
-```python
-# Experimental environment: A10, 3090, V100, ...
-# 23GB GPU memory
-import os
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-from swift.llm import DatasetName, ModelType, SftArguments, sft_main
-
-sft_args = SftArguments(
-    model_type=ModelType.qwen1half_4b_chat,
-    dataset=[DatasetName.ms_bench_mini],
-    train_dataset_sample=1000,
-    logging_steps=5,
-    max_length=2048,
-    learning_rate=5e-5,
-    warmup_ratio=0.4,
-    output_dir='output',
-    lora_target_modules=['ALL'],
-    self_cognition_sample=500,
-    model_name=['Xiao Huang', 'Little Yellow'],
-    model_author=['Moda', 'ModelScope'])
-output = sft_main(sft_args)
-best_model_checkpoint = output['best_model_checkpoint']
-print(f'best_model_checkpoint: {best_model_checkpoint}')
-
-"""Out[0]
-...
-"""
-```
-
-Using CLI (single GPU):
-```bash
-# Experimental environment: A10, 3090, V100, ...
-# 23GB GPU memory# Best Practices for Self-Cognition Fine-Tuning
-Fine-tune your own large model in just 10 minutes!
-
-## Table of Contents
-- [Environment Setup](#environment-setup)
-- [Inference Before Fine-Tuning](#inference-before-fine-tuning)
-- [Fine-Tuning](#fine-tuning)
-- [Inference After Fine-Tuning](#inference-after-fine-tuning)
-- [Web-UI](#web-ui)
-- [Learn More](#learn-more)
-
-## Environment Setup
-```bash
-# Set up the global pip mirror (for faster downloading)
-pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
-# Install ms-swift
-pip install 'ms-swift[llm]' -U
-
-# Align the environment (usually not necessary to run. If you get an error, you can run the following code, which has been tested with the latest environment in the repository)
-pip install -r requirements/framework.txt -U
-pip install -r requirements/llm.txt -U
-```
-
-## Inference Before Fine-Tuning
-
-Using Python:
-```python
-import os
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-from swift.llm import ModelType, InferArguments, infer_main
-infer_args = InferArguments(model_type=ModelType.qwen1half_4b_chat)
-infer_main(infer_args)
-
-"""
-<<< Who are you?
-I am a large-scale language model from Alibaba Cloud, my name is Qwen.
---------------------------------------------------
-<<< What's your name?
-I am Qwen, a large language model from Alibaba Cloud.
---------------------------------------------------
-<<< Who developed you?
-I am independently developed by Alibaba Cloud as a large-scale language model.
---------------------------------------------------
-<<< Where is the capital of Zhejiang?
-The capital of Zhejiang is Hangzhou.
---------------------------------------------------
-<<< What's delicious here?
-The cuisine of Zhejiang is very rich, for example, Hangzhou's West Lake Fish in Vinegar Gravy, Dongpo Pork, Longjing Shrimp, and Sister Song's Fish Soup are famous Zhejiang dishes. Additionally, there are many snacks in Zhejiang, such as fried dough sticks, shaomai, rice balls, zongzi, etc.
---------------------------------------------------
-<<< What should I do if I can't sleep at night?
-If you can't sleep at night, try the following methods:
-
-1. Relax: Try activities that help relax the body and mind, such as listening to music, practicing yoga, meditating, etc.
-
-2. Regular routines: Try to maintain a regular daily routine, and avoid staying up late.
-
-3. Avoid stimulants: Avoid spicy, greasy, caffeinated foods that may stimulate the nervous system and cause insomnia.
-
-4. Exercise: Moderate exercise can help relax the body and improve sleep.
-
-5. Drink milk before bed: Milk contains tryptophan which helps produce melatonin and can aid in sleep.
-"""
-```
-If you want to perform single-sample inference, you can refer to [LLM Inference Documentation](LLM-inference#qwen-7b-chat)
-
-Using CLI:
-```bash
-CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-4b-chat
-```
-
 ## Fine-Tuning
 Note: Self-cognition training involves knowledge editing, so it is recommended to add `lora_target_modules` to **MLP**. You can specify `--lora_target_modules ALL` to add LoRA to all linear layers (including qkvo and mlp), which **usually yields the best results**.
