add deepseekv3 doc #3265

Open · wants to merge 9 commits into `main`
109 changes: 109 additions & 0 deletions docs/en/benchmark/deepseekv3.md
@@ -0,0 +1,109 @@
# DeepSeekV3 Benchmarking

## Benchmark results

______________________________________________________________________

### v0.7.1 + `9528a74`

| max bsz | prompt no | input-len | output-len | input token throughput (tok/s) | output token throughput (tok/s) |
| ------- | :-------: | :-------: | :--------: | :----------------------------: | :-----------------------------: |
| 1024 | 10000 | 2048 | 1024 | 3489.50 | 1743.56 |
| 1024 | 10000 | 2048 | 2048 | 1665.07 | 1682.41 |
| 1024 | 10000 | 2048 | 4096 | 725.01 | 1455.12 |
| 1024 | 10000 | 2048 | 8192 | 253.17 | 1009.80 |
| 128 | 2000 | 2048 | 16384 | 76.78 | 600.07 |
| 128 | 2000 | 2048 | 32768 | 17.75 | 281.89 |

For output lengths of 16k and 32k, we decreased the total number of prompts to shorten the execution time.
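As a back-of-the-envelope sanity check (not part of the measurements above), the output throughput figures imply the approximate wall-clock time of each run: total generated tokens divided by output token throughput. A minimal sketch:

```python
# Rough wall-clock time implied by a measured output token throughput:
# total generated tokens / (tokens per second).
def implied_runtime_s(num_prompts: int, output_len: int, out_tok_per_s: float) -> float:
    return num_prompts * output_len / out_tok_per_s

# First row of the table above: 10000 prompts x 1024 output tokens at 1743.56 tok/s.
t = implied_runtime_s(10_000, 1024, 1743.56)
print(f"{t:.0f} s (~{t / 3600:.1f} h)")  # → 5873 s (~1.6 h)
```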

## User guide

______________________________________________________________________

### Installation

In this document, we will provide step-by-step guidance on how to set up DeepSeekV3 inference with LMDeploy on a multi-node cluster.

We highly recommend that users adopt our official Docker image to avoid potential errors caused by environment differences. Execute the following commands on both the head and worker nodes to create the Docker containers.

```bash
docker run -it \
--gpus all \
--network host \
--ipc host \
--name lmdeploy \
--privileged \
-v "/path/to/the/huggingface/home/in/this/node":"/root/.cache/huggingface" \
openmmlab/lmdeploy:latest-cu12
```

`--privileged` is required to enable RDMA.
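If running fully privileged containers is not acceptable in your environment, a commonly used alternative is to expose the RDMA devices explicitly. This is a sketch, not tested here, and it assumes the host exposes InfiniBand/RoCE devices under `/dev/infiniband`:

```shell
# Alternative to --privileged (assumes RDMA devices under /dev/infiniband):
docker run -it \
    --gpus all \
    --network host \
    --ipc host \
    --name lmdeploy \
    --device /dev/infiniband \
    --cap-add IPC_LOCK \
    -v "/path/to/the/huggingface/home/in/this/node":"/root/.cache/huggingface" \
    openmmlab/lmdeploy:latest-cu12
```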

### Build a multi-node cluster using Ray

> :warning: The following operations are all assumed to be performed within the Docker container.
> We will build a Ray cluster consisting of docker containers, therefore commands executed on the host machine terminal won't be able to access this cluster.

LMDeploy uses Ray to construct the multi-node cluster. In the following steps, we will build a two-node Ray cluster for illustration.

Start the Ray cluster on the head node. Ray's default port is 6379; change it to suit your environment.

```bash
ray start --head --port=6379
```

On each worker node, run the following command to join the Ray cluster, replacing the IP and port with your own:

```bash
ray start --address=${head_node_ip}:6379
```

Use the following command to check the Ray cluster status on both the head and worker nodes. You should see status information covering all nodes in the cluster.

```bash
ray status
```
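For a programmatic variant of `ray status`, Ray's Python API exposes `ray.nodes()`, which returns one dict per node with an `"Alive"` field. A minimal sketch; the counting helper works on any such list, and the cluster attachment is wrapped so the snippet degrades gracefully where no cluster is reachable:

```python
def count_alive(nodes) -> int:
    """Count node entries whose "Alive" field is truthy (Ray's node-dict schema)."""
    return sum(1 for n in nodes if n.get("Alive"))

try:
    import ray
    ray.init(address="auto")  # attach to the running cluster
    print(f"alive nodes: {count_alive(ray.nodes())}")
except Exception:
    # Ray not installed, or no cluster reachable from this process.
    pass
```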

### Launch the service

Use the following command to launch the LMDeploy DeepSeekV3 API service. We currently support TP16 deployment.

```bash
lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --backend pytorch --tp 16
```
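Once the server is up, it exposes an OpenAI-compatible HTTP API (by default on port 23333; adjust if you set a different server port). A minimal sketch of a chat request payload — the actual POST is commented out so the snippet stands alone without a running server:

```python
import json

# Minimal chat request for the OpenAI-compatible endpoint exposed by
# `lmdeploy serve api_server` (default port 23333; adjust to your setup).
payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the running service):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:23333/v1/chat/completions",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```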

### Benchmarking

To benchmark LMDeploy DeepSeekV3 inference performance, you may refer to the following script and modify its parameters according to your needs.

```bash
#!/bin/bash

num_prompts=10000
backend="lmdeploy"
dataset_name="random"
dataset_path="./benchmark/ShareGPT_V3_unfiltered_cleaned_split.json"

echo ">>> num_prompts: ${num_prompts}, dataset: ${dataset_name}"

for in_len in 2048
do
echo "input len: ${in_len}"

for out_len in 1024 2048 4096 8192
do
echo "output len: ${out_len}"

python3 benchmark/profile_restful_api.py \
--backend ${backend} \
--dataset-name ${dataset_name} \
--dataset-path ${dataset_path} \
--num-prompts ${num_prompts} \
--random-input-len ${in_len} \
--random-output-len ${out_len}
done

done

```
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -89,6 +89,7 @@ Documentation
:caption: Benchmark

benchmark/benchmark.md
benchmark/deepseekv3.md
benchmark/evaluate_with_opencompass.md

.. toctree::
109 changes: 109 additions & 0 deletions docs/zh_cn/benchmark/deepseekv3.md
@@ -0,0 +1,109 @@
# DeepSeekV3 Benchmarking

## Benchmark results

______________________________________________________________________

### v0.7.1 + `9528a74`

| max bsz | prompt no | input-len | output-len | input token throughput (tok/s) | output token throughput (tok/s) |
| ------- | :-------: | :-------: | :--------: | :----------------------------: | :-----------------------------: |
| 1024 | 10000 | 2048 | 1024 | 3489.50 | 1743.56 |
| 1024 | 10000 | 2048 | 2048 | 1665.07 | 1682.41 |
| 1024 | 10000 | 2048 | 4096 | 725.01 | 1455.12 |
| 1024 | 10000 | 2048 | 8192 | 253.17 | 1009.80 |
| 128 | 2000 | 2048 | 16384 | 76.78 | 600.07 |
| 128 | 2000 | 2048 | 32768 | 17.75 | 281.89 |

For output lengths of 16k and 32k, we decreased the total number of prompts to shorten the execution time.

## User guide

______________________________________________________________________

### Installation

In this document, we provide a step-by-step guide to setting up DeepSeekV3 inference with LMDeploy on a multi-node cluster.

We strongly recommend using the official Docker image to avoid potential issues caused by environment differences. Execute the following commands on both the head and worker nodes to create the Docker containers.

```bash
docker run -it \
--gpus all \
--network host \
--ipc host \
--name lmdeploy \
--privileged \
-v "/path/to/the/huggingface/home/in/this/node":"/root/.cache/huggingface" \
openmmlab/lmdeploy:latest-cu12
```

`--privileged` is required to enable RDMA.

### Build a multi-node cluster using Ray

> :warning: All of the following operations are assumed to be performed inside the Docker container.
> We will build the Ray cluster out of Docker containers, so commands executed in a host terminal cannot access it.

LMDeploy uses Ray to construct the multi-node cluster. In the following steps, we will build a two-node Ray cluster for illustration.

Start the Ray cluster on the head node. Ray's default port is 6379; change it as needed.

```bash
ray start --head --port=6379
```

On each worker node, run the following command to join the Ray cluster, replacing the head node IP and port with your own:

```bash
ray start --address=${head_node_ip}:6379
```

Use the following command to check the Ray cluster status on both the head and worker nodes. You should see status information covering all nodes in the cluster.

```bash
ray status
```

### Launch the service

Use the following command to launch the LMDeploy DeepSeekV3 API service. We currently support TP16 deployment.

```bash
lmdeploy serve api_server deepseek-ai/DeepSeek-V3 --backend pytorch --tp 16
```
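A quick smoke test against the OpenAI-compatible endpoint once the service is up. This is a sketch: the port 23333 is the assumed default, and the `curl` call is commented out so the snippet stands on its own without a running server:

```shell
# Hypothetical smoke test (assumes the default server port 23333;
# adjust to your deployment).
payload='{"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "hello"}], "max_tokens": 16}'
echo "$payload"
# curl -s http://localhost:23333/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$payload"
```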

### Benchmarking

To benchmark LMDeploy DeepSeekV3 inference performance, you may refer to the following script and modify its parameters according to your needs.

```bash
#!/bin/bash

num_prompts=10000
backend="lmdeploy"
dataset_name="random"
dataset_path="./benchmark/ShareGPT_V3_unfiltered_cleaned_split.json"

echo ">>> num_prompts: ${num_prompts}, dataset: ${dataset_name}"

for in_len in 2048
do
echo "input len: ${in_len}"

for out_len in 1024 2048 4096 8192
do
echo "output len: ${out_len}"

python3 benchmark/profile_restful_api.py \
--backend ${backend} \
--dataset-name ${dataset_name} \
--dataset-path ${dataset_path} \
--num-prompts ${num_prompts} \
--random-input-len ${in_len} \
--random-output-len ${out_len}
done

done

```
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -90,6 +90,7 @@
:caption: 测试基准

benchmark/benchmark.md
benchmark/deepseekv3.md
benchmark/evaluate_with_opencompass.md

.. toctree::