11 changes: 3 additions & 8 deletions .github/workflows/ci_xpu.yml
@@ -60,14 +60,11 @@ jobs:
runner_name="${{ runner.name }}"
last_char="${runner_name: -1}"

if [[ "$last_char" =~ [0-3] ]]; then
gpu_id="$last_char"
if [[ "$last_char" == "1" ]]; then
gpu_id="4"
else
gpu_id="0"
Comment on lines 65 to 66
Copilot AI Nov 7, 2025

The GPU_ID assignment logic seems inconsistent with the shell script expectations. The workflow sets gpu_id="4" when the last character of the runner name is "1", but in the shell script, GPU_ID is used in arithmetic operations expecting values like 0 or 4.

This means:

  • When last_char == "1": GPU_ID=4, ports will be 8188 + 4*100 = 8588
  • When last_char != "1": GPU_ID=0, ports will be 8188 + 0*100 = 8188

However, there's a potential issue: if the runner name ends with digits 2, 3, or other values, GPU_ID will always be 0, which could cause port conflicts if multiple runners execute simultaneously.

Consider documenting this behavior or adding validation to ensure only expected runner names are used.

Suggested change
- else
-   gpu_id="0"
+ elif [[ "$last_char" == "0" ]]; then
+   gpu_id="0"
+ else
+   echo "Error: Unexpected runner name '$runner_name'. The last character must be '0' or '1'."
+   exit 1

fi
FD_API_PORT=$((9180 + gpu_id * 100))
FD_ENGINE_QUEUE_PORT=$((9150 + gpu_id * 100))
FD_METRICS_PORT=$((9170 + gpu_id * 100))
PARENT_DIR=$(dirname "$WORKSPACE")
echo "PARENT_DIR:$PARENT_DIR"
docker run --rm --net=host --cap-add=SYS_PTRACE --privileged --shm-size=64G \
@@ -77,9 +74,7 @@ jobs:
-e "http_proxy=$(git config --global --get http.proxy)" \
-e "https_proxy=$(git config --global --get https.proxy)" \
-e "no_proxy=bcebos.com,mirrors.tuna.tsinghua.edu.cn,127.0.0.1,localhost" \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "GPU_ID=${gpu_id}" \
${docker_image} /bin/bash -c "
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
76 changes: 62 additions & 14 deletions scripts/run_ci_xpu.sh
@@ -9,13 +9,27 @@ apt install -y lsof
function stop_processes() {
ps -efww | grep -E 'cache_transfer_manager.py' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
ps -efww | grep -E 'api_server' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
ps -efww | grep -E '8188' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
lsof -t -i :8188 | xargs kill -9 || true
ps -efww | grep -E "$((8188 + GPU_ID * 100))" | grep -v grep | awk '{print $2}' | xargs kill -9 || true
lsof -t -i :$((8188 + GPU_ID * 100)) | xargs kill -9 || true
}

Copilot AI Nov 7, 2025

The stop_processes function still searches for and kills processes using hardcoded port 8188. When GPU_ID != 0, services will run on different ports (e.g., 8588 for GPU_ID=4), so these cleanup commands won't properly stop those processes.

Consider making the port pattern dynamic or using broader process matching:

function stop_processes() {
    ps -efww | grep -E 'cache_transfer_manager.py' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
    ps -efww | grep -E 'api_server' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
    # Kill based on GPU_ID specific port if set
    if [[ -n "$GPU_ID" ]]; then
        local port=$((8188 + GPU_ID * 100))
        ps -efww | grep -E "$port" | grep -v grep | awk '{print $2}' | xargs kill -9 || true
        lsof -t -i :$port | xargs kill -9 || true
    else
        ps -efww | grep -E '8188' | grep -v grep | awk '{print $2}' | xargs kill -9 || true
        lsof -t -i :8188 | xargs kill -9 || true
    fi
}

stop_processes

# Set the model path
export model_path=${MODEL_PATH}/ERNIE-4.5-300B-A47B-Paddle
# Due to machine issues, the XPU cards in use need to be reset to make sure they are in a clean state
if [[ "$GPU_ID" == "0" ]]; then
export XPU_VISIBLE_DEVICES="0,1,2,3"
else
export XPU_VISIBLE_DEVICES="4,5,6,7"
fi

mkdir -p /workspace/deps
cd /workspace/deps
wget -q https://klx-sdk-release-public.su.bcebos.com/xre/kl3-release/5.0.21.21/xre-Linux-x86_64-5.0.21.21.tar.gz
tar -zxf xre-Linux-x86_64-5.0.21.21.tar.gz && mv xre-Linux-x86_64-5.0.21.21 xre
cd -
export PATH=/workspace/deps/xre/bin:$PATH

xpu-smi -r -i $XPU_VISIBLE_DEVICES

Copilot AI Nov 7, 2025

The XPU reset command xpu-smi -r -i $XPU_VISIBLE_DEVICES passes the value of XPU_VISIBLE_DEVICES directly to the -i flag. According to the XPU-SMI documentation, -i expects device indices (e.g., "0,1,2,3"), which matches the format of XPU_VISIBLE_DEVICES, so this should work. Still, consider verifying that the reset succeeds before proceeding, as a failed reset could cause subsequent tests to fail in unexpected ways.

Consider adding error checking:

xpu-smi -r -i $XPU_VISIBLE_DEVICES || { echo "XPU reset failed"; exit 1; }
Suggested change
- xpu-smi -r -i $XPU_VISIBLE_DEVICES
+ xpu-smi -r -i $XPU_VISIBLE_DEVICES || { echo "XPU reset failed"; exit 1; }

xpu-smi

echo "pip requirements"
python -m pip install -r requirements.txt
@@ -51,11 +65,19 @@ rm -f core*
# Clear the message queues
ipcrm --all=msg
echo "============================开始V1模式测试!============================"
export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
if [[ "$GPU_ID" == "0" ]]; then
export XPU_VISIBLE_DEVICES="0,1,2,3"
else
export XPU_VISIBLE_DEVICES="4,5,6,7"
fi
export port_num=$((8188 + GPU_ID * 100))
python -m fastdeploy.entrypoints.openai.api_server \
--model ${model_path} \
--port 8188 \
--tensor-parallel-size 8 \
--model ${MODEL_PATH}/ERNIE-4.5-300B-A47B-Paddle \
--port $port_num \
--engine-worker-queue-port $((port_num + 1)) \
--metrics-port $((port_num + 2)) \
--cache-queue-port $((port_num + 47873)) \

Copilot AI Nov 7, 2025

The cache queue port calculation $((port_num + 47873)) could result in very large port numbers that might exceed the valid port range (0-65535).

For example:

  • When GPU_ID=0: port_num=8188, cache-queue-port = 8188 + 47873 = 56061 (valid)
  • When GPU_ID=4: port_num=8588, cache-queue-port = 8588 + 47873 = 56461 (valid)

While this works for the current GPU_ID values (0 and 4), if GPU_ID ever increases or the base port changes, this could exceed 65535. Consider using a different offset or documenting the maximum supported GPU_ID value.
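A minimal guard, sketched below as a hypothetical addition to scripts/run_ci_xpu.sh (the variable names mirror the existing script, but the check itself is not part of this PR), would fail fast when the derived port leaves the valid range:

port_num=$((8188 + GPU_ID * 100))
cache_queue_port=$((port_num + 47873))
# Hypothetical guard (not in this PR): refuse to start if the cache queue port
# would exceed the maximum valid TCP port.
if (( cache_queue_port > 65535 )); then
    echo "Error: cache-queue-port ${cache_queue_port} exceeds 65535 (GPU_ID=${GPU_ID} is too large)."
    exit 1
fi
# With base 8188 and offset 47873, the largest GPU_ID that stays in range is
# (65535 - 47873 - 8188) / 100 = 94.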

--tensor-parallel-size 4 \
--num-gpu-blocks-override 16384 \
--max-model-len 32768 \
--max-num-seqs 128 \
@@ -119,10 +141,18 @@ rm -f core*
# Clear the message queues
ipcrm --all=msg
echo "============================开始W4A8测试!============================"
export XPU_VISIBLE_DEVICES="0,1,2,3"
if [[ "$GPU_ID" == "0" ]]; then
export XPU_VISIBLE_DEVICES="0,1,2,3"
else
export XPU_VISIBLE_DEVICES="4,5,6,7"
fi
export port_num=$((8188 + GPU_ID * 100))
python -m fastdeploy.entrypoints.openai.api_server \
--model ${MODEL_PATH}/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle \
--port 8188 \
--port $port_num \
--engine-worker-queue-port $((port_num + 1)) \
--metrics-port $((port_num + 2)) \
--cache-queue-port $((port_num + 47873)) \
--tensor-parallel-size 4 \
--num-gpu-blocks-override 16384 \
--max-model-len 32768 \
@@ -187,10 +217,18 @@ rm -f core*
# Clear the message queues
ipcrm --all=msg
echo "============================开始vl模型测试!============================"
export XPU_VISIBLE_DEVICES="0,1,2,3"
if [[ "$GPU_ID" == "0" ]]; then
export XPU_VISIBLE_DEVICES="0,1,2,3"
else
export XPU_VISIBLE_DEVICES="4,5,6,7"
fi
export port_num=$((8188 + GPU_ID * 100))
python -m fastdeploy.entrypoints.openai.api_server \
--model ${MODEL_PATH}/ERNIE-4.5-VL-28B-A3B-Paddle \
--port 8188 \
--port $port_num \
--engine-worker-queue-port $((port_num + 1)) \
--metrics-port $((port_num + 2)) \
--cache-queue-port $((port_num + 47873)) \
--tensor-parallel-size 4 \
--max-model-len 32768 \
--max-num-seqs 10 \
@@ -257,7 +295,12 @@ rm -rf log/*
rm -f core*
ipcrm --all=msg
xpu-smi
export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
if [[ "$GPU_ID" == "0" ]]; then
export XPU_VISIBLE_DEVICES="0,1,2,3"
else
export XPU_VISIBLE_DEVICES="4,5,6,7"
fi

export BKCL_ENABLE_XDR=1
export BKCL_RDMA_NICS=xgbe1,xgbe2,xgbe3,xgbe4
export BKCL_TRACE_TOPO=1
@@ -301,7 +344,12 @@ rm -rf log/*
rm -f core*
ipcrm --all=msg
xpu-smi
export XPU_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
if [[ "$GPU_ID" == "0" ]]; then
export XPU_VISIBLE_DEVICES="0,1,2,3"
else
export XPU_VISIBLE_DEVICES="4,5,6,7"
fi

export BKCL_ENABLE_XDR=1
export BKCL_RDMA_NICS=xgbe1,xgbe2,xgbe3,xgbe4
export BKCL_TRACE_TOPO=1
4 changes: 3 additions & 1 deletion tests/ci_use/XPU_45T/run_45T.py
@@ -11,13 +11,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

import openai


def test_45t():
ip = "0.0.0.0"
service_http_port = "8188" # 服务配置的
gpu_id = int(os.getenv("GPU_ID", "0"))
service_http_port = 8188 + gpu_id * 100  # as configured by the service
client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
# base_response_110 = "你好!我是一个基于人工智能技术开发的助手,可以帮你解答问题、提供建议、聊天交流或者完成一些任务。无论是学习、工作还是生活中的疑问,都可以随时告诉我哦~😊 你有什么想聊的吗?"
# base_response_104 = "你好!我是一个基于人工智能技术打造的助手,可以帮你解答问题、提供建议、分享知识,或者陪你聊聊天~😊 无论是学习、工作、生活还是娱乐相关的问题,都可以随时告诉我哦!你今天有什么想聊的吗?"
4 changes: 3 additions & 1 deletion tests/ci_use/XPU_45T/run_45vl.py
@@ -11,13 +11,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

import openai


def test_45vl():
ip = "0.0.0.0"
service_http_port = "8188" # 服务配置的
gpu_id = int(os.getenv("GPU_ID", "0"))
service_http_port = 8188 + gpu_id * 100  # as configured by the service
client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
base_response = "北魏时期"
# Non-streaming chat
5 changes: 3 additions & 2 deletions tests/ci_use/XPU_45T/run_ep.py
@@ -37,8 +37,9 @@ def test_fd_ep():
else:
tensor_parallel_size = xpu_device_num
data_parallel_size = 1

engine_worker_queue_port = [str(8023 + i * 10) for i in range(data_parallel_size)]
gpu_id = int(os.getenv("GPU_ID", "0"))

Copilot AI Nov 7, 2025

The base_port calculation follows the same pattern as the other test files (base + GPU_ID * 100), but it keeps base port 8023 rather than the 8188 used elsewhere. Consider documenting why different tests use different base ports:

  • run_45T.py, run_w4a8.py, run_45vl.py: base 8188
  • run_ep.py: base 8023

This could be intentional for avoiding port conflicts between different test types, but it should be documented to prevent confusion.

Suggested change
- gpu_id = int(os.getenv("GPU_ID", "0"))
+ gpu_id = int(os.getenv("GPU_ID", "0"))
+ # Note: This test uses base_port=8023 (vs. 8188 in run_45T.py, run_w4a8.py, run_45vl.py).
+ # This is intentional to avoid port conflicts between different test types.
+ # If you modify the base port here, ensure it does not overlap with other test scripts.

base_port = 8023 + gpu_id * 100
engine_worker_queue_port = [str(base_port + i * 10) for i in range(data_parallel_size)]
engine_worker_queue_port = ",".join(engine_worker_queue_port)

llm = LLM(
8 changes: 5 additions & 3 deletions tests/ci_use/XPU_45T/run_w4a8.py
@@ -11,16 +11,18 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os

import openai


def test_w4a8():
ip = "0.0.0.0"
service_http_port = "8188" # 服务配置的
gpu_id = int(os.getenv("GPU_ID", "0"))
service_http_port = 8188 + gpu_id * 100  # as configured by the service
client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
#base_response_110 = "你好!我是一个基于人工智能技术的助手,可以帮你解答问题、提供建议、聊天或者协助完成各种任务。无论是学习、工作还是生活中的疑问,我都可以尽力提供帮助。😊 你有什么想聊的吗?"
#base_response_104 = "你好!我是一个人工智能助手,可以帮你解答问题、提供建议、聊天或者完成一些任务。无论是学习、工作还是生活中的疑问,我都可以尽力帮忙哦~有什么需要我做的吗?😊"
# base_response_110 = "你好!我是一个基于人工智能技术的助手,可以帮你解答问题、提供建议、聊天或者协助完成各种任务。无论是学习、工作还是生活中的疑问,我都可以尽力提供帮助。😊 你有什么想聊的吗?"
# base_response_104 = "你好!我是一个人工智能助手,可以帮你解答问题、提供建议、聊天或者完成一些任务。无论是学习、工作还是生活中的疑问,我都可以尽力帮忙哦~有什么需要我做的吗?😊"
# Non-streaming chat
response = client.chat.completions.create(
model="default",
Expand Down