0.7.4 dev build: CPU usage stays at 100% even when no requests are being processed #14786
Closed
AndrewTsao started this conversation in General
In V1 mode with FlashMLA enabled, is it normal for 8 processes to sit at 100% CPU?
Program version:
Launch command:
VLLM_ATTENTION_BACKEND=FLASHMLA VLLM_USE_V1=1 OMP_NUM_THREADS=12 /opt/vllm-0.7.4-dev/bin/vllm serve DeepSeek-R1 --max-model-len 131072 --max-num-batched-tokens 8192 --enable-reasoning --reasoning-parser deepseek_r1 --api_key ${VLLM_API_KEY} --tensor-parallel-size 8 --trust-remote-code --disable-log-requests --enable-prefix-caching --enable-chunked-prefill --gpu_memory_utilization=0.95 -O3
Hardware: 8 x H200
strace output:
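A minimal sketch (not from the original post) for putting numbers on the "8 processes at 100% CPU" symptom while the server is idle. It assumes Linux and that you pass in the PIDs of the vLLM worker processes yourself; it reads utime + stime from `/proc/<pid>/stat` twice and converts the tick delta into a CPU percentage. The helper names are hypothetical. Because the counters cover all threads of a process, a worker running OpenMP threads (the command sets `OMP_NUM_THREADS=12`) can report more than 100%.

```python
import os
import sys
import time


def parse_stat_ticks(stat_line: str) -> int:
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' before tokenizing the remaining fields.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field N lives at index N - 3:
    # utime is field 14 -> index 11, stime is field 15 -> index 12.
    return int(fields[11]) + int(fields[12])


def read_ticks(pid: int) -> int:
    """Read cumulative user+system CPU ticks for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_ticks(f.read())


def percent_from_ticks(delta_ticks: int, clk_tck: int, elapsed_s: float) -> float:
    """Convert a tick delta into CPU percent; 100.0 == one core fully busy."""
    return 100.0 * delta_ticks / (clk_tck * elapsed_s)


def sample_cpu_percent(pids, interval_s: float = 1.0) -> dict:
    """Sample each PID twice, interval_s apart, and return {pid: cpu_percent}."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # ticks per second, usually 100
    start = time.monotonic()
    before = {pid: read_ticks(pid) for pid in pids}
    time.sleep(interval_s)
    after = {pid: read_ticks(pid) for pid in pids}
    elapsed = time.monotonic() - start  # measured wall time, not the nominal interval
    return {
        pid: percent_from_ticks(after[pid] - before[pid], clk_tck, elapsed)
        for pid in pids
    }


if __name__ == "__main__":
    # Usage: python cpu_sample.py <pid> [<pid> ...]; defaults to this script's own PID.
    pids = [int(p) for p in sys.argv[1:]] or [os.getpid()]
    for pid, pct in sample_cpu_percent(pids).items():
        print(f"pid {pid}: {pct:.1f}% CPU")
```

Run it against the worker PIDs with no requests in flight: readings that stay near or above 100% per worker would confirm the busy-looping described above, while a sampler like this cannot by itself say which thread is spinning.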