The documentation states
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 2
is a way to enable multi-GPU tensor parallelism. However, one must also consider how the processes communicate with each other; usually a shared memory setup is needed. If this is not configured properly, one might run into errors like the following when running the sglang server:

torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.cpp:81, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error.
Last error:
Error while creating shared memory segment /dev/shm/nccl-vzIpS6 (size 9637888)
This means the size of shared memory is too low.
When running in Docker containers, this can be configured with the --shm-size flag (see vLLM's docs at https://docs.vllm.ai/en/latest/deployment/docker.html).
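For example, a minimal sketch (the lmsysorg/sglang image, the port mapping, and the 16g value are illustrative assumptions, not values from the original):

# allocate 16 GiB of shared memory for NCCL inter-process communication
docker run --gpus all --shm-size=16g -p 30000:30000 \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 2 --host 0.0.0.0 --port 30000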
When running in Kubernetes, it's possible that the default shared memory size will not be enough for your containers, so you might need to configure a bigger one. A common way to do it is to mount /dev/shm as an emptyDir volume and set a proper sizeLimit, like this:
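(The example below is a minimal sketch of such a Pod spec; the Pod/container names, the image, and the 16Gi sizeLimit are placeholders, not values from the original issue.)

apiVersion: v1
kind: Pod
metadata:
  name: sglang-tp2                   # placeholder name
spec:
  containers:
    - name: sglang
      image: lmsysorg/sglang:latest  # placeholder image tag
      command:
        - python3
        - -m
        - sglang.launch_server
        - --model-path
        - meta-llama/Meta-Llama-3-8B-Instruct
        - --tp
        - "2"
        - --host
        - 0.0.0.0
        - --port
        - "30000"
      resources:
        limits:
          nvidia.com/gpu: 2          # two GPUs to match --tp 2
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm        # mount over the default (small) /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory               # back the volume with tmpfs (RAM)
        sizeLimit: 16Gi              # placeholder; size this for your model and tensor-parallel setting

Because the emptyDir uses medium: Memory, the volume is tmpfs-backed and the sizeLimit caps how much RAM /dev/shm can consume.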
I have found out that the vLLM project recommends 20Gi as a default value for the shared memory size; see vllm-project/production-stack#44 and their Helm chart value at https://github.com/vllm-project/production-stack/pull/105/files#diff-7d931e53fe7db67b34609c58ca5e5e2788002e7f99657cc2879c7957112dd908R130
However, I'm not sure where that number comes from. I was testing on a node with 2 NVIDIA L40 GPUs with the DeepSeek-R1-Distill-Qwen-32B model, and 1GiB of shared memory seemed to be enough.
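For reference, the size and actual usage of the shared memory mount can be checked from inside the running container, which is how one can tell whether a given sizeLimit is sufficient (a standard command, not specific to sglang):

# show the size and current usage of the tmpfs backing /dev/shm
df -h /dev/shm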