
Document how to configure shared memory for multi GPU deployments #5

Open
jsuchome opened this issue Mar 5, 2025 · 0 comments

The documentation states

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 2

is a way to enable multi-GPU tensor parallelism. However, the tensor-parallel worker processes have to communicate with each other, and this usually requires a properly sized shared memory (/dev/shm) setup. If shared memory is not configured correctly, you can run into errors like:

torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.cpp:81, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error.
Last error:
Error while creating shared memory segment /dev/shm/nccl-vzIpS6 (size 9637888)

when running sglang server.

This means the available shared memory is too small for NCCL to create its segments.
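A quick way to confirm this is to check the size of /dev/shm inside the running container (Docker gives containers only 64 MiB by default):

# check the size and current usage of /dev/shm inside the container
df -h /dev/shm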

When running in Docker containers, this can be set with the --shm-size flag (see vLLM's docs at https://docs.vllm.ai/en/latest/deployment/docker.html).
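For example, a minimal sketch (the image tag, port, shared memory size, and model here are illustrative placeholders, not an official recommendation):

# give the container 16 GiB of shared memory instead of the 64 MiB default
docker run --gpus all --shm-size=16g -p 30000:30000 lmsysorg/sglang:latest \
    python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --tp 2 --host 0.0.0.0 --port 30000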

When running in Kubernetes, the default shared memory size may not be enough for your containers either, so you may need to configure a larger one. A common way to do this is to mount /dev/shm as an emptyDir volume with medium: Memory and a suitable sizeLimit, like this:

    spec:
      containers:
      - command:
        ... < your usual container setup > ...
        volumeMounts:
        - mountPath: /dev/shm
          name: shared
      volumes:
      - emptyDir:
          medium: Memory
          sizeLimit: 1Gi
        name: shared

I have found that the vLLM project recommends 20Gi as the default shared memory size, see vllm-project/production-stack#44 and their Helm chart value https://github.com/vllm-project/production-stack/pull/105/files#diff-7d931e53fe7db67b34609c58ca5e5e2788002e7f99657cc2879c7957112dd908R130

However, I'm not sure where that number comes from. I was testing on a node with 2 NVIDIA L40 GPUs running the DeepSeek-R1-Distill-Qwen-32B model, and 1Gi of shared memory seemed to be enough.
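One way to pick a value empirically is to watch actual /dev/shm usage while the server is under load, and to rerun with NCCL_DEBUG=INFO (as the error message itself suggests) to see what NCCL allocates:

# inside the running container: watch shared memory usage while requests are served
watch -n 5 df -h /dev/shm

# rerun the server with NCCL debug logging to inspect shared memory allocations
NCCL_DEBUG=INFO python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 2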
