A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
If you use Semi-PD for your research, please cite our paper:
```bibtex
@misc{hong2025semipd,
      title={semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage},
      author={Ke Hong and Lufang Chen and Zhong Wang and Xiuhong Li and Qiuli Mao and Jianping Ma and Chao Xiong and Guanyu Wu and Buhe Han and Guohao Dai and Yun Liang and Yu Wang},
      year={2025},
      eprint={2504.19867},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
This repository originally started as a fork of the SGLang project. Semi-PD is a research prototype and does not have complete feature parity with open-source SGLang. We have retained only the most critical features and adapted the codebase for faster research iteration.
```bash
# set up the semi-pd conda environment
conda create -n semi_pd -y python=3.11
conda activate semi_pd

# use the latest release branch
git clone [email protected]:infinigence/Semi-PD.git
cd Semi-PD
pip install --upgrade pip

# build the IPC dependency
cd ./semi-pd-ipc/
pip install -e .

# build Semi-PD (NVIDIA GPUs)
cd ..
pip install -e "python[all]" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
```

For AMD GPUs, build the ROCm kernels instead (starting from the repository root):

```bash
# build Semi-PD (AMD GPUs)
cd sgl-kernel
python setup_rocm.py install
cd ..
pip install -e "python[all_hip]"
```
Alternatively, you can set up the base environment with the prebuilt Docker images below, or build from the Dockerfile.

For NVIDIA GPUs:

```bash
docker pull lmsysorg/sglang:v0.4.4.post1-cu124
docker run -it --gpus all -p 30000:30000 -v /your/path:/your/path --ipc=host --name semi_pd lmsysorg/sglang:v0.4.4.post1-cu124
docker exec -it semi_pd bash
```

For AMD GPUs:

```bash
docker pull lmsysorg/sglang:v0.4.4.post1-rocm630
docker run -it --device=/dev/kfd --device=/dev/dri --shm-size=32g -p 30000:30000 -v /your/path:/your/path --ipc=host --name semi_pd lmsysorg/sglang:v0.4.4.post1-rocm630
docker exec -it semi_pd bash
```
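If you prefer building the image yourself, a minimal sketch looks like this (the Dockerfile path is an assumption; check the repository for its actual location):

```bash
# build a local image from the repository's Dockerfile (path assumed)
docker build -t semi_pd:latest -f docker/Dockerfile .
```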
Then you can follow the Build && Install section to build Semi-PD inside the container.
The implementation of compute isolation is based on Multi-Process Service (MPS). On NVIDIA GPUs, the MPS service must be enabled manually, whereas on AMD GPUs it is enabled by default. To enable MPS on NVIDIA GPUs:
```bash
export CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING=1
nvidia-cuda-mps-control -d
```
You can disable the MPS service with:
```bash
echo quit | sudo nvidia-cuda-mps-control
```
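To check that the MPS control daemon is responding, you can query it for active MPS servers (the list may be empty until a client process starts):

```bash
# list the PIDs of active MPS servers; an empty result still confirms the daemon is up
echo get_server_list | nvidia-cuda-mps-control
```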
Semi-PD can be enabled with the `--enable-semi-pd` flag. Note that our implementation does not share activations between the prefill and decode phases, which may result in slightly higher memory usage than the original SGLang. If an out-of-memory error occurs, consider reducing the value of `--mem-fraction-static` to relieve memory pressure.
```bash
python3 -m sglang.launch_server \
    --model-path $MODEL_PATH --served-model-name $MODEL_NAME \
    --host 0.0.0.0 --port $SERVE_PORT --trust-remote-code --disable-radix-cache \
    --enable-semi-pd --mem-fraction-static 0.85 --tp $TP_SIZE
```
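Once the server is up, a quick smoke test (a sketch assuming the `$SERVE_PORT` and `$MODEL_NAME` values from the launch command above) is to send a request to the OpenAI-compatible endpoint:

```bash
# send a single chat completion request to the running Semi-PD server
curl http://localhost:$SERVE_PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL_NAME\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 32}"
```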
Please refer to the `evaluation` directory.
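As a starting point, SGLang's serving benchmark can also be run directly against the launched server (a sketch; flags assumed from upstream `sglang.bench_serving`):

```bash
# benchmark the running server with 200 prompts at 4 requests/s
python3 -m sglang.bench_serving --backend sglang --port $SERVE_PORT \
    --num-prompts 200 --request-rate 4
```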