Sarathi-Serve

This is the official OSDI'24 artifact submission for paper #444, "Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve".

Setup

Set up CUDA

Sarathi-Serve has been tested with CUDA 12.1 on A100 and A40 GPUs.
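
To sanity-check your environment before installing, assuming nvidia-smi and nvcc are on your PATH:

nvidia-smi      # driver version and visible GPUs
nvcc --version  # CUDA toolkit version; expect 12.1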

Clone repository

git clone https://dev.azure.com/msri/AI-Infrastructure/_git/llm-batching
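
The remaining steps assume you run from the repository root; the directory name below is git's default for this URL:

cd llm-batching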

Create mamba environment

Set up mamba if you don't already have it,

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh # follow the instructions from there
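
To confirm the install succeeded, open a new shell and check that mamba resolves,

mamba --version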

Create a Python 3.10 environment,

mamba create -p ./env python=3.10  
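
Activate the environment before installing anything into it (assuming the shell hook configured by the Mambaforge installer is active),

mamba activate ./env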

Install Sarathi-Serve

pip install -e . --extra-index-url https://flashinfer.ai/whl/cu121/torch2.3/
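
As a quick smoke test, check that the package and CUDA-enabled PyTorch import cleanly; the sarathi module name is an assumption based on the project name,

python -c "import sarathi; print('sarathi ok')"
python -c "import torch; print(torch.cuda.is_available())"  # expect True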

Reproducing Results

Refer to the README in each figure's folder under osdi-experiments, as shown below.
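
For example, to find the instructions for a given figure (<figure-folder> is a placeholder for the directory of interest):

ls osdi-experiments
cat osdi-experiments/<figure-folder>/README.md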

Citation

If you use our work, please consider citing our paper:

@inproceedings{agrawal2024taming,
  title={Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve},
  author={Agrawal, Amey and Kedia, Nitin and Panwar, Ashish and Mohan, Jayashree and Kwatra, Nipun and Gulavani, Bhargav S and Tumanov, Alexey and Ramjee, Ramachandran},
  booktitle={Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI)},
  address={Santa Clara, CA},
  year={2024}
}

Acknowledgment

This repository originally started as a fork of the vLLM project. Sarathi-Serve is a research prototype and does not have complete feature parity with open-source vLLM. We have retained only the most critical features and adapted the codebase for faster research iterations.
