The Inference Perf project aims to provide a GenAI inference performance benchmarking tool. It originated in wg-serving and is sponsored by SIG Scalability. See the proposal for more information.
This project is currently in development.
- Set up a virtual environment and install inference-perf

  ```
  pip install .
  ```
- Run the inference-perf CLI with a configuration file

  ```
  inference-perf --config_file config.yml
  ```
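The configuration file describes the benchmark you want to run. The sketch below is illustrative only — the field names are assumptions, not the authoritative schema — so consult the project's bundled example configs for real options.

```yaml
# Hypothetical config.yml sketch — field names below are illustrative
# assumptions, not the project's actual schema; see the example configs
# shipped with the repository for authoritative settings.
api:
  type: completion                  # assumed: which inference API to benchmark
server:
  base_url: http://localhost:8000   # assumed: endpoint of the model server under test
load:
  rate: 10                          # assumed: request rate to generate (req/s)
  duration: 60                      # assumed: benchmark duration in seconds
```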
- See more examples
- Build the container

  ```
  docker build -t inference-perf .
  ```
- Run the container

  ```
  docker run -it --rm -v $(pwd)/config.yml:/workspace/config.yml inference-perf
  ```
Our community meeting is held weekly on Thursdays at 11:30 PDT (Zoom Link, Meeting Notes).
We currently use the #wg-serving Slack channel for communications.
Contributions are welcome; thanks for joining us!
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.