please select your hardware type
cp .env.example .env
uv venv
source .venv/bin/activate
uv pip install -e .
- start different backends
- download with no weights
- gradio interface
- compatibility as the first link for each hardware type
- document for hardware partners to pitch contributing hardware to run our leaderboard
  -> the philosophy: no manual ops, just a single docker run that serves OpenAI-style requests
  -> no crazy docker arguments (meaning sensible defaults); see the sketch below
  -> the benefit for partners: you show that your hardware supports the latest and greatest models
  -> the backend needs to be TGI or vLLM (or another backend with strong community demand, such as llama.cpp for edge devices, or should we use Ollama directly?); we cannot expect users to learn a new framework
  -> partners contribute hardware so we can run nightly CI on it
  -> even simpler: only a single request
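As a sketch of the kind of one-liner we mean, using vLLM's official OpenAI-compatible image (the model and port here are just placeholders):

# single command, sensible defaults, serves the OpenAI API on port 8000
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model openai-community/gpt2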
hardware to support (use generic names, then give more info about the specific hardware):
- NVIDIA GPU (done)
- Intel CPU (in progress)
- Intel Habana
- Intel GPU
- Google TPU
- AWS Inferentia
- AMD GPU
- AMD CPU
- Apple Silicon CPU (use their logo) -> simple green box if compatible
- save the command line used to run the container on each platform, so people can reproduce the run (gives more credibility)
Later:
- monitor energy with codecarbon
- monitor memory usage
- dashboard with more info, like the status of each benchmark run (can we do everything in 6h? otherwise parallelize across models, so ~10 jobs -> faster)
- performance for each type of hardware (with an aggregate option)
- startup time (need to docker pull first, then measure; see the sketch after this list)
- find a way to get pricing
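A rough sketch of how startup time could be measured, assuming an OpenAI-compatible backend that answers on /v1/models once it is ready (image, model, port, and container name are placeholders):

docker pull vllm/vllm-openai:latest   # pull first so download time is excluded
start=$(date +%s)
docker run -d --gpus all -p 8000:8000 --name startup-test \
    vllm/vllm-openai:latest --model openai-community/gpt2
# poll until the server answers, then report elapsed seconds
until curl -fsS http://localhost:8000/v1/models > /dev/null 2>&1; do sleep 1; done
echo "startup: $(( $(date +%s) - start ))s"
docker rm -f startup-test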
explain the philosophy behind it: why benchmark LLMs, why the OpenAI API, the big use cases
-> need to think about how to be smart about the benchmarks
-> if a run goes out of memory, it probably should not count as a failure...?
-> also: do not use GGUF models?
how to manually test a backend with curl
curl http://127.0.0.1:8080/v1/chat/completions \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{
        "model": "openai-community/gpt2",
        "messages": [
            {
                "role": "user",
                "content": "What is Deep Learning?"
            }
        ],
        "max_tokens": 20
    }'
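For reference, a backend that would answer the request above could be started like this (a sketch using TGI's official image, which exposes an OpenAI-compatible Messages API; the volume path is a placeholder):

# TGI listens on port 80 inside the container; /data caches the weights
docker run --gpus all -p 8080:80 -v $PWD/tgi-data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id openai-community/gpt2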
- the volume mount should be optional, to avoid filling up the hard drive
docker run -d --gpus=all -p 8080:11434 --name llm-hardware-benchmark ollama/ollama   # -d so the server runs in the background
docker exec -it llm-hardware-benchmark ollama run llama3   # interactive smoke test
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2:1b",
        "messages": [
            {
                "role": "user",
                "content": "What is Deep Learning?"
            }
        ],
        "max_tokens": 20
    }'
docker exec llm-hardware-benchmark ollama pull llama3.2:1b   # the request above asks for this model, so pull it first
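To double-check which models the container actually has before curling (ollama list is a standard Ollama command):

docker exec llm-hardware-benchmark ollama list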
- make this a TUI instead
- add a status page showing whether each AI model is up and working (see the sketch below)
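A minimal sketch of the check behind such a status page, assuming each backend exposes an OpenAI-style /v1/models endpoint (the URLs are placeholders, and not every backend has this route, so adjust per backend):

for url in http://localhost:8080 http://localhost:8000; do
    if curl -fsS "$url/v1/models" > /dev/null 2>&1; then
        echo "$url OK"      # green box
    else
        echo "$url DOWN"    # red box
    fi
done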