
Add latency, throughput, and serving benchmarks for Scout and Maverick #34

Merged
huydhn merged 11 commits into main on Jun 7, 2025

Conversation

huydhn (Contributor) commented on Jun 4, 2025

TSIA. The two new models now show up on the dashboard. The only notable configuration difference is setting max_model_len to 8192 to avoid OOM.

cc @luccafong @zhewenl. Let me know if the benchmark configurations make sense.
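For context, a minimal sketch of what a Scout latency entry could look like, assuming the JSON test-definition format used by vLLM's benchmark suite. The test name, model ID, tensor_parallel_size, load_format, and iteration counts below are illustrative, not the exact configuration landed in this PR; only the max_model_len value of 8192 comes from the description above:

```json
{
  "test_name": "latency_llama4_scout_tp8",
  "parameters": {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "tensor_parallel_size": 8,
    "max_model_len": 8192,
    "load_format": "dummy",
    "num_iters_warmup": 5,
    "num_iters": 15
  }
}
```

The same max_model_len cap would presumably apply to the throughput and serving entries as well, since it bounds the context length the engine provisions KV cache for, which is likely what avoids the OOM mentioned above.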

huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 4, 2025 at 17:58 (now inactive)
Signed-off-by: Huy Do <[email protected]>
Signed-off-by: Huy Do <[email protected]>
huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 4, 2025 at 21:58 (now inactive)
huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 5, 2025 at 04:32 (now inactive)
huydhn requested a review from yangw-dev on June 5, 2025 at 05:04
huydhn (Contributor, Author) commented on Jun 6, 2025

Because of vllm-project/vllm#18841 (comment), meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 isn't working yet. I will land this first, then follow up on the issue.

Signed-off-by: Huy Do <[email protected]>
huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 6, 2025 at 18:44 (now inactive)
huydhn merged commit 2a9e850 into main on Jun 7, 2025
6 of 7 checks passed
3 participants