
Add latency, throughput, and serving benchmarks for Scout and Maverick #34

Merged
huydhn merged 11 commits into main on Jun 7, 2025

Conversation

huydhn (Contributor) commented on Jun 4, 2025

TSIA. The two new models now show up on the dashboard. The only notable configuration difference is setting max_model_len to 8192 to avoid OOM.

cc @luccafong @zhewenl. Let me know if the benchmark configurations make sense.
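For context, a minimal sketch of what a Scout latency entry could look like, assuming the JSON test-definition format used by vLLM's benchmark suite. The test name, model ID, tensor_parallel_size, load_format, and iteration counts below are illustrative, not the exact configuration landed in this PR; only the max_model_len value of 8192 comes from the description above:

```json
{
  "test_name": "latency_llama4_scout_tp8",
  "parameters": {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "tensor_parallel_size": 8,
    "max_model_len": 8192,
    "load_format": "dummy",
    "num_iters_warmup": 5,
    "num_iters": 15
  }
}
```

The same max_model_len cap would presumably apply to the throughput and serving entries as well, since it bounds the context length the engine provisions KV cache for, which is likely what avoids the OOM mentioned above.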

huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 4, 2025 at 17:58 (now inactive)
Signed-off-by: Huy Do <[email protected]>
Signed-off-by: Huy Do <[email protected]>
huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 4, 2025 at 21:58 (now inactive)
huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 5, 2025 at 04:32 (now inactive)
huydhn requested a review from yangw-dev on June 5, 2025 at 05:04
huydhn (Contributor, Author) commented on Jun 6, 2025

Because of vllm-project/vllm#18841 (comment), meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 isn't working yet. I will land this first, then follow up on the issue.

Signed-off-by: Huy Do <[email protected]>
huydhn temporarily deployed to pytorch-x-vllm with GitHub Actions on June 6, 2025 at 18:44 (now inactive)
huydhn merged commit 2a9e850 into main on Jun 7, 2025
6 of 7 checks passed
3 participants