[New Feature] Add cpu core pinning to vllm-server to improve performance. #502

louie-tsai · 2025-10-29T20:24:37Z

Purpose

Identify a performance issue on GNR.
Close the performance gap by pinning the right number of CPU cores for each model, and maintain the model-to-CPU-count mapping in CSV files that serve as lookup tables.
Add Python scripts that generate the CPU ID list and pin vllm-server to those CPUs through a docker-compose.override.yaml file.
Apply the same workflow on EMR.
Pinning not only improves Gaudi performance but also frees the remaining idle CPUs for other CPU workloads.
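The lookup step described above could look roughly like the following. This is a minimal sketch, not the PR's actual script: the column names (`model_id`, `input_len`, `output_len`, `num_cpus`), the model identifiers, and the input/output lengths are hypothetical. Only the CPU counts 18 and 12 come from the GNR results in this PR. The fallback to a model-only match mirrors the commit note "fallback model_id match if no input/output match".

```python
import csv
import io

# Hypothetical lookup table. Column names, model ids and the 2048/2048
# lengths are illustrative assumptions; only 18 and 12 come from the PR.
SAMPLE_CSV = """model_id,input_len,output_len,num_cpus
llama-3.1-405b,2048,2048,18
llama-3.1-70b,2048,2048,12
"""

ROWS = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


def lookup_num_cpus(rows, model_id, input_len=None, output_len=None):
    """Return the CPU count for a model, or None if the model is unknown.

    An exact (model_id, input_len, output_len) match is preferred; if none
    exists, the first row with the same model_id is used as a fallback.
    """
    for row in rows:
        if (row["model_id"] == model_id
                and row["input_len"] == str(input_len)
                and row["output_len"] == str(output_len)):
            return int(row["num_cpus"])
    for row in rows:
        if row["model_id"] == model_id:
            return int(row["num_cpus"])
    return None


if __name__ == "__main__":
    print(lookup_num_cpus(ROWS, "llama-3.1-405b", 2048, 2048))
    print(lookup_num_cpus(ROWS, "llama-3.1-70b", 128, 128))
```

Keeping the table in CSV means a new model or platform can be supported by adding a row rather than changing code.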

docker-compose.override.yaml example

services:
  vllm-server:
    cpuset: "21,22,23,45,46,47,69,70,71,93,94,95,117,118,119,141,142,143"
    cpus: "18"
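An override fragment like this one could be generated along the following lines. This is a sketch under assumptions: the example appears to take three CPU ids from the end of each block of 24 consecutive ids, and the helper below reproduces that pattern. The block size, the number of blocks, and the function names `pick_cpu_ids` and `render_override` are inferred or hypothetical; the PR's script may derive the layout from the real NUMA topology instead.

```python
def pick_cpu_ids(num_cpus, num_groups, group_size):
    """Pick the same number of trailing CPU ids from each group.

    Groups are assumed to be consecutive id ranges of equal size, e.g.
    group 0 covers ids 0..group_size-1. This layout is an assumption.
    """
    if num_cpus % num_groups != 0:
        raise ValueError("num_cpus must be divisible by num_groups")
    per_group = num_cpus // num_groups
    if per_group > group_size:
        raise ValueError("not enough CPUs per group")
    ids = []
    for group in range(num_groups):
        end = (group + 1) * group_size
        ids.extend(range(end - per_group, end))
    return ids


def render_override(cpu_ids, service="vllm-server"):
    """Render a docker-compose override that pins one service to cpu_ids."""
    cpuset = ",".join(str(i) for i in cpu_ids)
    return (
        "services:\n"
        f"  {service}:\n"
        f'    cpuset: "{cpuset}"\n'
        f'    cpus: "{len(cpu_ids)}"\n'
    )


if __name__ == "__main__":
    # 18 CPUs spread over 6 assumed groups of 24 ids each.
    print(render_override(pick_cpu_ids(18, 6, 24)))
```

Writing the result to docker-compose.override.yaml next to the main compose file lets Docker Compose merge it automatically, so the base file stays unchanged.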

Test Plan

Manually tested.

Test Result

GNR

Pinning different numbers of CPUs yields different throughput, TTFT, and TPOT for different models.

Llama3.1 405B
For Llama3.1 405B, 18 CPU cores gave the best performance, so we map Llama3.1 405B to a CPU count of 18.

Llama3.1 70B
For Llama3.1 70B, 12 CPU cores gave the best performance, so we map Llama3.1 70B to a CPU count of 12.

Why does performance drop when we use more CPUs?

Here are the PerfSpect results for the #CPU=18 and #CPU=24 cases.

#CPU=18
CPU frequency is around 2300 MHz.

Gaudi utilization is around 40%.

#CPU=24
CPU frequency dropped to ~1800 MHz.

Gaudi utilization dropped to 30%.

Therefore, pinning more CPU cores than needed can lower the CPU frequency, which in turn lowers Gaudi utilization because of the reduced CPU performance.
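Because the result depends on the server actually staying on the selected cores, it can be useful to check the effective affinity from inside the container. The sketch below is an illustration, not part of the PR: `parse_cpuset` and `unexpected_cpus` are hypothetical helper names, and `os.sched_getaffinity` is only available on Linux.

```python
import os


def parse_cpuset(cpuset):
    """Parse a cpuset string such as "0-3,8,10-11" into a set of CPU ids."""
    ids = set()
    for part in cpuset.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            ids.update(range(int(low), int(high) + 1))
        else:
            ids.add(int(part))
    return ids


def unexpected_cpus(expected, actual=None):
    """Return CPU ids the process may run on that are not in `expected`.

    When `actual` is omitted, the current process affinity is queried
    (Linux only).
    """
    if actual is None:
        actual = os.sched_getaffinity(0)
    return set(actual) - set(expected)


if __name__ == "__main__":
    wanted = parse_cpuset("21,22,23,45,46,47")
    extra = unexpected_cpus(wanted)
    print("pinned correctly" if not extra else f"also allowed on: {sorted(extra)}")
```

An empty result means the process is confined to the intended cores; any extra ids suggest the override file was not picked up.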

github-actions · 2025-10-29T20:24:48Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-10-29T20:25:40Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

PatrykWo

Please fix the pre-commit failures.
The README is quite unfriendly. Let's sync.

github-actions · 2025-10-29T23:24:29Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
d4aa14434397b46a562f93d0371719e62d9bd62d

.cd/README.md

.cd/server/cpu_binding/cpu_binding.py

.cd/server/cpu_binding/cpu_binding_emr.csv

.cd/server/cpu_binding/requirements_cpu_binding.txt

PatrykWo

Some changes are needed.

github-actions · 2025-10-31T22:48:20Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-10-31T23:12:55Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-10-31T23:23:48Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-10-31T23:26:25Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-11-04T01:12:03Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0384aa7150c4c9778efca041ffd1beb3ad2bd694

github-actions · 2025-11-04T08:20:37Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0384aa7150c4c9778efca041ffd1beb3ad2bd694

.cd/README.md

github-actions · 2025-11-05T18:03:56Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0384aa7150c4c9778efca041ffd1beb3ad2bd694

github-actions · 2025-11-06T08:56:25Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

.cd/README.md

github-actions · 2025-11-06T18:53:18Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-11-06T20:08:17Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-11-06T20:08:59Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

github-actions · 2025-11-06T21:26:32Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0384aa7150c4c9778efca041ffd1beb3ad2bd694

PatrykWo

LGTM

louie-tsai requested review from adobrzyn, afierka-intel, iboiko-habana, kzawora-intel, mgawarkiewicz-intel, michalkuligowski, mswiniarsk, vivekgoe and xuechendi as code owners October 29, 2025 20:24

louie-tsai force-pushed the cpu_pinning branch from ffe97f3 to 403c9c6 Compare October 29, 2025 20:25

louie-tsai force-pushed the cpu_pinning branch from 403c9c6 to c070b05 Compare October 29, 2025 20:26

PatrykWo reviewed Oct 29, 2025

louie-tsai force-pushed the cpu_pinning branch 3 times, most recently from 1ea2924 to 8b3160f Compare October 29, 2025 22:34

louie-tsai requested a review from PatrykWo October 30, 2025 02:55

louie-tsai commented Oct 30, 2025

.cd/README.md (outdated, resolved)

.cd/README.md (outdated, resolved)

.cd/README.md (outdated, resolved)

PatrykWo reviewed Oct 31, 2025

.cd/server/cpu_binding/cpu_binding.py (resolved)

PatrykWo reviewed Oct 31, 2025

.cd/server/cpu_binding/cpu_binding_emr.csv (resolved)

PatrykWo reviewed Oct 31, 2025

.cd/server/cpu_binding/requirements_cpu_binding.txt (resolved)

PatrykWo reviewed Oct 31, 2025

louie-tsai force-pushed the cpu_pinning branch from 64ec550 to df4933d Compare October 31, 2025 23:23

louie-tsai force-pushed the cpu_pinning branch from 3b581d8 to a5c5ce0 Compare October 31, 2025 23:29

louie-tsai force-pushed the cpu_pinning branch from 11a1487 to ee6ea33 Compare November 4, 2025 00:21

louie-tsai requested a review from PatrykWo November 4, 2025 06:58

louie-tsai force-pushed the cpu_pinning branch from ee6ea33 to ca9a0fa Compare November 4, 2025 07:21

afierka-intel approved these changes Nov 4, 2025

PatrykWo requested changes Nov 4, 2025

.cd/README.md (resolved)

louie-tsai requested review from PatrykWo and nngokhale November 4, 2025 22:56

louie-tsai force-pushed the cpu_pinning branch 2 times, most recently from 01f8f55 to 4d36048 Compare November 5, 2025 17:03

PatrykWo added the enhancement New feature or request label Nov 6, 2025

PatrykWo reviewed Nov 6, 2025

.cd/README.md (outdated, resolved)

.cd/README.md (resolved)

louie-tsai force-pushed the cpu_pinning branch from 87b5236 to a4667e5 Compare November 6, 2025 20:08

louie-tsai and others added 5 commits November 6, 2025 12:09

add cpu core pinning to vllm-server on Gaudi3 + GNR for Llama405B and…

efdeef4

… 70B Signed-off-by: louie-tsai <[email protected]> Signed-off-by: Tsai, Louie <[email protected]>

adding emr cpu binding

10dd154

Signed-off-by: louie-tsai <[email protected]> Signed-off-by: Tsai, Louie <[email protected]>

pre-commit fix

efa3a1b

Signed-off-by: louie-tsai <[email protected]> Signed-off-by: Tsai, Louie <[email protected]>

fix even policy issue for 70b and 8b

ae66647

use one CPU id per core, and fallback model_id match if no input/output match Signed-off-by: louie-tsai <[email protected]> Signed-off-by: Tsai, Louie <[email protected]>

address reviewer feedbacks

45e0745

move all scripts under cpu_binding folder updated README Signed-off-by: louie-tsai <[email protected]> Signed-off-by: Tsai, Louie <[email protected]>

louie-tsai force-pushed the cpu_pinning branch 2 times, most recently from 89018b5 to 45e0745 Compare November 6, 2025 20:10

louie-tsai requested a review from PatrykWo November 6, 2025 20:10

PatrykWo approved these changes Nov 7, 2025

PatrykWo merged commit 2878f62 into vllm-project:main Nov 7, 2025
37 checks passed
