Commit 4d36048

address reviewer feedbacks
move all scripts under cpu_binding folder
updated README

Signed-off-by: louie-tsai <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>
1 parent bc7b9e2 commit 4d36048

7 files changed, 39 additions and 37 deletions

.cd/README.md

Lines changed: 35 additions & 35 deletions
@@ -64,40 +64,6 @@ cd vllm-gaudi/.cd/
 
 This launches the vLLM server and runs the benchmark suite automatically.
 
-#### 2.1 (Optional) Running the Server with a Benchmark, and pinning CPU cores for memory access coherence
-
-To improve memory access cohererence and release CPUs to other CPU only workloads like a vLLM serving with Llama3 8B,
-pin the CPU cores based on different CPU NUMA nodes by using an auto-generate docker-compose.override.yml file.
-Couple python libraries are needed for the python scripts, so install the required packages using following commnad.
-
-```bash
-pip install -r vllm-fork/.cd/server/requirements_cpu_binding.txt
-```
-
-Run below command to do CPU cores pinning via auto-generated docker-compose.override.yml file.
-
-```bash
-cd vllm-fork/.cd/
-MODEL="Qwen/Qwen2.5-14B-Instruct" \
-HF_TOKEN="<your huggingface token>" \
-DOCKER_IMAGE="vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/vllm-installer-2.7.1:latest" \
-python3 server/generate_cpu_binding_from_csv.py --settings server/cpu_binding.csv --output ./docker-compose.override.yml \
-docker compose --profile benchmark -f docker-compose.yml -f docker-compose.override.yml up
-```
-
-To also pin idle CPUs to another service like vllm-cpu-service, please give the service name to update
-docker-compose.override.yml in order to bind another service to idle cpus.
-Here is an exmaple to bind idle cpu for vllm-cpu-service service while docker-compose.vllm-cpu-service.yml defines cpu service.
-
-```bash
-cd vllm-fork/.cd/
-MODEL="Qwen/Qwen2.5-14B-Instruct" \
-HF_TOKEN="<your huggingface token>" \
-DOCKER_IMAGE="vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/vllm-installer-2.7.1:latest" \
-python3 server/generate_cpu_binding_from_csv.py --settings server/cpu_binding.csv --output ./docker-compose.override.yml --cpuservice vllm-cpu-service \
-docker compose --profile benchmark -f docker-compose.yml -f docker-compose.vllm-cpu-service.yml -f docker-compose.override.yml up
-```
-
 ### 3. Run the server using Docker Compose with custom parameters
 
 To override default settings, you can provide additional parameters when starting the server. This is a more advanced approach:
@@ -193,7 +159,41 @@ cd vllm-gaudi/.cd/
 > [!NOTE]
 > When using configuration files, you do not need to set the `MODEL` environment variable, as the model name is specified within the configuration file. However, you must still provide your `HF_TOKEN`.
 
-### 7. Running the Server Directly with Docker
+### 7. Advance Options with pinning CPU cores for memory access coherence
+
+To improve memory access cohererence and release CPUs to other CPU only workloads like a vLLM serving with Llama3 8B,
+pin the CPU cores based on different CPU NUMA nodes by using an auto-generate docker-compose.override.yml file.
+Validated Xeon Processors as for now: Intel Xeon 6960P, and Intel Xeon PLATINUM 8568Y+.
+
+Couple python libraries are needed for the python scripts, so install the required packages using following commnad.
+
+```bash
+pip install -r vllm-fork/.cd/server/cpu_binding/requirements_cpu_binding.txt
+```
+
+Run below command to do CPU cores pinning via auto-generated docker-compose.override.yml file.
+
+```bash
+export MODEL="Qwen/Qwen2.5-14B-Instruct"
+export HF_TOKEN="<your huggingface token>"
+export DOCKER_IMAGE="<docker image url>"
+python3 server/cpu_binding/generate_cpu_binding_from_csv.py --settings server/cpu_binding/cpu_binding_gnr.csv --output ./docker-compose.override.yml
+docker compose --profile benchmark up
+```
+
+To also pin idle CPUs to another service like vllm-cpu-service, please give the service name to update
+docker-compose.override.yml in order to bind another service to idle cpus.
+Here is an exmaple to bind idle cpu for vllm-cpu-service service while docker-compose.vllm-cpu-service.yml defines cpu service.
+
+```bash
+export MODEL="Qwen/Qwen2.5-14B-Instruct"
+export HF_TOKEN="<your huggingface token>"
+export DOCKER_IMAGE="<docker image url>"
+python3 server/cpu_binding/generate_cpu_binding_from_csv.py --settings server/cpu_binding/cpu_binding_gnr.csv --output ./docker-compose.override.yml --cpuservice vllm-cpu-service
+docker compose --profile benchmark -f docker-compose.yml -f docker-compose.vllm-cpu-service.yml -f docker-compose.override.yml up
+```
+
+### 8. Running the Server Directly with Docker
 
 For full control, you can run the server using the `docker run` command. This approach allows you to specify any native Docker parameters as needed.
 
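The README section added in this commit describes pinning a service to specific CPU cores through a generated `docker-compose.override.yml`. The generator's real output format is not shown in this diff (its help text mentions "x-sets"), so the following is only a minimal sketch of the general idea, assuming the override simply sets the standard Compose `cpuset` attribute per service. The helper names `to_cpuset` and `render_override` are hypothetical and do not come from the repository.

```python
def to_cpuset(cpus):
    """Compress CPU IDs into a cpuset string such as "0-2,8"."""
    ids = sorted(set(cpus))
    if not ids:
        return ""
    ranges = []
    start = prev = ids[0]
    for cpu in ids[1:]:
        if cpu == prev + 1:
            # Still inside a contiguous run of CPU IDs.
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def render_override(service_cpus):
    """Render a Compose override (assumed layout) that pins each service to its CPUs."""
    lines = ["services:"]
    for service, cpus in service_cpus.items():
        lines.append(f"  {service}:")
        lines.append(f'    cpuset: "{to_cpuset(cpus)}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical allocation: six cores for the server, two idle cores for a CPU-only service.
    print(render_override({"vllm-server": [0, 1, 2, 3, 4, 5], "vllm-cpu-service": [6, 7]}))
```

Docker Compose merges `docker-compose.override.yml` automatically when no `-f` flag is given, which is consistent with the shorter `docker compose --profile benchmark up` command in the new README text.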

.cd/server/cpu_binding.py renamed to .cd/server/cpu_binding/cpu_binding.py

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ def pick_row_by_parameters(self, rows: list[dict], model: str, input_tok: str, o
             raise ValueError(f"MODEL '{model}', input_length '{input_tok}', output_length '{output_tok}' "
                              f"not found in CSV. Available: {available}")
         return matches[0]
+
     def filter_one_cpu_per_core(self, cpus):
         """
         Given a list of CPU IDs (possibly with HT pairs),
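The body of `filter_one_cpu_per_core` is not visible in this hunk; only its docstring's first line is. As a rough illustration of what "one CPU per core" filtering can look like, here is a standalone sketch that takes the hyper-threading sibling map as an explicit argument. The `siblings` parameter is an assumption made for testability; on Linux such a map could be built from `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, and the repository's method may obtain it differently.

```python
def filter_one_cpu_per_core(cpus, siblings):
    """Keep the lowest-numbered logical CPU from each physical core.

    cpus:     iterable of logical CPU IDs to filter.
    siblings: dict mapping a CPU ID to the CPU IDs sharing its physical core
              (assumed symmetric; CPUs missing from the map count as their own core).
    """
    seen_cores = set()
    kept = []
    for cpu in sorted(cpus):
        # Identify the physical core by the full set of its hardware threads.
        core = frozenset(set(siblings.get(cpu, ())) | {cpu})
        if core in seen_cores:
            continue
        seen_cores.add(core)
        kept.append(cpu)
    return kept


if __name__ == "__main__":
    # Hypothetical 2-core, 4-thread topology: (0, 4) and (1, 5) are HT pairs.
    ht_map = {0: [0, 4], 4: [0, 4], 1: [1, 5], 5: [1, 5]}
    print(filter_one_cpu_per_core([0, 1, 4, 5], ht_map))
```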
File renamed without changes.

.cd/server/cpu_binding_gnr.csv renamed to .cd/server/cpu_binding/cpu_binding_gnr.csv

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ meta-llama/Llama-3.1-70B-Instruct,4096,128,4,bf16,12,0
 meta-llama/Llama-3.1-8B-Instruct,128,4096,1,bf16,6,0
 meta-llama/Llama-3.1-8B-Instruct,2048,2048,1,bf16,6,0
 meta-llama/Llama-3.1-8B-Instruct,4096,128,1,bf16,6,0
+Qwen/Qwen2.5-14B-Instruct,2048,2048,1,bf16,6,0
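The new CSV row matters because the README example uses `MODEL="Qwen/Qwen2.5-14B-Instruct"`, and the `pick_row_by_parameters` context shown earlier raises a `ValueError` when no row matches. The sketch below illustrates that kind of lookup with the standard `csv` module. The CSV header is not part of this diff, so the column names in `FIELDS` are assumptions (partly modeled on the `--settings` help text; `dtype` and `extra` are pure guesses), and `pick_row` is a simplified stand-in rather than the repository's method.

```python
import csv
import io

# Assumed column names; the real header row is not visible in the diff.
FIELDS = ["model_id", "input_length", "output_length", "world_size",
          "dtype", "num_allocated_cpu", "extra"]

# Two data rows copied from the diff above.
SAMPLE = """\
meta-llama/Llama-3.1-8B-Instruct,4096,128,1,bf16,6,0
Qwen/Qwen2.5-14B-Instruct,2048,2048,1,bf16,6,0
"""


def pick_row(rows, model, input_tok, output_tok):
    """Return the first row matching the model and token lengths, else raise."""
    matches = [r for r in rows
               if r["model_id"] == model
               and r["input_length"] == input_tok
               and r["output_length"] == output_tok]
    if not matches:
        available = sorted({(r["model_id"], r["input_length"], r["output_length"])
                            for r in rows})
        raise ValueError(f"MODEL '{model}', input_length '{input_tok}', "
                         f"output_length '{output_tok}' not found in CSV. "
                         f"Available: {available}")
    return matches[0]


rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDS))
row = pick_row(rows, "Qwen/Qwen2.5-14B-Instruct", "2048", "2048")
```

Values are compared as strings because `csv.DictReader` does not convert types.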
File renamed without changes.

.cd/server/generate_cpu_binding_from_csv.py renamed to .cd/server/cpu_binding/generate_cpu_binding_from_csv.py

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ def generate_yaml_file(cpuset_csv, num_alloc, idle_cpuset_csv, num_idle_cpus, ar
 def main():
     ap = argparse.ArgumentParser(description="Generate override docker-compose YAML (x-sets) for single 'vllm-server'.")
     ap.add_argument("--settings",
-                    default="server/cpu_binding_gnr.csv",
+                    default="server/cpu_binding/cpu_binding_gnr.csv",
                     help="CSV with columns: model_id,input length,output length,world_size,num_allocated_cpu")
     ap.add_argument("--output", default="docker-compose.override.yml", help="Output compose YAML path")
     ap.add_argument("--cpuservice", help="name of the docker service binding on idle CPUs")
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 ruamel.yaml
 py-libnuma
-psutils
+psutil
