[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code #21032
```diff
@@ -44,7 +44,7 @@
 VLLM_PP_LAYER_PARTITION: Optional[str] = None
 VLLM_CPU_KVCACHE_SPACE: int = 0
 VLLM_CPU_OMP_THREADS_BIND: str = ""
-VLLM_CPU_NUM_OF_RESERVED_CPU: int = 0
+VLLM_CPU_NUM_OF_RESERVED_CPU: Optional[int] = None
 VLLM_CPU_MOE_PREPACK: bool = True
 VLLM_CPU_SGL_KERNEL: bool = False
 VLLM_XLA_CACHE_PATH: str = os.path.join(VLLM_CACHE_ROOT, "xla_cache")
```
```diff
@@ -441,7 +441,8 @@ def get_vllm_port() -> Optional[int]:
 # (CPU backend only) CPU cores not used by OMP threads .
 # Those CPU cores will not be used by OMP threads of a rank.
 "VLLM_CPU_NUM_OF_RESERVED_CPU":
-lambda: int(os.getenv("VLLM_CPU_NUM_OF_RESERVED_CPU", "0")),
+lambda: int(os.getenv("VLLM_CPU_NUM_OF_RESERVED_CPU", "0"))
+if "VLLM_CPU_NUM_OF_RESERVED_CPU" in os.environ else None,
```
Comment on lines +444 to +445:

> The current implementation for parsing
```diff
 
 # (CPU backend only) whether to use prepack for MoE layer. This will be
 # passed to ipex.llm.modules.GatedMLPMOE. On unsupported CPUs, you might
```
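With this change, an unset `VLLM_CPU_NUM_OF_RESERVED_CPU` reads as `None` rather than `0`, so downstream code can tell "not configured" apart from an explicit `0`. The sketch below reproduces that parsing pattern outside vLLM; the helper name `read_optional_int_env` is hypothetical and is not part of the vLLM codebase.

```python
import os
from typing import Optional


def read_optional_int_env(name: str) -> Optional[int]:
    """Return the integer value of env var `name`, or None if it is unset.

    Hypothetical helper mirroring the pattern in the diff: a missing
    variable maps to None, while any explicitly set value (including "0")
    is parsed with int().
    """
    if name not in os.environ:
        return None
    return int(os.environ[name])


if __name__ == "__main__":
    os.environ.pop("VLLM_CPU_NUM_OF_RESERVED_CPU", None)
    print(read_optional_int_env("VLLM_CPU_NUM_OF_RESERVED_CPU"))  # None

    os.environ["VLLM_CPU_NUM_OF_RESERVED_CPU"] = "0"
    print(read_optional_int_env("VLLM_CPU_NUM_OF_RESERVED_CPU"))  # 0
```

As with the lambda in the diff, a variable that is set but non-numeric (including an empty string) makes `int()` raise `ValueError`.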
Why do we remove `VLLM_CPU_NUM_OF_RESERVED_CPU` when we still keep it as an optional variable?
You are right; I will revert it.
I wanted to set this value in the worker based on some rules instead of exposing it to users. However, `CPUWorker` doesn't have enough usage context, so users still need to set it manually in some cases.
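One way to read the `Optional[int]` change together with this reply: an explicit user setting takes precedence, and only when the variable is unset (`None`) does the worker fill in a value from its own rule. A minimal sketch of that precedence, assuming a hypothetical `resolve_reserved_cpus` helper and a caller-supplied fallback; the actual rule `CPUWorker` applies is not shown in this excerpt.

```python
from typing import Callable, Optional


def resolve_reserved_cpus(
    user_value: Optional[int],
    compute_default: Callable[[], int],
) -> int:
    """Pick the number of reserved CPU cores for a rank (hypothetical helper).

    `user_value` is the parsed VLLM_CPU_NUM_OF_RESERVED_CPU, where None
    means "unset". An explicit setting always wins, even if it is 0.
    Otherwise the worker-side fallback `compute_default` decides.
    """
    if user_value is not None:
        return user_value
    return compute_default()


if __name__ == "__main__":
    # Illustrative fallback only; not vLLM's actual rule.
    fallback = lambda: 1
    print(resolve_reserved_cpus(None, fallback))  # 1 (unset: fallback used)
    print(resolve_reserved_cpus(0, fallback))     # 0 (explicit 0 respected)
```

Passing the fallback as a callable keeps the worker-specific logic out of the precedence check, so the rule can change without affecting how a user override is honored.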