[Core] Send kv events from worker side to scheduler side #28309
base: main
Code Review
This pull request introduces a mechanism to send KV cache events from the worker side to the scheduler side, which is a crucial feature for connectors that generate these events. The changes are logical, but I've identified a critical bug in lmcache_connector.py that could lead to event loss, and an incorrect type hint in kv_connector_model_runner_mixin.py. Addressing these issues will improve the correctness and maintainability of the new functionality.
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
This is required when worker side operations like CPU offloading generate KV cache events. This commit enables these events to be passed to the scheduler side so that they can be published by the engine. Signed-off-by: Martin Hickey <[email protected]>
fc375f8 to 499a425
Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <[email protected]>
The changes to the connector are for a separate PR and this PR is independent of it for now. Signed-off-by: Martin Hickey <[email protected]>
This commit follows the recommendation from @markmc to use a specific event property instead of piggybacking on stats. PR vllm-project#28309 adds the events property to KVConnectorOutput, and this commit picks up the new property and uses it to pass the events from the worker side to the scheduler side. Signed-off-by: Martin Hickey <[email protected]>
@hickeyma I think you're missing aggregation code in @njhill Let's think how this change fits to the longer-term future.
At the time when I introduced worker->scheduler output aggregation (#19555), my initial suggestion was to allow abstract metadata to flow from worker connectors to the scheduler connector.
My view is that we should rename cc @sdavidbd |
I agree with this in general - e.g. in an earlier iteration of #26172 I found myself adding NIXL-specific semantics to the contents of
I'm not sure I agree with this, though:
But I do think it could be positive to add
Thanks for the great breakdown @orozery.
I understand your point here, but I believe that adding more fields to
I agree with the usefulness of My main point is actually on the other hand, for fields that are useful only for a single connector (and I think this may be the case here?), we should have an abstract field that each connector can use freely (similar to |
Agree 👍
Honestly, I don't yet fully understand the motivation for the lmcache connector to emit its own events (as per #28252 and LMCache/LMCache#1846) in addition to the KV events already emitted by vLLM ... so I don't have a strong sense either way whether this need is highly-specific to lmcache
Purpose
This is required when worker side operations from a connector generate KV cache events. This commit enables these events to be passed to the scheduler side so that they can be published by the engine.
Related to #28252 and from feedback by @markmc in #28252 (comment)
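For readers following the discussion, the idea of carrying events in a dedicated field and merging them across workers can be sketched roughly as below. This is only an illustration under stated assumptions: `KVEventSketch`, `ConnectorOutputSketch`, and `aggregate_events` are hypothetical names invented here and are not the actual vLLM classes or the code in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KVEventSketch:
    """Hypothetical stand-in for a KV cache event, e.g. a block offloaded to CPU."""
    kind: str
    block_hash: int


@dataclass
class ConnectorOutputSketch:
    """Hypothetical per-worker connector output with a dedicated `events` field."""
    events: Optional[List[KVEventSketch]] = None


def aggregate_events(outputs: List[ConnectorOutputSketch]) -> ConnectorOutputSketch:
    """Merge per-worker outputs into one so that no worker's events are dropped."""
    merged: List[KVEventSketch] = []
    for out in outputs:
        # A worker that produced no events leaves the field as None.
        if out.events:
            merged.extend(out.events)
    # Keep None when nothing was emitted, so the scheduler side can skip publishing.
    return ConnectorOutputSketch(events=merged or None)


# Two workers report events, one reports nothing.
worker_outputs = [
    ConnectorOutputSketch(events=[KVEventSketch("stored", 101)]),
    ConnectorOutputSketch(),
    ConnectorOutputSketch(events=[KVEventSketch("removed", 202)]),
]
combined = aggregate_events(worker_outputs)
```

The sketch concatenates events in worker order; whether the real aggregation should deduplicate or reorder events is a design question this example does not try to answer.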
Test Plan
Test Result