Skip to content

Conversation

@hickeyma
Copy link
Contributor

@hickeyma hickeyma commented Nov 7, 2025

Purpose

This is required for when worker side operations from a connector generates KV cache events. This commit enables theses events to be passed to the scheduler side so that they can be published by the engine.

Related to #28252 and from feedback by @markmc in #28252 (comment)

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to send KV cache events from the worker side to the scheduler side, which is a crucial feature for connectors that generate these events. The changes are logical, but I've identified a critical bug in the lmcache_connector.py that could lead to event loss, and an incorrect type hint in kv_connector_model_runner_mixin.py. Addressing these issues will improve the correctness and maintainability of the new functionality.

This is required for when worker side operations like CPU offloading
generate KV cache events. This commit enables theses events to be passed
to the scheduler side so that they can be published by the engine.

Signed-off-by: Martin Hickey <[email protected]>
@hickeyma hickeyma force-pushed the send-events-worker-to-scheduler branch from fc375f8 to 499a425 Compare November 7, 2025 17:27
The changes to the connector is for a separate PR and this
PR is independent of it for now.

Signed-off-by: Martin Hickey <[email protected]>
hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 7, 2025
This commit follows recommendation from @markmc to use
a specific event property instead of piggybacking on
stats. PR vllm-project#28309 adds the events property to KVConnectorOutput and
this commit picks up the new property and uses it to pass the events
from worker side to scheduler side.

Signed-off-by: Martin Hickey <[email protected]>
@njhill
Copy link
Member

njhill commented Nov 7, 2025

@orozery

@orozery
Copy link
Contributor

orozery commented Nov 9, 2025

@hickeyma I think you're missing aggregation code in KVOutputAggregator.
This means adding more state to KVOutputAggregator to aggregate per-block KVEvents.

@njhill Let's think how this change fits to the longer-term future.
Right now, scheduler can send arbitrary metadata to the workers (KVConnectorMetadata).
On the other hand, workers are limited to an explicit KVConnectorOutput, which includes:

  • request-level completions (finished_sending, finished_recving)
  • abstract stats
  • block-level errors
  • (just noticed this) expected_finished_count - looks like a niche bootstrap configuration param

At the time when I introduced worker->scheduler output aggregation (#19555), my initial suggestion was to allow abstract metadata to flow from worker connectors to the scheduler connector.
I think that the current approach where we limit the worker output has 2 drawbacks:

  1. It makes it harder to implement connector logic (e.g. the current PR is needed).
  2. For those connectors who does succeed in pushing their logic to KVConnectorOutput, the result is a growing-over-time KVConnectorOutput which contains a scattered list of fields which are practically connector-specific.

My view is that we should rename KVConnectorStats to KVConnectorWorkerMeta (which could support get_stats() -> KVConnectorStats).
This will allow free us from maintaining the list of connector-specific worker output fields, and delegate it entirely to the connector implementations themselves.

cc @sdavidbd

@markmc
Copy link
Member

markmc commented Nov 10, 2025

@njhill Let's think how this change fits to the longer-term future. Right now, scheduler can send arbitrary metadata to the workers (KVConnectorMetadata). On the other hand, workers are limited to an explicit KVConnectorOutput,

I agree with this in general - e.g. in an earlier iteration of #26172 I found myself adding NIXL-specific semantics to the contents of finished_recving, which was only possible because I happened to be able to use the update_connector_output() to extract the NIXL-specific data and translate the contents to what the scheduler expected. An obscure example, but I did think it was a weird contrast with the complete flexibility of connector-specific KVConnectorMetadata subclasses

My view is that we should rename KVConnectorStats to KVConnectorWorkerMeta (which could support get_stats() -> KVConnectorStats).

I'm not sure I agree with this, though:

  1. Although KVConnectorStats is a recent addition, we should by default assume that we need to maintain compatibility and break any potential external connectors that might have already adopted
  2. KVConnectorStats seems like a useful abstraction that should probably be adopted by all connectors over time
  3. KVConnectorStats is very specifically about metrics and integration with Prometheus/console stat logging - for example the aggregate() and reduce() methods - so not easily repurposed as a generic worker->scheduler channel
  4. KVEventsBatch also seems like a useful, connector-independent abstraction, that ties in nicely with the existing public KV events publishing API, so if connectors have to use a connector-specific metadata channel for them, we'd probably see a lot of duplication

But I do think it could be positive to add KVConnectorWorkerMetdata anyway ...

@NickLucche
Copy link
Collaborator

Thanks for the great breakdown @orozery.
Still, I agree with @markmc as in I think the KVConnectorStats is a connector-agnostic abstraction with a specific purpose.
I also like that the design direction that was followed for worker->scheduler comm was to have a clearer interface instead of a loose container.

the result is a growing-over-time KVConnectorOutput

I understand your point here, but I believe that adding more fields to KVConnectorOutput was kinda part of the original design where it was expected that it would naturally grow to contain more stuff (I think this PR is a good example of that).
Naturally nobody would want the struct to grow indefinitely if we sense that this is the direction in which we're heading.

@orozery
Copy link
Contributor

orozery commented Nov 10, 2025

I agree with the usefulness of KVConnectorStats for multiple connectors.
And I think that any field which may be useful for multiple connectors has a good reason to stay explicit (including KVConnectorStats).

My main point is actually on the other hand, for fields that are useful only for a single connector (and I think this may be the case here?), we should have an abstract field that each connector can use freely (similar to KVConnectorMetadata).

@markmc
Copy link
Member

markmc commented Nov 10, 2025

for fields that are useful only for a single connector, we should have an abstract field that each connector can use freely (similar to KVConnectorMetadata).

Agree 👍

(and I think this may be the case here?),

Honestly, I don't yet fully understand the motivation for the lmcache connector to emit its own events (as per #28252 and LMCache/LMCache#1846) in addition to the KV events already emitted by vLLM ... so I don't have a strong sense either way whether this need is highly-specific to lmcache

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants