Skip to content

Conversation

@usberkeley
Copy link
Contributor

@usberkeley usberkeley commented Oct 21, 2025

Purpose

Fixed a logic flaw in the EventPublisherFactory.create() method to ensure it properly returns NullEventPublisher when enable_kv_cache_events=False or publisher="null".

Changes

  • Fixed conditional logic in EventPublisherFactory.create() method
  • When config.publisher == "null", directly return an instance of NullEventPublisher() to avoid calling the constructor.
    Otherwise, the code will attempt to retrieve the constructor of NullEventPublisher from the registry and call it with parameters. However, NullEventPublisher does not accept these parameters, resulting in a TypeError.

Test Plan

  • New test cases verify correct behavior across various configuration combinations
  • Ensures backward compatibility
  • All existing tests continue to pass

Test Result

Pass


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a logic flaw in the EventPublisherFactory.create method. The changes correctly ensure that a NullEventPublisher is returned when KV cache events are disabled (enable_kv_cache_events=False) or when the publisher is explicitly set to "null". This prevents a TypeError that would occur when trying to instantiate NullEventPublisher with unexpected arguments. The fix is implemented cleanly by adjusting the initial conditional check. Furthermore, a new test suite has been added, which thoroughly validates the factory's behavior across various configurations, including disabled events, different publisher types, and default settings. The changes are correct, well-tested, and improve the robustness of the event publisher creation logic.

@usberkeley usberkeley force-pushed the fix_kv_event_publisher_factory branch from caef5e7 to 1264fa6 Compare October 21, 2025 09:16
@usberkeley usberkeley force-pushed the fix_kv_event_publisher_factory branch from 1264fa6 to 2cb60cd Compare October 21, 2025 14:39
Copy link
Member

@markmc markmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

A suggestion for better (IMO) improvement inline

config_dict = asdict(config)

kind = config_dict.pop("publisher", "null")
kind = config_dict.pop("publisher")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation, not a problem with this PR - if we ever add another publisher other than ZmqEventPublisher it will be required to accept all the members of KEventsConfig as constructor parameters, which doesn't make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, new EventPublisher may not need to implement all parameters of KEventsConfig

not config
or not config.enable_kv_cache_events
or config.publisher == "null"
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this whole NullEventPublisher thing just isn't very useful ... we could just return None and make scheduler like this:

        self.kv_event_publisher = EventPublisherFactory.create(
            self.kv_events_config,
            self.parallel_config.data_parallel_rank,
        )

        self.kv_cache_manager = KVCacheManager(
            ...
            enable_kv_cache_events=self.kv_event_publisher is not None,
            ...
        )

       # publish collected KV cache events                                                                                                                                    
        if events:
            assert self.kv_event_publisher is not None

wdyt?

Copy link
Contributor Author

@usberkeley usberkeley Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markmc Wow, perfect, I completely agree with this revision direction. It has the following benefits:

  1. Scheduler: Avoids redundant checks for whether KV Cache Event is enabled.
self.enable_kv_cache_events = (
    self.kv_events_config is not None
    and self.kv_events_config.enable_kv_cache_events
)
  1. KVCacheManager: The parameter enable_kv_cache_events only stores events in the event queue when an EventPublisher exists, so assigning this parameter based on self.kv_event_publisher is not None is reasonable.

I understand that the original author intended to use the Null Object Design Pattern, returning a NullEventPublisher to avoid redundant null-check logic. However, using None directly seems simpler.

@mergify
Copy link

mergify bot commented Oct 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @usberkeley.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 23, 2025
dongbo910220 and others added 23 commits October 23, 2025 21:17
…roject#27253)

Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Signed-off-by: ExtReMLapin <[email protected]>
Co-authored-by: CNE Pierre FICHEPOIL <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Siyuan Fu <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Lain <[email protected]>
Co-authored-by: Daniel Campora <[email protected]>
…4108)

Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Signed-off-by: ExtReMLapin <[email protected]>
Co-authored-by: CNE Pierre FICHEPOIL <[email protected]>
Co-authored-by: Chauncey <[email protected]>
@mergify
Copy link

mergify bot commented Oct 23, 2025

Documentation preview: https://vllm--27257.org.readthedocs.build/en/27257/

@mergify mergify bot added documentation Improvements or additions to documentation ci/build deepseek Related to DeepSeek models frontend llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models performance Performance-related issues qwen Related to Qwen models gpt-oss Related to GPT-OSS models rocm Related to AMD ROCm labels Oct 23, 2025
@mergify mergify bot added v1 tpu Related to Google TPUs labels Oct 23, 2025
@mergify mergify bot added the kv-connector label Oct 23, 2025
@hmellor
Copy link
Member

hmellor commented Oct 23, 2025

Looks like a bad merge has occurred and everyone has been notified.

I think this issue should be already fixed by #26915.

Please make a PR if I am wrong.

@usberkeley
Copy link
Contributor Author

usberkeley commented Oct 23, 2025

Looks like a bad merge has occurred and everyone has been notified.

I think this issue should be already fixed by #26915.

Please make a PR if I am wrong.

hi @hmellor
I'm very sorry for the mistake made during the code rebase, which caused everyone to be notified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models kv-connector llama Related to Llama models multi-modality Related to multi-modality (#4194) needs-rebase new-model Requests to new models performance Performance-related issues qwen Related to Qwen models rocm Related to AMD ROCm speculative-decoding structured-output tpu Related to Google TPUs v1

Projects

Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.