Conversation

@alexm-redhat (Collaborator)

The "use_dp_chunking" in FusedMoE was always false due to control flow mistake => this PR fixes the logic to be enabled for DP runs as intended.

@alexm-redhat requested a review from mgoin as a code owner October 20, 2025 19:54
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly fixes a bug where use_dp_chunking was always evaluating to false. The fix involves moving the initialization logic for DP chunking buffers from the __init__ method to a lazy initialization method, ensure_dp_chunking_init, which is called during the forward pass. This ensures that all necessary configurations are set before the use_dp_chunking property is evaluated. The overall approach is sound. I've pointed out a performance issue in the new initialization method, which can be made idempotent to avoid unnecessary tensor re-allocations on every forward pass.

Comment on lines 1916 to 1917

    if not self.use_dp_chunking:
        return
gemini-code-assist bot (severity: high):

This initialization method is called on every forward pass. To avoid re-allocating tensors unnecessarily, which can impact performance, it's best to make this method idempotent. You can add a check to see if self.batched_hidden_states has already been initialized.

Suggested change

    - if not self.use_dp_chunking:
    -     return
    + if not self.use_dp_chunking or self.batched_hidden_states is not None:
    +     return

Contributor:
I think this comment should be applied.

Contributor:
+1

@alexm-redhat (Collaborator, Author):
nice :)

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +1932 to +1975

    self.batched_hidden_states = torch.zeros(
        states_shape, dtype=moe.in_dtype, device=torch.cuda.current_device()
    )

    self.batched_router_logits = torch.zeros(

P1: Avoid re-allocating DP chunk buffers on every forward

The new ensure_dp_chunking_init always recreates self.batched_hidden_states and self.batched_router_logits with fresh torch.zeros tensors whenever it is called, without first checking whether the buffers already exist. Because this helper is now invoked on every forward (and again inside the chunked path), DP runs will allocate and zero large staging tensors twice per step instead of reusing them as the constructor previously did, adding significant GPU allocation/zeroing overhead and memory churn. Consider only creating these tensors when they are None and reusing them thereafter.
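
For reference, here is a minimal sketch of the guarded, lazily-initialized helper that both reviews are asking for. The buffer names and the overall allocation pattern follow the snippets quoted above; the exact shapes, dtypes, and config attributes are assumptions, not the actual vLLM implementation.

    import torch

    # Minimal sketch; assumes __init__ sets self.batched_hidden_states and
    # self.batched_router_logits to None, and that shapes/dtypes come from the
    # MoE config (placeholder attribute names below).
    def ensure_dp_chunking_init(self) -> None:
        # Skip when DP chunking is unused, or when the staging buffers were
        # already allocated by an earlier forward pass (idempotency guard).
        if not self.use_dp_chunking or self.batched_hidden_states is not None:
            return

        # Allocate the staging buffers once and reuse them afterwards.
        device = torch.cuda.current_device()
        states_shape = (self.moe.max_num_tokens, self.moe.hidden_dim)   # assumed
        logits_shape = (self.moe.max_num_tokens, self.moe.num_experts)  # assumed
        self.batched_hidden_states = torch.zeros(
            states_shape, dtype=self.moe.in_dtype, device=device
        )
        self.batched_router_logits = torch.zeros(
            logits_shape, dtype=self.moe.in_dtype, device=device
        )

With this guard, calling the helper on every forward pass is cheap: only the first DP-chunked forward pays the allocation cost, and later calls return immediately.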

@alexm-redhat self-assigned this Oct 20, 2025
@alexm-redhat (Collaborator, Author):

@bnellnm could you take a quick look

@bnellnm (Contributor) commented Oct 20, 2025

Another option is to just change how use_dp_chunking is defined so it doesn't depend on moe_quant_config, e.g.

    @property
    def use_dp_chunking(self) -> bool:
        # Route to the chunked forward path using the FlashInfer Cutlass kernel
        # only when data parallelism (DP) is enabled.
        return (
            self.moe_parallel_config.use_pplx_kernels
            or self.moe_parallel_config.use_deepep_ll_kernels
            or self.moe_parallel_config.use_deepep_hybrid_kernels
            or (self.dp_size > 1 and self.moe.use_flashinfer_cutlass_kernels)
        )

Or maybe even better would be to move the buffer initialization as the PR does but change use_dp_chunking to check the actual format, e.g.

    @property
    def use_dp_chunking(self) -> bool:
        return (
            self.quant_method.fused_experts is not None
            and self.quant_method.fused_experts.prepare_finalize.activation_format == FusedMoEActivationFormat.BatchedExperts
        )

cc @varun-sundar-rabindranath


    self.ensure_moe_quant_config()
    self.ensure_moe_quant_config_init()
    self.ensure_dp_chunking_init()
@varun-sundar-rabindranath (Contributor) commented Oct 20, 2025:

I think we don't need self.ensure_dp_chunking_init() here, since we already do it in forward_impl_chunked where it is used?

@alexm-redhat (Collaborator, Author):

Actually it's the opposite: the call in forward_impl_chunked is the duplicate, so I can remove it there, since forward_impl is the only one that calls forward_impl_chunked.
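
To make that call structure concrete, here is a minimal sketch. The method names follow the discussion in this thread; the surrounding class and the non-chunked path are assumptions, not the actual vLLM source.

    # Sketch only: forward_impl owns the single lazy-init call before
    # dispatching to the chunked path.
    def forward_impl(self, hidden_states, router_logits):
        self.ensure_moe_quant_config_init()
        # Single lazy-init call site: forward_impl is the only caller of
        # forward_impl_chunked, so the duplicate call inside the chunked
        # path can be removed.
        self.ensure_dp_chunking_init()

        if self.use_dp_chunking:
            return self.forward_impl_chunked(hidden_states, router_logits)
        # ... otherwise continue with the regular (non-chunked) path ...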

@varun-sundar-rabindranath (Contributor):

maybe even better would be to move the buffer initialization as the PR does but change use_dp_chunking to check the actual format, e.g.

    @property
    def use_dp_chunking(self) -> bool:
        return (
            self.quant_method.fused_experts is not None
            and self.quant_method.fused_experts.prepare_finalize.activation_format == FusedMoEActivationFormat.BatchedExperts
        )

I agree with @bnellnm's suggestion above.

@alexm-redhat (Collaborator, Author) commented Oct 22, 2025

Regarding use_dp_chunking(): I would prefer to leave the function as is, since otherwise the checks would be quite different from what it checks now. It doesn't seem like a big deal that it depends on moe_quant_config, since we want to inspect the quant config's params anyway to limit the behaviors (if necessary).

@alexm-redhat (Collaborator, Author):

@bnellnm @mgoin ready for re-review

@mgoin added labels on Oct 22, 2025: bug (Something isn't working), moe, ready (ONLY add when PR is ready to merge/full CI is needed)
@DarkLight1337 merged commit 9ef3d5b into main Oct 23, 2025
55 checks passed
@DarkLight1337 deleted the dp_fix branch October 23, 2025 16:03
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025