[WIP] Tmp fix for IMA with MTP = 2 and full-cg #28315
base: main
Conversation
Temp fix for vllm-project#28207

Signed-off-by: Lucas Wilkinson <[email protected]>
Code Review
This pull request introduces a temporary fix for an issue with multi-token prediction and full CUDA graphs by adjusting CUDA graph capture sizes. The core logic change is in a new method, `adjust_cudagraph_sizes_to_be_multipe_of`, which unfortunately contains a critical bug that can lead to runtime errors and incorrect behavior. I've provided a detailed review comment with a suggested fix for this issue.
`vllm/config/vllm.py` (Outdated)
```python
def adjust_cudagraph_sizes_to_be_multipe_of(self, multiple_of: int):
    new_sizes = sorted(
        [
            round_up(size, multiple_of)
            for size in self.compilation_config.cudagraph_capture_sizes
        ]
    )
    if new_sizes[-1] > self.compilation_config.max_cudagraph_capture_size:
        new_sizes = new_sizes[:-1]
    self.compilation_config.max_cudagraph_capture_size = new_sizes[-1]
    self.compilation_config.cudagraph_capture_sizes = new_sizes
```
The current implementation of `adjust_cudagraph_sizes_to_be_multipe_of` has several critical issues that can lead to incorrect behavior or runtime errors:

- **Potential `IndexError`:** If all `cudagraph_capture_sizes`, when rounded up, exceed `max_cudagraph_capture_size`, the `new_sizes` list can become empty after the `if` condition, leading to an `IndexError` on `new_sizes[-1]`. For example, if `cudagraph_capture_sizes` is `[16]`, `max_cudagraph_capture_size` is `16`, and `multiple_of` is `20`, then `new_sizes` becomes `[20]`. The `if` condition is met, `new_sizes` is modified to `[]`, and the next line crashes (see the standalone reproduction after this list).
- **Incorrect filtering:** The logic `if new_sizes[-1] > ...: new_sizes = new_sizes[:-1]` only checks and removes the largest element. If multiple rounded-up sizes exceed `max_cudagraph_capture_size`, the smaller ones will incorrectly remain in the list.
- **Incorrect `max_cudagraph_capture_size` update:** `max_cudagraph_capture_size` can be updated to a value larger than its original value, which contradicts its purpose as a hard limit derived from the scheduler and token configurations.
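For concreteness, the `IndexError` case from the first point can be reproduced with a standalone sketch of the current logic (assuming `round_up` is the usual `((x + m - 1) // m) * m` helper):

```python
def round_up(x: int, m: int) -> int:
    return ((x + m - 1) // m) * m

# Standalone copy of the current logic in adjust_cudagraph_sizes_to_be_multipe_of
capture_sizes = [16]
max_capture_size = 16
multiple_of = 20

new_sizes = sorted(round_up(size, multiple_of) for size in capture_sizes)  # [20]
if new_sizes[-1] > max_capture_size:  # 20 > 16, so the only element is dropped
    new_sizes = new_sizes[:-1]        # now []
max_capture_size = new_sizes[-1]      # IndexError: list index out of range
```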
I suggest a more robust implementation that correctly filters the sizes and handles edge cases gracefully.
Additionally, there is a typo in the method name (`multipe_of` should be `multiple_of`). I've kept it in the suggestion to match the current code, but it should be corrected here and at the call site.
```python
def adjust_cudagraph_sizes_to_be_multipe_of(self, multiple_of: int):
    max_size = self.compilation_config.max_cudagraph_capture_size
    # Use a set to handle duplicates from rounding up
    rounded_sizes = {
        round_up(size, multiple_of)
        for size in self.compilation_config.cudagraph_capture_sizes
    }
    new_sizes = sorted([s for s in rounded_sizes if s <= max_size])
    if not new_sizes:
        # All rounded-up sizes exceeded the max size.
        # Disable cudagraphs by setting sizes to empty.
        self.compilation_config.max_cudagraph_capture_size = 0
        self.compilation_config.cudagraph_capture_sizes = []
        return
    self.compilation_config.max_cudagraph_capture_size = new_sizes[-1]
    self.compilation_config.cudagraph_capture_sizes = new_sizes
```

Signed-off-by: Lucas Wilkinson <[email protected]>
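As a quick sanity check of the suggested logic, here is a sketch that exercises the edge cases on plain values (again assuming the usual `round_up` helper; this is illustrative, not the PR's code):

```python
def round_up(x: int, m: int) -> int:
    return ((x + m - 1) // m) * m

def adjust(capture_sizes, max_size, multiple_of):
    # Mirrors the suggested fix, operating on plain values.
    rounded = {round_up(s, multiple_of) for s in capture_sizes}
    new_sizes = sorted(s for s in rounded if s <= max_size)
    if not new_sizes:
        # All rounded-up sizes exceeded the max: cudagraphs disabled.
        return [], 0
    return new_sizes, new_sizes[-1]

print(adjust([16], 16, 20))         # ([], 0) instead of an IndexError
print(adjust([8, 16, 32], 32, 16))  # ([16, 32], 32): duplicates collapsed
print(adjust([8, 16], 16, 20))      # ([], 0): no oversized sizes remain
```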
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
def compute_bs_to_padded_graph_size(self):
    # pre-compute the mapping from batch size to padded graph size
    self.bs_to_padded_graph_size = [
        0 for i in range(self.max_cudagraph_capture_size + 1)
    ]
    for end, start in zip(
        self.cudagraph_capture_sizes + [self.max_cudagraph_capture_size + 1],
        [0] + self.cudagraph_capture_sizes,
    ):
        for bs in range(start, end):
            if bs == start:
                self.bs_to_padded_graph_size[bs] = start
            else:
                self.bs_to_padded_graph_size[bs] = end
```
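For illustration, here is what the mapping produces on a small example (a standalone sketch on plain values; it assumes `cudagraph_capture_sizes` is sorted ascending, which the zip pairing requires):

```python
capture_sizes = [1, 2, 4, 8]  # assumed sorted ascending
max_capture_size = capture_sizes[-1]

bs_to_padded = [0] * (max_capture_size + 1)
for end, start in zip(capture_sizes + [max_capture_size + 1], [0] + capture_sizes):
    for bs in range(start, end):
        # A batch size that exactly matches a capture size maps to itself;
        # anything in between is padded up to the next capture size.
        bs_to_padded[bs] = start if bs == start else end

print(bs_to_padded)  # [0, 1, 2, 4, 4, 8, 8, 8, 8]
```

Each batch size is padded up to the smallest capture size that can hold it.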
Initialize padding map during config construction
The mapping from batch size to padded cudagraph size is now built only via the new `compute_bs_to_padded_graph_size()` helper, but `post_init_cudagraph_sizes()` no longer invokes it. `VllmConfig.pad_for_cudagraph()` still accesses `compilation_config.bs_to_padded_graph_size` and can be called right after `EngineArgs.create_engine_config()` (e.g., `test_mamba_cache_cg_padding`) before any `GPUModelRunner` triggers the new computation, resulting in `TypeError: 'NoneType' object is not subscriptable`. The mapping should still be populated during configuration initialization, or lazily on first use, so existing callers do not crash when they query padding before the model runner is constructed.
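One way to satisfy this, following the "lazily on first use" suggestion above, would be a guard in `pad_for_cudagraph` (a hypothetical sketch, not code from this PR; the signature is assumed):

```python
def pad_for_cudagraph(self, batch_size: int) -> int:
    # Hypothetical lazy-init guard: build the map on first use if no
    # GPUModelRunner has triggered compute_bs_to_padded_graph_size() yet.
    if self.compilation_config.bs_to_padded_graph_size is None:
        self.compilation_config.compute_bs_to_padded_graph_size()
    return self.compilation_config.bs_to_padded_graph_size[batch_size]
```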
Code Review
This pull request refactors the computation of `bs_to_padded_graph_size` and introduces logic to adjust CUDA graph capture sizes. While the intent is to fix an issue with speculative decoding, the changes introduce two critical bugs. First, the refactoring of the `bs_to_padded_graph_size` computation breaks the model initialization order, as it is now computed after `profile_run`, which depends on it. Second, the new method to adjust capture sizes contains a typo and is vulnerable to an `IndexError` if it produces an empty list of sizes. I have provided detailed comments and suggestions to fix these critical issues.
Temp fix for #28207