Bugfix: Cutlass FP8 FusedMoE #27255

amirkl94 · 2025-10-21T08:26:44Z

Purpose

When running the Cutlass FP8 FusedMoE, the scaling factors being passed are None. This PR passes the correct scaling factors and enables the relevant test.

Test Plan

Enabled the previously disabled test test_flashinfer_cutlass_moe_fp8_no_graph.


gemini-code-assist

Code Review

This pull request correctly addresses a bug in the Cutlass FP8 FusedMoE implementation by passing the necessary scaling factors. The changes are logical, and enabling the previously skipped test test_flashinfer_cutlass_moe_fp8_no_graph validates the fix. However, I've identified a critical risk of a division-by-zero error in the calculation of the a2_gscale factor, which should be addressed to ensure numerical stability.

gemini-code-assist · 2025-10-21T08:28:12Z

vllm/model_executor/layers/quantization/modelopt.py

            a1_scale=layer.w13_input_scale,
+            a1_gscale=layer.w13_input_scale,
            a2_scale=layer.w2_input_scale,
+            a2_gscale=1.0 / layer.w2_input_scale,
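A minimal sketch of the argument mapping this diff introduces, assuming plain Python floats stand in for the layer's scale tensors. Only the attribute names w13_input_scale and w2_input_scale and the four keyword names come from the diff; the helper build_fp8_moe_scales and the SimpleNamespace layer are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the MoE layer; in vLLM these attributes are tensors.
layer = SimpleNamespace(w13_input_scale=0.5, w2_input_scale=0.25)


def build_fp8_moe_scales(layer):
    """Hypothetical helper mirroring the keyword arguments in the diff.

    Per the PR description, the scaling factors were previously passed as None;
    the diff adds a1_gscale and a2_gscale alongside the existing scales.
    """
    return {
        "a1_scale": layer.w13_input_scale,
        "a1_gscale": layer.w13_input_scale,       # same value as a1_scale
        "a2_scale": layer.w2_input_scale,
        "a2_gscale": 1.0 / layer.w2_input_scale,  # reciprocal of a2_scale
    }


scales = build_fp8_moe_scales(layer)
# None of the scales should reach the kernel as None after the fix.
assert all(v is not None for v in scales.values())
print(scales)
```

Note the asymmetry the diff encodes: a1_gscale reuses the input scale as-is, while a2_gscale is its reciprocal, which is where the division discussed below comes from.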


The calculation 1.0 / layer.w2_input_scale introduces a risk of a division-by-zero error if layer.w2_input_scale is zero. Although scales are typically positive, adding a small epsilon to the denominator is a crucial safeguard for numerical stability.

Suggested change

- a2_gscale=1.0 / layer.w2_input_scale,
+ a2_gscale=1.0 / (layer.w2_input_scale + 1e-6),
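Adding the epsilon shifts every reciprocal slightly (a scale of 1.0 no longer inverts to exactly 1.0), whereas clamping the denominator only changes scales smaller than the epsilon. The sketch below compares the two on plain Python floats; the function names are hypothetical and this is not vLLM code. With float tensors, as in vLLM, a zero denominator would yield inf rather than raise.

```python
import math

EPS = 1e-6  # epsilon value taken from the review suggestion


def reciprocal_add_eps(scale: float, eps: float = EPS) -> float:
    """Suggested change: shift every denominator by eps."""
    return 1.0 / (scale + eps)


def reciprocal_clamp(scale: float, eps: float = EPS) -> float:
    """Alternative sketch: raise only denominators below eps up to eps."""
    return 1.0 / max(scale, eps)


for s in (1.0, 0.25, 0.0):
    a, b = reciprocal_add_eps(s), reciprocal_clamp(s)
    # Both variants stay finite for a zero scale...
    assert math.isfinite(a) and math.isfinite(b)
    # ...but only the clamped variant returns exactly 1/s for ordinary scales.
    print(f"scale={s}: add_eps={a!r}, clamp={b!r}")
```

For tensors, torch.clamp(scale, min=eps) would be the analogous operation to max(scale, eps).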

gemini-code-assist · 2025-10-21T08:28:12Z

tests/kernels/moe/test_flashinfer.py

            a1_scale=td.a1_scale,
+            a1_gscale=td.a1_scale,
            a2_scale=td.a2_scale,
+            a2_gscale=1.0 / td.a2_scale,


To prevent potential division-by-zero errors and for consistency with the recommended fix in the main logic, it's safer to add a small epsilon to the denominator here. While td.a2_scale is currently 1.0 in this test, this change improves the robustness of the test suite against future modifications.

Suggested change

- a2_gscale=1.0 / td.a2_scale,
+ a2_gscale=1.0 / (td.a2_scale + 1e-6),
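Another option, sketched here under the assumption that a zero, negative, or non-finite scale signals a checkpoint or calibration problem rather than a value to tolerate, is to validate the scale before inverting it. checked_reciprocal is a hypothetical helper, not part of vLLM or this PR.

```python
import math


def checked_reciprocal(scale: float, name: str = "scale") -> float:
    """Hypothetical helper: invert a quantization scale, failing loudly on bad input.

    Instead of nudging the denominator, a non-positive or non-finite scale
    is reported as an error.
    """
    if not math.isfinite(scale) or scale <= 0.0:
        raise ValueError(f"{name} must be a positive finite number, got {scale!r}")
    return 1.0 / scale


# In the test, td.a2_scale is 1.0, so the exact reciprocal is preserved.
assert checked_reciprocal(1.0, "a2_scale") == 1.0

try:
    checked_reciprocal(0.0, "a2_scale")
except ValueError as err:
    print(err)
```

Failing loudly surfaces a bad scale immediately instead of letting an epsilon-adjusted value of about 1e6 propagate into the computation.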

Bugfix: Cutlass FP8 FusedMoE

5bcd35d

When running cutlass FusedMoE FP8 the scaling factors that are passed are None. This PR passes the correct scaling factors and enables the relevant test.

Signed-off-by: Amir Klein <[email protected]>

amirkl94 requested review from WoosukKwon, mgoin, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners October 21, 2025 08:26

gemini-code-assist bot reviewed Oct 21, 2025
