[Metal] Reject tensor-scale nvfp4 in qqmm #3551

Open
Brooooooklyn wants to merge 1 commit into ml-explore:main from mlx-node:fix/metal-qqmm-global-scale-guard

Brooooooklyn commented May 14, 2026

Summary

QQMatmul::eval_gpu on Metal silently dropped global_scale_x / global_scale_w in its gemv special case (pre-quantized w, x.shape(-2) == 1), producing numerically incorrect results when tensor-scale nvfp4 weights were in use. The general (non-gemv) path already throws "[QQMatmul] NYI for the general case", and quantize() / dequantize() already reject the same combination at the op level (mlx/ops.cpp:4940-4945, 5205-5210). qqmm() was missed when tensor-scale nvfp4 landed in #3022.
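
To make "numerically incorrect" concrete, here is a minimal sketch of what dropping both global scales does to one output element of an M == 1 product. It assumes a simplified two-level nvfp4 model in which each dequantized element is fp4_code * group_scale * global_scale with 16-element groups; dequantize and gemv_row are hypothetical helpers, not MLX APIs, and the real Metal kernel's scale convention (multiply vs. divide by the global scale) may differ.

```python
# Simplified model of tensor-scale ("two-level") nvfp4, for illustration only.
# Assumption: dequantized value = fp4_code * group_scale * global_scale.
# Function names are hypothetical, not MLX APIs.

GROUP_SIZE = 16  # one group scale per 16 consecutive elements


def dequantize(codes, group_scales, global_scale):
    """Expand FP4 codes to real values using per-group and per-tensor scales."""
    return [c * group_scales[i // GROUP_SIZE] * global_scale
            for i, c in enumerate(codes)]


def gemv_row(x_codes, x_scales, gx, w_codes, w_scales, gw, apply_global=True):
    """One output element of an M == 1 product: dot(dequant(x), dequant(w))."""
    if not apply_global:
        gx = gw = 1.0  # models the bug: global scales silently treated as 1.0
    x = dequantize(x_codes, x_scales, gx)
    w = dequantize(w_codes, w_scales, gw)
    return sum(a * b for a, b in zip(x, w))


# 32 elements = 2 groups; every code is a representable FP4 (E2M1) value.
x_codes = [1.0, -0.5, 2.0, 3.0] * 8
w_codes = [0.5, 1.5, -1.0, 6.0] * 8
x_scales, w_scales = [0.5, 2.0], [1.0, 0.25]
gx, gw = 0.5, 0.25

correct = gemv_row(x_codes, x_scales, gx, w_codes, w_scales, gw)
dropped = gemv_row(x_codes, x_scales, gx, w_codes, w_scales, gw, apply_global=False)
print(correct, dropped)  # 7.875 63.0
```

Under this model the dropped-scale result is off by exactly a factor of 1 / (gx * gw), here 8x. The output is a plausible-looking number rather than an error, which is why the issue only shows up as wrong results.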

This change mirrors the existing guards: qqmm() now throws std::invalid_argument (Python ValueError) at graph-construction time when the stream is a Metal GPU and either global_scale_x or global_scale_w is set, rather than letting the request reach eval_gpu where it is silently mis-computed. CUDA is unaffected.
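
The guard's decision logic is small enough to model on its own. The following is a minimal Python sketch under stated assumptions: Stream, check_qqmm_global_scales, is_rejected, and the error message are hypothetical illustrations, not MLX APIs. The actual check lives in the C++ qqmm() and throws std::invalid_argument, which the Python bindings surface as ValueError.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stream:
    """Hypothetical stand-in for a stream: which device and backend it targets."""
    device_type: str  # "gpu" or "cpu"
    backend: str      # "metal" or "cuda"


def check_qqmm_global_scales(stream: Stream,
                             global_scale_x: Optional[float],
                             global_scale_w: Optional[float]) -> None:
    """Reject tensor-scale nvfp4 on a Metal GPU stream while the graph is built."""
    on_metal_gpu = stream.device_type == "gpu" and stream.backend == "metal"
    has_global_scale = global_scale_x is not None or global_scale_w is not None
    if on_metal_gpu and has_global_scale:
        # Hypothetical message; the real guard's wording may differ.
        raise ValueError("[qqmm] global_scale_x / global_scale_w are not supported on Metal.")


def is_rejected(stream: Stream, gx: Optional[float], gw: Optional[float]) -> bool:
    """Convenience wrapper: True if the guard raises ValueError."""
    try:
        check_qqmm_global_scales(stream, gx, gw)
    except ValueError:
        return True
    return False


metal, cuda = Stream("gpu", "metal"), Stream("gpu", "cuda")
print(is_rejected(metal, None, None))  # False: per-group nvfp4 is still allowed
print(is_rejected(metal, 0.5, None))   # True: either global scale triggers the guard
print(is_rejected(cuda, 0.5, 0.25))    # False: CUDA is unaffected
```

Raising at graph-construction time puts the error at the qqmm call site instead of deferring it to evaluation, which matches how quantize() / dequantize() already reject the combination. The sketch models only the Metal and CUDA cases described here and makes no claim about CPU streams.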

Per-group nvfp4 (no global_scale) on Metal continues to work; that path is exercised by the existing test_qqmv and is unchanged.

Fixes #3550.

Test plan

  • Added test_qqmm_metal_global_scale_rejected in python/tests/test_quantized.py, which asserts that mx.qqmm raises ValueError on Metal when both global scales are set. Verified that the test fails on main (the call silently runs to completion) and passes with this fix.
  • Full python/tests/test_quantized.py (29 tests) still passes locally on Apple Silicon, including test_qqmv which exercises the gemv branch being guarded.

QQMatmul::eval_gpu on Metal silently dropped global_scale_x /
global_scale_w in the gemv special case (pre-quantized w, M==1),
producing numerically incorrect results when tensor-scale nvfp4
weights were in use. The general case already throws NYI, and
quantize/dequantize already reject the same combination at the
op level. Mirror those guards in qqmm() so the request is rejected
at graph-construction time rather than silently mis-computed.

Fixes ml-explore#3550.
Brooooooklyn force-pushed the fix/metal-qqmm-global-scale-guard branch from 20f4211 to 0964646 on May 14, 2026 07:26

Development

Successfully merging this pull request may close these issues.

Metal: QQMatmul::eval_gpu gemv path silently drops global_scale_x/global_scale_w for nvfp4
