[Metal] Reject tensor-scale nvfp4 in qqmm by Brooooooklyn · Pull Request #3551 · ml-explore/mlx

Brooooooklyn · 2026-05-14T07:24:59Z

Summary

QQMatmul::eval_gpu on Metal silently dropped global_scale_x / global_scale_w in its gemv special case (pre-quantized w, x.shape(-2) == 1). This produced numerically incorrect results whenever tensor-scale nvfp4 weights were used. The general case already throws "[QQMatmul] NYI for the general case", and quantize() / dequantize() already reject the same combination at the op level (mlx/ops.cpp:4940-4945, 5205-5210). qqmm() was missed when tensor-scale nvfp4 landed in #3022.
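Dropping the global scales yields wrong numbers, not a crash. The plain-Python sketch below shows why. It assumes a simplified nvfp4 model in which each element dequantizes as fp4 code × group scale × global scale, with 16-element groups. The helper names are hypothetical, and this is not MLX's Metal gemv kernel.

```python
# Simplified nvfp4 model (assumption): value = fp4_code * group_scale * global_scale.
# Hypothetical helpers for illustration only; not MLX's Metal gemv kernel.
GROUP_SIZE = 16  # nvfp4 groups 16 elements per scale


def dequantize(codes, group_scales, global_scale=None):
    """Expand fp4 codes to floats using per-group and optional per-tensor scales."""
    values = []
    for i, code in enumerate(codes):
        scale = group_scales[i // GROUP_SIZE]
        if global_scale is not None:
            scale *= global_scale  # the factor the buggy path ignored
        values.append(code * scale)
    return values


def gemv_row(x, w):
    """Dot product of one activation row (M == 1) with one weight row."""
    return sum(a * b for a, b in zip(x, w))


# 32 elements -> 2 groups per operand; 1.0 and 0.5 are representable fp4 (E2M1) values.
x_codes, x_group_scales, global_scale_x = [1.0] * 32, [1.0, 1.0], 0.5
w_codes, w_group_scales, global_scale_w = [0.5] * 32, [2.0, 2.0], 0.25

correct = gemv_row(
    dequantize(x_codes, x_group_scales, global_scale_x),
    dequantize(w_codes, w_group_scales, global_scale_w),
)
# Under this model, what the gemv special case effectively computed: no global scales.
dropped = gemv_row(
    dequantize(x_codes, x_group_scales),
    dequantize(w_codes, w_group_scales),
)
print(correct, dropped)  # 4.0 32.0
```

With these inputs, the correct result is 4.0. The version that ignores both global scales returns 32.0, which is off by exactly 1 / (global_scale_x * global_scale_w) = 8. Nothing errors, so the mistake goes unnoticed unless the output is checked against a reference.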

This change mirrors the existing guards: qqmm() now throws std::invalid_argument (Python ValueError) at graph-construction time when the stream is a Metal GPU and either global_scale_x or global_scale_w is set, rather than letting the request reach eval_gpu where it is silently mis-computed. CUDA is unaffected.

Per-group nvfp4 (no global_scale) on Metal continues to work. That path is exercised by the existing test_qqmv and is unchanged.
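The guard can be modeled in a few lines of Python. This is a hedged sketch: the real check is C++ in mlx/ops.cpp. `Stream`, `QQMMNode`, and the error text are hypothetical stand-ins, not MLX APIs. The sketch shows the property that matters: the check runs when the graph node is constructed, so an unsupported request is never scheduled for evaluation.

```python
# Python model of the construction-time guard. The real change is C++ in
# mlx/ops.cpp; every name and message here is hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stream:
    device: str   # e.g. "gpu" or "cpu"
    backend: str  # e.g. "metal" or "cuda"


@dataclass
class QQMMNode:
    """Stand-in for a lazy graph node; no arithmetic happens at construction."""
    stream: Stream
    global_scale_x: Optional[float]
    global_scale_w: Optional[float]


def qqmm(stream, global_scale_x=None, global_scale_w=None):
    """Build a qqmm node, rejecting tensor-scale nvfp4 on a Metal GPU stream."""
    is_metal_gpu = stream.device == "gpu" and stream.backend == "metal"
    has_global_scale = global_scale_x is not None or global_scale_w is not None
    if is_metal_gpu and has_global_scale:
        # Raised while building the graph, before eval could mis-compute it.
        raise ValueError(
            "qqmm: global_scale_x / global_scale_w are not supported on Metal"
        )
    return QQMMNode(stream, global_scale_x, global_scale_w)
```

On a Metal GPU stream, `qqmm(Stream("gpu", "metal"), global_scale_w=0.25)` raises ValueError immediately. The same call on `Stream("gpu", "cuda")` still returns a node. So does a Metal call with no global scales, which is the per-group nvfp4 path. Checking in the op rather than in eval_gpu makes the error appear at the call site, which matches how quantize() and dequantize() already treat this combination.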

Fixes #3550.

Test plan

Added test_qqmm_metal_global_scale_rejected to python/tests/test_quantized.py. It asserts that mx.qqmm raises ValueError on Metal when both global scales are set. Verified that the test fails on main (the call silently runs to completion) and passes with this fix.
The full python/tests/test_quantized.py suite (29 tests) still passes locally on Apple Silicon. This includes test_qqmv, which exercises the guarded gemv branch.

QQMatmul::eval_gpu on Metal silently dropped global_scale_x / global_scale_w in the gemv special case (pre-quantized w, M==1), producing numerically incorrect results when tensor-scale nvfp4 weights were in use. The general case already throws NYI, and quantize/dequantize already reject the same combination at the op level. Mirror those guards in qqmm() so the request is rejected at graph-construction time rather than silently mis-computed. Fixes ml-explore#3550.

Brooooooklyn force-pushed the fix/metal-qqmm-global-scale-guard branch from 20f4211 to 0964646 on May 14, 2026 07:26

Brooooooklyn wants to merge 1 commit into ml-explore:main from mlx-node:fix/metal-qqmm-global-scale-guard