
Conversation

@quic-morteza
Contributor

This fix is needed and has been tested successfully; it allows the replicate_kv_heads.py script to operate on FP8-Dynamic models.

 def mutate(cls, original_module, parent_module):
     # -- de-quantizing the weights --
-    dequant_weights = original_module.weight.to(torch.float32) * original_module.weight_scale
+    dequant_weights = original_module.weight.to(torch.float32)  # * original_module.weight_scale
Contributor

Why is this removed?

 # Only inference supported
 with torch.no_grad():
-    dequantized_weights = self.weight.to(torch.float32) * self.weight_scale
+    dequantized_weights = self.weight.to(torch.float32)  # * self.weight_scale
Contributor

Why is this removed?
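
For context on these two questions: in per-tensor FP8 schemes such as FP8-Dynamic, the stored float8 weights are the original values divided by weight_scale, so casting to float32 alone does not recover them; the multiply is what restores the real magnitudes. A minimal sketch of this, with an assumed float8 dtype and a made-up scale (not values from the actual model):

import torch

# Assumed per-tensor FP8 layout: a float8 weight plus one scale factor.
weight_fp32 = torch.randn(8, 16)
weight_scale = torch.tensor(0.05)  # made-up scale for illustration

# Quantize: store weight / scale in float8.
weight_fp8 = (weight_fp32 / weight_scale).to(torch.float8_e4m3fn)

# Dequantize: the cast alone only widens the dtype; multiplying by
# weight_scale is what recovers the original weights (up to rounding).
dequant_weights = weight_fp8.to(torch.float32) * weight_scale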

Comment on lines +56 to +59
if layer.bias is not None:
    layer.bias.data = torch.repeat_interleave(layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0).view(
        new_kv_heads * head_dim
    )
Contributor

LGTM
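
For reference, a toy run of the bias replication quoted above, with made-up sizes (orig_kv_heads = 2, new_kv_heads = 4, head_dim = 3); each head's bias slice is repeated in place rather than tiled at the end:

import torch

orig_kv_heads, new_kv_heads, head_dim = 2, 4, 3  # made-up sizes
repeat = new_kv_heads // orig_kv_heads

# Bias laid out per head: [head0 | head1] = [0, 1, 2 | 3, 4, 5].
bias = torch.arange(orig_kv_heads * head_dim, dtype=torch.float32)

replicated = torch.repeat_interleave(bias.view(orig_kv_heads, head_dim), repeat, 0).view(new_kv_heads * head_dim)
print(replicated)  # tensor([0., 1., 2., 0., 1., 2., 3., 4., 5., 3., 4., 5.])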

@quic-morteza
Contributor Author

I investigated my code changes further and noticed that the output results are inconsistent with the ground truth. Therefore, my changes are invalid, and I will close this PR.
