
Fix fp8 kv replicate #349


Closed
wants to merge 3 commits into from

Conversation

quic-morteza
Contributor

This fix is needed for the replicate_kv_heads.py script to operate on FP8-Dynamic models, and it has been tested successfully.

@@ -107,7 +107,7 @@ class FP8DeQuantLinearToLinearTransform(ModuleMutatorTransform):
     @classmethod
     def mutate(cls, original_module, parent_module):
         # -- de-quantizing the weights --
-        dequant_weights = original_module.weight.to(torch.float32) * original_module.weight_scale
+        dequant_weights = original_module.weight.to(torch.float32) # * original_module.weight_scale
Contributor

why is this removed?
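For context, the removed multiplication is the usual FP8 dequantization step: the checkpoint stores an FP8 weight tensor together with a float scale, and the float32 weight is recovered as weight * scale. A minimal sketch of that convention (tensor shapes and the per-tensor scale are assumptions; requires a PyTorch build with float8 support):

    import torch

    # Illustrative FP8 dequantization, mirroring the line removed in this diff.
    # The weight is stored in FP8 and a separate scale restores its magnitude.
    weight_fp8 = torch.randn(4, 8).to(torch.float8_e4m3fn)  # stand-in for original_module.weight
    weight_scale = torch.tensor(0.05)                        # stand-in for original_module.weight_scale (per-tensor scale assumed)

    dequant_weights = weight_fp8.to(torch.float32) * weight_scale
    print(dequant_weights.shape)  # torch.Size([4, 8])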

@@ -121,7 +121,7 @@ def for_fp8_layer(cls, in_features, out_features, activation_quantization_strate
     def forward(self, x):
         # Only inference supported
         with torch.no_grad():
-            dequantized_weights = self.weight.to(torch.float32) * self.weight_scale
+            dequantized_weights = self.weight.to(torch.float32) # * self.weight_scale
Contributor

Why is this removed?
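The same convention applies in the forward pass: the FP8 weight is dequantized on the fly and then used in an ordinary linear op. A minimal, self-contained sketch of such a layer (the class name and buffer initialization are assumptions made for illustration, not the repository's API):

    import torch
    import torch.nn.functional as F

    class DequantizingLinear(torch.nn.Module):
        # Illustrative only: stores an FP8 weight plus its scale and
        # dequantizes to float32 at inference time.
        def __init__(self, in_features, out_features):
            super().__init__()
            self.register_buffer("weight", torch.randn(out_features, in_features).to(torch.float8_e4m3fn))
            self.register_buffer("weight_scale", torch.tensor(0.05))

        def forward(self, x):
            # Only inference supported
            with torch.no_grad():
                dequantized_weights = self.weight.to(torch.float32) * self.weight_scale
                return F.linear(x, dequantized_weights)

    layer = DequantizingLinear(8, 4)
    y = layer(torch.randn(2, 8))  # -> shape (2, 4)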

Comment on lines +56 to +59
+        if layer.bias is not None:
+            layer.bias.data = torch.repeat_interleave(layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0).view(
+                new_kv_heads * head_dim
+            )
Contributor

LGTM
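As a standalone illustration of what the snippet above does: the k/v projection bias is viewed as (orig_kv_heads, head_dim), each head's slice is repeated `repeat` times, and the result is flattened back to a vector of length new_kv_heads * head_dim. The sizes below are assumptions chosen for demonstration:

    import torch

    orig_kv_heads, head_dim, repeat = 2, 4, 3          # assumed sizes for demonstration
    new_kv_heads = orig_kv_heads * repeat

    bias = torch.arange(orig_kv_heads * head_dim, dtype=torch.float32)  # stand-in for layer.bias.data

    replicated = torch.repeat_interleave(bias.view(orig_kv_heads, head_dim), repeat, 0).view(
        new_kv_heads * head_dim
    )
    assert replicated.numel() == new_kv_heads * head_dim  # each original head's bias now appears `repeat` times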

@quic-morteza
Contributor Author

I investigated my code changes further and found that the output results are inconsistent with the ground truth. The changes are therefore invalid, and I will close this PR.
