Fix fp8 kv replicate #349
Conversation
Signed-off-by: quic-morteza <[email protected]>
…ic models Signed-off-by: quic-morteza <[email protected]>
```diff
@@ -107,7 +107,7 @@ class FP8DeQuantLinearToLinearTransform(ModuleMutatorTransform):
     @classmethod
     def mutate(cls, original_module, parent_module):
         # -- de-quantizing the weights --
-        dequant_weights = original_module.weight.to(torch.float32) * original_module.weight_scale
+        dequant_weights = original_module.weight.to(torch.float32)  # * original_module.weight_scale
```
Why is this removed?
```diff
@@ -121,7 +121,7 @@ def for_fp8_layer(cls, in_features, out_features, activation_quantization_strate
     def forward(self, x):
         # Only inference supported
         with torch.no_grad():
-            dequantized_weights = self.weight.to(torch.float32) * self.weight_scale
+            dequantized_weights = self.weight.to(torch.float32)  # * self.weight_scale
```
Why is this removed?
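For context, the line being questioned performs per-tensor dequantization: the FP8-stored weight is upcast to `float32` and multiplied by its scale. A minimal standalone sketch (the function name and the example values are hypothetical; plain float tensors stand in for FP8 storage):

```python
import torch


def dequantize_weight(weight_fp8: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
    """Per-tensor dequantization: upcast the quantized weight, then apply its scale.

    This mirrors the original line in the diff; commenting out the
    `* weight_scale` factor leaves the weight unscaled, which is what
    the reviewer is asking about.
    """
    return weight_fp8.to(torch.float32) * weight_scale


# Illustrative values only (not from the PR).
w = torch.tensor([[2.0, -4.0], [6.0, 8.0]])
scale = torch.tensor(0.5)
print(dequantize_weight(w, scale))
```

Without the scale factor, the recovered weights differ from the originally quantized values by exactly `weight_scale`, so removing it changes model outputs unless the scale is applied elsewhere.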
```python
        if layer.bias is not None:
            layer.bias.data = torch.repeat_interleave(layer.bias.data.view(orig_kv_heads, head_dim), repeat, 0).view(
                new_kv_heads * head_dim
            )
```
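The snippet above can be exercised in isolation: view the flat bias as `(orig_kv_heads, head_dim)`, repeat each head's slice `repeat` times along dim 0, then flatten back. A minimal sketch with illustrative sizes (the variable names follow the diff; the values are made up):

```python
import torch

# Illustrative sizes, not taken from any real model.
orig_kv_heads, head_dim, repeat = 2, 3, 2
new_kv_heads = orig_kv_heads * repeat

# A flat bias vector of length orig_kv_heads * head_dim.
bias = torch.arange(orig_kv_heads * head_dim, dtype=torch.float32)

# Replicate each KV head's bias slice `repeat` times, as in the diff.
replicated = torch.repeat_interleave(
    bias.view(orig_kv_heads, head_dim), repeat, 0
).view(new_kv_heads * head_dim)

print(replicated)
```

Each head's `head_dim`-sized slice appears `repeat` times in order, so head 0's bias is shared by the first `repeat` new heads, head 1's by the next `repeat`, and so on.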
LGTM
I investigated my code changes further and noticed that the output results are inconsistent with the ground truth. Therefore, my code changes are invalid, and I will close the PR I submitted.
This fix is needed and has been tested successfully; it allows the replicate_kv_heads.py script to operate on FP8-Dynamic models.