[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759
Conversation
@Varshith-Yadav, thank you for the contribution!

Please also add unit tests.

I have reconsidered and now believe that transposing each weight can extend the total compression duration. What about implementing and utilizing a "slice_weight" method with a transpose parameter?

@ljaljushkin I will update the implementation to use a `slice_weight` helper with a transpose parameter. I'll proceed with this approach and update the PR shortly.

@ljaljushkin I also added a new test file `test_utils_slice_weight.py` to verify the helper works correctly for both NumPy and PyTorch tensors with different `transpose_b` settings.
```python
    assign_weight_column,
    assign_weight_slice,
    extract_weight_column,
    slice_weight,
    zero_mask_columns,
```
I believe you need just 2 methods
```python
def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
```

```python
weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)
```
```python
# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
```
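To illustrate why reading this attribute matters, here is a hypothetical sketch in plain NumPy (not NNCF's actual statistics code): per-output-channel statistics must be reduced along different axes depending on the stored weight layout:

```python
import numpy as np

rng = np.random.default_rng(0)
out_ch, in_ch = 3, 5
w_t = rng.random((out_ch, in_ch))  # transpose_b=True: stored as (out, in)
w_f = w_t.T                        # transpose_b=False: stored as (in, out)

# Per-output-channel max magnitude: reduce over the input dimension,
# which is axis 1 in the transposed layout and axis 0 otherwise.
scale_t = np.abs(w_t).max(axis=1)
scale_f = np.abs(w_f).max(axis=0)
```

Both reductions yield identical per-channel scales; hard-coding one axis silently produces wrong statistics for the other layout.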
The same issue likely exists in the other data-aware algorithms: AWQ, lora_correction, scale_estimation.
Suggest copy-pasting a test for transpose_b=False for all these methods and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
```python
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (
```
The `utils` file name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: `tensor_slicing.py`
Also recommend configuring automatic code formatting: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#2-automating-code-formatting
Changes
fixes #3494
Added full support for data-aware weight compression when `MatMul` nodes use `transpose_b=False`.

Updated and validated `test_compression_with_transpose` to ensure it passes for `transpose_b=False`.

Reason for changes

Previously, NNCF's weight compression flow assumed that the weight input of `MatMul` operations was always transposed (`transpose_b=True`).

Related tickets
Tests

`pytest tests/openvino/native/quantization/test_weights_compression.py -v` (All tests pass; `test_scale_estimation[True]` remains the expected XFAIL for ticket 176465.)
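For reference, the layout difference behind this fix can be shown in plain NumPy (a sketch, not NNCF code): with `transpose_b=True` the weight is stored as (out, in) and the operation transposes it internally, while with `transpose_b=False` it is stored as (in, out) and multiplied directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 4))         # activations: (batch, in)
w_nt = rng.random((4, 3))      # transpose_b=False: weight stored as (in, out)
w_t = w_nt.T                   # transpose_b=True: weight stored as (out, in)

y_nt = x @ w_nt                # MatMul with transpose_b=False
y_t = x @ w_t.T                # MatMul with transpose_b=True (b transposed internally)
```

Both produce the same (2, 3) output; only the stored layout of the constant differs, which is exactly what the compression code must account for when slicing channels.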