
Conversation

@Varshith-Yadav

Changes

fixes #3494
Added full support for data-aware weight compression when MatMul nodes use transpose_b=False.
Updated and validated test_compression_with_transpose to ensure it passes for transpose_b=False.

Reason for changes

Previously, NNCF’s weight compression flow assumed that the weight input of MatMul operations was always transposed (transpose_b=True), so the data-aware algorithms indexed weights along the wrong axis for models whose MatMul weights use transpose_b=False.
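
For illustration (a plain NumPy sketch, not NNCF code): with transpose_b=True the weight is stored as [out_channels, in_channels] and an input channel is a column, while with transpose_b=False it is stored as [in_channels, out_channels] and the same channel is a row, so the slicing axis changes.

import numpy as np

in_ch, out_ch = 8, 16
x = np.random.rand(1, in_ch).astype(np.float32)

w_t = np.random.rand(out_ch, in_ch).astype(np.float32)  # transpose_b=True layout: [out_ch, in_ch]
w_nt = w_t.T.copy()                                      # transpose_b=False layout: [in_ch, out_ch]

# Same MatMul result either way:
np.testing.assert_allclose(x @ w_t.T, x @ w_nt, rtol=1e-5)

# ...but the slice for a given input channel lives on a different axis:
np.testing.assert_allclose(w_t[:, 0], w_nt[0, :])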

Related tickets

Tests

pytest tests/openvino/native/quantization/test_weights_compression.py -v
(All tests pass; test_scale_estimation[True] remains the expected XFAIL for ticket 176465.)

@Varshith-Yadav Varshith-Yadav requested a review from a team as a code owner November 26, 2025 10:36
@ljaljushkin
Contributor

@Varshith-Yadav, thank you for the contribution!
Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

@ljaljushkin
Contributor

ljaljushkin commented Nov 26, 2025

Please also add unit tests.
At the least, you can copy-paste from #3725: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1955-R1979 and make sure no exception is raised for transpose_b=False.

@ljaljushkin
Contributor

> @Varshith-Yadav, thank you for the contribution! Is it possible to avoid many if conditions and do transpose once to keep original logic for transposed weights?

I have reconsidered: transposing each weight could noticeably increase the total compression time. What about implementing and using a "slice_weight" method with a transpose parameter?

@Varshith-Yadav
Author

@ljaljushkin
That makes sense. I agree that explicitly transposing the full weight tensor could introduce unnecessary overhead.

I will update the implementation to use a slice_weight helper method. This way, we can fetch the necessary channels dynamically based on the transpose_b parameter without physically reshaping the underlying tensor.

I'll proceed with this approach and update the PR shortly.
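
A rough sketch of the idea (the helper name, signature, and the nncf.tensor import path are illustrative assumptions, not the final API): indexing by channel avoids materializing a transposed copy of the whole weight.

from typing import Union

from nncf.tensor import Tensor  # assumed import path for NNCF's backend-agnostic tensor


def slice_weight(weight: Tensor, idx: Union[int, slice], transpose_b: bool) -> Tensor:
    """Return the slice of `weight` for input channel(s) `idx` in either weight layout."""
    return weight[:, idx] if transpose_b else weight[idx, :]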

@github-actions github-actions bot added the NNCF OpenVINO Pull requests that updates NNCF OpenVINO label Dec 1, 2025
@Varshith-Yadav
Author

@ljaljushkin
I've updated the implementation as requested. I added a slice_weight helper in utils.py to handle the data access without performing a full transpose, and refactored the GPTQ logic to use it.

I also added a new test file, test_utils_slice_weight.py, to verify that the helper works correctly for both NumPy and PyTorch tensors with different transpose_b settings.
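
A minimal sketch of what such a unit test could look like (the helper's import path and signature follow the discussion above and are assumptions, not guaranteed to match the final PR):

import numpy as np
import pytest

from nncf.tensor import Tensor  # assumed import path
# Assumed location/signature of the helper discussed in this PR:
from nncf.quantization.algorithms.weight_compression.utils import slice_weight


@pytest.mark.parametrize("transpose_b", [True, False])
def test_slice_weight_matches_reference(transpose_b):
    ref = np.arange(12, dtype=np.float32).reshape(3, 4)  # [out_ch=3, in_ch=4]
    weight = Tensor(ref if transpose_b else ref.T.copy())
    for ch in range(4):
        sliced = slice_weight(weight, ch, transpose_b)
        np.testing.assert_allclose(sliced.data, ref[:, ch])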

Comment on lines +31 to +35
assign_weight_column,
assign_weight_slice,
extract_weight_column,
slice_weight,
zero_mask_columns,

Contributor

I believe you need just 2 methods

def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
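
For example, a toy check with plain NumPy arrays (which support the same indexing as NNCF tensors) shows both layouts staying in sync; this assumes the two functions above and is not the actual GPTQ code:

import numpy as np

out_ch, in_ch = 3, 4
w_t = np.arange(out_ch * in_ch, dtype=np.float32).reshape(out_ch, in_ch)  # transpose_b=True layout
w_nt = w_t.T.copy()                                                       # transpose_b=False layout

# The same input channel is reachable in both layouts:
np.testing.assert_allclose(get_weight_slice(w_t, 1, is_transposed=True),
                           get_weight_slice(w_nt, 1, is_transposed=False))

# Writes also land in the matching place in both layouts:
new_col = np.full(out_ch, -1.0, dtype=np.float32)
set_weight_slice(w_t, 1, new_col, is_transposed=True)
set_weight_slice(w_nt, 1, new_col, is_transposed=False)
np.testing.assert_allclose(w_t.T, w_nt)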

weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)

# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]

Contributor

The same issue is likely present in the other data-aware algorithms: awq, lora_correction, scale_estimation.
I suggest copy-pasting a test for transpose_b=False, enabling all these methods, and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
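
A rough sketch of such a test (not the one from #3725; the tiny model builder, import paths, and compress_weights arguments are assumptions and may need adjusting to the installed OpenVINO/NNCF versions):

import numpy as np
import openvino as ov
import openvino.runtime.opset13 as opset  # on newer OpenVINO this may live at openvino.opset13
import pytest

import nncf


def _matmul_model(transpose_b: bool) -> ov.Model:
    inp = opset.parameter([1, 4, 8], np.float32, name="input")
    w_shape = (16, 8) if transpose_b else (8, 16)
    weight = opset.constant(np.random.rand(*w_shape).astype(np.float32))
    matmul = opset.matmul(inp, weight, transpose_a=False, transpose_b=transpose_b)
    return ov.Model([matmul], [inp])


@pytest.mark.parametrize(
    "kwargs",
    [{"awq": True}, {"scale_estimation": True}, {"gptq": True}, {"lora_correction": True}],
)
def test_data_aware_compression_not_transposed_weight(kwargs):
    model = _matmul_model(transpose_b=False)
    dataset = nncf.Dataset([np.random.rand(1, 4, 8).astype(np.float32)])
    # Should not raise for transpose_b=False:
    nncf.compress_weights(
        model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=1.0,
        group_size=-1,
        all_layers=True,
        dataset=dataset,
        **kwargs,
    )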

from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (

Contributor

utils name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: tensor_slicing.py


Development

Successfully merging this pull request may close these issues.

[Good First Issue][NNCF]: Support not transposed weight for data-aware weight compression methods