[NNCF] Enable data-aware weight compression for MatMul with transpose_b=False #3759
Conversation
@Varshith-Yadav, thank you for the contribution!

Please also add unit tests.

I have reconsidered and now believe that transposing each weight can extend the total compression duration. What about implementing and utilizing a "slice_weight" method with a transpose parameter?

@ljaljushkin I will update the implementation to use a `slice_weight` helper with a transpose parameter. I'll proceed with this approach and update the PR shortly.

@ljaljushkin I also added a new test file `test_utils_slice_weight.py` to verify the helper works correctly for both NumPy and PyTorch tensors with different `transpose_b` settings.
```python
    assign_weight_column,
    assign_weight_slice,
    extract_weight_column,
    slice_weight,
    zero_mask_columns,
```
I believe you need just 2 methods
```python
def get_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], is_transposed: bool) -> Tensor:
    return weight[:, slice_obj] if is_transposed else weight[slice_obj, :]


def set_weight_slice(weight: Tensor, slice_obj: Union[int, slice, Tensor], value: Tensor, is_transposed: bool) -> None:
    if is_transposed:
        weight[:, slice_obj] = value
    else:
        weight[slice_obj, :] = value
```

```python
weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)
```
```python
# Get transpose_b value to handle weight shape correctly
transpose_b = wc_params.node_with_weight.layer_attributes.constant_attributes[wc_params.weight_port_id]["transpose"]
```
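To illustrate why reading this attribute matters, here is a hypothetical sketch in plain NumPy (not NNCF's actual statistics code): per-output-channel statistics must be reduced along different axes depending on the stored weight layout:

```python
import numpy as np

rng = np.random.default_rng(0)
out_ch, in_ch = 3, 5
w_t = rng.random((out_ch, in_ch))  # transpose_b=True: stored as (out, in)
w_f = w_t.T                        # transpose_b=False: stored as (in, out)

# Per-output-channel max magnitude: reduce over the input dimension,
# which is axis 1 in the transposed layout and axis 0 otherwise.
scale_t = np.abs(w_t).max(axis=1)
scale_f = np.abs(w_f).max(axis=0)
```

Both reductions yield identical per-channel scales; hard-coding one axis silently produces wrong statistics for the other layout.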
The same issue likely exists in the other data-aware algorithms: AWQ, lora_correction, scale_estimation.
Suggest copy-pasting a test for transpose_b=False for all these methods and checking whether it fails: https://github.com/openvinotoolkit/nncf/pull/3725/files#diff-223ea638f7751f7c0c3e8f867ec9c8c132a3ccd62a9dcea2a5d158836c71c222R1960-R1961
```python
from nncf.quantization.algorithms.weight_compression.config import WeightCompressionParameters
from nncf.quantization.algorithms.weight_compression.parameters import CompressedWeight
from nncf.quantization.algorithms.weight_compression.scale_estimation import ScaleEstimation
from nncf.quantization.algorithms.weight_compression.utils import (
```
The `utils` file name violates the code style: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#474-file-naming
Possible name: `tensor_slicing.py`
Also recommend configuring automatic code formatting: https://github.com/openvinotoolkit/nncf/blob/develop/docs/styleguide/PyGuide.md#2-automating-code-formatting
Changes
fixes #3494
Added full support for data-aware weight compression when `MatMul` nodes use `transpose_b=False`.

Updated and validated `test_compression_with_transpose` to ensure it passes for `transpose_b=False`.

Reason for changes

Previously, NNCF's weight compression flow assumed that the weight input of `MatMul` operations was always transposed (`transpose_b=True`).

Related tickets
Tests

`pytest tests/openvino/native/quantization/test_weights_compression.py -v` (All tests pass; `test_scale_estimation[True]` remains the expected XFAIL for ticket 176465.)
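For reference, the layout difference behind this fix can be shown in plain NumPy (a sketch, not NNCF code): with `transpose_b=True` the weight is stored as (out, in) and the operation transposes it internally, while with `transpose_b=False` it is stored as (in, out) and multiplied directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 4))         # activations: (batch, in)
w_nt = rng.random((4, 3))      # transpose_b=False: weight stored as (in, out)
w_t = w_nt.T                   # transpose_b=True: weight stored as (out, in)

y_nt = x @ w_nt                # MatMul with transpose_b=False
y_t = x @ w_t.T                # MatMul with transpose_b=True (b transposed internally)
```

Both produce the same (2, 3) output; only the stored layout of the constant differs, which is exactly what the compression code must account for when slicing channels.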