[ONNX] Add per channel quantization support for Onnx.QLinearConv op #3917
Conversation
LGTM
Thanks Vivek. I think you need to modify some of the output quantization handling in the per-channel case. Maybe store a bool that tracks if we are in the per-channel case so you can reuse it for the output.
It looks like this conversion automatically fuses the input and weight quantization with the convolution, so the only thing that fuse-quantized-ops is going to do is quantize the bias (which won't work currently in the per-channel case). I think it is fine, but we won't be able to check correctness e2e until we address the per-channel quantization, unfortunately.
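For reference, below is a minimal NumPy sketch of the conventional per-channel bias quantization scheme (int32 bias, zero point 0, bias scale equal to the input scale times the per-output-channel weight scale). It illustrates the general convention only, not necessarily what the fuse-quantized-ops pass implements; all names and values are illustrative assumptions.

```python
import numpy as np

# Illustrative float bias and quantization parameters (assumed values).
bias_f = np.array([0.25, -0.5, 0.75, 1.0], dtype=np.float32)       # one entry per output channel
x_scale = 0.02                                                      # per-tensor input scale
w_scales = np.array([0.01, 0.015, 0.02, 0.025], dtype=np.float32)  # per-channel weight scales

# Conventional scheme: bias is stored as int32 with zero point 0 and a
# per-channel scale of input_scale * weight_scale[channel].
bias_scales = x_scale * w_scales
bias_q = np.round(bias_f / bias_scales).astype(np.int32)
print(bias_q)  # [ 1250 -1667  1875  2000]
```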
Hi @zjgarvey, can you please review the patch now?
Nice, this will get us the functionality we need for now.
I have a nit about the test being too involved, but otherwise this looks good to me. Have you tested out any numerics?
This commit extends the OnnxToTorch lowering for the Onnx.QLinearConv op by adding support for per-channel quantization of the weight argument.
Since the convolution operation in the downstream ("Linalg") pipeline does not support per-channel quantization, the lowering instead performs the convolution over the dequantized input and weight and then quantizes the output (see the sketch after this commit message).
Fixes nod-ai/SHARK-ModelDev#894.
Signed-off-by: Vivek Khandelwal [email protected]
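For context, here is a minimal NumPy sketch of the dequantize → float convolution → requantize strategy the commit message describes, not the actual torch-mlir lowering. The shapes, scales, and zero points are made-up illustrative values, and the per-tensor-input / per-channel-weight layout (OIHW, stride 1, no padding, no bias) is an assumption for the example.

```python
import numpy as np

def dequantize_per_tensor(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

def dequantize_per_channel(q, scales, zero_points, axis=0):
    # Broadcast the per-output-channel scale/zero point over OIHW weights.
    shape = [1] * q.ndim
    shape[axis] = -1
    return (q.astype(np.float32) - zero_points.reshape(shape)) * scales.reshape(shape)

def quantize_per_tensor(x, scale, zero_point, dtype=np.uint8):
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def naive_conv2d(x, w):
    # x: (N, C, H, W), w: (O, C, KH, KW); stride 1, no padding.
    n, c, h, wd = x.shape
    o, _, kh, kw = w.shape
    out = np.zeros((n, o, h - kh + 1, wd - kw + 1), dtype=np.float32)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]  # (N, C, KH, KW)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

# Illustrative quantized operands: per-tensor input, per-channel weight.
x_q = np.random.randint(0, 256, size=(1, 3, 8, 8), dtype=np.uint8)
w_q = np.random.randint(0, 256, size=(4, 3, 3, 3), dtype=np.uint8)
x_scale, x_zp = 0.02, 128
w_scales = np.array([0.01, 0.015, 0.02, 0.025], dtype=np.float32)
w_zps = np.array([128, 128, 128, 128], dtype=np.int32)
y_scale, y_zp = 0.05, 128

# Dequantize both operands, run the convolution in float, then requantize.
x_f = dequantize_per_tensor(x_q, x_scale, x_zp)
w_f = dequantize_per_channel(w_q, w_scales, w_zps, axis=0)
y_q = quantize_per_tensor(naive_conv2d(x_f, w_f), y_scale, y_zp)
print(y_q.shape, y_q.dtype)  # (1, 4, 6, 6) uint8
```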