
[ONNX] Add per channel quantization support for Onnx.QLinearConv op #3917

Merged
merged 11 commits into llvm:main from fix-qlinearconv on Mar 10, 2025

Conversation

@vivekkhandelwal1 (Collaborator) commented on Dec 13, 2024

This commit extends the OnnxToTorch lowering for the Onnx.QLinearConv op by adding support for per-channel quantization of the weight argument.

Since the convolution operation in the downstream ("Linalg") pipeline does not support per-channel quantization, we add the support by performing the convolution over the dequantized input and weight and then quantizing the output.
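
For illustration, here is a minimal NumPy sketch of the dequantize → convolve → requantize decomposition described above (not the actual torch-mlir lowering; all shapes, scales, and helper names are made up for this example):

```python
# Sketch only: dequantize a per-tensor-quantized input and a per-channel-
# quantized weight, run the convolution in float, then requantize the output.
import numpy as np

def dequant_per_tensor(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

def dequant_per_channel(q, scales, zero_points, axis=0):
    # One scale/zero-point per output channel, broadcast along `axis`.
    shape = [1] * q.ndim
    shape[axis] = -1
    return (q.astype(np.float32) - zero_points.reshape(shape)) * scales.reshape(shape)

def quant_per_tensor(x, scale, zero_point, dtype=np.uint8):
    q = np.round(x / scale) + zero_point
    info = np.iinfo(dtype)
    return np.clip(q, info.min, info.max).astype(dtype)

# Toy 1x1 "convolution" (a matrix-vector product) to keep the example short.
x_q = np.array([10, 12, 14], dtype=np.uint8)            # quantized input, 3 channels
w_q = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int8)   # 2 output channels
x = dequant_per_tensor(x_q, scale=0.1, zero_point=8)
w = dequant_per_channel(w_q, scales=np.array([0.05, 0.02]),
                        zero_points=np.array([0, 0]))
y = w @ x                                               # float convolution
y_q = quant_per_tensor(y, scale=0.01, zero_point=0)     # requantize the output
print(y_q)                                              # e.g. [14 13]
```

As in the description above, only the weight carries per-channel scales (one per output channel); the input and output stay per-tensor quantized.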

Fixes nod-ai/SHARK-ModelDev#894.

Signed-off-by: Vivek Khandelwal [email protected]

@jinchen62 (Collaborator) left a comment


LGTM

@zjgarvey (Collaborator) left a comment


Thanks Vivek. I think you need to modify some of the output quantization handling in the per-channel case. Maybe store a bool that tracks if we are in the per-channel case so you can reuse it for the output.

It looks like this conversion automatically fuses the input and weight quantization with the convolution, so the only thing that fuse-quantized-ops is going to do is quantize the bias (which won't work currently in the per-channel case). I think it is fine, but we won't be able to check correctness e2e until we address the per-channel quantization, unfortunately.

@vivekkhandelwal1 (Collaborator, Author) commented

Hi @zjgarvey, can you please review the patch now?

@vivekkhandelwal1 (Collaborator, Author) commented


@zjgarvey, this patch now adds support for per-channel weight quantization without modifying the existing lowering. It should not cause any issues for the existing models.

@zjgarvey (Collaborator) left a comment


Nice, this will get us the functionality we need for now.

I have a nit about the test being too involved, but otherwise this looks good to me. Have you tested out any numerics?

@vivekkhandelwal1 vivekkhandelwal1 enabled auto-merge (squash) March 10, 2025 06:12
@vivekkhandelwal1 vivekkhandelwal1 merged commit c7f8ac0 into llvm:main Mar 10, 2025
3 checks passed
@vivekkhandelwal1 vivekkhandelwal1 deleted the fix-qlinearconv branch March 10, 2025 07:01

Successfully merging this pull request may close these issues.

failed to legalize operation 'torch.operator' that was explicitly marked illegal: onnx.QLinearConv