[ONNX] Add per channel quantization support for Onnx.QLinearConv op #3917
Conversation
LGTM
Thanks Vivek. I think you need to modify some of the output quantization handling in the per-channel case. Maybe store a bool that tracks if we are in the per-channel case so you can reuse it for the output.
It looks like this conversion automatically fuses the input and weight quantization with the convolution, so the only thing that fuse-quantized-ops is going to do is quantize the bias (which won't work currently in the per-channel case). I think it is fine, but we won't be able to check correctness e2e until we address the per-channel quantization, unfortunately.
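For reference, below is a minimal NumPy sketch of the conventional per-channel bias quantization scheme (int32 bias, zero point 0, bias scale equal to the input scale times the per-output-channel weight scale). It illustrates the general convention only, not necessarily what the fuse-quantized-ops pass implements; all names and values are illustrative assumptions.

```python
import numpy as np

# Illustrative float bias and quantization parameters (assumed values).
bias_f = np.array([0.25, -0.5, 0.75, 1.0], dtype=np.float32)       # one entry per output channel
x_scale = 0.02                                                      # per-tensor input scale
w_scales = np.array([0.01, 0.015, 0.02, 0.025], dtype=np.float32)  # per-channel weight scales

# Conventional scheme: bias is stored as int32 with zero point 0 and a
# per-channel scale of input_scale * weight_scale[channel].
bias_scales = x_scale * w_scales
bias_q = np.round(bias_f / bias_scales).astype(np.int32)
print(bias_q)  # [ 1250 -1667  1875  2000]
```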
Hi @zjgarvey, can you please review the patch now?
Nice, this will get us the functionality we need for now.
I have a nit about the test being too involved, but otherwise this looks good to me. Have you tested out any numerics?
This commit extends the OnnxToTorch lowering for the Onnx.QLinearConv op by adding support for per-channel quantization of the weight argument.
Since the convolution operation in the downstream ("Linalg") pipeline does not support per-channel quantization, the lowering instead performs the convolution over the dequantized input and weight and then quantizes the output (see the sketch after this commit message).
Fixes nod-ai/SHARK-ModelDev#894.
Signed-off-by: Vivek Khandelwal [email protected]
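For context, here is a minimal NumPy sketch of the dequantize → float convolution → requantize strategy the commit message describes, not the actual torch-mlir lowering. The shapes, scales, and zero points are made-up illustrative values, and the per-tensor-input / per-channel-weight layout (OIHW, stride 1, no padding, no bias) is an assumption for the example.

```python
import numpy as np

def dequantize_per_tensor(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

def dequantize_per_channel(q, scales, zero_points, axis=0):
    # Broadcast the per-output-channel scale/zero point over OIHW weights.
    shape = [1] * q.ndim
    shape[axis] = -1
    return (q.astype(np.float32) - zero_points.reshape(shape)) * scales.reshape(shape)

def quantize_per_tensor(x, scale, zero_point, dtype=np.uint8):
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def naive_conv2d(x, w):
    # x: (N, C, H, W), w: (O, C, KH, KW); stride 1, no padding.
    n, c, h, wd = x.shape
    o, _, kh, kw = w.shape
    out = np.zeros((n, o, h - kh + 1, wd - kw + 1), dtype=np.float32)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]  # (N, C, KH, KW)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

# Illustrative quantized operands: per-tensor input, per-channel weight.
x_q = np.random.randint(0, 256, size=(1, 3, 8, 8), dtype=np.uint8)
w_q = np.random.randint(0, 256, size=(4, 3, 3, 3), dtype=np.uint8)
x_scale, x_zp = 0.02, 128
w_scales = np.array([0.01, 0.015, 0.02, 0.025], dtype=np.float32)
w_zps = np.array([128, 128, 128, 128], dtype=np.int32)
y_scale, y_zp = 0.05, 128

# Dequantize both operands, run the convolution in float, then requantize.
x_f = dequantize_per_tensor(x_q, x_scale, x_zp)
w_f = dequantize_per_channel(w_q, w_scales, w_zps, axis=0)
y_q = quantize_per_tensor(naive_conv2d(x_f, w_f), y_scale, y_zp)
print(y_q.shape, y_q.dtype)  # (1, 4, 6, 6) uint8
```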