What's the correct way to go about static quantization of models in timm? #643
-
Right now I'm working on quantizing efficientnet. I'm rerunning my code over and over, hunting down the errors one by one and patching the relevant sections of code. They mainly consist of
I'm over an hour in and the end is still not in sight, so I was wondering if I'm missing something built in that could help with this.
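For reference, this is roughly the eager-mode static quantization recipe I'm following (a minimal sketch; the wrapper class and the random calibration data are just illustrative):

```python
import torch
import torch.nn as nn
import timm

class QuantWrapper(nn.Module):
    """Wrap the model with quant/dequant stubs so eager-mode quantization
    knows where tensors enter and leave the int8 domain."""
    def __init__(self, model):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.model = model

    def forward(self, x):
        return self.dequant(self.model(self.quant(x)))

model = QuantWrapper(timm.create_model('efficientnet_b0', pretrained=True).eval())
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)   # insert observers

with torch.no_grad():                             # calibration (real data in practice)
    model(torch.randn(1, 3, 224, 224))

torch.quantization.convert(model, inplace=True)

# Errors mostly surface when running the converted model: functional residual
# adds, SiLU/Swish activations, SAME-padding convs, etc. each need manual patching.
```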
-
@alexander-soare I'd look at FX-based quantization. It's pretty new, but I think there are some examples out there. The previous approach of manually replacing functions didn't seem like a good solution. FX-based quantization operates using FX transforms on the traced model IR, so it doesn't matter what functions are used to create the structure of the model...
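Rough shape of the FX graph-mode flow, as a sketch (untested here, and the prepare_fx/convert_fx signatures have shifted between PyTorch versions, e.g. newer releases also expect an example_inputs argument):

```python
import torch
import timm
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = timm.create_model('efficientnet_b0', pretrained=True).eval()

# "" sets the default qconfig for the whole model; per-module overrides are possible
qconfig_dict = {"": get_default_qconfig("fbgemm")}
prepared = prepare_fx(model, qconfig_dict)   # symbolic trace + observer insertion

with torch.no_grad():                        # calibrate on representative batches
    for _ in range(10):
        prepared(torch.randn(8, 3, 224, 224))

quantized = convert_fx(prepared)             # lower observed graph to int8 kernels
```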
-
Assuming the trace for FX quantization is no different from other forms of tracing, there may need to be another fun workaround for the SAME padding (it's annoying, but there's no other way to support TensorFlow-like SAME padding properly unless it gets implemented in the core of PyTorch someday).
See the ONNX export code I have in a different project here: https://github.com/rwightman/gen-efficientnet-pytorch/blob/master/onnx_export.py#L77-L102
It replaces the dynamic SAME-padding conv2d with a static (run once and then export) alternative (you lose resolution flexibility): https://github.com/rwightman/gen-efficientnet-pytorch/blob/master/geffnet/conv2d_layers.py#L88-L113
I can bring that layer over here if it's needed.
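A stripped-down sketch of what that static alternative looks like (loosely modeled on the linked Conv2dSameExport; the class and function names here are illustrative, not the actual layer):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def same_pad_amount(i, k, s, d):
    """Total TF-style SAME padding needed along one spatial dim of size i."""
    return max((math.ceil(i / s) - 1) * s + (k - 1) * d + 1 - i, 0)

class Conv2dStaticSame(nn.Conv2d):
    """SAME-padding conv with the padding baked in for a fixed input size,
    so no data-dependent padding logic is left in the traced graph."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, dilation=1,
                 input_size=(224, 224), **kwargs):
        super().__init__(in_ch, out_ch, kernel_size, stride=stride,
                         dilation=dilation, padding=0, **kwargs)
        ph = same_pad_amount(input_size[0], self.kernel_size[0], self.stride[0], self.dilation[0])
        pw = same_pad_amount(input_size[1], self.kernel_size[1], self.stride[1], self.dilation[1])
        # F.pad ordering is (left, right, top, bottom); SAME pads asymmetrically
        self.static_pad = (pw // 2, pw - pw // 2, ph // 2, ph - ph // 2)

    def forward(self, x):
        x = F.pad(x, self.static_pad)
        return F.conv2d(x, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

A conversion pass would then swap each dynamic SAME-padding conv for this variant at the target resolution (that's the trade: the model is only correct at that input size), after which tracing should go through.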