Recommended way to freeze batch norm? #866
-
I'm setting up an effnet as a maskrcnn backbone and wanted to set

```diff
- norm_layer=partial(nn.BatchNorm2d, **resolve_bn_args(kwargs)),
+ norm_layer=partial(kwargs.pop('norm_layer', None) or nn.BatchNorm2d, **resolve_bn_args(kwargs)),
```

If so, what about more generally (not just effnet)?
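For context, a change along those lines would let the norm layer be overridden at model creation time. A minimal, hypothetical usage sketch, assuming the `norm_layer` kwarg actually gets plumbed through to the effnet builder (it currently isn't, see below) and using torchvision's `FrozenBatchNorm2d`:

```python
import timm
from torchvision.ops.misc import FrozenBatchNorm2d  # frozen affine params + running stats

# Hypothetical once norm_layer is respected again by the effnet entrypoints:
backbone = timm.create_model(
    'efficientnet_b3',
    pretrained=True,
    features_only=True,            # multi-scale feature maps for the Mask R-CNN neck
    norm_layer=FrozenBatchNorm2d,  # constructed as norm_layer(num_features), like nn.BatchNorm2d
)
```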
Replies: 4 comments 5 replies
-
@alexander-soare Generally, recursively walking the modules and replacing them using a helper like this: https://detectron2.readthedocs.io/en/latest/_modules/detectron2/layers/batch_norm.html#FrozenBatchNorm2d.convert_frozen_batchnorm

Same way as done for syncbatchnorm (https://github.com/rwightman/pytorch-image-models/blob/master/train.py#L396) and my own split_bn helper (https://github.com/rwightman/pytorch-image-models/blob/54e90e82a5a6367d468e4f6dd5982715e4e20a72/timm/models/layers/split_batchnorm.py#L41).

In that way you can (sort of) track the layers and enable/disable the freeze at certain points in the model. Actually being able to use the metadata for feature extraction to locate stages and selectively freeze stages generically for all models is a feature that I've wanted to impl... a `model.freeze(start, end, no_grad=True, bn=True)` kind of interface.

However, you bring up a good point: when I adjusted the EfficientNet kwarg handling a few months back I broke the ability to specify norm_layer .. wooops! I think it should be fixed.

Open to PR for any and all of those (generic helper, feature/stage aware helper, or norm_layer fix) :)
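Not timm code, but roughly what that recursive BN replacement looks like: a sketch modeled on the detectron2 `convert_frozen_batchnorm` helper linked above, using torchvision's `FrozenBatchNorm2d` so it stays self-contained.

```python
import torch.nn as nn
from torchvision.ops.misc import FrozenBatchNorm2d


def convert_frozen_batchnorm(module: nn.Module) -> nn.Module:
    """Recursively replace BatchNorm2d/SyncBatchNorm with FrozenBatchNorm2d,
    copying over the affine parameters and running stats (sketch, not a timm API)."""
    res = module
    if isinstance(module, (nn.BatchNorm2d, nn.SyncBatchNorm)):
        res = FrozenBatchNorm2d(module.num_features, eps=module.eps)
        if module.affine:
            res.weight.data = module.weight.data.clone().detach()
            res.bias.data = module.bias.data.clone().detach()
        res.running_mean.data = module.running_mean.data
        res.running_var.data = module.running_var.data
    else:
        for name, child in module.named_children():
            new_child = convert_frozen_batchnorm(child)
            if new_child is not child:
                res.add_module(name, new_child)
    return res


# e.g. freeze all BN in a timm backbone:
# model = convert_frozen_batchnorm(timm.create_model('efficientnet_b3', features_only=True))
```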
-
@alexander-soare yes, from -> to requires something like FX (though you don't actually need to modify the model structure); you need to walk the modules in order to be able to access the BN layers, and any trainable parameters (if you also want the option to freeze gradients).

The from/to names could be based on the stage layer names already there for the feature extractor (for models that have it). With this variant you could do ... Highest-level modules could work too.

It requires another set of metadata for each model to be a generic API that doesn't require the user to know the model structure. An initial version could just require the user to do a bit of digging for each model they want to use freeze with; it's certainly more convenient than the current situation, which is having no helper.

Yup, the fix is straightforward, just updating that line in each of the _create_xxx for the EfficientNet and MobileNetV3 model families.
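One possible shape for that kind of helper, just a sketch under the assumption that the caller looks up the stage/module names themselves (e.g. via `model.named_modules()` or the feature extraction metadata); `freeze_submodules` and the names below are illustrative, not an existing timm API:

```python
import torch.nn as nn


def freeze_submodules(model: nn.Module, module_names, no_grad=True, freeze_bn=True):
    # Hypothetical helper, not part of timm: freeze the listed submodules by name.
    for name in module_names:
        submodule = model.get_submodule(name)  # requires torch >= 1.9
        if no_grad:
            for p in submodule.parameters():
                p.requires_grad = False
        if freeze_bn:
            for m in submodule.modules():
                if isinstance(m, nn.modules.batchnorm._BatchNorm):
                    # eval() stops running-stat updates, but is undone by model.train(),
                    # so it needs re-applying (or a train() override) after each train() call
                    m.eval()


# e.g. for an EfficientNet feature backbone, freeze the stem and first two stages
# (names are illustrative; check model.named_modules() for the real ones):
# freeze_submodules(backbone, ['conv_stem', 'bn1', 'blocks.0', 'blocks.1'])
```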
-
@rwightman while working on the PR I realised... why this?
Any good reason you know of not to just set ...
-
@alexander-soare I'm on the road today, but will look at this closer soon. On first pass it looked good... was a bit curious why the ...