Recommended way to freeze batch norm? #866
-
I'm setting up an effnet as a maskrcnn backbone and wanted to set

```diff
- norm_layer=partial(nn.BatchNorm2d, **resolve_bn_args(kwargs)),
+ norm_layer=partial(kwargs.pop('norm_layer', None) or nn.BatchNorm2d, **resolve_bn_args(kwargs)),
```

If so, what about more generally (not just effnet)?
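For context, a change along those lines would let the norm layer be overridden at model creation time. A minimal, hypothetical usage sketch, assuming the `norm_layer` kwarg actually gets plumbed through to the effnet builder (it currently isn't, see below) and using torchvision's `FrozenBatchNorm2d`:

```python
import timm
from torchvision.ops.misc import FrozenBatchNorm2d  # frozen affine params + running stats

# Hypothetical once norm_layer is respected again by the effnet entrypoints:
backbone = timm.create_model(
    'efficientnet_b3',
    pretrained=True,
    features_only=True,            # multi-scale feature maps for the Mask R-CNN neck
    norm_layer=FrozenBatchNorm2d,  # constructed as norm_layer(num_features), like nn.BatchNorm2d
)
```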
Replies: 4 comments 5 replies
-
@alexander-soare Generally, recursively walking the modules and replacing them using a helper like this: https://detectron2.readthedocs.io/en/latest/_modules/detectron2/layers/batch_norm.html#FrozenBatchNorm2d.convert_frozen_batchnorm

Same way as done for syncbatchnorm (https://github.com/rwightman/pytorch-image-models/blob/master/train.py#L396) and my own split_bn helper (https://github.com/rwightman/pytorch-image-models/blob/54e90e82a5a6367d468e4f6dd5982715e4e20a72/timm/models/layers/split_batchnorm.py#L41).

In that way you can (sort of) track the layers and enable/disable the freeze at certain points in the model. Actually being able to use the metadata for feature extraction to locate stages and selectively freeze stages generically for all models is a feature that I've wanted to impl... a `model.freeze(start, end, no_grad=True, bn=True)` kind of interface.

However, you bring up a good point: when I adjusted the EfficientNet kwarg handling a few months back I broke the ability to specify norm_layer .. wooops! I think it should be fixed.

Open to PR for any and all of those (generic helper, feature/stage aware helper, or norm_layer fix) :)
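Not timm code, but roughly what that recursive BN replacement looks like: a sketch modeled on the detectron2 `convert_frozen_batchnorm` helper linked above, using torchvision's `FrozenBatchNorm2d` so it stays self-contained.

```python
import torch.nn as nn
from torchvision.ops.misc import FrozenBatchNorm2d


def convert_frozen_batchnorm(module: nn.Module) -> nn.Module:
    """Recursively replace BatchNorm2d/SyncBatchNorm with FrozenBatchNorm2d,
    copying over the affine parameters and running stats (sketch, not a timm API)."""
    res = module
    if isinstance(module, (nn.BatchNorm2d, nn.SyncBatchNorm)):
        res = FrozenBatchNorm2d(module.num_features, eps=module.eps)
        if module.affine:
            res.weight.data = module.weight.data.clone().detach()
            res.bias.data = module.bias.data.clone().detach()
        res.running_mean.data = module.running_mean.data
        res.running_var.data = module.running_var.data
    else:
        for name, child in module.named_children():
            new_child = convert_frozen_batchnorm(child)
            if new_child is not child:
                res.add_module(name, new_child)
    return res


# e.g. freeze all BN in a timm backbone:
# model = convert_frozen_batchnorm(timm.create_model('efficientnet_b3', features_only=True))
```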
-
@alexander-soare yes, from -> to requires something like FX (though you don't actually need to modify the model structure); you need to walk the modules in order to be able to access the BN layers, and any trainable parameters (if you also want the option to freeze gradients).

The from/to names could be based on the stage layer names already there for the feature extractor (for models that have it). With this variant you could do ... Highest-level modules could work too.

It requires another set of metadata for each model to be a generic API that doesn't require the user to know the model structure. An initial version could just require the user to do a bit of digging for each model they want to use freeze with; it's certainly more convenient than the current situation, which is having no helper.

Yup, the fix is straightforward, just updating that line in each of the _create_xxx for the EfficientNet and MobileNetV3 model families.
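One possible shape for that kind of helper, just a sketch under the assumption that the caller looks up the stage/module names themselves (e.g. via `model.named_modules()` or the feature extraction metadata); `freeze_submodules` and the names below are illustrative, not an existing timm API:

```python
import torch.nn as nn


def freeze_submodules(model: nn.Module, module_names, no_grad=True, freeze_bn=True):
    # Hypothetical helper, not part of timm: freeze the listed submodules by name.
    for name in module_names:
        submodule = model.get_submodule(name)  # requires torch >= 1.9
        if no_grad:
            for p in submodule.parameters():
                p.requires_grad = False
        if freeze_bn:
            for m in submodule.modules():
                if isinstance(m, nn.modules.batchnorm._BatchNorm):
                    # eval() stops running-stat updates, but is undone by model.train(),
                    # so it needs re-applying (or a train() override) after each train() call
                    m.eval()


# e.g. for an EfficientNet feature backbone, freeze the stem and first two stages
# (names are illustrative; check model.named_modules() for the real ones):
# freeze_submodules(backbone, ['conv_stem', 'bn1', 'blocks.0', 'blocks.1'])
```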
-
@rwightman while working on the PR I realised... why this?
Any good reason you know of not to just set ...
-
@alexander-soare I'm on the road today, but will look at this closer soon. On first pass it looked good... was a bit curious why the ...