For which params do we not want weight decay? #894
-
@alexander-soare I feel it's pretty standard. That condition is actually a bit redundant: way back I started with just bias and then realized it was supposed to cover other 1d weights as well (there are references, I don't have them handy). See the Flax example as a minimal case: https://github.com/google/flax/blob/main/examples/imagenet/train.py#L122-L126 There may be situations where this is incorrect, but it is far more correct than the default (not doing this) and produces better results in the majority of situations. I believe the best way to improve this would be to make that weight decay block a fn that can be overridden as an arg... I also have a note to do this for learning rate, so that both learning rate and weight decay can be applied per param based on name/shape to get a set of param groups to pass to the opt...
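A minimal sketch of that idea (hypothetical names, not timm's actual API), assuming the exclusion rule is exposed as a plain callable whose output is a set of param groups handed to the optimizer:

```python
import torch


def default_no_decay(name, param):
    # Skip weight decay for biases and any other 1d params (norm scales/shifts, etc.)
    return param.ndim <= 1 or name.endswith(".bias")


def make_param_groups(model, weight_decay=1e-5, no_decay_fn=default_no_decay):
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue  # frozen params
        (no_decay if no_decay_fn(name, param) else decay).append(param)
    return [
        {"params": no_decay, "weight_decay": 0.0},
        {"params": decay, "weight_decay": weight_decay},
    ]


# Usage: pass the groups to the optimizer instead of model.parameters().
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8))
optimizer = torch.optim.AdamW(make_param_groups(model, weight_decay=0.05), lr=1e-3)
```

Overriding `no_decay_fn` (or an equivalent hook for learning rate) is what would let the grouping be customised per param based on name/shape.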
-
`create_optimizer_v2` has a kwarg controlling this. It seems fairly intuitive that for some of these params we wouldn't want weight decay, but I'm wondering why it's clear that we'd do this for the broad case of all 1d params. And further to that, shouldn't we be careful, given that this is somewhat dependent on how PyTorch or the author of a custom module decides to represent params?
While I'm at it, any idea why we don't consider bias as 1d here? https://github.com/rwightman/pytorch-image-models/blob/3f9959cdd28cb959980abf81fc4cf34f32e18399/timm/optim/optim_factory.py#L37
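For reference, the check at that line is roughly of this form (paraphrased, not an exact copy of the linked code); the explicit `.bias` clause is what the reply above calls redundant, since biases are themselves 1d:

```python
def no_decay(name, param, skip_list=()):
    # 1d params, explicit ".bias" names, and skip-listed names get no weight decay.
    return len(param.shape) == 1 or name.endswith(".bias") or name in skip_list
```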