Why do norm-free nets sometimes become numerically unstable faster? #527
-
You're getting NaN because these nets are normalization-free and much fussier than nets with BN. I find it very impressive that they work as well as they do, thanks to careful analysis and additions to a ResNet architecture and training recipe. But they are delicate; you can't just throw whatever you want at them. If you want to deviate from the hparams in the paper (e.g., by using Adam), you'll have to experiment and do some hparam sweeps to find out what degree of gradient clipping is needed, what LR, etc. You also need to look at your data and ensure it's normalized/standardized appropriately. Also, fix your grad clipping: you aren't doing it correctly when using AMP; it's done correctly in the timm utils.
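For illustration, a minimal sketch of what AMP-safe clipping looks like with torch.cuda.amp (model, optimizer, criterion, and loader are placeholders, and max_norm=1.0 is just an example value); the key point is that the gradients must be unscaled before clipping, otherwise the clip threshold is compared against loss-scaled values:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

for images, targets in loader:  # placeholder dataloader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()   # grads are loss-scaled after this
    scaler.unscale_(optimizer)      # bring grads back to their true magnitude
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # example threshold
    scaler.step(optimizer)          # skips the step if grads are inf/NaN
    scaler.update()
```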
-
@rwightman I tried to use this grad clipping from the timm utils: https://github.com/rwightman/pytorch-image-models/blob/master/timm/utils/agc.py. Sorry, I couldn't understand this comment: "Also, fix your grad clipping, you aren't doing it correctly when using AMP, it's done correctly in timm utils."
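(Working this out from the linked file and the reply above: under AMP the gradients carry the loss scale until scaler.unscale_(optimizer) is called, so clipping done before that point compares the threshold against scaled values. A rough sketch of combining the two, assuming the adaptive_clip_grad(parameters, clip_factor=0.01, ...) signature from that file, with clip_factor only an example value:)

```python
from timm.utils.agc import adaptive_clip_grad

scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # grads are now at their true magnitude
adaptive_clip_grad(model.parameters(), clip_factor=0.01)  # AGC on unscaled grads
scaler.step(optimizer)
scaler.update()
```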
-
Hi,
-
I am trying the code below for the VinBig Kaggle challenge:
If I use any model other than the norm-free nets, it works fine with no issues, but when I use the norm-free nets I get this training log:
How do I solve this issue?
If you check some of the comments in this discussion: https://www.kaggle.com/c/cassava-leaf-disease-classification/discussion/220268
you can see many other people getting NaN like me, and only when using NFNets.
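One debugging sketch that may help localize this (loss and step are placeholders from the training loop): stop at the first non-finite loss instead of letting NaN propagate, so the batch, LR, and clipping settings at that exact step can be inspected.

```python
import math

loss_val = loss.item()
if not math.isfinite(loss_val):
    # capture whatever is needed to reproduce: step index, lr, batch stats
    raise RuntimeError(f"non-finite loss ({loss_val}) at step {step}")
```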