Clamping introduces nan if ar_steps_train > 1 #119
Comments
Just so it does not cause confusion: the options are not available on the main branch, but they should have nothing to do with this.
I'm on a train and didn't have the DANRA data downloaded, so I tried with MEPS data and this config:
I don't think the config file you are referring to exists on main. Did you base it on the README.md?
Oh, yes, you are correct. Don't mind my comment above.
Hmm, this is quite confusing. After 1 batch there are no NaNs in the state resulting from the clamping. However, on iteration 2 it is already only NaNs. This would imply that there are some NaN weights in the network in the second iteration, which then result in NaNs in the output state. Could it be that the first iteration does not create NaNs in the state, but the gradients w.r.t. something are NaN or inf?
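One way to check that hypothesis is to look at the parameter gradients right after the first backward pass. A minimal sketch with a stand-in model (not the actual neural-lam trainer):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the forecaster; only the gradient check matters here.
model = nn.Linear(4, 4)
out = model(torch.randn(8, 4))
loss = out.pow(2).mean()
loss.backward()

# If any gradient is already NaN/inf after the first backward pass, the
# optimizer step poisons the weights for iteration 2 even though the first
# forward pass looked clean.
for name, param in model.named_parameters():
    if param.grad is not None and (
        torch.isnan(param.grad).any() or torch.isinf(param.grad).any()
    ):
        print(f"non-finite gradient in {name}")
```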
Running with anomaly detection enabled, it first found nan in a backward pass (of Expm1, whatever that is). That does not necessarily mean that the nan originated there, but it could have.
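As a standalone illustration (not the actual neural-lam code path), a log(expm1(x)) style expression whose forward overflows produces the nan exactly in the Expm1 backward:

```python
import torch

# For a large enough input, expm1 overflows to inf in float32.
x = torch.tensor(100.0, requires_grad=True)   # expm1(100) ~ 2.7e43 > float32 max
y = torch.log(torch.expm1(x))                 # forward: log(inf) = inf
y.backward()                                  # grad of log: 1/inf = 0; grad of expm1: 0 * exp(100) = 0 * inf = nan
print(x.grad)                                 # tensor(nan)

# With torch.autograd.set_detect_anomaly(True), the same backward pass raises
# an error that names the Expm1 backward node instead of silently giving nan.
```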
Oh, right, that is probably neural-lam/neural_lam/utils.py line 325 (at f342487), in utils.inverse_softplus.
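For reference, a thresholded inverse softplus typically has roughly this shape (a sketch only, not the verbatim code at that line):

```python
import torch

def inverse_softplus(x: torch.Tensor, beta: float = 1.0, threshold: float = 20.0) -> torch.Tensor:
    """Sketch of an inverse of softplus: log(expm1(beta * x)) / beta, falling
    back to the identity where beta * x exceeds the threshold (where softplus
    is effectively linear anyway)."""
    non_linear = torch.log(torch.expm1(beta * x)) / beta
    return torch.where(beta * x < threshold, non_linear, x)
```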
I tried adjusting the threshold parameter used in inverse_softplus, but it did not change anything.
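A possible explanation for why the threshold would not matter: if the expm1 term sits inside a torch.where (a common way to implement the thresholding), PyTorch still differentiates through the unselected branch, and an inf there turns the masked zero gradient into 0 * inf = nan. A minimal standalone sketch of the failure and one common guard:

```python
import torch

x = torch.tensor(100.0, requires_grad=True)

# Even though the linear branch is selected for large x, autograd still
# differentiates through the expm1 branch; its inf turns the masked zero
# gradient into nan (0 * inf).
bad = torch.where(x < 20.0, torch.log(torch.expm1(x)), x)
bad.backward()
print(x.grad)  # tensor(nan)

x.grad = None

# One common guard: make the unselected branch finite as well, e.g. by
# clamping its argument, so no inf ever enters the backward graph.
safe_arg = torch.clamp(x, max=20.0)
good = torch.where(x < 20.0, torch.log(torch.expm1(safe_arg)), x)
good.backward()
print(x.grad)  # tensor(1.)
```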
After a few iterations, clamping state variables introduces nan in the train loss. Replacing the sigmoid and inverse sigmoid functions with a simple torch.clamp does prevent the issue (just as an indication). Here is a reproducible example based on the danra datastore from the test_examples (hierarchical archetype):