
Clamping in Affine Transform #253

Closed
vpratz opened this issue Nov 21, 2024 · 10 comments
Labels: discussion (Discuss a topic or question not necessarily with a clear output in mind.)

Comments

@vpratz (Collaborator) commented Nov 21, 2024

The changes in clamping have led to worse performance in the two moons test. Reverting the changes in eb5c446 and 7906c1e fixed this, but the reversion was not based on extensive testing.

@stefanradev93 Could you please evaluate which form is the best in practice and, in case the changes are the best in practical applications, adjust the test settings accordingly?
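For context, a common way to stabilize affine coupling transforms is to soft-clamp the predicted log-scale with a bounded function such as a scaled tanh, so that the multiplicative factor exp(log_scale) can neither vanish nor explode. The sketch below is hypothetical and not the actual BayesFlow implementation; the clamp value 1.9 and the function names are illustrative choices only.

```python
import numpy as np

def soft_clamp(log_scale, clamp=1.9):
    # Bounded reparameterization: output lies in (-clamp, clamp) and is
    # approximately the identity near zero, so exp() stays in a safe range.
    return clamp * np.tanh(log_scale / clamp)

def affine_forward(x, log_scale, shift):
    # Affine transform z = x * exp(s) + shift with a clamped log-scale s.
    s = soft_clamp(log_scale)
    z = x * np.exp(s) + shift
    log_det = np.sum(s)  # log |det J| of the elementwise affine map
    return z, log_det
```

Without such a clamp, a single large network output feeds directly into exp(), which is a typical source of NaN losses and NaN samples like those discussed below.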

@vpratz vpratz added the discussion Discuss a topic or question not necessarily with a clear output in mind. label Nov 21, 2024
@vpratz vpratz added the v2 label Nov 21, 2024
@stefanradev93 (Contributor)

That was also my experience, so I picked the settings that stabilized training on the regression and two moons examples (basically the old settings from 1.5x).

However, using my "stable" settings, @elseml got NaNs in his experiments. It seems that stability is currently a bit dependent on the test case, which needs to be solved.

My remaining suspicion is the residual net, which we previously didn't use.

@stefanradev93 (Contributor)

@vpratz @elseml
Could you test the clamping in eb5c446?
I also fixed the previously wrong position of the actnorm, so the arcsin^-1 may work. However, testing just the loss is not enough: the regression test bed was sometimes producing NaNs in the sampling results.
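For reference, actnorm (Glow-style activation normalization) initializes a per-feature shift and scale from the first batch so that activations start out standardized; its position relative to the coupling layers changes the range of inputs the scale network sees, which is why moving it can affect stability. A minimal standalone sketch, hypothetical and not the BayesFlow code:

```python
import numpy as np

class ActNorm:
    # Glow-style activation normalization with data-dependent initialization:
    # the first batch is mapped to (approximately) zero mean and unit variance
    # per feature; afterwards shift and log_scale would act as learnable
    # parameters in a real flow implementation.
    def __init__(self):
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            self.shift = -x.mean(axis=0)
            self.log_scale = -np.log(x.std(axis=0) + 1e-12)
            self.initialized = True
        return (x + self.shift) * np.exp(self.log_scale)
```

Placed before the coupling blocks, this keeps the inputs to the scale/shift networks in a moderate range from the first training step onward.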

@vpratz (Collaborator, Author) commented Nov 21, 2024

Thanks for the additional information. I just reran pytest tests/test_two_moons a few times on eb5c446 and now it passes for me locally. When I ran it earlier it failed, producing a value that was too high.

The problem with the NaNs seems to persist, see #254.

What would be your preferred configuration until we have figured this out? Using the code from eb5c446?

@stefanradev93 (Contributor)

I am on it.

@stefanradev93 (Contributor)

I have a setting that produced no NaNs in loss or samples over three runs across:

  • Two moons
  • SIR
  • Regression

I will push my commit, and @elseml and @vpratz can decide whether to keep it or not upon further testing.

@stefanradev93 (Contributor)

Let me know if you still observe some problems after my recent push.

@vpratz (Collaborator, Author) commented Nov 21, 2024

Thanks a lot! The tests look stable now. @elseml If you don't encounter problems, I think we can close this issue.

@elseml (Member) commented Nov 21, 2024

Quick timeline of my experiences (note that my setup is more prone to NaNs in general):

  • Commit aeee627 introduced major instabilities (NaNs in ~60% of runs)
  • Reverting its changes to affine_transform.py in commit eb5c446 fixed the issue
  • The changes to affine_transform.py in commit 7906c1e led to NaNs in ~10% of runs (quite hard to test due to their rare occurrence) and an extremely high loss at the beginning of training (larger by a factor of ~4000 compared to before), but not later on
  • I have encountered no NaNs since commit 3878625 (they cannot be ruled out due to their rare occurrence; I will update if that changes), and it fixed the exploding loss at the beginning of training

The issue can thus be closed for now from my side - we will have to see in the long run if the occasional instabilities in #254 were sufficiently addressed.

@vpratz (Collaborator, Author) commented Nov 21, 2024

Thanks, let's keep an eye on this and reopen the issue if needed.

@vpratz vpratz closed this as completed Nov 21, 2024
@github-project-automation github-project-automation bot moved this from Future to Done in bayesflow development Nov 21, 2024
@paul-buerkner (Contributor)

Great! Thank you all for helping to fix this issue!
