Possible bug with HERReplayBuffer under PyTorch #1740
Great work! Wow, this is a tough one. Do you know if this behavior persists across several random seeds? DDPG can be a very unstable algorithm, and it's possible you're seeing some seed-dependent instability. By the way, it's easy to share experiments from garage using tensorboard.dev. See the docs for full details. Other ideas:
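To check the seed question concretely, here is a minimal sketch using garage's seeding helper. The launcher call `her_ddpg_fetchreach(seed=seed)` is hypothetical (substitute whatever entry point the experiment script exposes), and the seed values are arbitrary:

```python
# Minimal sketch: rerun the experiment under several random seeds to see
# whether the collapse is seed-dependent. `her_ddpg_fetchreach` is a
# hypothetical stand-in for the actual launcher function.
from garage.experiment.deterministic import set_seed

for seed in (1, 7, 42):
    set_seed(seed)  # seeds Python's random, numpy, and the DL framework
    her_ddpg_fetchreach(seed=seed)  # hypothetical launcher call
```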
@ryanjulian This behavior persists under the default seed, but I haven't tested it with different seeds. I can try more cases when I have some time.
@st2yang Would you share your experiment settings and the essential code snippets? That will help us reproduce the issue and fix any potential bugs.
@irisliucy Sorry for the late reply. Can you try adapting examples/tf/her_ddpg_fetchreach.py to run with PyTorch? I just changed the TF-specific pieces to their PyTorch equivalents, and training got worse. I used very similar parameters when debugging. If the bug persists, you should see something similar.
Hi,
I managed to run examples/tf/her_ddpg_fetchreach.py with tuned parameters (PR #1739). Then I tried the same setup with PyTorch, and something strange happens: training works fine at first, then performance crashes dramatically. [Attached: comparison between the TF and PyTorch runs.]
The main modification was replacing tf.policies.ContinuousMLPPolicy and tf.q_functions.ContinuousMLPQFunction with torch.policies.DeterministicMLPPolicy and torch.q_functions.ContinuousMLPQFunction; all other parameters were kept the same.
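For concreteness, a minimal sketch of that swap. The keyword arguments and hidden sizes are assumptions modeled on the garage examples, not copied from the actual script:

```python
# Sketch of the TF -> PyTorch swap described above. See
# examples/tf/her_ddpg_fetchreach.py for the real hyperparameters.
#
# TF originals being replaced:
#   from garage.tf.policies import ContinuousMLPPolicy
#   from garage.tf.q_functions import ContinuousMLPQFunction
from garage.torch.policies import DeterministicMLPPolicy
from garage.torch.q_functions import ContinuousMLPQFunction

policy = DeterministicMLPPolicy(env_spec=env.spec,  # `env` from the launcher
                                hidden_sizes=[256, 256, 256])
qf = ContinuousMLPQFunction(env_spec=env.spec,
                            hidden_sizes=[256, 256, 256])
```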