
possible bug with HERReplayBuffer under pytorch #1740

Open
st2yang opened this issue Jul 9, 2020 · 5 comments

@st2yang
Contributor

st2yang commented Jul 9, 2020

Hi,

I managed to run examples/tf/her_ddpg_fetchreach.py with tuned parameters (PR #1739). Then I tried using PyTorch, and something strange happens: training works fine at first, then performance crashes dramatically. Here is the comparison between the tf and torch runs:

The main modification is replacing tf.policies.ContinuousMLPPolicy and tf.q_functions.ContinuousMLPQFunction with torch.policies.DeterministicMLPPolicy and torch.q_functions.ContinuousMLPQFunction, while keeping the other parameters the same.
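
For reference, a minimal sketch of what that swap looks like on the torch side. The environment setup and hidden sizes below are placeholders for illustration, not the tuned values from PR #1739; check examples/tf/her_ddpg_fetchreach.py for the exact arguments.

```python
# Hypothetical sketch of the torch-side components. GarageEnv, the env id,
# and the hidden sizes are placeholders; only the policy/Q-function classes
# are the ones named above.
import gym
from garage.envs import GarageEnv
from garage.torch.policies import DeterministicMLPPolicy
from garage.torch.q_functions import ContinuousMLPQFunction

env = GarageEnv(gym.make('FetchReach-v1'))

policy = DeterministicMLPPolicy(env_spec=env.spec,
                                hidden_sizes=(256, 256, 256))
qf = ContinuousMLPQFunction(env_spec=env.spec,
                            hidden_sizes=(256, 256, 256))
```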

ryanjulian added the bug label Jul 9, 2020
@ryanjulian
Member

Great work!

Wow, this is a tough one. Do you know if this behavior persists across several random seeds?

DDPG can be a very unstable algorithm, and it's possible you're seeing some seed-dependent instability.

By the way, it's easy to share experiments from garage using tensorboard.dev. See the docs for full details.

Other ideas:

  • Turning down the learning rate
  • Perhaps the replay buffer becomes over-populated with positive (high-reward) examples and doesn't have negative (low-reward) examples? You could check this by logging the mean/median/min/max (or a histogram) of the rewards in the batches the algorithm samples from the replay buffer (see the sketch after this list). The docs cover how to do that.
  • Collecting negatives even when the agent is succeeding is one purpose for the exploration_policy. Does it improve if you increase sigma in the exploration policy, or use a different exploration policy?
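
A rough sketch of the kind of reward logging meant in the second bullet, assuming the buffer exposes a sample_transitions(batch_size) method and the sampled batch is a dict with a 'reward' array (adapt the names to the actual HERReplayBuffer API):

```python
# Rough sketch: log summary statistics of the rewards in batches sampled
# from the replay buffer. The method name `sample_transitions` and the
# 'reward' key are assumptions; adjust them to the HERReplayBuffer you use.
import numpy as np
from dowel import tabular


def log_sampled_reward_stats(replay_buffer, batch_size=256):
    samples = replay_buffer.sample_transitions(batch_size)
    rewards = np.asarray(samples['reward']).flatten()
    tabular.record('SampledRewards/Mean', float(np.mean(rewards)))
    tabular.record('SampledRewards/Median', float(np.median(rewards)))
    tabular.record('SampledRewards/Min', float(np.min(rewards)))
    tabular.record('SampledRewards/Max', float(np.max(rewards)))
```

If the min/median sit at the success reward almost all the time, the buffer has effectively run out of negatives.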

@ryanjulian
Member

@krzentner @maliesa96

@st2yang
Contributor Author

st2yang commented Jul 9, 2020

@ryanjulian This behavior persists under the default seed, but I didn't test it with different seeds. I can test more cases when I have more time.

ryanjulian added this to the v2020.09rc2 milestone Jul 9, 2020
ryanjulian assigned maliesa96 and unassigned maliesa96 Jul 20, 2020
irisliucy self-assigned this Jul 27, 2020
@irisliucy
Contributor

irisliucy commented Jul 31, 2020

@st2yang Would you share your experiment settings / essential code snippets? That would help us reproduce the issue and fix the potential bug.

ryanjulian modified the milestones: v2020.09rc2, v2020.09rc3 Aug 3, 2020
irisliucy assigned irisliucy and unassigned irisliucy Aug 10, 2020
@st2yang
Contributor Author

st2yang commented Aug 12, 2020

@irisliucy Sorry for the late reply. Can you try adapting examples/tf/her_ddpg_fetchreach.py to run with PyTorch? I just changed the tf-specific components to their PyTorch counterparts, and training got worse. I used very similar parameters when debugging.

If the bug persists, you should see something similar.
