
possible bug with HERReplayBuffer under pytorch #1740

Open
st2yang opened this issue Jul 9, 2020 · 5 comments

@st2yang
Contributor

st2yang commented Jul 9, 2020

Hi,

I managed to run examples/tf/her_ddpg_fetchreach.py with tuned parameters (PR #1739). Then I tried using PyTorch, and something strange happens: training works fine at first, then performance crashes dramatically. Here is the comparison between the tf and torch runs:

The main modification is replacing tf.policies.ContinuousMLPPolicy and tf.q_functions.ContinuousMLPQFunction with torch.policies.DeterministicMLPPolicy and torch.q_functions.ContinuousMLPQFunction, while keeping the other parameters the same.
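
For reference, a minimal sketch of what that swap looks like on the torch side. The environment setup and hidden sizes below are placeholders for illustration, not the tuned values from PR #1739; check examples/tf/her_ddpg_fetchreach.py for the exact arguments.

```python
# Hypothetical sketch of the torch-side components. GarageEnv, the env id,
# and the hidden sizes are placeholders; only the policy/Q-function classes
# are the ones named above.
import gym
from garage.envs import GarageEnv
from garage.torch.policies import DeterministicMLPPolicy
from garage.torch.q_functions import ContinuousMLPQFunction

env = GarageEnv(gym.make('FetchReach-v1'))

policy = DeterministicMLPPolicy(env_spec=env.spec,
                                hidden_sizes=(256, 256, 256))
qf = ContinuousMLPQFunction(env_spec=env.spec,
                            hidden_sizes=(256, 256, 256))
```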

ryanjulian added the bug label Jul 9, 2020
@ryanjulian
Member

Great work!

Wow, this is a tough one. Do you know if this behavior persists across several random seeds?

DDPG can be a very unstable algorithm, and it's possible you're seeing some seed-dependent instability.

By the way, it's easy to share experiments from garage using tensorboard.dev. See the docs for full details.

Other ideas:

  • Turning down the learning rate
  • Perhaps the replay buffer becomes over-populated with positive (high-reward) examples and doesn't have negative (low-reward) examples? You could check this by logging the mean/median/min/max (or a histogram) of the rewards in the batches the algorithm samples from the replay buffer (see the sketch after this list). The docs cover how to do that.
  • Collecting negatives even when the agent is succeeding is one purpose for the exploration_policy. Does it improve if you increase sigma in the exploration policy, or use a different exploration policy?
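
A rough sketch of the kind of reward logging meant in the second bullet, assuming the buffer exposes a sample_transitions(batch_size) method and the sampled batch is a dict with a 'reward' array (adapt the names to the actual HERReplayBuffer API):

```python
# Rough sketch: log summary statistics of the rewards in batches sampled
# from the replay buffer. The method name `sample_transitions` and the
# 'reward' key are assumptions; adjust them to the HERReplayBuffer you use.
import numpy as np
from dowel import tabular


def log_sampled_reward_stats(replay_buffer, batch_size=256):
    samples = replay_buffer.sample_transitions(batch_size)
    rewards = np.asarray(samples['reward']).flatten()
    tabular.record('SampledRewards/Mean', float(np.mean(rewards)))
    tabular.record('SampledRewards/Median', float(np.median(rewards)))
    tabular.record('SampledRewards/Min', float(np.min(rewards)))
    tabular.record('SampledRewards/Max', float(np.max(rewards)))
```

If the min/median sit at the success reward almost all the time, the buffer has effectively run out of negatives.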

@ryanjulian
Member

@krzentner @maliesa96

@st2yang
Contributor Author

st2yang commented Jul 9, 2020

@ryanjulian This behavior persists under the default seed, but I didn't test it with different seeds. I can test more cases when I have more time.

ryanjulian added this to the v2020.09rc2 milestone Jul 9, 2020
ryanjulian assigned maliesa96 and unassigned maliesa96 Jul 20, 2020
irisliucy self-assigned this Jul 27, 2020
@irisliucy
Contributor

irisliucy commented Jul 31, 2020

@st2yang Would you share your experiment settings / essential code snippets? That would help us reproduce the issue and fix the potential bug.

ryanjulian modified the milestones: v2020.09rc2, v2020.09rc3 Aug 3, 2020
irisliucy assigned irisliucy and unassigned irisliucy Aug 10, 2020
@st2yang
Contributor Author

st2yang commented Aug 12, 2020

@irisliucy Sorry for the late reply. Can you try adapting examples/tf/her_ddpg_fetchreach.py to run with PyTorch? I just changed the tf-specific components to their PyTorch counterparts, and training got worse. I used very similar parameters when debugging.

If the bug persists, you should see something similar.
