AlgaeDICE

A PyTorch implementation of AlgaeDICE, as described in the paper:

  • "AlgaeDICE: Policy Gradient from Arbitrary Experience" by Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, and Dale Schuurmans.

  • Paper available on arXiv: https://arxiv.org/abs/1912.02074

  • The original TensorFlow implementation is available at https://github.com/google-research/google-research/tree/master/algae_dice

You can cite this code base as:

@misc{pytorchrl,
  author = {Arnob, SY},
  title = {PyTorch Implementations of DICE Algorithms},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SaminYeasar/PyTorch-implementation-DICE-algorithms}},
}

Basic Commands

Run AlgaeDICE on HalfCheetah:

python -m algae_dice.train_eval --logtostderr --save_dir=$HOME/algae/ \
    --env_name=HalfCheetah-v2 --seed=42

Important tricks

  • Double-Q learning and a mixed critic update are important for training AlgaeDICE (see the first sketch after this list).
  • Unlike the original implementation, there is no separate buffer for initial states; every state sampled from the replay buffer is also treated as an initial state for the agent (see the second sketch below). A similar assumption is made in https://arxiv.org/abs/1912.05032.
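
As an illustration of the first trick, here is a minimal sketch of clipped double-Q learning combined with one plausible reading of a "mixed" critic update, namely a convex combination of a standard squared TD loss and a raw Bellman-residual term. All names, signatures, and the mixing weight below are assumptions for illustration, not the repository's actual API.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinCritic(nn.Module):
    # Two independent Q-networks; taking the minimum of their target
    # values is the standard clipped double-Q trick.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        def make_q():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))
        self.q1, self.q2 = make_q(), make_q()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)

def mixed_critic_loss(critic, target_critic, policy, batch,
                      gamma=0.99, mix=0.5):  # `mix` is a hypothetical knob
    state, action, reward, next_state = batch
    with torch.no_grad():
        next_action = policy(next_state)
        tq1, tq2 = target_critic(next_state, next_action)
        # Clipped double-Q target: minimum over the two target critics.
        target_q = reward + gamma * torch.min(tq1, tq2)
    q1, q2 = critic(state, action)
    td_loss = F.mse_loss(q1, target_q) + F.mse_loss(q2, target_q)
    # Hypothetical "mixed" update: blend the squared TD loss with the
    # raw Bellman residual.
    residual = (target_q - q1).mean() + (target_q - q2).mean()
    return mix * td_loss + (1.0 - mix) * residual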
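
For the second trick, a sketch of how initial states can be drawn from the ordinary replay buffer rather than a dedicated initial-state buffer: the same sampled states double as the s0 inputs for the (1 - gamma) * E[nu(s0, pi(s0))] term of the AlgaeDICE objective. Buffer internals here are assumptions.

import numpy as np

class ReplayBuffer:
    # Minimal ring buffer; `sample` returns a transition batch and reuses
    # the same sampled states as initial states (no separate s0 buffer).
    def __init__(self, capacity, state_dim, action_dim):
        self.capacity, self.size, self.ptr = capacity, 0, 0
        self.state = np.zeros((capacity, state_dim), dtype=np.float32)
        self.action = np.zeros((capacity, action_dim), dtype=np.float32)
        self.reward = np.zeros((capacity, 1), dtype=np.float32)
        self.next_state = np.zeros((capacity, state_dim), dtype=np.float32)

    def add(self, s, a, r, s2):
        self.state[self.ptr], self.action[self.ptr] = s, a
        self.reward[self.ptr], self.next_state[self.ptr] = r, s2
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        batch = (self.state[idx], self.action[idx],
                 self.reward[idx], self.next_state[idx])
        initial_states = self.state[idx]  # every sampled state doubles as s0
        return batch, initial_states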

Performance comparison with the original implementation

  • Performance is compared over seeds 0-5, plotting the mean with a band of 1 standard deviation over 100k timesteps (see the plotting sketch below). (Papers often plot with only 75% of the variance.)
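
A minimal plotting sketch for that comparison, assuming per-seed evaluation returns were saved as a NumPy array (the file name and array shape are hypothetical):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file: rows are seeds 0-5, columns are evaluation points.
returns = np.load("halfcheetah_returns.npy")
steps = np.linspace(0, 100_000, returns.shape[1])
mean, std = returns.mean(axis=0), returns.std(axis=0)

plt.plot(steps, mean, label="AlgaeDICE (PyTorch)")
plt.fill_between(steps, mean - std, mean + std, alpha=0.3)  # mean +/- 1 std
plt.xlabel("timesteps")
plt.ylabel("average return")
plt.legend()
plt.show()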