PyTorch implementation of AlgaeDICE, as described in the paper:
"AlgaeDICE: Policy Gradient from Arbitrary Experience" by Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, and Dale Schuurmans.
The paper is available on arXiv here.
The original TensorFlow implementation is here.
You can cite this code base as:
author = {Arnob, SY},
title = {PyTorch Implementations of DICE Algorithms},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{}},
Run AlgaeDICE on HalfCheetah:
python -m algae_dice.train_eval --logtostderr --save_dir=$HOME/algae/ \
--env_name=HalfCheetah-v2 --seed=42
- Double-Q learning and the mixed critic update are important for training AlgaeDICE.
- Unlike the original implementation, there is no separate buffer for initial states; instead, every state is treated as an initial state for the agent. A similar assumption is made [here] (
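To illustrate what treating every state as an initial state means, here is a minimal sketch of the AlgaeDICE critic objective from the paper, L(nu, pi) = (1 - gamma) * E[nu(s0, a0)] + alpha * E_D[f*((r + gamma * nu(s', a') - nu(s, a)) / alpha)], assuming f(x) = x^2 / 2 (so f*(y) = y^2 / 2). It is written in plain Python rather than PyTorch, and it is not this repository's code. The names `algae_critic_objective`, `algae_objective_batch`, `nu_fn`, and `policy` are hypothetical. Because there is no initial-state buffer, the first term is evaluated on the same batch states used for the Bellman residual. The double-Q and mixed critic updates are not shown.

```python
from typing import Callable, Sequence, Tuple


def conjugate_sq(y: float) -> float:
    """Convex conjugate of f(x) = x**2 / 2, which is f*(y) = y**2 / 2 (assumed choice of f)."""
    return 0.5 * y * y


def algae_critic_objective(
    nu_init: Sequence[float],  # nu(s0, a0), a0 ~ pi(.|s0)
    nu: Sequence[float],       # nu(s, a) for dataset transitions
    nu_next: Sequence[float],  # nu(s', a'), a' ~ pi(.|s')
    rewards: Sequence[float],
    discount: float,
    alpha: float,
) -> float:
    """Objective that the critic nu minimizes (and the policy maximizes)."""
    init_term = (1.0 - discount) * sum(nu_init) / len(nu_init)
    residual_term = 0.0
    for v, v_next, r in zip(nu, nu_next, rewards):
        # Bellman residual: r + gamma * nu(s', a') - nu(s, a)
        residual = r + discount * v_next - v
        residual_term += alpha * conjugate_sq(residual / alpha)
    return init_term + residual_term / len(nu)


def algae_objective_batch(
    batch: Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]],
    nu_fn: Callable[[float, float], float],  # hypothetical critic nu(s, a)
    policy: Callable[[float], float],        # hypothetical deterministic policy
    discount: float,
    alpha: float,
) -> float:
    states, actions, rewards, next_states = batch
    # No separate initial-state buffer: every batch state doubles as s0.
    nu_init = [nu_fn(s, policy(s)) for s in states]
    nu = [nu_fn(s, a) for s, a in zip(states, actions)]
    nu_next = [nu_fn(s2, policy(s2)) for s2 in next_states]
    return algae_critic_objective(nu_init, nu, nu_next, rewards, discount, alpha)


if __name__ == "__main__":
    batch = ([1.0], [0.0], [0.0], [1.0])  # states, actions, rewards, next_states
    print(algae_objective_batch(batch, lambda s, a: s + a, lambda s: 0.0,
                                discount=0.5, alpha=1.0))  # 0.625
```

In a PyTorch training loop, the same quantities would be tensors, and the loss would be minimized with respect to the critic parameters and maximized with respect to the policy parameters.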