PyTorch implementation of AlgaeDICE, as described in the paper:

"AlgaeDICE: Policy Gradient from Arbitrary Experience" by Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, and Dale Schuurmans.

The paper is available on arXiv: https://arxiv.org/abs/1912.02074

The original TensorFlow implementation is available at https://github.com/google-research/google-research/tree/master/algae_dice
You can cite the code base:

```
@misc{pytorchrl,
  author = {Arnob, SY},
  title = {PyTorch Implementations of DICE Algorithms},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SaminYeasar/PyTorch-implementation-DICE-algorithms}},
}
```
Run AlgaeDICE on HalfCheetah:

```bash
python -m algae_dice.train_eval --logtostderr --save_dir=$HOME/algae/ \
  --env_name=HalfCheetah-v2 --seed=42
```
- Double-Q learning and a mixed critic update are important for training AlgaeDICE (see the first sketch after this list).
- Unlike the original implementation, there is no separate buffer for storing initial states; instead, every sampled state is treated as an initial state for the agent (see the second sketch below). A similar assumption is made [here](https://arxiv.org/abs/1912.05032).
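To make the first bullet concrete, here is a minimal sketch of a twin-critic ("double-Q") AlgaeDICE loss in PyTorch. It assumes the p = 2 conjugate f*(x) = x²/2 from the paper and one plausible reading of the "mixed" update: the residual loss is averaged over both critics, while the actor would use their pessimistic minimum (as in TD3/SAC). All names (`nu1`, `nu2`, `policy`, `algae_critic_loss`) and hyperparameter values are illustrative, not this repo's exact API.

```python
import torch

# Illustrative hyperparameters; not necessarily this repo's defaults.
GAMMA, ALPHA = 0.99, 0.01

def f_star(x):
    # Convex conjugate f*(x) = x^2 / 2, the p = 2 case from the paper.
    return 0.5 * x ** 2

def algae_critic_loss(nu1, nu2, policy, batch):
    """Twin-critic AlgaeDICE loss, averaged ("mixed") over both critics."""
    s, a, r, s_next = batch        # transitions sampled from the replay buffer
    a_next = policy(s_next)        # next action under the current policy
    total = 0.0
    for nu in (nu1, nu2):
        # Bellman residual of the nu-network under the current policy.
        delta = r + GAMMA * nu(s_next, a_next) - nu(s, a)
        # Initial-state term: every sampled state doubles as an initial state.
        init_term = (1.0 - GAMMA) * nu(s, policy(s)).mean()
        total = total + init_term + ALPHA * f_star(delta / ALPHA).mean()
    return total / 2.0
```

The critics minimize this objective while the policy is updated to maximize it, which is the adversarial max-min structure described in the paper; for the policy update, the pessimistic minimum `torch.min(nu1(s, a), nu2(s, a))` is the usual double-Q choice.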
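For the second bullet: rather than maintaining a dedicated buffer of episode-start states for the (1 − γ) · E[ν(s0, a0)] term of the objective, the batch of sampled states stands in for the initial states directly. A minimal sketch, assuming hypothetical `nu` and `policy` callables operating on PyTorch tensors:

```python
import torch

def initial_state_term(nu, policy, states, gamma=0.99):
    """(1 - gamma) * E[nu(s0, a0)], where every state in the sampled batch
    plays the role of an initial state s0 and the initial action a0 is
    drawn from the current policy."""
    return (1.0 - gamma) * nu(states, policy(states)).mean()
```

Under this simplification, the same replay batch feeds both the Bellman-residual term and the initial-state term, so no separate episode-start sampling path is needed.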