TRPO + GAE

An implementation of Trust Region Policy Optimization (Schulman et al., 2015) with Generalized Advantage Estimation (Schulman et al., 2016). It supports environments with either discrete or continuous action spaces.
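As a rough illustration of the advantage estimator named above, here is a minimal NumPy sketch of Generalized Advantage Estimation for a single trajectory segment. It is not taken from this repository: the function name `gae_advantages`, the `last_value` bootstrap argument, and the default `gamma` and `lam` values are all assumptions made for the example, and episode-termination handling is left out for brevity.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Hypothetical helper: GAE advantages for one trajectory segment.

    rewards:    r_0 .. r_{T-1}
    values:     V(s_0) .. V(s_{T-1}) from a value-function estimate
    last_value: V(s_T), used to bootstrap past the end of the segment
                (pass 0.0 if the segment ended in a terminal state)
    gamma, lam: discount and GAE lambda (default values are assumptions)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)

    # V(s_{t+1}) for every step, with the bootstrap value appended at the end.
    next_values = np.append(values[1:], last_value)

    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    deltas = rewards + gamma * next_values - values

    # Discounted sum of residuals, accumulated backwards in time:
    # A_t = delta_t + (gamma * lam) * A_{t+1}.
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` this reduces to the one-step TD residual, and with `lam=1` it becomes the discounted return minus the value baseline, so `lam` trades bias against variance in the policy-gradient estimate.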

Results

Below are this implementation's results on three different simulated locomotion tasks, each averaged over five runs.

[Figure: three result plots, one per locomotion task]