The Acrobot is a two-link robotic arm that hangs downward from a fixed pivot under gravity. It is underactuated: torque can be applied only at the elbow joint between the two links, not at the pivot. The goal is to swing the free end of the last link above a specified height, indicated by a horizontal line. To achieve this, we can use n-step Q-learning, a member of the TD(n) family of algorithms, which extend one-step TD methods such as Q-learning to multiple steps. In the context of the Acrobot, n-step Q-learning learns to select actions (the torque applied at the elbow) based on the current state (joint angles and angular velocities) and the expected future rewards. We could design the reward function to provide positive rewards for reaching the target height and penalties for inefficient movements or exceeding time limits.
Standard one-step TD learning, TD(0), updates a Q-value using only the immediate reward and the Q-value of the next state. TD(n) instead uses the discounted sum of the rewards collected over the next n steps plus the discounted Q-value of the state reached after those n steps. Each update therefore carries reward information back across more steps, which allows better credit assignment over longer horizons and can speed up learning.
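The n-step target described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the notebook: the function names `n_step_target` and `td_update`, the learning rate `alpha`, and the example numbers are all assumptions made for this sketch. The example uses a reward of -1 per step, which is one common convention for the Acrobot and differs from the reward design suggested above.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_q_values: Sequence[float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Discounted sum of n rewards plus the discounted greedy Q-value after n steps.

    rewards            -- r_0 ... r_{n-1}, collected over the next n steps
    bootstrap_q_values -- Q(s_n, a) for every action a in the state reached after n steps
    terminal           -- if True, the episode ended within the n steps, so no bootstrap term
    """
    target = 0.0
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    if not terminal:
        # Q-learning bootstraps from the greedy (maximum) action value.
        target += (gamma ** len(rewards)) * max(bootstrap_q_values)
    return target


def td_update(q_sa: float, target: float, alpha: float = 0.1) -> float:
    """Move the current estimate Q(s, a) a fraction alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


# Hypothetical numbers: three steps with reward -1 each, and three possible
# torque actions in the state reached after those steps.
target = n_step_target([-1.0, -1.0, -1.0], [-2.0, -4.0, -8.0], gamma=0.5)
new_q = td_update(0.0, target, alpha=0.5)
print(target, new_q)
```

With `n = 1` the same function reduces to the ordinary one-step Q-learning target, which makes the comparison with TD(0) concrete.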
A simple experiment showcasing the Acrobot's movement under the n-step Q-learning policy is provided in this notebook.
A very noisy reward curve over the 10201 episodes of the Acrobot's training session.
The GIF below displays the movement of the Acrobot following the policy learned by n-step Q-learning.
Acrobot's main challenge: raise the last link above the horizontal threshold line as quickly as possible.
- Acrobots, Cart-Poles, and Quadrotors
- Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
- Acrobot
- Reinforcement Learning: Industrial Applications of Intelligent Agents
- n-step reinforcement learning
- Learning from Delayed Rewards
- Crossbar Adaptive Array: The first connectionist network that solved the delayed reinforcement learning problem
- Asynchronous Methods for Deep Reinforcement Learning
- Incremental Multi-Step Q-Learning
- Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
- PyTorch Lightning