Swinging Up Acrobot with n-Step Q-Learning

The Acrobot is a robotic arm with two links suspended vertically against gravity. It is an underactuated robot: torque can only be exerted at its elbow. Our goal is to raise its last link above a specified height, indicated by a horizontal line. To fulfill this objective, we can use the n-step Q-learning algorithm, a member of the TD(n) family. TD(n) is a multi-step extension of TD learning (e.g., Q-learning). In the context of the Acrobot, n-step Q-learning learns to select optimal actions (applying torque at the elbow) based on the current state (joint angles and angular velocities) and the expected future rewards. The reward function can be designed to give positive rewards for reaching the target height and penalties for inefficient movements or exceeding the time limit. Instead of updating the Q-value from just the immediate reward and the next state's Q-value (as in TD(0), i.e., standard TD learning), TD(n) uses the rewards collected over the next n steps plus the discounted Q-value at the n-th step. This multi-step approach allows for better credit assignment over longer horizons, potentially speeding up learning.
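
Concretely, the n-step target bootstraps from the greedy Q-value n steps ahead:

$$
G_t^{(n)} = \sum_{i=0}^{n-1} \gamma^i r_{t+i} + \gamma^n \max_a Q(s_{t+n}, a),
\qquad
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( G_t^{(n)} - Q(s_t, a_t) \right).
$$

The sketch below illustrates this update on a tabular Q-function over a discretized Acrobot state. It is a minimal illustration, not the repository's actual implementation; the hyperparameter values, the `discretize` helper, and its bin bounds are all assumptions.

```python
# Minimal n-step Q-learning sketch for Acrobot-v1 (tabular, discretized state).
from collections import defaultdict, deque

import gymnasium as gym
import numpy as np

N_STEPS = 4    # n in n-step Q-learning (assumed)
GAMMA = 0.99   # discount factor (assumed)
ALPHA = 0.1    # learning rate (assumed)
EPSILON = 0.1  # epsilon-greedy exploration rate (assumed)

env = gym.make("Acrobot-v1")
n_actions = env.action_space.n  # 3 discrete torques at the elbow: -1, 0, +1

# Acrobot-v1 observations: [cos t1, sin t1, cos t2, sin t2, w1, w2],
# with |w1| <= 4*pi and |w2| <= 9*pi.
LOW = np.array([-1.0, -1.0, -1.0, -1.0, -4 * np.pi, -9 * np.pi])
HIGH = -LOW

def discretize(obs, bins=10):
    """Bucket each observation dimension into `bins` intervals (assumed scheme)."""
    ratios = (np.clip(obs, LOW, HIGH) - LOW) / (HIGH - LOW)
    return tuple((ratios * (bins - 1)).astype(int))

Q = defaultdict(lambda: np.zeros(n_actions))

# One training episode shown; real training repeats this over many episodes.
obs, _ = env.reset(seed=0)
state = discretize(obs)
buffer = deque()  # the last up-to-n (state, action, reward) transitions
done = False

while not done:
    # Epsilon-greedy action selection.
    if np.random.rand() < EPSILON:
        action = env.action_space.sample()
    else:
        action = int(np.argmax(Q[state]))

    obs, reward, terminated, truncated, _ = env.step(action)
    next_state = discretize(obs)
    done = terminated or truncated
    buffer.append((state, action, reward))

    # Once n transitions are buffered, update the oldest one with the n-step
    # return: n discounted rewards plus the bootstrapped greedy Q-value at
    # the n-th successor state.
    if len(buffer) == N_STEPS:
        G = sum(GAMMA**i * r for i, (_, _, r) in enumerate(buffer))
        if not terminated:
            G += GAMMA**N_STEPS * np.max(Q[next_state])
        s0, a0, _ = buffer.popleft()
        Q[s0][a0] += ALPHA * (G - Q[s0][a0])

    state = next_state

# Flush the episode's tail with truncated (shorter-than-n) returns,
# bootstrapping only if the episode was cut off by the time limit.
while buffer:
    G = sum(GAMMA**i * r for i, (_, _, r) in enumerate(buffer))
    if not terminated:
        G += GAMMA ** len(buffer) * np.max(Q[next_state])
    s0, a0, _ = buffer.popleft()
    Q[s0][a0] += ALPHA * (G - Q[s0][a0])
```

Note that with n = 1 this reduces to ordinary Q-learning; larger n propagates reward information further back per update at the cost of higher-variance targets.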

Experiment

A simple experiment has been conducted to showcase the Acrobot's movement under the n-step Q-learning policy. It is provided in this notebook.

Result

Reward Curve

reward_curve
A very noisy reward curve over the course of 10,201 episodes of the Acrobot's training session.

Qualitative Result

The GIF below displays the movement of the Acrobot following the policy learned via n-step Q-learning.

qualitative_acrobot
The Acrobot's main challenge: raising the last link above the horizontal threshold line as quickly as possible.
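
A greedy rollout like the one in the GIF could be recorded with a sketch such as the following, reusing `Q` and `discretize` from the training sketch above. The use of imageio here is an assumption; the repository may produce its GIF differently.

```python
# Minimal sketch: record a greedy rollout of the learned policy as a GIF.
import gymnasium as gym
import imageio
import numpy as np

env = gym.make("Acrobot-v1", render_mode="rgb_array")
obs, _ = env.reset(seed=0)
frames, done = [], False

while not done:
    action = int(np.argmax(Q[discretize(obs)]))  # greedy w.r.t. the learned Q-table
    obs, _, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    frames.append(env.render())  # RGB frame as a NumPy array

imageio.mimsave("acrobot.gif", frames, fps=30)
```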

Credit