
Releases: DLR-RM/rl-baselines3-zoo

v2.6.0: Refactored hyperparameter optimization

24 Mar 15:07
325ef5d

Breaking Changes

  • Upgraded to SB3 >= 2.6.0
  • Refactored hyperparameter optimization. The Optuna Journal storage backend is now supported (recommended default) and you can easily load tuned hyperparameters via the new --trial-id argument of train.py.

For example, optimize using the journal storage:

python train.py --algo ppo --env Pendulum-v1 -n 40000 --study-name demo --storage logs/demo.log --sampler tpe --n-evaluations 2 --optimize --no-optim-plots

Visualize live using optuna-dashboard:

optuna-dashboard logs/demo.log

Load hyperparameters from trial number 21 and train an agent with it:

python train.py --algo ppo --env Pendulum-v1 --study-name demo --storage logs/demo.log --trial-id 21
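
You can also inspect the study from Python. A minimal sketch, assuming Optuna >= 3.1 (on Optuna 4+, JournalFileStorage is deprecated in favor of JournalFileBackend) and reusing the study/storage names from the commands above:

import optuna
from optuna.storages import JournalFileStorage, JournalStorage

# Open the journal log written by the --optimize run above
storage = JournalStorage(JournalFileStorage("logs/demo.log"))
study = optuna.load_study(study_name="demo", storage=storage)
best = study.best_trial
print(f"Best trial: {best.number} (value={best.value})")
print(best.params)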

New Features

  • Save the exact command line used to launch a training
  • Added support for special vectorized envs (e.g. Brax, IsaacSim) by allowing the VecEnv class used to instantiate the env in the ExperimentManager to be overridden (see the sketch after this list)
  • Allowed disabling auto-logging by passing --log-interval -2 (useful when logging things manually)
  • Added Gymnasium v1.1 support
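
A minimal sketch of the VecEnv override. The vec_env_class attribute name is an assumption here (check rl_zoo3/exp_manager.py for the exact hook), and LoggingVecEnv is only a stand-in for a real Brax/IsaacSim-backed VecEnv:

from stable_baselines3.common.vec_env import DummyVecEnv

from rl_zoo3.exp_manager import ExperimentManager

class LoggingVecEnv(DummyVecEnv):
    # Stand-in for a special VecEnv (e.g. one wrapping Brax)
    def reset(self):
        print("custom VecEnv in use")
        return super().reset()

# Hypothetical override point: the actual attribute name may differ
ExperimentManager.vec_env_class = LoggingVecEnv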

Bug fixes

  • Fixed use of the old HF API in get_hf_trained_models()

Other

  • scripts/parse_study.py is now deprecated in favor of the new hyperparameter optimization scripts

Full Changelog: v2.5.0...v2.6.0

RL Zoo v2.5.0: NumPy v2.0 support

27 Jan 12:48
2e99bec

Breaking Changes

  • Upgraded to PyTorch >= 2.3.0
  • Upgraded to SB3 >= 2.5.0

New Features

  • Added support for NumPy v2
  • Added support for specifying callbacks and env wrappers as Python objects in Python config files (instead of strings), as shown below
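
A sketch of what this enables in a Python config file; the hyperparams dict layout follows the RL Zoo docs, and the values are illustrative:

from gymnasium.wrappers import FlattenObservation
from stable_baselines3.common.callbacks import CheckpointCallback

hyperparams = {
    "Pendulum-v1": dict(
        n_timesteps=100_000,
        policy="MlpPolicy",
        # Class object instead of the string "gymnasium.wrappers.FlattenObservation"
        env_wrapper=FlattenObservation,
        # Callback instance instead of a string
        callback=CheckpointCallback(save_freq=10_000, save_path="logs/"),
    )
}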

Other

  • Updated Dockerfile

Full Changelog: v2.4.0...v2.5.0

RL-Zoo3 v2.4.0: CrossQ and Gymnasium v1.0 support

18 Nov 10:35
b8ff1a6

New algorithm: CrossQ, Gymnasium v1.0 support, and better defaults for SAC/TQC on the Swimmer-v4 env

Breaking Changes

  • Updated default hyperparameters for TQC/SAC on Swimmer-v4 (decreased gamma for more consistent results) (@JacobHA) W&B report
  • Upgraded to SB3 >= 2.4.0
  • Renamed LunarLander-v2 to LunarLander-v3 in hyperparameters

New Features

  • Added CrossQ hyperparameters for SB3-contrib (@danielpalen)
  • Added Gymnasium v1.0 support

Other

  • Updated PyTorch version to 2.4.1 in the CI
  • Switched to uv to download packages faster on GitHub CI

Full Changelog: v2.3.0...v2.4.0

RL-Zoo3 v2.3.0

31 Mar 19:07
e06914e

Breaking Changes

  • Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
  • Upgraded MuJoCo env hyperparameters to v4 (pre-trained agents need to be updated)
  • Upgraded to SB3 >= 2.3.0

Other

  • Added test dependencies to setup.py (@power-edge)
  • Simplified the dependencies in requirements.txt (removed duplicates from setup.py)

Full Changelog: v2.2.1...v2.3.0

RL-Zoo3 v2.2.1

17 Nov 23:39
28dc228

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

  • Removed the gym dependency; the package is still required for some pretrained agents.
  • Upgraded to SB3 >= 2.2.1
  • Upgraded to Huggingface-SB3 >= 3.0
  • Upgraded to pytablewriter >= 1.0

Bug fixes

  • Upgraded to pybullet_envs_gymnasium>=0.4.0
  • Removed old hacks (for instance, limiting off-policy algorithms to one env at test time)

Other

  • Updated the Docker image and removed support for the X server
  • Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
  • Switched to ruff for sorting imports
  • Updated tests to use shlex.split()
  • Fixed rl_zoo3/hyperparams_opt.py type hints
  • Fixed rl_zoo3/exp_manager.py type hints

RL-Zoo3 v2.1.0

20 Aug 12:17
7f98df9

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

  • Dropped Python 3.7 support
  • SB3 now requires PyTorch 1.13+
  • Upgraded to SB3 >= 2.1.0
  • Upgraded to Huggingface-SB3 >= 2.3
  • Upgraded to Optuna >= 3.0
  • Upgraded to cloudpickle >= 2.2.1

New Features

  • Added Python 3.11 support

Full Changelog: v2.0.0...v2.1.0

RL-Zoo3 v2.0.0: Gymnasium Support

23 Jun 13:00
07f7447

Warning
Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023).
We highly recommend you upgrade to Python >= 3.8.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (RL Zoo depends on SB3 and SB3 Contrib):

pip install rl_zoo3 --upgrade

Breaking Changes

  • Fixed a bug in HistoryWrapper: it now returns the correct obs space limits
  • Upgraded to SB3 >= 2.0.0
  • Upgraded to Huggingface-SB3 >= 2.2.5
  • Upgraded to the Gym 0.26+ API; RL Zoo3 no longer works with Gym 0.21

New Features

  • Added Gymnasium support
  • Added Gym 0.26+ patches to continue working with pybullet and the TimeLimit wrapper

Bug fixes

  • Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
  • Hugging Face push-to-hub now accepts a --n-timesteps argument to adjust the length of the video
  • Fixed record_video steps (before it was stepping in a closed env)

Full Changelog: v1.8.0...v2.0.0

RL-Zoo3 v1.8.0 : New Documentation, OpenRL Benchmark, Multi-Env HerReplayBuffer

08 Apr 16:09
483319b

Release 1.8.0 (2023-04-07)

We have run a massive and open source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark

New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/

Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

Breaking Changes

  • Upgraded to SB3 >= 1.8.0
  • Upgraded to new HerReplayBuffer implementation that supports multiple envs
  • Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeouts (see the example below)
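
For reference, a minimal SB3 sketch of the new multi-env HerReplayBuffer usage (the env id is illustrative and requires panda-gym; note that online_sampling and max_episode_length are no longer passed, see Other below):

import panda_gym  # noqa: F401, registers the Panda envs (pip install panda-gym)
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.env_util import make_vec_env

# HER now works with multiple envs out of the box
env = make_vec_env("PandaReach-v3", n_envs=4)
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=10_000)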

New Features

  • Tuned hyperparameters for RecurrentPPO on Swimmer
  • Documentation is now built using Sphinx and hosted on Read the Docs
  • Open RL Benchmark

Bug fixes

  • Set highway-env version to 1.5 and setuptools to v65.5 for the CI
  • Removed use_auth_token from the push-to-hub util
  • Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
  • Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)

Other

  • Added support for ruff (fast alternative to flake8) in the Makefile
  • Removed Gitlab CI file
  • Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
  • Switched to ruff and pyproject.toml
  • Removed the online_sampling and max_episode_length arguments when using HerReplayBuffer

RL-Zoo3 v1.7.0 : Added support for python config files

10 Jan 22:22
acdfc93

Release 1.7.0 (2023-01-10)

SB3 v1.7.0, with added support for Python config files

We are currently creating an open source benchmark; please read openrlbenchmark/openrlbenchmark#7 if you want to help

Breaking Changes

  • The --yaml-file argument was renamed to -conf (--conf-file), as Python files are now supported too
  • Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))

New Features

  • Specifying custom policies in YAML files is now supported (@Rick-v-E)
  • Added monitor_kwargs parameter
  • Handle env_kwargs with render: True under the hood for panda-gym v1 envs in enjoy replays, to match the visualization behavior of other envs
  • Added support for Python config files (see the example after this list)
  • Tuned hyperparameters for PPO on Swimmer
  • Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
  • Added a sb3 version tag to the wandb run
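
A minimal sketch of such a Python config file (the file path and values are illustrative; the hyperparams dict layout follows the RL Zoo docs):

# hyperparams/python/ppo_pendulum.py
hyperparams = {
    "Pendulum-v1": dict(
        n_timesteps=100_000,
        policy="MlpPolicy",
        n_envs=4,
    )
}

which you would load with the renamed argument:

python train.py --algo ppo --env Pendulum-v1 -conf hyperparams/python/ppo_pendulum.py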

Bug fixes

  • Allowed python -m rl_zoo3.cli to be called directly
  • Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
  • Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv

Other

  • scripts/plot_train.py plots models such that newer models appear on top of older ones.
  • Added additional type checking using mypy
  • Standardized the use of from gym import spaces

RL-Zoo3 v1.6.2: The RL Zoo is now a package!

03 Oct 16:13
b372e9a

Highlights

You can now install the RL Zoo via pip: pip install rl-zoo3. It also provides a basic command line interface (rl_zoo3 train|enjoy|plot_train|all_plots) with the same interface as the scripts (train.py|enjoy.py|...).
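
For example, training via the CLI takes the same flags as train.py:

rl_zoo3 train --algo ppo --env CartPole-v1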

You can use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).

File: train.py (you can use python train.py --algo sbx_tqc --env Pendulum-v1 afterward)

import rl_zoo3
import rl_zoo3.exp_manager
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC

# Register the new algorithm in every module that keeps its own ALGOS dict
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()

Breaking Changes

  • RL Zoo is now a Python package
  • The low pass filter was removed

New Features

  • RL Zoo CLI: rl_zoo3 train and rl_zoo3 enjoy