This is the source code accompanying the paper *Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces* by Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, and Josiah P. Hanna.
To set up the conda environment:

```bash
conda env create -f environment.yml
```
Generic command:

```bash
python run_single_continual.py --outfile <result_file> --env_name <queue/nmodel> --mdp_num <0/1/2> --deployed_interaction_steps 5_000_000 --exp_name <exp_name> --reward_function <opt/stab> --seed 0 --truncated_horizon 200 --algo_name <algo_name> --lr 3e-4 --state_transformation <state_trans> --lyp_power <p> --adam_beta 0.9
```
where:

- `exp_name` can be anything
- `reward_function` is either `opt` for optimal-only or `stab` for optimal + stability
- `algo_name` is either `MW`, `PPO`, or `STOP-<suffix>`, where `<suffix>` can be anything that uniquely identifies the algorithm run by its state transformation and Lyapunov power. Example: `STOP-SL-2` denotes STOP with `symloge` and p = 2
- `state_transformation` is one of `id`, `sigmoid`, `symsqrt`, or `symloge` (see the sketch after this list)
- `lyp_power` is any floating-point number (p from the paper)
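
For reference, the sketch below shows what the four state transformations typically look like. These are the common definitions of `sigmoid`, `symsqrt`, and `symloge`, and are an assumption on our part; consult the repository source for the exact forms used.

```python
import numpy as np

# Sketch of the four --state_transformation options, assuming the common
# definitions of these functions; the exact forms live in the repository source.
def transform_state(s: np.ndarray, name: str) -> np.ndarray:
    if name == "id":
        return s                                 # identity: state left unchanged
    if name == "sigmoid":
        return 1.0 / (1.0 + np.exp(-s))          # squashes each coordinate into (0, 1)
    if name == "symsqrt":
        return np.sign(s) * np.sqrt(np.abs(s))   # symmetric square root
    if name == "symloge":
        return np.sign(s) * np.log1p(np.abs(s))  # symmetric natural log: sign(s) * ln(1 + |s|)
    raise ValueError(f"unknown state transformation: {name}")
```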
Example command:

```bash
python run_single_continual.py --outfile result_file --env_name queue --mdp_num 2 --deployed_interaction_steps 5_000_000 --exp_name test --reward_function stab --seed 0 --truncated_horizon 200 --algo_name STOP-3 --lr 3e-4 --state_transformation sigmoid --lyp_power 3 --adam_beta 0.9
```
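
For intuition about how the Lyapunov power p could enter the `stab` reward, here is a generic Lyapunov-drift bonus. This is purely illustrative and an assumption on our part (the function name, weighting, and norm choice are not taken from the paper); the actual stability reward is defined in the repository source.

```python
import numpy as np

def lyapunov_bonus(s: np.ndarray, s_next: np.ndarray, p: float) -> float:
    # Illustrative only: reward negative drift of a norm-based potential
    # V(s) = ||s||^p between consecutive states. The paper's exact stability
    # reward may differ; see run_single_continual.py.
    v, v_next = np.linalg.norm(s) ** p, np.linalg.norm(s_next) ** p
    return v - v_next  # positive when the next state is closer to the origin
```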
If you found any part of this code useful, please consider citing our paper:
```bibtex
@inproceedings{
pavse2024unbounded,
title={Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces},
author={Brahma S. Pavse and Matthew Zurek and Yudong Chen and Qiaomin Xie and Josiah P. Hanna},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=64fdhmogiD}
}
```
If you have any questions, please feel free to email: [email protected]!