PyLoa is a research repository for analyzing the performance of classic on-line algorithms against modern Machine Learning approaches, specifically Reinforcement Learning. PyLoa ships with implementations of two well-known on-line problems as environments:
- (k,n)-paging-problem with a cache_size k and n pages for a sequence of page-requests
- (k,n)-coloring-problem with k colors for a graph with n vertices
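For intuition, the paging problem can be stated in a few lines of Python: an on-line algorithm must serve each page-request as it arrives, evicting a cached page on every fault. The snippet below is a plain illustration using a FIFO eviction rule; it is not PyLoa API.

```python
# A minimal (k, n)-paging instance with k = 3 cache slots and n = 5 pages,
# served by a FIFO eviction rule; illustrative only, not PyLoa API.
from collections import deque

requests = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3]  # page-requests drawn from 1..5
cache, faults = deque(maxlen=3), 0          # a full deque evicts FIFO-style

for page in requests:
    if page not in cache:
        faults += 1          # page fault: load the page, evict the oldest
        cache.append(page)

print(f"FIFO incurred {faults} page faults on {len(requests)} requests")
```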
PyLoa allows agents to be
- trained on such environments (problem definitions) that require on-line solutions,
- evaluated against commonly used heuristics or any state-of-the-art algorithm,
- exploited (extrapolation of potentially worst-case problem instances) to determine a solution's competitive ratio.
PyLoa is developed for Python 3.5+ and has the following package dependencies:
matplotlib==3.0.3
scipy==1.2.1
tensorflow==1.13.1
tqdm==4.31.1
numpy==1.16.2
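To confirm that these pinned versions are what your interpreter actually sees, a quick check (plain Python, nothing PyLoa-specific) is:

```python
# Print the installed versions of the pinned dependencies.
import matplotlib, numpy, scipy, tensorflow, tqdm

for module in (matplotlib, numpy, scipy, tensorflow, tqdm):
    print(module.__name__, module.__version__)
```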
We recommend using PyLoa within a virtual environment:

mkdir myproject
cd myproject
python3 -m venv virtualenv/
source virtualenv/bin/activate
Update pip and setuptools before continuing:
pip install --upgrade pip setuptools
Afterwards, you can install pyloa either from its latest stable release on PyPI
pip install pyloa
or from its latest development release on GitHub
pip install git+https://github.com/pyloa/PyLoa.git
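A quick way to confirm the installation succeeded is to import the package and print where it was installed from (standard Python, no PyLoa-specific API assumed):

```python
# Sanity check: the import fails if the installation did not succeed.
import pyloa
print(pyloa.__file__)   # path of the installed package
```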
PyLoa can be used in three different ways to analyze an on-line problem, each represented by a so-called runmode (train, eval, gen). Any runmode can be invoked via its positional argument and requires a Python configuration file:
pyloa {train,gen,eval} --config path/to/hyperparams.py
hyperparams.py describes the setup of the experiment at hand; it must define a dictionary named params, which in turn must contain dictionaries for the keys instance, environment and agent.
params["ìnstance"]: Must define a configuration of a subclass implementation ofpyloa.instance.InstanceGenerator, which generates problem instances for the domain. As an example, for the(k,n)-paging-problema simple generator could randomly generate a sequence of requests of lengthsequence_size, whereas each request is within [1, n].params["agent"]: Must define a configuration of a subclass implementation ofpyloa.agent.Agent, which observes a statesof its environment, acts with actionaaccordingly, receives rewardrand observes transitioned states'. For toy problem instances a simple Q-learning table implementation would suffice.params["environment"]: Must define a configuration of a subclass implementation ofpyloa.environment.Environment, which consumes a problem instance and let's the agent play until it terminates. Anenvironmentconstitutes as a problem definition.
A minimal example for learning the (5,6)-paging-problem with a QTableAgent on a
PagingEnvironment can be invoked with
pyloa train --config hyperparams.py
with hyperparams.py defined as follows:
from pyloa.instance import RandomSequenceGenerator
from pyloa.environment import DefaultPagingEnvironment
from pyloa.agent import QTableAgent
# vars
sequence_size = 1000
max_page = 6
min_page = 1
episodes = 250
# hyperparams
params = {
    'checkpoint_step': episodes // 10,
    'instance': {
        'type': RandomSequenceGenerator,
        'sequence_size': sequence_size,
        'sequence_number': episodes,
        'min_page': min_page,
        'max_page': max_page,
    },
    'environment': {
        'type': DefaultPagingEnvironment,
        'sequence_size': sequence_size,
        'cache_size': 5,
        'num_pages': max_page - min_page + 1,
    },
    'agent': {
        'type': QTableAgent,
        'discount_factor': 0.55,
        'learning_rate': 0.001,
        'epsilon': 0.0,
        'epsilon_delta': 13 / (episodes * 10),
        'epsilon_max': 0.99,
        'save_file': "/home/me/models/",
    },
}

This example is defined in examples/0_train_qtable_paging/hyperparams.py and can be run with
pyloa train --config examples/0_train_qtable_paging/hyperparams.py
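As a side note on the agent's configuration: assuming epsilon starts at the configured value and grows by epsilon_delta per episode until capped at epsilon_max (this reading of the parameters is an assumption, not documented behavior), the exploration schedule saturates well before training ends:

```python
# Hypothetical reading of the epsilon schedule from the config above:
# epsilon grows by epsilon_delta per episode, capped at epsilon_max.
episodes = 250
epsilon, epsilon_delta, epsilon_max = 0.0, 13 / (episodes * 10), 0.99

for episode in range(1, episodes + 1):
    epsilon = min(epsilon + epsilon_delta, epsilon_max)
    if epsilon == epsilon_max:
        print(f"epsilon reaches epsilon_max at episode {episode}")  # ~191
        break
```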
The resulting run can be inspected via TensorBoard:
tensorboard --logdir examples/0_train_qtable_paging/
In total, five toy examples, which can be run on any system, are defined in the examples directory.
PyLoa has three different runmodes: train, eval and gen. Slight adaptations to the configuration file are required depending on the selected runmode; we encourage checking the examples for reference (on a side note: hyperparams are loaded and validated in pyloa.utils.load). Semantically, the three runmodes stand for:
- train: An RLAgent will be trained for episode-many instances, generated by an InstanceGenerator, on its environment. Every checkpoint_step-many instances a checkpoint of the RLAgent will be saved.
- eval: All trained RLAgents nested within root_dir will be evaluated on episode-many instances, generated by an InstanceGenerator. Additionally, non-trainable agents may be defined and evaluated alongside.
- gen: Currently only applicable to the (k,n)-paging-problem. A genetic algorithm empirically determines a PagingAgent's (approximate) competitive ratio; a brute-force illustration of this measurement is sketched below.
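The competitive ratio relates an on-line algorithm's cost to the optimal offline cost on the same instance. The sketch below estimates it empirically for FIFO paging against Belady's furthest-in-future rule (the optimal offline strategy) by brute-force sampling; PyLoa's gen runmode searches for bad instances with a genetic algorithm instead, and none of the names below are PyLoa API.

```python
# Empirical competitive ratio: the worst observed ratio of on-line page faults
# to optimal offline (Belady) page faults over sampled instances. Brute-force
# illustration only; PyLoa's gen runmode uses a genetic algorithm instead.
import random
from collections import deque

def fifo_faults(requests, k):
    cache, faults = deque(maxlen=k), 0
    for page in requests:
        if page not in cache:
            faults += 1
            cache.append(page)
    return faults

def belady_faults(requests, k):
    # Optimal offline paging: on a fault, evict the page whose next use is furthest away.
    cache, faults = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            future = requests[i + 1:]
            def next_use(p):
                return future.index(p) if p in future else len(future)
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults

worst = 0.0
for _ in range(1000):
    requests = [random.randint(1, 6) for _ in range(50)]  # (5,6)-paging instances
    worst = max(worst, fifo_faults(requests, 5) / belady_faults(requests, 5))
print(f"empirical competitive ratio of FIFO: {worst:.2f}")
```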
Each runmode, if not specified otherwise, will create TFEvent-files for TensorBoard in its experiment's output directory.