This framework attempts to streamline the machine learning research process to enable:
- zero-effort reproducibility
- a single source of truth for which arguments are allowed for which dataset, ML model, etc.
- IntelliSense for the available arguments of a specific experiment, model, dataset, etc., including type validation through Pydantic
- code reusability (metrics tracking, the training loop, model checkpointing, checkpoint loading, wandb logging, early stopping, ... are available by default for all experiments)
- collaboration through a proper software architecture instead of copy-paste experimenting, while still allowing quick "out of framework" scripting (wild west), optionally reusing the Dataset and Deep Learning Model modules
- zero-effort logging and history plotting
The underlying assumption is that all experiments share some basic structure. They all have:
- an ML model (here a PyTorch Module with an additional interface)
- a dataset
- a training loop with a specified number of epochs
- an optimizer
- a scheduler
The idea is that each of these concepts follows an extendable base interface (e.g. BaseModel, BaseDataset, ...) so that the framework can work with it. Each of these modules also has its own Pydantic model that specifies its arguments. A specific experiment stitches the Pydantic models of the modules it uses together, and a plain Python ArgumentParser is constructed automatically from the combined Pydantic model. Arguments that do not change across experiments, such as API keys, live in a YAML config. For demonstration purposes, an experiment for MNIST (handwritten digit classification) is implemented.
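As a minimal sketch of that mechanism (with hypothetical module configs and field names, assuming Pydantic v2; the actual classes in this repository may differ):

```python
from argparse import ArgumentParser

from pydantic import BaseModel


# Hypothetical per-module configs -- only meant to illustrate the mechanism.
class DatasetConfig(BaseModel):
    batch_size: int = 32
    num_workers: int = 0


class ModelConfig(BaseModel):
    hidden_sizes: list[int] = [128]
    dropout: float = 0.0


class ExperimentConfig(DatasetConfig, ModelConfig):
    """A specific experiment stitches the module configs together."""
    epochs: int = 10
    use_cuda: bool = True


def build_parser(config_cls: type[BaseModel]) -> ArgumentParser:
    """Derive an ArgumentParser from the Pydantic model's fields."""
    parser = ArgumentParser()
    for name, field in config_cls.model_fields.items():
        parser.add_argument(f"--{name}", default=field.default, type=str)
    return parser


args = build_parser(ExperimentConfig).parse_args(["--epochs", "20"])
# Pydantic performs the actual type validation/coercion of the raw CLI strings.
config = ExperimentConfig(**vars(args))
print(config.epochs)  # -> 20
```

The point of this design is that the CLI, the IDE autocompletion, and the runtime validation all derive from the same Pydantic definition, so there is a single source of truth for the allowed arguments.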
To run the MNIST demo:
1. Fork this repository.
2. Create the conda environment from `environment.yaml`: `conda env create --file environment.yaml`
3. Run the MNIST experiment via e.g. `python run.py --experiment_id=mnist --use_cuda=false --hidden_sizes="[64]"`. You'll be prompted to fill out a config YAML that specifies the directories for cache files and experiment results. By default, these are the `cache` and `results` directories within the working directory; if you want to keep it that way, just jump to step 4. The `config.yaml` also contains WandB attributes, which you do not have to change unless you're running experiments with `--use_wandb=true`.
4. Run the command again. Training should run and you should see the experiment results in the specified folder; results are automatically grouped by experiment, and you can optionally specify a subdirectory via `--results_subdir_name=[NAME]`.
To add your own experiment:
- Either choose a reference experiment, i.e. `mnist_experiment.py`, and copy it to `src/experiments/[new_experiment_filename].py`, OR create a new file and implement all abstract methods of the BaseExperiment interface in a new BaseExperiment subclass (see the sketch below this list).
- Register your experiment in the Experiment Registry file.
- Run the experiment via `python run.py --experiment_id=[EXPERIMENT_ID] [OTHER_ARGUMENTS...]`
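For orientation, here is a rough sketch of what such a subclass could look like. The method names and the registration comment are hypothetical; the authoritative list of abstract methods is the BaseExperiment interface in this repository.

```python
# Hypothetical sketch of src/experiments/my_experiment.py.
# The real abstract methods and registration mechanism are defined by this
# repository's BaseExperiment interface and Experiment Registry and may differ.
import torch
from torch import nn
from torch.utils.data import Dataset


class MyExperiment:  # in the framework this would subclass BaseExperiment
    def build_model(self) -> nn.Module:
        # Return the PyTorch model (wrapped in the framework's model interface).
        return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

    def build_dataset(self) -> Dataset:
        # Return the dataset the shared training loop should iterate over.
        raise NotImplementedError

    def build_optimizer(self, model: nn.Module) -> torch.optim.Optimizer:
        # Optimizer/scheduler setup; the training loop, checkpointing and
        # logging are provided by the framework, not re-implemented here.
        return torch.optim.Adam(model.parameters(), lr=1e-3)


# Registration would then be a single entry in the Experiment Registry,
# mapping an experiment_id such as "my_experiment" to this class.
```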
