A Python Library for Usefulness Simulations of Machine Learning Models
Corresponding paper: APLUS - Journal of Biomedical Informatics
Citation:
@article{wornow2023aplus,
title={APLUS: A Python Library for Usefulness Simulations of Machine Learning Models in Healthcare},
author={Wornow, Michael and Ross, Elsie Gyang and Callahan, Alison and Shah, Nigam H},
journal={Journal of Biomedical Informatics},
pages={104319},
year={2023},
publisher={Elsevier}
}
- Run the following command to install APLUS ML:
pip install aplusml
- Install Graphviz by downloading it from the Graphviz website. If you're on a Mac with Homebrew, simply run:
brew install graphviz
Run tutorials/synthetic_pad.ipynb to try an example notebook that works out of the box.
This simulates a utility analysis of PAD referral pathways for synthetic PAD patients.
APLUS ML is a simulation framework for conducting usefulness assessments of machine learning models in workflows.
It aims to quantitatively answer the question: If I use this ML model within this workflow, will the benefits outweigh the costs, and by how much?
APLUS was originally developed for clinical workflows in healthcare settings, so all of our examples are healthcare workflows. However, APLUS ML applies to any workflow in which a machine learning model makes decisions on a stream of datapoints, and we encourage contributors from any domain to use and extend it.
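To make the cost-benefit question above concrete, here is a minimal sketch of the underlying arithmetic. It is not APLUS code and it ignores workflow steps entirely: the `Utilities` class, the `mean_utility` function, and all of the numbers are hypothetical, chosen only to illustrate comparing a model-driven policy against a baseline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Utilities:
    """Hypothetical utility of each outcome, in arbitrary units."""
    true_positive: float
    false_positive: float
    true_negative: float
    false_negative: float


def mean_utility(scores: Sequence[float], labels: Sequence[int],
                 threshold: float, utilities: Utilities) -> float:
    """Average utility per patient when acting on every score >= threshold."""
    total = 0.0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            total += utilities.true_positive
        elif flagged:
            total += utilities.false_positive
        elif label:
            total += utilities.false_negative
        else:
            total += utilities.true_negative
    return total / len(scores)


# Made-up model scores and ground-truth labels for four patients
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 0, 1, 0]
u = Utilities(true_positive=10.0, false_positive=-2.0,
              true_negative=0.0, false_negative=-5.0)

# Baseline: a threshold above every score means nobody is flagged
treat_none = mean_utility(scores, labels, threshold=1.1, utilities=u)
# Model-driven policy: flag every patient scoring at least 0.5
with_model = mean_utility(scores, labels, threshold=0.5, utilities=u)

print(f"Baseline utility per patient: {treat_none}")
print(f"Model utility per patient:    {with_model}")
print(f"Net benefit of using model:   {with_model - treat_none}")
```

A full usefulness simulation replaces this single threshold decision with the multi-step workflow defined in a YAML file, but the final comparison has the same shape: utility achieved with the model versus utility achieved without it.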
We showcase APLUS on two clinical workflows:
- Early detection of peripheral artery disease (PAD)
- Triaging patients for advanced care planning (ACP) consults
Jupyter notebooks for these use cases can be found in the tutorials/ folder.
The code used to generate the figures in our paper is located in the tutorials/ directory in pad.ipynb. This notebook loads de-identified patient data from Stanford Hospital, which can be provided upon request.
The workflows analyzed can be found in the workflows/ folder. The doctor-driven workflow is in pad_doctor.yaml, while the nurse-driven workflow is in pad_nurse.yaml.
The tutorials/pad.ipynb notebook was used to generate figures in the APLUS paper.
The code used to replicate the findings of Jung et al. 2021 can be found in the tutorials/ directory in acp_jung_replication.ipynb. This notebook loads de-identified patient data from Stanford Hospital, which can be provided upon request.
The workflows analyzed can be found in the workflows/ folder in acp_jung_replication.yaml.
Some additional example plots that can be generated by APLUS are included below:
Supporting files:
- tutorials/ - Contains Jupyter notebooks that demonstrate how to use APLUS
  - pad.ipynb - Demonstrates how to use APLUS to simulate the novel PAD workflow described in the paper
  - pad.py - Helper functions for PAD-specific workflow analysis
  - acp_jung_replication.ipynb - Demonstrates how to use APLUS to replicate the plots of Jung et al. 2021
- workflows/ - Contains YAML files that define the workflows analyzed in the paper
  - pad_doctor.yaml - The doctor-driven PAD workflow
  - pad_nurse.yaml - The nurse-driven PAD workflow
  - acp_jung_replication.yaml - The exact same ACP workflow analyzed in Jung et al. 2021
- tests/ - Contains unit tests for the APLUS framework
  - run_tests.py - Script to run all unit tests
  - test_*.py - Tests for each module
  - test*.yaml - Workflow YAML files for each corresponding test
  - utils.py - Utility functions for testing
- input/ - Contains input data fed into the simulation
- output/ - Contains output data from the simulations (this is useful for caching results so you don't have to re-run time-consuming simulations)
Higher-level functions to load and run simulations:
- aplusml.load_simulation(path_to_config_yaml: str, path_to_patient_properties_csv: str) -> sim.Simulation
  Loads the config YAML and the CSV containing patients, and returns a Simulation object.
- aplusml.run_test(simulation: sim.Simulation, all_patients: List[sim.Patient], labels: List[str], keys2values: List[Dict]) -> pd.DataFrame
  Runs a set of simulations with different settings.
  - labels = Name for each simulation setting
  - keys2values = Variables in simulation to overwrite for each setting
Methods for simulations:
- simulation.run(patients: List[sim.Patient])
  Runs patients through the simulation.
- simulation.draw_workflow_diagram(figsize)
  Outputs a Graphviz representation of the workflow in the simulation.
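As a rough sketch of how the functions listed above might fit together, the snippet below builds the `labels` and `keys2values` arguments for a sweep over one variable and passes them to `aplusml.run_test`. Several things here are assumptions rather than documented behavior: the variable name `model_threshold` is hypothetical (real names depend on your workflow YAML), each `keys2values` entry is assumed to be a plain `{variable: value}` dictionary, and `all_patients` is taken as an argument because this section does not describe how patient objects are constructed. The helper names `build_settings` and `run_threshold_sweep` are illustrative only.

```python
from typing import Dict, List, Tuple


def build_settings(values: List[float],
                   key: str = "model_threshold") -> Tuple[List[str], List[Dict]]:
    """Build parallel lists: one label and one override dict per setting.

    `key` is a hypothetical simulation variable name; substitute a
    variable that actually exists in your workflow YAML.
    """
    labels = [f"{key}={value}" for value in values]
    keys2values = [{key: value} for value in values]
    return labels, keys2values


def run_threshold_sweep(path_to_config_yaml: str,
                        path_to_patient_properties_csv: str,
                        all_patients: list,
                        values: List[float]):
    """Run one simulation per setting and return the combined results."""
    # Imported lazily so build_settings() works without APLUS installed
    import aplusml

    simulation = aplusml.load_simulation(path_to_config_yaml,
                                         path_to_patient_properties_csv)
    labels, keys2values = build_settings(values)
    return aplusml.run_test(simulation, all_patients, labels, keys2values)


# The settings can be inspected without running any simulation
labels, keys2values = build_settings([0.25, 0.5])
print(labels)
print(keys2values)
```

Keeping `labels` and `keys2values` as parallel lists of equal length matches the signature of `run_test`, where each label names the setting at the same index. See the notebooks in tutorials/ for the authoritative usage.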
# Download repo
git clone https://github.com/som-shahlab/aplus.git
cd aplus
# Create environment
conda create -n aplus python=3.10 -y
conda activate aplus
pip install poetry && poetry install
The file tests/run_tests.py runs all of the test[d].py files in the tests/ directory. Each test[d].py file has a corresponding test[d].yaml file that serves as its input.
To run tests:
cd tests
python3 run_tests.py
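When adding a new test, it can help to confirm that every test script has its matching YAML input, as described above. The sketch below is not the code in run_tests.py; `pair_tests_with_yaml` is a hypothetical helper, and the file names used in the demonstration (test1.py and so on) are placeholders created in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def pair_tests_with_yaml(test_dir: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each test*.py script with the same-named .yaml file.

    The second element of a pair is None when the YAML file is missing.
    """
    pairs = []
    for script in sorted(Path(test_dir).glob("test*.py")):
        config = script.with_suffix(".yaml")
        pairs.append((script.name, config.name if config.exists() else None))
    return pairs


# Demonstration on placeholder files; test2.py deliberately lacks a YAML file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test1.py", "test1.yaml", "test2.py"):
        (Path(tmp) / name).touch()
    pairs = pair_tests_with_yaml(tmp)

for script_name, config_name in pairs:
    status = config_name if config_name else "MISSING YAML"
    print(f"{script_name}: {status}")
```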
We use Sphinx to build the documentation, and host it on Read the Docs.
To build the docs, run:
# View server
sphinx-autobuild docs/source docs/build/html
# Build for dist
make html