APLUS ML

A Python Library for Usefulness Simulations of Machine Learning Models

Corresponding paper: APLUS - Journal of Biomedical Informatics

Citation:

@article{wornow2023aplus,
  title={APLUS: A Python Library for Usefulness Simulations of Machine Learning Models in Healthcare},
  author={Wornow, Michael and Ross, Elsie Gyang and Callahan, Alison and Shah, Nigam H},
  journal={Journal of Biomedical Informatics},
  pages={104319},
  year={2023},
  publisher={Elsevier}
}

Installation

Run the following command to install APLUS ML:

pip install aplusml

Install Graphviz by downloading it from the Graphviz website (graphviz.org). On macOS with Homebrew, run:

brew install graphviz
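Workflow diagrams are rendered with Graphviz, so it can help to confirm the install before opening the tutorials. The sketch below is not part of APLUS ML. It uses only the Python standard library and assumes that your Graphviz install puts the standard dot executable on your PATH.

```python
import shutil
import subprocess
from typing import Optional


def graphviz_available() -> bool:
    """Return True if the Graphviz `dot` executable is on the PATH."""
    return shutil.which("dot") is not None


def graphviz_version() -> Optional[str]:
    """Return the banner printed by `dot -V`, or None if Graphviz is not installed."""
    dot = shutil.which("dot")
    if dot is None:
        return None
    # Graphviz prints its version banner to stderr; fall back to stdout just in case.
    result = subprocess.run([dot, "-V"], capture_output=True, text=True, check=False)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    if graphviz_available():
        print(graphviz_version())
    else:
        print("Graphviz 'dot' not found on PATH; install Graphviz before drawing workflow diagrams.")
```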

Usage

To try an example that works out of the box, run the notebook tutorials/synthetic_pad.ipynb.

It simulates a utility analysis of peripheral artery disease (PAD) referral pathways for synthetic PAD patients.

Motivation

APLUS ML is a simulation framework for conducting usefulness assessments of machine learning models in workflows.

It aims to quantitatively answer the question: If I use this ML model within this workflow, will the benefits outweigh the costs, and by how much?

APLUS was originally developed for clinical workflows in healthcare settings, so all of our examples are healthcare workflows. However, APLUS ML applies to any workflow in which a machine learning model makes decisions on a stream of datapoints, and we encourage contributors from any domain to use and extend it.
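To make the core question concrete, here is a toy calculation in plain Python. It is not APLUS ML's implementation: the Patient class, the simulate function, and the utility values are hypothetical and chosen only for illustration. The model flags every patient whose score reaches a threshold, but the workflow can act on only a limited number of them; flagged patients beyond that capacity are treated as if they had not been flagged.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patient:
    score: float       # model-predicted risk; higher means more likely to have the disease
    has_disease: bool  # ground-truth outcome


# Hypothetical utility of each outcome, in arbitrary units.
UTILITIES: Dict[str, float] = {"tp": 10.0, "fp": -2.0, "fn": -8.0, "tn": 0.0}


def simulate(patients: List[Patient], threshold: float, capacity: int) -> float:
    """Mean utility per patient when the model flags everyone with score >= threshold,
    but the workflow can only act on the `capacity` highest-scoring flagged patients."""
    flagged = [i for i, p in enumerate(patients) if p.score >= threshold]
    flagged.sort(key=lambda i: patients[i].score, reverse=True)
    acted_on = set(flagged[:capacity])

    total = 0.0
    for i, p in enumerate(patients):
        if i in acted_on:
            total += UTILITIES["tp"] if p.has_disease else UTILITIES["fp"]
        else:
            total += UTILITIES["fn"] if p.has_disease else UTILITIES["tn"]
    return total / len(patients)


example_cohort = [
    Patient(0.9, True),
    Patient(0.8, False),
    Patient(0.7, True),
    Patient(0.2, False),
    Patient(0.1, True),
]

if __name__ == "__main__":
    for capacity in (0, 2, 3):
        print(capacity, simulate(example_cohort, threshold=0.5, capacity=capacity))
```

With the example cohort and a threshold of 0.5, the mean utility is -4.8 per patient when no one can be acted on, -1.6 when two flagged patients can be acted on, and 2.0 when three can. The model and threshold are unchanged; only the workflow's capacity differs, which is the kind of effect a usefulness simulation is meant to expose.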

Tutorials

We showcase APLUS on two clinical workflows:

- Early detection of peripheral artery disease (PAD)
- Triaging patients for advanced care planning (ACP) consults

Jupyter notebooks for these use cases can be found in the tutorials/ folder.

Early Detection of PAD

The code used to generate the figures in our paper is in tutorials/pad.ipynb. This notebook loads de-identified patient data from Stanford Hospital, which can be provided upon request.

The analyzed workflows are defined in the workflows/ folder: the doctor-driven workflow is in pad_doctor.yaml, and the nurse-driven workflow is in pad_nurse.yaml.

The notebook tutorials/pad.ipynb was used to generate the following figures from the APLUS paper:

Triaging Patients for ACP Consults

The code used to replicate the findings of Jung et al. 2021 is in tutorials/acp_jung_replication.ipynb. This notebook loads de-identified patient data from Stanford Hospital, which can be provided upon request.

The analyzed workflow is defined in workflows/acp_jung_replication.yaml.

Plot Gallery

Some additional example plots that can be generated by APLUS are included below:

API

Supporting files:

tutorials/ - Contains Jupyter notebooks that demonstrate how to use APLUS
- pad.ipynb - Demonstrates how to use APLUS to simulate the novel PAD workflow described in the paper
  - pad.py - Helper functions for PAD-specific workflow analysis
- acp_jung_replication.ipynb - Demonstrates how to use APLUS to replicate the plots of Jung et al. 2021
workflows/ - Contains YAML files that define the workflows analyzed in the paper
- pad_doctor.yaml - The doctor-driven PAD workflow
- pad_nurse.yaml - The nurse-driven PAD workflow
- acp_jung_replication.yaml - The exact same ACP workflow analyzed in Jung et al. 2021
tests/ - Contains unit tests for the APLUS framework
- run_tests.py - Script to run all unit tests
- test_*.py - Tests for each module
- test*.yaml - Workflow YAML files for each corresponding test
- utils.py - Utility functions for testing
input/ - Contains input data fed into the simulation
output/ - Contains output data from the simulations (this is useful for caching results so you don't have to re-run time-consuming simulations)
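The output/ folder exists so that expensive simulation results can be reused. The repository's own caching code is not described in this README, so the following is a hypothetical helper built on the standard library's pickle module; the cache file name in the commented example is also hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Union


def cached(path: Union[str, Path], compute: Callable[[], Any]) -> Any:
    """Return the result pickled at `path` if it exists; otherwise call `compute()`,
    pickle its result to `path` (creating parent folders), and return it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


# Example (hypothetical file name; `simulation`, `all_patients`, `labels`, and
# `keys2values` are the run_test arguments described below):
# df = cached("output/pad_doctor_results.pkl",
#             lambda: aplusml.run_test(simulation, all_patients, labels, keys2values))
```

Delete the cache file whenever the workflow YAML or the patient data change, and only unpickle files you created yourself, since loading a pickle can execute arbitrary code.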

Higher-level functions to load and run simulations:

aplusml.load_simulation(path_to_config_yaml: str, path_to_patient_properties_csv: str) -> sim.Simulation - Loads the workflow config YAML and the CSV of patient properties, and returns a Simulation object
aplusml.run_test(simulation: sim.Simulation, all_patients: List[sim.Patient], labels: List[str], keys2values: List[Dict]) -> pd.DataFrame - Runs a set of simulations, one per setting, and returns a pd.DataFrame
- labels = Name of each simulation setting
- keys2values = For each setting, the simulation variables to overwrite and their new values
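Because run_test takes two parallel lists (one label per setting and one dictionary of overrides per setting), a small helper can build both from a parameter grid. This is a sketch, not part of APLUS ML, and the key names model_threshold and nurse_capacity are hypothetical placeholders. Check the exact key format run_test expects against the variables in your workflow YAML and the tutorial notebooks.

```python
import itertools
from typing import Any, Dict, List, Sequence, Tuple


def build_settings(grid: Dict[str, Sequence[Any]]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Expand a grid of variable values into parallel `labels` and `keys2values` lists,
    with one entry per combination of values."""
    keys = list(grid)
    labels: List[str] = []
    keys2values: List[Dict[str, Any]] = []
    for combo in itertools.product(*(grid[k] for k in keys)):
        setting = dict(zip(keys, combo))
        labels.append(", ".join(f"{k}={v}" for k, v in setting.items()))
        keys2values.append(setting)
    return labels, keys2values


# Hypothetical variable names; replace them with variables from your workflow YAML.
labels, keys2values = build_settings({
    "model_threshold": [0.3, 0.5],
    "nurse_capacity": [5, 10],
})
# The two lists are then passed to run_test with the signature documented above:
# results = aplusml.run_test(simulation, all_patients, labels, keys2values)
```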

Methods for simulations:

simulation.run(patients: List[sim.Patient]) - Runs the given patients through the simulation
simulation.draw_workflow_diagram(figsize) - Outputs a Graphviz representation of the simulation's workflow
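The documented calls can be chained into a minimal end-to-end helper, sketched below. It uses only the functions listed above; the helper name and its parameters are hypothetical. Building the list of sim.Patient objects is left to the caller because this README does not describe that step (the tutorial notebooks do), and the format of figsize is not documented here, so the commented example assumes a (width, height) tuple.

```python
from typing import Any, List, Optional, Tuple


def simulate_workflow(
    path_to_config_yaml: str,
    path_to_patient_properties_csv: str,
    patients: List[Any],
    figsize: Optional[Tuple[float, float]] = None,
) -> Tuple[Any, Any, Any]:
    """Load a workflow with aplusml.load_simulation, optionally draw it, and run
    `patients` (a list of sim.Patient objects) through it.

    Returns (simulation, diagram, run_result): `diagram` is whatever
    draw_workflow_diagram returns (None if no figsize is given), and
    `run_result` is whatever simulation.run returns.
    """
    import aplusml  # imported here so this module can be imported without APLUS ML installed

    simulation = aplusml.load_simulation(path_to_config_yaml, path_to_patient_properties_csv)
    diagram = simulation.draw_workflow_diagram(figsize) if figsize is not None else None
    run_result = simulation.run(patients)
    return simulation, diagram, run_result


# Example (hypothetical CSV path; assumes `patients` was built as in the tutorials):
# simulation, diagram, _ = simulate_workflow(
#     "workflows/pad_doctor.yaml", "input/patients.csv", patients, figsize=(10, 8))
```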

Development

Installation

# Download repo
git clone https://github.com/som-shahlab/aplus.git
cd aplus

# Create environment
conda create -n aplus python=3.10 -y
conda activate aplus
pip install poetry && poetry install

Tests

The script tests/run_tests.py runs all of the test_*.py files in the tests/ directory. Each test file has a corresponding test*.yaml workflow file that serves as its input.

To run tests:

cd tests
python3 run_tests.py
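When adding a test, it is easy to forget its YAML input. The hypothetical check below is not part of the repository; it assumes that each test file and its YAML input share the same file stem (for example, test_foo.py next to test_foo.yaml), which this README implies but does not spell out.

```python
from pathlib import Path
from typing import List


def tests_missing_yaml(tests_dir: str = "tests") -> List[str]:
    """List test*.py files in `tests_dir` that have no .yaml file with the same stem."""
    root = Path(tests_dir)
    return [
        test_file.name
        for test_file in sorted(root.glob("test*.py"))
        if not test_file.with_suffix(".yaml").exists()
    ]


if __name__ == "__main__":
    missing = tests_missing_yaml()
    print("All tests have YAML inputs." if not missing else f"Missing YAML for: {missing}")
```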

Documentation

We use Sphinx to build the documentation, and host it on Read the Docs.

To build the docs, run:

# Serve the docs locally, rebuilding on changes
sphinx-autobuild docs/source docs/build/html

# Build static HTML for distribution
make html
