APLUS ML = A Python Library for Usefulness Simulations of Machine Learning models

som-shahlab/aplusml

APLUS ML

A Python Library for Usefulness Simulations of Machine Learning Models


Graphical Abstract

Corresponding paper: APLUS - Journal of Biomedical Informatics

Citation:

@article{wornow2023aplus,
  title={APLUS: A Python Library for Usefulness Simulations of Machine Learning Models in Healthcare},
  author={Wornow, Michael and Ross, Elsie Gyang and Callahan, Alison and Shah, Nigam H},
  journal={Journal of Biomedical Informatics},
  pages={104319},
  year={2023},
  publisher={Elsevier}
}

Installation

  1. Run the following command to install APLUS ML:
pip install aplusml
  2. Install Graphviz by downloading it from the Graphviz website. If you're on a Mac with Homebrew, simply run:
brew install graphviz
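
To confirm that the Graphviz step worked, a quick check like the one below can help. It is a minimal sketch that only looks for the standard Graphviz `dot` executable on your PATH. The helper name `graphviz_available` is ours for illustration and is not part of APLUS ML.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable is found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if graphviz_available():
        print("Graphviz found:", shutil.which("dot"))
    else:
        print("Graphviz not found on PATH; install it before drawing workflow diagrams.")
```

If the check fails right after installation, opening a new terminal so that PATH is refreshed is usually worth trying first.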

Usage

Run tutorials/synthetic_pad.ipynb to try an example notebook that works out of the box.

It simulates a utility analysis of peripheral artery disease (PAD) referral pathways for synthetic patients.

Motivation

APLUS ML is a simulation framework for conducting usefulness assessments of machine learning models in workflows.

It aims to quantitatively answer the question: If I use this ML model within this workflow, will the benefits outweigh the costs, and by how much?
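
To make the question concrete, here is a toy calculation of net benefit at different model thresholds. It is a deliberately simplified sketch and not how APLUS is implemented. The `Patient` class, the `net_utility` function, and all of the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    risk_score: float    # model output in [0, 1] (hypothetical)
    has_condition: bool  # ground-truth label (hypothetical)


def net_utility(patients: List[Patient], threshold: float,
                benefit_tp: float, cost_per_flag: float) -> float:
    """Total utility when every patient at or above `threshold` is flagged.

    Each flagged patient incurs `cost_per_flag`; each flagged patient who
    truly has the condition also yields `benefit_tp`.
    """
    total = 0.0
    for p in patients:
        if p.risk_score >= threshold:
            total -= cost_per_flag
            if p.has_condition:
                total += benefit_tp
    return total


patients = [
    Patient(0.9, True),
    Patient(0.7, False),
    Patient(0.4, True),
    Patient(0.1, False),
]

for threshold in (0.3, 0.5, 0.8):
    utility = net_utility(patients, threshold, benefit_tp=10.0, cost_per_flag=2.0)
    print(f"threshold={threshold}: net utility={utility}")
```

Even in this toy setting the best threshold depends on the costs and benefits assumed, which is the kind of trade-off a usefulness simulation is meant to quantify.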

APLUS was originally developed for clinical workflows in healthcare settings, so all of our examples are healthcare workflows. However, APLUS ML applies to any workflow in which a machine learning model makes decisions on a stream of data points, and we encourage contributors from any domain to use and extend it.

Tutorials

We showcase APLUS on two clinical workflows:

  1. Early detection of peripheral artery disease (PAD)
  2. Triaging patients for advanced care planning (ACP) consults

Jupyter notebooks for these use cases can be found in the tutorials/ folder.

Early Detection of PAD

The code used to generate the figures in our paper is in tutorials/pad.ipynb. This notebook loads de-identified patient data from Stanford Hospital, which can be provided upon request.

The workflows analyzed can be found in the workflows/ folder. The doctor-driven workflow is in pad_doctor.yaml, while the nurse-driven workflow is in pad_nurse.yaml.

The tutorials/pad.ipynb notebook was used to generate the following figures from the APLUS paper:

PAD Figure 1

PAD Figure 2

Triaging Patients for ACP Consults

The code used to replicate the findings of Jung et al. 2021 is in tutorials/acp_jung_replication.ipynb. This notebook loads de-identified patient data from Stanford Hospital, which can be provided upon request.

The workflow analyzed is in workflows/acp_jung_replication.yaml.

ACP Figure

Plot Gallery

Some additional example plots that can be generated by APLUS are included below:

Additional Plots

API

Supporting files:

  • tutorials/ - Contains Jupyter notebooks that demonstrate how to use APLUS
    • pad.ipynb - Demonstrates how to use APLUS to simulate the novel PAD workflow described in the paper
      • pad.py - Helper functions for PAD-specific workflow analysis
    • acp_jung_replication.ipynb - Demonstrates how to use APLUS to replicate the plots of Jung et al. 2021
  • workflows/ - Contains YAML files that define the workflows analyzed in the paper
    • pad_doctor.yaml - The doctor-driven PAD workflow
    • pad_nurse.yaml - The nurse-driven PAD workflow
    • acp_jung_replication.yaml - The exact same ACP workflow analyzed in Jung et al. 2021
  • tests/ - Contains unit tests for the APLUS framework
    • run_tests.py - Script to run all unit tests
    • test_*.py - Tests for each module
    • test*.yaml - Workflow YAML files for each corresponding test
    • utils.py - Utility functions for testing
  • input/ - Contains input data fed into the simulation
  • output/ - Contains output data from the simulations (this is useful for caching results so you don't have to re-run time-consuming simulations)

Higher-level functions for loading and running simulations:

  • aplusml.load_simulation(path_to_config_yaml: str, path_to_patient_properties_csv: str) -> sim.Simulation - loads the config YAML and the CSV containing patients, and returns a Simulation object
  • aplusml.run_test(simulation: sim.Simulation, all_patients: List[sim.Patient], labels: List[str], keys2values: List[Dict]) -> pd.DataFrame - runs a set of simulations with different settings
    • labels - Name for each simulation setting
    • keys2values - Variables in the simulation to overwrite for each setting

Methods for simulations:

  • simulation.run(patients: List[sim.Patient]) - runs patients through the simulation
  • simulation.draw_workflow_diagram(figsize) - outputs a Graphviz representation of the simulation's workflow
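
The sketch below shows one way the `labels` and `keys2values` arguments of `aplusml.run_test` might be put together for a sweep over a single variable. It rests on several assumptions: that each `keys2values` entry maps a variable name to its override value, that a variable called `model_threshold` exists in your workflow YAML (the name is hypothetical), and that you already have a list of `sim.Patient` objects to pass in. The helpers `build_sweep` and `run_sweep` are ours, not part of the library. The tutorial notebooks show the exact usage.

```python
from typing import Any, Dict, List, Tuple


def build_sweep(variable: str, values: List[float]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Build parallel `labels` and `keys2values` lists, one entry per setting."""
    labels = [f"{variable}={v:.2f}" for v in values]
    keys2values = [{variable: v} for v in values]
    return labels, keys2values


def run_sweep(path_to_config_yaml: str, path_to_patient_properties_csv: str,
              all_patients: list, labels: List[str], keys2values: List[Dict[str, Any]]):
    """Load a simulation and run it once per setting, using the documented signatures."""
    # Imported lazily so build_sweep can be used without the package installed.
    import aplusml

    simulation = aplusml.load_simulation(path_to_config_yaml, path_to_patient_properties_csv)
    return aplusml.run_test(simulation, all_patients, labels, keys2values)


# "model_threshold" is a hypothetical variable name used only for illustration.
labels, keys2values = build_sweep("model_threshold", [0.25, 0.5, 0.75])
print(labels)
print(keys2values)
```

Keeping `labels` and `keys2values` the same length and in the same order is the point of building them together in one helper.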

Development

Installation

# Download repo
git clone https://github.com/som-shahlab/aplus.git
cd aplus

# Create environment
conda create -n aplus python=3.10 -y
conda activate aplus
pip install poetry && poetry install

Tests

The file tests/run_tests.py runs all of the test[d].py files in the tests/ directory. Each test[d].py file has a corresponding test[d].yaml file that serves as its input.

To run tests:

cd tests
python3 run_tests.py

Documentation

We use Sphinx to build the documentation, and host it on Read the Docs.

To build the docs, run:

# Serve the docs locally with live reload
sphinx-autobuild docs/source docs/build/html

# Build static HTML for distribution
make html
