Commit
Updated README with Nature Communications manuscript link.
Added skeleton code for integrating unit tests. Added additional details to applications README. Misc small code formatting changes
1 parent 9dd17fd, commit 9775d41
Showing 15 changed files with 246 additions and 1,636 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,58 @@ | ||
# Trove | ||
-- | ||
<!--[](https://travis-ci.com/som-shahlab/trove)--> | ||
<!--[](https://trove.readthedocs.io/en/latest/?badge=latest)--> | ||
[](https://trove.readthedocs.io/en/latest/?badge=latest) | ||
[](https://opensource.org/licenses/Apache-2.0) | ||
|
||
Trove is a framework for training weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers without hand-labeled training data. | ||
Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers without hand-labeled training data. | ||
|
||
We combine a range of supervision signals, including common medical ontologies such as the Unified Medical Language System (UMLS), clinical text heuristics, and other noisy labeling sources, for use with weak supervision frameworks such as [Snorkel](https://github.com/snorkel-team/snorkel). | ||
The COVID-19 pandemic has underlined the need for faster, more flexible ways of building and sharing state-of-the-art NLP/NLU tools to analyze electronic health records (EHR), scientific literature, and social media. Trove provides tools for combining freely available supervision sources such as medical ontologies from the [Unified Medical Language System (UMLS)](https://www.nlm.nih.gov/research/umls/index.html), common text heuristics, and other noisy labeling sources for use as entity *labelers* in weak supervision frameworks such as [Snorkel](https://github.com/snorkel-team/snorkel), [FlyingSquid](https://github.com/HazyResearch/flyingsquid), and others. Technical details are available in our [manuscript](https://www.nature.com/articles/s41467-021-22328-4). | ||
|
||
|
||
Technical details are available in our [manuscript](https://arxiv.org/abs/2008.01972). | ||
|
||
Trove has been used as part of several COVID-19 research efforts at Stanford. | ||
|
||
## Installation | ||
- [Continuous symptom profiling of patients screened for SARS-CoV-2](https://med.stanford.edu/covid19/research.html#data-science-and-modeling). We used a daily feed of patient notes from Stanford Health Care emergency departments to generate up-to-date [COVID-19 symptom frequency](https://docs.google.com/spreadsheets/d/1iZZvbv94fpZdC6XaiPosiniMOh18etSPliAXVlLLr1w/edit#gid=344371264) data. Funded by the [Bill & Melinda Gates Foundation](https://www.gatesfoundation.org/about/committed-grants/2020/04/inv017214). | ||
- [Estimating the efficacy of symptom-based screening for COVID-19](https://rdcu.be/chSrv) published in *npj Digital Medicine*. | ||
- Our COVID-19 symptom data was used by CMU's [DELPHI group](https://covidcast.cmu.edu/) to prioritize selection of informative features from [Google's Symptom Search Trends dataset](https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/docs/table-search-trends.md). | ||
|
||
Requirements: python 3.6, pytorch 1.0+, snorkel 0.9.5+ | ||
|
||
## Tutorials | ||
## Getting Started | ||
|
||
See `tutorials/` | ||
### Tutorials | ||
|
||
## Requirements | ||
See [`tutorials/`](https://github.com/som-shahlab/trove/tree/dev/tutorials) for Jupyter notebooks walking through an example NER application. | ||
|
||
### Installation | ||
|
||
Requirements: Python 3.6 or later. We recommend using `pip` to install the dependencies: | ||
|
||
`pip install -r requirements.txt` | ||
|
||
## Contributions | ||
We welcome all contributions to the code base! Please submit a pull request and/or start a discussion on GitHub Issues. | ||
|
||
Weakly supervised methods for programmatically building and maintaining training labels provide new opportunities for the larger community to participate in the creation of important datasets. This is especially exciting in domains such as medicine, where sharing labeled data is often challenging due to patient privacy concerns. | ||
|
||
Inspired by recent efforts such as [HuggingFace's Datasets](https://github.com/huggingface/datasets) library, | ||
we would love to start a conversation around how to support sharing labelers in service of maintaining an open task library, so that it is easier to create, deploy, and version control weakly supervised models. | ||
|
||
Tested on OSX and Linux. | ||
|
||
## Citation | ||
If you use Trove in your research, please cite [Ontology-driven weak supervision for clinical entity classification in electronic health records]() | ||
If you use Trove in your research, please cite us! | ||
|
||
Fries, J.A., Steinberg, E., Khattar, S. et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun 12, 2017 (2021). https://doi.org/10.1038/s41467-021-22328-4 | ||
|
||
``` | ||
@article{fries2021trove, | ||
title={Ontology-driven weak supervision for clinical entity classification in electronic health records}, | ||
author={Fries, Jason A and Steinberg, Ethan and Khattar, Saelig and Fleming, Scott L and Posada, Jose and Callahan, Alison and Shah, Nigam H}, | ||
journal={Nature Communications}, | ||
volume={12}, | ||
number={1}, | ||
year={2021}, | ||
publisher={Nature Publishing Group} | ||
} | ||
``` | ||
|
||
See the `manuscript` branch for the code used | ||
|
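The updated README describes ontologies and text heuristics acting as entity *labelers* whose noisy votes are combined by a weak supervision framework. The following is a minimal sketch of that idea in plain Python. It is not Trove's API: the term sets, the three labeler functions, and `majority_vote` are all hypothetical names, and an unweighted majority vote stands in for a learned label model such as the one Snorkel provides.

```python
from collections import Counter

ABSTAIN = -1   # the labeler casts no vote for this token
NEGATIVE = 0   # token is not part of an entity
POSITIVE = 1   # token is part of an entity

# Hypothetical toy term sets standing in for ontology-derived dictionaries.
DISORDER_TERMS = {"fever", "cough", "pneumonia"}
STOPWORDS = {"the", "a", "of", "and", "with"}


def lf_ontology(token):
    """Vote POSITIVE when the token appears in the toy ontology."""
    return POSITIVE if token.lower() in DISORDER_TERMS else ABSTAIN


def lf_stopword(token):
    """Vote NEGATIVE for common function words."""
    return NEGATIVE if token.lower() in STOPWORDS else ABSTAIN


def lf_suffix(token):
    """Text heuristic: tokens ending in 'itis' often name disorders."""
    return POSITIVE if token.lower().endswith("itis") else ABSTAIN


def majority_vote(token, labelers):
    """Combine labeler votes, ignoring abstains; ties and empty votes abstain."""
    votes = [v for v in (lf(token) for lf in labelers) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


LABELERS = [lf_ontology, lf_stopword, lf_suffix]
tokens = "Patient presents with fever and bronchitis".split()
labels = [majority_vote(t, LABELERS) for t in tokens]
print(list(zip(tokens, labels)))
```

Tokens that no labeler recognizes stay at `ABSTAIN`, which is what lets a downstream model train only on the tokens where at least one source had an opinion.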
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
toolz==0.11.1 | ||
tqdm==4.59.0 | ||
torch==1.8.0 | ||
requests==2.25.1 | ||
pandas==1.1.5 | ||
scipy==1.5.2 | ||
lxml==4.6.2 | ||
spacy==3.0.5 | ||
numpy==1.19.2 | ||
joblib==1.0.1 | ||
msgpack_python==0.5.6 | ||
norm==1.6.0 | ||
pytorch_pretrained_bert==0.6.2 | ||
scikit_learn==0.24.1 | ||
seqeval==1.2.2 | ||
stopwords==1.0.0 |
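Because every dependency above is pinned with `==`, an environment can be checked against the file after installation. The sketch below is one possible way to do that, assuming Python 3.8 or later for `importlib.metadata` (the README itself only requires 3.6). The helper names `parse_pins` and `check_pins` are hypothetical and not part of this commit.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# A few lines copied from the requirements file shown above.
SAMPLE = """\
toolz==0.11.1
tqdm==4.59.0
torch==1.8.0
"""

pins = parse_pins(SAMPLE)
for name, (pinned, installed) in check_pins(pins).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

In practice the text would come from reading `requirements.txt` rather than an inline string.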
Empty file.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
import unittest | ||
import numpy as np | ||
|
||
|
||
class MetricsTest(unittest.TestCase): | ||
    def test_convert_tag_fmt(self): | ||
        return True | ||
|
||
|
||
|
||
if __name__ == "__main__": | ||
    unittest.main() |
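The skeleton above only establishes the test layout; `test_convert_tag_fmt` does not yet assert anything. As an illustration of how such a test might be filled in, here is a self-contained sketch. The function `io_to_bio` is a hypothetical stand-in written for this example and is not the function the skeleton is meant to cover; its name, signature, and the IO-to-BIO behavior are assumptions.

```python
import unittest


def io_to_bio(tags):
    """Hypothetical helper: convert IO tags ('I-Disorder', 'O') to BIO tags."""
    converted = []
    prev = "O"
    for tag in tags:
        if tag == "O":
            converted.append("O")
        elif prev == tag:
            # Same entity type as the previous token: the span continues.
            converted.append(tag)
        else:
            # A new span starts: rewrite the 'I-' prefix as 'B-'.
            converted.append("B-" + tag[2:])
        prev = tag
    return converted


class TagFormatTest(unittest.TestCase):
    def test_io_to_bio(self):
        tags = ["O", "I-Disorder", "I-Disorder", "O", "I-Drug"]
        expected = ["O", "B-Disorder", "I-Disorder", "O", "B-Drug"]
        self.assertEqual(io_to_bio(tags), expected)

    def test_adjacent_types_start_new_span(self):
        self.assertEqual(io_to_bio(["I-Disorder", "I-Drug"]),
                         ["B-Disorder", "B-Drug"])

    def test_empty(self):
        self.assertEqual(io_to_bio([]), [])
```

Saved as a module, these tests can be run with `python -m unittest`. Using `assertEqual` rather than returning a value matters because `unittest` ignores a test method's return value, so `return True` passes regardless of what the code under test does.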