Intriguing Effect of the Correlation Prior on ICD-9 Code Assignment

The Ninth Revision of the International Classification of Diseases (ICD-9) is a complex coding system for classifying health conditions, and researchers have attempted to automate code assignment using language models. However, the imbalanced distribution of ICD-9 codes leads to poor performance, which has prompted exploration of the correlation bias between codes as a way to improve results. While the correlation bias can enhance code assignment in specific cases, it worsens overall performance. This repo contains the scripts for the datasets, experiments, and figures in the paper.

Structure and implementation

- language_model: source code for all experiments and the model implementation.
- notebooks: figures and tables generated inside Jupyter notebooks.
- dataset: source code for creating the dataset used in the experiments.
- utils: source code for helper functions and utilities used for figures and experiments.

See the README.md file in each directory for a full description.

Dependencies

- dask==2022.5.2
- datasets==2.2.2
- deepspeed==0.8.2
- torch==2.0.0+cu118
- tqdm==4.64.0
- transformers==4.27.1
- scikit-learn==1.1.1

Install exactly the versions listed above for these packages. See requirements.txt for the full list of packages.
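Because the experiments depend on exact versions, it can help to check an environment against the pins before running anything. The following is a minimal sketch, not part of the repo: the names PINNED, installed_version, and find_mismatches are hypothetical, and the pins are copied by hand from the Dependencies list above rather than parsed from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Dependencies list in this README (not read from requirements.txt).
PINNED = {
    "dask": "2022.5.2",
    "datasets": "2.2.2",
    "deepspeed": "0.8.2",
    "torch": "2.0.0+cu118",
    "tqdm": "4.64.0",
    "transformers": "4.27.1",
    "scikit-learn": "1.1.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run as a script, this prints one line per package that is missing or installed at a different version, and prints nothing when the environment matches the pins. The lookup parameter exists only so the comparison can be exercised without the packages installed.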

Project based on the cookiecutter data science project template. #cookiecutterdatascience
