Diabetes Drug Testing Project: Predicting hospitalization time with a regression/classification model

phthaloc/emr_data_regression_model

Deep learning regression model for diabetes drug testing

Context: You are a data scientist at a unicorn healthcare startup that has created a groundbreaking diabetes drug that is ready for Phase III clinical trial testing. The drug is sensitive and must be administered and screened over at least 5-7 days in the hospital, with frequent monitoring/testing and patient medication adherence training using a mobile application. You have been provided a patient dataset from a client partner and are tasked with building a predictive model that identifies which types of patients the company should focus on when testing this drug. Target patients are those who are likely to be in the hospital for this duration anyway, so that administering the drug and monitoring the patient does not incur significant additional costs.

To achieve this goal, we build a regression model that predicts the estimated hospitalization time for a patient, and we use its predictions to select/filter patients for the study.

Expected Hospitalization Time Regression Model: Using a synthetic dataset built from the UCI Diabetes readmission dataset (augmented and denormalized at the line level), we build a regression model that predicts the expected number of days of hospitalization. We then convert this prediction into a binary decision on whether to include or exclude the patient from the clinical trial, using a threshold of 5 days of hospitalization.
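The conversion from a regression output to an include/exclude decision can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `to_trial_label` and the sample predictions are hypothetical, and treating exactly 5 predicted days as "include" (`>=`) is an assumption based on the "at least 5-7 days" requirement above.

```python
from typing import Iterable, List

# Threshold taken from the text: 5 days of hospitalization.
THRESHOLD_DAYS = 5.0


def to_trial_label(predicted_days: Iterable[float],
                   threshold: float = THRESHOLD_DAYS) -> List[int]:
    """Map predicted hospitalization days to 1 (include) or 0 (exclude).

    Assumption: a prediction equal to the threshold counts as "include".
    """
    return [1 if days >= threshold else 0 for days in predicted_days]


# Hypothetical regression outputs (in days) for four patients.
predictions = [2.3, 5.0, 7.8, 4.9]
labels = to_trial_label(predictions)
print(labels)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the selected patient population is to the cutoff.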

This project demonstrates the importance of building the right data representation at the encounter level, with appropriate filtering and preprocessing/feature engineering of key medical code sets. We analyze and interpret the resulting model for biases across key demographic groups.
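As a rough sketch of what "encounter level" means here: a line-level dataset has one row per item recorded within an encounter (for example, one row per drug code), so the rows of an encounter must be collapsed into a single record before modeling. The field names below (`encounter_id`, `patient_id`, `drug_code`, `time_in_hospital`) and the sample rows are assumptions for illustration and may not match the project's actual schema.

```python
from typing import Dict, List


def aggregate_to_encounter(lines: List[dict]) -> List[dict]:
    """Collapse line-level rows into one record per encounter.

    Encounter-level fields are taken from the first row seen for each
    encounter; the per-line drug codes are collected into a list.
    """
    encounters: Dict[str, dict] = {}
    for row in lines:
        key = row["encounter_id"]
        if key not in encounters:
            # Copy every field except the line-level one.
            record = {k: v for k, v in row.items() if k != "drug_code"}
            record["drug_codes"] = []
            encounters[key] = record
        code = row.get("drug_code")
        # Skip missing codes and avoid duplicates within an encounter.
        if code is not None and code not in encounters[key]["drug_codes"]:
            encounters[key]["drug_codes"].append(code)
    return list(encounters.values())


# Hypothetical line-level rows: encounter "e1" spans two lines.
line_rows = [
    {"encounter_id": "e1", "patient_id": "p1", "drug_code": "A", "time_in_hospital": 6},
    {"encounter_id": "e1", "patient_id": "p1", "drug_code": "B", "time_in_hospital": 6},
    {"encounter_id": "e2", "patient_id": "p2", "drug_code": "A", "time_in_hospital": 3},
]
encounters = aggregate_to_encounter(line_rows)
print(len(encounters))
```

Aggregating first avoids counting the same hospital stay several times, which would otherwise leak duplicated targets across training and test splits.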

Dataset

Due to healthcare PHI regulations (HIPAA, HITECH), there are a limited number of publicly available datasets, and some datasets require training and approval. For the purpose of this toy sample, we therefore use a modified dataset from UC Irvine. Please note that it is limited in its representation of some key features, such as diagnosis codes, which are usually an unordered list in 835s/837s (the ASC X12 standard interchange formats used for remits and claims, respectively).

License

This project is licensed under the MIT License.

This project is part of the Udacity Nanodegree program "AI in Healthcare" (November 2020).
