Data Mining Project of CS5228, Analysis and Prediction on a COVID-19 related dataset.

2020dfff/CS5228-Proj

CS5228 Course Project - Covid-19 Risk Prediction

Abstract

This study predicts the risk of severe Covid-19 outcomes using a dataset provided by the Mexican government. We applied data preprocessing, exploratory data analysis, and machine learning models, including LightGBM, to better understand the factors that influence Covid-19 mortality.

Keywords

COVID-19 Risk Prediction, Machine Learning in Healthcare, LightGBM, Public Health Data Analytics

Introduction

The global Covid-19 pandemic necessitates the development of accurate risk prediction models to identify individuals at high risk of severe outcomes. Our motivation stems from addressing the challenges faced by healthcare systems in managing Covid-19 patients effectively. We aim to contribute to improved patient care and resource allocation through data-driven predictive modeling.

Dataset and Data Preprocessing

We utilized a dataset from the Mexican government containing anonymized patient information. Data preprocessing involved handling imbalanced data, dropping irrelevant columns, and standardizing values. The dataset's features included patient demographics, medical history, and Covid-19 test outcomes.
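The preprocessing steps named above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the field names (`record_id`, `age`, `diabetes`, `died`), the 1/2 yes/no coding, and the choice of random undersampling to handle class imbalance are all assumptions made for illustration, and the records are synthetic.

```python
import random

# Synthetic records. The field names and the 1 = yes / 2 = no coding are
# assumptions for illustration, not the real schema of the dataset.
RAW = [
    {
        "record_id": i,
        "age": 30 + i,
        "diabetes": 1 if i % 3 == 0 else 2,
        "died": 1 if i % 5 == 0 else 2,
    }
    for i in range(20)
]


def preprocess(records, drop=("record_id",), coded=("diabetes", "died")):
    """Drop irrelevant fields and map the assumed 1/2 codes to True/False."""
    cleaned = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k not in drop}
        for key in coded:
            row[key] = row[key] == 1
        cleaned.append(row)
    return cleaned


def undersample(rows, label="died", seed=0):
    """Randomly undersample the majority class to the minority class size."""
    pos = [r for r in rows if r[label]]
    neg = [r for r in rows if not r[label]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return minority + rng.sample(majority, len(minority))


rows = preprocess(RAW)
balanced = undersample(rows)
```

Undersampling is only one way to address imbalance; class weighting or oversampling are common alternatives, and the source does not say which one was used.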

Exploratory Data Analysis and Insights

Our analysis revealed correlations between age, gender, pre-existing conditions, and Covid-19 severity. We observed higher hospitalization rates among older patients, males, and those with comorbidities such as diabetes, hypertension, and obesity. Visualization techniques helped illustrate these insights effectively.
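As a rough illustration of the kind of comparison described above, the sketch below computes hospitalization rates per group. The six records are invented and say nothing about the real dataset; the field names and the age bands are likewise assumptions.

```python
from collections import defaultdict

# Invented records for illustration only; not taken from the dataset.
PATIENTS = [
    {"age": 25, "sex": "F", "diabetes": False, "hospitalized": False},
    {"age": 34, "sex": "M", "diabetes": False, "hospitalized": False},
    {"age": 47, "sex": "F", "diabetes": True, "hospitalized": False},
    {"age": 52, "sex": "M", "diabetes": False, "hospitalized": True},
    {"age": 63, "sex": "M", "diabetes": True, "hospitalized": True},
    {"age": 71, "sex": "F", "diabetes": True, "hospitalized": True},
]


def age_band(age):
    """Bucket an age into coarse bands (band edges are an assumption)."""
    if age < 40:
        return "<40"
    if age < 60:
        return "40-59"
    return "60+"


def rate_by(records, key_fn, outcome="hospitalized"):
    """Return the share of records with a positive outcome in each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for rec in records:
        group = key_fn(rec)
        counts[group][0] += int(rec[outcome])
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


rates_by_age = rate_by(PATIENTS, lambda r: age_band(r["age"]))
rates_by_diabetes = rate_by(PATIENTS, lambda r: r["diabetes"])
```

The same `rate_by` helper works for any grouping key, such as `lambda r: r["sex"]`, which makes it easy to feed a bar chart per factor.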

Model Visualization and Feature Importance

We trained and compared Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, MLP, SVM, and LightGBM models. Evaluation with accuracy, precision, recall, F1-score, and ROC AUC showed LightGBM to be the strongest model for predicting Covid-19 mortality. Feature importance analysis provided further insight into the key predictors.
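For readers unfamiliar with the evaluation metrics listed above, the following sketch shows how each one is defined for a binary classifier. The labels and scores are made up, the 0.5 decision threshold is an assumption, and the function names are hypothetical; in practice a library such as scikit-learn would normally compute these.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall and F1 at a fixed score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive outscores a
    random negative, counting ties as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = died) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
```

Unlike the other four metrics, ROC AUC does not depend on the threshold, which is one reason it is often reported alongside them when classes are imbalanced.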

Conclusion

Our study underscores the significance of data analytics and machine learning in Covid-19 risk prediction. LightGBM emerged as the top-performing model, offering valuable insights for healthcare resource allocation and patient care strategies. The findings contribute to the ongoing efforts in pandemic management and public health strategies.
