This study predicts the risk of severe Covid-19 outcomes, in particular mortality, using a dataset provided by the Mexican government. We combine data preprocessing, exploratory data analysis, and several machine learning models, including LightGBM, to identify the factors that most influence Covid-19 mortality.
COVID-19 Risk Prediction, Machine Learning in Healthcare, LightGBM, Public Health Data Analytics
The global Covid-19 pandemic calls for accurate risk prediction models that identify individuals at high risk of severe outcomes. This work is motivated by the challenges healthcare systems face in managing Covid-19 patients effectively. We aim to contribute to improved patient care and resource allocation through data-driven predictive modeling.
We used a dataset from the Mexican government containing anonymized patient information. Preprocessing consisted of addressing class imbalance, dropping irrelevant columns, and standardizing values. The dataset's features include patient demographics, medical history, and Covid-19 test outcomes.
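The text does not say which technique was used to handle the imbalanced classes. As a minimal sketch, assuming random undersampling of the majority class (one common option, not necessarily the one used here), the step could look as follows. The field names `age` and `died` and the toy records are hypothetical and are not taken from the dataset.

```python
import random
from collections import Counter


def undersample(records, label_key, seed=0):
    """Randomly undersample every class down to the size of the smallest class."""
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)

    # Keep as many records per class as the rarest class has.
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 survivors and 2 deaths (assumed label name 'died').
records = [{"age": 30 + i, "died": 0} for i in range(8)]
records += [{"age": 70 + i, "died": 1} for i in range(2)]

balanced = undersample(records, "died")
counts = Counter(rec["died"] for rec in balanced)
print(counts)
```

Undersampling discards majority-class records, so on a small dataset oversampling or class weighting may be preferable; the choice here is only illustrative.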
Our analysis revealed that age, gender, and pre-existing conditions each correlate with Covid-19 severity. We observed higher hospitalization rates among older patients, males, and those with comorbidities such as diabetes, hypertension, and obesity. We used visualizations to illustrate these patterns.
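A group-wise comparison of this kind can be sketched as a simple rate calculation per subgroup. The example below is an illustration only: the field names `age` and `hospitalized`, the age cutoff of 60, and the eight patient records are invented for the sketch and do not reproduce the study's data or results.

```python
from collections import defaultdict


def rate_by_group(records, group_fn, outcome_key):
    """Return the fraction of records with a truthy outcome in each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        group = group_fn(rec)
        totals[group] += 1
        if rec[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def age_band(rec):
    # Assumed cutoff, chosen only for illustration.
    return "60+" if rec["age"] >= 60 else "under 60"


# Hypothetical patient records (not from the dataset).
patients = [
    {"age": 34, "hospitalized": False},
    {"age": 45, "hospitalized": False},
    {"age": 52, "hospitalized": True},
    {"age": 58, "hospitalized": False},
    {"age": 63, "hospitalized": True},
    {"age": 71, "hospitalized": True},
    {"age": 78, "hospitalized": False},
    {"age": 85, "hospitalized": True},
]

rates = rate_by_group(patients, age_band, "hospitalized")
print(rates)
```

The same helper works for any grouping, for example by gender or by the presence of a comorbidity, by passing a different `group_fn`.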
We trained and compared seven machine learning models: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, a multilayer perceptron (MLP), a support vector machine (SVM), and LightGBM. Across the evaluation metrics (accuracy, precision, recall, F1-score, and ROC AUC), LightGBM performed best at predicting Covid-19 mortality. Feature importance analysis provided further insight into the key predictors.
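For reference, the evaluation metrics named above can be computed directly from labels and predicted scores. The following is a minimal, dependency-free sketch; the labels, scores, and the 0.5 decision threshold are made up for illustration and are not outputs of any model in this study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.8, 0.1, 0.4, 0.3, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On imbalanced mortality data, accuracy alone can look high for a model that rarely predicts death, which is why precision, recall, F1-score, and ROC AUC are reported alongside it.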
Our study underscores the value of data analytics and machine learning for Covid-19 risk prediction. LightGBM emerged as the top-performing model, and its predictions and feature importances can inform healthcare resource allocation and patient care strategies. The findings contribute to ongoing efforts in pandemic management and public health planning.