A deep learning project for predicting stroke risk from patient health data.
Stroke is a leading cause of death and disability worldwide. Early detection and risk assessment can significantly improve patient outcomes. This project aims to develop a reliable deep learning model for predicting stroke risk based on patient demographic and health data.
stroke-prediction/
├── data/ # Data storage and processing
│ ├── raw/ # Raw dataset files
│ └── processed/ # Processed dataset files
├── models/ # Saved model checkpoints
├── notebooks/ # Jupyter notebooks for exploration and visualization
├── src/ # Source code
│ ├── data/ # Data processing utilities
│ ├── models/ # Model architecture definitions
│ ├── training/ # Training scripts
│ ├── evaluation/ # Evaluation scripts
│ └── utils/ # Helper functions
├── tests/ # Unit tests
├── app/ # Web application for deployment
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Clone this repository:
    git clone https://github.com/yourusername/stroke-prediction.git
    cd stroke-prediction
- Create a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt

This project uses the Stroke Prediction Dataset from Kaggle, which is publicly available and contains the following features:
- Demographic information (age, gender)
- Medical history (hypertension, heart disease)
- Lifestyle factors (smoking status, work type)
- Health metrics (BMI, glucose level)
- Target: Stroke occurrence (0 = No, 1 = Yes)
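To illustrate how features like these are typically prepared for a neural network, here is a minimal sketch using only the standard library. It standardizes numeric features and maps categorical ones to integer indices (the form an embedding layer expects). The column names, the two toy records, and the `fit`/`transform` helpers are assumptions for illustration; they are not taken from `src/data/preprocess.py`.

```python
from statistics import mean, pstdev

# Assumed column names; adjust to match the actual dataset file.
NUMERIC = ["age", "bmi", "avg_glucose_level"]
CATEGORICAL = ["gender", "smoking_status", "work_type"]


def fit(records):
    """Collect per-column mean/std and a category-to-index vocabulary."""
    stats = {}
    for col in NUMERIC:
        values = [r[col] for r in records]
        stats[col] = (mean(values), pstdev(values))
    vocab = {}
    for col in CATEGORICAL:
        categories = sorted({r[col] for r in records})
        vocab[col] = {cat: idx for idx, cat in enumerate(categories)}
    return stats, vocab


def transform(record, stats, vocab):
    """Return (z-scored numeric features, integer-coded categorical features)."""
    numeric = []
    for col in NUMERIC:
        mu, sigma = stats[col]
        # Guard against constant columns, where the std is zero.
        numeric.append((record[col] - mu) / sigma if sigma else 0.0)
    categorical = [vocab[col][record[col]] for col in CATEGORICAL]
    return numeric, categorical


# Two made-up records, purely to exercise the functions.
toy_records = [
    {"age": 60.0, "bmi": 30.0, "avg_glucose_level": 200.0,
     "gender": "Male", "smoking_status": "smokes", "work_type": "Private"},
    {"age": 40.0, "bmi": 20.0, "avg_glucose_level": 100.0,
     "gender": "Female", "smoking_status": "never smoked",
     "work_type": "Self-employed"},
]

stats, vocab = fit(toy_records)
numeric, categorical = transform(toy_records[0], stats, vocab)
print(numeric, categorical)
```

In practice the statistics and vocabulary would be fitted on the training split only and reused unchanged for validation and inference.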
- Preprocess the data:
    python src/data/preprocess.py
- Train the model:
    python src/training/train.py
- Evaluate the model:
    python src/evaluation/evaluate.py
- Run the web application:
    python app/app.py

Then open your browser and navigate to http://localhost:5000
The project employs a multi-layer neural network with specialized handling for categorical and numerical features. Techniques include:
- Feature normalization and encoding
- Embedding layers for categorical variables
- Dropout for regularization
- Batch normalization
- Class imbalance handling with weighted loss
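As a rough illustration of the weighted-loss idea in the list above, the sketch below computes inverse-frequency class weights and applies them to binary cross-entropy, in plain Python. The weighting formula, the function names, and the toy labels are assumptions for illustration; the project's training script may weight classes differently.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        sample_loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * sample_loss
    return total / len(y_true)


# Toy labels: 19 negatives and 1 positive (made up to show the imbalance).
labels = [0] * 19 + [1]
weights = class_weights(labels)

# A model that always predicts a low stroke probability looks fine on the
# majority class but is penalized heavily for the missed positive.
probs = [0.05] * 20
print(weights)
print(weighted_bce(labels, probs, weights))
```

With these toy labels the positive class gets a weight of 10.0 versus roughly 0.53 for the negative class, so a missed stroke case contributes far more to the loss than a missed negative.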
The model achieves:
- AUC-ROC: ~0.85
- F1-Score: ~0.78
- Precision: ~0.76
- Recall: ~0.81
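For readers who want to see how these metrics relate, here is a small sketch of precision, recall, and F1 computed from confusion-matrix counts. F1 is the harmonic mean of precision and recall, so the reported precision and recall values can be checked against the reported F1. The function names are illustrative and are not taken from `src/evaluation/evaluate.py`.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the figures listed above.
reported_f1 = f1_score(0.76, 0.81)
print(round(reported_f1, 2))  # 0.78
```

The precision and recall listed above imply an F1 of about 0.78, which matches the reported value.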
This project is designed to run on free compute resources:
- Training: Google Colab (with free GPU runtime)
- Inference: Local CPU or Colab
- Deployment: Lightweight enough for a small hosting tier such as an entry-level Heroku dyno (Heroku discontinued its free tier in November 2022)
This project is licensed under the MIT License - see the LICENSE file for details.