✅ This is the main branch - Production-ready implementation with critical bug fixes and validated results. For the original 2020 research code, see the archive-2020-research branch.
Title: Detecting, Classifying and Explaining IoT Botnet Attacks Using Deep Learning Methods Based on Network Data
Published Papers:
Institution: Kennesaw State University, Department of Computer Science
Year: 2020-2022
This project demonstrates that deep learning models can effectively detect and classify IoT botnet attacks using network traffic data in a device-agnostic manner. Two complementary approaches are implemented:
- Approach: Autoencoder-based deep learning
- Goal: Detect malicious vs. benign traffic
- Method: Learn normal traffic patterns, flag deviations as attacks
- Dataset: Trained on benign traffic, tested against Mirai and Gafgyt attacks
- Approach: Multi-class neural network classifier
- Goal: Classify attack types (benign, Gafgyt, Mirai)
- Results: 99.98% accuracy with all 115 features, 99.94% with the top 3 features
- Method: Supervised learning with labeled attack data
- Status: Research/experimental - simulation-based, not production
- Implementation: TensorFlow Federated (TFF) simulation
- Latest: anomaly-detection/train_v04.py, run_experiment_*.py
- Note: Experimental attempts archived in docs/archived/experimental/
Current System (2020):
Figure 1: Current system architecture showing the 2020 implementation with identified issues (data leakage, broken TFF, hard-coded paths)
Target System (2025 Modernized):
Figure 2: Target system architecture with modern tools (Python 3.12, TensorFlow 2.19, Flower FL, SHAP explainability)
Figure 3: Federated learning implementation using Flower framework with 9 IoT device clients and FedAvg aggregation
- Data Pipeline: Current | Target
- File Structure: Current | Target
- Model Architectures: Autoencoder | Classifier
- Deployment: Target Architecture
- Dependencies: Current System
For detailed architecture documentation, see docs/architecture/ARCHITECTURE.md.
archive-2020-research/
├── anomaly-detection/ # Anomaly detection (autoencoders)
│ ├── train_v04.py # Latest FL implementation
│ ├── train_og.py # Original baseline
│ ├── test.py # Evaluation
│ └── run_experiment_*.py # FL experiments
├── classification/ # Multi-class classifier (99.98% accuracy!)
│ ├── train.py
│ └── test.py
├── jupyter/ # Exploratory notebooks
├── data/fisher/ # Feature selection (Fisher scores)
│ ├── fisher.csv
│ ├── fisher2.csv
│ └── demonstrate_structure.csv
├── config/
│ └── devices.json # 9 IoT device configurations
├── scripts/
│ └── download_data.py # Dataset download utility
├── docs/
│ ├── archived/experimental/ # Historical FL attempts with explanations
│ └── references/
│     ├── N_BaIoT_dataset.md # Complete dataset documentation
│     ├── README.md # Paper references
│     └── thesis.pdf # Reference material
├── environment-archive.yaml # Conda environment (2020 deps)
├── PYSYFT_RESEARCH.md # FL framework analysis
└── README.md # This file

# Create conda environment (Python 3.8, TensorFlow 2.10, TFF 0.40)
conda env create -f environment-archive.yaml
conda activate botnet-archive-2020

# Download N-BaIoT dataset from UCI repository
python scripts/download_data.py
# Manually extract .rar files to respective device folders
# Place attack CSVs in: data/{device}/gafgyt_attacks/ and data/{device}/mirai_attacks/

Dataset: N-BaIoT on UCI
Citation: Meidan et al., "N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders", IEEE Pervasive Computing, 2018
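Before training, it can help to confirm that the extracted attack CSVs ended up in the data/{device}/gafgyt_attacks/ and data/{device}/mirai_attacks/ folders described above. The following is a minimal sketch of such a check, not part of the repository: the function name `missing_attack_dirs` and the device folder name in the usage line are hypothetical, and the real folder names should be taken from your local data directory.

```python
from pathlib import Path

# Attack sub-folders named in the setup instructions above.
ATTACK_DIRS = ("gafgyt_attacks", "mirai_attacks")


def missing_attack_dirs(data_root, devices):
    """Return the expected attack folders that are absent or contain no CSV files."""
    missing = []
    for device in devices:
        for attack_dir in ATTACK_DIRS:
            path = Path(data_root) / device / attack_dir
            # A folder counts as ready only if it exists and holds at least one CSV.
            if not path.is_dir() or not any(path.glob("*.csv")):
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "example_device" is a placeholder, not a folder name used by the project.
    for path in missing_attack_dirs("data", ["example_device"]):
        print(f"missing or empty: {path}")
```

An empty result means every listed device has both attack folders populated.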
Anomaly Detection:
cd anomaly-detection
python train_og.py # Original centralized training
python train_v04.py # Federated learning (simulation)

Classification:
cd classification
python train.py # All features
python train.py 5 # Top 5 features only

# Anomaly detection
cd anomaly-detection
python test.py
# Classification
cd classification
python test.py 5 'model_5.h5' # Test with top 5 features

| Features | Accuracy | Training Time | Notes |
|---|---|---|---|
| All 115 | 99.98% | 42 min (20 epochs) | Best accuracy |
| Top 5 | 99.91% | ~8 min (5 epochs) | Fast, excellent |
| Top 3 | 99.94% | ~8 min (5 epochs) | Optimal balance |
| Top 2 | 84.30% | ~8 min (5 epochs) | Insufficient features |
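The top-3 accuracy in the table can be cross-checked against the confusion matrix this README reports for the same configuration. The sketch below copies those counts and derives overall accuracy plus per-class recall and precision; the helper names are illustrative and do not come from the repository's test.py.

```python
# Confusion matrix for the top-3-feature classifier, copied from this README.
# Rows are actual classes, columns are predicted classes.
LABELS = ["Benign", "Gafgyt", "Mirai"]
CONFUSION = [
    [111439, 40, 6],
    [460, 566834, 0],
    [80, 162, 733501],
]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Recall per actual class: diagonal entry divided by its row sum."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]


def per_class_precision(matrix):
    """Precision per predicted class: diagonal entry divided by its column sum."""
    n = len(matrix)
    return [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    print(f"accuracy: {accuracy(CONFUSION):.4%}")
    for label, r, p in zip(LABELS, per_class_recall(CONFUSION), per_class_precision(CONFUSION)):
        print(f"{label}: recall={r:.4%} precision={p:.4%}")
```

The matrix contains 748 misclassified samples out of 1,412,522, which gives an accuracy of about 99.947%, consistent with the 99.94% row in the table.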
Confusion Matrix (Top 3 Features):
| Actual \ Predicted | Benign | Gafgyt | Mirai |
|---|---|---|---|
| Benign | 111439 | 40 | 6 |
| Gafgyt | 460 | 566834 | 0 |
| Mirai | 80 | 162 | 733501 |

- Threshold-based detection using MSE reconstruction error
- Trained per-device for device-specific normal patterns
- Evaluated against Mirai and BASHLITE botnet attacks
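The threshold-based detection described above can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's code: `toy_reconstruct` is a hypothetical stand-in for a trained per-device autoencoder, and the threshold rule (mean plus one standard deviation of benign reconstruction error) is one common choice assumed here, since this README does not state the exact rule used.

```python
from statistics import mean, pstdev

# Hypothetical benign profile standing in for what a trained autoencoder has learned.
BENIGN_PROFILE = [0.2, 0.5, 0.1]


def toy_reconstruct(sample):
    """Stand-in for an autoencoder's predict call: always returns the benign profile."""
    return list(BENIGN_PROFILE)


def reconstruction_mse(sample, reconstruct=toy_reconstruct):
    """Mean squared error between a sample and its reconstruction."""
    rebuilt = reconstruct(sample)
    return sum((x - r) ** 2 for x, r in zip(sample, rebuilt)) / len(sample)


def fit_threshold(benign_samples, n_std=1.0):
    """Assumed rule: mean + n_std * std of the benign reconstruction errors."""
    errors = [reconstruction_mse(s) for s in benign_samples]
    return mean(errors) + n_std * pstdev(errors)


def is_attack(sample, threshold):
    """Flag a sample whose reconstruction error exceeds the benign threshold."""
    return reconstruction_mse(sample) > threshold


# Benign validation traffic for one device (made-up values for illustration).
benign_validation = [
    [0.21, 0.49, 0.10],
    [0.19, 0.52, 0.11],
    [0.20, 0.50, 0.09],
    [0.22, 0.48, 0.10],
]
threshold = fit_threshold(benign_validation)

new_traffic = [[0.20, 0.51, 0.10], [0.90, 0.10, 0.80]]
flags = [is_attack(s, threshold) for s in new_traffic]
print(flags)  # [False, True]
```

Because models are trained per device, each device would get its own threshold fitted on that device's benign traffic.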
This project uses LIME (Local Interpretable Model-agnostic Explanations) to interpret black-box deep learning decisions:
- Generates HTML explanations for individual predictions
- Shows which features contributed to classification
- Demonstrates that DL opacity can be mitigated
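The idea behind these explanations can be illustrated without the LIME package itself. The sketch below is a simplified local surrogate in the spirit of LIME, not the library's implementation: it perturbs one instance, weights the perturbed points by proximity, and fits a weighted linear model whose coefficients indicate each feature's local contribution. `black_box` is a hypothetical stand-in for the trained classifier, and the kernel and sampling parameters are assumptions.

```python
import numpy as np


def black_box(X):
    """Hypothetical classifier: returns P(attack) for each row of X.

    Depends strongly on feature 0, weakly on feature 1, and not at all on feature 2.
    """
    logits = 4.0 * X[:, 0] - 1.0 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-logits))


def explain_locally(predict, x, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around x and return per-feature slopes."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise and query the model.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)
    # 2. Weight each perturbed point by its closeness to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 3. Weighted least squares: scale rows by sqrt(weight), then solve.
    A = np.hstack([Z - x, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]  # the last coefficient is the intercept


if __name__ == "__main__":
    instance = np.array([0.1, 0.2, 0.3])
    for j, c in enumerate(explain_locally(black_box, instance)):
        print(f"feature {j}: local weight {c:+.3f}")
```

For the toy model, the surrogate assigns the largest weight to feature 0 and a near-zero weight to feature 2, mirroring how the per-prediction explanations rank contributing features.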
This project explored federated learning (FL) as a graduate research project at Kennesaw State University. Multiple approaches were tried:
- TensorFlow Federated (TFF) - Primary approach, simulation-based
- PySyft - Explored but not implemented (compatibility issues in 2020)
- Manual FedAvg - Custom implementation attempt
- Working: Simulation code in train_v04.py and run_experiment_*.py
- Limitation: Simulation-only, not true distributed deployment
- Note: See docs/archived/experimental/README.md for full evolution history
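For readers new to federated averaging, the aggregation step that the simulations rely on can be sketched as follows. This is a minimal, framework-free illustration of FedAvg under simplifying assumptions (each client's model is a flat list of floats); it is not the TFF code in train_v04.py, and the client values in the demo are made up.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its local sample count."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    # Three simulated device clients with locally trained parameters (made-up values).
    local_models = [[0.10, 0.50], [0.30, 0.70], [0.20, 0.90]]
    local_sample_counts = [1000, 3000, 1000]
    print(fed_avg(local_models, local_sample_counts))
```

In a full round, the server would send the averaged parameters back to the clients, each client would train on its own device traffic, and the aggregation would repeat.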
See PYSYFT_RESEARCH.md for:
- Analysis of 2020 vs 2025 FL frameworks
- Recommendations for modernization
- Alternative frameworks (Flower, modern TFF)
- Dataset: docs/references/N_BaIoT_dataset.md - Complete dataset documentation with citations
- Experiments: docs/archived/experimental/README.md - Evolution of FL attempts
- Research: PYSYFT_RESEARCH.md - FL framework analysis
- Module READMEs: See anomaly-detection/README.md and classification/README.md
- Danmini Doorbell
- Ecobee Thermostat
- Ennio Doorbell
- Philips B120N/10 Baby Monitor
- Provision PT-737E Security Camera
- Provision PT-838 Security Camera
- Samsung SNH 1011 N Webcam
- SimpleHome XCS7 1002 WHT Security Camera
- SimpleHome XCS7 1003 WHT Security Camera
- Mirai: TCP, UDP, ACK, HTTP, and other flood attacks
- BASHLITE (Gafgyt): Various attack vectors
- 115 statistical features extracted using AfterImage framework
- Fisher score feature selection for dimensionality reduction
- Damped incremental statistics on network streams
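Fisher-score feature selection, as listed above, ranks each feature by how well it separates the classes. The sketch below uses a common definition (between-class scatter divided by within-class scatter), which is an assumption because this README does not give the exact formula behind fisher.csv; the function names and toy data are illustrative.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Score each feature as between-class scatter over within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(rows[0])):
        overall_mean = mean(row[j] for row in rows)
        between = 0.0
        within = 0.0
        for members in by_class.values():
            column = [row[j] for row in members]
            between += len(column) * (mean(column) - overall_mean) ** 2
            within += len(column) * pvariance(column)
        # A feature with zero within-class variance separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_k_features(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    rows = [[0.0, 5.0], [0.2, 1.0], [1.0, 5.0], [1.2, 1.0]]
    labels = ["benign", "benign", "mirai", "mirai"]
    scores = fisher_scores(rows, labels)
    print(scores, top_k_features(scores, 1))
```

Keeping only the top-ranked indices is what reduces the 115 features to the small subsets evaluated in the results table.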
- Python 3.8
- TensorFlow 2.10.0
- TensorFlow Federated 0.40.0
- Pandas 1.3.5 (before .append() deprecation)
- NumPy 1.21.6
- scikit-learn 1.0.2
- LIME 0.2.0.1
- Uses deprecated DataFrame.append() - works with Pandas 1.3.5
- Mixes keras and tensorflow.keras imports - works with TF 2.10
- CPU-optimized (GPU setup not guaranteed)
Note: The main branch addresses these issues with modern dependencies.
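As an illustration of the kind of change involved, DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, and the usual replacement is pd.concat. The sketch below shows that pattern; the helper name `collect_frames` and the column names are hypothetical and do not describe how the main branch is actually written.

```python
import pandas as pd


def collect_frames(frames):
    """Combine per-file DataFrames in one call instead of repeated DataFrame.append."""
    # ignore_index=True renumbers rows 0..n-1, matching append(..., ignore_index=True).
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    benign = pd.DataFrame({"feature": [0.1, 0.2], "label": ["benign", "benign"]})
    mirai = pd.DataFrame({"feature": [0.9], "label": ["mirai"]})
    combined = collect_frames([benign, mirai])
    print(combined)
```

Collecting the frames in a list and concatenating once also avoids copying the growing frame on every iteration.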
Published Paper:
@article{regan2022botnet,
title={Detecting, Classifying and Explaining IoT Botnet Attacks Using Deep Learning Methods Based on Network Data},
author={Regan, Christopher M.},
journal={Computer \& Security},
year={2022},
publisher={Elsevier},
url={https://www.sciencedirect.com/science/article/pii/S2666827022000081}
}

GitHub Repository:
@misc{regan2022github,
title={IoT Botnet Traffic Analysis with Federated Learning},
author={Regan, Christopher M.},
year={2022},
publisher={GitHub},
howpublished={\url{https://github.com/iAmGiG/BotnetTrafficAnalysisFederaedLearning}},
url={https://github.com/iAmGiG/BotnetTrafficAnalysisFederaedLearning}
}

@article{meidan2018nbaiot,
title={N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders},
author={Meidan, Yair and Bohadana, Michael and Mathov, Yael and Mirsky, Yisroel and Breitenbacher, Dominik and Shabtai, Asaf and Elovici, Yuval},
journal={IEEE Pervasive Computing},
volume={17},
number={3},
pages={12--22},
year={2018},
doi={10.1109/MPRV.2018.03367731}
}

Current Branch: main
Status: Modern, production-ready implementation
Last Updated: October 2024 (organization and documentation)
Original Research Period: 2020-2022
Other Branches:
- archive-2020-research - Original 2020 research code (preserved)
- develop - Original development branch (historical reference)
Institution: Kennesaw State University
Lab: CCSE, DSL Laboratory
Original Supervisor: Dr. Reza Parizi
Original N-BaIoT Dataset: Ben-Gurion University of the Negev & Singapore University of Technology and Design
This project is licensed under the MIT License - see the LICENSE file for details.
Academic Research: This work was conducted at Kennesaw State University (2020-2022) under the supervision of Dr. Reza Parizi.