Merge pull request #3 from Virgula0/interceptor_improvements
Interceptor improvements
Virgula0 authored Jan 16, 2025
2 parents fb04d51 + e6096e0 commit dece392
Showing 17 changed files with 69,231 additions and 53,203 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/markdown-converter.yaml
@@ -18,7 +18,7 @@ jobs:
steps:
- name: Checkout Repository
uses: actions/checkout@v4

- uses: awalsh128/cache-apt-pkgs-action@latest
with:
packages: pandoc texlive-xetex python3 python3-pip
@@ -131,3 +131,5 @@ jobs:
with:
commit_message: "Generate PDF"
file_pattern: docs/docs.pdf
skip_checkout: true
push_options: --force
141 changes: 75 additions & 66 deletions README.md
@@ -26,6 +26,7 @@ In the project the following files have the described functions:
- `classifiers.py` -> defines a big list of classifiers for `algo_chooser.py`
- `noiser.py` -> helper script for sending normal http requests
- `detector.py` -> runs a real-time demo of the model using previously mentioned scripts internally
- `export_model.py` -> utility for exporting the model
- `model` directory -> contains exported model
- `merger.py` -> merges 2 datasets in one single dataset
- `datasets/delayed/merged.csv` -> contains an additional dataset for calculating accuracy and other stats
@@ -68,31 +69,30 @@ Understanding TCP connections is important for building the dataset. When a TCP
## Example Dataset Row (normal TCP scan on a closed port 3306)

```csv
start_request_time,end_request_time,start_response_time,end_response_time,duration,src_ip,dst_ip,src_port,dst_port,SYN,ACK,FIN,RST,URG,PSH,label
2025-01-08 13:52:55.274814,2025-01-08 13:52:55.274814,2025-01-08 13:52:55.274874,2025-01-08 13:52:55.274874,6e-05,"['172.31.0.1', '172.31.0.2']","['172.31.0.1', '172.31.0.2']","['44031', '3306']","['44031', '3306']",1,1,0,1,0,0,1
2025-01-15 12:49:08.025898,2025-01-15 12:49:08.025898,2025-01-15 12:49:08.025946,2025-01-15 12:49:08.025946,4.8e-05,"['172.31.0.2', '172.31.0.1']","['172.31.0.2', '172.31.0.1']","['52666', '22']","['52666', '22']",1,1,0,1,0,0,1
```
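
The `duration` column above appears to be `end_response_time - start_request_time` (6e-05 s in the first row). A minimal sketch that recomputes it, assuming exactly that definition:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def compute_duration(start_request_time: str, end_response_time: str) -> float:
    """Recompute the duration feature from the raw session timestamps."""
    start = datetime.strptime(start_request_time, FMT)
    end = datetime.strptime(end_response_time, FMT)
    return (end - start).total_seconds()

# First example row above: prints 6e-05
print(compute_duration("2025-01-08 13:52:55.274814", "2025-01-08 13:52:55.274874"))
```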

- Sessions are grouped using the `src_ip`, `dst_ip`, `src_port`, and `dst_port` tuple as keys. However, these grouping keys are excluded from the model's training phase.
- Sessions are grouped using the `src_port` and `dst_port` tuple as keys. However, these grouping keys are excluded from the model's training phase (see the sketch after this list).

- The `duration` feature provides valuable information for distinguishing between legitimate traffic and `NMAP` scans, as legitimate HTTP requests may exhibit similar flag behaviour but differ in timing.

- The session window in `interceptor.py` is set to **0.5 seconds** by default, as this is typically enough to capture an `NMAP` scan attempt.
- The session window in `interceptor.py` is set to **0.5 seconds** by default, as this is typically enough to capture an `NMAP` scan attempt on a single port.
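
As a rough illustration of this grouping, here is a hedged sketch (names and structure are assumptions, not the actual `interceptor.py` code) that buckets packets into sessions keyed by the port tuple and closes a session once the 0.5-second window has elapsed:

```python
from typing import Dict, List, Tuple

SESSION_WINDOW = 0.5  # seconds, mirroring the interceptor.py default

Packet = Tuple[float, int, int, str]  # (timestamp, src_port, dst_port, flags)

def group_sessions(packets: List[Packet]) -> List[List[Packet]]:
    """Group packets into sessions keyed by (src_port, dst_port)."""
    open_sessions: Dict[Tuple[int, int], List[Packet]] = {}
    closed: List[List[Packet]] = []
    for pkt in sorted(packets):
        ts, src_port, dst_port, _flags = pkt
        key = (src_port, dst_port)
        session = open_sessions.get(key)
        # Close the session when the window since its first packet expires.
        if session and ts - session[0][0] > SESSION_WINDOW:
            closed.append(open_sessions.pop(key))
        open_sessions.setdefault(key, []).append(pkt)
    closed.extend(open_sessions.values())
    return closed
```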

More technical explanations are given in the comments in `interceptor.py`. The script can take a while to write all the data successfully when many requests are performed.
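
Since `pyshark` is the project's capture dependency (see the external dependencies below), the capture loop conceptually looks like this hedged sketch (the interface name and field access are assumptions, not the actual `interceptor.py` implementation):

```python
import pyshark

# Capture only TCP traffic; the interface name is an assumption.
capture = pyshark.LiveCapture(interface="eth0", bpf_filter="tcp")

for packet in capture.sniff_continuously():
    tcp = packet.tcp
    # tshark exposes each TCP flag as its own boolean field.
    flags = {
        "SYN": tcp.flags_syn,
        "ACK": tcp.flags_ack,
        "FIN": tcp.flags_fin,
        "RST": tcp.flags_reset,
        "URG": tcp.flags_urg,
        "PSH": tcp.flags_push,
    }
    print(packet.sniff_time, tcp.srcport, tcp.dstport, flags)
```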

> [!NOTE]
> During the data collection, some ports were opened intentionally on the host to differentiate some rows in the dataset. For example, an HTTP server on port 1234 was opened using `python3 -m http.server 1234`, in addition to other ports in the range 0-5000 that had already been opened by other services.
> During the data collection, some ports were opened intentionally on the host to differentiate some rows in the dataset. For example, an HTTP server on port `1234` was opened using `python3 -m http.server 1234`, in addition to other ports in the range 0-5000 that had already been opened by other services.
## Common `NMAP` Scans

The following commands were run from the `traffic_generator` container while `sudo python3 interceptor.py` was running locally on the host.

```bash
nmap -sT 172.31.0.1 -p 0-5000 # TCP Scan
nmap -sS 172.31.0.1 -p 0-5000 # Stealth Scan
nmap -sF 172.31.0.1 -p 0-5000 # FIN Scan
nmap -sN 172.31.0.1 -p 0-5000 # NULL Scan
nmap -sX 172.31.0.1 -p 0-5000 # XMAS Scan
nmap -sT 172.31.0.1 -p 0-2500 # TCP Scan
nmap -sS 172.31.0.1 -p 0-2500 # Stealth Scan
nmap -sF 172.31.0.1 -p 0-2500 # FIN Scan
nmap -sN 172.31.0.1 -p 0-2500 # NULL Scan
nmap -sX 172.31.0.1 -p 0-2500 # XMAS Scan
```
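
For reference, these scan types differ mainly in which TCP flags their probes set (standard `NMAP` behaviour), which is exactly what the flag features in the dataset capture. Summarised as a small Python lookup:

```python
# TCP flags set in the probe packets of each scan type (standard NMAP behaviour).
SCAN_FLAGS = {
    "-sT": "SYN, then completes the full three-way handshake",
    "-sS": "SYN only, never completes the handshake (stealth)",
    "-sF": "FIN only",
    "-sN": "no flags at all",
    "-sX": "FIN + PSH + URG (the 'christmas tree' packet)",
}
```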

The result is the creation of `bad.csv`
@@ -110,9 +110,9 @@ The final dataset consists of a merge (`merged.csv`) used for training the model

The `XGBClassifier` was selected as the final model due to its reliable performance in key areas:

1. High accuracy score (`~0.95`)
2. Fast prediction speed (`~3ms` on average for 15,000 rows)
3. High MCC score (`~0.91`)
1. High accuracy score (`~0.99`)
2. Fast prediction speed (`~4ms` on average for `24,511` rows)
3. High MCC score (`~0.98`)
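
A minimal sketch of how numbers like these can be reproduced, mirroring the metrics that `algo_chooser.py` reports (the test-split size is an assumption):

```python
import time

import pandas as pd
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("./datasets/train/merged.csv")
X = df[["duration", "SYN", "ACK", "FIN", "RST", "URG", "PSH"]]
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

clf = XGBClassifier(n_estimators=210)
clf.fit(X_train, y_train)

start = time.perf_counter()
predictions = clf.predict(X_test)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Accuracy: {accuracy_score(y_test, predictions):.4f}, "
      f"MCC: {matthews_corrcoef(y_test, predictions):.6f}, "
      f"Prediction time: {elapsed_ms:.0f}ms")
```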

## Why is the accuracy metric important?

@@ -124,52 +124,50 @@ MCC should normally be preferred when unbalanced datasets are present. This is n
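
For reference, MCC combines all four confusion-matrix cells into a single score in $[-1, 1]$:

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$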

## Why is the prediction speed so important?

The prediction speed played a significant role in choosing this model, as it allows efficient analysis of large volumes of network traffic in real time. The `RandomForestClassifier` is similar in accuracy (sometimes maybe even better), but its average prediction time of `~15ms` is slower than the `~3ms` of `XGBClassifier`. Needless to say, even though `DeepSVDD` predicts in `1ms`, its low accuracy rate rules it out entirely.
The prediction speed played a significant role in choosing this model, as it allows efficient analysis of large volumes of network traffic in real time. The `RandomForestClassifier` is similar in accuracy (sometimes maybe even better), but its average prediction time of `~23ms` is slower than the `~4ms` of `XGBClassifier`. Needless to say, even though `DeepSVDD` predicts in `1ms`, its low accuracy rate rules it out entirely.

## Model Performance

```
Dataset loaded with 15192 records.
Dataset loaded with 24511 records.
Dataset preprocessed successfully.
+------------+-----+-----+-----+-----+-----+-----+-------+
| duration | SYN | ACK | FIN | RST | URG | PSH | label |
+------------+-----+-----+-----+-----+-----+-----+-------+
| 0.000030 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000013 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000012 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000010 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000011 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
+------------+-----+-----+-----+-----+-----+-----+-------+
+------------+-----+-----+-----+-----+-----+-----+-------+
| Duration | SYN | ACK | FIN | RST | URG | PSH | Label |
+------------+-----+-----+-----+-----+-----+-----+-------+
| 0.000048 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000016 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000015 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000014 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000015 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
+------------+-----+-----+-----+-----+-----+-----+-------+
Dataset split into training and testing sets.
RandomForestClassifier (n_estimators=210): Accuracy: 0.9566, Train time: 411ms, Prediction time: 16ms, MCC: 0.914073, TP: 744, TN: 710, FN: 16, FP: 50
KNeighborsClassifier (n_estimators=N/A): Accuracy: 0.9910, Train time: 13ms, Prediction time: 271ms, MCC: 0.982114, TP: 1238, TN: 1192, FN: 18, FP: 4
....
XGBClassifier (n_estimators=1): Accuracy: 0.9572, Train time: 9ms, Prediction time: 2ms, MCC: 0.915337, TP: 744, TN: 711, FN: 16, FP: 49
RandomForestClassifier (n_estimators=210): Accuracy: 0.9902, Train time: 650ms, Prediction time: 23ms, MCC: 0.980464, TP: 1238, TN: 1190, FN: 18, FP: 6
....
XGBClassifier (n_estimators=210): Accuracy: 0.9910, Train time: 86ms, Prediction time: 4ms, MCC: 0.982114, TP: 1238, TN: 1192, FN: 18, FP: 4
....
DeepSVDD (n_estimators=N/A): Accuracy: 0.3868, Train time: 14032ms, Prediction time: 1ms, MCC: -0.348550, TP: 5, TN: 583, FN: 755, FP: 177
DeepSVDD (n_estimators=N/A): Accuracy: 0.6970, Train time: 22739ms, Prediction time: 1ms, MCC: 0.492361, TP: 526, TN: 1183, FN: 730, FP: 13
....
Best Classifier based on Accuracy
Classifier: XGBClassifier
n_estimators: 1
Accuracy Score: 0.9572
n_estimators: 210
Accuracy Score: 0.9910
Best Classifier based on MCC
Classifier: XGBClassifier
n_estimators: 1
MCC Score: 0.915337
n_estimators: 210
MCC Score: 0.982114
Best Classifier based on prediction time
Classifier: DeepSVDD
Time : 1.000000ms
```

---

## How the Training Dataset was created (detailed)

The training dataset, `datasets/train/merged.csv`, is generated using the following steps:
@@ -197,11 +195,11 @@ The training dataset, `datasets/train/merged.csv`, is generated using the following steps:
- `label`: `0` for legitimate traffic, `1` for `NMAP` scans
4. Run `NMAP` scans from the container:
```bash
nmap -sT 172.31.0.1 -p 0-5000
nmap -sS 172.31.0.1 -p 0-5000
nmap -sF 172.31.0.1 -p 0-5000
nmap -sN 172.31.0.1 -p 0-5000
nmap -sX 172.31.0.1 -p 0-5000
nmap -sT 172.31.0.1 -p 0-2500
nmap -sS 172.31.0.1 -p 0-2500
nmap -sF 172.31.0.1 -p 0-2500
nmap -sN 172.31.0.1 -p 0-2500
nmap -sX 172.31.0.1 -p 0-2500
```
5. Run noise traffic (legitimate requests) from the container:
```bash
@@ -217,49 +215,62 @@ The training dataset, `datasets/train/merged.csv`, is generated using the following steps:
```bash
python3 algo_chooser.py
```
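
For reference, the earlier merge step handled by `merger.py` conceptually amounts to concatenating the two captures. A hedged sketch (the `good.csv` name for the legitimate-traffic capture is an assumption):

```python
import pandas as pd

# Assumed inputs: bad.csv (NMAP scans) and good.csv (legitimate traffic).
bad = pd.read_csv("./datasets/train/bad.csv")
good = pd.read_csv("./datasets/train/good.csv")

merged = pd.concat([bad, good], ignore_index=True)
# Shuffle so the two labels are not stored in contiguous blocks.
merged = merged.sample(frac=1, random_state=42).reset_index(drop=True)
merged.to_csv("./datasets/train/merged.csv", index=False)
```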
---

## Delayed Dataset
## Delayed Dataset (Making things harder)

A delayed dataset can be created by introducing delays between requests:

```bash
nmap -p 1-10000 --scan-delay 1s 172.31.0.1
nmap -p 1-5000 --scan-delay 1s 172.31.0.1
```

You can also adjust the delay in legitimate requests by modifying `SLEEP_SECOND` in `noiser.py`.
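
As a rough sketch of what `noiser.py` does (the request loop below is an assumption; only `SLEEP_SECOND` and the script's purpose come from the repository):

```python
import time
import urllib.request

SLEEP_SECOND = 1  # delay between legitimate requests; tune this to vary the dataset

# Hypothetical target: one of the HTTP ports opened during data collection.
TARGET = "http://172.31.0.1:1234/"

while True:
    try:
        urllib.request.urlopen(TARGET, timeout=2).read()
    except OSError:
        pass  # refused connections are expected while ports change
    time.sleep(SLEEP_SECOND)
```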

Results confirm the choice of `XGBClassifier`.
With this dataset, the results are a little different and worse.

The reasons why this happens are the following:

1. Here we have less data, since it takes some hours to construct this dataset.
2. We added a scan delay that introduces a one-second delay between each `NMAP` attempt.
3. The attack type used by `NMAP` with the above command is a `Stealth Attack` by default, so the flags of normal `HTTP` requests and
`Stealth Attacks` are practically the same; the only usable information is the duration (the backup feature introduced for these situations, when distinguishing anomalous from normal packets using flags is impossible).

Given the 3 points above, and relying only on the duration feature in this kind of situation, an accuracy of `~90%` seems quite reasonable.

We still prefer `XGBClassifier`, for the same reasons discussed for the training dataset.

```
Dataset loaded with 11351 records.
Dataset preprocessed successfully.
+------------+-----+-----+-----+-----+-----+-----+-------+
| duration | SYN | ACK | FIN | RST | URG | PSH | label |
+------------+-----+-----+-----+-----+-----+-----+-------+
| 0.000060 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000068 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000062 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000057 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000074 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
+------------+-----+-----+-----+-----+-----+-----+-------+
Dataset loaded with 10000 records.
Dataset preprocessed successfully.
+------------+-----+-----+-----+-----+-----+-----+-------+
| duration | SYN | ACK | FIN | RST | URG | PSH | label |
+------------+-----+-----+-----+-----+-----+-----+-------+
| 0.000067 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000051 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000062 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000037 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0.000042 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
+------------+-----+-----+-----+-----+-----+-----+-------+
Dataset split into training and testing sets.
RandomForestClassifier (n_estimators=210): Accuracy: 1.0000, Train time: 327ms, Prediction time: 11ms, MCC: 1.000000, TP: 752, TN: 384, FN: 0, FP: 0
RandomForestClassifier (n_estimators=210): Accuracy: 0.8940, Train time: 483ms, Prediction time: 20ms, MCC: 0.789968, TP: 436, TN: 458, FN: 70, FP: 36
....
XGBClassifier (n_estimators=210): Accuracy: 1.0000, Train time: 28ms, Prediction time: 3ms, MCC: 1.000000, TP: 752, TN: 384, FN: 0, FP: 0
XGBClassifier (n_estimators=210): Accuracy: 0.8940, Train time: 52ms, Prediction time: 4ms, MCC: 0.789968, TP: 436, TN: 458, FN: 70, FP: 36
....
DeepSVDD (n_estimators=N/A): Accuracy: 0.4833, Train time: 10273ms, Prediction time: 1ms, MCC: 0.270419, TP: 173, TN: 376, FN: 579, FP: 8
Best Classifier based on Accuracy
Classifier: XGBClassifier
n_estimators: 210
Accuracy Score: 1.0000
Classifier: KNeighborsClassifier
n_estimators: N/A
Accuracy Score: 0.8990
Best Classifier based on MCC
Classifier: XGBClassifier
n_estimators: 210
MCC Score: 1.000000
Classifier: KNeighborsClassifier
n_estimators: N/A
MCC Score: 0.798304
Best Classifier based on prediction time
Classifier: DeepSVDD
@@ -284,8 +295,7 @@ sudo python3 detector.py

> [!TIP]
> When running the script, a log file called `logs`, containing all events, is created in the main project directory.
---
> When running the script, some other connections directed to the localhost interface may be collected in the process.
---
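
A hedged sketch of the real-time classification step inside `detector.py` (loading the exported model and scoring one session's feature row; the exact feature assembly is an assumption):

```python
import joblib
import pandas as pd

model = joblib.load("./model/model.pkl")

# One captured session, in the same feature order used for training.
features = pd.DataFrame(
    [[0.000048, 1, 1, 0, 1, 0, 0]],
    columns=["duration", "SYN", "ACK", "FIN", "RST", "URG", "PSH"],
)

label = int(model.predict(features)[0])
print("NMAP scan detected!" if label == 1 else "Legitimate traffic")
```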

@@ -302,5 +312,4 @@ https://github.com/user-attachments/assets/f10773c6-742e-4394-913e-42beb0cc3683

# External Dependencies

- `pyshark`
- `python-nmap`
- `pyshark`
5 changes: 1 addition & 4 deletions algo_chooser.py
@@ -1,5 +1,4 @@
import pandas as pd
import joblib
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

@@ -13,7 +12,7 @@
# Main script
if __name__ == "__main__":
# Load Dataset
df = pd.read_csv('./datasets/train/merged.csv')
df = pd.read_csv('./datasets/delayed/merged.csv')
print(f"Dataset loaded with {len(df)} records.")

# Preprocess Dataset
@@ -132,5 +131,3 @@
# Export the model, preferring the best-accuracy model over best_mcc
# The dump is commented out because XGBClassifier does not provide the best accuracy every time
# The reason why it has been chosen over other models can be found in README.md

# joblib.dump(best_acc_clf, "./model/model.pkl")
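# For reference, a hedged sketch of what the separate export utility
# (export_model.py) might look like. The feature columns and the
# ./model/model.pkl path come from this repository; everything else,
# including the n_estimators value, is an assumption:
#
#   import joblib
#   import pandas as pd
#   from xgboost import XGBClassifier
#
#   df = pd.read_csv('./datasets/train/merged.csv')
#   X = df[['duration', 'SYN', 'ACK', 'FIN', 'RST', 'URG', 'PSH']]
#   clf = XGBClassifier(n_estimators=210).fit(X, df['label'])
#   joblib.dump(clf, './model/model.pkl')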
