Commit b30493e

Merge branch 'main' of github.com:deepinstinct-algo/DeepURLBench into main
2 parents (e493861 + 0c4696a), commit b30493e

1 file changed: +19 -14 lines changed
README.md

Lines changed: 19 additions & 14 deletions
@@ -1,26 +1,31 @@
 # DeepURLBench Dataset
 
-This repository contains the dataset **DeepURLBench** for the paper:
-**"A New Dataset and Methodology for Malicious URL Classification"**
-by Deep Instinct's Research Team.
+This repository contains the dataset **DeepURLBench**, introduced in the paper **"A New Dataset and Methodology for Malicious URL Classification"** by Deep Instinct's research team.
 
----
+## Dataset Overview
 
-## Dataset Description
+The repository includes two parquet directories:
 
-The repository includes two directories in Parquet format:
+1. **`urls_with_dns`**:
+- Contains the following fields:
+- `url`: The URL being analyzed.
+- `first_seen`: The timestamp when the URL was first observed.
+- `TTL` (Time to Live): The time-to-live value of the DNS record.
+- `label`: Indicates whether the URL is malware, phishing or benign.
+- `IP addresses`: The associated IP addresses.
 
-1. **`urls_with_dns`**: Contains URLs with associated DNS data.
-2. **`urls_without_dns`**: Contains URLs without DNS data.
+2. **`urls_without_dns`**:
+- Contains the following fields:
+- `url`: The URL being analyzed.
+- `first_seen`: The timestamp when the URL was first observed.
+- `label`: Indicates whether the URL is malware, phishing or benign.
 
----
+## Usage Instructions
 
-## Loading the Dataset
-
-You can load the dataset using **pandas** in Python. Here's an example:
+To load the dataset using Python and Pandas, follow these steps:
 
 ```python
 import pandas as pd
 
-# Load a Parquet file
-df = pd.read_parquet('path_to_directory')
+# Replace 'directory' with the path to the parquet file or directory
+df = pd.DataFrame.from_parquet("directory")
