A Python-based tool that crawls arXiv papers to extract author email addresses. This tool is useful for researchers and academics who need to compile contact information for authors in specific research areas.
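
The README does not show how addresses are pulled out of paper text. As a rough illustration only, a regular-expression pass over text already extracted from a paper might look like the sketch below. The pattern and the `extract_emails` helper are hypothetical and are not taken from this tool's code.

```python
import re

# Hypothetical, deliberately simple pattern (not taken from this tool):
# a local part, "@", then a domain ending in a TLD of two or more letters.
# It misses obfuscated forms such as "name [at] uni [dot] edu" and
# grouped forms such as "{alice, bob}@uni.edu".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    # dict.fromkeys deduplicates while preserving insertion order.
    return list(dict.fromkeys(m.lower() for m in EMAIL_RE.findall(text)))


if __name__ == "__main__":
    sample = "Contact Alice@Example.org or bob.smith@cs.uni.edu. Also alice@example.org."
    print(extract_emails(sample))  # prints ['alice@example.org', 'bob.smith@cs.uni.edu']
```

Lowercasing before deduplication treats `Alice@Example.org` and `alice@example.org` as one address. Strictly speaking, only the domain part is case-insensitive, but in practice this avoids near-duplicate entries.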

## Important Note on Data Privacy

This tool collects email addresses from public academic papers. Although this data is publicly available, keep the following in mind:

- The collected data (emails, database) is not included in this repository
- When using this tool, respect applicable privacy and data-protection regulations
- Consider the ethical implications and use the tool responsibly

## Features

- Search arXiv papers using custom queries
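
The search step is not shown in this part of the README. The following is a minimal sketch that assumes the tool queries arXiv's public Atom API at `export.arxiv.org`. The function names `build_query_url`, `parse_entries`, and `search` are hypothetical and are not this project's code.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; responses are Atom 1.0 feeds.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build a query URL. search_query uses arXiv field prefixes such as ti:, au:, cat:, all:."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(feed_xml) -> list[dict]:
    """Return the id and whitespace-normalized title of each <entry> in an Atom feed (str or bytes)."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS).strip(),
            "title": " ".join(title.split()),  # titles often contain line breaks
        })
    return entries


def search(search_query: str, max_results: int = 10) -> list[dict]:
    """Fetch one page of results (requires network access)."""
    with urlopen(build_query_url(search_query, max_results=max_results), timeout=30) as resp:
        return parse_entries(resp.read())
```

For example, `search("cat:cs.CL AND all:transformer")` would request the first ten matching papers. arXiv asks API clients to wait about three seconds between successive calls, so a crawler that pages through results with `start` should sleep between requests.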

```
arxiv_parser/
├── main.py                               # Main script that generates notebooks
├── process_remaining.py                  # Script for processing remaining papers
├── notebooks/                            # Generated notebook versions
│   ├── arxiv_email_crawler.ipynb         # Local Jupyter version
│   ├── arxiv_email_crawler_colab.ipynb   # Google Colab version
│   └── arxiv_email_crawler_kaggle.ipynb  # Kaggle version
├── data/                                 # Directory for database and output files (not tracked in git)
```