
Commit 88c12d1
committed Oct 3, 2021
Added 500 Data science books scraper
1 parent ac7faad

File tree

4 files changed: +608 −0 lines changed

datascience_books_scraper/500Datasciencebooks.csv

Lines changed: 500 additions & 0 deletions
Large diffs are not rendered by default.

datascience_books_scraper/README.md

Lines changed: 41 additions & 0 deletions

# 500 Data Science Books Scraper

This Python script scrapes popular Data Science books from https://1lib.in/s/data%20science

## Run Locally

Clone the project

```bash
git clone https://github.com/python-geeks/Automation-scripts.git
```

Go to the project directory

```bash
cd Automation-scripts/datascience_books_scraper
```

Install dependencies

```bash
pip install -r requirements.txt
```

Run the script

```bash
python scrape.py
```

Wait a few seconds, then check your directory for a file named:

```
500Datasciencebooks.csv
```

## Author

- [@ManthanShettigar](https://github.com/ManthanShettigar)
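Once the script finishes, the output can be sanity-checked with Python's standard `csv` module. A minimal sketch, using an inline sample in the same shape the script writes (the book rows here are illustrative, not real output; in practice you would `open('500Datasciencebooks.csv')` instead of the `StringIO`):

```python
import csv
import io

# Sample data with the same header scrape.py writes.
sample = """Sr.No,Book name,Publisher,Author,Year
1,Data Science from Scratch,O'Reilly,Joel Grus,2019
2,Python for Data Analysis,O'Reilly,Wes McKinney,2017
"""

# DictReader maps each row to a dict keyed by the header columns.
rows = list(csv.DictReader(io.StringIO(sample)))
print(len(rows))                 # → 2
print(rows[0]['Book name'])      # → Data Science from Scratch
```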
datascience_books_scraper/requirements.txt

Lines changed: 22 additions & 0 deletions

astroid==2.8.0
autopep8==1.5.7
beautifulsoup4==4.10.0
bs4==0.0.1
certifi==2021.5.30
charset-normalizer==2.0.6
flake8==3.9.2
idna==3.2
isort==5.9.3
lazy-object-proxy==1.6.0
lxml==4.6.3
mccabe==0.6.1
platformdirs==2.4.0
pycodestyle==2.7.0
pyflakes==2.3.1
pylint==2.11.1
requests==2.26.0
soupsieve==2.2.1
toml==0.10.2
typing-extensions==3.10.0.2
urllib3==1.26.7
wrapt==1.12.1

datascience_books_scraper/scrape.py

Lines changed: 45 additions & 0 deletions

import csv

import requests
from bs4 import BeautifulSoup

Book_name = []
Year = []
Publisher = []
Author = []

# Walk pages 1-10 of the search results.
for j in range(1, 11):
    source = requests.get(
        f'https://1lib.in/s/data%20science?page={j}').text
    soup = BeautifulSoup(source, 'lxml')
    books = soup.find_all('table', attrs={'style': 'width:100%;height:100%;'})
    for i in books:
        # book name
        try:
            Book_name.append(i.find('h3').text.strip())
        except Exception:
            Book_name.append('nan')
        # year
        try:
            Year.append(
                i.find('div', class_='property_year').text.strip()[6:10])
        except Exception:
            Year.append('nan')
        # publisher
        try:
            Publisher.append(
                i.find('div', attrs={'title': 'Publisher'}).text.strip())
        except Exception:
            Publisher.append('nan')
        # author
        try:
            Author.append(i.find('div', class_='authors').text.strip())
        except Exception:
            Author.append('nan')

file_name = '500Datasciencebooks.csv'

# newline='' prevents blank lines between rows on Windows
with open(file_name, 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Sr.No', 'Book name', 'Publisher', 'Author', 'Year'])

    # start at index 0 so the first book is not skipped; Sr.No starts at 1
    for i in range(len(Book_name)):
        writer.writerow([i + 1, Book_name[i], Publisher[i], Author[i], Year[i]])
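The four parallel lists and repeated try/except blocks above can be condensed. A sketch of one possible reshaping (`safe_text` and `write_books` are hypothetical names, not part of this repository): collect one dict per book and write with `csv.DictWriter`.

```python
import csv


def safe_text(tag):
    """Return a tag's stripped text, or 'nan' when the tag was not found."""
    # Hypothetical helper replacing the repeated try/except pattern;
    # bs4's find() returns None on a miss, so a None check suffices.
    return tag.text.strip() if tag is not None else 'nan'


def write_books(books, file_name='500Datasciencebooks.csv'):
    """Write a list of per-book dicts to CSV with an auto-numbered Sr.No."""
    fields = ['Sr.No', 'Book name', 'Publisher', 'Author', 'Year']
    with open(file_name, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for n, book in enumerate(books, start=1):
            writer.writerow({'Sr.No': n, **book})
```

Keeping each book's fields together in one dict avoids the risk of the parallel lists drifting out of sync if one lookup fails in an unexpected way.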
