Commit f9bf905

Merge pull request #214 from Namyalg/Medium-Article-Scraper
Scrape Medium articles
2 parents 33d7cb0 + 09c73d4 commit f9bf905

5 files changed, 62 additions and 0 deletions

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
#!/usr/bin/env python3

# Imports and dependencies

import requests
from bs4 import BeautifulSoup

# The URL of the article is entered here

page_url = input("Enter the URL of the Medium article: ")

# Fetch the page; the body of the HTTP response is available as response.text

response = requests.get(page_url, timeout=30)
response.raise_for_status()  # stop early on a 4xx/5xx response

# Beautiful Soup is a library used for web scraping and parsing the contents of a web page
# Here Python's built-in HTML parser is used to parse the content embedded in the HTML tags

soup = BeautifulSoup(response.text, "html.parser")

# The content of the article is stored in the <article> tag

article = soup.find("article")
if article is None:
    raise SystemExit("No <article> tag found on this page")

# The content is written into a text file, opened only once the page has been parsed

with open("Medium_article_content.txt", "w", encoding="utf-8") as file:

    # The text of the article is stored in <p> tags

    for content in article.find_all("p"):

        file.write(content.text + "\n")
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# Medium Article Downloader

![Image](assets/medium.PNG)

Medium is a treasure trove of knowledge. It is a great place to read and write blogs.

This script downloads the contents of a Medium article and stores them in a text file.

The Beautiful Soup library for Python parses HTML, the markup that web pages are made of, and lets a script pull out specific elements. Here it is used to extract the paragraphs of the article.
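
As a rough illustration of that parsing step, the sketch below runs Beautiful Soup over a small hand-written HTML string instead of a live page. `SAMPLE_HTML` and `extract_paragraphs` are made-up names for this example and are not part of the script; real Medium markup is much larger and may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page (not real Medium markup)
SAMPLE_HTML = """
<html><body>
  <nav><p>Sign in</p></nav>
  <article>
    <div>
      <h1>Title</h1>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </article>
</body></html>
"""


def extract_paragraphs(html):
    """Return the text of every <p> inside the first <article>, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")
    if article is None:
        # No <article> element: nothing to extract
        return []
    return [p.get_text() for p in article.find_all("p")]


paragraphs = extract_paragraphs(SAMPLE_HTML)
```

Only the two paragraphs inside `<article>` end up in `paragraphs`. The `<p>` in the navigation bar is skipped because the search starts from the `<article>` element.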


## Requirements
- pip install requests
- pip install beautifulsoup4
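
Beautiful Soup 4 is published on PyPI as `beautifulsoup4` but imported as `bs4`, so the install name and the import name differ. A minimal sketch of a check that both dependencies can be imported is shown below. `REQUIRED` and `missing_packages` are hypothetical names for this example, not part of the script.

```python
from importlib.util import find_spec

# Maps the name used in `import` statements to the name passed to pip
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of the required modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
```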


## Working
The user is prompted to enter the URL of the Medium article to download.

![Image](assets/promptURL.PNG)
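
The script passes whatever is typed at the prompt straight to `requests.get`. One possible way to reject obviously malformed input first is sketched below, using only the standard library. `looks_like_web_url` is a hypothetical helper, not part of the script, and it only checks the shape of the URL, not whether the page exists or is a Medium article.

```python
from urllib.parse import urlparse


def looks_like_web_url(text):
    """Return True if text parses as an http(s) URL that includes a host name."""
    try:
        parts = urlparse(text.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this helper, an input such as `example.com/post` is rejected because it has no scheme, while `https://example.com/post` is accepted (both addresses are placeholders).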

The contents are then stored in a file named Medium_article_content.txt.

![Image](assets/content.PNG)
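
The file-writing step can be sketched on its own as follows. `save_paragraphs` is a hypothetical helper; the default file name is the one used by the script, and UTF-8 encoding is an assumption made here so that non-ASCII characters are written consistently across platforms.

```python
def save_paragraphs(paragraphs, path="Medium_article_content.txt"):
    """Write each paragraph on its own line and return how many were written."""
    count = 0
    # The context manager closes the file even if a write fails
    with open(path, "w", encoding="utf-8") as file:
        for text in paragraphs:
            file.write(text + "\n")
            count += 1
    return count
```

Opening the file in `"w"` mode overwrites any earlier download, so each run leaves only the most recent article in the file.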