Commit 5191c27

Script to scrape amazon products
1 parent 81d3ecd commit 5191c27

3 files changed: 43 additions, 0 deletions

Python/amazon-products/README.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Amazon scraper
+- - - - - -
+## Aim
+This script scrapes Amazon search results. It takes a product name and prints the top 5 products for the query with their name, price, and rating.
+
+## Requirements
+```pip install beautifulsoup4==4.9.1 requests lxml```
+
+## To run
+- Enter your user agent in the headers in the script. You can find your user agent by searching for ```user-agent``` on Google.
+- Run ```amazon.py```
+- Enter the keyword you would like to search for.
+
+## Output
+
+![alt text](https://github.com/TaniaMalhotra/hacking-tools-scripts/blob/amazon-products/Python/amazon-products/ex.png)
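
Note on the keyword step: the script in this commit concatenates the typed keyword directly into the search URL, so a keyword containing spaces or characters such as `&` may not produce the intended query. A minimal sketch of encoding the keyword first is shown here. The function name `build_search_url` and the constant `BASE_URL` are hypothetical and not part of the commit; the endpoint and the `ref` value are copied from `amazon.py`.

```python
from urllib.parse import urlencode

# Search endpoint taken from amazon.py in this commit.
BASE_URL = "https://www.amazon.com/s"


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword (hypothetical helper).

    urlencode escapes spaces, '&', '#', and similar characters that would
    otherwise change the meaning of the query string.
    """
    query = urlencode({"k": keyword.strip(), "ref": "nb_sb_noss_2"})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # prints https://www.amazon.com/s?k=electric+guitar&ref=nb_sb_noss_2
    print(build_search_url("electric guitar"))
```

The result could be passed to `requests.get(url, headers=headers)` in place of the concatenated string.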

Python/amazon-products/amazon.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+from bs4 import BeautifulSoup
+import requests
+kw = input("Please enter the keyword to search. For eg: phone, iron, guitar etc ")
+url = "https://www.amazon.com/s?k=" + kw + "&ref=nb_sb_noss_2"
+headers = {
+    # the user agent is different for every user. Type user agent in chrome browser and replace the below agent with yours
+    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36',
+    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
+    'Accept-Language' : 'en-US,en;q=0.5',
+    'Accept-Encoding' : 'gzip',
+    'DNT' : '1', # Do Not Track Request Header
+    'Connection' : 'close'
+}
+
+
+soup = BeautifulSoup(requests.get(url, headers=headers).text, 'lxml')
+for count,div in enumerate(soup.select('div[data-asin]')):
+    if int(count) <= 5:
+        if count!=0:
+            if div.select_one('.a-text-normal'):
+                print(str(count) + str(div.select_one('.a-text-normal').text).rstrip())
+            if div.select_one('.a-price'):
+                print(div.select_one('.a-price ').get_text('|',strip=True).split('|')[0])
+            print(str(div.find_all('span', {'class':'a-icon-alt'}))[26:29] + " stars")
+            print("\n")
+    else:
+        break
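
Note on the loop: the script reads the rating by slicing characters 26 to 29 out of the string form of a list of tags, and it selects results 1 through 5 with a counter and a `break`. Both steps can be sketched with the standard library alone, as below. This is an illustration under assumptions, not the committed code: `parse_rating` and `top_results` are hypothetical names, and the rating label is assumed to read like "4.5 out of 5 stars", which is not confirmed by the commit.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

# Assumption: the rating label looks like "4.5 out of 5 stars".
_RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s+out of\s+5")


def parse_rating(alt_text: str) -> Optional[float]:
    """Return the numeric rating from a label, or None if none is found."""
    match = _RATING_RE.search(alt_text)
    return float(match.group(1)) if match else None


def top_results(results: Iterable[T], limit: int = 5, skip: int = 1) -> Iterator[T]:
    """Yield `limit` items after skipping the first `skip` items.

    The defaults mirror the committed loop, which ignores index 0 and
    prints the entries at indices 1 through 5.
    """
    return islice(results, skip, skip + limit)


if __name__ == "__main__":
    print(parse_rating("4.5 out of 5 stars"))   # prints 4.5
    print(list(top_results(range(10))))         # prints [1, 2, 3, 4, 5]
```

In the script, `top_results` could wrap `soup.select('div[data-asin]')`, and `parse_rating` could take the text of the first `span.a-icon-alt` element in each result; a missing rating then yields `None` instead of three characters of unrelated markup.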

Python/amazon-products/ex.png

54.3 KB