Description
Abstract
This workshop explains what web scraping is and how to do it with Scrapy, a popular Python framework for web scraping.
About
This workshop will cover the workflow of scraping a website, step by step.
- Reconnaissance: After deciding what kind of information we want, we find a page where we can start. We then inspect the elements that matter to us and note their tags (div, p, etc.) and, if necessary, their classes. Finally, we open a Scrapy shell and try to retrieve the information we need by selecting the corresponding elements with XPath.
- Crawling: We then use this logic in our code to extract data recursively. Typically we jump from page to page by extracting links that match a pattern.
- Acquisition: During this process, any useful information we need (text, images, etc.) is downloaded and saved to disk.
Pre-requisites
Basic knowledge of HTML and Python would be sufficient.
Those who want to follow along must have Python (3.x) and Scrapy (1.5.0) installed.
Expected Duration: ~90 minutes
Level: Beginner-Intermediate
Resources: https://doc.scrapy.org/en/latest/index.html
Speaker Bio: Aaqa Ishtyaq (I am a final-year Computer Science student, currently doing an internship in Delhi as a Backend Developer.)