[WORKSHOP] Web Scraping with Scrapy #7

@aaqaishtyaq

Abstract
This workshop explains what web scraping is and how to do it with Scrapy, one of the most popular Python frameworks for web scraping.

About
This workshop will cover the workflow of scraping a website, step by step.

  1. Reconnaissance: After deciding what kind of information we want, we find a page where we can start. We then inspect the elements that matter to us and note their tag (div, p, etc.) and, if necessary, their class. Finally, we open a Scrapy shell and try to get the information we need by accessing the corresponding elements using XPath.

  2. Crawling: We then use this logic in our code to extract data recursively. Typically we jump from page to page by extracting links that match a pattern.

  3. Acquisition: During this process, any useful information we need (text, images, etc.) is downloaded and saved to disk.
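
The three steps above can be sketched in a few lines. This is only an illustration of the workflow, not Scrapy code: it uses the Python standard library so it runs without a network or any install. The `PAGES` dictionary, the `quote` and `next` class names, and the `parse` and `crawl` helpers are all hypothetical stand-ins. In a real Scrapy spider, the XPath lookups would go through `response.xpath(...)` and the link following through `response.follow(...)`.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical in-memory "site" (path -> well-formed XHTML).
# It stands in for the HTTP responses a real crawler would download.
PAGES = {
    "/page/1": """<html><body>
        <div class="quote"><p>First quote</p></div>
        <div class="quote"><p>Second quote</p></div>
        <a class="next" href="/page/2">Next</a>
        <a class="about" href="/about">About</a>
    </body></html>""",
    "/page/2": """<html><body>
        <div class="quote"><p>Third quote</p></div>
        <a class="next" href="/page/1">Back to start</a>
    </body></html>""",
}


def parse(html):
    """Apply the XPath expressions found during reconnaissance to one page."""
    root = ET.fromstring(html)
    # Tag + class identified while inspecting the page: div.quote > p
    items = [p.text for p in root.findall(".//div[@class='quote']/p")]
    # Only links matching our pattern (class="next") are followed.
    links = [a.get("href") for a in root.findall(".//a[@class='next']")]
    return items, links


def crawl(start):
    """Crawl from `start`, following matching links and collecting items."""
    seen = {start}
    queue = deque([start])
    collected = []
    while queue:
        path = queue.popleft()
        items, links = parse(PAGES[path])
        collected.extend(items)  # acquisition: keep what we extracted
        for link in links:
            # Skip unknown pages and pages we have already visited.
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


if __name__ == "__main__":
    for quote in crawl("/page/1"):
        print(quote)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other. Scrapy handles this kind of duplicate filtering for you by default, which is one reason to use a framework rather than a hand-written loop.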

Pre-requisites
Basic knowledge of HTML and Python would be sufficient.

Those who want to follow along must have Python (3.x) and Scrapy (1.5.0) installed.

Slides

Expected Duration: ~90 minutes

Level: Beginner-Intermediate

Resources: https://doc.scrapy.org/en/latest/index.html

Speaker Bio: Aaqa Ishtyaq (I am a final-year Computer Science student, currently doing an internship in Delhi as a Backend Developer.)
