feat: re-structure the JS course so that it's a flat list of numbered lessons #1579

Draft · wants to merge 5 commits into master

sources/academy/glossary/tools/apify_cli.md (2 changes: 1 addition & 1 deletion)
@@ -15,7 +15,7 @@ The [Apify CLI](/cli) helps you create, develop, build and run Apify Actors, and

## Installing {#installing}

-To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/scraping_basics_javascript/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.
+To install the Apify CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/scraping_basics_javascript/06_computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.

Open up a terminal instance and run the following command:

@@ -15,7 +15,7 @@ Thus far, you've run Actors on the platform and written an Actor of your own, wh

## Advanced Actor overview {#advanced-actors}

-In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
+In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/21_challenge.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.

Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile (the Actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the Actor's code. The Apify platform is serverless and runs each Actor in its own Docker container. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile).

@@ -26,7 +26,7 @@ Before developing a pro-level Apify scraper, there are some important things you

### Crawlee, Apify SDK, and the Apify CLI {#crawlee-apify-sdk-and-cli}

-If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/crawling/pro_scraping.md) in the **Web scraping basics for JavaScript devs** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.
+If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to [this lesson](../../webscraping/scraping_basics_javascript/18_pro_scraping.md) in the **Web scraping basics for JavaScript devs** course (and ideally follow along). To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.

The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't installed it already, please refer to [this short lesson](../../glossary/tools/apify_cli.md).

@@ -71,7 +71,7 @@ try {
}
```

-Read more about logging and error handling in our developer [best practices](../../webscraping/scraping_basics_javascript/best_practices.md) section.
+Read more about logging and error handling in our developer [best practices](../../webscraping/scraping_basics_javascript/25_best_practices.md) section.
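
For context (the surrounding crawler code is truncated in this diff), a minimal hedged sketch of one common logging-and-rethrow pattern; the `crawler` variable stands in for whatever crawler instance the elided code builds:

```js
try {
    await crawler.run(['https://demo-webstore.apify.org/search/on-sale']);
} catch (error) {
    // Log enough context to debug the failure...
    console.error(`Scrape failed: ${error.message}`);
    // ...then rethrow so the run still fails visibly instead of being swallowed.
    throw error;
}
```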

### Saving snapshots {#saving-snapshots}

@@ -43,7 +43,7 @@ If you're in a brand new project, don't forget to initialize your project, then
npm init -y && npm i crawlee
```

-Now, let's write some code to extract each product's data. This should look familiar if you went through the [Data Extraction](../../webscraping/scraping_basics_javascript/data_extraction/index.md) lessons:
+Now, let's write some code to extract each product's data. This should look familiar if you went through the [Data Extraction](../../webscraping/scraping_basics_javascript/02_data_extraction.md) lessons:

```js
import { CheerioCrawler } from 'crawlee';
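// ... (the rest of the snippet is truncated in this diff)
```

A hedged sketch of what such a `CheerioCrawler` extraction script might look like; the selectors here are illustrative assumptions, not the lesson's exact code:

```js
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        // Hypothetical selectors; the demo webstore's real markup may differ.
        const title = $('h3').first().text().trim();
        const price = $('[class*="price"]').first().text().trim();
        console.log(`${request.url}: ${title} (${price})`);
    },
});

await crawler.run(['https://demo-webstore.apify.org/search/on-sale']);
```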
@@ -11,13 +11,13 @@ slug: /anti-scraping/mitigation/using-proxies

---

-In the [**Web scraping basics for JavaScript devs**](../../scraping_basics_javascript/crawling/pro_scraping.md) course, we learned about the power of Crawlee, and how it can streamline the development process of web crawlers. You've already seen how powerful the `crawlee` package is; however, what you've been exposed to thus far is only the tip of the iceberg.
+In the [**Web scraping basics for JavaScript devs**](../../scraping_basics_javascript/18_pro_scraping.md) course, we learned about the power of Crawlee, and how it can streamline the development process of web crawlers. You've already seen how powerful the `crawlee` package is; however, what you've been exposed to thus far is only the tip of the iceberg.

Because proxies are so widely used in the scraping world, Crawlee has built-in features for implementing them in an effective way. One of the main functionalities that comes baked into Crawlee is proxy rotation, which is when each request is sent through a different proxy from a proxy pool.

## Implementing proxies in a scraper {#implementing-proxies}

-Let's borrow some scraper code from the end of the [pro-scraping](../../scraping_basics_javascript/crawling/pro_scraping.md) lesson in our **Web scraping basics for JavaScript devs** course and paste it into a new file called **proxies.js**. This code enqueues all of the product links on [demo-webstore.apify.org](https://demo-webstore.apify.org)'s on-sale page, then makes a request to each product page and scrapes data about each one:
+Let's borrow some scraper code from the end of the [pro-scraping](../../scraping_basics_javascript/18_pro_scraping.md) lesson in our **Web scraping basics for JavaScript devs** course and paste it into a new file called **proxies.js**. This code enqueues all of the product links on [demo-webstore.apify.org](https://demo-webstore.apify.org)'s on-sale page, then makes a request to each product page and scrapes data about each one:

```js
// proxies.js
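// ... (the rest of the borrowed scraper is truncated in this diff)
```

As a hedged sketch of where this lesson is heading, Crawlee's built-in rotation can be wired in through `ProxyConfiguration`; the proxy URLs below are placeholders, not working proxies:

```js
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxy URLs; substitute real ones from your provider.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy-1.example.com:8000', 'http://proxy-2.example.com:8000'],
});

const crawler = new CheerioCrawler({
    proxyConfiguration, // each request goes out through a different proxy from the pool
    async requestHandler({ request, proxyInfo }) {
        console.log(`${request.url} fetched via ${proxyInfo?.url}`);
    },
});

await crawler.run(['https://demo-webstore.apify.org/search/on-sale']);
```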
@@ -14,7 +14,7 @@ import TabItem from '@theme/TabItem';

---

-Now that we know how to execute scripts on a page, we're ready to learn a bit about [data extraction](../../scraping_basics_javascript/data_extraction/index.md). In this lesson, we'll be scraping all the on-sale products from our [Fakestore](https://demo-webstore.apify.org/search/on-sale) website. Playwright & Puppeteer offer two main methods for data extraction:
+Now that we know how to execute scripts on a page, we're ready to learn a bit about [data extraction](../../scraping_basics_javascript/02_data_extraction.md). In this lesson, we'll be scraping all the on-sale products from our [Fakestore](https://demo-webstore.apify.org/search/on-sale) website. Playwright & Puppeteer offer two main methods for data extraction:

1. Directly in `page.evaluate()` and other evaluate functions such as `page.$$eval()`.
2. In the Node.js context, using a parsing library such as [Cheerio](https://www.npmjs.com/package/cheerio) (both approaches are sketched below).
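
A hedged sketch of both approaches using Playwright (Puppeteer's API is nearly identical); the selectors are illustrative assumptions rather than the lesson's exact ones:

```js
import { chromium } from 'playwright';
import * as cheerio from 'cheerio';

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://demo-webstore.apify.org/search/on-sale');

// 1. Extract inside the browser context with an evaluate function:
const names = await page.$$eval('h3', (els) => els.map((el) => el.textContent.trim()));

// 2. Pull the rendered HTML into Node.js and parse it with Cheerio:
const $ = cheerio.load(await page.content());
const prices = $('[class*="price"]')
    .map((_, el) => $(el).text().trim())
    .toArray();

console.log(names, prices);
await browser.close();
```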
@@ -63,7 +63,7 @@ npm install puppeteer
</TabItem>
</Tabs>

-> For a more in-depth guide on how to set up the basic environment we'll be using in this tutorial, check out the [**Computer preparation**](../scraping_basics_javascript/data_extraction/computer_preparation.md) lesson in the **Web scraping basics for JavaScript devs** course.
+> For a more in-depth guide on how to set up the basic environment we'll be using in this tutorial, check out the [**Computer preparation**](../scraping_basics_javascript/06_computer_preparation.md) lesson in the **Web scraping basics for JavaScript devs** course.

## Course overview {#course-overview}

@@ -55,7 +55,7 @@ With `page.click()`, Puppeteer and Playwright actually drag the mouse and click,

Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-elements-matching-one-of-the-conditions), such as the **has-text** pseudo-class. As a rule of thumb, using text selectors is preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **Accept all** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **Accept all**; therefore, `button:has-text("Accept all")` is more reliable.
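
A hedged side-by-side of the two selector styles discussed above (the cookie-dialog markup is assumed for illustration):

```js
// Playwright: a text pseudo-class survives structural changes around the button.
await page.click('button:has-text("Accept all")');

// Puppeteer: a structural selector that breaks if the sibling element changes.
await page.click('button + button');
```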

-> If you're not already familiar with CSS selectors and how to find them, we recommend referring to [this lesson](../../scraping_basics_javascript/data_extraction/using_devtools.md) in the **Web scraping basics for JavaScript devs** course.
+> If you're not already familiar with CSS selectors and how to find them, we recommend referring to [this lesson](../../scraping_basics_javascript/04_using_devtools.md) in the **Web scraping basics for JavaScript devs** course.

Then, we can type some text into an input field `<textarea>` with `page.type()`, passing a CSS selector as the first parameter and the string to input as the second:

@@ -1,7 +1,6 @@
---
title: Introduction
description: Start learning about web scraping, web crawling, data extraction, and popular tools to start developing your own scraper.
-sidebar_position: 1.1
category: courses
slug: /web-scraping-for-beginners/introduction
---
@@ -30,4 +29,4 @@ We use web scraping as an umbrella term for crawling, web data extraction and al

## Next up {#next}

-In the [next lesson](./data_extraction/index.md), you will learn about the basic building blocks of each web page: HTML, CSS, and JavaScript.
+In the [next lesson](./02_data_extraction.md), you will learn about the basic building blocks of each web page: HTML, CSS, and JavaScript.
@@ -1,7 +1,6 @@
---
-title: Basics of data extraction
+title: Data extraction
description: Learn about HTML, CSS, and JavaScript, the basic building blocks of a website, and how to use them in web scraping and data extraction.
-sidebar_position: 1.2
category: courses
slug: /web-scraping-for-beginners/data-extraction
---
@@ -34,4 +33,4 @@ HTML and CSS give websites their structure and style, but they are static. To be

## Next up {#next}

-We will show you [how to use the browser DevTools](./browser_devtools.md) to inspect and interact with a web page.
+We will show you [how to use the browser DevTools](./03_browser_devtools.md) to inspect and interact with a web page.
@@ -1,7 +1,6 @@
---
-title: Starting with browser DevTools
+title: "Data extraction: Starting with browser DevTools"
description: Learn about browser DevTools, a valuable tool in the world of web scraping, and how you can use them to extract data from a website.
-sidebar_position: 1
slug: /web-scraping-for-beginners/data-extraction/browser-devtools
---

@@ -15,7 +14,7 @@ Even though DevTools stands for developer tools, everyone can use them to inspec

## Elements tab {#elements-tab}

-When you first open Chrome DevTools on Wikipedia, you will start on the Elements tab (in Firefox, it's called the **Inspector**). You can use this tab to inspect the page's HTML on the left-hand side, and its CSS on the right. The items in the HTML view are called [**elements**](../../../glossary/concepts/html_elements.md).
+When you first open Chrome DevTools on Wikipedia, you will start on the Elements tab (in Firefox, it's called the **Inspector**). You can use this tab to inspect the page's HTML on the left-hand side, and its CSS on the right. The items in the HTML view are called [**elements**](../../glossary/concepts/html_elements.md).

![Elements tab in Chrome DevTools](./images/browser-devtools-elements-tab.png)

@@ -67,6 +66,6 @@ By changing HTML elements from the Console, you can change what's displayed on t

## Next up {#next}

-In this lesson, we learned the absolute basics of interaction with a page using the DevTools. In the [next lesson](./using_devtools.md), you will learn how to extract data from it. We will extract data about the on-sale products on the [Warehouse store](https://warehouse-theme-metal.myshopify.com).
+In this lesson, we learned the absolute basics of interaction with a page using the DevTools. In the [next lesson](./04_using_devtools.md), you will learn how to extract data from it. We will extract data about the on-sale products on the [Warehouse store](https://warehouse-theme-metal.myshopify.com).

It isn't a real store, but a full-featured demo of a Shopify online store. And that is perfect for our purposes. Shopify is one of the largest e-commerce platforms in the world, and it uses all the latest technologies that a real e-commerce web application would use. Learning to scrape a Shopify store is useful, because you can immediately apply the learnings to millions of websites.
@@ -1,15 +1,14 @@
---
-title: Finding elements with DevTools
+title: "Data extraction: Finding elements with DevTools"
description: Learn how to use browser DevTools, CSS selectors, and JavaScript via the DevTools console to extract data from a website.
-sidebar_position: 2
slug: /web-scraping-for-beginners/data-extraction/using-devtools
---

**Learn how to use browser DevTools, CSS selectors, and JavaScript via the DevTools console to extract data from a website.**

---

-With the knowledge of the basics of DevTools, we can finally try doing something more practical: extracting data from a website. Let's try collecting the on-sale products from the [Warehouse store](https://warehouse-theme-metal.myshopify.com/). We will use [CSS selectors](../../../glossary/concepts/css_selectors.md), JavaScript, and DevTools to achieve this task.
+With the knowledge of the basics of DevTools, we can finally try doing something more practical: extracting data from a website. Let's try collecting the on-sale products from the [Warehouse store](https://warehouse-theme-metal.myshopify.com/). We will use [CSS selectors](../../glossary/concepts/css_selectors.md), JavaScript, and DevTools to achieve this task.

> **Why use a Shopify demo and not a real e-commerce store like Amazon?** Because real websites are usually bulkier, littered with promotions, and they change very often. Many have multiple versions of pages, and you never know in advance which one you will get. It will be important to learn how to deal with these challenges in the future, but for this beginner course, we want to have a light and stable environment.
>
@@ -39,7 +38,7 @@ Now that we know how the parent element looks, we can extract its data, includin

## Selecting elements in Console {#selecting-elements}

-We know how to find an element manually using the DevTools, but that's not very useful for automated scraping. We need to tell the computer how to find it as well. We can do that using JavaScript and CSS selectors. The function to do that is called [`document.querySelector()`](../../../glossary/concepts/querying_css_selectors.md) and it will find the first element in the page's HTML matching the provided [CSS selector](../../../glossary/concepts/css_selectors.md).
+We know how to find an element manually using the DevTools, but that's not very useful for automated scraping. We need to tell the computer how to find it as well. We can do that using JavaScript and CSS selectors. The function to do that is called [`document.querySelector()`](../../glossary/concepts/querying_css_selectors.md) and it will find the first element in the page's HTML matching the provided [CSS selector](../../glossary/concepts/css_selectors.md).

For example, `document.querySelector('div')` will find the first `<div>` element, and `document.querySelector('.my-class')` (notice the period `.`) will find the first element with the class `my-class`, such as `<div class="my-class">` or `<p class="my-class">`.
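
A quick hedged illustration you can run in the DevTools Console; the `.product-item` class is an assumption for illustration, not necessarily the store's real class name:

```js
// Find the first element with the class "product-item":
const product = document.querySelector('.product-item');

// querySelector returns null when nothing matches, so guard before reading:
console.log(product ? product.textContent.trim() : 'no match');
```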

@@ -65,7 +64,7 @@ When we look more closely by hovering over the result in the Console, we find th

![Hover over a query result](./images/devtools-collection-query-hover.png)

-We need a different function: [`document.querySelectorAll()`](../../../glossary/concepts/querying_css_selectors.md) (notice the `All` at the end). This function finds not only the first element, but all elements that match the provided selector.
+We need a different function: [`document.querySelectorAll()`](../../glossary/concepts/querying_css_selectors.md) (notice the `All` at the end). This function finds not only the first element, but all elements that match the provided selector.

Run the following function in the Console:

@@ -204,4 +203,4 @@ price.textContent.match(/((\d+,?)+.?(\d+)?)/)[0];

## Next up {#next}

-This concludes our lesson on extracting and cleaning data using DevTools. Using CSS selectors, we were able to find the HTML element that contains data about our favorite Sony subwoofer and then extract the data. In the [next lesson](./devtools_continued.md), we will learn how to extract information not only about the subwoofer, but about all the products on the page.
+This concludes our lesson on extracting and cleaning data using DevTools. Using CSS selectors, we were able to find the HTML element that contains data about our favorite Sony subwoofer and then extract the data. In the [next lesson](./05_devtools_continued.md), we will learn how to extract information not only about the subwoofer, but about all the products on the page.
@@ -1,7 +1,6 @@
---
-title: Extracting data with DevTools
+title: "Data extraction: Extracting data with DevTools"
description: Continue learning how to extract data from a website using browser DevTools, CSS selectors, and JavaScript via the DevTools console.
-sidebar_position: 3
slug: /web-scraping-for-beginners/data-extraction/devtools-continued
---

@@ -94,4 +93,4 @@ And third, we wrapped this data extraction logic in a **loop** to automatically

And that's it! With a bit of trial and error, you will be able to extract data from any webpage that's loaded in your browser. This is a useful skill on its own. It will save you time copy-pasting stuff when you need data for a project.

-More importantly though, it taught you the basics to start programming your own scrapers. In the [next lessons](./computer_preparation.md), we will teach you how to create your own web data extraction script using JavaScript and Node.js.
+More importantly though, it taught you the basics to start programming your own scrapers. In the [next lessons](./06_computer_preparation.md), we will teach you how to create your own web data extraction script using JavaScript and Node.js.
@@ -1,7 +1,6 @@
---
-title: Computer preparation
+title: "Data extraction: Computer preparation"
description: Set up your computer to be able to code scrapers with Node.js and JavaScript. Download Node.js and npm and run a Hello World script.
-sidebar_position: 4
slug: /web-scraping-for-beginners/data-extraction/computer-preparation
---

@@ -64,4 +63,4 @@ You should see **Hello World** printed in your terminal. If you do, congratulati

## Next up {#next}

-You have your computer set up correctly for development, and you've run your first script. Great! In the [next lesson](./project_setup.md) we'll set up your project to download a website's HTML using Node.js instead of a browser.
+You have your computer set up correctly for development, and you've run your first script. Great! In the [next lesson](./07_project_setup.md) we'll set up your project to download a website's HTML using Node.js instead of a browser.