feat: new course kick off: Scraping with Apify and AI #2275
honzajavorek wants to merge 47 commits into master from
Conversation
Preview for this PR was built for commit
Force-pushed from 7b24b4d to 983c801.
Force-pushed from 0780c18 to f6d6567.
tomnosek commented:

I think that the first article should be about just asking AI to create an Actor and then copy-pasting it into the built-in Web IDE. That's the quickest way to get somewhere. If I open the course, the first heading is "Install Node.js", the second heading is "Install Apify CLI" - I'm already like "too complicated, I'm outta here". The current first article could be the next step.
Force-pushed from f6d6567 to b71a606.
Force-pushed from b71a606 to f6d6567.
@tomnosek I agree starting with "install Node.js" isn't ideal and I'm aware of it. I didn't start with what you suggest, because:

I think it's worth finding the easiest way to start, so I'll spend some time exploring whether we could really start with the Web IDE and plain ChatGPT, and what the workflow could be. The capabilities of ChatGPT are limited and it's a bit of a struggle, but I think it's a really important constraint, as most people know only ChatGPT. It really should be the starting point. We pick them up where they already are and bring them to more advanced patterns. Hopefully I can find a way.
@honzajavorek I agree with starting with ChatGPT - at least for now, it's the intro to AI/LLM for a broader audience. Maybe it'll be Gemini in the future, but I'm happy with your choice. What worked well for me in the past was to literally say in the message that I'm going to be using Apify Console to run it, with the Web IDE for the code, and that this is the link to the template to use and this is the link to the docs to use.
Force-pushed from f6d6567 to b503490.
Force-pushed from f3af136 to f800e47.
A PR to update the Python client models has been created: apify/apify-client-python#713. This was automatically triggered by OpenAPI specification changes in this PR.
szaganek left a comment:

I left some suggestions plus asked two questions about the order of presenting information, but I think it's shaping up to be a fantastic course. I'll also follow the whole thing as a student and come back with feedback if my results differ :)

One more thing on my mind: have you considered using the imperative from time to time ("Navigate to X" vs. "We'll navigate to X", "Click X" vs. "We'll click X") to make the content flow a little better, make it sound more instructional, and avoid the future tense when it's not really necessary?
On the intro line (near `import DocCardList from '@theme/DocCardList';`):

> **Learn how to use AI to extract information from websites in this practical course, starting from the absolute basics.**

```suggestion
**Learn how to use AI to extract information from websites, starting from the absolute basics.**
```
On the course description (after `---`):

> In this course we'll use AI assistants to create an application for watching prices. It'll be able to scrape product pages of an e-commerce website and record prices. Data from several runs of such program would be useful for seeing trends in price changes, detecting discounts, etc.

```suggestion
In this practical course, we'll use AI assistants to create an application for watching prices. It'll be able to scrape product pages of an e-commerce website and record prices. Data gathered from several runs of such program would be useful for seeing trends in price changes or detecting discounts.
```
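The price-watching app described here can be sketched in a few lines. A minimal illustration in Python, assuming a hypothetical product page whose HTML marks the price with a `price` class (the HTML snippet, file name, and regex are made up for illustration; this is not the course's actual code):

```python
import csv
import re
from datetime import date

# Hypothetical HTML snippet standing in for a fetched product page.
html = """
<div class="product">
  <h1>Aloha Shirt</h1>
  <span class="price">$49.90</span>
</div>
"""

def extract_price(html: str) -> float:
    # Naive extraction, the kind of code an AI chat might produce first.
    match = re.search(r'class="price">\$([0-9.]+)<', html)
    if not match:
        raise ValueError("price not found")
    return float(match.group(1))

def record_price(path: str, product: str, price: float) -> None:
    # Append one row per run, so repeated runs build up a price history.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), product, price])

record_price("prices.csv", "Aloha Shirt", extract_price(html))
```

Each run appends one dated row to the CSV, which MS Excel or Google Sheets can open directly; the trend analysis then happens outside the scraper.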
Context:

> ## What we'll do
>
> - Use ChatGPT (AI chat) to create a program which extracts data from a web page.

On the CSV bullet:

> - Save extracted data in various formats, e.g. CSV which MS Excel or Google Sheets can open.

```suggestion
- Save extracted data in various formats, for example CSV which MS Excel or Google Sheets can open.
```

On the platform bullet (after `- Use Cursor (AI agent) to improve the program so that it is robust and maintainable.`):

> - Save time and effort with Apify's scraping platform.

```suggestion
- Save time and effort with the Apify platform.
```
On the heading:

> ## Who this course is for

```suggestion
## Who is this course for
```
Context:

> Try it! The generated code will most likely work out of the box, but the resulting program will still have a few caveats. Some are usability issues:

On the first caveat:

> - _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we'd need to remember to run it daily. If we want, for example, alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.

```suggestion
- _User-operated:_ We have to run the scraper ourselves. If we're tracking price trends, we need to remember to run it daily. If we want, for example, alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.
```

On the second caveat:

> - _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky since different analysis tools often require different formats.

```suggestion
- _Manual data management:_ Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky, since different analysis tools often require different formats.
```
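The "manual data management" caveat is concrete: once several runs have accumulated, even a simple question like "was there a discount?" needs code. A small sketch in Python, with a made-up price history in the one-row-per-run CSV shape such a workflow would produce (product name, dates, and the 10% threshold are all illustrative):

```python
import csv
from io import StringIO

# Hypothetical price history: one CSV row per scraper run (date, product, price).
history = StringIO(
    "2025-01-01,Aloha Shirt,49.90\n"
    "2025-01-02,Aloha Shirt,49.90\n"
    "2025-01-03,Aloha Shirt,39.90\n"
)

def find_discounts(rows, threshold=0.1):
    # Flag any price drop larger than the threshold fraction
    # relative to the previously seen price of the same product.
    discounts = []
    last = {}
    for day, product, price in rows:
        price = float(price)
        if product in last and price < last[product] * (1 - threshold):
            discounts.append((day, product, price))
        last[product] = price
    return discounts

print(find_discounts(csv.reader(history)))
# [('2025-01-03', 'Aloha Shirt', 39.9)]
```

Even this toy version has to commit to a file layout and a parsing step, which is exactly the burden the caveat is pointing at.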
Context:

> The Actor's detail page has plenty of tabs and settings, but for now we'll stay at **Source** → **Code**. That's where the **Web IDE** is.

On the IDE explanation:

> IDE stands for _integrated development environment_. Fear not, it's just jargon for ‘an app for editing code, somewhat comfortably’. In the Web IDE, we can browse the files the Actor is made of, and change their contents.

```suggestion
IDE stands for _integrated development environment_. Fear not, it's just jargon for “an app for editing code, somewhat comfortably”. In the Web IDE, we can browse the files the Actor is made of, and change their contents.
```
Context:

> ## Creating a new Actor
>
> Your phone runs apps, Apify runs Actors. If we want Apify to run something for us, it must be wrapped in the Actor structure. Conveniently, the platform provides ready-made templates we can use.

szaganek: This first paragraph feels a little out of place; wouldn't it make more sense before line 57?
Context:

> :::
>
> First, let's navigate through the tabs to **Source** → **Input**, where we can change what the Actor takes as input. The sample scraper walks through whatever website we give it in the **Start URLs** field. We'll change it to this URL:

szaganek: Shouldn't this part move to the Scraping products section? I'm missing how it's relevant to ChatGPT.


Puts #2174 into action.

**Course structure**

The course is unlisted, so we can do continuous deployment, but it's all hidden from the visitors. Reordered `academy/platform` so that the numbers make some sense after @TC-MO has previously nuked or moved most of the other content.

**First lesson**
Roasting time! 🍖 @tomnosek @patrikbraborec I guess I won't ask you for a review under each and every PR of the new course, but I think we should do this for the kick-off to sync expectations. We had some initial discussions, then I made a few educated guesses and decisions on how to approach this, and in the end I thought it best to just create the first lesson right away and see what you think about it.

The lesson is also the result of a few dead ends I already had the "pleasure" to explore 😅 I tried to design the lesson for people who aren't familiar with coding but can do basic work on a computer, including running commands in the terminal. We could go lower, but then I'd be explaining the terminal itself, or how to copy and paste things, and I don't know if that's useful for this kind of content.
**Note: Low Risk**

Low risk: documentation-only changes (new unlisted course pages plus sidebar reordering) with no product code or runtime behavior impact.

**Overview**

Introduces a new unlisted academy course, _Scraping with Apify and AI_, including a course landing page plus six lesson pages; lesson 1 is fully drafted (a walkthrough using ChatGPT + Apify CLI to create a simple Shopify price scraper), while lessons 2–6 are placeholders marked under construction.

Reorders `academy/platform` navigation by adjusting multiple `sidebar_position` values, and updates the docs vocabulary allowlist to include `crawlee.dev` for linting/spellcheck.

Written by Cursor Bugbot for commit 0780c18.
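The `sidebar_position` mechanism mentioned above is plain Docusaurus front matter on each page; a sketch of what one such adjustment might look like (the title and value here are illustrative, not taken from the PR):

```markdown
---
title: Scraping with Apify and AI
sidebar_position: 2
---
```

Lowering or raising this integer moves the page within its sidebar category, which is why a bulk renumbering across `academy/platform` shows up as many small front-matter diffs.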