
Commit 127b27a

protoss70 and TC-MO authored

feat: n8n user docs for WCC actor app (#1763)

Added docs for the new n8n WCC single Actor app I am developing. Most of the text is adapted from the Make.com AI crawling documentation.

> [!NOTE]
> Adds comprehensive docs for the n8n Website Content Crawler module and updates image paths/notes in existing n8n integration docs.
>
> - **Docs (n8n)**:
>   - **New page**: `sources/platform/integrations/workflows-and-notifications/n8n/website-content-crawler.md`
>     - Covers prerequisites, Cloud/self-hosted install and connect, key features, config options, sample output, and AI Agent usage for Website Content Crawler by Apify.
>   - **Existing page update**: `sources/platform/integrations/workflows-and-notifications/n8n/index.md`
>     - Fix image paths from `../images/...` to `../../images/...`.
>     - Refine OAuth/API key sections and add a note titled "Credential Control."

<sup>Summary written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a8832f2.</sup>

Co-authored-by: Michał Olender <[email protected]>

1 parent 6dcf4e5 · commit 127b27a

File tree

9 files changed: +189 −11 lines changed

sources/platform/integrations/workflows-and-notifications/n8n.md renamed to sources/platform/integrations/workflows-and-notifications/n8n/index.md

Lines changed: 11 additions & 11 deletions

@@ -32,7 +32,7 @@ If you're running a self-hosted n8n instance, you can install the Apify communit
 1. Agree to the [risks](https://docs.n8n.io/integrations/community-nodes/risks/) of using community nodes and select **Install**.
 1. You can now use the node in your workflows.

-![Apify Install Node](../images/n8n-install-node-self-hosted.png)
+![Apify Install Node](../../images/n8n-install-node-self-hosted.png)

 ## Install the Apify Node (n8n Cloud)

@@ -42,7 +42,7 @@ For n8n Cloud users, installation is even simpler and doesn't require manual pac
 1. Search for **Apify** in the community node registry
 1. Click **Install node** to add the Apify node to your instance

-![Apify Install Node](../images/n8n-install-node-cloud.png)
+![Apify Install Node](../../images/n8n-install-node-cloud.png)

 :::note Verified community nodes visibility

@@ -63,7 +63,7 @@ The Apify node offers two authentication methods to securely connect to your Api
 1. Enter your Apify API token (find it in the [Apify Console](https://console.apify.com/settings/integrations)).
 1. Click **Save**.

-![Apify Auth](../images/n8n-api-auth.png)
+![Apify Auth](../../images/n8n-api-auth.png)

 ### OAuth2 (cloud instance only)

@@ -72,9 +72,9 @@ The Apify node offers two authentication methods to securely connect to your Api
 1. Select **Connect my account** and authorize with your Apify account.
 1. n8n automatically retrieves and stores the OAuth2 tokens.

-![Apify Auth](../images/n8n-oauth.png)
-
-:::note
+![Apify Auth](../../images/n8n-oauth.png)
+
+:::note Credential Control

 For simplicity on n8n Cloud, use the API key method if you prefer manual control over credentials.

@@ -92,7 +92,7 @@ Start by building a basic workflow in n8n, then add the Apify node to handle tas
 1. In the node's **Credentials** dropdown, choose the Apify credential you configured earlier. If you haven't configured any credentials, you can do so in this step. The process will be the same.
 1. You can now use Apify node as a trigger or action in your workflow.

-![Apify Node](../images/n8n-list-of-operations.png)
+![Apify Node](../../images/n8n-list-of-operations.png)

 ## Use Apify node as trigger

@@ -107,7 +107,7 @@ Triggers let your workflow respond automatically to events in Apify, such as whe
 1. Add subsequent nodes (e.g., HTTP Request, Google Sheets) to process or store the output.
 1. Save and execute the workflow.

-![Apify Node](../images/n8n-trigger-example.png)
+![Apify Node](../../images/n8n-trigger-example.png)

 ## Use Apify node as an action

@@ -122,13 +122,13 @@ Actions allow you to perform operations like running an Actor within a workflow.
 - **Memory**: Amount of memory allocated for the Actor run, in megabytes
 - **Build Tag**: Specifies the Actor build tag to run. By default, the run uses the build specified in the default run configuration for the Actor (typically `latest`)
 - **Wait for finish**: Whether to wait for the run to finish before continuing. If true, the node will wait for the run to complete (successfully or not) before moving to the next node
-![Apify Node](../images/n8n-run-actor-example.png)
+![Apify Node](../../images/n8n-run-actor-example.png)
 1. Add another Apify operation called **Get Dataset Items**.
 - Set **Dataset ID** parameter as **defaultDatasetId** value received from the previous **Run Actor** node. This will give you the output of the Actor run
-![Apify Node](../images/n8n-get-dataset-items-example.png)
+![Apify Node](../../images/n8n-get-dataset-items-example.png)
 1. Add any subsequent nodes (e.g. Google Sheets) to process or store the output
 1. Save and execute the workflow
-![Apify Node](../images/n8n-workflow-example.png)
+![Apify Node](../../images/n8n-workflow-example.png)

 ## Available Operations
Lines changed: 178 additions & 0 deletions

---
title: n8n - Website Content Crawler by Apify
description: Learn about the Website Content Crawler module.
sidebar_label: Website Content Crawler
sidebar_position: 6
slug: /integrations/n8n/website-content-crawler
toc_max_heading_level: 4
---
Website Content Crawler from [Apify](https://apify.com/apify/website-content-crawler) lets you extract text content from websites to feed AI models, LLM applications, vector databases, or Retrieval Augmented Generation (RAG) pipelines. It supports rich formatting using Markdown, cleans the HTML of irrelevant elements, downloads linked files, and integrates with AI ecosystems like LangChain, LlamaIndex, and other LLM frameworks.

To use this module, you need an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token in the [Apify Console](https://console.apify.com/) under **Settings > Integrations**. After connecting, you can automate content extraction at scale and incorporate the results into your AI workflows.

## Prerequisites

Before you begin, make sure you have:

- An [Apify account](https://console.apify.com/)
- An [n8n instance](https://docs.n8n.io/getting-started/) (self-hosted or cloud)

## n8n Cloud setup

This section explains how to install and connect the Apify node when using n8n Cloud.

### Install

For n8n Cloud users, installation doesn't require manual package entry. Just search for the node and add it from the canvas.

1. Go to the **Canvas** and open the **nodes panel**
1. Search for **Website Content Crawler by Apify** in the community node registry
1. Click **Install node** to add the Apify node to your instance

![Website Content Crawler by Apify on n8n](images/operations.png)

:::note Verified community nodes visibility

On n8n Cloud, instance owners can toggle visibility of verified community nodes in the Cloud Admin Panel. Ensure this setting is enabled to install the Website Content Crawler by Apify node.

:::
### Connect

1. In n8n Cloud, select **Create Credential**.
1. Search for **Apify OAuth2 API** and select **Continue**.
1. Select **Connect my account** and authorize with your Apify account.
1. n8n automatically retrieves and stores the OAuth2 tokens.

![Apify Auth](images/credentials.png)

:::note Cloud API Key management

On n8n Cloud, you can use the API key method if you prefer to manage your credentials manually.
See the [**Connect** section for n8n self-hosted](#connect-self-hosted) for detailed API configuration instructions.

:::

With authentication set up, you can now create workflows that incorporate the Apify node.
## n8n self-hosted setup

This section explains how to install and connect the Apify node when running your own n8n instance.

### Install

If you're running a self-hosted n8n instance, you can install the Apify community node directly from the editor. This process adds the node to your available tools, enabling Apify operations in workflows.

1. Open your n8n instance.
1. Go to **Settings > Community Nodes**.
1. Select **Install**.
1. Enter the npm package name: `@apify/n8n-nodes-apify-content-crawler` (for the latest version). To install a specific [version](https://www.npmjs.com/package/@apify/n8n-nodes-apify-content-crawler?activeTab=versions), enter it explicitly, e.g. `@apify/[email protected]`.
1. Agree to the [risks](https://docs.n8n.io/integrations/community-nodes/risks/) of using community nodes and select **Install**.
1. You can now use the node in your workflows.

![Apify Install Node](images/install.png)

<a id="connect-self-hosted"></a>

### Connect

1. Create an account at [Apify](https://console.apify.com/). You can sign up using your email, Gmail, or GitHub account.

![Sign up page](../make/images/ai-crawling/wcc-signup.png)

1. To connect your Apify account to n8n, you can use an OAuth connection (recommended) or an Apify API token. To get the Apify API token, navigate to **[Settings > API & Integrations](https://console.apify.com/settings/integrations)** in the Apify Console.

![Apify Console token for n8n](../make/images/Apify_Console_token_for_Make.png)

1. Find your token under the **Personal API tokens** section. You can also create a new API token with multiple customizable permissions by clicking **+ Create a new token**.
1. Click the **Copy** icon next to your API token to copy it to your clipboard. Then, return to your n8n workflow interface.

![Apify token on n8n](../make/images/Apify_token_on_Make.png)

1. In n8n, click **Create new credential** on the chosen Apify Scraper module.
1. In the **API key** field, paste the API token you copied from Apify and click **Save**.

![Apify token on n8n](images/token.png)
## Website Content Crawler by Apify module

This module provides complete control over the content extraction process, allowing you to fine-tune every aspect of the crawling and transformation pipeline. It is ideal for complex websites, JavaScript-heavy applications, or cases where you need precise control over content extraction.

### Key features

- _Multiple Crawler Options_: Choose between headless browsers (Playwright) or faster HTTP clients (Cheerio)
- _Custom Content Selection_: Specify exactly which elements to keep or remove
- _Advanced Navigation Control_: Set crawling depth, scope, and URL patterns
- _Dynamic Content Handling_: Wait for JavaScript-rendered content to load
- _Interactive Element Support_: Click expandable sections to reveal hidden content
- _Multiple Output Formats_: Save content as Markdown, HTML, or plain text
- _Proxy Configuration_: Use proxies to handle geo-restrictions or avoid IP blocks
- _Content Transformation Options_: Multiple algorithms for optimal content extraction

### How it works

The Website Content Crawler by Apify module provides granular control over the entire crawling process. For _Crawler selection_, you can choose from Playwright (Firefox/Chrome) or Cheerio, depending on the complexity of the target website. _URL management_ allows you to define the crawling scope with include and exclude URL patterns. You can also exercise precise _DOM manipulation_ by controlling which HTML elements to keep or remove. To ensure the best results, you can apply specialized algorithms for _Content transformation_ and select from various _Output formatting_ options for better AI model compatibility.
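The include and exclude URL patterns described above act as glob filters on discovered links. As an illustrative sketch only (not the module's actual implementation), the scoping rule can be approximated in Python with `fnmatch`-style globs; the pattern strings below are made-up examples:

```python
from fnmatch import fnmatch

def in_scope(url: str, includes: list[str], excludes: list[str]) -> bool:
    """A URL is crawled if it matches any include glob and no exclude glob."""
    if excludes and any(fnmatch(url, pat) for pat in excludes):
        return False
    return any(fnmatch(url, pat) for pat in includes)

# Hypothetical patterns: crawl the docs site, skip changelog pages.
includes = ["https://docs.apify.com/*"]
excludes = ["https://docs.apify.com/changelog*"]

print(in_scope("https://docs.apify.com/academy", includes, excludes))         # True
print(in_scope("https://docs.apify.com/changelog/2025", includes, excludes))  # False
```

Exclude patterns win over include patterns here, which mirrors the usual way crawl-scope filters are combined.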
### Output data

For each crawled web page, you'll receive:

- _Page metadata_: URL, title, description, canonical URL
- _Cleaned text content_: The main article content with irrelevant elements removed
- _Markdown formatting_: Structured content with headers, lists, links, and other formatting preserved
- _Crawl information_: Loaded URL, referrer URL, timestamp, HTTP status
- _Optional file downloads_: PDFs, DOCs, and other linked documents
- _Multiple format options_: Content in Markdown, HTML, or plain text
- _Debug information_: Detailed extraction diagnostics and snapshots
- _HTML transformations_: Results from different content extraction algorithms
- _File storage options_: Flexible storage for HTML, screenshots, or downloaded files

```json title="Sample output (shortened)"
{
  "url": "https://docs.apify.com/academy/web-scraping-for-beginners",
  "crawl": {
    "loadedUrl": "https://docs.apify.com/academy/web-scraping-for-beginners",
    "loadedTime": "2025-04-22T14:33:20.514Z",
    "referrerUrl": "https://docs.apify.com/academy",
    "depth": 1,
    "httpStatusCode": 200
  },
  "metadata": {
    "canonicalUrl": "https://docs.apify.com/academy/web-scraping-for-beginners",
    "title": "Web scraping for beginners | Apify Documentation",
    "description": "Learn the basics of web scraping with a step-by-step tutorial and practical exercises.",
    "languageCode": "en",
    "markdown": "# Web scraping for beginners\n\nWelcome to our comprehensive web scraping tutorial for beginners. This guide will take you through the fundamentals of extracting data from websites, with practical examples and exercises.\n\n## What is web scraping?\n\nWeb scraping is the process of extracting data from websites. It involves making HTTP requests to web servers, downloading HTML pages, and parsing them to extract the desired information.\n\n## Why learn web scraping?\n\n- **Data collection**: Gather information for research, analysis, or business intelligence\n- **Automation**: Save time by automating repetitive data collection tasks\n- **Integration**: Connect web data with your applications or databases\n- **Monitoring**: Track changes on websites automatically\n\n## Getting started\n\nTo begin web scraping, you'll need to understand the basics of HTML, CSS selectors, and HTTP. This tutorial will guide you through these concepts step by step.\n\n...",
    "text": "Web scraping for beginners\n\nWelcome to our comprehensive web scraping tutorial for beginners. This guide will take you through the fundamentals of extracting data from websites, with practical examples and exercises.\n\nWhat is web scraping?\n\nWeb scraping is the process of extracting data from websites. It involves making HTTP requests to web servers, downloading HTML pages, and parsing them to extract the desired information.\n\nWhy learn web scraping?\n\n- Data collection: Gather information for research, analysis, or business intelligence\n- Automation: Save time by automating repetitive data collection tasks\n- Integration: Connect web data with your applications or databases\n- Monitoring: Track changes on websites automatically\n\nGetting started\n\nTo begin web scraping, you'll need to understand the basics of HTML, CSS selectors, and HTTP. This tutorial will guide you through these concepts step by step.\n\n..."
  }
}
```

You can access any of thousands of our scrapers on Apify Store by using the [general Apify app](https://n8n.io/integrations/apify).
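In a RAG pipeline, the `markdown` or `text` field of each result is typically split into overlapping chunks before embedding. A minimal, illustrative Python sketch of that step (not part of the module; chunk sizes are arbitrary):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    a simple strategy for preparing crawler output for embedding."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# Short stand-in for the crawled "markdown" field (62 chars x 20 = 1240 chars).
doc = "Web scraping is the process of extracting data from websites. " * 20
pieces = chunk_text(doc, size=200, overlap=50)
print(len(pieces), len(pieces[0]))  # prints: 9 200
```

Production pipelines usually split on sentence or heading boundaries instead of raw character offsets, but the budget-and-overlap idea is the same.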
### Configuration options

You can select the _Crawler type_ by choosing the rendering engine (browser or HTTP client) and the _Content extraction algorithm_ from multiple HTML transformers. _Element selectors_ allow you to specify which elements to keep, remove, or click, while _URL patterns_ let you define inclusion and exclusion rules with glob syntax. You can also set _Crawling parameters_ like concurrency, depth, timeouts, and retries. For robust crawling, you can configure _Proxy configuration_ settings and select from various _Output options_ for content formats and storage.
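For orientation, a combined configuration might look like the sketch below. The field names follow the Website Content Crawler Actor's public input schema, but treat this as an illustrative example rather than an exhaustive reference; check the Actor's input schema for the authoritative list:

```json
{
  "startUrls": [{ "url": "https://docs.apify.com/academy" }],
  "crawlerType": "playwright:firefox",
  "includeUrlGlobs": [{ "glob": "https://docs.apify.com/academy/**" }],
  "excludeUrlGlobs": [{ "glob": "https://docs.apify.com/academy/glossary/**" }],
  "maxCrawlDepth": 2,
  "maxCrawlPages": 50,
  "removeElementsCssSelector": "nav, footer, .cookie-banner",
  "saveMarkdown": true,
  "proxyConfiguration": { "useApifyProxy": true }
}
```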
## Usage as an AI Agent Tool

You can set up the Website Content Crawler by Apify node as a tool for your AI Agents.

![Setup AI Agent](./images/setup.png)

### Dynamic URL crawling

In the Website Content Crawler module, you can set the **Start URLs** field to be filled in dynamically by your AI Agent. This lets the Agent decide which pages to scrape.

Two key parameters to configure for optimized AI Agent usage are **Max crawling depth** and **Max pages**. Remember that the scraping results are passed into the AI Agent's context, so using smaller values helps stay within context limits.
![Apify Configuration](./images/config.png)
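Beyond capping depth and page count, an agent workflow can also enforce a hard character budget on the tool output before it reaches the model. A small illustrative Python helper (not part of the node; the budget value is arbitrary):

```python
def fit_to_budget(pages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Keep crawled pages in order until the character budget is spent,
    truncating the page that crosses the limit."""
    kept, used = [], 0
    for page in pages:
        remaining = max_chars - used
        if remaining <= 0:
            break
        snippet = page.get("text", "")[:remaining]
        kept.append({"url": page["url"], "text": snippet})
        used += len(snippet)
    return kept

# Two hypothetical 6,000-character pages against an 8,000-character budget.
pages = [
    {"url": "https://example.com/a", "text": "x" * 6000},
    {"url": "https://example.com/b", "text": "y" * 6000},
]
trimmed = fit_to_budget(pages, max_chars=8000)
print(len(trimmed), len(trimmed[1]["text"]))  # prints: 2 2000
```

Character counts only approximate token counts, so leave headroom below the model's actual context limit.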
### Example usage

Here, the agent was asked to find information about Apify's latest blog post. It correctly filled in the URL for the blog and summarized its content.

![Scraping Results](./images/result.png)
