A lightweight frontend for self-hosted Firecrawl API instances. This playground provides a user-friendly interface for using Firecrawl's web scraping and crawling capabilities.
- Scrape Mode: Convert a single URL to markdown or HTML, or capture a screenshot
- Crawl Mode: Discover and scrape multiple pages from a starting URL
- Extract Mode: Extract structured data from web pages using an LLM
- CORS-free: Uses a proxy server to avoid CORS issues when connecting to your Firecrawl API instance
- Configure environment variables:

  ```bash
  cp .example.env .env
  ```

  Edit the `.env` file to set your desired configuration.
- Install dependencies and run:

  ```bash
  npm i
  npm start
  ```
- Open your browser and navigate to `http://localhost:3000`
- Enter your Firecrawl API endpoint (default: `http://firecrawl:3002`)
- Enter your API key if required
- Choose a mode (Scrape, Crawl, or Extract), enter a URL, and click "Run"
- Configure environment variables:

  ```bash
  cp .example.env .env
  ```

  Then edit the `.env` file to set your desired configuration.
- Build and run using Docker Compose:

  ```bash
  docker-compose up -d
  ```

- Open your browser and navigate to `http://localhost:3000`
Scrape mode allows you to convert a single URL to various formats:
- Markdown: Clean, readable markdown format
- HTML: Raw HTML content
- Screenshot: Visual capture of the page
- Links: Extract all links from the page
Advanced options include:
- Only Main Content: Filter out navigation, footers, etc.
- Remove Base64 Images: Exclude embedded images
- Wait For: Time to wait for dynamic content to load
- Timeout: Maximum time to wait for the page to load
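As a sketch, the options above map onto a Firecrawl v1 `/scrape` request body roughly like the following (the URL, endpoint, and option values here are illustrative, not app defaults):

```javascript
// Example Scrape request body; the field names mirror the UI controls above.
const scrapeRequest = {
  url: "https://smcleod.net",
  formats: ["markdown", "screenshot"], // markdown, html, screenshot, links
  onlyMainContent: true,               // filter out navigation, footers, etc.
  removeBase64Images: true,            // exclude embedded images
  waitFor: 3000,                       // ms to wait for dynamic content
  timeout: 30000,                      // ms before giving up on the page
};

// The playground's proxy would POST this server-side, e.g.:
// fetch("http://firecrawl:3002/v1/scrape", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(scrapeRequest),
// });
```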
Crawl mode allows you to discover and scrape multiple pages from a starting URL:
- Max Depth: How many links deep to crawl
- Page Limit: Maximum number of pages to crawl
- Ignore Sitemap: Skip sitemap.xml discovery
- Allow External Links: Crawl links to external domains
- Include/Exclude Paths: Filter which paths to crawl
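These options correspond to fields on the v1 `/crawl` endpoint; a sketch of the request body (values are examples, and per-page scraping is configured via a nested `scrapeOptions` object):

```javascript
// Example Crawl request body; field names mirror the UI controls above.
const crawlRequest = {
  url: "https://smcleod.net",
  maxDepth: 2,               // how many links deep to crawl
  limit: 10,                 // maximum number of pages
  ignoreSitemap: false,      // still consult sitemap.xml
  allowExternalLinks: false, // stay on the starting domain
  includePaths: ["blog", "posts"],
  excludePaths: ["admin"],
  scrapeOptions: { formats: ["markdown"] }, // how each discovered page is scraped
};
```

Note that crawling is asynchronous in the v1 API: the initial request returns a job id, and results are fetched by polling that job until it completes.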
Extract mode allows you to extract structured data from web pages using an LLM:
- Extraction Prompt: Instructions for what data to extract
- JSON Schema: Optional schema for structured data extraction
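A minimal sketch of the corresponding request body, assuming the v1 `/extract` endpoint, which takes a list of URLs plus a natural-language prompt (the URL and prompt here are examples):

```javascript
// Example Extract request body: URLs to extract from, plus a prompt.
const extractRequest = {
  urls: ["https://smcleod.net"],
  prompt: "Extract the main heading, summary, and author from this page.",
  // schema: { ... } // optional JSON Schema for structured output
};
```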
This playground is designed to work with self-hosted Firecrawl API instances. It's compatible with the Firecrawl API v1 endpoints.
This is a lightweight application built with vanilla JavaScript, HTML, and CSS. Dependencies are loaded from CDNs:
- Milligram CSS for minimal styling
- Marked.js for markdown rendering
- Highlight.js for syntax highlighting
No build process is required - simply edit the files and refresh the browser to see changes.
- Server: Node.js with Express
- Proxy: Custom HTTP proxy middleware
- Configuration: Environment variables via dotenv (.env file)
Here are some examples of how to use the Firecrawler in each mode.
- Enter URL: `https://smcleod.net`
- Select Format: `markdown`
- Enable "Only Main Content"
- Click "Run"
- Enter URL: `https://news.ycombinator.com`
- Select Formats: `markdown`, `screenshot`
- Set Wait For: `3000` (3 seconds)
- Click "Run"
- Enter URL: `https://github.com`
- Select Formats: `html`, `markdown`
- Disable "Only Main Content" to get the full page
- Click "Run"
- Switch to "Crawl" mode
- Enter URL: `https://smcleod.net`
- Set Max Depth: `2`
- Set Page Limit: `10`
- Select Format: `markdown`
- Click "Run"
- Switch to "Crawl" mode
- Enter URL: `https://smcleod.net/about`
- Set Max Depth: `3`
- Set Page Limit: `20`
- Include Paths: `blog,posts`
- Exclude Paths: `admin,login,register`
- Click "Run"
- Switch to "Extract" mode
- Enter URL: `https://smcleod.net`
- Extraction Prompt: `Extract the main heading, summary, and author from this page.`
- Click "Run"
- Switch to "Extract" mode
- Enter URL: `https://news.ycombinator.com`
- Extraction Prompt: `Extract the top 5 stories with their titles, points, and authors.`
- JSON Schema:

  ```json
  {
    "type": "object",
    "properties": {
      "stories": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "points": { "type": "number" },
            "author": { "type": "string" }
          }
        }
      }
    }
  }
  ```
- Click "Run"
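Putting the prompt and schema together, the request body the playground would send for this example looks roughly like the following (field names follow the v1 `/extract` endpoint; this is a sketch, not the app's exact payload):

```javascript
// Extract request combining a prompt with a JSON Schema for structured output.
const extractWithSchema = {
  urls: ["https://news.ycombinator.com"],
  prompt: "Extract the top 5 stories with their titles, points, and authors.",
  schema: {
    type: "object",
    properties: {
      stories: {
        type: "array",
        items: {
          type: "object",
          properties: {
            title: { type: "string" },
            points: { type: "number" },
            author: { type: "string" },
          },
        },
      },
    },
  },
};
```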