A Python library for easily interacting with the Skrape.ai API. Define your scraping schema using Pydantic and get type-safe results.
- 🛡️ Type-safe: Define your schemas using Pydantic and get fully typed results
- 🚀 Simple API: Just define a schema and get your data
- 🔄 Async Support: Built with async/await for efficient scraping
- 🧩 Minimal Dependencies: Built on top of proven libraries like Pydantic and httpx
- 📝 Markdown Conversion: Convert any webpage to clean markdown
- 🕷️ Web Crawling: Crawl multiple pages with browser automation
- 🔄 Background Jobs: Handle long-running tasks asynchronously
```bash
pip install skrape-py
```
Or with Poetry:
```bash
poetry add skrape-py
```
Set up your API key in a `.env` file:
```
SKRAPE_API_KEY="your_api_key_here"
```
Get your API key at Skrape.ai.
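Note that `os.getenv` (used in the examples below) does not read `.env` files on its own. A minimal sketch for loading the file, assuming the separate `python-dotenv` package is installed:

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # loads variables from .env into the environment
api_key = os.getenv("SKRAPE_API_KEY")
```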
```python
from skrape import Skrape
from pydantic import BaseModel
import os
import asyncio

# Define your schema using Pydantic
class ProductSchema(BaseModel):
    title: str
    price: float
    description: str
    rating: float

async def main():
    async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
        # Start extraction job
        job = await skrape.extract(
            "https://example.com/product",
            ProductSchema,
            {"renderJs": True},  # Enable JavaScript rendering if needed
        )

        # Wait for the job to complete and get results
        while job.status != "COMPLETED":
            job = await skrape.get_job(job.jobId)
            await asyncio.sleep(1)

        # Access the extracted data
        product = job.result
        print(f"Product: {product.title}")
        print(f"Price: ${product.price}")

asyncio.run(main())
```
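The polling loop above reappears in every background-job example below, so it can be worth factoring out. This sketch uses only the `get_job` call shown in this README; the `FAILED` status check is an assumption about the API's status values:

```python
import asyncio

async def wait_for_job(skrape, job, interval: float = 1.0):
    """Poll a background job until it finishes, then return the final job."""
    while job.status != "COMPLETED":
        if job.status == "FAILED":  # assumed terminal status; adjust as needed
            raise RuntimeError(f"Job {job.jobId} failed")
        await asyncio.sleep(interval)
        job = await skrape.get_job(job.jobId)
    return job
```

With this helper, each polling loop below collapses to `job = await wait_for_job(skrape, job)`.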
Convert any webpage to clean markdown. These snippets run inside an `async with Skrape(...)` block, as in the quick start:

```python
# Single URL
response = await skrape.to_markdown(
    "https://example.com/article",
    {"renderJs": True},
)
print(response.result)  # Clean markdown content

# Multiple URLs (async background job)
job = await skrape.to_markdown_bulk(
    ["https://example.com/1", "https://example.com/2"],
    {"renderJs": True},
)

# Get results when ready
while job.status != "COMPLETED":
    job = await skrape.get_job(job.jobId)
    await asyncio.sleep(1)

for markdown in job.result:
    print(markdown)
```
Crawl multiple pages with browser automation:

```python
# Start crawling job
job = await skrape.crawl(
    ["https://example.com", "https://example.com/page2"],
    {
        "renderJs": True,
        "actions": [
            {"scroll": {"distance": 500}},  # Scroll down 500px
            {"wait_for": ".content"},       # Wait for content to load
        ],
    },
)

# Get results when ready
while job.status != "COMPLETED":
    job = await skrape.get_job(job.jobId)
    await asyncio.sleep(1)

for page in job.result:
    print(page)
```
Common options for all endpoints:
```python
options = {
    "renderJs": True,  # Enable JavaScript rendering
    "actions": [
        {"click": {"selector": ".button"}},  # Click an element
        {"scroll": {"distance": 500}},       # Scroll the page
        {"wait_for": ".content"},            # Wait for an element
        {"type": {                           # Type into an input
            "selector": "input",
            "text": "search term",
        }},
    ],
    "callbackUrl": "https://your-server.com/webhook",  # For async jobs
}
```
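For example, actions can be chained to drive a page before extraction. The selectors below are placeholders for illustration; this sketch reuses the `extract` call and `ProductSchema` from the quick start:

```python
# Hypothetical search flow: type a query, submit it, wait for results,
# then extract structured data from the rendered page.
search_options = {
    "renderJs": True,
    "actions": [
        {"type": {"selector": "input[name='q']", "text": "laptop"}},
        {"click": {"selector": "button[type='submit']"}},
        {"wait_for": ".results"},
    ],
}
job = await skrape.extract("https://example.com/search", ProductSchema, search_options)
```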
The library provides typed exceptions for better error handling:
```python
import os
from skrape import Skrape, SkrapeValidationError, SkrapeAPIError

async with Skrape(api_key=os.getenv("SKRAPE_API_KEY")) as skrape:
    try:
        # url and schema defined as in the quick start
        response = await skrape.extract(url, schema)
    except SkrapeValidationError as e:
        print(f"Data doesn't match schema: {e}")
    except SkrapeAPIError as e:
        print(f"API error: {e}")
```
The API response includes rate limit information that you can use to manage your requests:
```python
response = await skrape.to_markdown(url)
usage = response.usage

print(f"Remaining credits: {usage.remaining}")
print("Rate limit info:")
print(f"  - Remaining: {usage.rateLimit.remaining}")
print(f"  - Base limit: {usage.rateLimit.baseLimit}")
print(f"  - Burst limit: {usage.rateLimit.burstLimit}")
print(f"  - Reset at: {usage.rateLimit.reset}")
```