A tool for scraping documentation websites and performing intelligent Q&A using agentic RAG (Retrieval-Augmented Generation).
- Website Crawling: Automatically crawls documentation websites, with support for sitemap.xml
- Semantic Chunking: Intelligently splits content into meaningful chunks while preserving context (a minimal sketch follows this list)
- Rich Metadata: Extracts and stores metadata like topics, technologies, and content types
- Vector Search: Uses OpenAI embeddings for semantic search
- Agentic RAG: Leverages LLMs for intelligent question answering with context
- Source Management: Manage multiple documentation sources independently
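
The semantic chunking above can be pictured with a minimal sketch: pack paragraphs into chunks up to a size limit and never cut inside a fenced code block. The function name and the 1500-character limit are illustrative, not this project's actual API.

```python
# Illustrative chunker: packs paragraphs up to a size limit and keeps
# fenced code blocks intact. Name and limit are assumptions, not the
# project's real interface.
def chunk_text(text, max_chars=1500):
    chunks, current, in_code = [], "", False
    for block in text.split("\n\n"):
        # Start a new chunk when this block would overflow the current one,
        # but never while inside an unclosed code fence.
        if current and not in_code and len(current) + len(block) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += block + "\n\n"
        if block.count("```") % 2 == 1:
            in_code = not in_code  # an odd number of fences toggles state
    if current.strip():
        chunks.append(current.strip())
    return chunks
```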
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/agentic-scrape-and-qa.git
  cd agentic-scrape-and-qa
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up your environment variables by copying the example:
  ```bash
  cp .env.example .env
  ```
- Edit `.env` with your API keys and configure your Supabase database (a sample sketch follows these steps)
- Run the SQL setup script (`site_pages.sql`) in your Supabase database
- Run the program:
  ```bash
  python agentic_rag.py
  ```
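
A sample `.env` might look like the following; the variable names are assumptions based on the services this tool uses, so defer to `.env.example` for the actual keys:

```
# Hypothetical variable names -- check .env.example for the real ones
OPENAI_API_KEY=sk-...
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_KEY=your-service-role-key
```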
- Crawl a New Website (sketched after this list)
  - Enter the base URL (e.g., https://docs.example.com)
  - Provide a unique identifier for this documentation set
  - The system crawls the pages, extracts their content, and stores it with metadata
- Q&A on Existing Documentation (sketched after this list)
  - Select from the available documentation sets
  - Ask questions in natural language
  - Get context-aware answers with source URLs
- Manage Documentation Sets (sketched after this list)
  - View all stored documentation sets
  - Delete specific sets when needed
  - Clean up outdated content
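
To make the crawl workflow concrete, here is a minimal sketch of sitemap-based URL discovery using `requests` and the standard library; it illustrates the approach, not this tool's internals:

```python
import requests
import xml.etree.ElementTree as ET

def discover_urls(base_url):
    """Return page URLs from sitemap.xml, or just the base URL if absent."""
    resp = requests.get(base_url.rstrip("/") + "/sitemap.xml", timeout=10)
    if resp.status_code != 200:
        return [base_url]  # no sitemap: fall back to crawling from the root
    root = ET.fromstring(resp.content)
    # Sitemap entries are <url><loc>...</loc></url> in the sitemap namespace.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]
```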
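
The Q&A workflow follows a standard embed-search-answer loop. In the sketch below, the `match_site_pages` RPC, its parameters, and the model names are assumptions about what `site_pages.sql` and the tool configure, not confirmed details:

```python
import os
from openai import OpenAI
from supabase import create_client

client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"],
                         os.environ["SUPABASE_SERVICE_KEY"])

def answer(question, source):
    # 1. Embed the question (model choice is illustrative).
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    # 2. Vector-similarity search in Supabase; the RPC name and its
    #    parameters are assumptions about what site_pages.sql defines.
    rows = supabase.rpc("match_site_pages", {
        "query_embedding": emb,
        "match_count": 5,
        "source": source,
    }).execute().data
    context = "\n\n".join(row["content"] for row in rows)
    # 3. Ask the LLM to answer from the retrieved context only.
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content, [row["url"] for row in rows]
```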
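
Managing documentation sets reduces to querying and deleting rows by their source identifier. The table and column names below are assumptions based on the `site_pages.sql` setup script:

```python
# Sketch of source management against a hypothetical site_pages table
# whose metadata JSON column carries a "source" identifier.
def list_sources(supabase):
    rows = supabase.table("site_pages").select("metadata").execute().data
    return sorted({row["metadata"].get("source") for row in rows})

def delete_source(supabase, source):
    # "metadata->>source" is PostgREST syntax for filtering on a JSON field.
    supabase.table("site_pages").delete().eq(
        "metadata->>source", source
    ).execute()
```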
- Uses OpenAI embeddings for semantic search
- Stores content and metadata in Supabase
- Implements vector similarity search
- Preserves code blocks and formatting
- Handles pagination and rate limiting (a retry sketch follows this list)
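
Rate limiting can be handled with a politeness delay plus backoff on HTTP 429. A minimal illustrative helper (not this project's actual crawler code):

```python
import time
import requests

def polite_get(url, max_retries=3, delay=1.0):
    """GET with a politeness delay and exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        time.sleep(delay)  # fixed delay between requests to the same site
        resp = requests.get(url, timeout=10)
        if resp.status_code == 429:
            # Honor Retry-After if sent; otherwise back off 1s, 2s, 4s...
            time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts")
```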
- Python 3.8+
- OpenAI API key
- Supabase account
- Packages listed in requirements.txt