A simple AI agent that uses the ReAct pattern to search for and download web content about specific topics. Built with LangChain and OpenAI.
- Topic-based Web Search: Finds relevant web pages for any given topic
- Automatic Content Download: Saves web pages as HTML files
- ReAct Pattern Implementation: Uses reasoning and acting to complete tasks
- LangSmith Integration: Uses LangSmith for prompt management
- Rate Limiting: Built-in delays between requests to avoid hitting search API rate limits
- Python 3.8+
- OpenAI API key
- LangSmith API key
- Clone the repository:
git clone <repository-url>
cd agent-scraper
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the project root with your API keys:
OPENAI_API_KEY=your_openai_api_key
LANGSMITH_API_KEY=your_langsmith_api_key
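To fail early when a key is missing, a startup check along these lines may help. This is a minimal sketch using only the standard library. `missing_keys` and `REQUIRED_KEYS` are hypothetical names, not part of `react.py`, and the sketch assumes the variables from `.env` have already been loaded into the process environment.

```python
import os

# The two keys listed in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGSMITH_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`.

    `env` defaults to os.environ; passing a plain dict makes the
    check easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Running the check before the agent starts gives a clear error message instead of an authentication failure partway through a run.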
Run the script:
python react.py
The script will:
- Search for relevant web pages about the specified topic
- Download the found pages as HTML files
- Save the files in the downloads directory
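The save step can be sketched as follows. This is an illustration, not the code in `react.py`: the helper names `filename_for` and `save_html` and the filename scheme (host plus path, with unsafe characters replaced by underscores) are assumptions made for the example.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url):
    """Derive a filesystem-safe .html filename from a URL (assumed scheme)."""
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).strip("/")
    # Replace every run of characters outside a conservative safe set.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw) or "page"
    if not safe.endswith(".html"):
        safe += ".html"
    return safe


def save_html(html, url, directory="downloads"):
    """Write `html` into `directory`, creating it if needed; return the path."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename_for(url)
    target.write_text(html, encoding="utf-8")
    return target
```

With this scheme, `https://example.com/docs/intro` would be saved as `downloads/example.com_docs_intro.html`.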
agent-scraper/
├── react.py # Main script with agent implementation
├── requirements.txt # Project dependencies
├── .env # Environment variables (create this)
└── downloads/ # Directory for downloaded web pages
The agent uses the ReAct (Reasoning and Acting) pattern:
- Reasoning: The agent thinks about how to research the topic
- Acting: The agent performs actions (searching and downloading)
- Observing: The agent analyzes the results
- Repeating: The process continues until sufficient information is gathered
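The loop above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the LangChain-based implementation in `react.py`: `run_react`, `scripted_decide`, and the stub tools are hypothetical, and a scripted function stands in for the language model that would normally do the reasoning.

```python
def run_react(topic, decide, tools, max_steps=5):
    """Run a reason/act/observe loop until `decide` chooses "finish"."""
    history = []  # (thought, action, observation) triples
    for _ in range(max_steps):
        thought, action, arg = decide(topic, history)    # Reasoning
        if action == "finish":
            return arg, history
        observation = tools[action](arg)                 # Acting
        history.append((thought, action, observation))   # Observing
    return None, history  # step budget exhausted without a final answer


def scripted_decide(topic, history):
    """Stand-in for the LLM: search first, then download, then stop."""
    if not history:
        return ("I need URLs first", "search", topic)
    if len(history) == 1:
        first_url = history[0][2][0]
        return ("Download the first hit", "download", first_url)
    return ("Enough material gathered", "finish", history[-1][2])


# Stub tools so the sketch runs without network access.
stub_tools = {
    "search": lambda topic: ["https://example.com/" + topic],
    "download": lambda url: "saved " + url,
}

result, trace = run_react("react", scripted_decide, stub_tools)
```

The `max_steps` cap is a design choice for the sketch: it guarantees termination even if the decision function never chooses to finish.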
The agent has access to two main tools:
- SearchTopics: Finds relevant URLs for a given topic
- DownloadPage: Downloads and saves web pages as HTML files
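One way to picture the two tools is as named callables in a registry, with the search tool wrapped in a fixed delay. The sketch below is an assumption-laden illustration rather than the project's code: the `Tool` dataclass, `with_delay`, `build_tools`, and the one-second default delay are all hypothetical, and the real search and download functions are passed in as parameters.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], object]


def with_delay(func, seconds, sleep=time.sleep):
    """Wrap `func` so that each call first waits `seconds`."""
    def wrapper(arg):
        sleep(seconds)
        return func(arg)
    return wrapper


def build_tools(search, download, delay_seconds=1.0, sleep=time.sleep):
    """Register the two tools by name; only the search call is delayed here."""
    return {
        "SearchTopics": Tool(
            "SearchTopics",
            "Finds relevant URLs for a given topic",
            with_delay(search, delay_seconds, sleep),
        ),
        "DownloadPage": Tool(
            "DownloadPage",
            "Downloads and saves web pages as HTML files",
            download,
        ),
    }
```

Passing `sleep` as a parameter keeps the delay testable: a test can substitute a recorder for `time.sleep` and run instantly.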
Feel free to submit issues and enhancement requests!
This project is licensed under the MIT License - see the LICENSE file for details.