A Python application that uses Retrieval-Augmented Generation (RAG) to download and summarize academic papers. It is currently configured to process the ReAct paper (arXiv:2210.03629).
- Automatic paper download from arXiv
- PDF processing and text chunking
- Vector store creation using Chroma
- RAG-based summarization using OpenAI's GPT-4 and LangChain
Requirements:
- Python 3.x
- OpenAI API key
Installation:
- Clone the repository:
git clone <repository-url>
cd rag-inboundsquare
- Install the required dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory and add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
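A missing or placeholder key tends to surface only as an authentication error deep inside the first API call. The sketch below shows one way to fail early instead. The helper name `require_api_key` is hypothetical and is not part of rag.py; it assumes the environment has already been populated from .env (for example by python-dotenv's `load_dotenv()`).

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from an environment mapping, or raise a clear error.

    Hypothetical helper for illustration; rag.py may handle this differently.
    """
    key = env.get(name, "").strip()
    # Treat the README placeholder value as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to your .env file.")
    return key
```

Calling `require_api_key()` once at startup, before any model or embedding client is created, keeps the error message close to its cause.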
Run the script using:
python rag.py
The script will:
- Download the ReAct paper if not already present
- Process the PDF and split it into chunks
- Create a vector store using Chroma
- Generate a comprehensive summary using RAG
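To make the chunking step concrete, here is a minimal, standard-library-only sketch of fixed-size chunking with overlap. It is a stand-in for illustration, not the splitter rag.py actually uses (the script relies on LangChain), and the function name and the default sizes are assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap.

    Illustrative only: the defaults are assumed values, not the
    settings used by rag.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which generally helps retrieval at the cost of storing some text twice.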
Dependencies:
- langchain
- openai
- chromadb
- arxiv
- python-dotenv
- requests
The current implementation is configured to summarize the ReAct paper (arXiv:2210.03629). You can modify the process_pdf function to work with other papers or PDF documents.
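One way to point the pipeline at a different paper is to parameterize the download step by arXiv ID. The sketch below is an assumption-laden illustration: `arxiv_pdf_url` and `download_if_missing` are hypothetical names, the signature of `process_pdf` is not shown in this README, and the real script may use the `arxiv` package rather than a direct URL.

```python
import os
import urllib.request


def arxiv_pdf_url(arxiv_id):
    """Build the PDF URL for an arXiv identifier such as '2210.03629'."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


def download_if_missing(arxiv_id, dest_dir=".", fetch=urllib.request.urlretrieve):
    """Download the paper's PDF unless it is already on disk; return its path.

    `fetch` takes (url, path) and can be swapped out, e.g. for offline testing.
    """
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, f"{arxiv_id}.pdf")
    if not os.path.exists(path):
        fetch(arxiv_pdf_url(arxiv_id), path)
    return path
```

The returned path could then be handed to whatever PDF-processing function the script exposes, for example `download_if_missing("2210.03629", "papers")` for the ReAct paper.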
[Add your license here]