A lightweight Retrieval-Augmented Generation (RAG) demo built with Streamlit. Upload any PDF, ask a question in natural language, and get an answer grounded in the document's content.
- PDF text extraction — reads multi-page PDFs with PyPDF2
- LLM-powered Q&A — sends extracted context + your question to OpenAI (GPT-4o / GPT-4o-mini)
- Grounded answers — the system prompt constrains the model to answer only from the document
- Simple UI — clean Streamlit interface with sidebar settings and expandable extracted text
```bash
# 1. Clone the repo
git clone https://github.com/murtagh27/inkQuery.git
cd inkQuery

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the app
streamlit run app.py
```

Then enter your OpenAI API key in the sidebar and upload a PDF.
```
PDF ──▶ PyPDF2 (text extraction) ──▶ OpenAI Chat API ──▶ Answer
                                            ▲
                                      user question
```
- The uploaded PDF is parsed page-by-page into plain text.
- The full text is injected into the LLM's system prompt as document context.
- The user's question is sent as the user message.
- The model is instructed to answer only from the provided context.
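The four steps above can be sketched in Python. The helper names below are illustrative, not the actual structure of `app.py`:

```python
def extract_text(pdf_path: str) -> str:
    """Step 1: concatenate the text of every page with PyPDF2."""
    from PyPDF2 import PdfReader  # imported lazily; the rest is stdlib-only
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_messages(document: str, question: str, max_chars: int = 80_000) -> list:
    """Steps 2-3: stuff the (truncated) document into the system prompt
    and send the question as the user message."""
    context = document[:max_chars]  # mirrors the "Max context" setting
    system = (
        "Answer only from the document below. If the answer is not in the "
        "document, say so.\n\n---\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

With the messages built, step 4 is a single Chat Completions request, e.g. `client.chat.completions.create(model="gpt-4o-mini", messages=build_messages(text, question))` on an `openai.OpenAI` client.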
Note: This is a context-stuffing approach (the entire document is sent in a single prompt). For production use with very large documents, you would chunk the text, embed the chunks into a vector store, and retrieve only the most relevant chunks for each question.
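A minimal sketch of what that retrieval step could look like (illustrative only; a real deployment would use a vector store such as FAISS or Chroma, with embeddings from an embedding model rather than the toy `embed` callback assumed here):

```python
import math


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list:
    """Split a long document into overlapping chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def cosine(a, b) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_chunks(chunks, query_vec, embed, k: int = 3) -> list:
    """Rank chunks by similarity to the query embedding and keep the top k;
    only these (not the whole document) would go into the system prompt."""
    return sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)[:k]
```

The overlap keeps sentences that straddle a chunk boundary recoverable from at least one chunk, at the cost of some duplicated text.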
```
.
├── app.py              # Streamlit application
├── requirements.txt    # Python dependencies
├── .gitignore
├── LICENSE
└── README.md
```
| Setting | Default | Description |
|---|---|---|
| Model | `gpt-4o-mini` | Which OpenAI model to use |
| Max context | 80,000 chars | Truncation limit for very large PDFs |
MIT