Skip to content

murtagh27/inkQuery

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🖋️ InkQuery

A lightweight Retrieval-Augmented Generation (RAG) demo built with Streamlit. Upload any PDF, ask a question in natural language, and get an answer grounded in the document's content.

Python Streamlit License

Features

  • PDF text extraction — reads multi-page PDFs with PyPDF2
  • LLM-powered Q&A — sends extracted context + your question to OpenAI (GPT-4o / GPT-4o-mini)
  • Grounded answers — the system prompt constrains the model to answer only from the document
  • Simple UI — clean Streamlit interface with sidebar settings and expandable extracted text

Quick Start

# 1. Clone the repo
git clone https://github.com/murtagh27/inkQuery.git
cd inkQuery

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the app
streamlit run app.py

Then enter your OpenAI API key in the sidebar and upload a PDF.

How It Works

PDF  ──▶  PyPDF2 (text extraction)  ──▶  OpenAI Chat API  ──▶  Answer
                                            ▲
                                     user question
  1. The uploaded PDF is parsed page-by-page into plain text.
  2. The full text is injected into the LLM's system prompt as document context.
  3. The user's question is sent as the user message.
  4. The model is instructed to answer only from the provided context.

Note: This is a context-stuffing approach (the entire document is sent in one prompt). For production use with very large documents, you would add a vector store and retrieval step.

Project Structure

.
├── app.py               # Streamlit application
├── requirements.txt     # Python dependencies
├── .gitignore
├── LICENSE
└── README.md

Configuration

Setting Default Description
Model gpt-4o-mini Which OpenAI model to use
Max context 80 000 chars Truncation limit for very large PDFs

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages