PDF-Chat is an AI-powered document assistant that allows you to have natural conversations with your PDF documents using Groq's powerful LLMs and advanced embeddings. Built with LangChain and Chainlit, this application extracts, processes, and indexes PDF content to provide interactive, context-aware responses to your queries.
- 📄 PDF Processing: Upload and process any PDF document
- 🔍 Semantic Search: Find relevant information across your documents
- 💬 Conversational AI: Chat naturally with your documents
- 🧠 Context Awareness: The system remembers previous questions for more coherent conversations
- 📊 Source References: Responses include references to the source content
- 🚀 High Performance: Powered by Groq's high-speed inference
- 🔄 Customizable Models: Easily swap LLMs and embedding models
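The "process and index" step described above boils down to splitting extracted PDF text into overlapping chunks before embedding them. A minimal sketch of that idea in plain Python (the app itself uses LangChain's text splitters; `chunk_text` and its default parameters here are illustrative, not the project's actual code):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping windows -- the basic idea behind
    chunking a PDF for embedding. Overlap keeps context that would
    otherwise be cut at a chunk boundary."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk shares its last `overlap` characters with the start of the next one, so a sentence straddling a boundary still appears whole in at least one chunk.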
- Python 3.8+ installed
- A Groq API key (available from the Groq Console)
- Clone the repository

  ```bash
  git clone https://github.com/Ibzie/Groq-Powered-Document-Chatbot.git
  cd Groq-Powered-Document-Chatbot
  ```

- Create a virtual environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables

  Create a `.env` file in the project root:

  ```
  GROQ_API_KEY=your_groq_api_key_here
  ```
- Run the application

  ```bash
  chainlit run app.py
  ```

- Open in browser

  Navigate to http://localhost:8000 in your browser.
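The `.env` file above is just `KEY=value` pairs read into the environment at startup. A minimal sketch of how such a file is parsed (the app presumably relies on a library like python-dotenv; `read_env_file` is a hypothetical helper for illustration):

```python
from pathlib import Path

def read_env_file(path=".env"):
    """Parse a .env file into a dict: skip blanks and comments,
    split each remaining line on the first '='."""
    env = {}
    p = Path(path)
    if not p.exists():
        return env
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```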
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/pdf-chat.git
  cd pdf-chat
  ```

- Create a .env file

  Create a `.env` file with your Groq API key:

  ```
  GROQ_API_KEY=your_groq_api_key_here
  ```

- Build and run with Docker Compose

  ```bash
  docker compose up
  ```

- Open in browser

  Navigate to http://localhost:8000 in your browser.
PDF-Chat uses Groq's `mixtral-8x7b-32768` by default. To change the model, modify the following lines in `app.py`:

```python
llm_groq = ChatGroq(
    groq_api_key=groq_api_key,
    model_name="mixtral-8x7b-32768",  # Change to any Groq-supported model
    temperature=0.2
)
```
Available Groq models include:

- `llama3-8b-8192`
- `llama3-70b-8192`
- `mixtral-8x7b-32768`
- `gemma-7b-it`
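If you prefer not to edit `app.py` for every model change, one option is to pick the model from an environment variable with a fallback to the default. A hypothetical variant (this is not the project's code; `PDF_CHAT_MODEL` and `resolve_model` are made up for illustration, and the model list mirrors the README and may change on Groq's side):

```python
import os

# Models listed in this README; Groq's catalog may change over time.
SUPPORTED_MODELS = {
    "llama3-8b-8192",
    "llama3-70b-8192",
    "mixtral-8x7b-32768",
    "gemma-7b-it",
}

def resolve_model(default="mixtral-8x7b-32768"):
    """Return the model named by PDF_CHAT_MODEL, or the default,
    rejecting names outside the supported set."""
    requested = os.getenv("PDF_CHAT_MODEL", default)
    if requested not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {requested!r}")
    return requested
```

The resolved name would then be passed as `model_name` when constructing `ChatGroq`.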
PDF-Chat uses HuggingFace's `sentence-transformers/all-mpnet-base-v2` for embeddings. To change this:

```python
# Initialize with a different sentence-transformer model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # Faster, smaller model
)
```
Other recommended models:

- `sentence-transformers/multi-qa-mpnet-base-dot-v1` (optimized for retrieval)
- `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` (multilingual support)
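Whatever embedding model you choose, semantic search ultimately means comparing embedding vectors, most commonly by cosine similarity. A minimal illustration of the scoring in plain Python (the app delegates this to the vector store; the function below only shows the math):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 for identical
    direction, 0.0 for orthogonal. Retrieval ranks chunks by this
    score against the query embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```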
- LangChain: Framework for developing applications powered by language models
- Chainlit: Framework for building conversational AI interfaces
- Groq: Ultra-fast LLM inference
- HuggingFace Transformers: State-of-the-art NLP models
- ChromaDB: Vector database for similarity search
- PyPDF2: PDF document processing
Feel free to use this project as you wish; a credit in the form of a link back to this repo would be greatly appreciated.