The LangChain Documentation Helper is an AI-powered web application that serves as a slim version of chat.langchain.com. It answers questions about the LangChain documentation using Retrieval-Augmented Generation (RAG), with Tavily-based web crawling to collect the documentation and conversational memory to handle follow-up questions.
RAG Pipeline Flow:
- 🌐 Web Crawling: Real-time web scraping and content extraction using Tavily's advanced crawling capabilities
- 📚 Document Processing: Intelligent chunking and preprocessing of LangChain documentation
- 🔍 Vector Storage: Advanced embedding and indexing using Pinecone for fast similarity search
- 🎯 Intelligent Retrieval: Context-aware document retrieval based on user queries
- 🧩 Memory System: Conversational memory for coreference resolution and context continuity
- 🧠 Context-Aware Generation: Provides accurate, contextual answers with source citations
- 💬 Interactive Interface: User-friendly chat interface powered by Streamlit
- 🚀 Real-time Processing: Fast end-to-end pipeline from query to response
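The flow above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's code: word-count vectors and an in-memory list stand in for OpenAI embeddings and the Pinecone index, and every function name (`chunk_text`, `embed`, `cosine`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (stand-in for a real text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Example: index three short documents and retrieve the best match.
docs = [
    "A retriever returns documents relevant to a query.",
    "Streamlit renders the chat interface in the browser.",
    "Embeddings map text to vectors for similarity search.",
]
index = [(doc, embed(doc)) for doc in docs]
top = retrieve("How does similarity search use vectors?", index, k=1)
print(build_prompt("How does similarity search use vectors?", top))
```

In a real deployment the prompt would be sent to the LLM, and the retrieved chunks would carry their source URLs so the answer can cite them.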
Component | Technology | Description |
---|---|---|
🖥️ Frontend | Streamlit | Interactive web interface |
🧠 AI Framework | LangChain 🦜🔗 | Orchestrates the AI pipeline |
🔍 Vector Database | Pinecone 🌲 | Stores and retrieves document embeddings |
🌐 Web Crawling | Tavily | Intelligent web scraping and content extraction |
🧩 Memory | Conversational Memory | Coreference resolution and context continuity |
🤖 LLM | OpenAI GPT | Powers the conversational AI |
🐍 Backend | Python | Core application logic |
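To make the Memory row more concrete, here is a minimal sketch of one common way conversational memory supports coreference resolution: recent turns are formatted into a prompt that asks the LLM to rewrite a follow-up as a standalone question before retrieval. This is an assumption about the general technique, not a description of this project's implementation; `format_history` and `build_rephrase_prompt` are hypothetical names, and no LLM call is made here.

```python
from typing import List, Tuple

# Each turn is a (human message, assistant message) pair.
ChatHistory = List[Tuple[str, str]]


def format_history(history: ChatHistory, max_turns: int = 3) -> str:
    """Render the most recent turns as plain text, oldest first."""
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for human, assistant in recent:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {assistant}")
    return "\n".join(lines)


def build_rephrase_prompt(history: ChatHistory, follow_up: str) -> str:
    """Build a prompt asking the model to resolve references in the follow-up."""
    return (
        "Given the conversation below, rewrite the follow-up as a standalone question.\n\n"
        f"{format_history(history)}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )


# Example: "one" in the follow-up only makes sense with the earlier turn in view.
history = [
    ("What is a retriever?", "A retriever returns documents relevant to a query."),
]
print(build_rephrase_prompt(history, "How do I create one?"))
```

The rewritten question, rather than the raw follow-up, would then be used as the retrieval query, so the vector search sees the full intent.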
- Python 3.8 or higher
- OpenAI API key
- Pinecone API key
- Tavily API key (required - for documentation crawling and web search)
- Clone the repository

      git clone https://github.com/emarco177/documentation-helper.git
      cd documentation-helper

- Set up environment variables

  Create a `.env` file in the root directory:

      PINECONE_API_KEY=your_pinecone_api_key_here
      OPENAI_API_KEY=your_openai_api_key_here
      TAVILY_API_KEY=your_tavily_api_key_here  # Required - for documentation crawling

- Install dependencies

      pipenv install

- Ingest LangChain Documentation (Run the ingestion pipeline)

      python ingestion.py  # Uses Tavily to crawl and index documentation

- Run the application

      streamlit run main.py

- Open your browser and navigate to http://localhost:8501
Run the test suite to ensure everything is working correctly:
pipenv run pytest .
documentation-helper/
├── backend/ # Core backend logic
│ ├── __init__.py
│ └── core.py
├── static/ # Static assets (images, logos)
│ ├── banner.gif
│ ├── LangChain Logo.png
│ ├── Tavily Logo.png
│ ├── Tavily Logo Trimmed Padded.png
│ └── Trimmed Padded Langchain.png
├── chroma_db/ # Local vector database
├── main.py # Streamlit application entry point
├── ingestion.py # Document ingestion pipeline
├── consts.py # Configuration constants
├── logger.py # Logging utilities
├── Tavily Demo Tutorial.ipynb # 📚 Tutorial: Introduction to Tavily API
├── Tavily Crawl Demo Tutorial.ipynb # 📚 Tutorial: Advanced Tavily crawling techniques
└── requirements files # Pipfile, Pipfile.lock
The project includes comprehensive Jupyter notebooks that serve as hands-on tutorials:
- `Tavily Demo Tutorial.ipynb`: Introduction to Tavily API basics and core functionality
- `Tavily Crawl Demo Tutorial.ipynb`: Advanced tutorial covering Tavily's crawling capabilities, including TavilyMap and TavilyExtract features
These tutorials provide step-by-step guidance on integrating Tavily's powerful web search and crawling capabilities into your AI applications.
Variable | Description | Required |
---|---|---|
`PINECONE_API_KEY` | Your Pinecone API key for vector storage | ✅ |
`OPENAI_API_KEY` | Your OpenAI API key for LLM access | ✅ |
`TAVILY_API_KEY` | Your Tavily API key for documentation crawling and web search | ✅ |
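Since all three variables are required, a quick pre-flight check can catch a missing key before ingestion or the app starts. The snippet below is a minimal sketch, not part of the repository: `missing_keys` and `check_environment` are hypothetical helpers, and the code assumes the variables are already present in the process environment (it does not read the `.env` file itself).

```python
import os

# The three keys listed in the table above.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def check_environment(env=None):
    """Raise a RuntimeError naming every missing key; return None if all are set."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `check_environment()` at the top of a script would fail fast with a message listing every missing key, instead of surfacing later as an authentication error from one of the services.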
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is designed as a learning tool for understanding:
- 🦜 LangChain framework implementation
- 🔍 Vector search and embeddings
- 💬 Conversational AI development
- 🏗️ RAG (Retrieval-Augmented Generation) architecture
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project helpful, please consider:
- ⭐ Starring the repository
- 🐛 Reporting issues
- 💡 Contributing improvements
- 📢 Sharing with others