
LLM-RAG-Stack - Enterprise AI Infrastructure

A production-ready, containerized RAG (Retrieval-Augmented Generation) stack with comprehensive monitoring, observability, and enterprise-grade DevOps practices.

πŸ—οΈ OpenSource RAG LLM Stack Architecture

[Architecture diagram: OpenSource RAG LLM Stack]

This diagram illustrates the complete architecture of the OpenSource RAG LLM Stack, showing the interaction between all components for retrieval-augmented generation, chat history management, and comprehensive monitoring.

RAG Flow: User → Open WebUI → Chroma (retrieve) → Open WebUI → Ollama (generate) → Open WebUI → User

This project demonstrates enterprise-grade AI infrastructure practices:

  • Containerized Microservices: Docker Compose orchestration with complete service isolation
  • Vector Database: Chroma for semantic search and embeddings storage
  • LLM Integration: Containerized Ollama for reproducible LLM inference with Open WebUI interface
  • Data Persistence: PostgreSQL with optimized schema for chat history and RAG documents
  • Observability: Prometheus metrics collection with Grafana dashboards
  • Monitoring: Real-time service health monitoring and performance metrics
  • Security: Network isolation, environment-based configuration, and data encryption

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • 8GB+ RAM (for LLM models)

Complete Self-Contained Setup

# Clone the repository
git clone <your-repo-url>
cd LLM-RAG-Stack

# Quick start (includes model setup)
./start.sh

# Or manual setup:
# Start all services (includes Ollama)
docker-compose up -d

# Set up Ollama with a model
./scripts/setup-ollama.sh

# Check service status
docker-compose ps

Alternative: Use Local Ollama Installation

# Prerequisites: Install Ollama locally (https://ollama.ai)
# Start Ollama on your host machine
ollama serve

# Start the RAG stack (connects to local Ollama)
docker-compose -f local-ollama-docker-compose.yml up -d

# Check service status
docker-compose -f local-ollama-docker-compose.yml ps

Access Services

Once the stack is running, the services are reachable at:

  • Open WebUI: http://localhost:3000
  • Chroma API: http://localhost:8000
  • Ollama API: http://localhost:11434
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3001 (default login admin / admin123)
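
A quick way to confirm Ollama is reachable from the host (the /api/tags endpoint lists installed models):

# Should print the names of installed models
curl -s http://localhost:11434/api/tags | jq '.models[].name'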

Ollama Model Management

# List available models
docker exec -it ollama ollama list

# Pull a new model
docker exec -it ollama ollama pull llama3.2:3b

# Remove a model
docker exec -it ollama ollama rm llama3.2:3b

# Run the setup script for guided model installation
./scripts/setup-ollama.sh

Recommended Models

  • llama3.2:3b (3B params, ~2GB) - Best balance of speed and quality
  • llama3.2:1b (1B params, ~1GB) - Fastest, good for basic tasks
  • mistral:7b (7B params, ~4GB) - High quality, slower
  • codellama:7b (7B params, ~4GB) - Specialized for coding tasks
  • gemma:2b (2B params, ~1.5GB) - Google's efficient model
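
Once a model is pulled, a quick smoke test can be run directly in the container (a minimal example; substitute whichever model you installed):

# One-off prompt against an installed model
docker exec -it ollama ollama run llama3.2:3b "Reply with one short sentence."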

πŸ—οΈ Infrastructure as Code (Docker Compose)

The project uses Docker Compose for reproducible, self-contained infrastructure:

Core RAG Stack Services

# Complete self-contained setup with Ollama
services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
    volumes:
      - ollama-data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - VECTOR_DB=chroma
      - DATABASE_URL=postgresql://user:password@postgres:5432/chatdb

  chroma:
    image: ghcr.io/chroma-core/chroma:latest
    ports: ["8000:8000"]
    environment:
      - CHROMA_DB_IMPL=duckdb+parquet

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: chatdb

Alternative: Local Ollama Integration

For users with existing Ollama installations, use local-ollama-docker-compose.yml:

# Connects to local Ollama installation
services:
  open-webui:
    environment:
      - OLLAMA_API_BASE_URL=http://host.docker.internal:11434
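
Note that on Linux hosts, host.docker.internal is not defined by default. A common fix (assuming Docker 20.10+) is to map it to the host gateway in the compose file:

# Makes host.docker.internal resolve inside the container on Linux
services:
  open-webui:
    extra_hosts:
      - "host.docker.internal:host-gateway"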

Monitoring Stack

# Observability Services
prometheus:
  image: prom/prometheus:latest
  ports: ["9090:9090"]
  volumes:
    - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml

grafana:
  image: grafana/grafana-oss:latest
  ports: ["3001:3000"]
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin123
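
The mounted prometheus.yml determines what gets scraped. Below is a minimal sketch matching the job names referenced in the Prometheus Metrics section; the postgres-exporter target and its default port 9187 are assumptions, since that service is not shown in the excerpt above:

# monitoring/prometheus.yml (sketch)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: postgres_exporter
    static_configs:
      - targets: ["postgres-exporter:9187"]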

📊 Database Schema & Architecture

PostgreSQL Schema

The database is initialized with an optimized schema for RAG operations:

-- Chat Sessions Management
CREATE TABLE chat_sessions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id VARCHAR(255) NOT NULL,
    session_name VARCHAR(255),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Message Storage with Full-Text Search
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id UUID REFERENCES chat_sessions(id),
    role VARCHAR(50) CHECK (role IN ('user', 'assistant', 'system')),
    content TEXT NOT NULL,
    token_count INTEGER DEFAULT 0
);

-- RAG Document Storage
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    title VARCHAR(500),
    content TEXT NOT NULL,
    source VARCHAR(500),
    embedding_id VARCHAR(255), -- Chroma reference
    metadata JSONB DEFAULT '{}'::jsonb
);

-- Performance Indexes
CREATE INDEX idx_documents_content_gin ON documents 
USING gin(to_tsvector('english', content));
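
As an illustration, the GIN index above supports full-text queries like the following (run via the postgres container, as elsewhere in this README; the search phrase is arbitrary):

docker exec -it postgres psql -U user -d chatdb -c \
  "SELECT title, source FROM documents
   WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'vector search');"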

πŸ” RAG Implementation Guide

1. Document Upload & Processing

# Access Open WebUI
open http://localhost:3000

# Navigate to Knowledge section
# Upload documents (PDF, TXT, etc.)
# System automatically:
# - Chunks documents
# - Generates embeddings
# - Stores in Chroma vector database

2. Verify Vector Storage

# Check Chroma collections
curl -s http://localhost:8000/api/v2/tenants/default/databases/default/collections | jq '.'

# Verify heartbeat
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/v2/heartbeat

3. Query with RAG

  • Ask questions in Open WebUI that reference uploaded content
  • System retrieves relevant chunks from Chroma
  • Augments prompts with retrieved context
  • Generates responses using Ollama LLM (this step is sketched below)
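
For illustration, the generation step can be reproduced directly against the Ollama API. This is a minimal sketch: in normal operation Open WebUI builds the augmented prompt from the retrieved chunks, and the placeholder context below is illustrative.

# Non-streaming generation with retrieved context prepended to the prompt
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Context:\n<retrieved chunks>\n\nQuestion: What does the uploaded document say about X?",
  "stream": false
}' | jq -r '.response'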

📈 Monitoring & Observability

Prometheus Metrics

  • Service Health: up{job=~"prometheus|postgres_exporter"} (see the query example below)
  • Database Performance: PostgreSQL exporter metrics
  • Request Rates: HTTP request monitoring
  • Resource Usage: Container and system metrics
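
The service-health expression above can be run directly against the Prometheus HTTP API, for example:

# Query current service health via the HTTP API
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{job=~"prometheus|postgres_exporter"}' | jq '.data.result'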

Grafana Dashboards

Pre-configured dashboards include:

  • RAG Stack Overview: Service health and performance
  • Database Metrics: PostgreSQL performance monitoring
  • System Resources: CPU, memory, and disk usage
  • Request Analytics: API call patterns and response times

RAG Stack Monitoring Dashboard

Here's how the RAG Stack Monitoring dashboard looks:

[Screenshot: RAG Stack Monitoring Dashboard]

The dashboard provides real-time insights into:

  • Service Health Status: Live monitoring of all stack components
  • Active Services Count: Overview of running services
  • Request Rate Monitoring: API performance metrics
  • Database Performance: PostgreSQL metrics and health

Auto-Provisioning

# Grafana automatically configures:
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true

# Dashboards auto-loaded from:
# monitoring/grafana/dashboards/
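
To confirm provisioning worked, the Grafana HTTP API can be queried with the admin credentials from the compose file (a quick check, not part of the stack itself):

# Lists provisioned datasources; should include "Prometheus"
curl -s -u admin:admin123 http://localhost:3001/api/datasources | jq '.[].name'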

🚀 Production Deployment

Environment Configuration

# Production environment variables
export POSTGRES_PASSWORD=secure_password
export GRAFANA_ADMIN_PASSWORD=secure_admin_password
export OLLAMA_API_BASE_URL=https://your-ollama-instance.com

Scaling Considerations

  • Horizontal Scaling: Multiple Ollama instances behind load balancer
  • Database Scaling: PostgreSQL read replicas for query performance
  • Vector DB Scaling: Chroma clustering for high availability
  • Monitoring: Prometheus federation for multi-instance monitoring

Security Best Practices

  • Change default passwords in production
  • Use Docker secrets for sensitive data (see the sketch below)
  • Configure network security policies
  • Enable SSL/TLS for all services
  • Implement proper backup strategies
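
A minimal sketch of the Docker secrets approach for the Postgres password; POSTGRES_PASSWORD_FILE is supported by the official postgres image, and the secret file path here is illustrative:

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/pg_password
    secrets:
      - pg_password

secrets:
  pg_password:
    file: ./secrets/pg_password.txt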

πŸ› οΈ Development & Troubleshooting

Service Management

# View logs
docker-compose logs [service-name]

# Restart services
docker-compose restart [service-name]

# Clean restart
docker-compose down
docker-compose up -d

# For local Ollama setup, use:
# docker-compose -f local-ollama-docker-compose.yml [command]

Common Issues

RAG Not Working - Document Upload Issues

# Check Chroma connection
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/v2/heartbeat

# Create tenant/database if needed
curl -X POST http://localhost:8000/api/v2/tenants \
  -H "Content-Type: application/json" \
  -d '{"name": "default"}'

curl -X POST http://localhost:8000/api/v2/tenants/default/databases \
  -H "Content-Type: application/json" \
  -d '{"name": "default"}'

Database Connection Issues

# Check PostgreSQL status (add -f local-ollama-docker-compose.yml for the local Ollama setup)
docker-compose logs postgres

# Verify database initialization
docker exec -it postgres psql -U user -d chatdb -c "\dt"

📊 Data Persistence

All data is persisted in Docker volumes:

  • ollama-data: LLM models and Ollama configurations
  • openwebui-data: WebUI configurations and user data
  • chroma-data: Vector embeddings and collections
  • pgdata: PostgreSQL database files
  • grafana-data: Dashboard configurations and user settings
  • prometheus-data: Metrics time-series data
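
These volumes also make backups straightforward. A hedged example for the PostgreSQL volume (Compose may prefix the volume name with the project name, so check docker volume ls first; the backup path is illustrative):

# Archive the pgdata volume to a dated tarball on the host
docker run --rm \
  -v pgdata:/data:ro \
  -v "$(pwd)/backups:/backup" \
  alpine tar czf "/backup/pgdata-$(date +%F).tar.gz" -C /data .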

πŸ† Enterprise Features

DevOps Best Practices

  • Infrastructure as Code: Docker Compose for reproducible deployments
  • Monitoring: Comprehensive observability with Prometheus and Grafana
  • Data Management: Optimized PostgreSQL schema with full-text search
  • Security: Network isolation and environment-based configuration
  • Scalability: Microservices architecture for horizontal scaling

AI/ML Capabilities

  • Vector Search: Chroma for semantic similarity search
  • Containerized LLM: Ollama in Docker for reproducible model inference
  • RAG Pipeline: Complete retrieval-augmented generation workflow
  • Document Processing: Automatic chunking and embedding generation
  • Chat History: Persistent conversation management
  • Model Management: Easy model switching and versioning with Docker volumes

👨‍💻 Author

AI/ML Infrastructure Engineer with expertise in:

  • Containerized AI/ML workloads
  • Vector databases and RAG systems
  • Observability and monitoring
  • Enterprise DevOps practices

  • GitHub: [Your GitHub Profile]
  • LinkedIn: [Your LinkedIn Profile]
  • Portfolio: [Your Portfolio Website]


This project demonstrates modern AI infrastructure practices, enterprise-grade monitoring, and production-ready RAG system implementation.
