A production-ready, containerized RAG (Retrieval-Augmented Generation) stack with comprehensive monitoring, observability, and enterprise-grade DevOps practices.
This diagram illustrates the complete architecture of the OpenSource RAG LLM Stack, showing the interaction between all components for retrieval-augmented generation, chat history management, and comprehensive monitoring.
RAG Flow: User → Open WebUI → Chroma (retrieve) → Open WebUI → Ollama (generate) → Open WebUI → User
This project demonstrates enterprise-grade AI infrastructure practices:
- Containerized Microservices: Docker Compose orchestration with complete service isolation
- Vector Database: Chroma for semantic search and embeddings storage
- LLM Integration: Containerized Ollama for reproducible LLM inference with Open WebUI interface
- Data Persistence: PostgreSQL with optimized schema for chat history and RAG documents
- Observability: Prometheus metrics collection with Grafana dashboards
- Monitoring: Real-time service health monitoring and performance metrics
- Security: Network isolation, environment-based configuration, and data encryption
- Docker & Docker Compose
- 8GB+ RAM (for LLM models)
```bash
# Clone the repository
git clone <your-repo-url>
cd LLM-RAG-Stack
# Quick start (includes model setup)
./start.sh
# Or manual setup:
# Start all services (includes Ollama)
docker-compose up -d
# Set up Ollama with a model
./scripts/setup-ollama.sh
# Check service status
docker-compose ps
```

Alternatively, to use an existing local Ollama installation:

```bash
# Prerequisites: Install Ollama locally (https://ollama.ai)
# Start Ollama on your host machine
ollama serve
# Start the RAG stack (connects to local Ollama)
docker-compose -f local-ollama-docker-compose.yml up -d
# Check service status
docker-compose -f local-ollama-docker-compose.yml ps
```

Service endpoints (a quick smoke test follows the list):

- Open WebUI: http://localhost:3000 (AI Chat Interface)
- Grafana: http://localhost:3001 (admin/admin123)
- Prometheus: http://localhost:9090 (Metrics)
- Chroma API: http://localhost:8000 (Vector Database)
- PostgreSQL: localhost:5432 (Database)
- Ollama API: http://localhost:11434 (LLM Service)
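Once the stack is running, a quick way to confirm that everything answers over HTTP is to curl each endpoint in turn. A minimal smoke-test sketch using the ports above (Prometheus's /-/healthy and Ollama's /api/tags are their standard health and model-listing paths):

```bash
# Expect 200 from each endpoint listed above
for url in \
  http://localhost:3000 \
  http://localhost:9090/-/healthy \
  http://localhost:8000/api/v2/heartbeat \
  http://localhost:11434/api/tags
do
  printf '%s -> %s\n' "$url" "$(curl -s -o /dev/null -w '%{http_code}' "$url")"
done
```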
```bash
# List available models
docker exec -it ollama ollama list
# Pull a new model
docker exec -it ollama ollama pull llama3.2:3b
# Remove a model
docker exec -it ollama ollama rm llama3.2:3b
# Run the setup script for guided model installation
./scripts/setup-ollama.sh
```

Recommended models (a quick generation test follows the list):

- llama3.2:3b (3B params, ~2GB) - Best balance of speed and quality
- llama3.2:1b (1B params, ~1GB) - Fastest, good for basic tasks
- mistral:7b (7B params, ~4GB) - High quality, slower
- codellama:7b (7B params, ~4GB) - Specialized for coding tasks
- gemma:2b (2B params, ~1.5GB) - Google's efficient model
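After pulling a model, a single non-streaming request against Ollama's standard /api/generate endpoint confirms inference works end to end (the model name here assumes you pulled llama3.2:3b):

```bash
# Send one prompt and print just the generated text
curl -s http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "prompt": "Reply with one short sentence.", "stream": false}' \
  | jq -r '.response'
```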
The project uses Docker Compose for reproducible, self-contained infrastructure:
```yaml
# Complete self-contained setup with Ollama
services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
    volumes:
      - ollama-data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=*

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["3000:8080"]
    environment:
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - VECTOR_DB=chroma
      - DATABASE_URL=postgresql://user:password@postgres:5432/chatdb

  chroma:
    image: ghcr.io/chroma-core/chroma:latest
    ports: ["8000:8000"]
    environment:
      - CHROMA_DB_IMPL=duckdb+parquet

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: chatdb
```

For users with an existing Ollama installation, use local-ollama-docker-compose.yml:
```yaml
# Connects to a local Ollama installation on the host
services:
  open-webui:
    environment:
      - OLLAMA_API_BASE_URL=http://host.docker.internal:11434
```
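With this variant the containers reach the host's Ollama through host.docker.internal, so it is worth confirming the host daemon is up before starting the stack (/api/tags is Ollama's standard model-listing endpoint):

```bash
# Confirm the host's Ollama daemon is reachable and list installed models
curl -s http://localhost:11434/api/tags | jq '.models[].name'
```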
```yaml
# Observability Services
prometheus:
  image: prom/prometheus:latest
  ports: ["9090:9090"]
  volumes:
    - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml

grafana:
  image: grafana/grafana-oss:latest
  ports: ["3001:3000"]
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin123
```

The database is initialized with an optimized schema for RAG operations:
```sql
-- Chat Sessions Management
CREATE TABLE chat_sessions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id VARCHAR(255) NOT NULL,
session_name VARCHAR(255),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Message Storage with Full-Text Search
CREATE TABLE chat_messages (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
session_id UUID REFERENCES chat_sessions(id),
role VARCHAR(50) CHECK (role IN ('user', 'assistant', 'system')),
content TEXT NOT NULL,
token_count INTEGER DEFAULT 0
);
-- RAG Document Storage
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
title VARCHAR(500),
content TEXT NOT NULL,
source VARCHAR(500),
embedding_id VARCHAR(255), -- Chroma reference
metadata JSONB DEFAULT '{}'::jsonb
);
-- Performance Indexes
CREATE INDEX idx_documents_content_gin ON documents
USING gin(to_tsvector('english', content));
```
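With the GIN index in place, ingested documents can be searched with standard Postgres full-text queries. A quick check from the host, assuming documents have already been uploaded (the search term is just an example):

```bash
# Full-text search over the documents table, served by the GIN index above
docker exec -it postgres psql -U user -d chatdb -c \
  "SELECT title, source FROM documents
   WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'vector database');"
```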
To ingest documents through Open WebUI:

```bash
# Access Open WebUI
open http://localhost:3000
# Navigate to Knowledge section
# Upload documents (PDF, TXT, etc.)
# System automatically:
# - Chunks documents
# - Generates embeddings
# - Stores in Chroma vector database
```

Verify the embeddings landed in Chroma:

```bash
# Check Chroma collections
curl -s http://localhost:8000/api/v2/tenants/default/databases/default/collections | jq '.'
# Verify heartbeat
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/v2/heartbeat
```

To query with RAG:

- Ask questions in Open WebUI that reference uploaded content
- System retrieves relevant chunks from Chroma
- Augments prompts with retrieved context
- Generates responses using Ollama LLM
Key metrics available in Prometheus (and queryable directly, as shown after this list):

- Service Health: `up{job=~"prometheus|postgres_exporter"}`
- Database Performance: PostgreSQL exporter metrics
- Request Rates: HTTP request monitoring
- Resource Usage: Container and system metrics
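All of these series can also be queried outside Grafana via Prometheus's standard HTTP API, which is handy for scripting health checks:

```bash
# Ask Prometheus which scrape targets are currently up
curl -s 'http://localhost:9090/api/v1/query?query=up' \
  | jq '.data.result[] | {job: .metric.job, up: .value[1]}'
```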
Pre-configured dashboards include:
- RAG Stack Overview: Service health and performance
- Database Metrics: PostgreSQL performance monitoring
- System Resources: CPU, memory, and disk usage
- Request Analytics: API call patterns and response times
The RAG Stack Monitoring dashboard provides real-time insights into:
- Service Health Status: Live monitoring of all stack components
- Active Services Count: Overview of running services
- Request Rate Monitoring: API performance metrics
- Database Performance: PostgreSQL metrics and health
```yaml
# Grafana automatically configures:
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true

# Dashboards auto-loaded from:
# monitoring/grafana/dashboards/
```
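To confirm provisioning worked, Grafana's HTTP API can list the configured datasources (credentials are the admin defaults from the compose file above):

```bash
# List provisioned datasources using the default admin credentials
curl -s -u admin:admin123 http://localhost:3001/api/datasources | jq '.[].name'
```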
```bash
# Production environment variables
export POSTGRES_PASSWORD=secure_password
export GRAFANA_ADMIN_PASSWORD=secure_admin_password
export OLLAMA_API_BASE_URL=https://your-ollama-instance.com
```
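Compose reads variables like these from the shell environment or from an env file, provided the compose files reference them with ${VAR}-style interpolation rather than hardcoded values. One way to keep production secrets out of the repository (the .env.prod file name is illustrative):

```bash
# Keep production values in a dedicated env file (name is illustrative)
cat > .env.prod <<'EOF'
POSTGRES_PASSWORD=secure_password
GRAFANA_ADMIN_PASSWORD=secure_admin_password
EOF

# Point Compose at it when starting the stack
docker-compose --env-file .env.prod up -d
```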
Scaling options:

- Horizontal Scaling: Multiple Ollama instances behind a load balancer
- Database Scaling: PostgreSQL read replicas for query performance
- Vector DB Scaling: Chroma clustering for high availability
- Monitoring: Prometheus federation for multi-instance monitoring
- Change default passwords in production
- Use Docker secrets for sensitive data
- Configure network security policies
- Enable SSL/TLS for all services
- Implement proper backup strategies
```bash
# View logs
docker-compose logs [service-name]
# Restart services
docker-compose restart [service-name]
# Clean restart
docker-compose down
docker-compose up -d
# For local Ollama setup, use:
# docker-compose -f local-ollama-docker-compose.yml [command]
```

If Chroma is not responding:

```bash
# Check Chroma connection
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/api/v2/heartbeat
# Create tenant/database if needed
curl -X POST http://localhost:8000/api/v2/tenants \
-H "Content-Type: application/json" \
-d '{"name": "default"}'
curl -X POST http://localhost:8000/api/v2/tenants/default/databases \
-H "Content-Type: application/json" \
  -d '{"name": "default"}'
```

If PostgreSQL is misbehaving:

```bash
# Check PostgreSQL status
docker-compose -f local-ollama-docker-compose.yml logs postgres
# Verify database initialization
docker exec -it postgres psql -U user -d chatdb -c "\dt"
```

All data is persisted in Docker volumes (a backup sketch follows the list):
- ollama-data: LLM models and Ollama configuration
- openwebui-data: WebUI configuration and user data
- chroma-data: Vector embeddings and collections
- pgdata: PostgreSQL database files
- grafana-data: Dashboard configuration and user settings
- prometheus-data: Metrics time-series data
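Because all state lives in named volumes, backups reduce to archiving each volume. A minimal sketch for the PostgreSQL volume; note that Compose may prefix volume names with the project name (e.g., llm-rag-stack_pgdata), so check docker volume ls first:

```bash
# Archive the pgdata volume to a tarball in the current directory
docker run --rm \
  -v pgdata:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/pgdata-backup.tar.gz -C /data .
```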
- Infrastructure as Code: Docker Compose for reproducible deployments
- Monitoring: Comprehensive observability with Prometheus and Grafana
- Data Management: Optimized PostgreSQL schema with full-text search
- Security: Network isolation and environment-based configuration
- Scalability: Microservices architecture for horizontal scaling
- Vector Search: Chroma for semantic similarity search
- Containerized LLM: Ollama in Docker for reproducible model inference
- RAG Pipeline: Complete retrieval-augmented generation workflow
- Document Processing: Automatic chunking and embedding generation
- Chat History: Persistent conversation management
- Model Management: Easy model switching and versioning with Docker volumes
AI/ML Infrastructure Engineer with expertise in:

- Containerized AI/ML workloads
- Vector databases and RAG systems
- Observability and monitoring
- Enterprise DevOps practices

Contact:

- GitHub: [Your GitHub Profile]
- LinkedIn: [Your LinkedIn Profile]
- Portfolio: [Your Portfolio Website]
This project demonstrates modern AI infrastructure practices, enterprise-grade monitoring, and production-ready RAG system implementation.

