
Full Stack AI Engineering Platform

This repository is a showcase and evolving codebase for building and orchestrating AI systems from the ground up, designed by a Full Stack AI Engineer with end-to-end expertise across mathematics, data engineering, software development, and Kubernetes-based infrastructure.

🎯 Purpose

This project demonstrates a complete, production-grade architecture to:

  • Operate an LLM cluster using Ollama models.
  • Harness GPU acceleration using the NVIDIA GPU Operator on Kubernetes.
  • Use Custom Resource Definitions (CRDs) and controllers to coordinate model behaviors.
  • Create an ecosystem where multiple models can collaborate to perform higher-level tasks (question answering, summarization, classification, etc.).
  • Establish infrastructure-as-code patterns using Kustomize, Flux, and GitOps principles.

🧠 Vision

AI systems are rarely "one model fits all." This project introduces a framework where specialized AI agents (LLMs), hosted as services across a Kubernetes cluster, can interoperate to complete sophisticated tasks.

Inspired by:

  • Full-stack software engineering principles
  • Multi-agent systems
  • MLOps best practices
  • Declarative infrastructure management

🔧 Architecture Overview

πŸ“ Repository Structure

.
├── infra/
│   ├── cluster-iac/             # Infrastructure as Code (Terraform) for deploying an EKS cluster with requisite GPU support
│   ├── base/                    # Base Kustomize configurations (Flux, GPU Operator, etc.)
│   ├── overlays/                # Cluster-specific configurations
│   ├── flux/                    # Flux GitOps setup
│   └── monitoring/              # Prometheus/Grafana, if used
│
├── crds/                        # Custom Resource Definitions (YAML) and Go types
│   ├── ollamaagent_crd.yaml     # Defines OllamaAgent behavior/contract
│   ├── ollamamodeldefinition_crd.yaml
│   └── taskorchestration_crd.yaml
│
├── controllers/                 # Golang operators/controllers (kubebuilder-based)
│   ├── ollamaagent_controller.go
│   ├── ollamamodeldefinition_controller.go
│   └── taskorchestration_controller.go
│
├── ollama-operators/            # Model server orchestration logic
│   ├── agent-specialization/    # Specialized agent roles (Q&A, summarizer, etc.)
│   ├── service-deployments/     # Helm or Kustomize configs for model deployments
│   └── collab-logic/            # Logic for inter-agent communication & orchestration
│
├── data/                        # Data pipeline logic (ETL, tokenization, chunking, etc.)
│   ├── etl-pipeline/
│   └── example-datasets/
│
├── api/                         # API gateway and backend logic (Go or Python)
│   ├── routes/                  # Task submission endpoints
│   └── orchestration/           # Converts user requests into CRs for processing
│
├── examples/                    # Example workflows and scenarios
│   ├── question-answering/
│   ├── summarization-pipeline/
│   └── multi-model-chat/
│
├── docs/                        # Architecture diagrams and documentation
│   ├── architecture.md
│   ├── ollama-crd-spec.md
│   └── orchestration-diagram.png
│
└── README.md

πŸ—οΈ Infrastructure Stack

| Layer | Technology | Purpose |
|---|---|---|
| Container Runtime | containerd | Lightweight, Kubernetes-native runtime |
| GPU Provisioning | NVIDIA GPU Operator | Automatically manages GPU drivers + toolkit |
| GitOps | Flux | Declarative and auditable infra delivery |
| K8s Package Manager | Kustomize + Helm | Infra and app lifecycle management |
| Model Hosting | Ollama (on GPU nodes) | LLM serving engine |
| Task Coordination | Custom Resource Definitions (CRDs) | Define and manage complex task orchestration |
| Monitoring | Prometheus + Grafana (optional) | Cluster and model performance observability |
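
Once the GPU Operator is installed, workloads request GPUs through the nvidia.com/gpu extended resource that its device plugin advertises. A minimal smoke-test sketch (the image choice is illustrative, not a manifest from this repo):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: ollama
      image: ollama/ollama:latest
      resources:
        limits:
          nvidia.com/gpu: 1   # extended resource advertised by the GPU Operator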

🧩 Custom Resource Definitions (CRDs)

The system uses Kubernetes CRDs to implement the A2A (Agent-to-Agent) protocol, enabling seamless communication between AI agents. Our CRDs define both the agent deployment and task orchestration aspects of the system.

OllamaAgent

A CRD for deploying and managing individual model agents that implement the A2A protocol.

apiVersion: ai.stack/v1alpha1
kind: OllamaAgent
metadata:
  name: summarizer-agent
spec:
  # Reference to the OllamaModelDefinition
  modelDefinition:
    name: summarizer-model
    version: "1.0.0"

  # Core agent configuration
  role: summarizer

  # A2A protocol implementation
  agentCard:
    capabilities:
      - summarization
      - text-analysis
    endpoint: "/api/v1/agent"
    authentication:
      type: "bearer"

  # Resource requirements
  resources:
    gpu: 1
    memory: "8Gi"
    cpu: "2"

  # A2A server configuration
  server:
    streaming: true
    pushNotifications: true
    webhookConfig:
      retryPolicy: exponential
      maxRetries: 3

  # Model-specific settings
  modelConfig:
    temperature: 0.7
    contextWindow: 4096
    responseFormat: "json"

OllamaModelDefinition

A CRD that defines how to build a custom Ollama model with specific capabilities and behaviors. When created, it triggers the build process within the cluster.

apiVersion: ai.stack/v1alpha1
kind: OllamaModelDefinition
metadata:
  name: summarizer-model
spec:
  # Base model configuration
  from: llama2

  # Model build parameters
  build:
    # System prompt defining agent behavior
    system: |
      You are a specialized summarization agent that excels at:
      1. Extracting key information from documents
      2. Creating concise summaries
      3. Identifying main themes and topics

    # Parameters for model behavior
    parameters:
      temperature: 0.7
      contextWindow: 4096
      responseFormat: json

    # Model adaptation and fine-tuning
    template: |
      {{ if .System }}{{.System}}{{ end }}

      Context: {{.Input}}

      Instructions: Create a summary that includes:
      - Main points
      - Key findings
      - Action items

      Response format:
      {{.ResponseFormat}}

    # Custom function definitions
    functions:
      - name: extract_key_points
        description: "Extract main points from the text"
        parameters:
          type: object
          properties:
            main_points:
              type: array
              items:
                type: string
            themes:
              type: array
              items:
                type: string

    # Model tags for versioning and identification
    tags:
      version: "1.0.0"
      type: "summarizer"
      capabilities: ["text-analysis", "summarization"]

    # Resource requirements for build process
    buildResources:
      gpu: 1
      memory: "16Gi"
      cpu: "4"

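# The status block below is written by the build controller, not the user;
# it is included here for illustration only: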
status:
  phase: Building # Building, Complete, Failed
  buildStartTime: "2025-04-23T13:30:00Z"
  lastBuildTime: "2025-04-23T13:35:00Z"
  modelHash: "sha256:abc123..."
  conditions:
    - type: Built
      status: "True"
      reason: "BuildSucceeded"
      message: "Model successfully built and registered"

TaskOrchestration

A CRD that manages complex task workflows between multiple agents.

apiVersion: ai.stack/v1alpha1
kind: TaskOrchestration
metadata:
  name: document-analysis
spec:
  # Task definition
  input:
    text: "Analyze and summarize this document"
    format: "text/plain"

  # A2A task workflow
  pipeline:
    - name: document-analyzer
      agentRef: analyzer-agent
      timeout: "5m"
      retries: 2
      artifacts:
        - name: analysis-result
          type: "application/json"

    - name: summarizer
      agentRef: summarizer-agent
      dependsOn: ["document-analyzer"]
      inputFrom:
        - taskRef: document-analyzer
          artifactName: analysis-result

    - name: quality-check
      agentRef: qa-agent
      dependsOn: ["summarizer"]
      condition: "success"

  # A2A protocol settings
  communication:
    streaming: true
    pushNotifications:
      enabled: true
      endpoint: "http://callback-service/webhook"

  # Output configuration
  output:
    storage:
      type: "s3"
      bucket: "ai-results"
      prefix: "outputs/"
    format:
      - type: "application/json"
      - type: "text/markdown"

  # Error handling
  errorPolicy:
    maxRetries: 3
    backoffLimit: 600
    failureAction: "rollback"
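
A typical interaction, assuming the CRD registers the plural name taskorchestrations (an assumption, not confirmed by the repo):

# Submit the orchestration to the cluster
kubectl apply -f document-analysis.yaml

# Follow the pipeline as the controller moves tasks through their lifecycle
kubectl get taskorchestrations document-analysis -w
kubectl describe taskorchestrations document-analysis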

Controller Implementation

The controllers implement the A2A protocol's core functionality (a Go sketch of the discovery endpoint follows this list):

  1. Agent Discovery:

    • Automatically generates and manages .well-known/agent.json endpoints
    • Handles capability registration and updates
    • Manages agent metadata and health checks
  2. Task Management:

    • Implements A2A task lifecycle (submitted → working → completed/failed)
    • Handles streaming updates via Server-Sent Events (SSE)
    • Manages task artifacts and state transitions
  3. Communication:

    • Implements A2A message formats and parts
    • Handles both synchronous and streaming communication
    • Manages push notifications and webhooks
  4. Resource Orchestration:

    • GPU allocation and scheduling
    • Memory and compute resource management
    • Model loading and unloading
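
As a concrete illustration of the discovery piece, here is a minimal, hypothetical Go sketch of a server publishing its agent card at the well-known path. The AgentCard field set is modeled on the CRD's agentCard block, not on a confirmed A2A schema:

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// AgentCard mirrors the agentCard block of the OllamaAgent CRD.
// The field set is an illustrative assumption, not the official A2A schema.
type AgentCard struct {
	Name         string   `json:"name"`
	Capabilities []string `json:"capabilities"`
	Endpoint     string   `json:"endpoint"`
	Streaming    bool     `json:"streaming"`
}

func main() {
	card := AgentCard{
		Name:         "summarizer-agent",
		Capabilities: []string{"summarization", "text-analysis"},
		Endpoint:     "/api/v1/agent",
		Streaming:    true,
	}

	// Publish the discovery document at the well-known path so that
	// peer agents can look up this agent's capabilities.
	http.HandleFunc("/.well-known/agent.json", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(card); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}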

πŸ” Development Setup

Development Environment

We provide a consistent development environment using VS Code Dev Containers. This ensures all developers have the same tools and versions.

  1. Prerequisites:

    • Docker (or another container runtime supported by Dev Containers)
    • VS Code with the Dev Containers extension
  2. Getting Started:

    # Clone the repository
    git clone https://github.com/yourusername/fullStackOllama.git
    cd fullStackOllama
    
    # Open in VS Code
    code .
    
    # Click "Reopen in Container" when prompted
    # or use Command Palette (F1) -> "Remote-Containers: Reopen in Container"

The dev container includes:

  • All required development tools
  • Pre-configured pre-commit hooks
  • VS Code extensions for Terraform, Go, and Kubernetes
  • AWS and Kubernetes config mounting

Alternatively, if you prefer a local installation, install the tools listed in the Setup Instructions below.

Pre-commit Hooks

This repository uses pre-commit hooks to ensure code quality and consistency. The following checks run before each commit (a sample configuration sketch follows the list):

  1. General Checks

    • Trailing whitespace removal
    • End of file fixing
    • YAML syntax validation
    • Large file checks
    • Merge conflict detection
    • Private key detection
  2. Terraform Checks

    • Format validation (terraform fmt)
    • Configuration validation (terraform validate)
    • Documentation updates
    • Security scanning (Checkov)
    • Linting (TFLint)
  3. Go Code Checks

    • Format validation (go fmt)
    • Code analysis (go vet)
    • Comprehensive linting (golangci-lint)
  4. Custom Validations

    • CRD syntax and structure validation
    • Model definition validation
    • Kubernetes resource validation
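
A representative .pre-commit-config.yaml covering the categories above (hook revisions are illustrative; see the file in the repo for the authoritative set):

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: detect-private-key
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.89.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_docs
      - id: terraform_tflint
      - id: terraform_checkov
  - repo: https://github.com/golangci/golangci-lint
    rev: v1.59.0
    hooks:
      - id: golangci-lint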

Setup Instructions

  1. Install pre-commit:

    brew install pre-commit
  2. Install required tools:

    brew install terraform-docs tflint checkov
    go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
  3. Install the pre-commit hooks:

    pre-commit install
  4. (Optional) Run against all files:

    pre-commit run --all-files

Continuous Integration

The same checks are run in CI/CD pipelines to ensure consistency. See the GitHub Actions workflows for details.
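
A minimal workflow of that shape might look like the following sketch (illustrative only; the repository's actual workflow definitions may differ):

name: pre-commit
on: [push, pull_request]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - uses: pre-commit/action@v3.0.1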


πŸ—οΈ Model Build Process

GitOps Workflow

The model building process follows GitOps principles, ensuring that all changes are tracked, reviewed, and automatically deployed:

  1. Model Definition

    # models/summarizer/model.yaml
    apiVersion: ai.stack/v1alpha1
    kind: OllamaModelDefinition
    metadata:
      name: summarizer-model
    spec:
      from: llama2
      build:
        system: |
          You are a specialized summarization agent...
  2. Pull Request Flow

    • Create branch: feature/add-summarizer-model
    • Add/modify model definition in models/ directory
    • Create PR with changes
    • Automated validation:
      • YAML syntax
      • Model definition schema
      • Resource requirements check
      • Security scanning
    • PR review and approval
    • Merge to main branch
  3. Flux Synchronization

    # infra/base/models/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - ../../models  # Watches the models directory
    • Flux detects changes in the models/ directory
    • Applies new/modified OllamaModelDefinition to the cluster
    • Triggers the build controller
  4. Build Process

    sequenceDiagram
      participant Flux
      participant API Server
      participant Build Controller
      participant Build Job
      participant Registry
    
      Flux->>API Server: Apply OllamaModelDefinition
      API Server->>Build Controller: Notify new/modified definition
      Build Controller->>Build Job: Create build job
      Build Job->>Build Job: Execute ollama create
      Build Job->>Registry: Push built model
      Build Job->>API Server: Update status
      Build Controller->>API Server: Update conditions
    
  5. Build Controller Actions (see the Go sketch after this list)

    • Creates a Kubernetes Job for building
    • Mounts required GPU resources
    • Executes ollama create with definition
    • Monitors build progress
    • Updates status conditions
    • Handles failures and retries
    • Registers successful builds
  6. Model Registration

    • Successful builds are registered in the cluster
    • Model becomes available for OllamaAgent instances
    • Version tracking and rollback support
    • Automatic cleanup of old versions
  7. Monitoring & Logs

    # Example build job logs
    2025-04-23T13:30:00Z [INFO] Starting build for summarizer-model
    2025-04-23T13:30:05Z [INFO] Downloading base model llama2
    2025-04-23T13:31:00Z [INFO] Applying model adaptations
    2025-04-23T13:32:00Z [INFO] Registering model summarizer-model:1.0.0
    2025-04-23T13:32:05Z [INFO] Build complete
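
For illustration, a trimmed, hypothetical Go sketch of how the build controller might construct the build Job. The container image, Modelfile path, and omitted volume mount are assumptions based on the workflow above, not the repository's actual implementation:

package controllers

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildJobFor returns a Job that runs `ollama create` for the given
// OllamaModelDefinition name.
func buildJobFor(name, namespace string) *batchv1.Job {
	backoff := int32(3) // mirrors the definition's retry policy
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name + "-build", Namespace: namespace},
		Spec: batchv1.JobSpec{
			BackoffLimit: &backoff,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "ollama-build",
						Image: "ollama/ollama:latest", // illustrative image choice
						// Runs `ollama create` against a Modelfile rendered from
						// the definition (volume mount omitted for brevity).
						Command: []string{"ollama", "create", name, "-f", "/build/Modelfile"},
						Resources: corev1.ResourceRequirements{
							Limits: corev1.ResourceList{
								"nvidia.com/gpu": resource.MustParse("1"),
							},
						},
					}},
				},
			},
		},
	}
}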

Security Considerations

  • All model definitions are version controlled
  • PR reviews ensure quality and security
  • Base models are pulled from trusted sources
  • Build jobs run in isolated environments
  • Resource limits are strictly enforced
  • Model provenance is tracked and verified

Resource Management

  • Build jobs are scheduled based on GPU availability
  • Parallel builds are supported with resource quotas
  • Failed builds are automatically cleaned up
  • Successful builds are cached for reuse
  • Version tags ensure reproducibility

🚀 Getting Started

Prerequisites

  • Kubernetes cluster with GPU-enabled nodes (AWS EKS, GKE, or bare-metal)
  • NVIDIA GPU Operator installed
  • Kubectl + Kustomize + Helm
  • Golang (for controller development)

Deployment Steps

  1. Set up Infrastructure
# Deploy the EKS cluster using Terraform
cd infra/cluster-iac
terraform init
terraform apply
  2. Bootstrap Flux

The repository includes a bootstrap script to set up Flux with the correct configuration:

# Option 1: Using environment variable
export GITHUB_TOKEN=your_github_token
./scripts/bootstrap-flux.sh

# Option 2: Passing token directly
./scripts/bootstrap-flux.sh -t your_github_token

# Additional options available:
./scripts/bootstrap-flux.sh -h  # Show help

The bootstrap script will:

  • Install Flux CLI if not present
  • Clean up any existing Flux installation
  • Configure Flux with your GitHub repository
  • Set up monitoring and logging components
  • Verify the installation and show status
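
The script is assumed to wrap the standard Flux bootstrap flow; the manual equivalent would look roughly like this (owner, repository, and path are placeholders):

flux bootstrap github \
  --owner=<github-user> \
  --repository=fullStackOllama \
  --branch=main \
  --path=infra/flux \
  --personal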
  3. Apply CRDs
kubectl apply -f crds/

  4. Deploy example agents
kubectl apply -f ollama-operators/service-deployments/

  5. Submit an orchestration task
kubectl apply -f examples/question-answering/task.yaml

📸 Diagrams

See docs/architecture.md and docs/orchestration-diagram.png for detailed system visuals.


🤝 Contributing

This project is a personal and professional showcase, but contributors are welcome: PRs, issues, and suggestions are encouraged.


📚 Learning Goals

This project is also a journey of exploration. Through it, we aim to learn and demonstrate:

  • GPU scheduling with Kubernetes
  • Multi-agent AI orchestration
  • Building CRDs and operators with Go
  • Best practices in GitOps and cloud-native ML
  • Open-source model hosting and scaling

📜 License

MIT License


🔗 Related Projects
