
IvanMerrill/compass


COMPASS

Comprehensive Observability Multi-Agent Platform for Adaptive System Solutions

AI-powered incident investigation platform that reduces mean time to resolution (MTTR) by 67-90% using parallel OODA loops, Incident Command System (ICS) principles, and scientific methodology.


What is COMPASS?

The Problem: Traditional incident investigation tools require senior engineers to manually connect the dots between metrics, logs, and traces. Average MTTR: 2-4 hours. Knowledge is concentrated in a few experts.

The Solution: COMPASS uses AI agents with scientific methodology to systematically test hypotheses in parallel, filtering out noise and presenting only high-confidence root causes to humans.

Key Differentiators:

  • 🧪 Scientific rigor: Systematic hypothesis disproof (not just pattern matching)
  • ⚑ Parallel OODA loops: 5+ agents testing simultaneously (10x faster than sequential)
  • 🤖 Bring your own LLM: OpenAI, Anthropic, or any provider (cost-controlled)
  • 👥 Learning Teams approach: Focus on contributing causes, not blame
  • 📄 Automatic post-mortems: Markdown documentation for every investigation
  • 💰 Cost-aware: $10 per routine investigation, $20 per critical one (transparent budgets)
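The README gives the two budget tiers but not how a budget is chosen for a given incident. A minimal sketch, assuming an explicit override (such as the CLI's --budget flag) takes precedence and that every non-critical severity falls into the routine tier; DEFAULT_BUDGETS and default_budget are hypothetical names, not COMPASS's API:

```python
from decimal import Decimal
from typing import Optional

# Hypothetical defaults: the README states $10 (routine) and $20 (critical).
DEFAULT_BUDGETS = {
    "routine": Decimal("10.00"),
    "critical": Decimal("20.00"),
}


def default_budget(severity: str, override: Optional[Decimal] = None) -> Decimal:
    """Return the investigation budget in USD.

    An explicit override wins; otherwise "critical" incidents get the
    critical budget and every other severity gets the routine one
    (an assumption, since only two tiers are documented).
    """
    if override is not None:
        if override <= 0:
            raise ValueError("budget must be positive")
        return override
    tier = "critical" if severity.strip().lower() == "critical" else "routine"
    return DEFAULT_BUDGETS[tier]


print(default_budget("critical"))                    # 20.00
print(default_budget("high"))                        # 10.00
print(default_budget("critical", Decimal("15.00")))  # 15.00
```

Using Decimal rather than float keeps currency amounts exact when budgets are later compared against accumulated costs.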

Current Status: Production-grade foundation ready for Database Agent implementation


Project Status

🚀 Phase 5 Complete - Multi-Agent Orchestrator (Production-Ready)

Current Capabilities:

  • ✅ Multi-Agent Orchestration - Sequential dispatch of Application, Database, Network agents
  • ✅ Production-grade agents - ApplicationAgent and NetworkAgent with 95%+ test coverage
  • ✅ CLI Interface - investigate-orchestrator command with budget management
  • ✅ Cost Control - Per-agent budget tracking, transparent cost breakdown
  • ✅ Hypothesis Ranking - Confidence-based ranking across all agents
  • ✅ Graceful Degradation - Continues investigation even if agents fail
  • ✅ OpenTelemetry Tracing - Distributed tracing from day 1
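Graceful degradation in a sequential orchestrator can be sketched as follows. This is an illustration, not COMPASS's orchestrator: it assumes each agent is a plain callable that takes an incident ID and returns a list of observation strings, and dispatch_sequentially, DispatchResult, and the sample agents are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed agent shape: takes an incident ID, returns observation strings.
Agent = Callable[[str], list[str]]


@dataclass
class DispatchResult:
    observations: list[str] = field(default_factory=list)
    failures: dict[str, str] = field(default_factory=dict)  # agent name -> error


def dispatch_sequentially(incident_id: str, agents: dict[str, Agent]) -> DispatchResult:
    """Run agents one at a time; a failure is recorded instead of aborting."""
    result = DispatchResult()
    for name, agent in agents.items():
        try:
            result.observations.extend(agent(incident_id))
        except Exception as exc:  # graceful degradation: note it and move on
            result.failures[name] = f"{type(exc).__name__}: {exc}"
    return result


def failing_database_agent(incident_id: str) -> list[str]:
    raise TimeoutError("metrics backend unreachable")


agents = {
    "application": lambda inc: [f"{inc}: elevated error rate in payment"],
    "database": failing_database_agent,
    "network": lambda inc: [f"{inc}: DNS lookups timing out"],
}
res = dispatch_sequentially("INC-12345", agents)
print(len(res.observations))  # 2
print(res.failures)           # {'database': 'TimeoutError: metrics backend unreachable'}
```

Catching exceptions per agent keeps one failing backend from aborting the whole investigation, while the recorded failures make the gap visible to the human reviewing the results.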

Recent Achievements (Phase 5):

  • Orchestrator: Sequential multi-agent coordination (15/15 tests passing)
  • Competitive Review: Agent Beta promoted for architectural simplification
  • Complexity Reduction: Removed ThreadPoolExecutor (saved 4 hours, zero threading bugs)
  • CLI Integration: Full investigation workflow from command line
  • Documentation: Comprehensive decision rationale and design docs

Previous Achievements:

  • Day 4: Agent LLM/MCP integration, ADR documentation (Handoff)
  • Day 3: OpenAI/Anthropic integration, fixed 8 critical bugs (Report)
  • Day 2: Scientific framework with quality-weighted confidence scoring (Report)

Next: Post-implementation competitive review, then Phase 6 optimization

Last Updated: 2025-11-21


Quick Start

Multi-Agent Investigation (Phase 5 - Production-Ready)

Investigate an incident using the orchestrated multi-agent system:

# Simple investigation
python -m compass.cli.main investigate-orchestrator INC-12345

# With budget and affected services
python -m compass.cli.main investigate-orchestrator INC-12345 \
  --budget 15.00 \
  --affected-services payment,checkout \
  --severity critical

What you get:

  • Sequential dispatch of Application, Database, and Network agents
  • Observations consolidated from all agents
  • Top 5 hypotheses ranked by confidence
  • Per-agent cost breakdown with budget utilization
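Ranking across agents amounts to pooling every agent's hypotheses and keeping the highest-confidence ones. A minimal sketch, assuming confidence is a float in [0, 1]; the Hypothesis class and rank_hypotheses function are hypothetical, and the sample values are taken from the example output in this section:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    agent: str
    statement: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def rank_hypotheses(per_agent: dict[str, list[Hypothesis]], top_n: int = 5) -> list[Hypothesis]:
    """Pool hypotheses from every agent and return the top_n by confidence."""
    pooled = [h for hyps in per_agent.values() for h in hyps]
    return heapq.nlargest(top_n, pooled, key=lambda h: h.confidence)


per_agent = {
    "application": [Hypothesis("application", "High error rate in payment service", 0.85)],
    "database": [Hypothesis("database", "Connection pool nearing exhaustion", 0.78)],
    "network": [Hypothesis("network", "DNS resolution timeout detected", 0.92)],
}
for rank, h in enumerate(rank_hypotheses(per_agent), start=1):
    print(f"{rank}. [{h.agent}] {h.statement}")
    print(f"   Confidence: {h.confidence:.2%}")
```

heapq.nlargest avoids sorting the full pool when only the top few are needed; with three hypotheses, the loop prints them in the same order as the example output.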

Example Output:

🔍 Initializing investigation for INC-12345
💰 Budget: $15.00
📊 Affected Services: payment, checkout
⚠️  Severity: critical

📊 Observing incident (sequential agent dispatch)...
✅ Collected 12 observations

🧠 Generating hypotheses...
✅ Generated 5 hypotheses

🏆 Top Hypotheses (ranked by confidence):

1. [network] DNS resolution timeout detected
   Confidence: 92.00%

2. [application] High error rate in payment service
   Confidence: 85.00%

3. [database] Connection pool nearing exhaustion
   Confidence: 78.00%

💰 Cost Breakdown:
  Application: $2.1500
  Database:    $1.8500
  Network:     $0.9500
  ─────────────────────────
  Total:       $4.9500 / $15.00
  Utilization: 33.0%
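The cost figures above are simple arithmetic: the total is the sum of per-agent costs, and utilization is total / budget × 100 (here 4.95 / 15.00 = 33.0%). A sketch that renders the same breakdown, using Decimal to avoid floating-point rounding in currency; format_cost_breakdown is a hypothetical helper, not part of the COMPASS CLI:

```python
from decimal import Decimal


def format_cost_breakdown(costs: dict[str, Decimal], budget: Decimal) -> str:
    """Render per-agent costs, the total against the budget, and utilization."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    total = sum(costs.values(), Decimal("0"))
    utilization = total / budget * 100  # percentage of the budget spent
    # Pad labels so the dollar amounts line up, as in the CLI output above.
    width = max([len("Utilization:")] + [len(name) + 1 for name in costs])
    lines = [f"  {name + ':':<{width}} ${cost:.4f}" for name, cost in costs.items()]
    lines.append("  " + "─" * 25)
    lines.append(f"  {'Total:':<{width}} ${total:.4f} / ${budget:.2f}")
    lines.append(f"  {'Utilization:':<{width}} {utilization:.1f}%")
    return "\n".join(lines)


print(format_cost_breakdown(
    {"Application": Decimal("2.15"), "Database": Decimal("1.85"), "Network": Decimal("0.95")},
    Decimal("15.00"),
))
```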

Try It with Demo Environment (Full Stack)

A complete demo environment with a real observability stack:

# 1. Start demo environment
./scripts/run-demo.sh

# 2. Trigger an incident (missing index, lock contention, or pool exhaustion)
./scripts/trigger-incident.sh missing_index

# 3. Investigate with COMPASS (classic mode)
poetry run compass investigate \
  --service payment-service \
  --symptom "slow database queries and high latency" \
  --severity high

Full demo guide: DEMO.md (about 10 minutes on the first run)

For Contributors

  1. Start here: docs/product/COMPASS_Product_Reference_Document_v1_1.md
  2. Understand the architecture: docs/architecture/COMPASS_MVP_Architecture_Reference.md
  3. Build guide: docs/guides/COMPASS_MVP_Build_Guide.md
  4. Development workflow: docs/guides/compass-tdd-workflow.md

Project Structure

compass/
├── docs/                      # All documentation
│   ├── architecture/          # System architecture documents
│   ├── product/               # Product strategy and requirements
│   ├── guides/                # Build guides and workflows
│   ├── reference/             # Quick references and indexes
│   └── research/              # Research papers (PDFs)
│
├── src/                       # Source code (in development)
│   ├── compass/               # Main Python package
│   │   ├── core/              # OODA loop, scientific framework
│   │   ├── agents/            # Agent implementations
│   │   ├── cli/               # CLI interface
│   │   ├── api/               # API server
│   │   └── integrations/      # MCP integrations
│   └── tests/                 # Test suite
│
├── planning/                  # Planning conversations
│   ├── conversations/         # Original HTML chats
│   └── transcripts/           # Extracted text transcripts
│
├── examples/                  # Example configurations and templates
│   ├── configurations/        # Sample YAML configs
│   └── templates/             # Agent templates
│
├── deployment/                # Deployment configurations
│   ├── k8s/                   # Kubernetes manifests
│   └── docker/                # Docker files
│
└── scripts/                   # Utility scripts

Core Concepts

What is COMPASS?

COMPASS uses AI agents organized according to Incident Command System (ICS) principles to investigate incidents using parallel OODA loops and scientific methodology.

Key Differentiators:

  • Parallel OODA Loops: 5+ agents test hypotheses simultaneously
  • Scientific Rigor: Systematic hypothesis disproof before human escalation
  • Learning Culture: Learning Teams methodology vs traditional RCA
  • Human-in-the-Loop: Level 1 autonomy - AI proposes, humans decide
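Systematic disproof can be read as: try to refute each hypothesis, discard any that fail, and escalate only the survivors that clear a confidence bar. A minimal sketch under stated assumptions: each disproof test is a callable returning True when evidence refutes the hypothesis, and the 0.7 threshold, Hypothesis, and survivors are illustrative choices rather than COMPASS's actual scoring:

```python
from dataclasses import dataclass, field
from typing import Callable

# A disproof test returns True when the evidence REFUTES the hypothesis.
DisproofTest = Callable[[], bool]


@dataclass
class Hypothesis:
    statement: str
    confidence: float
    disproof_tests: list[DisproofTest] = field(default_factory=list)


def survivors(hypotheses: list[Hypothesis], min_confidence: float = 0.7) -> list[Hypothesis]:
    """Return hypotheses that withstand every disproof test and clear the bar."""
    kept = [
        h for h in hypotheses
        if not any(test() for test in h.disproof_tests) and h.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda h: h.confidence, reverse=True)


dns = Hypothesis("DNS resolution timeout", 0.92, [lambda: False])      # no refuting evidence
pool = Hypothesis("Connection pool exhaustion", 0.78, [lambda: True])  # e.g. pool metrics healthy
cpu = Hypothesis("CPU saturation", 0.55)                               # untested, low confidence
print([h.statement for h in survivors([dns, pool, cpu])])  # ['DNS resolution timeout']
```

Note that a hypothesis with no disproof tests is not refuted by default; in this sketch only the confidence threshold keeps it from reaching a human.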

Technology Stack

  • Language: Python only (readability over complexity)
  • Database: PostgreSQL + pgvector
  • Observability: LGTM stack (Loki, Grafana, Tempo, Mimir)
  • Deployment: Kubernetes (Tilt for local dev)
  • LLM: Provider agnostic (OpenAI, Anthropic, Copilot, Ollama)

Architecture Highlights

Agent Hierarchy (ICS-based):

Orchestrator
    ├── Database Manager → Workers
    ├── Network Manager → Workers
    ├── Application Manager → Workers
    └── Infrastructure Manager → Workers

OODA Loop Phases:

  1. Observe: Parallel data gathering
  2. Orient: Hypothesis generation and ranking
  3. Decide: Human decision points
  4. Act: Evidence gathering and hypothesis testing
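One pass through these phases can be sketched as a loop in which the Decide step is delegated to a human (Level 1 autonomy) and refuted hypotheses are excluded from later iterations. The callables, the max_iterations cap, and the ooda_loop name are all assumptions for illustration:

```python
from typing import Callable, Optional

Observe = Callable[[], list[str]]              # gather observations
Orient = Callable[[list[str]], list[str]]      # observations -> ranked hypotheses
Decide = Callable[[list[str]], Optional[str]]  # a human picks one, or None to stop
Act = Callable[[str], bool]                    # True if testing confirms the hypothesis


def ooda_loop(observe: Observe, orient: Orient, decide: Decide, act: Act,
              max_iterations: int = 3) -> Optional[str]:
    """Iterate Observe -> Orient -> Decide -> Act until a hypothesis is confirmed."""
    ruled_out: set[str] = set()
    for _ in range(max_iterations):
        observations = observe()                                              # Observe
        hypotheses = [h for h in orient(observations) if h not in ruled_out]  # Orient
        choice = decide(hypotheses)                                           # Decide (human)
        if choice is None:
            return None
        if act(choice):                                                       # Act
            return choice
        ruled_out.add(choice)  # refuted: do not propose it again
    return None


confirmed = {"connection pool exhaustion"}  # stand-in for real evidence
result = ooda_loop(
    observe=lambda: ["p99 latency up", "db connections at max"],
    orient=lambda obs: ["missing index", "connection pool exhaustion"],
    decide=lambda hyps: hyps[0] if hyps else None,  # stand-in for a human choice
    act=lambda h: h in confirmed,
)
print(result)  # connection pool exhaustion
```

The first iteration tests "missing index", which is refuted; the second iteration no longer offers it, so the human's choice falls to the pool hypothesis, which is confirmed.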

Documentation Map

Essential Reading (Start Here)

  1. Product Overview

  2. Architecture

  3. Build Guides

Quick References

Specialized Topics

Scientific Framework:

Enterprise Features:

Human-AI Interface:

Research Papers (in docs/research/):

  • ICS-Based Multi-Agent AI Systems for Incident Investigation
  • Evaluation of Learning Teams vs Root Cause Analysis
  • Problems with Root Cause Analysis

Development Status

✅ Completed

  • Product vision and requirements
  • Complete architecture design
  • Scientific framework specification
  • Multi-agent coordination design
  • Enterprise knowledge integration design
  • CLI interface design
  • Prototype code (scientific framework, database agent)
  • Comprehensive documentation
  • Test framework design

🏗️ In Progress

  • MVP implementation (not started)

📋 Roadmap

Phase 1: Foundation (Weeks 1-2)

  • Basic LGTM integration
  • Single agent (database)
  • CLI interface
  • Cost tracking

Phase 2: Trust (Weeks 3-4)

  • Hypothesis confidence scoring
  • Evidence linking
  • Graceful failure handling

Phase 3: Value (Weeks 5-6)

  • Pattern learning
  • Personal runbooks
  • Metrics tracking

Finding Information

Search Planning Conversations

All planning conversations are indexed and searchable:

# Search the conversation index
grep -i "topic_name" docs/reference/COMPASS_CONVERSATIONS_INDEX.md

# Example: Find information about cost management
grep -i "cost" docs/reference/COMPASS_CONVERSATIONS_INDEX.md

See docs/reference/INDEXING_SYSTEM_SUMMARY.md for detailed usage.

Documentation by Topic

  • Getting Started: docs/guides/
  • Architecture Details: docs/architecture/
  • Product Strategy: docs/product/
  • Research Background: docs/research/
  • Planning History: planning/

Key Design Principles

From docs/guides/claude.md:

  1. Production-First: Every component production-ready from inception
  2. Test-Driven Development: Apply TDD rigorously from day 1
  3. OODA Loop Focus: Optimize for iteration speed over perfect analysis
  4. Scientific Method: Systematically disprove hypotheses before presenting
  5. Human Authority: Humans decide, AI advises and accelerates
  6. Cost Management: Token budget caps, transparent pricing
  7. Learning Culture: Focus on contributing causes, not blame
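A budget cap can be enforced by checking each charge before it is recorded, so a call that would overshoot is refused rather than reported afterwards. A minimal sketch; BudgetTracker and BudgetExceeded are hypothetical names, and costs are plain dollar amounts rather than token counts:

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the cap."""


class BudgetTracker:
    """Track spend per agent and refuse any charge that would exceed the cap."""

    def __init__(self, cap: Decimal) -> None:
        if cap <= 0:
            raise ValueError("cap must be positive")
        self.cap = cap
        self.spent: dict[str, Decimal] = {}

    @property
    def total(self) -> Decimal:
        return sum(self.spent.values(), Decimal("0"))

    @property
    def remaining(self) -> Decimal:
        return self.cap - self.total

    def charge(self, agent: str, cost: Decimal) -> None:
        """Record a cost, checking the cap before anything is recorded."""
        if cost > self.remaining:
            raise BudgetExceeded(
                f"{agent}: ${cost} would exceed cap ${self.cap} (spent ${self.total})"
            )
        self.spent[agent] = self.spent.get(agent, Decimal("0")) + cost


tracker = BudgetTracker(Decimal("10.00"))
tracker.charge("application", Decimal("6.00"))
try:
    tracker.charge("database", Decimal("5.00"))
except BudgetExceeded as err:
    print(err)  # database: $5.00 would exceed cap $10.00 (spent $6.00)
```

Keeping per-agent totals in one place also yields the per-agent cost breakdown for free.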

Contributing

See development guides:


License

[To be determined]


Contact

[To be added]


Ready to build! See docs/guides/COMPASS_MVP_Build_Guide.md to get started.
