Claritty is an open-source, cloud-native AI Site Reliability Engineering platform for Kubernetes clusters. It combines real-time cluster telemetry with a 6-stage AI agent pipeline to automatically detect, diagnose, and remediate incidents, reducing MTTR from hours to minutes.
You can install and deploy Claritty without building from source. Choose the mode that fits your needs: the clarctl CLI, or the in-cluster Agent with its central Hub.
CLI mode (clarctl)

Download the pre-compiled binary to your local machine to diagnose your clusters instantly:

```bash
# 1. Download and install the latest binary (Linux/macOS)
curl -sL https://raw.githubusercontent.com/Vaishnav88sk/claritty/clarctl-go/clarctl-go/install.sh | bash

# 2. Show help
clarctl -h

# 3. Run a scan!
clarctl scan
```

Agent + Hub mode

Deploy the centralized Hub dashboard, then install the agent in each cluster for continuous monitoring. For detailed steps, see INSTALLATION.md.
Start the Hub Server (Dashboard)

```bash
# Run the Hub via Docker Compose
export DATABASE_URL="postgresql://user:pass@host:5432/claritty?sslmode=require"
curl -sL https://raw.githubusercontent.com/Vaishnav88sk/claritty/master/sre-agent/docker-compose.yml -o docker-compose.yml
docker-compose up -d

# View the dashboard at http://localhost:8822
```

Deploy the Agent to your Clusters
```bash
# Apply agent manifests
kubectl apply -f https://raw.githubusercontent.com/Vaishnav88sk/claritty/master/sre-agent/deploy/agent-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/Vaishnav88sk/claritty/master/sre-agent/deploy/agent-configmap.yaml
kubectl apply -f https://raw.githubusercontent.com/Vaishnav88sk/claritty/master/sre-agent/deploy/agent-deployment.yaml
```

Remember to update the ConfigMap with your specific Hub IP and cluster name!
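Each agent needs to know its own cluster name and where the Hub lives. Assuming the ConfigMap is surfaced to the agent as the same environment variables used when running the agent from source (CLARITTY_CLUSTER_NAME and CLARITTY_HUB_URL), a minimal Go sketch of loading and validating them at startup could look like this. AgentConfig and loadAgentConfig are hypothetical names, not Claritty's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// AgentConfig is a hypothetical holder for the two per-cluster settings.
// The env var names match those used when running the agent from source.
type AgentConfig struct {
	ClusterName string
	HubURL      string
}

// loadAgentConfig reads settings through a lookup function (os.LookupEnv in
// production) so it can be exercised without touching the real environment.
func loadAgentConfig(lookup func(string) (string, bool)) (AgentConfig, error) {
	name, ok := lookup("CLARITTY_CLUSTER_NAME")
	if !ok || name == "" {
		return AgentConfig{}, errors.New("CLARITTY_CLUSTER_NAME is not set")
	}
	hub, ok := lookup("CLARITTY_HUB_URL")
	if !ok || hub == "" {
		return AgentConfig{}, errors.New("CLARITTY_HUB_URL is not set")
	}
	// Reject values like "localhost:8822" that lack an http(s) scheme or host.
	u, err := url.Parse(hub)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return AgentConfig{}, fmt.Errorf("CLARITTY_HUB_URL %q is not an http(s) URL", hub)
	}
	return AgentConfig{ClusterName: name, HubURL: hub}, nil
}

func main() {
	cfg, err := loadAgentConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("agent for cluster %q reporting to %s\n", cfg.ClusterName, cfg.HubURL)
}
```

Failing fast on a missing or schemeless Hub URL surfaces a misconfigured ConfigMap as a crash-looping agent pod rather than as silently dropped reports.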
Claritty offers two ways to work with your Kubernetes infrastructure, depending on your needs:
- clarctl (CLI): A command-line interface that runs on your local machine. It connects to your current Kubernetes context to analyze namespaces or specific pods, generate a Root Cause Analysis (RCA), and offer interactive, step-by-step remediation commands. Suited to on-call engineers debugging live incidents.
- Agent + Hub: A lightweight in-cluster daemon (the Agent) that continuously monitors your infrastructure. It runs the 6-stage AI pipeline on failing resources and pushes structured incident reports to a central Hub server. The Hub provides a web dashboard with a multi-cluster overview, Slack alerts, and detailed RCA records. Suited to production monitoring.
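Claritty's 6-stage pipeline hands a failing resource from agent to agent in a fixed order (Triage -> Metrics -> Logs -> Infra -> Runbook -> Commander). As a rough illustration of that hand-off, assuming each stage reads and extends a shared investigation record, here is a minimal Go sketch. The Investigation and Stage types are hypothetical, and the stages are stubs: in Claritty each stage is an LLM-backed agent.

```go
package main

import (
	"fmt"
	"strings"
)

// Investigation is a hypothetical shared context passed through every stage.
type Investigation struct {
	Resource string
	Findings []string // notes appended by each stage, in execution order
}

// Stage is one agent in the pipeline.
type Stage struct {
	Name string
	Run  func(inv *Investigation) error
}

// stub builds a placeholder stage that only records that it ran.
func stub(name string) Stage {
	return Stage{Name: name, Run: func(inv *Investigation) error {
		inv.Findings = append(inv.Findings, name+": analysed "+inv.Resource)
		return nil
	}}
}

// runPipeline executes stages strictly in order and stops at the first error,
// so a later stage (e.g. Commander) sees every earlier stage's findings.
func runPipeline(inv *Investigation, stages []Stage) error {
	for _, s := range stages {
		if err := s.Run(inv); err != nil {
			return fmt.Errorf("stage %s: %w", s.Name, err)
		}
	}
	return nil
}

func main() {
	stages := []Stage{stub("Triage"), stub("Metrics"), stub("Logs"),
		stub("Infra"), stub("Runbook"), stub("Commander")}
	inv := &Investigation{Resource: "prod/payment-service-84f9b8c-x2z9"}
	if err := runPipeline(inv, stages); err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println(strings.Join(inv.Findings, "\n"))
}
```

Keeping the stages sequential means the final Commander stage can base its remediation plan on everything gathered before it.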
- Node-level & Pod-level Metrics: Real-time CPU, memory, and resource usage collection.
- Auto Incident Detection: Detects complex cascading failures, API server throttling, DNS resolution timeouts, split-brain StatefulSets, and network partition deadlocks, alongside standard CrashLoopBackOff, OOMKilled, and Pending states.
- 6-Stage AI Agent Pipeline: Triage -> Metrics -> Logs -> Infra -> Runbook -> Commander agents collaboratively diagnose root causes.
- Interactive Auto-Remediation (CLI): Proposes step-by-step kubectl fixes locally and prompts y / dry / n before executing anything.
- Centralized Dashboard (Agent): Web UI to view multi-cluster health, active incidents, and automated remediation plans.
- Safety First: All remediation commands are validated against a strict allowlist, and destructive commands are flagged.
- Built-in Runbooks: Battle-tested YAML runbooks for common failure modes, embedded directly in the code.
- Incident History: Database-backed incident logging with MTTR tracking and status lifecycle.
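To make the allowlist idea concrete, here is a minimal Go sketch of validating a proposed remediation command. The verb lists are hypothetical (Claritty's real allowlist is not shown in this README), and the check also rejects shell metacharacters so that a proposed command cannot chain a second, unchecked one.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical policy: only these kubectl verbs may be proposed at all.
var allowedVerbs = map[string]bool{
	"get": true, "describe": true, "logs": true, "top": true,
	"rollout": true, "scale": true, "set": true, "delete": true,
}

// Hypothetical subset of allowed verbs that must be flagged as destructive.
var destructiveVerbs = map[string]bool{"delete": true, "scale": true}

// Verdict is the result of checking one proposed command.
type Verdict struct {
	Allowed     bool
	Destructive bool
	Reason      string
}

// validate checks a proposed command string against the policy above.
func validate(cmd string) Verdict {
	// Block chaining, piping, substitution and redirection.
	if strings.ContainsAny(cmd, ";|&`$><") {
		return Verdict{Reason: "shell metacharacters are not allowed"}
	}
	fields := strings.Fields(cmd)
	if len(fields) < 2 || fields[0] != "kubectl" {
		return Verdict{Reason: "only kubectl commands are allowed"}
	}
	verb := fields[1]
	if !allowedVerbs[verb] {
		return Verdict{Reason: fmt.Sprintf("verb %q is not on the allowlist", verb)}
	}
	return Verdict{Allowed: true, Destructive: destructiveVerbs[verb]}
}

func main() {
	for _, c := range []string{
		"kubectl get svc redis -n prod",
		"kubectl set resources deployment invoice-generator -n billing --limits=memory=1Gi",
		"kubectl delete pod payment-service-84f9b8c-x2z9 -n prod",
		"kubectl exec -it redis-0 -- sh",
		"kubectl get pods; rm -rf /",
	} {
		fmt.Printf("%-60s %+v\n", c, validate(c))
	}
}
```

Checking the verb rather than pattern-matching the whole string keeps the policy small. A fuller check would also need to handle flags placed before the verb (for example kubectl -n prod get pods), which this sketch conservatively rejects.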
CLI mode: runs locally on the engineer's machine.

```
Developer Terminal -> clarctl -> Kubeconfig -> K8s API -> AI Pipeline -> Terminal Output
```
Agent + Hub mode:

```
Cluster A (prod) ──► claritty-agent ─┐
Cluster B (dev)  ──► claritty-agent ─┼──► Hub Server (port 8822) ──► Web Dashboard + Slack Alerts
Cluster C (qa)   ──► claritty-agent ─┘               │
                                            PostgreSQL Database
```
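In Agent mode, each claritty-agent pushes incident reports over HTTP to the Hub on port 8822. The Go sketch below models that hop under stated assumptions: the struct fields follow the example JSON report in this README, while the /api/incidents path, the 10-second timeout, and the in-process fakeHub test double are hypothetical rather than the Hub's documented API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// RemediationStep and IncidentReport mirror the fields of the example report.
type RemediationStep struct {
	StepNumber    int    `json:"step_number"`
	Description   string `json:"description"`
	Command       string `json:"command"`
	IsDestructive bool   `json:"is_destructive"`
}

type IncidentReport struct {
	Cluster         string            `json:"cluster"`
	Namespace       string            `json:"namespace"`
	Severity        string            `json:"severity"`
	Title           string            `json:"title"`
	RootCause       string            `json:"root_cause"`
	RemediationPlan []RemediationStep `json:"remediation_plan"`
}

// pushReport POSTs one report as JSON and treats any non-2xx status as an error.
// The "/api/incidents" path is a hypothetical endpoint.
func pushReport(hubURL string, r IncidentReport) error {
	body, err := json.Marshal(r)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(hubURL+"/api/incidents", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("hub returned %s", resp.Status)
	}
	return nil
}

// fakeHub is an in-process stand-in for the Hub that stores decoded reports.
type fakeHub struct {
	mu      sync.Mutex
	reports []IncidentReport
}

func (h *fakeHub) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	if req.Method != http.MethodPost || req.URL.Path != "/api/incidents" {
		http.NotFound(w, req)
		return
	}
	var r IncidentReport
	if err := json.NewDecoder(req.Body).Decode(&r); err != nil {
		http.Error(w, "bad json", http.StatusBadRequest)
		return
	}
	h.mu.Lock()
	h.reports = append(h.reports, r)
	h.mu.Unlock()
	w.WriteHeader(http.StatusCreated)
}

// snapshot returns a copy of the stored reports under the lock.
func (h *fakeHub) snapshot() []IncidentReport {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]IncidentReport(nil), h.reports...)
}

// startFakeHub serves a fakeHub on a random local port.
func startFakeHub() (*fakeHub, string, func()) {
	hub := &fakeHub{}
	srv := httptest.NewServer(hub)
	return hub, srv.URL, srv.Close
}

func main() {
	hub, hubURL, stop := startFakeHub()
	defer stop()
	err := pushReport(hubURL, IncidentReport{Cluster: "prod-us-east", Namespace: "billing",
		Severity: "SEV2", Title: "OOMKilled Event on Invoice Generator"})
	fmt.Println("push error:", err, "| reports stored:", len(hub.snapshot()))
}
```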
If you want to contribute, modify the code, or build from source:

```bash
# Clone the repository
git clone https://github.com/Vaishnav88sk/claritty.git
cd claritty

# Build the CLI
cd clarctl-go
go mod tidy
go build -o clarctl .

# Run the Hub from source
cd ../sre-agent/hub
export DATABASE_URL="postgresql://user:pass@host:5432/claritty"
go run .

# Run the Agent from source locally (in a second terminal, starting from sre-agent/hub)
cd ../agent
export CLARITTY_CLUSTER_NAME="local-dev"
export CLARITTY_HUB_URL="http://localhost:8822"
export GROQ_API_KEY="your_key_here"
go run .
```

Example output from running clarctl scan namespace prod when a pod is crash-looping:
```text
[Claritty] Scanning namespace 'prod'...
[!] Detected issue: payment-service-84f9b8c-x2z9 (CrashLoopBackOff)
[AI Pipeline] Triage -> Logs -> Metrics -> Infra -> Commander...

ROOT CAUSE (SEV 1 - 95% Confidence):
The payment-service pod is failing to start because it cannot connect to the Redis cache at 'redis.prod.svc.cluster.local:6379'. Connection refused.

PROPOSED REMEDIATION:
Step 1: Check if the Redis service is running.
Command: kubectl get svc redis -n prod
Execute? [y/N/dry]: y
...
```
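Every proposed step is gated by the Execute? [y/N/dry] prompt. A minimal Go sketch of parsing that answer follows, assuming an empty or unrecognised reply means "no" (as the capital N suggests) and that "dry" maps to kubectl's --dry-run=client flag, which mutating commands such as kubectl set resources accept. Choice, parseChoice, and dryRunArgs are hypothetical names, and the sketch only prints the command instead of executing it.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Choice is the operator's answer to an "Execute? [y/N/dry]" prompt.
type Choice int

const (
	Skip   Choice = iota // default: empty input or anything unrecognised
	Run                  // explicit "y" or "yes"
	DryRun               // "dry"
)

// parseChoice maps a typed answer to a Choice, ignoring case and whitespace.
func parseChoice(input string) Choice {
	switch strings.ToLower(strings.TrimSpace(input)) {
	case "y", "yes":
		return Run
	case "dry":
		return DryRun
	default:
		return Skip
	}
}

// dryRunArgs returns a copy of args with kubectl's client-side dry-run flag appended.
func dryRunArgs(args []string) []string {
	return append(append([]string(nil), args...), "--dry-run=client")
}

func main() {
	cmd := []string{"kubectl", "set", "resources", "deployment", "invoice-generator",
		"-n", "billing", "--limits=memory=1Gi"}
	fmt.Printf("Command: %s\nExecute? [y/N/dry]: ", strings.Join(cmd, " "))
	line, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	switch parseChoice(line) {
	case Run:
		fmt.Println("would run:", strings.Join(cmd, " "))
	case DryRun:
		fmt.Println("would run:", strings.Join(dryRunArgs(cmd), " "))
	default:
		fmt.Println("skipped")
	}
}
```

Defaulting to Skip means a stray Enter key never changes the cluster.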
When the sre-agent runs in a cluster, it pushes a structured JSON report like this to the Hub:

```json
{
  "cluster": "prod-us-east",
  "namespace": "billing",
  "severity": "SEV2",
  "title": "OOMKilled Event on Invoice Generator",
  "root_cause": "Container 'worker' exceeded its memory limit of 512Mi. Last usage spike reached 512.4Mi during a large PDF generation task.",
  "remediation_plan": [
    {
      "step_number": 1,
      "description": "Increase memory limits for the invoice deployment.",
      "command": "kubectl set resources deployment invoice-generator -n billing --limits=memory=1Gi",
      "is_destructive": false
    }
  ]
}
```

Claritty's pipeline is built to handle a wide range of Kubernetes failure states:
- Pod Lifecycle Failures: CrashLoopBackOff, ImagePullBackOff, CreateContainerConfigError.
- Resource Starvation: OOMKilled, CPU Throttling, Node Disk Pressure.
- Network Issues: Service resolution failures, DNS timeouts, missing endpoints.
- Storage Issues: Unbound PersistentVolumeClaims, mounting failures.
- RBAC & Security: Unauthorized API calls, missing service account permissions.
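Several of these failure families correspond to status values the Kubernetes API already exposes. As a minimal illustration, assuming detection starts from raw status fields, here is a Go sketch that maps a few of them to the categories above. The Signal type and rules table are hypothetical and not Claritty's detection logic; network and RBAC issues are left out because they do not map to a single status field.

```go
package main

import "fmt"

// Category is a failure family from the list above (illustrative subset).
type Category string

const (
	PodLifecycle       Category = "Pod Lifecycle Failure"
	ResourceStarvation Category = "Resource Starvation"
	Storage            Category = "Storage Issue"
	Unclassified       Category = "Unclassified"
)

// Signal is one observed status value and where it was read from:
//   "container"     -> containerStatuses[].state.waiting.reason or .terminated.reason
//   "nodeCondition" -> a node condition type whose status is "True"
//   "pvcPhase"      -> a PersistentVolumeClaim's status.phase
type Signal struct {
	Source string
	Value  string
}

// rules is a hypothetical lookup table from signal to category.
var rules = map[Signal]Category{
	{"container", "CrashLoopBackOff"}:           PodLifecycle,
	{"container", "ImagePullBackOff"}:           PodLifecycle,
	{"container", "ErrImagePull"}:               PodLifecycle,
	{"container", "CreateContainerConfigError"}: PodLifecycle,
	{"container", "OOMKilled"}:                  ResourceStarvation,
	{"nodeCondition", "DiskPressure"}:           ResourceStarvation,
	{"pvcPhase", "Pending"}:                     Storage, // claim not yet bound to a volume
}

// classify returns the category for a signal, or Unclassified if no rule matches.
func classify(s Signal) Category {
	if c, ok := rules[s]; ok {
		return c
	}
	return Unclassified
}

func main() {
	for _, s := range []Signal{
		{"container", "CrashLoopBackOff"},
		{"container", "OOMKilled"},
		{"pvcPhase", "Pending"},
		{"container", "Completed"},
	} {
		fmt.Printf("%s=%s -> %s\n", s.Source, s.Value, classify(s))
	}
}
```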
| Feature | Claritty | OpenSRE | Datadog / New Relic | Prometheus/Thanos | Robusta |
|---|---|---|---|---|---|
| In-cluster agent | β Deployment 1 replica | β Sidekick framework | β | β | β |
| AI-powered RCA | β 6-stage LLM pipeline | β Episodic Memory LLM | β (Mostly manual) | β | Partial |
| Multi-cluster hub | β Open Source Hub | β Slack/API focused | β SaaS | β Thanos | β SaaS |
| Self-hosted | β | β | β SaaS only | β | Partial |
| Cost | Free / Open Source | Free / Open Source | $$$$ | Free | Free/Paid |
- CLI for local cluster diagnosis.
- Multi-agent collaborative LLM pipeline.
- Agent deployment for continuous in-cluster monitoring.
- Hub server & Web UI for multi-cluster overview.
- PostgreSQL persistence & Slack integration.
- Next: complete K8s observability (custom metrics, distributed tracing integration, eBPF network flows).
Claritty is actively maintained and built for modern SRE teams. Contributions and feedback are welcome!


