Infrastructure observability that complements Langfuse for complete AI system visibility. While Langfuse tracks LLM interactions, this stack monitors the distributed systems, service mesh, and infrastructure beneath.
| Layer | Langfuse Provides | This Stack Provides |
|---|---|---|
| LLM | Prompts, completions, token usage | - |
| Application | LLM call traces, evaluations | Service dependencies, distributed traces |
| Infrastructure | - | Container metrics, network I/O, resource usage |
| Data | - | GraphRAG performance, cache efficiency |
| Automation | - | Backup health, git hook triggers |
Key Integration: Both systems share trace IDs via OTLP, enabling end-to-end debugging from an LLM call down to the infrastructure that served it.
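Concretely, the shared ID is the 32-hex-digit trace ID carried in the W3C `traceparent` header. A minimal sketch, using a hypothetical header value (the Tempo lookup at the end requires the stack to be running):

```shell
# Hypothetical traceparent header as propagated by an OTLP-instrumented MCP server
TRACEPARENT="00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"

# Field 2 is the trace ID; Langfuse and Tempo both index on this value
TRACE_ID=$(echo "$TRACEPARENT" | cut -d- -f2)
echo "$TRACE_ID"
# → 0af7651916cd43dd8448eb211c80319c

# Fetch the matching infrastructure-side trace from Tempo (stack must be up):
# curl -s "http://tempo.local/api/traces/${TRACE_ID}" | jq .
```

Record this ID as metadata on the Langfuse trace, and either system can be used as the entry point for debugging.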
```shell
# 1. Start the stack
docker compose -f docker-compose.grafana.yml up -d

# 2. Verify health
curl -s 'http://prometheus.local:9090/api/v1/query?query=up' | jq '.data.result[].metric.job'

# 3. Open Grafana
open http://grafana.local  # Login: admin/admin
```

- Service Dependency Mapping - See what calls what in your AI architecture
- Distributed Transaction Tracing - Follow requests across MCP servers, databases, and services
- Memory Loop Detection - GraphRAG-specific patterns not visible in LLM traces
- Infrastructure Correlation - Link slow AI responses to resource constraints
- Automated Backup System - Git-driven configuration management with health monitoring
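As a sketch of the infrastructure-correlation idea, two PromQL queries you might graph side by side: `mcp_tool_duration_seconds_bucket` is an assumed metric name for this stack, while `container_cpu_cfs_throttled_seconds_total` is the standard cAdvisor series.

```promql
# p95 MCP tool latency (assumed OTLP metric, collected via Alloy):
histogram_quantile(0.95, sum by (le) (rate(mcp_tool_duration_seconds_bucket[5m])))

# CPU throttling on the same containers (cAdvisor; name filter is illustrative):
rate(container_cpu_cfs_throttled_seconds_total{name=~"mcp.*"}[5m])
```

When latency spikes line up with throttling, the slowdown is a resource constraint rather than the model.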
- Quick Start Guide - 5-minute setup with Langfuse integration
- Trace Correlation - Link Langfuse and Tempo traces
- MCP Instrumentation - OpenTelemetry for MCP servers
- Operations Guide - Visual patterns and troubleshooting
- Integration Examples - Real-world scenarios
- Learn More - External resources and documentation
| Service | URL | Purpose |
|---|---|---|
| Grafana | http://grafana.local | Visualization dashboard |
| Prometheus | http://prometheus.local | Metrics storage |
| Tempo | http://tempo.local | Distributed tracing |
| Loki | http://loki.local | Log aggregation |
| Alloy | http://alloy.local | OTLP collector |
```shell
# Check that all exporters are running; `jq -e` exits non-zero when nothing
# matches, so the fallback message prints only when no job is down
curl -s 'http://prometheus.local:9090/api/v1/query?query=up' | \
  jq -e '.data.result[] | select(.value[1]=="0") | .metric.job' || \
  echo "✅ All exporters up"
```
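To make that filter concrete, here is a hypothetical sample of a `/api/v1/query?query=up` response with the same jq filter applied; only `loki` is down in the sample:

```shell
# Hypothetical Prometheus response: tempo is up ("1"), loki is down ("0")
RESP='{"status":"success","data":{"resultType":"vector","result":[
  {"metric":{"job":"tempo","instance":"tempo:3200"},"value":[1700000000,"1"]},
  {"metric":{"job":"loki","instance":"loki:3100"},"value":[1700000000,"0"]}]}}'

# Same filter as the health check: print only jobs whose `up` value is "0"
echo "$RESP" | jq -r '.data.result[] | select(.value[1]=="0") | .metric.job'
# → loki
```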
```shell
# Check current AI operations load
curl -s 'http://prometheus.local:9090/api/v1/query?query=rate(mcp_tool_invocations_total[1m])' | \
  jq -r '.data.result[0].value[1]' | \
  xargs printf "Tool calls/min: %.0f\n"
```

- Docker with Docker Compose
- OrbStack (for automatic *.local domains)
- 4GB RAM minimum, 8GB recommended
- Metrics Retention: 90 days (Prometheus)
- Trace Retention: 30 days (Tempo)
- Log Retention: 3 days (Loki)
- Scrape Intervals: 15-30 seconds
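These retention windows map to per-service settings. A hedged sketch of where each one might live, assuming a `docker-compose.grafana.yml` layout like the one above (exact flags and config keys vary by component version):

```yaml
services:
  prometheus:
    command:
      - --storage.tsdb.retention.time=90d   # 90-day metrics retention
  # Tempo's 30-day trace retention is set in its own config file, e.g.:
  #   compactor: { compaction: { block_retention: 720h } }
  # Loki's 3-day log retention likewise (requires the compactor with
  # retention enabled), e.g.:
  #   limits_config: { retention_period: 72h }
```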
Private project - All rights reserved