95 changes: 95 additions & 0 deletions src/pages/guide/observability/builder-observability.mdx
@@ -0,0 +1,95 @@
---
title: Builder Observability
description: Internal tooling and data sources for investigating Tempo block building and validation performance, outliers, and execution breakdowns.
---

# Builder Observability

Tooling and workflows for investigating Tempo block building and validation performance. The goal is to quickly identify outliers (slow builds, timed-out proposals, slow validations) and drill into per-node execution breakdowns.

## Workflow

The investigation loop:

1. **Spot an outlier** — monitor [ValScope](#valscope) or Grafana dashboards for slow proposals, builds, or validations
2. **Inspect the network view** — open the block/view in ValScope to see per-validator timelines across the network
3. **Drill into execution** — jump to [BlockScope](#blockscope) for a detailed per-node execution breakdown (traces, spans, timeline)

## Tools

### ValScope

Real-time validator monitoring dashboard. Ingests consensus and execution logs from all validators, correlates events into per-block timelines, and serves a live web UI.

- **Repo:** [tempoxyz/valscope](https://github.com/tempoxyz/valscope)
- **Testnet:** `dev-joshie:3004` (Tailscale)
- **Mainnet:** `dev-joshie:3005` (Tailscale)

**What it shows:**
- Live block and view tables with validator health stats
- Per-block swim-lane timelines showing events across all validators
- Consensus analytics — gas vs quorum scatter, quorum latency, receive delay heatmap
- Execution analytics — gas vs build time, build time dumbbell, persistence metrics
- Nullified (failed) consensus views

**Key pages:**
| Page | Route | Description |
|---|---|---|
| Overview | `/` | Live block + view tables, validator health |
| Consensus | `/consensus` | Quorum latency, receive delays |
| Execution | `/execution` | Build times, persistence metrics |
| Block Detail | `/blocks/:height` | Full event timeline for a committed block |
| View Detail | `/epoch/:epoch/views/:view` | Full event timeline for a consensus view |
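The routes above can be turned into shareable links with a small helper. This is a minimal sketch, not part of ValScope: the function names are hypothetical, and the `http` scheme is an assumption (the guide lists only host and port).

```typescript
// Hypothetical helper: only the hosts and route patterns come from this guide.
type Network = "testnet" | "mainnet";

// Hosts as listed under ValScope; reachable only over Tailscale.
const VALSCOPE_HOSTS: Record<Network, string> = {
  testnet: "dev-joshie:3004",
  mainnet: "dev-joshie:3005",
};

// Assumption: plain HTTP. Change this if the deployment uses another scheme.
const SCHEME = "http";

// Block Detail page: /blocks/:height
function valscopeBlockUrl(network: Network, height: number): string {
  return `${SCHEME}://${VALSCOPE_HOSTS[network]}/blocks/${height}`;
}

// View Detail page: /epoch/:epoch/views/:view
function valscopeViewUrl(network: Network, epoch: number, view: number): string {
  return `${SCHEME}://${VALSCOPE_HOSTS[network]}/epoch/${epoch}/views/${view}`;
}
```

For example, `valscopeBlockUrl("testnet", 123)` yields `http://dev-joshie:3004/blocks/123`, which can be pasted into an incident thread when flagging an outlier.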

**Validator configs:**
- [Testnet validators](https://github.com/tempoxyz/valscope/blob/main/apps/api/validators.toml)
- [Mainnet validators](https://github.com/tempoxyz/valscope/blob/main/apps/api/validators-mainnet.toml)

### BlockScope

Execution-level dashboard for comparing block processing across clients. Shows per-block trace breakdowns, execution timelines, and mempool overlap analysis.

- **Repo:** [tempoxyz/blockscope](https://github.com/tempoxyz/blockscope)
- **Current deploy:** `dev-alexey:5173` (Tailscale, port-forwarded — being migrated)

**What it shows:**
- Block-by-block comparison across execution clients (reth, nethermind, ethrex)
- Per-block execution trace timeline (state root, sub-blocks, EVM execution)
- Mempool overlap analysis — how much of each block was in the local txpool
- Per-builder block history with overlap stats

**Key pages:**
| Page | Route | Description |
|---|---|---|
| Overview | `/` | Block comparison table across clients |
| Block Detail | `/blocks/:height` | Execution breakdown with trace timeline |
| Mempool | `/mempool` | Gas usage vs overlap scatter plot |
| Builder Detail | `/builder/:name` | Per-builder block history |
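To make the mempool overlap figure concrete, here is a minimal sketch of one way to compute it. It assumes overlap is the share of a block's transactions (by hash) that were already in the local txpool; BlockScope's actual definition may differ, for instance by weighting by gas. The function name is hypothetical.

```typescript
// Fraction of a block's transactions that were already in the local txpool.
// Assumption: counted per transaction hash, not weighted by gas.
function mempoolOverlap(blockTxHashes: string[], txpoolHashes: Set<string>): number {
  // An empty block has nothing to overlap; define its overlap as 0.
  if (blockTxHashes.length === 0) return 0;
  const seen = blockTxHashes.filter((hash) => txpoolHashes.has(hash)).length;
  return seen / blockTxHashes.length;
}
```

A block with four transactions, two of which were in the txpool, has an overlap of 0.5 under this definition.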

## Data Sources

All endpoints are internal Tailscale hostnames; you must be on the Tempo tailnet to reach them.

| Service | Env Var | What it does | Testnet | Mainnet |
|---|---|---|---|---|
| External VLogs | `VLOGS_URL` | Logs from partner/external validators (VictoriaLogs) | `dev-euw-vl-partners.tail388b2e.ts.net` | _(none)_ |
| Internal VLogs | `VLOGS_INTERNAL_URL` | Logs from Tempo's own nodes — structured reth output during build/validation (VictoriaLogs) | `stg-nae-vl-internal.tail388b2e.ts.net` | `prd-nae-vl-internal.tail388b2e.ts.net` |
| VM External | `VM_EXTERNAL_URL` | Prometheus-style metrics (block times, gas, peers) from partner nodes (VictoriaMetrics) | `dev-euw-vm-partners.tail388b2e.ts.net` | same (namespace-filtered) |
| VM Internal | `VM_INTERNAL_URL` | Prometheus-style metrics (CPU, memory, block processing) from Tempo's own nodes (VictoriaMetrics) | `stg-nae-vm-internal.tail388b2e.ts.net` | `prd-nae-vm-internal.tail388b2e.ts.net` |
| Tempo Traces | `TEMPO_URL` | Distributed traces/spans — powers execution timeline breakdowns. **Internal nodes only.** (Grafana Tempo) | `stg-nae-grafana-tempo.tail388b2e.ts.net` | `prd-nae-grafana-tempo.tail388b2e.ts.net` |
| Namespace | `NETWORK` | Cluster/namespace selector | `moderato-stable` | `tempo-mainnet-stable` |
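The table above can be captured as a typed map so that tooling can switch networks in one place. This is a sketch under assumptions: the type and function names are hypothetical, and the values are the bare hostnames from the table, since the guide does not state whether the variables expect a scheme or port.

```typescript
type NetworkName = "testnet" | "mainnet";

type DataSources = {
  VLOGS_URL?: string; // no external VLogs host is listed for mainnet
  VLOGS_INTERNAL_URL: string;
  VM_EXTERNAL_URL: string;
  VM_INTERNAL_URL: string;
  TEMPO_URL: string;
  NETWORK: string;
};

// All hosts share the tailnet suffix shown in the table.
const TAILNET = "tail388b2e.ts.net";
const host = (name: string): string => `${name}.${TAILNET}`;

const DATA_SOURCES: Record<NetworkName, DataSources> = {
  testnet: {
    VLOGS_URL: host("dev-euw-vl-partners"),
    VLOGS_INTERNAL_URL: host("stg-nae-vl-internal"),
    VM_EXTERNAL_URL: host("dev-euw-vm-partners"),
    VM_INTERNAL_URL: host("stg-nae-vm-internal"),
    TEMPO_URL: host("stg-nae-grafana-tempo"),
    NETWORK: "moderato-stable",
  },
  mainnet: {
    VLOGS_INTERNAL_URL: host("prd-nae-vl-internal"),
    // Same host as testnet; results are filtered by namespace.
    VM_EXTERNAL_URL: host("dev-euw-vm-partners"),
    VM_INTERNAL_URL: host("prd-nae-vm-internal"),
    TEMPO_URL: host("prd-nae-grafana-tempo"),
    NETWORK: "tempo-mainnet-stable",
  },
};

// Flatten one network's settings into env-style key/value pairs,
// skipping variables that have no value for that network.
function toEnv(network: NetworkName): Record<string, string> {
  const sources = DATA_SOURCES[network];
  const env: Record<string, string> = {};
  for (const key of Object.keys(sources) as (keyof DataSources)[]) {
    const value = sources[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}
```

Keeping `VLOGS_URL` optional makes the missing mainnet endpoint explicit in the type, so callers have to handle its absence.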

## Known Outlier Patterns

Issues surfaced through monitoring:

- **Execution cache mutex contention** — `Updated execution cache` blocked for 400ms+ during fork/reorg scenarios. Tracked in [RETH-498](https://linear.app/tempoxyz/issue/RETH-498)
- **Late build start** — building starts after the view has already begun, reducing available build time. See [tempo#2952](https://github.com/tempoxyz/tempo/pull/2952)
- **Persistence during building** — disk persistence overlapping with block building, observed on memory-constrained machines
- **Long-running newPayload** — inability to cancel an in-progress `newPayload` execution
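
Patterns like the 400ms+ cache stall can be surfaced with a simple threshold filter over already-parsed events. The sketch below is illustrative only: the `TimedEvent` shape and the function name are hypothetical and do not reflect ValScope's actual schema.

```typescript
// Hypothetical event shape; the real parsed-log schema may differ.
interface TimedEvent {
  validator: string;
  height: number;
  message: string; // log message, e.g. "Updated execution cache"
  durationMs: number;
}

// Return events with the given message at or above the threshold, slowest first.
function findSlowEvents(
  events: TimedEvent[],
  message: string,
  thresholdMs: number,
): TimedEvent[] {
  return events
    .filter((e) => e.message === message && e.durationMs >= thresholdMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}
```

Calling `findSlowEvents(events, "Updated execution cache", 400)` lists the candidate blocks to open in ValScope and BlockScope.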

## Limitations

- **External validators have no traces/spans** — only logs and metrics are available for partner nodes. Detailed execution breakdowns (Grafana Tempo) are internal-only.
- **New instrumentation requires a release** — adding new spans or logs to testnet/mainnet requires shipping a new Tempo version. Existing instrumentation must be used until then.
- **ValScope log parsing** is currently regex-based, which can be slow and sometimes misses events that need timestamp-based correlation.
10 changes: 10 additions & 0 deletions vocs.config.ts
@@ -551,6 +551,16 @@ export default defineConfig({
          },
        ],
      },
      {
        text: 'Observability',
        collapsed: true,
        items: [
          {
            text: 'Builder Observability',
            link: '/guide/observability/builder-observability',
          },
        ],
      },
      // {
      //   text: 'Infrastructure & Tooling',
      //   items: [