feat(benchmarking): enable tests to run in dedicated environment or in docker #3157

Open
chatton wants to merge 59 commits into main from cian/enable-tests-to-run-without-infra
Conversation


@chatton chatton commented Mar 12, 2026

Overview

This PR

  • enables the tests to run against existing infrastructure via the BENCH_* environment variables
  • switches the Jaeger Docker implementation to VictoriaTraces, matching what we deploy in real environments
  • adds a richSpan type, used to output a flowchart that captures spans and subspans

Below is a sample of runs (executed locally with Docker) that we can use to inform settings for the dedicated hardware runs.

Throughput Results (sorted by MGas/s)

  ┌─────┬───────┬────────────────────────┬──────────┬────────┬────────┬───────┬──────────┬─────────────┬──────────────┐
  │  #  │  Run  │  Block / Gas / Scrape  │ Spammers │ MGas/s │ s/Ggas │  TPS  │ blocks/s │ non-empty % │    Status     │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 1   │ run21 │ 500ms / 2G / 250ms     │        8 │  383.0 │   2.61 │  98.0 │     0.21 │        57.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 2   │ run22 │ 500ms / 3G / 250ms     │        8 │  374.0 │   2.67 │  91.0 │     0.43 │       100.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 3   │ run18 │ 250ms / 1G / 250ms     │        6 │  361.8 │   2.76 │ 106.8 │     0.44 │        58.1 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 4   │ run19 │ 250ms / 2G / 250ms     │        8 │  353.3 │   2.83 │ 101.4 │     0.54 │       100.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 5   │ run23 │ 1s / 2G / 250ms        │        8 │  350.0 │   2.85 │  87.0 │     0.21 │        68.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 6   │ run25 │ 1s / 5G / 250ms        │       10 │  337.0 │   2.97 │  72.0 │     0.16 │       100.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 7   │ run24 │ 1s / 3G / 250ms        │        8 │  332.0 │   3.01 │ 132.0 │     0.38 │        75.0 │ PASS (retry) │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 8   │ run20 │ 500ms / 1G / 250ms     │        6 │  310.1 │   3.23 │  90.0 │     0.35 │        57.4 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 9   │ run07 │ 250ms / 300M / 250ms   │        4 │  297.1 │   3.37 │  59.5 │     0.99 │        48.6 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 10  │ run11 │ 500ms / 500M / 250ms   │        4 │  288.7 │   3.46 │  57.8 │     0.58 │        49.8 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 11  │ run04 │ 100ms / 100M / 50ms    │        4 │  283.4 │   3.53 │  56.8 │     2.84 │        49.9 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 12  │ run14 │ 1s / 500M / 50ms       │        4 │  245.5 │   4.07 │  49.2 │     0.99 │       100.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 13  │ run10 │ 500ms / 300M / 250ms   │        4 │  235.8 │   4.24 │  47.3 │     0.79 │        49.8 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 14  │ run13 │ 1s / 500M / 250ms      │        4 │  226.3 │   4.42 │  45.3 │     0.45 │        49.8 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 15  │ run03 │ 100ms / 100M / 250ms   │        4 │  200.5 │   4.99 │  40.2 │     2.01 │        29.6 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 16  │ run06 │ 250ms / 100M / 250ms   │        4 │  190.6 │   5.25 │  38.2 │     1.91 │        48.8 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 17  │ run12 │ 1s / 300M / 250ms      │        4 │  189.4 │   5.28 │  37.9 │     0.63 │        62.9 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 18  │ run15 │ 2s / 500M / 250ms      │        4 │  166.4 │   6.01 │  33.3 │     0.33 │        65.8 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 19  │ run02 │ 100ms / 30M / 50ms     │        4 │  163.8 │   6.10 │  32.8 │     5.47 │        55.4 │ PASS (retry) │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 20  │ run17 │ 2s / 500M / 50ms       │        4 │  125.4 │   7.98 │  25.1 │     0.50 │       100.0 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 21  │ run09 │ 500ms / 100M / 50ms    │        4 │  117.5 │   8.51 │  23.5 │     1.18 │        58.8 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 22  │ run08 │ 500ms / 100M / 250ms   │        4 │  115.5 │   8.66 │  23.1 │     1.16 │        57.6 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 23  │ run16 │ 2s / 300M / 250ms      │        4 │   96.4 │  10.37 │  19.3 │     0.32 │        63.7 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ 24  │ run05 │ 250ms / 30M / 250ms    │        4 │   64.9 │  15.40 │  13.0 │     2.17 │        54.3 │ PASS         │
  ├─────┼───────┼────────────────────────┼──────────┼────────┼────────┼───────┼──────────┼─────────────┼──────────────┤
  │ —   │ run01 │ 100ms / 30M / 250ms    │        4 │      — │      — │     — │        — │           — │ FAILED       │
  └─────┴───────┴────────────────────────┴──────────┴────────┴────────┴───────┴──────────┴─────────────┴──────────────┘

Summary by CodeRabbit

Release Notes

  • New Features

    • Benchmarking tests now support dynamic configuration via environment variables
    • Added ASCII flowchart visualization for distributed trace analysis
    • Switched tracing backend for improved trace collection and analysis capabilities
  • Tests

    • Enhanced benchmark test suite with improved trace result collection and reporting
    • Updated benchmark tests to use configuration-driven approach for greater flexibility

…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON
  (now in benchmark/output.go)
go test sets the working directory to the package under test, so the
env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as
test binary args, so ./benchmark/ was never recognized as a package
pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/),
so the binary path needs an extra parent traversal.
The benchmark package doesn't define the --binary flag that test-e2e
passes. It has its own CI workflow so it doesn't need to run here.
…nfig

collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes.
Replace with direct header iteration over [startBlock, endBlock] and add
Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and
tx/block p50/p99.

Optimize spamoor configuration for 100ms block time:
- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each

Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
- Move startBlock capture after spammer creation to exclude warm-up
- Replace 20s drain sleep with smart poll (waitForDrain)
- Add deleteAllSpammers cleanup to handle stale spamoor DB entries
- Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
# Conflicts:
#	scripts/test.mk
# Conflicts:
#	test/e2e/benchmark/suite_test.go
# Conflicts:
#	test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv
so all benchmark tests share the same reth configuration. lower ERC20
spammer count from 5 to 2 to reduce resource contention on local
hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods
- add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead
- add collectTraces() suite method to deduplicate trace collection pattern
- add addEntries() convenience method on resultWriter
- slim TestERC20Throughput from ~217 to ~119 lines
- reuse collectTraces in TestSpamoorSmoke
Bumps tastora to pick up host network support in the spamoor builder.
Spamoor in external mode now uses host networking so it can resolve
the same hostnames as the host machine.
Record startTime when the provider is created and use it as the lower
bound for trace queries, preventing spans from previous runs being
included in the analysis.
chatton added 17 commits March 4, 2026 15:13
…run-without-infra

# Conflicts:
#	.github/workflows/benchmark.yml
#	test/e2e/benchmark/gasburner_test.go
#	test/e2e/benchmark/helpers.go
#	test/e2e/benchmark/spamoor_erc20_test.go
#	test/e2e/benchmark/suite_test.go
#	test/e2e/benchmark/traces.go
#	test/e2e/go.mod
#	test/e2e/go.sum
…metrics

- Make gas_units_to_burn, max_wallets, num_spammers, throughput, and
  warmup_txs configurable via BENCH_* env vars
- Add rethExecutionRate() for ev-reth GGas/s measurement
- Add engineSpanEntries() for ProduceBlock/GetPayload/NewPayload timing
- Switch local benchmarks from Jaeger to VictoriaTraces
- Add setupExternalEnv for running against pre-deployed infrastructure
- Update tastora to 2ee1b0a (victoriatraces support)
- benchConfig consolidates all BENCH_* env vars into a single struct
  constructed once per test via newBenchConfig(serviceName)
- benchmarkResult collects all output metrics (block summary, overhead,
  GGas/s, engine span timing, seconds_per_gigagas, span averages) and
  produces entries via a single entries() call
- Removes scattered envInt/envOrDefault calls from test files
- Removes manual entry-by-entry result assembly from each test
- Net reduction of ~129 lines across existing files
@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

📝 Walkthrough

Refactors e2e benchmark infrastructure from hard-coded configuration to environment-variable-driven setup, replaces Jaeger tracing with VictoriaTraces backend, and introduces ASCII flowchart visualization and unified benchmark result aggregation across tests.

Changes

  • Configuration & Metrics Infrastructure — test/e2e/benchmark/config.go, test/e2e/benchmark/result.go
    Introduces the environment-variable-driven benchConfig structure with sensible defaults, plus traceResult/benchmarkResult aggregation for unified metric logging and entry generation.
  • Span Visualization & Output — test/e2e/benchmark/flowchart.go, test/e2e/benchmark/output.go
    Adds ASCII flowchart rendering for trace trees with span hierarchies and aggregation; simplifies output.go to delegate span aggregation via spanAvgEntries.
  • Tracing Backend Migration — test/e2e/benchmark/traces.go, test/e2e/benchmark/suite_test.go
    Replaces Jaeger with VictoriaTraces via a new traceProvider abstraction; introduces richSpan for hierarchical representation and streaming LogsQL-based span collection with host extraction.
  • Benchmark Test Helpers — test/e2e/benchmark/helpers.go
    Adds span name constants, a requireSpammersRunning poller, rethExecutionRate computation, and a context-driven waitForMetricTarget with refactored metric-waiting logic.
  • Test Refactoring — test/e2e/benchmark/gasburner_test.go, test/e2e/benchmark/spamoor_erc20_test.go, test/e2e/benchmark/spamoor_smoke_test.go
    Converts tests from hard-coded constants to the newBenchConfig factory; unifies result reporting via benchmarkResult.entries(); updates gasburner with warmup and trace collection; refactors ERC20 to use the result flow.
  • Dependencies — test/e2e/go.mod
    Updates the tastora dependency pseudo-version from -0.20260302131806- to -0.20260312082036-.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Benchmark Test
    participant Cfg as BenchConfig
    participant Spammer as Spammers
    participant VT as VictoriaTraces
    participant Result as BenchmarkResult
    
    Test->>Cfg: newBenchConfig(serviceName)
    Cfg-->>Cfg: Load BENCH_* env vars
    Cfg-->>Test: Return config with parameters
    
    Test->>Spammer: Start spammers (NumSpammers)
    Spammer->>VT: Emit trace spans via OTLP
    Spammer-->>Test: Spammers running
    
    Test->>Test: Execute benchmark workload
    Spammer->>VT: Send transaction traces
    
    Test->>VT: collectSpans(serviceName)
    VT-->>Test: Stream spans via LogsQL
    Test->>Test: Parse spans + extract hostNames
    Test-->>Test: Build richSpan hierarchy
    
    Test->>Result: newBenchmarkResult(name, metrics, traces)
    Result->>Result: Aggregate span durations
    Result->>Result: Build ASCII flowchart
    Result->>Test: Entries with summary + visualization
    
    Test->>Test: Log results & flowcharts

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • evstack/ev-node#3115: Adds gas-burner benchmark and supporting utilities to the refactored benchmark suite in gasburner_test.go and shared helpers.
  • evstack/ev-node#3114: Modifies benchmark codebase (helpers.go, traces.go, output.go, suite_test.go) with ERC20 benchmarking functionality and tracing helpers.
  • evstack/ev-node#3107: Reorganizes Spamoor e2e benchmark into testify-based suite, touching suite_test.go, traces.go, and spamoor test files with tracing infrastructure changes.

Suggested reviewers

  • alpe
  • julienrbrt
  • tac0turtle

Poem

🐰 Hops with joy as benchmarks now take flight,
Config-driven, traces glow with VictoriaLight!
Flowcharts bloom in ASCII art so fine,
From Jaeger's past to futures so divine,
Results aggregate in harmony true—huzzah!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description check — ✅ Passed. The PR description provides an overview of the three main changes: BENCH_* variable support, Jaeger→VictoriaTraces migration, and richSpan flowchart visualization, with substantial benchmark result data demonstrating practical outcomes.
  • Title check — ✅ Passed. The PR title accurately summarizes the main objective: enabling benchmark tests to run in either a dedicated environment or in Docker by making configuration dynamic and switching tracing backends.




github-actions bot commented Mar 12, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Mar 18, 2026, 9:18 AM


claude bot commented Mar 12, 2026

Claude finished @chatton's task in 2m 0s.


Code Review

  • Read and analyze changed files
  • Review code quality & architecture
  • Review error handling & resilience
  • Review testing patterns
  • Review performance & resource efficiency
  • Compile and post final review

Good overall structure and clean separation of concerns. The refactoring into config.go, result.go, and flowchart.go is a clear improvement over the prior monolith. Below are findings grouped by severity.


Issues

1. waitForMetricTarget docstring vs. implementation mismatch (helpers.go:458)

The docstring says "fails the test on timeout" but the implementation only logs a warning and returns silently. If a test depends on this reaching the target, it may pass incorrectly when the warmup was incomplete.

2. setupExternalEnv: spamoor started before required env var check (suite_test.go:198-199)

BENCH_TRACE_QUERY_URL is checked with s.Require().NotEmpty after spamoor has already been built and started. If the env var is absent the test fails leaving spamoor cleanup to the deferred handler, which works — but the order is surprising and the spamoor spinup was wasteful. Consider moving the env var validation to the top of the function before any Docker work.

3. http.DefaultClient has no timeout (traces.go:169)

fetchLogStream uses http.DefaultClient.Do(req) which has no read or response header timeout. The context passed in provides cancellation, but if the server stalls mid-response bufio.Scanner.Scan() can block even after context cancellation. Prefer a short-lived http.Client with explicit timeouts or at least a ResponseHeaderTimeout.

4. Duplicate network roundtrip for rich and flat spans (suite_test.go:215, result.go:14)

collectTraces calls both collectSpans (flat) and collectRichSpans (rich) against the same VictoriaTraces endpoint. Each call re-fetches the same log stream. Since richSpan is a strict superset of a flat span, the flat spans could be derived from the rich response, halving the number of requests and eliminating the duplicate 3-minute timeout window.


Nits / Polish

5. envInt / envDuration silently swallow bad values (config.go:70-93)

Invalid env var values (e.g. BENCH_NUM_SPAMMERS=abc) silently fall back to defaults. Consider adding a t.Logf("warning: …") or at least a comment noting this is intentional, so operators aren't confused when misconfigured runs produce unexpected results.

6. buildTree: arbitrary root selection when multiple roots exist (flowchart.go:203-207)

If a trace has multiple candidate roots (no parent found in set), the one with the longest duration wins. This is a reasonable heuristic but can silently produce misleading trees if span IDs are missing from the collection window. A comment explaining the tie-breaking rule would help future readers.

7. buildAggregateTree aggregates by name across all positions (flowchart.go:305-317)

The dursByName map accumulates durations across all traces without considering position in the tree. If the same operation name appears in two different parent contexts (e.g. a retry span and a normal span both named "Engine.GetPayload"), their durations are averaged together. This can silently skew the aggregate view.

8. extractHostName allocates a full map[string]string per span (traces.go:276)

Each span line is unmarshalled twice: once fully into logsqlRichSpan and once into map[string]string just to scan for a key containing "host.name". For large trace collections this creates significant GC pressure. A simple strings.Index / bytes.Contains scan on the raw bytes would avoid the allocation.

9. RUST_LOG=debug in benchmark environment (suite_test.go:95)

Running ev-reth with RUST_LOG=debug generates a large volume of log output and adds non-trivial I/O overhead to the process under test. For benchmarks this likely distorts the results. Consider RUST_LOG=info or RUST_LOG=warn unless debug-level spans are explicitly needed.

10. smoke_test.go: time.Sleep(60s) still present (spamoor_smoke_test.go:68)

The reviewer alpe flagged a sleep in gasburner_test.go which has been correctly replaced with requireSpammersRunning. However the 60-second sleep in the smoke test is still a bare time.Sleep. Even for smoke tests it would be better to poll for a condition (e.g. enough transactions sent) rather than sleeping a fixed interval.

11. collectBlockMetrics makes 2N sequential RPC calls (helpers.go:487)

For each block in the range, HeaderByNumber and TransactionCount are called sequentially. For large ranges (hundreds of blocks) this adds measurable latency to test teardown. The header's GasUsed and TxHash/bloom fields could be used to determine emptiness without the extra TransactionCount call in many cases.
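The suggested optimization can be sketched as follows. headerFetcher stands in for ethclient's HeaderByNumber (GasUsed is the only header field the check needs), so the names here are illustrative rather than the PR's actual helpers.

```go
package main

// headerFetcher abstracts the per-block header lookup; in the real code
// this would wrap ethclient.Client.HeaderByNumber.
type headerFetcher func(n uint64) (gasUsed uint64, err error)

// countNonEmpty counts blocks in [start, end] that contain transactions
// using only the header: every transaction consumes gas, so GasUsed > 0
// implies at least one transaction and the extra TransactionCount RPC
// per block can be skipped, halving the call count for the range.
func countNonEmpty(fetch headerFetcher, start, end uint64) (int, error) {
	nonEmpty := 0
	for n := start; n <= end; n++ {
		gas, err := fetch(n)
		if err != nil {
			return 0, err
		}
		if gas > 0 {
			nonEmpty++
		}
	}
	return nonEmpty, nil
}
```

The remaining N header calls could additionally be issued concurrently with a bounded worker pool if teardown latency still matters.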

12. Missing documentation of BENCH_* env vars in one place

The config.go file is a good start, but suite_test.go introduces additional required vars (BENCH_ETH_RPC_URL, BENCH_PRIVATE_KEY, BENCH_TRACE_QUERY_URL) for external mode that aren't listed in benchConfig. A small doc comment block or README section enumerating all recognized BENCH_* variables (and which are required vs optional, and their defaults) would significantly help operators running against dedicated hardware.


Summary

The core architecture (config extraction, traceProvider interface, richSpan / flowchart additions) is well-designed. The main actionable items before merging from main are: the waitForMetricTarget docstring/behavior mismatch (#1), the env-var ordering in setupExternalEnv (#2), and the RUST_LOG=debug benchmark skew (#9). The rest are quality-of-life improvements that can follow in subsequent PRs.


codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.74%. Comparing base (6e0bd9a) to head (0a4c8a4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3157      +/-   ##
==========================================
- Coverage   60.75%   60.74%   -0.01%     
==========================================
  Files         117      117              
  Lines       11988    11988              
==========================================
- Hits         7283     7282       -1     
- Misses       3886     3887       +1     
  Partials      819      819              
Flag: combined — Coverage Δ: 60.74% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@alpe alpe left a comment


Great progress.


// give spammers a moment to start, then verify none failed immediately
time.Sleep(3 * time.Second)
assertSpammersRunning(t, api, spammerIDs)

nit: avoid sleep.

	require.EventuallyWithT(t, func(collect *assert.CollectT) {
		assertSpammersRunning(t, api, spammerIDs)
	}, 3*time.Second, time.Second/10)

chatton added 2 commits March 18, 2026 09:02
- replace time.Sleep with require.EventuallyWithT for spammer checks
- use benchConfig env vars in TestERC20Throughput instead of hardcoded constants
- remove dead truncateID function
- fix stale Jaeger comment in smoke test
- deduplicate HTTP boilerplate in trace fetching via fetchLogStream helper
- fix fragile string comparison for ProduceBlock avg logging
- make waitForMetricTarget responsive to context cancellation
- add BENCH_WAIT_TIMEOUT env var support
@chatton chatton marked this pull request as ready for review March 18, 2026 09:18

chatton commented Mar 18, 2026

This PR is already getting a bit too large; I can do some cleanups in a follow-up if there are no major blockers once CI is green.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/benchmark/suite_test.go (1)

48-55: ⚠️ Potential issue | 🟠 Major

Pin the default reth image instead of latest.

Using a floating tag makes benchmark tests non-deterministic—the same commit can pull different binaries over time, causing baseline drift and failures unrelated to code changes. Default to an immutable release tag or digest, keeping EV_RETH_TAG as the override mechanism.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/suite_test.go` around lines 48 - 55, The defaultRethTag
constant is set to the floating "latest" tag which makes rethTag()
non-deterministic; replace defaultRethTag with a pinned immutable release tag or
digest (e.g., a specific semver or sha256 digest) so tests pull a fixed image,
while keeping the EV_RETH_TAG environment variable override in rethTag() intact;
update any test docs/comments to reflect that the default is now a pinned
release rather than "latest".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/benchmark/config.go`:
- Around line 33-47: newBenchConfig currently relies on envInt/envDuration which
silently masks bad/misspelled or nonsensical values; change newBenchConfig (and
any other place constructing benchConfig) to validate parsed environment inputs
and return an error instead of falling back silently. Update the signature of
newBenchConfig to return (benchConfig, error), parse each env var (using
envInt/envDuration) for BENCH_NUM_SPAMMERS, BENCH_COUNT_PER_SPAMMER,
BENCH_THROUGHPUT, BENCH_WARMUP_TXS, BENCH_GAS_UNITS_TO_BURN, BENCH_MAX_WALLETS,
BENCH_WAIT_TIMEOUT, etc., enforce sensible bounds (e.g., non-negative integers,
positive durations, and explicit max limits), and return a descriptive error
when a value is out of range or unparseable; apply the same validation logic
wherever benchConfig is built so invalid BENCH_* overrides fail fast.

In `@test/e2e/benchmark/flowchart.go`:
- Around line 307-315: The code currently uses a per-trace dedupe map `seen`
when populating `countsByName` (in the loop that also fills `dursByName`), so
`countsByName` represents traces-per-operation but `renderAggregateTree` prints
it as "(%d calls)"; either remove the dedupe so you count every occurrence or
change the label to "traces". Fix: in the span-aggregation loop (the block that
creates `seen := make(map[string]bool)` and increments `countsByName[s.name]`),
either delete the `seen` logic and always do `countsByName[s.name]++` for each
span to count calls, or keep `seen` and update `renderAggregateTree` to display
"traces" instead of "calls"; apply the same change to the similar block around
lines 345-351 where `seen`/`countsByName` is used.

In `@test/e2e/benchmark/helpers.go`:
- Around line 456-482: waitForMetricTarget currently logs and returns on timeout
or ctx.Done(), allowing callers like TestGasBurner to continue as if successful;
change the two branches handling <-ctx.Done() and <-timer.C to fail the test
instead (use t.Fatalf with a clear message including name, target, and timeout)
so the benchmark stops immediately when the metric target isn't met; if you
prefer to propagate errors instead, change waitForMetricTarget's signature to
return an error and update callers (e.g., TestGasBurner) to handle the error
instead of assuming success.

In `@test/e2e/benchmark/spamoor_erc20_test.go`:
- Around line 81-86: The block metrics are including warm-up spans because the
trace window isn't reset before collecting traces; mirror TestGasBurner by
waiting for the warm-up gate and calling e.traces.resetStartTime() right before
calling s.collectTraces so engineSpanEntries/spanAvgEntries exclude
deploy/funding spans; specifically, add the same warm-up synchronization used in
TestGasBurner and invoke e.traces.resetStartTime() immediately before
collectTraces/ newBenchmarkResult("ERC20Throughput", ...) so both trace-derived
metrics and the block summary use the same post-warmup window.

In `@test/e2e/benchmark/suite_test.go`:
- Around line 231-237: The current best-effort rich-span collection calls
richSpanCollector.collectRichSpans twice and can block ~3 minutes each even when
results are optional; change the logic in the branch that checks
e.traces.(richSpanCollector) so: 1) only call rc.collectRichSpans(ctx,
"ev-reth") if len(tr.evReth) > 0 to gate that extra call, and 2) replace the
blocking collectRichSpans calls with a short-timeout or non-blocking/try variant
(or wrap the call with a short context timeout) when populating tr.evNodeRich
and tr.evRethRich to avoid multi-minute waits when rich spans are unavailable.
Ensure you update references to richSpanCollector, collectRichSpans,
serviceName, tr.evNodeRich, tr.evRethRich, and tr.evReth accordingly.
- Around line 166-173: The test currently logs and prints raw external secrets
(rpcURL and the BENCH_PRIVATE_KEY env var) and may include them in failure
messages; change logging to never include full URLs or private keys: in the
block using rpcURL, replace the s.Logf call to log only the mode or a sanitized
host (e.g., extract and log url.Host or mask credentials) and stop logging
privateKey entirely; update the s.Require().NotEmpty and s.Require().NoError
messages to avoid interpolating rpcURL or privateKey (use generic messages like
"failed to dial external RPC" or include the sanitized host only), and apply the
same sanitization/no-logging pattern to the other occurrences that reference
rpcURL/BENCH_* (the ethclient.Dial call, s.Require assertions, and any other
s.Logf around those vars).

In `@test/e2e/benchmark/traces.go`:
- Around line 61-67: The uiURL builder in victoriaTraceProvider (function uiURL)
interpolates start and end times directly which can contain '+' from RFC3339
offsets and become spaces in the query; update uiURL to URL-encode the start and
end parameters (e.g., use neturl.QueryEscape or build the query with url.Values)
so both start and end are escaped before concatenation, and apply the same
change to the other analogous builder around lines 155-162 to ensure both
timestamps are properly encoded.
- Line 90: The error messages currently include the raw trace query URL
(v.queryURL) which may contain sensitive data; remove the URL from returned
errors and instead report only the service name and status/error. Locate the
fmt.Errorf calls that reference v.queryURL (e.g., the timed out message using
"timed out waiting for %s traces from %s: %w") and change them to omit
v.queryURL, e.g. format with serviceName and ctx.Err() or a non-sensitive status
string; apply the same replacement for the other similar fmt.Errorf/log calls
that reference v.queryURL at the other noted locations (lines around 118, 145,
and 171-176). Ensure v.queryURL is not included in any returned error or test
output.
- Around line 205-214: The code currently discards errors from strconv.ParseInt
when computing ns and startNs and appends a malformed richSpan
(zero-time/duration); change the ParseInt calls to capture their errors and, if
either returns an error, skip the current row (or log the error) instead of
appending. Specifically, update the parsing around ns, startNs (the
strconv.ParseInt calls) and the subsequent append to richSpan so you check both
err values and continue to the next iteration when a parse fails; preserve use
of row.TraceID, row.SpanID, extractHostName(line), richSpan, startTime and
duration when the parses succeed.

---

Outside diff comments:
In `@test/e2e/benchmark/suite_test.go`:
- Around line 48-55: The defaultRethTag constant is set to the floating "latest"
tag which makes rethTag() non-deterministic; replace defaultRethTag with a
pinned immutable release tag or digest (e.g., a specific semver or sha256
digest) so tests pull a fixed image, while keeping the EV_RETH_TAG environment
variable override in rethTag() intact; update any test docs/comments to reflect
that the default is now a pinned release rather than "latest".

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c92be09c-4806-4c9f-9301-e1af376518a5

📥 Commits

Reviewing files that changed from the base of the PR and between 6e0bd9a and 0a4c8a4.

⛔ Files ignored due to path filters (1)
  • test/e2e/go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • test/e2e/benchmark/config.go
  • test/e2e/benchmark/flowchart.go
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/output.go
  • test/e2e/benchmark/result.go
  • test/e2e/benchmark/spamoor_erc20_test.go
  • test/e2e/benchmark/spamoor_smoke_test.go
  • test/e2e/benchmark/suite_test.go
  • test/e2e/benchmark/traces.go
  • test/e2e/go.mod

Comment on lines +33 to +47
func newBenchConfig(serviceName string) benchConfig {
	return benchConfig{
		ServiceName:     serviceName,
		BlockTime:       envOrDefault("BENCH_BLOCK_TIME", "100ms"),
		SlotDuration:    envOrDefault("BENCH_SLOT_DURATION", "250ms"),
		GasLimit:        envOrDefault("BENCH_GAS_LIMIT", ""),
		ScrapeInterval:  envOrDefault("BENCH_SCRAPE_INTERVAL", "1s"),
		NumSpammers:     envInt("BENCH_NUM_SPAMMERS", 2),
		CountPerSpammer: envInt("BENCH_COUNT_PER_SPAMMER", 2000),
		Throughput:      envInt("BENCH_THROUGHPUT", 200),
		WarmupTxs:       envInt("BENCH_WARMUP_TXS", 200),
		GasUnitsToBurn:  envInt("BENCH_GAS_UNITS_TO_BURN", 1_000_000),
		MaxWallets:      envInt("BENCH_MAX_WALLETS", 500),
		WaitTimeout:     envDuration("BENCH_WAIT_TIMEOUT", 10*time.Minute),
	}

⚠️ Potential issue | 🟠 Major

Fail fast on invalid BENCH_* overrides.

envInt/envDuration currently mask typos, and parseable but nonsensical values like negative counts or timeouts still flow through. That means a mis-set BENCH_NUM_SPAMMERS, BENCH_WARMUP_TXS, or BENCH_WAIT_TIMEOUT can still produce benchmark output for a different workload than the job requested. Please validate bounds when building benchConfig and return an error instead of silently defaulting.

As per coding guidelines, "Validate all inputs from external sources".

Also applies to: 68-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/config.go` around lines 33 - 47, newBenchConfig currently
relies on envInt/envDuration which silently masks bad/misspelled or nonsensical
values; change newBenchConfig (and any other place constructing benchConfig) to
validate parsed environment inputs and return an error instead of falling back
silently. Update the signature of newBenchConfig to return (benchConfig, error),
parse each env var (using envInt/envDuration) for BENCH_NUM_SPAMMERS,
BENCH_COUNT_PER_SPAMMER, BENCH_THROUGHPUT, BENCH_WARMUP_TXS,
BENCH_GAS_UNITS_TO_BURN, BENCH_MAX_WALLETS, BENCH_WAIT_TIMEOUT, etc., enforce
sensible bounds (e.g., non-negative integers, positive durations, and explicit
max limits), and return a descriptive error when a value is out of range or
unparseable; apply the same validation logic wherever benchConfig is built so
invalid BENCH_* overrides fail fast.

Comment on lines +307 to +315
for _, spans := range byTrace {
	// count each unique operation once per trace
	seen := make(map[string]bool)
	for _, s := range spans {
		dursByName[s.name] = append(dursByName[s.name], s.duration)
		if !seen[s.name] {
			countsByName[s.name]++
			seen[s.name] = true
		}

⚠️ Potential issue | 🟡 Minor

The aggregate chart is labeling per-trace counts as calls.

countsByName is incremented once per trace because of seen, but renderAggregateTree prints that number as (%d calls). Any operation that appears multiple times in one trace will be underreported, so the aggregate chart becomes misleading. Either count every occurrence or relabel the output as traces.

Also applies to: 345-351

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/flowchart.go` around lines 307 - 315, The code currently
uses a per-trace dedupe map `seen` when populating `countsByName` (in the loop
that also fills `dursByName`), so `countsByName` represents traces-per-operation
but `renderAggregateTree` prints it as "(%d calls)"; either remove the dedupe so
you count every occurrence or change the label to "traces". Fix: in the
span-aggregation loop (the block that creates `seen := make(map[string]bool)`
and increments `countsByName[s.name]`), either delete the `seen` logic and
always do `countsByName[s.name]++` for each span to count calls, or keep `seen`
and update `renderAggregateTree` to display "traces" instead of "calls"; apply
the same change to the similar block around lines 345-351 where
`seen`/`countsByName` is used.
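A standalone illustration of the two counting semantics the comment distinguishes (toy data, not the flowchart.go aggregation itself):

```go
package main

import "fmt"

func main() {
	// one trace in which "execute" appears twice
	spanNames := []string{"execute", "execute", "commit"}

	calls := map[string]int{}      // every occurrence
	tracesSeen := map[string]int{} // at most once per trace
	seen := map[string]bool{}
	for _, name := range spanNames {
		calls[name]++
		if !seen[name] {
			tracesSeen[name]++
			seen[name] = true
		}
	}
	// labeling tracesSeen as "calls" would report 1 instead of 2
	fmt.Println(calls["execute"], tracesSeen["execute"]) // 2 1
}
```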

Comment on lines 456 to 482
// waitForMetricTarget polls a metric getter function every 2s until the
// returned value >= target, or fails the test on timeout.
func waitForMetricTarget(t testing.TB, name string, poll func() (float64, error), target float64, timeout time.Duration) {
t.Helper()
deadline := time.Now().Add(timeout)
for time.Now().Before(deadline) {
ctx := t.Context()
timer := time.NewTimer(timeout)
defer timer.Stop()
ticker := time.NewTicker(2 * time.Second)
defer ticker.Stop()

for {
v, err := poll()
if err == nil && v >= target {
t.Logf("metric %s reached %.0f (target %.0f)", name, v, target)
return
}
time.Sleep(2 * time.Second)
select {
case <-ctx.Done():
t.Logf("metric %s: context cancelled (target %.0f)", name, target)
return
case <-timer.C:
t.Logf("metric %s did not reach target %.0f within %v", name, target, timeout)
return
case <-ticker.C:
}
}
t.Fatalf("metric %s did not reach target %.0f within %v", name, target, timeout)
}

⚠️ Potential issue | 🟠 Major

Don't let timeout/cancel turn into a partial "successful" benchmark.

On timeout or ctx.Done(), this helper only logs and returns. TestGasBurner then keeps collecting blocks and writes benchmark output even though the target metric was never reached. Please fail here or return an error that callers must handle.

💡 Suggested change
 func waitForMetricTarget(t testing.TB, name string, poll func() (float64, error), target float64, timeout time.Duration) {
 	t.Helper()
 	ctx := t.Context()
 	timer := time.NewTimer(timeout)
 	defer timer.Stop()
 	ticker := time.NewTicker(2 * time.Second)
 	defer ticker.Stop()
+	var lastValue float64
+	var lastErr error
 
 	for {
 		v, err := poll()
+		lastValue, lastErr = v, err
 		if err == nil && v >= target {
 			t.Logf("metric %s reached %.0f (target %.0f)", name, v, target)
 			return
 		}
 		select {
 		case <-ctx.Done():
-			t.Logf("metric %s: context cancelled (target %.0f)", name, target)
-			return
+			require.FailNowf(t, "metric wait cancelled",
+				"%s reached %.0f/%.0f before cancellation: %v", name, lastValue, target, ctx.Err())
 		case <-timer.C:
-			t.Logf("metric %s did not reach target %.0f within %v", name, target, timeout)
-			return
+			require.FailNowf(t, "metric target not reached",
+				"%s reached %.0f/%.0f within %v (last error: %v)", name, lastValue, target, timeout, lastErr)
 		case <-ticker.C:
 		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/helpers.go` around lines 456 - 482, waitForMetricTarget
currently logs and returns on timeout or ctx.Done(), allowing callers like
TestGasBurner to continue as if successful; change the two branches handling
<-ctx.Done() and <-timer.C to fail the test instead (use t.Fatalf with a clear
message including name, target, and timeout) so the benchmark stops immediately
when the metric target isn't met; if you prefer to propagate errors instead,
change waitForMetricTarget's signature to return an error and update callers
(e.g., TestGasBurner) to handle the error instead of assuming success.

Comment on lines +81 to +86
// collect and report traces
traces := s.collectTraces(e, cfg.ServiceName)

if overhead, ok := evNodeOverhead(traces.evNode); ok {
	t.Logf("ev-node overhead: %.1f%%", overhead)
	w.addEntry(entry{Name: "ERC20Throughput - ev-node overhead", Unit: "%", Value: overhead})
}

w.addEntries(summary.entries("ERC20Throughput"))
w.addSpans(traces.allSpans())
result := newBenchmarkResult("ERC20Throughput", bm, traces)
s.Require().Greater(result.summary.SteadyState, time.Duration(0), "expected non-zero steady-state duration")
result.log(t, wallClock)
w.addEntries(result.entries())

⚠️ Potential issue | 🟠 Major

Trace-derived metrics still include warm-up work.

This test records block metrics after the launcher setup phase, but the trace window is never reset before collectTraces. The block summary therefore measures steady-state throughput while engineSpanEntries and spanAvgEntries still include deploy/funding spans. Mirror TestGasBurner's warm-up gate plus e.traces.resetStartTime() so both data sources describe the same window.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/spamoor_erc20_test.go` around lines 81 - 86, The block
metrics are including warm-up spans because the trace window isn't reset before
collecting traces; mirror TestGasBurner by waiting for the warm-up gate and
calling e.traces.resetStartTime() right before calling s.collectTraces so
engineSpanEntries/spanAvgEntries exclude deploy/funding spans; specifically, add
the same warm-up synchronization used in TestGasBurner and invoke
e.traces.resetStartTime() immediately before collectTraces/
newBenchmarkResult("ERC20Throughput", ...) so both trace-derived metrics and the
block summary use the same post-warmup window.

Comment on lines +166 to +173
t.Logf("external mode: using RPC %s", rpcURL)

privateKey := os.Getenv("BENCH_PRIVATE_KEY")
s.Require().NotEmpty(privateKey, "BENCH_PRIVATE_KEY must be set in external mode")

// eth client
ethClient, err := ethclient.Dial(rpcURL)
s.Require().NoError(err, "failed to dial external RPC %s", rpcURL)

// collectTraces fetches ev-node traces (required) and ev-reth traces (optional)
// from Jaeger, then prints reports for both.
func (s *SpamoorSuite) collectTraces(e *env, serviceName string) *traceResult {
	t := s.T()
	tr := &traceResult{
		evNode: s.collectServiceTraces(e, serviceName),
		evReth: s.tryCollectServiceTraces(e, "ev-reth"),
	}
	e2e.PrintTraceReport(t, serviceName, tr.evNode)
	if len(tr.evReth) > 0 {
		e2e.PrintTraceReport(t, "ev-reth", tr.evReth)
	}
	return tr
}

// collectServiceTraces fetches traces from Jaeger for the given service and returns the spans.
func (s *SpamoorSuite) collectServiceTraces(e *env, serviceName string) []e2e.TraceSpan {
	ctx, cancel := context.WithTimeout(s.T().Context(), 3*time.Minute)
	defer cancel()

⚠️ Potential issue | 🟠 Major

Redact external BENCH_* URLs before logging them.

These values can carry basic-auth credentials, API tokens, or signed query params, and the current logs/failure output will copy them into CI artifacts. Log only the mode or a sanitized host identifier instead. As per coding guidelines: "Never expose private keys in logs or errors."

Also applies to: 198-200, 227-229

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/suite_test.go` around lines 166 - 173, The test currently
logs and prints raw external secrets (rpcURL and the BENCH_PRIVATE_KEY env var)
and may include them in failure messages; change logging to never include full
URLs or private keys: in the block using rpcURL, replace the s.Logf call to log
only the mode or a sanitized host (e.g., extract and log url.Host or mask
credentials) and stop logging privateKey entirely; update the
s.Require().NotEmpty and s.Require().NoError messages to avoid interpolating
rpcURL or privateKey (use generic messages like "failed to dial external RPC" or
include the sanitized host only), and apply the same sanitization/no-logging
pattern to the other occurrences that reference rpcURL/BENCH_* (the
ethclient.Dial call, s.Require assertions, and any other s.Logf around those
vars).
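One way to do the sanitization the prompt suggests; `sanitizeRPCURL` is a hypothetical helper, not code from the PR:

```go
package main

import (
	"fmt"
	neturl "net/url"
)

// sanitizeRPCURL keeps only scheme and host, so userinfo, paths, and
// query parameters (tokens, signatures) never reach test logs.
func sanitizeRPCURL(raw string) string {
	u, err := neturl.Parse(raw)
	if err != nil || u.Host == "" {
		return "<invalid-url>"
	}
	return u.Scheme + "://" + u.Host
}

func main() {
	raw := "https://user:secret@rpc.example.com:8545/v1?token=abc"
	fmt.Println(sanitizeRPCURL(raw)) // https://rpc.example.com:8545
}
```

The `s.Logf` call would then log `sanitizeRPCURL(rpcURL)` rather than `rpcURL`, and the `s.Require` messages can drop the URL entirely.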

Comment on lines +231 to +237
if rc, ok := e.traces.(richSpanCollector); ok {
	if spans, err := rc.collectRichSpans(ctx, serviceName); err == nil {
		tr.evNodeRich = spans
	}
	if spans, err := rc.collectRichSpans(ctx, "ev-reth"); err == nil {
		tr.evRethRich = spans
	}

⚠️ Potential issue | 🟠 Major

Best-effort rich-span collection shouldn't add minutes of blocking.

This branch ignores rich-span errors, but collectRichSpans still waits up to 3 minutes per call. When rich spans are unavailable, optional diagnostics can add roughly 3-6 minutes of dead time to a run. Gate the ev-reth branch on len(tr.evReth) > 0, and use a short-timeout/try variant for rich spans as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/suite_test.go` around lines 231 - 237, The current
best-effort rich-span collection calls richSpanCollector.collectRichSpans twice
and can block ~3 minutes each even when results are optional; change the logic
in the branch that checks e.traces.(richSpanCollector) so: 1) only call
rc.collectRichSpans(ctx, "ev-reth") if len(tr.evReth) > 0 to gate that extra
call, and 2) replace the blocking collectRichSpans calls with a short-timeout or
non-blocking/try variant (or wrap the call with a short context timeout) when
populating tr.evNodeRich and tr.evRethRich to avoid multi-minute waits when rich
spans are unavailable. Ensure you update references to richSpanCollector,
collectRichSpans, serviceName, tr.evNodeRich, tr.evRethRich, and tr.evReth
accordingly.

Comment on lines +61 to +67
func (v *victoriaTraceProvider) uiURL(serviceName string) string {
	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
	return fmt.Sprintf("%s/select/vmui/#/query?query=%s&start=%s&end=%s",
		strings.TrimRight(v.queryURL, "/"),
		neturl.QueryEscape(query),
		v.startTime.Format(time.RFC3339),
		time.Now().Format(time.RFC3339))

⚠️ Potential issue | 🟠 Major

Encode start and end when building VictoriaTraces URLs.

time.RFC3339 and time.RFC3339Nano emit +HH:MM on positive-offset machines. Interpolating that directly into the query string turns the + into a space, so local trace lookups break outside UTC. Build both URLs with url.Values (or escape start and end too).

Suggested fix
 func (v *victoriaTraceProvider) uiURL(serviceName string) string {
 	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
-	return fmt.Sprintf("%s/select/vmui/#/query?query=%s&start=%s&end=%s",
-		strings.TrimRight(v.queryURL, "/"),
-		neturl.QueryEscape(query),
-		v.startTime.Format(time.RFC3339),
-		time.Now().Format(time.RFC3339))
+	params := neturl.Values{
+		"query": []string{query},
+		"start": []string{v.startTime.UTC().Format(time.RFC3339)},
+		"end":   []string{time.Now().UTC().Format(time.RFC3339)},
+	}
+	return fmt.Sprintf("%s/select/vmui/#/query?%s",
+		strings.TrimRight(v.queryURL, "/"),
+		params.Encode())
 }
 ...
 	end := time.Now()
 	query := fmt.Sprintf(`_stream:{resource_attr:service.name="%s"}`, serviceName)
 	baseURL := strings.TrimRight(v.queryURL, "/")
-	url := fmt.Sprintf("%s/select/logsql/query?query=%s&start=%s&end=%s",
-		baseURL,
-		neturl.QueryEscape(query),
-		v.startTime.Format(time.RFC3339Nano),
-		end.Format(time.RFC3339Nano))
+	params := neturl.Values{
+		"query": []string{query},
+		"start": []string{v.startTime.UTC().Format(time.RFC3339Nano)},
+		"end":   []string{end.UTC().Format(time.RFC3339Nano)},
+	}
+	url := fmt.Sprintf("%s/select/logsql/query?%s", baseURL, params.Encode())

Also applies to: 155-162

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/traces.go` around lines 61 - 67, The uiURL builder in
victoriaTraceProvider (function uiURL) interpolates start and end times directly
which can contain '+' from RFC3339 offsets and become spaces in the query;
update uiURL to URL-encode the start and end parameters (e.g., use
neturl.QueryEscape or build the query with url.Values) so both start and end are
escaped before concatenation, and apply the same change to the other analogous
builder around lines 155-162 to ensure both timestamps are properly encoded.


select {
case <-ctx.Done():
	return nil, fmt.Errorf("timed out waiting for %s traces from %s: %w", serviceName, v.queryURL, ctx.Err())

⚠️ Potential issue | 🟠 Major

Keep raw trace query URLs out of returned errors.

BENCH_TRACE_QUERY_URL is externally supplied and may include credentials or signed parameters. These messages bubble into failed test output, so keep the raw URL out of the error text and report only the service name / status instead. As per coding guidelines: "Never expose private keys in logs or errors."

Also applies to: 118-118, 145-145, 171-176

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/traces.go` at line 90, The error messages currently
include the raw trace query URL (v.queryURL) which may contain sensitive data;
remove the URL from returned errors and instead report only the service name and
status/error. Locate the fmt.Errorf calls that reference v.queryURL (e.g., the
timed out message using "timed out waiting for %s traces from %s: %w") and
change them to omit v.queryURL, e.g. format with serviceName and ctx.Err() or a
non-sensitive status string; apply the same replacement for the other similar
fmt.Errorf/log calls that reference v.queryURL at the other noted locations
(lines around 118, 145, and 171-176). Ensure v.queryURL is not included in any
returned error or test output.

Comment on lines +205 to +214
ns, _ := strconv.ParseInt(row.Duration, 10, 64)
startNs, _ := strconv.ParseInt(row.StartTimeUnixNano, 10, 64)
spans = append(spans, richSpan{
	traceID:      row.TraceID,
	spanID:       row.SpanID,
	parentSpanID: row.ParentSpanID,
	name:         row.Name,
	hostName:     extractHostName(line),
	startTime:    time.Unix(0, startNs),
	duration:     time.Duration(ns) * time.Nanosecond,

⚠️ Potential issue | 🟡 Minor

Drop malformed rich-span rows instead of synthesizing zero-value spans.

Both ParseInt errors are discarded here, so a bad row becomes a span at Unix epoch with zero duration. That can scramble flowchart ordering and span summaries; skip the row (or log it) when either parse fails.

Suggested fix
-		ns, _ := strconv.ParseInt(row.Duration, 10, 64)
-		startNs, _ := strconv.ParseInt(row.StartTimeUnixNano, 10, 64)
+		ns, err := strconv.ParseInt(row.Duration, 10, 64)
+		if err != nil {
+			continue
+		}
+		startNs, err := strconv.ParseInt(row.StartTimeUnixNano, 10, 64)
+		if err != nil {
+			continue
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/traces.go` around lines 205 - 214, The code currently
discards errors from strconv.ParseInt when computing ns and startNs and appends
a malformed richSpan (zero-time/duration); change the ParseInt calls to capture
their errors and, if either returns an error, skip the current row (or log the
error) instead of appending. Specifically, update the parsing around ns, startNs
(the strconv.ParseInt calls) and the subsequent append to richSpan so you check
both err values and continue to the next iteration when a parse fails; preserve
use of row.TraceID, row.SpanID, extractHostName(line), richSpan, startTime and
duration when the parses succeed.

@chatton chatton changed the title feat(benchmarking): [WIP] enable tests to run in dedicated environment or in docker feat(benchmarking): enable tests to run in dedicated environment or in docker Mar 18, 2026
chatton (Contributor, Author) commented Mar 18, 2026

Will clean up the richSpans/flowchart in a follow-up; this is still experimental.
