Open
Labels
enhancement: New feature or request
Description
Summary
Replace ORDER BY rand() LIMIT 1000 with deterministic cityHash64(SpanId) sampling and adaptive sample sizing based on total span count, with a visible sample annotation in the legend.
Problem
- Non-deterministic sampling: ORDER BY rand() means the same hover on the same bar highlights different heatmap cells after each query re-fetch, creating a confusing experience
- Fixed sample size: 1,000 rows works for small datasets but under-represents rare attribute values at scale (100K+ spans). For tiny datasets (< 1,000), the sample IS the population but users can't tell
- No transparency: Users see percentages like "2.3%" without knowing if that's 23/1000 sampled or 23/23 total
Changes
Deterministic sampling
- STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)': used in the ORDER BY clause for all sample queries (outlier, inlier, all-spans, and PartIds CTE). Same data always produces the same sample
- Set to 'rand()' to restore non-deterministic behavior (tunable constant)
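A minimal sketch of how the constant might be spliced into a sample query. Only STABLE_SAMPLE_EXPR and the cityHash64(SpanId) expression come from this issue; buildSampleQuery, the table name, and the filter are hypothetical, and the real query builders in DBDeltaChart.tsx may be structured differently.

```typescript
// From the issue: ordering by a hash of SpanId makes the sample repeatable.
// Set to 'rand()' to restore non-deterministic sampling.
const STABLE_SAMPLE_EXPR = 'cityHash64(SpanId)';

// Hypothetical helper (not from the codebase): builds a sample query that
// orders rows by the sampling expression and keeps the first `sampleSize`.
// `table` and `whereClause` are assumed to be trusted, pre-built SQL fragments.
function buildSampleQuery(
  table: string,
  whereClause: string,
  sampleSize: number,
  sampleExpr: string = STABLE_SAMPLE_EXPR,
): string {
  // Reject NaN, Infinity, fractions, and non-positive values.
  if (!isFinite(sampleSize) || Math.floor(sampleSize) !== sampleSize || sampleSize <= 0) {
    throw new Error('sampleSize must be a positive integer, got ' + sampleSize);
  }
  return [
    'SELECT *',
    'FROM ' + table,
    'WHERE ' + whereClause,
    'ORDER BY ' + sampleExpr,
    'LIMIT ' + sampleSize,
  ].join('\n');
}

// Example with a made-up table and filter:
const sampleSql = buildSampleQuery('otel_traces', 'Duration > 0', 1000);
```

Because the hash depends only on SpanId, re-running the same query over unchanged data returns the same rows, which is what keeps hover highlights stable across re-fetches.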
Adaptive sample sizing
- computeEffectiveSampleSize(totalCount) = clamp(MIN_SAMPLE_SIZE, ceil(totalCount * SAMPLE_RATIO), MAX_SAMPLE_SIZE)
- Constants: SAMPLE_SIZE=1000 (fallback), MIN_SAMPLE_SIZE=500, MAX_SAMPLE_SIZE=5000, SAMPLE_RATIO=0.01 (1%)
- Lightweight count() query runs in parallel; ClickHouse resolves it from MergeTree metadata (near-instant)
- Falls back to SAMPLE_SIZE when count is unavailable (query still loading)
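The clamp formula and fallback above can be sketched as follows. The constant names and values are taken from this issue; the exact signature and the treatment of null, undefined, or invalid counts as "unavailable" are assumptions, not necessarily what the reference implementation does.

```typescript
const SAMPLE_SIZE = 1000;      // fallback when the total count is unavailable
const MIN_SAMPLE_SIZE = 500;   // lower bound on the sample
const MAX_SAMPLE_SIZE = 5000;  // upper bound on the sample
const SAMPLE_RATIO = 0.01;     // target 1% of the total span count

// Assumed signature: totalCount is undefined or null while the count()
// query is still loading, in which case the fixed fallback is used.
function computeEffectiveSampleSize(totalCount?: number | null): number {
  if (totalCount == null || !isFinite(totalCount) || totalCount < 0) {
    return SAMPLE_SIZE;
  }
  const proportional = Math.ceil(totalCount * SAMPLE_RATIO);
  // Clamp the proportional size into [MIN_SAMPLE_SIZE, MAX_SAMPLE_SIZE].
  return Math.min(MAX_SAMPLE_SIZE, Math.max(MIN_SAMPLE_SIZE, proportional));
}
```

Under these constants the proportional term only matters between 50,000 and 500,000 spans; below that range the minimum applies, and above it the maximum applies.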
Legend annotation
- Shows (n=X of Y sampled) when total count is available
- Shows (n=X sampled) as fallback
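A small sketch of the two annotation formats. formatSampleAnnotation is a hypothetical helper name, and the en-US thousands separators are assumed from the "(n=1,000 of 483,291 sampled)" style used in the test plan; the actual legend code in DBDeltaChart.tsx may format the numbers differently.

```typescript
// Hypothetical helper: renders the legend suffix described above.
// sampleCount is the number of rows actually sampled; totalCount is the
// result of the count() query, or null/undefined if it is not available yet.
function formatSampleAnnotation(
  sampleCount: number,
  totalCount?: number | null,
): string {
  const fmt = (n: number): string => n.toLocaleString('en-US');
  if (totalCount != null) {
    return '(n=' + fmt(sampleCount) + ' of ' + fmt(totalCount) + ' sampled)';
  }
  return '(n=' + fmt(sampleCount) + ' sampled)';
}

const withTotal = formatSampleAnnotation(4833, 483291); // "(n=4,833 of 483,291 sampled)"
const withoutTotal = formatSampleAnnotation(1000);      // "(n=1,000 sampled)"
```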
Files
- packages/app/src/components/deltaChartUtils.ts (SAMPLE_SIZE, MIN/MAX_SAMPLE_SIZE, SAMPLE_RATIO, STABLE_SAMPLE_EXPR, computeEffectiveSampleSize)
- packages/app/src/components/DBDeltaChart.tsx (count query, effectiveSampleSize in all query configs, legend annotation)
- packages/app/src/components/__tests__/DBDeltaChart.test.ts (computeEffectiveSampleSize tests)
Dependencies
None — standalone improvement to the sampling mechanism.
Test plan
- Same data + same hover always highlights the same heatmap cells (deterministic)
- Small dataset (100 spans) → sample size = MIN_SAMPLE_SIZE (500)
- Medium dataset (200K spans) → sample size = 2,000 (1% of 200K)
- Large dataset (10M spans) → sample size = MAX_SAMPLE_SIZE (5,000)
- Legend shows "(n=4,833 of 483,291 sampled)" with actual numbers (1% of 483,291, rounded up)
- Setting STABLE_SAMPLE_EXPR = 'rand()' restores non-deterministic behavior
Context
Part of the Event Deltas improvement series. Reference implementation in PR #1797.