Description
Summary
Replace the basic max-delta sorting with Shannon entropy scoring for distribution mode and proportional normalization for comparison mode, ensuring the most useful attributes for outlier detection appear first.
Problem
Distribution mode: The original max(pct) - mean(pcts) skewness score doesn't handle multi-modal or power-law distributions well, and has no awareness of well-known OTel attributes.
Comparison mode: The original max(abs(outlierPct - inlierPct)) uses raw percentages, where each group's denominator is the totalRows of that group. When the selection (500 rows) and the background (1500 rows) differ in size, a field with identical proportions (100% "message" in both) can show an artificial delta such as |80% - 27%| = 53 percentage points, which pushes it above fields that genuinely differ.
Changes
Distribution mode scoring
- computeEntropyScore(valuePercentages): Shannon entropy score, 1 - H(p)/log2(N). Returns 0 for single-value or uniform fields and a value close to 1 for highly skewed fields. Naturally handles power-law distributions.
- computeDistributionScore(valuePercentages): original skewness score, kept as an alternative and selectable via the DISTRIBUTION_SCORING constant.
- semanticBoost(key): returns 1 for well-known OTel attribute suffixes (service.name, http.method, http.status_code, error, deployment.environment, rpc.method, db.system, etc.). Applied as a 0.1 tiebreaker only when baseScore > 0, so single-value boosted attributes still score 0.
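A minimal sketch of the entropy score described above, for illustration only. It assumes valuePercentages is a plain array of per-value percentages and that N is the number of non-zero values; the actual input type and edge-case handling in deltaChartUtils.ts may differ. The log2 helper is a local convenience, not part of the PR.

```typescript
// Sketch only; the real computeEntropyScore in deltaChartUtils.ts may differ.
// Assumption: the input is a plain array of per-value percentages (0-100).
function log2(x: number): number {
  return Math.log(x) / Math.LN2;
}

function computeEntropyScore(valuePercentages: number[]): number {
  // Ignore empty buckets so they do not inflate N.
  const pcts = valuePercentages.filter((pct) => pct > 0);
  // A single value carries no distribution information; score it 0.
  if (pcts.length <= 1) return 0;

  // Convert percentages to probabilities that sum to 1.
  const total = pcts.reduce((sum, pct) => sum + pct, 0);

  // Shannon entropy H(p) = -sum(p * log2(p)).
  let entropy = 0;
  for (const pct of pcts) {
    const p = pct / total;
    entropy -= p * log2(p);
  }

  // Normalize by the maximum entropy log2(N) and invert:
  // uniform -> 0, highly skewed -> close to 1.
  return 1 - entropy / log2(pcts.length);
}
```

Under these assumptions, a uniform field such as [25, 25, 25, 25] scores 0, while a skewed field such as [97, 1, 1, 1] scores roughly 0.88.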
Comparison mode scoring
- computeComparisonScore(outlierValues, inlierValues): normalizes each group's percentages to sum to 100% before computing the max delta. Fields with identical proportional distributions (100% "message" in both groups) score 0 regardless of coverage rate differences. Falls back to the raw delta when one group has no data for a property.
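The normalization step can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: it takes each group as a plain object mapping a value to its raw percentage, and the ValuePercentages type plus the sumValues and normalizeTo100 helpers are hypothetical names introduced here.

```typescript
// Sketch only; the real computeComparisonScore may use different types.
// Hypothetical shape: value -> raw percentage within its group.
type ValuePercentages = { [value: string]: number };

function sumValues(values: ValuePercentages): number {
  let total = 0;
  for (const key of Object.keys(values)) total += values[key];
  return total;
}

// Rescale a group's percentages so that they sum to 100.
function normalizeTo100(values: ValuePercentages, total: number): ValuePercentages {
  const result: ValuePercentages = {};
  for (const key of Object.keys(values)) {
    result[key] = (values[key] * 100) / total;
  }
  return result;
}

function computeComparisonScore(
  outlierValues: ValuePercentages,
  inlierValues: ValuePercentages,
): number {
  const outlierTotal = sumValues(outlierValues);
  const inlierTotal = sumValues(inlierValues);

  // Fall back to raw deltas when one group has no data for the property.
  const useRaw = outlierTotal <= 0 || inlierTotal <= 0;
  const outlier = useRaw ? outlierValues : normalizeTo100(outlierValues, outlierTotal);
  const inlier = useRaw ? inlierValues : normalizeTo100(inlierValues, inlierTotal);

  // Max absolute delta over the union of values seen in either group.
  let maxDelta = 0;
  const keys = Object.keys(outlier).concat(Object.keys(inlier));
  for (const key of keys) {
    const delta = Math.abs((outlier[key] || 0) - (inlier[key] || 0));
    if (delta > maxDelta) maxDelta = delta;
  }
  return maxDelta;
}
```

With this sketch, a field that is 100% "message" in both groups scores 0 even if its raw percentages are 80 and 27, because both groups normalize to the same distribution.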
Sort integration
- Distribution mode:
sortScore = baseScore + (baseScore > 0 ? semanticBoost(key) * 0.1 : 0)
- Comparison mode:
sortScore = computeComparisonScore(outlierCount, inlierCount)
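A rough sketch of how the distribution-mode sort score could be wired up. The suffix list below is a partial, illustrative copy of the attributes named in this description, and ScoredField, distributionSortScore, and sortFields are hypothetical names; the actual sort lives in a useMemo in DBDeltaChart.tsx and may be structured differently.

```typescript
// Partial, illustrative list; the real BOOSTED_ATTRIBUTE_SUFFIXES may differ.
const BOOSTED_ATTRIBUTE_SUFFIXES: string[] = [
  "service.name",
  "http.method",
  "http.status_code",
  "error",
  "deployment.environment",
  "rpc.method",
  "db.system",
];

// Returns 1 when the key ends with a boosted suffix, otherwise 0.
function semanticBoost(key: string): number {
  for (const suffix of BOOSTED_ATTRIBUTE_SUFFIXES) {
    if (key.length >= suffix.length && key.slice(-suffix.length) === suffix) {
      return 1;
    }
  }
  return 0;
}

// The boost is a 0.1 tiebreaker and only applies when baseScore > 0,
// so a single-value boosted attribute keeps a score of 0.
function distributionSortScore(key: string, baseScore: number): number {
  return baseScore + (baseScore > 0 ? semanticBoost(key) * 0.1 : 0);
}

// Hypothetical shape for a field that already has a base score.
interface ScoredField {
  key: string;
  baseScore: number;
}

// Sort a copy of the fields, highest sort score first.
function sortFields(fields: ScoredField[]): ScoredField[] {
  return fields.slice().sort(
    (a, b) =>
      distributionSortScore(b.key, b.baseScore) -
      distributionSortScore(a.key, a.baseScore),
  );
}
```

Keeping the boost small and conditional means it can only reorder fields whose base scores are already close, which matches the tiebreaker intent stated above.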
Files
- packages/app/src/components/deltaChartUtils.ts (computeEntropyScore, computeComparisonScore, computeDistributionScore, semanticBoost, DISTRIBUTION_SCORING, BOOSTED_ATTRIBUTE_SUFFIXES)
- packages/app/src/components/DBDeltaChart.tsx (sort logic in useMemo)
- packages/app/src/components/__tests__/DBDeltaChart.test.ts (entropy tests, comparison tests, boost integration tests)
Dependencies
- Event Deltas: Field filtering and priority classification #1825 (Field filtering) — uses the field classification to determine visible vs hidden properties before sorting
Test plan
- Distribution mode: fields with unequal distribution appear before single-value fields
- Distribution mode: service.name at 100% scores 0 (not boosted above multi-value fields)
- Distribution mode: among similar-scored fields, boosted OTel attributes rank slightly higher
- Comparison mode: Events.Name[N] with same "message" in both groups scores 0
- Comparison mode: fields with genuinely different distributions (error in selection, success in background) rank highest
- Power-law distributions score higher than uniform distributions
Context
Part of the Event Deltas improvement series. Reference implementation in PR #1797.