Event Deltas: Improved attribute sorting with entropy scoring and proportional comparison #1826

@alex-fedotyev

Summary

Replace the basic max-delta sorting with Shannon entropy scoring for distribution mode and proportional normalization for comparison mode, ensuring the most useful attributes for outlier detection appear first.

Problem

Distribution mode: The original max(pct) - mean(pcts) skewness score doesn't handle multi-modal or power-law distributions well, and has no awareness of well-known OTel attributes.
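To make the limitation concrete, here is a minimal sketch of the original skewness score as described above. The name `skewnessScore` and the 0-100 percentage scale are assumptions for illustration and may not match the actual code.

```typescript
// Hypothetical re-creation of the original score: max(pct) - mean(pcts).
// Assumes one percentage (0-100) per distinct value of the attribute.
function skewnessScore(valuePercentages: number[]): number {
  if (valuePercentages.length === 0) return 0;
  const max = Math.max(...valuePercentages);
  const mean =
    valuePercentages.reduce((sum, pct) => sum + pct, 0) /
    valuePercentages.length;
  return max - mean;
}

const twoValueSplit = skewnessScore([90, 10]); // 90 - 50 = 40
const powerLawTail = skewnessScore([60, 20, 10, 5, 5]); // 60 - 20 = 40
const singleValue = skewnessScore([100]); // 100 - 100 = 0
```

Under these assumptions a two-value 90/10 split and a five-value power-law tail both score 40, so this score cannot rank one above the other.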

Comparison mode: The original max(abs(outlierPct - inlierPct)) score uses raw percentages, where each group's denominator is that group's totalRows. When the selection (500 rows) and the background (1500 rows) differ in size, a field with identical proportions (100% "message" in both) can show an artificial delta such as |80% - 27%| = 53 percentage points, which pushes it above fields that genuinely differ.

Changes

Distribution mode scoring

  • computeEntropyScore(valuePercentages): normalized Shannon entropy score, 1 - H(p)/log2(N), where H(p) is the entropy of the value distribution and N is the number of distinct values. Returns 0 for single-value or uniform fields and approaches 1 for highly skewed fields. Handles power-law distributions naturally.
  • computeDistributionScore(valuePercentages): the original skewness score, kept as an alternative and selectable via the DISTRIBUTION_SCORING constant.
  • semanticBoost(key): returns 1 for well-known OTel attribute suffixes (service.name, http.method, http.status_code, error, deployment.environment, rpc.method, db.system, etc.). Applied as a 0.1 tiebreaker only when baseScore > 0, so single-value boosted attributes still score 0.
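The entropy score and the suffix boost described above could be sketched roughly as follows. This is not the code from deltaChartUtils.ts: the names `entropyScore`, `semanticBoostSketch`, and `BOOSTED_SUFFIXES` are hypothetical, the suffix list only repeats the examples named in this issue, and both the plain `endsWith` match and counting N as the number of values with a non-zero share are assumptions.

```typescript
// Illustrative subset only; the real BOOSTED_ATTRIBUTE_SUFFIXES may differ.
const BOOSTED_SUFFIXES: string[] = [
  'service.name',
  'http.method',
  'http.status_code',
  'error',
  'deployment.environment',
  'rpc.method',
  'db.system',
];

// 1 - H(p) / log2(N): 0 for single-value or uniform fields, near 1 when skewed.
function entropyScore(valuePercentages: number[]): number {
  const positive = valuePercentages.filter(pct => pct > 0);
  const total = positive.reduce((sum, pct) => sum + pct, 0);
  const n = positive.length;
  if (n <= 1 || total <= 0) return 0; // log2(1) = 0, so avoid dividing by zero

  let entropy = 0;
  for (const pct of positive) {
    const p = pct / total; // renormalize so the shares sum to 1
    entropy -= p * Math.log2(p);
  }
  const score = 1 - entropy / Math.log2(n);
  return Math.min(1, Math.max(0, score)); // clamp floating-point rounding error
}

// Returns 1 when the key ends with a boosted suffix, otherwise 0.
function semanticBoostSketch(key: string): number {
  return BOOSTED_SUFFIXES.some(suffix => key.endsWith(suffix)) ? 1 : 0;
}

const single = entropyScore([100]); // 0
const uniform = entropyScore([25, 25, 25, 25]); // approximately 0
const skewed = entropyScore([97, 1, 1, 1]); // about 0.88
```

Renormalizing by the sum of the inputs keeps the sketch well-defined even when the percentages do not add up to exactly 100.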

Comparison mode scoring

  • computeComparisonScore(outlierValues, inlierValues): normalizes each group's percentages to sum to 100% before computing the max delta. Fields with identical proportional distributions (100% "message" in both groups) score 0 regardless of coverage rate differences. Falls back to the raw delta when one group has no data for a property.
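A minimal sketch of the proportional comparison, assuming each group is passed as a plain object mapping a value to its raw percentage of that group's rows. The input shape and the name `comparisonScoreSketch` are assumptions; the real computeComparisonScore may take different types.

```typescript
// Assumed input shape: value -> percentage of the group's total rows (0-100).
type ValuePercentages = Record<string, number>;

function totalOf(values: ValuePercentages): number {
  return Object.keys(values).reduce((sum, value) => sum + values[value], 0);
}

function comparisonScoreSketch(
  outlier: ValuePercentages,
  inlier: ValuePercentages,
): number {
  const outlierTotal = totalOf(outlier);
  const inlierTotal = totalOf(inlier);
  // Normalize only when both groups have data; otherwise fall back to raw deltas.
  const normalize = outlierTotal > 0 && inlierTotal > 0;

  let maxDelta = 0;
  for (const value of Object.keys({ ...outlier, ...inlier })) {
    const o = outlier[value] ?? 0;
    const i = inlier[value] ?? 0;
    const delta = normalize
      ? Math.abs((o / outlierTotal) * 100 - (i / inlierTotal) * 100)
      : Math.abs(o - i);
    maxDelta = Math.max(maxDelta, delta);
  }
  return maxDelta;
}

// Same proportions, different coverage: raw delta would be |80 - 27| = 53.
const sameShape = comparisonScoreSketch({ message: 80 }, { message: 27 }); // 0
// Different shapes: "error" is 75% of the selection and absent in the background.
const different = comparisonScoreSketch({ error: 60, ok: 20 }, { ok: 27 }); // 75
// One group has no data for the property: raw delta is used.
const fallback = comparisonScoreSketch({ timeout: 40 }, {}); // 40
```

The first call reuses the 80% vs 27% figures from the Problem section to show the artificial delta disappearing after normalization.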

Sort integration

  • Distribution mode: sortScore = baseScore + (baseScore > 0 ? semanticBoost(key) * 0.1 : 0)
  • Comparison mode: sortScore = computeComparisonScore(outlierCount, inlierCount)
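The distribution-mode formula above can be illustrated with precomputed base scores. The `ScoredAttribute` shape, the helper names, and the descending sort order are assumptions for this sketch; the actual sort lives in the useMemo in DBDeltaChart.tsx and may be structured differently.

```typescript
// Hypothetical shape: baseScore would come from the distribution scorer,
// boosted from whether semanticBoost(key) returns 1.
interface ScoredAttribute {
  key: string;
  baseScore: number;
  boosted: boolean;
}

const BOOST_WEIGHT = 0.1;

// sortScore = baseScore + (baseScore > 0 ? semanticBoost(key) * 0.1 : 0)
function distributionSortScore(attr: ScoredAttribute): number {
  const boost = attr.boosted ? 1 : 0;
  return attr.baseScore + (attr.baseScore > 0 ? boost * BOOST_WEIGHT : 0);
}

// Highest score first, without mutating the input array.
function sortByScore(attrs: ScoredAttribute[]): ScoredAttribute[] {
  return [...attrs].sort(
    (a, b) => distributionSortScore(b) - distributionSortScore(a),
  );
}

const ranked = sortByScore([
  { key: 'service.name', baseScore: 0, boosted: true }, // single value: stays 0
  { key: 'region', baseScore: 0.5, boosted: false },
  { key: 'user.id', baseScore: 0.55, boosted: false },
  { key: 'http.status_code', baseScore: 0.5, boosted: true }, // 0.5 + 0.1
]).map(attr => attr.key);
// ranked: http.status_code, user.id, region, service.name
```

With made-up scores like these, the boost lifts http.status_code past similarly scored attributes, while the single-value service.name stays at 0 and sorts last.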

Files

  • packages/app/src/components/deltaChartUtils.ts (computeEntropyScore, computeComparisonScore, computeDistributionScore, semanticBoost, DISTRIBUTION_SCORING, BOOSTED_ATTRIBUTE_SUFFIXES)
  • packages/app/src/components/DBDeltaChart.tsx (sort logic in useMemo)
  • packages/app/src/components/__tests__/DBDeltaChart.test.ts (entropy tests, comparison tests, boost integration tests)

Dependencies

Test plan

  • Distribution mode: fields with unequal distribution appear before single-value fields
  • Distribution mode: service.name at 100% scores 0 (not boosted above multi-value fields)
  • Distribution mode: among similar-scored fields, boosted OTel attributes rank slightly higher
  • Comparison mode: Events.Name[N] with same "message" in both groups scores 0
  • Comparison mode: fields with genuinely different distributions (error in selection, success in background) rank highest
  • Power-law distributions score higher than uniform distributions

Context

Part of the Event Deltas improvement series. Reference implementation in PR #1797.

Metadata

Labels: enhancement
