Use bitmask SpanKindFilter for per-span eligibility in metrics aggregator#11380

Draft
dougqh wants to merge 5 commits into master from dougqh/conflating-metrics-producer-wins

@dougqh dougqh commented May 15, 2026

Summary

Replaces the Set<String>-based span.kind eligibility checks in ConflatingMetricsAggregator with a new SpanKindFilter bitmask primitive. Each filter is an int mask indexed by the span.kind ordinal already cached on DDSpanContext; checking eligibility on DDSpan becomes a context byte-read + bit-test (no tag-map lookup, no HashSet hash/equals).
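
As a rough illustration of the difference between the two check shapes, the sketch below contrasts a string-set membership test with a bit-test on a precomputed ordinal. The class name, the ordinal values, and the choice of which kinds are eligible are assumptions made for the example, not values taken from dd-trace-java.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: names, ordinals and the eligible set are assumptions.
final class KindCheckSketch {
  // Hypothetical ordinals standing in for the byte cached on the span context.
  static final byte SERVER = 1;
  static final byte CLIENT = 2;
  static final byte PRODUCER = 3;
  static final byte CONSUMER = 4;
  static final byte INTERNAL = 5;

  // Old shape: membership test on the tag string (hash + equals per span).
  static final Set<String> ELIGIBLE_NAMES =
      new HashSet<>(Arrays.asList("server", "client", "producer", "consumer"));

  // New shape: one bit per eligible ordinal, combined into a single int.
  static final int ELIGIBLE_MASK =
      (1 << SERVER) | (1 << CLIENT) | (1 << PRODUCER) | (1 << CONSUMER);

  static boolean eligibleBySet(String spanKind) {
    return spanKind != null && ELIGIBLE_NAMES.contains(spanKind);
  }

  static boolean eligibleByMask(byte ordinal) {
    // A single AND against the bit for this span's ordinal.
    return (ELIGIBLE_MASK & (1 << ordinal)) != 0;
  }
}
```

Both methods answer the same question; the mask version needs only the ordinal that is already cached, so it does no hashing and no string comparison.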

Layered as 5 commits so each step is reviewable in isolation:

  1. Trim per-span work on metrics aggregator publish path — dedup isTopLevel(), lazy-allocate the peer-tag list, collapse a duplicate spanKind.toString(). Also updates the existing JMH benchmark to set span.kind=client on every span so the eligibility path is actually exercised (without it the bench short-circuits before any of this work).
  2. Add SpanKindFilter and CoreSpan.isKind for bitmask-based kind checks — introduces SpanKindFilter (builder + bitmask), adds boolean isKind(SpanKindFilter) to CoreSpan, DDSpan overrides with the fast path, test-only impls (SimpleSpan, two TraceGenerator.PojoSpans) implement via SpanKindFilter.matches(String). Refactors DDSpanContext.setSpanKindOrdinal to expose spanKindOrdinalOf(String) as a reusable static helper.
  3. Use SpanKindFilter in ConflatingMetricsAggregator — replaces the two ELIGIBLE_SPAN_KINDS_FOR_* Set<String> constants and the SPAN_KIND_INTERNAL.equals check with three SpanKindFilter instances. Defers the span.kind tag read until inside publish() where MetricKey still needs it. SimpleSpan caches the ordinal at setTag time so the JMH benchmark's mock isn't measuring its own dispatch shape.
  4. Add DDSpan-based variant of the JMH benchmark — the existing benchmark uses SimpleSpan (groovy mock), where the new isKind path goes through a less-inlinable interface call. The new ConflatingMetricsAggregatorDDSpanBenchmark uses real DDSpan instances created via CoreTracer (with a NoopWriter), so the JIT exercises the production fast path.
  5. Tighten SpanKindFilter encapsulation: kindMask and the constructor are now private; DDSpan.isKind delegates to filter.matches(byte) rather than reaching the field directly.

Benchmark results

ConflatingMetricsAggregatorDDSpanBenchmark, 2 forks × 5 iterations × 15s:

               avgt (µs/op)     CI (99.9%)
  master       6.428 ± 0.189    [6.239, 6.617]
  this branch  6.343 ± 0.115    [6.228, 6.458]

~1.3% faster on the production path, with tighter fork-to-fork variance. The CIs overlap so the headline number is within noise, but the centers move the right way and the new path is structurally cheaper (byte read + bit-test vs tag-map read + HashSet.contains).
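
The arithmetic behind those two statements, using only the figures reported above:

```latex
\frac{6.428 - 6.343}{6.428} = \frac{0.085}{6.428} \approx 0.0132 \approx 1.3\%

[6.239,\ 6.617] \cap [6.228,\ 6.458] = [6.239,\ 6.458] \neq \emptyset
```

Both means (6.428 and 6.343) fall inside the overlapping range, which is why the difference is described as within noise.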

The SimpleSpan benchmark in the same conditions shows a ~2.2% slowdown. That is an artifact of the groovy mock's dispatch shape, not a production cost, which is why the DDSpan-flavored benchmark was added.

Test plan

  • ./gradlew :dd-trace-core:test --tests 'datadog.trace.common.metrics.*' passes
  • ./gradlew :dd-trace-core:test --tests 'datadog.trace.core.DDSpan*' passes
  • ./gradlew :dd-trace-core:compileJava :dd-trace-core:compileTestGroovy :dd-trace-core:compileJmhJava :dd-trace-core:compileTraceAgentTestGroovy all green
  • ./gradlew spotlessCheck clean
  • CI muzzle / integration suites

🤖 Generated with Claude Code

dougqh and others added 5 commits May 15, 2026 12:06
ConflatingMetricsAggregator.publish does a handful of redundant operations on
every span. None individually is large; together they show as ~2.5% on the
existing JMH benchmark once the benchmark actually exercises span.kind.

- dedup span.isTopLevel(): publish() reads it into a local, then shouldComputeMetric
  read it again. Pass the cached value in.
- resolve spanKind to String once: master called toString() twice per span (once
  inside spanKindEligible, once at the getPeerTags call site) and used HashSet
  contains on a CharSequence (which routes through equals on String). Normalize
  to String up front and reuse.
- lazy-allocate the peer-tag list: getPeerTags() always allocated an ArrayList
  sized to features.peerTags() even when the span had none of those tags set.
  Defer allocation until the first match; return Collections.emptyList() when
  none hit. MetricKey already treats null/empty peerTags as emptyList, so no
  behavior change.
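
A minimal sketch of the lazy-allocation pattern described in the last bullet, assuming a plain Map of span tags and a list of configured peer-tag names. The method name, the signature, and the "name:value" entry format are hypothetical and do not mirror the real getPeerTags().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: signature and entry format are assumptions.
final class LazyPeerTagsSketch {
  static List<String> peerTags(Map<String, Object> spanTags, List<String> eligibleNames) {
    List<String> matched = null; // nothing allocated until the first hit
    for (String name : eligibleNames) {
      Object value = spanTags.get(name);
      if (value != null) {
        if (matched == null) {
          matched = new ArrayList<>(eligibleNames.size());
        }
        matched.add(name + ":" + value);
      }
    }
    // Spans with none of the configured tags share the immutable empty list.
    return matched == null ? Collections.<String>emptyList() : matched;
  }
}
```

The common case of a span carrying no peer tags then costs one loop over the configured names and no allocation.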

Drop the spanKindEligible helper — the HashSet.contains call inlines fine in
shouldComputeMetric.

Update the JMH benchmark to set span.kind=client on every span. Without it the
filter path short-circuits before the peer-tag and toString work, so the wins
above aren't measurable. With it:

  baseline   6.755 us/op (CI [6.560, 6.950], stdev 0.129)
  optimized  6.585 us/op (CI [6.536, 6.634], stdev 0.033)

2 forks x 5 iterations x 15s. ~2.5% mean improvement and much tighter variance
fork-to-fork.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce SpanKindFilter -- a tiny builder-built immutable filter whose state
is an int bitmask indexed by the span.kind ordinals already cached on
DDSpanContext. Each include* on the builder sets one bit (1 << ordinal); the
runtime check is a single AND against (1 << span's ordinal).
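
One way such a builder-built filter could look, as a sketch only. The class name, the include* method names, and the ordinal constants are assumptions; the mask and constructor are kept private here for brevity.

```java
// Illustrative only: not the actual SpanKindFilter source.
final class SpanKindFilterSketch {
  // Hypothetical ordinals; the real ones live on DDSpanContext.
  static final byte SERVER = 1, CLIENT = 2, PRODUCER = 3, CONSUMER = 4, INTERNAL = 5;

  private final int kindMask;

  private SpanKindFilterSketch(int kindMask) {
    this.kindMask = kindMask;
  }

  static Builder builder() {
    return new Builder();
  }

  // Runtime check: a single AND against (1 << ordinal).
  boolean matches(byte ordinal) {
    return (kindMask & (1 << ordinal)) != 0;
  }

  static final class Builder {
    private int mask;

    Builder includeServer() { return include(SERVER); }
    Builder includeClient() { return include(CLIENT); }
    Builder includeProducer() { return include(PRODUCER); }
    Builder includeConsumer() { return include(CONSUMER); }
    Builder includeInternal() { return include(INTERNAL); }

    // Each include* sets exactly one bit.
    private Builder include(byte ordinal) {
      mask |= 1 << ordinal;
      return this;
    }

    SpanKindFilterSketch build() {
      // The nested class may call the outer class's private constructor.
      return new SpanKindFilterSketch(mask);
    }
  }
}
```

A filter built with `builder().includeClient().includeProducer().build()` would then match only those two ordinals.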

CoreSpan.isKind(SpanKindFilter) is the new entry point. DDSpan overrides it
to do the bit-test directly against the cached ordinal -- no virtual call,
no tag-map lookup. The two existing test-only CoreSpan impls (SimpleSpan
and TraceGenerator.PojoSpan, the latter in two source sets) implement isKind
by reading the span.kind tag and delegating to SpanKindFilter.matches(String),
which converts via DDSpanContext.spanKindOrdinalOf and does the same AND.

Refactor: DDSpanContext.setSpanKindOrdinal(String) now delegates to a new
package-private static spanKindOrdinalOf(String) so the same string-to-ordinal
mapping serves both the tag interceptor path and SpanKindFilter.matches.
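
The shared string-to-ordinal helper and the String overload that delegates to it might be sketched as follows. The ordinal values and the handling of null or unrecognized kinds (mapped to an UNSET ordinal of 0 that no filter includes) are assumptions for the example.

```java
// Illustrative only: ordinals and the unknown-kind behavior are assumptions.
final class OrdinalLookupSketch {
  static final byte UNSET = 0;
  static final byte SERVER = 1;
  static final byte CLIENT = 2;
  static final byte PRODUCER = 3;
  static final byte CONSUMER = 4;
  static final byte INTERNAL = 5;

  // One mapping used by both the tag-write path and string-based matching.
  static byte spanKindOrdinalOf(String kind) {
    if (kind == null) {
      return UNSET;
    }
    switch (kind) {
      case "server": return SERVER;
      case "client": return CLIENT;
      case "producer": return PRODUCER;
      case "consumer": return CONSUMER;
      case "internal": return INTERNAL;
      default: return UNSET;
    }
  }

  // Slow path for spans that only have the tag string.
  static boolean matches(int kindMask, String kind) {
    return matches(kindMask, spanKindOrdinalOf(kind));
  }

  // Fast path for spans with a cached ordinal: the same AND either way.
  static boolean matches(int kindMask, byte ordinal) {
    return (kindMask & (1 << ordinal)) != 0;
  }
}
```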

This is groundwork -- nothing in the codebase calls isKind yet. The next
commit will replace the HashSet-based eligibility checks in
ConflatingMetricsAggregator with SpanKindFilter instances.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the two ELIGIBLE_SPAN_KINDS_FOR_* HashSet<String> constants and the
SPAN_KIND_INTERNAL.equals check with three SpanKindFilter instances:
METRICS_ELIGIBLE_KINDS, PEER_AGGREGATION_KINDS, INTERNAL_KIND. Eligibility
checks now go through span.isKind(filter), which on DDSpan is a volatile
byte read against the already-cached span.kind ordinal plus a single bit-test.

Also defer the span.kind tag read: previously read at the top of the publish
loop and threaded through both shouldComputeMetric and the inner publish.
isKind no longer needs the string, so the read can move down into the inner
publish where it's still needed for the SPAN_KINDS cache key / MetricKey.

Supporting changes:

- DDSpanContext.spanKindOrdinalOf(String) is now public so non-DDSpan CoreSpan
  impls can compute the ordinal at tag-write time.
- SpanKindFilter gains a public matches(byte) fast-path overload that callers
  with a pre-computed ordinal use directly.
- SimpleSpan caches the ordinal in setTag(SPAN_KIND, ...), mirroring what
  TagInterceptor does for DDSpanContext, and its isKind now hits the byte
  fast path. Without this, the JMH benchmark (which uses SimpleSpan) would
  re-derive the ordinal on every isKind call and overstate the cost.
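
Caching the ordinal at tag-write time, as described for SimpleSpan, could look roughly like this. The class is a hypothetical stand-in rather than the real SimpleSpan, it handles only two kinds to stay short, and it takes a raw int mask instead of a SpanKindFilter.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a stand-in span that caches its kind ordinal on write.
final class CachedKindSpanSketch {
  static final byte UNSET = 0;
  static final byte CLIENT = 2;   // hypothetical ordinal
  static final byte INTERNAL = 5; // hypothetical ordinal

  private final Map<String, Object> tags = new HashMap<>();
  private byte spanKindOrdinal = UNSET;

  CachedKindSpanSketch setTag(String key, Object value) {
    tags.put(key, value);
    if ("span.kind".equals(key)) {
      // Derive the ordinal once, when the tag is written.
      spanKindOrdinal = ordinalOf(value == null ? null : value.toString());
    }
    return this;
  }

  // Read path: no tag-map lookup, just a bit-test on the cached byte.
  boolean isKind(int kindMask) {
    return (kindMask & (1 << spanKindOrdinal)) != 0;
  }

  private static byte ordinalOf(String kind) {
    if ("client".equals(kind)) return CLIENT;
    if ("internal".equals(kind)) return INTERNAL;
    return UNSET;
  }
}
```

The string-to-ordinal work is paid once per tag write rather than once per isKind call, which is the cost the benchmark would otherwise overstate.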

Results on the JMH benchmark updated earlier in this series (span.kind=client
on every span, 4 forks x 5 iter x 15s):

  prior commit  6.585 ± 0.049 us/op
  this commit   6.903 ± 0.096 us/op

The slight regression is a SimpleSpan-via-groovy-dispatch artifact -- the
interface call to isKind through CoreSpan, then through SimpleSpan, then
through SpanKindFilter.matches, doesn't fold as aggressively as a HashSet
contains on a static field. In production DDSpan.isKind inlines to a context
field read + ordinal byte read + bit-test, so the production path is faster
than the prior HashSet approach. A DDSpan-based benchmark would show this;
the existing SimpleSpan-based one doesn't.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing ConflatingMetricsAggregatorBenchmark uses SimpleSpan, a groovy
mock. That's enough for measuring queue/CHM/MetricKey work, but it conceals
the production cost of CoreSpan.isKind: SimpleSpan's isKind goes through
groovy interface dispatch into SpanKindFilter.matches, while DDSpan.isKind
inlines to a context byte-read + bit-test.

This new benchmark uses real DDSpan instances created through a CoreTracer
(with a NoopWriter so finishing doesn't reach the agent). Same shape as the
SimpleSpan bench (64-span trace, span.kind=client, peer.hostname set).

Numbers (2 forks x 5 iter x 15s):

  master:        6.428 +- 0.189 us/op  (HashSet eligibility checks)
  this branch:   6.343 +- 0.115 us/op  (SpanKindFilter bitmask)

About 1.3% faster on the production path. The SimpleSpan benchmark in the
same conditions shows a ~2.2% slowdown -- the mock's dispatch shape gives a
misleading signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make SpanKindFilter.kindMask and its constructor private now that DDSpan.isKind
no longer needs direct field access -- it delegates to SpanKindFilter.matches(byte).

The Builder.build() in the same outer class still constructs instances via the
private constructor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dougqh added labels May 15, 2026: type: enhancement (Enhancements and improvements), comp: core (Tracer core), tag: performance (Performance related changes), tag: no release notes (Changes to exclude from release notes), comp: metrics (Metrics), tag: ai generated (Largely based on code generated by an AI or LLM)
dougqh removed the tag: no release notes (Changes to exclude from release notes) label May 15, 2026
Labels

comp: core (Tracer core), comp: metrics (Metrics), tag: ai generated (Largely based on code generated by an AI or LLM), tag: performance (Performance related changes), type: enhancement (Enhancements and improvements)

3 participants