
enhancement(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking #25372

Open
ArunPiduguDD wants to merge 1 commit into master from arun.pidugu/tag-cardinality-tracking-scope

Conversation

Contributor

@ArunPiduguDD ArunPiduguDD commented May 5, 2026

Summary

When metrics do not have an explicit per_metric_limits entry, their tag values were always pooled into a single shared bucket. This can lead to scenarios such as the following:

  • If metric1 and metric2 both carry the host tag, and metric1's cardinality for host exceeds the limit, the host tag is dropped from metric2 as well, even if metric2 has only one or two unique values for it.
  • If there are ~100 metrics with the host tag, and each metric has only 1-2 unique values for it, a cardinality limit of 50 will still drop the tag across all of those metrics.

The new tracking_scope setting lets users opt into per-metric tracking buckets instead, providing isolation at the cost of higher memory.

Default is global (current behavior); per_metric gives every distinct (namespace, name) its own bucket regardless of per_metric_limits membership.
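To make the difference concrete, here is a minimal Python model of the two tracking scopes in "exact" mode (a set of seen values per bucket). The names `make_tracker` and `accept` are illustrative only, not Vector's internal types; the point is how the bucket key changes with the scope.

```python
def make_tracker(tracking_scope: str, value_limit: int):
    """Toy cardinality tracker: one set of seen values per bucket."""
    seen: dict[tuple, set] = {}

    def accept(metric: str, tag: str, value: str) -> bool:
        # global: one bucket per tag key shared by all metrics;
        # per_metric: one bucket per (metric, tag key) pair.
        key = (metric, tag) if tracking_scope == "per_metric" else (tag,)
        bucket = seen.setdefault(key, set())
        if value in bucket:
            return True
        if len(bucket) >= value_limit:
            return False  # over the limit: the tag/event would be dropped
        bucket.add(value)
        return True

    return accept

# Scenario 1 from above: metric1 exhausts the shared budget for `host`,
# so metric2's low-cardinality `host` tag is rejected too.
g = make_tracker("global", value_limit=3)
for i in range(3):
    g("metric1", "host", f"host-{i}")
assert g("metric2", "host", "host-x") is False

# With per_metric scope, metric2 gets its own bucket and is unaffected.
p = make_tracker("per_metric", value_limit=3)
for i in range(3):
    p("metric1", "host", f"host-{i}")
assert p("metric2", "host", "host-x") is True
```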

Vector configuration

sources:
  otel:
    type: opentelemetry
    grpc:
      address: "0.0.0.0:4317"
    http:
      address: "0.0.0.0:4318"

  cardinality:
    type: tag_cardinality_limit
    inputs: ["otel.metrics"]
    value_limit: 5
    mode: exact
    limit_exceeded_action: drop_event

    # The new setting under test. Try toggling between `global` (current behavior:
    # all metrics without a `per_metric_limits` entry share one bucket) and
    # `per_metric` (every metric name gets its own bucket).
    tracking_scope: per_metric

    per_metric_limits:
      # Tighter override on this specific metric — applies regardless of `tracking_scope`.
      demo_value_gauge:
        value_limit: 2
        mode: exact
        limit_exceeded_action: drop_event

      demo_value_counter:
        value_limit: 6
        mode: exact
        limit_exceeded_action: drop_tag

sinks:
  console:
    type: console
    inputs: ["cardinality"]
    encoding:
      codec: json
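As a rough model of how limits resolve under this configuration: a metric named in per_metric_limits uses its override, and everything else falls back to the top-level defaults (as the config comments state, the override applies regardless of tracking_scope). This sketch mirrors the YAML field names and is an assumption about precedence, not Vector's implementation.

```python
# Top-level defaults from the config above.
DEFAULTS = {"value_limit": 5, "mode": "exact", "limit_exceeded_action": "drop_event"}

# Per-metric overrides from the config above.
PER_METRIC_LIMITS = {
    "demo_value_gauge": {"value_limit": 2, "mode": "exact", "limit_exceeded_action": "drop_event"},
    "demo_value_counter": {"value_limit": 6, "mode": "exact", "limit_exceeded_action": "drop_tag"},
}

def resolve_limits(metric_name: str) -> dict:
    # Override wins if present; otherwise fall back to the defaults.
    return PER_METRIC_LIMITS.get(metric_name, DEFAULTS)

assert resolve_limits("demo_value_gauge")["value_limit"] == 2
assert resolve_limits("system.process.count")["value_limit"] == 5
```

So in the test run below, demo_value_gauge should start dropping events after 2 unique values per tag, while metrics like system.process.count use the default limit of 5.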

How did you test this PR?

Tested with the above configuration. Simulated an OTel Collector with the following Python script:

import random
import string
from uuid import uuid4

from opentelemetry import metrics
from opentelemetry.metrics import Observation
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader, MetricExportResult
from opentelemetry.sdk.metrics.view import View, DropAggregation
from opentelemetry.sdk.resources import Resource


VECTOR_METRICS_ENDPOINT = "http://localhost:4318/v1/metrics"


def rand_token(prefix: str, n: int = 8) -> str:
    return f"{prefix}-{''.join(random.choices(string.ascii_lowercase + string.digits, k=n))}"


def random_trace_id() -> str:
    return uuid4().hex + uuid4().hex


def random_environment() -> str:
    return rand_token(random.choice(["dev", "staging", "prod", "local", "qa"]))


def build_common_tags() -> dict[str, str]:
    return {
        "trace_id": random_trace_id(),
        "environment": random_environment(),
    }


def build_system_process_tags() -> dict[str, str]:
    tags = build_common_tags()
    tags.update(
        {
            "host_id": rand_token("host"),
            "process_group": rand_token("pg"),
            "shard": rand_token("shard"),
            "worker": rand_token("worker"),
        }
    )
    return tags


def main() -> None:
    resource = Resource.create(
        {
            "service.name": "vector-http-metrics-demo",
            "service.version": "1.0.0",
        }
    )

    reader = InMemoryMetricReader()

    provider = MeterProvider(
        resource=resource,
        metric_readers=[reader],
        views=[
            View(instrument_name="*", aggregation=DropAggregation()),
            View(instrument_name="demo_value_gauge"),
            View(instrument_name="system.process.count"),
            View(instrument_name="demo_value_counter"),
            View(instrument_name="demo_value_secondary_gauge"),
            View(instrument_name="demo_value_secondary_counter"),
        ],
    )
    metrics.set_meter_provider(provider)

    exporter = OTLPMetricExporter(
        endpoint=VECTOR_METRICS_ENDPOINT,
        timeout=3000,
    )

    meter = metrics.get_meter("demo-meter")

    state = {
        "demo_value_gauge": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
        "system.process.count": {
            "value": 0.0,
            "tags": build_system_process_tags(),
        },
        "demo_value_counter": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
        "demo_value_secondary_gauge": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
        "demo_value_secondary_counter": {
            "value": 0.0,
            "tags": build_common_tags(),
        },
    }

    def demo_value_gauge_callback(_options):
        s = state["demo_value_gauge"]
        return [Observation(s["value"], s["tags"])]

    def system_process_count_callback(_options):
        s = state["system.process.count"]
        return [Observation(s["value"], s["tags"])]

    def demo_value_counter_callback(_options):
        s = state["demo_value_counter"]
        return [Observation(s["value"], s["tags"])]

    def demo_value_secondary_gauge_callback(_options):
        s = state["demo_value_secondary_gauge"]
        return [Observation(s["value"], s["tags"])]

    def demo_value_secondary_counter_callback(_options):
        s = state["demo_value_secondary_counter"]
        return [Observation(s["value"], s["tags"])]

    meter.create_observable_gauge(
        name="demo_value_gauge",
        callbacks=[demo_value_gauge_callback],
        description="Gauge metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_gauge(
        name="system.process.count",
        callbacks=[system_process_count_callback],
        description="Process count metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_counter(
        name="demo_value_counter",
        callbacks=[demo_value_counter_callback],
        description="Counter metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_gauge(
        name="demo_value_secondary_gauge",
        callbacks=[demo_value_secondary_gauge_callback],
        description="Second demo gauge metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    meter.create_observable_counter(
        name="demo_value_secondary_counter",
        callbacks=[demo_value_secondary_counter_callback],
        description="Second demo counter metric exported to Vector over OTLP/HTTP",
        unit="1",
    )

    print(f"Configured OTLP/HTTP metrics endpoint: {VECTOR_METRICS_ENDPOINT}")
    print("Press Enter to send all five metrics with random values and random tags.")
    print("Type q and press Enter to quit.")

    try:
        while True:
            user_input = input("> ").strip().lower()
            if user_input in {"q", "quit", "exit"}:
                break

            state["demo_value_gauge"]["value"] = round(random.uniform(0, 100), 2)
            state["system.process.count"]["value"] = float(random.randint(1, 500))
            state["demo_value_counter"]["value"] += float(random.randint(1, 20))
            state["demo_value_secondary_gauge"]["value"] = round(random.uniform(-50, 50), 2)
            state["demo_value_secondary_counter"]["value"] += float(random.randint(1, 10))

            state["demo_value_gauge"]["tags"] = build_common_tags()
            state["system.process.count"]["tags"] = build_system_process_tags()
            state["demo_value_counter"]["tags"] = build_common_tags()
            state["demo_value_secondary_gauge"]["tags"] = build_common_tags()
            state["demo_value_secondary_counter"]["tags"] = build_common_tags()

            metrics_data = reader.get_metrics_data()
            if metrics_data is None:
                print("send failed: no metrics data collected")
                continue

            result = exporter.export(metrics_data)

            if result is MetricExportResult.SUCCESS:
                print("sent all metrics")
                print(
                    f"  demo_value_gauge value={state['demo_value_gauge']['value']} "
                    f"tags={state['demo_value_gauge']['tags']}"
                )
                print(
                    f"  system.process.count value={state['system.process.count']['value']} "
                    f"tags={state['system.process.count']['tags']}"
                )
                print(
                    f"  demo_value_counter value={state['demo_value_counter']['value']} "
                    f"tags={state['demo_value_counter']['tags']}"
                )
                print(
                    f"  demo_value_secondary_gauge value={state['demo_value_secondary_gauge']['value']} "
                    f"tags={state['demo_value_secondary_gauge']['tags']}"
                )
                print(
                    f"  demo_value_secondary_counter value={state['demo_value_secondary_counter']['value']} "
                    f"tags={state['demo_value_secondary_counter']['tags']}"
                )
            else:
                print("send failed for this export batch")

    finally:
        exporter.shutdown()
        provider.shutdown()


if __name__ == "__main__":
    main()

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@ArunPiduguDD ArunPiduguDD requested review from a team as code owners May 5, 2026 17:24
@github-actions github-actions bot added the following labels on May 5, 2026: docs review on hold (The documentation team reviews PRs only after a PR is approved by the COSE team), domain: transforms (Anything related to Vector's transform components), domain: external docs (Anything related to Vector's external, public documentation)
@ArunPiduguDD ArunPiduguDD marked this pull request as draft May 5, 2026 17:25
@ArunPiduguDD ArunPiduguDD changed the title feat(tag_cardinality_limit transform): add tracking_scope setting for per-metric vs global tag tracking feat(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking May 5, 2026

When metrics do not have an explicit `per_metric_limits` entry, their tag values
were always pooled into a single shared bucket. The new `tracking_scope` setting
lets users opt into per-metric tracking buckets instead, providing isolation at
the cost of higher memory.

Default is `global` (current behavior); `per_metric` gives every distinct
(namespace, name) its own bucket regardless of `per_metric_limits` membership.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@ArunPiduguDD ArunPiduguDD changed the title feat(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking enhancement(tag_cardinality_limit transform): A setting for per-metric vs global tag cardinality tracking May 5, 2026
@ArunPiduguDD ArunPiduguDD force-pushed the arun.pidugu/tag-cardinality-tracking-scope branch from 8044081 to 9a136f2 Compare May 5, 2026 19:03
@ArunPiduguDD ArunPiduguDD marked this pull request as ready for review May 5, 2026 19:22