
Does usage of target-allocator reduce load on api-server #3529

@sfc-gh-akrishnan

Component(s)

target allocator

Describe the issue you're reporting

I am running the OpenTelemetry Collector as a DaemonSet, together with a single target allocator instance that uses the per-node allocation strategy. All scrape targets are node-local, and I use kubernetes_sd_config for service discovery.

I compared this setup against a DaemonSet of OpenTelemetry Collectors without the target allocator, where each collector uses kubernetes_sd_config together with relabel_config to keep only the pods on its own node.
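
For reference, a minimal sketch of what the baseline (no target allocator) scrape config could look like. This is illustrative, not my exact config: the job name `node-local-pods` and the `NODE_NAME` environment variable (assumed to be injected into each collector pod through the downward API) are hypothetical.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # Hypothetical job name, for illustration only
        - job_name: node-local-pods
          # Every collector pod runs its own pod discovery
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Keep only pods scheduled on the same node as this collector.
            # NODE_NAME is an assumed env var, set from spec.nodeName.
            - source_labels: [__meta_kubernetes_pod_node_name]
              regex: ${env:NODE_NAME}
              action: keep
```

In this sketch the filtering happens in the relabel step, after discovery, so each collector still discovers pods itself before dropping the ones on other nodes.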

The Target Allocator (TA) documentation states:

The TA is a mechanism for decoupling the service discovery and metric collection functions of Prometheus such that they can be scaled independently

Based on this, I expected the load on the API server to be lower with the collector plus TA than with the collector alone. Instead, I observe that the load on the API server is similar with and without the TA.

Is there a gap in my understanding, or is there a tunable that I can configure?

Sample TA config:

    # Used by TargetAllocator watcher to discover Otel-Collector pods using labels
    collector_selector:
      matchlabels:
        cluster-addon-name: otel-collector

    # Algorithm to use to allocate endpoints amongst Otel-Collector pods
    allocation_strategy: per-node

    # Since we are using `per-node` allocation strategy, this would not take effect
    # for endpoints which are not associated with any node (e.g. apiserver)
    # For those cases we use the fallback strategy
    allocation_fallback_strategy: least-weighted

    # Should relabel-config be respected? (Yes)
    filter_strategy: relabel-config

    # Actual receiver config
    config:
      scrape_configs:
        ...

Sample Otel-config:

    receivers:
      prometheus:
        target_allocator:
          endpoint: http://target-allocator-service.system-metrics.svc.internal
          interval: 60s
          collector_id: "${POD_NAME}"

    processors:
      batch:
        send_batch_size: 1000
        timeout: 5s
      memory_limiter:
        limit_mib: 2500
        spike_limit_mib: 150
        check_interval: 5s
....
