Description
Component(s)
target allocator
Describe the issue you're reporting
I am using a DaemonSet of OpenTelemetry Collector in tandem with a single instance of the target allocator using the per-node
allocation strategy. All scraping targets are node local, and I use kubernetes_sd_config for service discovery.
I compared the above setup against a DaemonSet of otel-collector pods, each using relabel_config and kubernetes_sd_config to filter down to the node-local pods to scrape, as sketched below.
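Roughly, each collector in the baseline setup carried a config along these lines (the job name and the NODE_NAME downward-API environment variable are illustrative, not my exact values):

# Baseline (no TA): every collector pod runs its own service discovery
# and keeps only the pods scheduled on its own node.
# NODE_NAME is assumed to be injected via the Kubernetes downward API.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: node-local-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Drop everything that is not running on this collector's node
            - source_labels: [__meta_kubernetes_pod_node_name]
              regex: ${NODE_NAME}
              action: keep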
Since the Target Allocator (TA) doc reads:
The TA is a mechanism for decoupling the service discovery and metric collection functions of Prometheus such that they can be scaled independently
I expected the load on the API server to go down with Otel + TA compared to Otel alone, but my observation is the opposite: the load on the API server with and without the TA is similar.
Can I get some clarity on whether there is a gap in my understanding, or whether there is a tunable I can configure?
Sample TA config:
# Used by the TargetAllocator watcher to discover Otel-Collector pods using labels
collector_selector:
  matchlabels:
    cluster-addon-name: otel-collector
# Algorithm to use to allocate endpoints amongst Otel-Collector pods
allocation_strategy: per-node
# Since we are using the `per-node` allocation strategy, this would not take effect
# for endpoints which are not associated with any node (e.g. apiserver).
# For those cases we use the fallback strategy.
allocation_fallback_strategy: least-weighted
# Should relabel-config be respected? (Yes)
filter_strategy: relabel-config
# Actual receiver config
config:
  scrape_configs:
    ...
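For context, the elided scrape_configs are plain kubernetes_sd_config jobs with no node filtering, since the per-node strategy is expected to assign each collector only its node-local targets. A rough sketch (the job name and the annotation-based keep rule are illustrative, not my exact config):

config:
  scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # Only scrape pods that opt in via the prometheus.io/scrape annotation;
        # no node filter here, since per-node allocation handles node locality.
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          regex: "true"
          action: keep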
Sample Otel-config:
receivers:
  prometheus:
    target_allocator:
      endpoint: http://target-allocator-service.system-metrics.svc.internal
      interval: 60s
      collector_id: "${POD_NAME}"
processors:
  batch:
    send_batch_size: 1000
    timeout: 5s
  memory_limiter:
    limit_mib: 2500
    spike_limit_mib: 150
    check_interval: 5s
....
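For completeness, the rest of the collector config is the usual receiver -> processors -> exporter wiring; a sketch with a placeholder OTLP exporter (the exporter and its endpoint are hypothetical, not my actual backend):

exporters:
  otlp:
    endpoint: metrics-backend.system-metrics.svc.internal:4317  # hypothetical backend
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, batch]  # memory_limiter first, then batch
      exporters: [otlp]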