docs/metrics.md (110 additions & 5 deletions)
@@ -485,9 +485,9 @@ practice is important:
* **Delta Temporality**: The SDK "forgets" the state after each
  collection/export cycle. This means in each new interval, the SDK can track
  up to the cardinality limit of distinct attribute combinations. Over time,
  your metrics backend might see far more than the configured limit of
  distinct combinations from a single process.

* **Cumulative Temporality**: Since the SDK maintains state across export
  intervals, once the cardinality limit is reached, new attribute combinations
@@ -560,7 +560,108 @@ The exported metrics would be:
words, attributes used to create `Meter` or `Resource` attributes are not
subject to this cap.

#### Cardinality Limits - How to Choose the Right Limit

Choosing the right cardinality limit is crucial for maintaining efficient memory
usage and predictable performance in your metrics system. The optimal limit
depends on your temporality choice and application characteristics.

Setting the limit incorrectly can have consequences:

* **Limit too high**: Due to the SDK's [memory
  preallocation](#memory-preallocation) strategy, excess memory will be
  allocated upfront and remain unused, leading to resource waste.
* **Limit too low**: Measurements will be folded into the overflow bucket
  (`{"otel.metric.overflow": true}`), losing granular attribute information and
  making attribute-based queries unreliable (see the sketch after this list).
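
To make the overflow behavior concrete, here is a minimal sketch of the folding
logic, assuming a counter whose attribute sets are capped at a fixed limit.
`CappedCounter` and `record` are illustrative names for this sketch, not
OpenTelemetry SDK types; the real aggregator is more involved.

```rust
use std::collections::HashMap;

/// Simplified model of a cardinality-capped counter aggregation:
/// each attribute set is a sorted list of key/value pairs.
struct CappedCounter {
    limit: usize,
    series: HashMap<Vec<(String, String)>, u64>,
}

impl CappedCounter {
    fn new(limit: usize) -> Self {
        Self { limit, series: HashMap::new() }
    }

    fn record(&mut self, value: u64, mut attrs: Vec<(String, String)>) {
        attrs.sort(); // canonical order so equal attribute sets compare equal
        // Once the limit is reached, measurements with *new* attribute sets
        // are folded into the overflow series and their attributes are lost.
        if !self.series.contains_key(&attrs) && self.series.len() >= self.limit {
            attrs = vec![("otel.metric.overflow".to_string(), "true".to_string())];
        }
        *self.series.entry(attrs).or_insert(0) += value;
    }
}

fn main() {
    let mut fruit_sales = CappedCounter::new(2);
    fruit_sales.record(1, vec![("name".into(), "apple".into())]);
    fruit_sales.record(2, vec![("name".into(), "lemon".into())]);
    // Third distinct attribute set exceeds the limit of 2 -> overflow bucket.
    fruit_sales.record(5, vec![("name".into(), "banana".into())]);
    println!("{:?}", fruit_sales.series);
}
```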

Consider these guidelines when determining the appropriate limit:

##### Choosing the Right Limit for Cumulative Temporality

Cumulative metrics retain every unique attribute combination that has *ever*
been observed since the start of the process.

* You must account for the theoretical maximum number of attribute combinations.
* This can be estimated by multiplying the number of possible values for each
  attribute.
* If certain attribute combinations are invalid or will never occur in practice,
  you can reduce the limit accordingly.

###### Example - Fruit Sales Scenario

Attributes:

* `name` can be "apple" or "lemon" (2 values)
* `color` can be "red", "yellow", or "green" (3 values)

The theoretical maximum is 2 × 3 = 6 unique attribute sets.

For this example, the simplest approach is to use the theoretical maximum and
**set the cardinality limit to 6**.

However, if you know that certain combinations will never occur (for example, if
"red lemons" don't exist in your application domain), you could reduce the limit
to only account for valid combinations. In this case, if only 5 combinations are
valid, **setting the cardinality limit to 5** would be more memory-efficient.
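
As a worked version of the arithmetic above, this small sketch (plain Rust, no
SDK calls; the function name is made up for illustration) multiplies the
per-attribute value counts to get the limit:

```rust
/// Estimate a cardinality limit for cumulative temporality as the product of
/// the number of possible values of each attribute.
fn theoretical_max(values_per_attribute: &[usize]) -> usize {
    values_per_attribute.iter().product()
}

fn main() {
    // Fruit sales scenario: `name` has 2 values, `color` has 3 values.
    let limit = theoretical_max(&[2, 3]);
    assert_eq!(limit, 6);

    // If impossible combinations (e.g. red lemons) are excluded, the limit can
    // be lowered to the number of valid combinations instead, here 5.
}
```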

##### Choosing the Right Limit for Delta Temporality

Delta metrics reset their aggregation state after every export interval. This
approach enables more efficient memory utilization by focusing only on attributes
observed during each interval rather than maintaining state for all combinations.

* **When attributes are low-cardinality** (as in the fruit example), use the
  same calculation method as with cumulative temporality.
* **When high-cardinality attribute(s) exist** like `user_id`, leverage Delta
  temporality's "forget state" nature to set a much lower limit based on active
  usage patterns. This is where Delta temporality truly excels - when the set of
  active values changes dynamically and only a small subset is active during any
  given interval.

###### Example - High Cardinality Attribute Scenario

Export interval: 60 sec

Attributes:

* `user_id` (up to 1 million unique users)
* `success` (true or false, 2 values)

Theoretical limit: 1 million users × 2 = 2 million attribute sets

But if only 10,000 users are typically active during a 60 sec export interval:
10,000 × 2 = 20,000

**You can set the limit to 20,000, dramatically reducing memory usage during
normal operation.**
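
The same estimate, sketched below for delta temporality, is based on the
attribute sets expected to be active in a single export interval rather than
the theoretical maximum (the function name is illustrative, not an SDK API):

```rust
/// Estimate a cardinality limit for delta temporality from the number of
/// attribute sets expected to be active within one export interval.
fn per_interval_limit(active_ids_per_interval: usize, other_values: &[usize]) -> usize {
    active_ids_per_interval * other_values.iter().product::<usize>()
}

fn main() {
    // ~10,000 active `user_id` values per 60 sec interval; `success` has 2 values.
    let limit = per_interval_limit(10_000, &[2]);
    assert_eq!(limit, 20_000); // versus the theoretical maximum of 2,000,000
}
```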

###### Export Interval Tuning

Shorter export intervals further reduce the required cardinality:

* If your interval is halved (e.g., from 60 sec to 30 sec), the number of unique
  attribute sets seen per interval may also be halved (a rough rescaling sketch
  follows the note below).

> [!NOTE]
> More frequent exports increase CPU/network overhead due to
> serialization and transmission costs.
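
A rough rescaling sketch, assuming activity is spread evenly over time (real
traffic rarely scales perfectly linearly, so treat the result as a starting
point rather than a guarantee):

```rust
/// Rescale a per-interval cardinality estimate to a different export interval,
/// assuming unique attribute sets arrive at a roughly constant rate.
fn rescale_limit(limit_at_base: usize, base_interval_secs: u64, new_interval_secs: u64) -> usize {
    (limit_at_base as u64 * new_interval_secs / base_interval_secs) as usize
}

fn main() {
    // 20,000 attribute sets per 60 sec interval -> roughly 10,000 per 30 sec.
    assert_eq!(rescale_limit(20_000, 60, 30), 10_000);
}
```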

##### Choosing the Right Limit - Backend Considerations

While delta temporality offers certain advantages for cardinality management,
your choice may be constrained by backend support:

* **Backend Restrictions:** Some metrics backends only support cumulative
  temporality. For example, Prometheus requires cumulative temporality and
  cannot directly consume delta metrics.
* **Collector Conversion:** To leverage delta temporality's memory advantages
  while maintaining backend compatibility, configure your SDK to use delta
  temporality and deploy an OpenTelemetry Collector with a delta-to-cumulative
  conversion processor. This approach pushes the memory overhead from your
  application to the collector, which can be more easily scaled and managed
  independently.

TODO: Add the memory cost incurred by each data point, so users can know the
memory impact of setting a higher limit.

TODO: Add example of how query can be affected when overflow occurs, use