Commit 1f05706

brettimus (Brett Beutell) and Brett Beutell authored
Add build_info metric and use it in generated queries (#35)
* Add blurb to readme about identifying commits
* Remove "coming soon" from readme item on adding links to live Prom charts
* Initialize Prometheus Gauge for build_info
* Add updown counter for build info to otel tracker
* Implement set_build_info for OTEL and Prom, and call when we set the default tracker
* Move set_build_info call into create_tracker
* Update prometheus queries
* Update prometheus URL tests
* Add test for build_info gauge for prometheus tracker (skipped test for otel tracker)
* Update otel tracker and tracker tests after finding otel prometheus bug
* Ensure set_build_info is only called once
* Update changelog
* Add set_build_info to the TrackMetrics Protocol
* Fix build_info query based off of autometrics-dev/autometrics-shared#8
* Rename create_tracker to init_tracker
* Update pyright
* Update README to mention OpenTelemetry tracker does not work with build_info

---------

Co-authored-by: Brett Beutell <[email protected]>
1 parent d157981 commit 1f05706

11 files changed: +153 -21 lines changed

CHANGELOG.md

+1
@@ -12,6 +12,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 ### Added
 
+- Support for build_info metrics in Prometheus (#35)
 - OpenTelemetry Support (#28)
 - Fly.io example (#26)
 - Django example (#22)

README.md

+20-2
@@ -17,7 +17,8 @@ See [Why Autometrics?](https://github.com/autometrics-dev#why-autometrics) for m
 most useful metrics
 - 💡 Writes Prometheus queries so you can understand the data generated without
 knowing PromQL
-- 🔗 Create links to live Prometheus charts directly into each functions docstrings (with tooltips coming soon!)
+- 🔗 Create links to live Prometheus charts directly into each function's docstring
+- [🔍 Identify commits](#identifying-commits-that-introduced-problems) that introduced errors or increased latency
 - [🚨 Define alerts](#alerts--slos) using SLO best practices directly in your source code
 - [📊 Grafana dashboards](#dashboards) work out of the box to visualize the performance of instrumented functions & SLOs
 - [⚙️ Configurable](#metrics-libraries) metric collection library (`opentelemetry`, `prometheus`, or `metrics`)
@@ -112,7 +113,22 @@ def api_handler():
 Configure the crate that autometrics will use to produce metrics by using one of the following feature flags:
 
 - `opentelemetry` - (enabled by default, can also be explicitly set using the AUTOMETRICS_TRACKER="OPEN_TELEMETERY" env var) uses
-- `prometheus` -(using the AUTOMETRICS_TRACKER env var set to "PROMETHEUS")
+- `prometheus` - (using the AUTOMETRICS_TRACKER env var set to "PROMETHEUS")
+
+## Identifying commits that introduced problems
+
+> **NOTE** - As of writing, `build_info` will not work correctly when using the default tracker (`AUTOMETRICS_TRACKER=OPEN_TELEMETRY`).
+> This will be fixed once the following PR is merged on the opentelemetry-python project: https://github.com/open-telemetry/opentelemetry-python/pull/3306
+>
+> autometrics-py will track support for build_info using the OpenTelemetry tracker via #38
+
+Autometrics makes it easy to identify if a specific version or commit introduced errors or increased latencies.
+
+It uses a separate metric (`build_info`) to track the version and, optionally, git commit of your service. It then writes queries that group metrics by the `version` and `commit` labels so you can spot correlations between those and potential issues.
+
+The `version` is read from the `AUTOMETRICS_VERSION` environment variable, and the `commit` value uses the environment variable `AUTOMETRICS_COMMIT`.
+
+This follows the method outlined in [Exposing the software version to Prometheus](https://www.robustperception.io/exposing-the-software-version-to-prometheus/).
 
 ## Development of the package
 
@@ -149,4 +165,6 @@ poetry run black .
 poetry run pyright
 # Run the tests using pytest
 poetry run pytest
+# Run a single test, and clear the cache
+poetry run pytest --cache-clear -k test_tracker
 ```

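To make the info-metric pattern from the README section concrete: the value of `build_info` is a constant 1, and the useful data lives in its `commit` and `version` labels. The sketch below is illustrative only. `render_build_info` is a hypothetical helper, not part of autometrics; it formats a sample in the Prometheus text exposition style that the tracker test in this commit asserts on.

```python
def render_build_info(commit: str, version: str) -> str:
    """Format one build_info sample line (hypothetical helper, for illustration).

    The value is always 1; the labels carry the actual information.
    """
    # Doubled braces produce literal "{" and "}" inside an f-string.
    return f'build_info{{commit="{commit}",version="{version}"}} 1.0'


sample = render_build_info("d6abce3", "1.0.1")
print(sample)
```

Because the value never changes, the metric is cheap to store, and its labels can be attached to other series at query time with a label join.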
poetry.lock

+4-4
Some generated files are not rendered by default.

pyproject.toml

+1-1
@@ -27,7 +27,7 @@ typing-extensions = "^4.5.0"
 optional = true
 
 [tool.poetry.group.dev.dependencies]
-pyright = "^1.1.302"
+pyright = "^1.1.307"
 pytest = "^7.3.0"
 pytest-asyncio = "^0.21.0"
 black = "^23.3.0"

src/autometrics/constants.py

+7
@@ -2,17 +2,24 @@
 
 COUNTER_NAME = "function.calls.count"
 HISTOGRAM_NAME = "function.calls.duration"
+# NOTE - The Rust implementation does not use `build.info`, instead opts for just `build_info`
+BUILD_INFO_NAME = "build_info"
 
 COUNTER_NAME_PROMETHEUS = COUNTER_NAME.replace(".", "_")
 HISTOGRAM_NAME_PROMETHEUS = HISTOGRAM_NAME.replace(".", "_")
 
 COUNTER_DESCRIPTION = "Autometrics counter for tracking function calls"
 HISTOGRAM_DESCRIPTION = "Autometrics histogram for tracking function call duration"
+BUILD_INFO_DESCRIPTION = (
+    "Autometrics info metric for tracking software version and build details"
+)
 
 # The following constants are used to create the labels
 OBJECTIVE_NAME = "objective.name"
 OBJECTIVE_PERCENTILE = "objective.percentile"
 OBJECTIVE_LATENCY_THRESHOLD = "objective.latency_threshold"
+VERSION_KEY = "version"
+COMMIT_KEY = "commit"
 
 # The values are updated to use underscores instead of periods to avoid issues with prometheus.
 # A similar thing is done in the rust library, which supports multiple exporters

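As a small illustration of why the `*_PROMETHEUS` constants swap periods for underscores, and why `build_info` needs no such variant: Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`, so a dotted name is not valid as-is. The helper names below are hypothetical and exist only for this sketch.

```python
import re

# Pattern for valid Prometheus metric names; "." is not an allowed character.
PROMETHEUS_NAME_PATTERN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def to_prometheus_name(name: str) -> str:
    """Replace periods with underscores, mirroring the *_PROMETHEUS constants."""
    return name.replace(".", "_")


def is_valid_prometheus_name(name: str) -> bool:
    """Return True when the whole name matches the Prometheus pattern."""
    return PROMETHEUS_NAME_PATTERN.fullmatch(name) is not None


dotted = "function.calls.count"
converted = to_prometheus_name(dotted)
print(converted, is_valid_prometheus_name(dotted), is_valid_prometheus_name(converted))
```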
src/autometrics/prometheus_url.py

+5-3
@@ -3,6 +3,8 @@
 from typing import Optional
 from dotenv import load_dotenv
 
+ADD_BUILD_INFO_LABELS = "* on (instance, job) group_left(version, commit) (last_over_time(build_info[1s]) or on (instance, job) up)"
+
 
 def cleanup_url(url: str) -> str:
     """Remove the trailing slash if there is one."""
@@ -26,9 +28,9 @@ def __init__(
 
     def create_urls(self):
         """Create the prometheus query urls for the function and module."""
-        request_rate_query = f'sum by (function, module) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}"}}[5m]))'
-        latency_query = f'sum by (le, function, module) (rate(function_calls_duration_bucket{{function="{self.function_name}",module="{self.module_name}"}}[5m]))'
-        error_ratio_query = f'sum by (function, module) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}", result="error"}}[5m])) / {request_rate_query}'
+        request_rate_query = f'sum by (function, module, commit, version) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}"}}[5m]) {ADD_BUILD_INFO_LABELS})'
+        latency_query = f'sum by (le, function, module, commit, version) (rate(function_calls_duration_bucket{{function="{self.function_name}",module="{self.module_name}"}}[5m]) {ADD_BUILD_INFO_LABELS})'
+        error_ratio_query = f'sum by (function, module, commit, version) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}", result="error"}}[5m]) {ADD_BUILD_INFO_LABELS}) / {request_rate_query}'
 
         queries = {
             "Request rate URL": request_rate_query,

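The `ADD_BUILD_INFO_LABELS` fragment can be hard to read at a glance. The following is a simplified plain-Python model of what it does, not how Prometheus evaluates the expression: each rate sample is matched to a `build_info` sample on `(instance, job)`, multiplied by its value (1), and given that sample's `version` and `commit` labels. When a target has no `build_info`, the `or ... up` branch supplies a fallback so the series is not dropped. `join_build_info` and `sum_by` are hypothetical names, and the sample data is made up.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def join_build_info(
    rates: List[Sample], build_info: List[Sample], up: List[Sample]
) -> List[Sample]:
    """Model `rate * on (instance, job) group_left(version, commit) (build_info or up)`."""
    right: Dict[Tuple[str, str], Sample] = {}
    # `up` is the fallback; build_info overwrites it where both exist.
    for labels, value in up + build_info:
        right[(labels["instance"], labels["job"])] = (labels, value)

    joined: List[Sample] = []
    for labels, value in rates:
        key = (labels["instance"], labels["job"])
        if key not in right:
            continue  # no match on either side: the sample is dropped
        r_labels, r_value = right[key]
        out = dict(labels)
        # group_left copies these labels over from the right-hand side.
        out["version"] = r_labels.get("version", "")
        out["commit"] = r_labels.get("commit", "")
        joined.append((out, value * r_value))
    return joined


def sum_by(samples: List[Sample], keys: Tuple[str, ...]) -> Dict[Tuple[str, ...], float]:
    """Model `sum by (...)`: add up values that share the given labels."""
    totals: Dict[Tuple[str, ...], float] = {}
    for labels, value in samples:
        group = tuple(labels.get(k, "") for k in keys)
        totals[group] = totals.get(group, 0.0) + value
    return totals


rates = [
    ({"function": "f", "module": "m", "instance": "a:1", "job": "api"}, 2.0),
    ({"function": "f", "module": "m", "instance": "b:1", "job": "api"}, 3.0),
]
build_info = [
    ({"instance": "a:1", "job": "api", "commit": "d6abce3", "version": "1.0.1"}, 1.0)
]
up = [
    ({"instance": "a:1", "job": "api"}, 1.0),
    ({"instance": "b:1", "job": "api"}, 1.0),
]
totals = sum_by(
    join_build_info(rates, build_info, up),
    ("function", "module", "commit", "version"),
)
print(totals)
```

In this model the instance without `build_info` still contributes a series, just with empty `commit` and `version` labels.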
src/autometrics/test_prometheus_url.py

+5-4
@@ -24,11 +24,12 @@ def test_create_prometheus_url_with_default_url(default_url_generator: Generator
 def test_create_urls_with_default_url(default_url_generator: Generator):
     urls = default_url_generator.create_urls()
 
-    # print(urls.keys())
+    print(urls)
+
     result = {
-        "Request rate URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%29&g0.tab=0",
-        "Latency URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28le%2C%20function%2C%20module%29%20%28rate%28function_calls_duration_bucket%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%29&g0.tab=0",
-        "Error Ratio URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%2C%20result%3D%22error%22%7D%5B5m%5D%29%29%20/%20sum%20by%20%28function%2C%20module%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%29&g0.tab=0",
+        "Request rate URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%2C%20commit%2C%20version%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20%28last_over_time%28build_info%5B1s%5D%29%20or%20on%20%28instance%2C%20job%29%20up%29%29&g0.tab=0",
+        "Latency URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28le%2C%20function%2C%20module%2C%20commit%2C%20version%29%20%28rate%28function_calls_duration_bucket%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20%28last_over_time%28build_info%5B1s%5D%29%20or%20on%20%28instance%2C%20job%29%20up%29%29&g0.tab=0",
+        "Error Ratio URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%2C%20commit%2C%20version%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%2C%20result%3D%22error%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20%28last_over_time%28build_info%5B1s%5D%29%20or%20on%20%28instance%2C%20job%29%20up%29%29%20/%20sum%20by%20%28function%2C%20module%2C%20commit%2C%20version%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20%28last_over_time%28build_info%5B1s%5D%29%20or%20on%20%28instance%2C%20job%29%20up%29%29&g0.tab=0",
     }
     assert result == urls
 

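The expected strings in this test are hard to review by eye. One way to sanity-check them is to rebuild a URL from the readable query. The sketch below does that for the request-rate query. The diff does not show how the generator encodes the query, so the use of `urllib.parse.quote` here is an assumption; it is consistent with the expected strings (spaces as `%20`, `*` as `%2A`, `/` left as-is). `request_rate_url` is a hypothetical function name.

```python
from urllib.parse import quote, unquote

ADD_BUILD_INFO_LABELS = (
    "* on (instance, job) group_left(version, commit) "
    "(last_over_time(build_info[1s]) or on (instance, job) up)"
)


def request_rate_url(
    function_name: str, module_name: str, base_url: str = "http://localhost:9090"
) -> str:
    """Build a Prometheus graph URL for the request-rate query (illustrative)."""
    query = (
        "sum by (function, module, commit, version) "
        f'(rate (function_calls_count_total{{function="{function_name}",'
        f'module="{module_name}"}}[5m]) '
        f"{ADD_BUILD_INFO_LABELS})"
    )
    # Assumption: percent-encode everything except "/" and unreserved characters.
    return f"{base_url}/graph?g0.expr={quote(query)}&g0.tab=0"


url = request_rate_url("myFunction", "myModule")
# Decoding the expression again recovers the readable PromQL.
decoded = unquote(url.split("g0.expr=")[1].split("&g0.tab")[0])
print(decoded)
```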
src/autometrics/tracker/opentelemetry.py

+20
@@ -4,6 +4,7 @@
     Meter,
     Counter,
     Histogram,
+    UpDownCounter,
     set_meter_provider,
 )
 
@@ -21,6 +22,8 @@
     COUNTER_NAME,
     HISTOGRAM_DESCRIPTION,
     HISTOGRAM_NAME,
+    BUILD_INFO_NAME,
+    BUILD_INFO_DESCRIPTION,
     OBJECTIVE_NAME,
     OBJECTIVE_PERCENTILE,
     OBJECTIVE_LATENCY_THRESHOLD,
@@ -39,6 +42,7 @@ class OpenTelemetryTracker:
 
     __counter_instance: Counter
     __histogram_instance: Histogram
+    __up_down_counter_instance: UpDownCounter
 
     def __init__(self):
         exporter = PrometheusMetricReader("")
@@ -60,6 +64,11 @@ def __init__(self):
             name=HISTOGRAM_NAME,
             description=HISTOGRAM_DESCRIPTION,
         )
+        self.__up_down_counter_instance = meter.create_up_down_counter(
+            name=BUILD_INFO_NAME,
+            description=BUILD_INFO_DESCRIPTION,
+        )
+        self._has_set_build_info = False
 
     def __count(
         self,
@@ -116,6 +125,17 @@ def __histogram(
             },
         )
 
+    def set_build_info(self, commit: str, version: str):
+        if not self._has_set_build_info:
+            self._has_set_build_info = True
+            self.__up_down_counter_instance.add(
+                1.0,
+                attributes={
+                    "commit": commit,
+                    "version": version,
+                },
+            )
+
     def finish(
         self,
         start_time: float,

src/autometrics/tracker/prometheus.py

+16-1
@@ -1,16 +1,20 @@
 import time
 from typing import Optional
-from prometheus_client import Counter, Histogram
+from prometheus_client import Counter, Histogram, Gauge
 from .tracker import Result
 
 from ..constants import (
     COUNTER_NAME_PROMETHEUS,
     HISTOGRAM_NAME_PROMETHEUS,
+    BUILD_INFO_NAME,
     COUNTER_DESCRIPTION,
     HISTOGRAM_DESCRIPTION,
+    BUILD_INFO_DESCRIPTION,
     OBJECTIVE_NAME_PROMETHEUS,
     OBJECTIVE_PERCENTILE_PROMETHEUS,
     OBJECTIVE_LATENCY_THRESHOLD_PROMETHEUS,
+    COMMIT_KEY,
+    VERSION_KEY,
 )
 from ..objectives import Objective
 
@@ -41,6 +45,12 @@ class PrometheusTracker:
             OBJECTIVE_LATENCY_THRESHOLD_PROMETHEUS,
         ],
     )
+    prom_gauge = Gauge(
+        BUILD_INFO_NAME, BUILD_INFO_DESCRIPTION, [COMMIT_KEY, VERSION_KEY]
+    )
+
+    def __init__(self) -> None:
+        self._has_set_build_info = False
 
     def _count(
         self,
@@ -93,6 +103,11 @@ def _histogram(
             threshold,
         ).observe(duration)
 
+    def set_build_info(self, commit: str, version: str):
+        if not self._has_set_build_info:
+            self._has_set_build_info = True
+            self.prom_gauge.labels(commit, version).set(1)
+
     # def start(self, function: str = None, module: str = None):
     #     """Start tracking metrics for a function call."""
     #     pass

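Both trackers guard `set_build_info` with a `_has_set_build_info` flag so that only the first call has an effect. The sketch below isolates that guard without any metrics library. `RecordingTracker` is a hypothetical stand-in that appends to a list where the real trackers would set a gauge or add to a counter.

```python
from typing import List, Tuple


class RecordingTracker:
    """Stand-in tracker that records build info instead of emitting a metric."""

    def __init__(self) -> None:
        self._has_set_build_info = False
        self.recorded: List[Tuple[str, str]] = []

    def set_build_info(self, commit: str, version: str) -> None:
        # Only the first call has an effect; later calls are ignored.
        if not self._has_set_build_info:
            self._has_set_build_info = True
            self.recorded.append((commit, version))


tracker = RecordingTracker()
tracker.set_build_info("d6abce3", "1.0.1")
tracker.set_build_info("ffffff0", "2.0.0")  # ignored by the guard
print(tracker.recorded)
```

One design point worth noting from the hunk above: the flag is stored per instance, while `prom_gauge` is a class attribute, so the guard applies to each tracker instance rather than to the process as a whole.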
src/autometrics/tracker/test_tracker.py

+56-1
@@ -1,7 +1,10 @@
+from prometheus_client.exposition import generate_latest
+import pytest
+
 from .opentelemetry import OpenTelemetryTracker
 from .prometheus import PrometheusTracker
 
-from .tracker import default_tracker
+from .tracker import default_tracker, init_tracker, TrackerType
 
 
 def test_default_tracker(monkeypatch):
@@ -22,3 +25,55 @@ def test_default_tracker(monkeypatch):
     monkeypatch.setenv("AUTOMETRICS_TRACKER", "something_else")
     tracker = default_tracker()
     assert isinstance(tracker, OpenTelemetryTracker)
+
+
+def test_init_prometheus_tracker_set_build_info(monkeypatch):
+    """Test that init_tracker (for a Prometheus tracker) calls set_build_info using env vars."""
+
+    commit = "d6abce3"
+    version = "1.0.1"
+
+    monkeypatch.setenv("AUTOMETRICS_COMMIT", commit)
+    monkeypatch.setenv("AUTOMETRICS_VERSION", version)
+
+    prom_tracker = init_tracker(TrackerType.PROMETHEUS)
+    assert isinstance(prom_tracker, PrometheusTracker)
+
+    blob = generate_latest()
+    assert blob is not None
+    data = blob.decode("utf-8")
+
+    prom_build_info = f"""build_info{{commit="{commit}",version="{version}"}} 1.0"""
+    assert prom_build_info in data
+
+    monkeypatch.delenv("AUTOMETRICS_VERSION", raising=False)
+    monkeypatch.delenv("AUTOMETRICS_COMMIT", raising=False)
+
+
+def test_init_otel_tracker_set_build_info(monkeypatch):
+    """
+    Test that init_tracker (for an OTEL tracker) calls set_build_info using env vars.
+    Note that the OTEL collector translates metrics to Prometheus.
+    """
+    pytest.skip(
+        "Skipping test because OTEL collector does not create a gauge when it translates UpDownCounter to Prometheus"
+    )
+
+    commit = "a29a178"
+    version = "0.0.1"
+
+    monkeypatch.setenv("AUTOMETRICS_COMMIT", commit)
+    monkeypatch.setenv("AUTOMETRICS_VERSION", version)
+
+    otel_tracker = init_tracker(TrackerType.OPENTELEMETRY)
+    assert isinstance(otel_tracker, OpenTelemetryTracker)
+
+    blob = generate_latest()
+    assert blob is not None
+    data = blob.decode("utf-8")
+
+    prom_build_info = f"""build_info{{commit="{commit}",version="{version}"}} 1.0"""
+    assert prom_build_info in data
+
+    monkeypatch.delenv("AUTOMETRICS_VERSION", raising=False)
+    monkeypatch.delenv("AUTOMETRICS_COMMIT", raising=False)

src/autometrics/tracker/tracker.py

+18-5
@@ -16,6 +16,9 @@ class Result(Enum):
 class TrackMetrics(Protocol):
     """Protocol for tracking metrics."""
 
+    def set_build_info(self, commit: str, version: str):
+        """Observe the build info. Should only be called once per tracker instance"""
+
     def finish(
         self,
         start_time: float,
@@ -35,18 +38,28 @@ class TrackerType(Enum):
     PROMETHEUS = "prometheus"
 
 
-def create_tracker(tracker_type: TrackerType) -> TrackMetrics:
+def init_tracker(tracker_type: TrackerType) -> TrackMetrics:
     """Create a tracker"""
+
+    tracker_instance: TrackMetrics
     if tracker_type == TrackerType.OPENTELEMETRY:
         # pylint: disable=import-outside-toplevel
         from .opentelemetry import OpenTelemetryTracker
 
-        return OpenTelemetryTracker()
+        tracker_instance = OpenTelemetryTracker()
     elif tracker_type == TrackerType.PROMETHEUS:
         # pylint: disable=import-outside-toplevel
         from .prometheus import PrometheusTracker
 
-        return PrometheusTracker()
+        tracker_instance = PrometheusTracker()
+
+    # NOTE - Only set the build info when the tracker is initialized
+    tracker_instance.set_build_info(
+        commit=os.getenv("AUTOMETRICS_COMMIT") or "",
+        version=os.getenv("AUTOMETRICS_VERSION") or "",
+    )
+
+    return tracker_instance
 
 
 def get_tracker_type() -> TrackerType:
@@ -60,7 +73,7 @@ def get_tracker_type() -> TrackerType:
 def default_tracker():
     """Setup the default tracker."""
     preferred_tracker = get_tracker_type()
-    return create_tracker(preferred_tracker)
+    return init_tracker(preferred_tracker)
 
 
 tracker: TrackMetrics = default_tracker()
@@ -74,4 +87,4 @@ def get_tracker() -> TrackMetrics:
 def set_tracker(tracker_type: TrackerType):
     """Set the tracker type."""
     global tracker
-    tracker = create_tracker(tracker_type)
+    tracker = init_tracker(tracker_type)

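The reshaped `init_tracker` follows a construct-then-initialize flow: build the tracker, set the build info from the environment once, then return it. Below is a minimal, library-free sketch of that flow. `StubTracker` and `init_stub_tracker` are hypothetical names used only for illustration, and the environment is passed in as a mapping so the behaviour is easy to check.

```python
import os
from typing import Dict, Mapping, Optional


class StubTracker:
    """Hypothetical stand-in for a real tracker; stores what it was given."""

    def __init__(self) -> None:
        self.build_info: Optional[Dict[str, str]] = None

    def set_build_info(self, commit: str, version: str) -> None:
        self.build_info = {"commit": commit, "version": version}


def init_stub_tracker(env: Mapping[str, str] = os.environ) -> StubTracker:
    """Construct a tracker, then set its build info once from the environment."""
    tracker = StubTracker()
    tracker.set_build_info(
        # `or ""` maps both an unset and an empty variable to "".
        commit=env.get("AUTOMETRICS_COMMIT") or "",
        version=env.get("AUTOMETRICS_VERSION") or "",
    )
    return tracker


stub = init_stub_tracker({"AUTOMETRICS_COMMIT": "a29a178"})
print(stub.build_info)
```

With this shape, a service that sets neither variable still gets a `build_info` sample, with empty `commit` and `version` labels.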