
Commit dc534b7

feat: implement observability log alertgroups
1 parent 592425e commit dc534b7

15 files changed: +1860 -13 lines changed


docs/data-sources/observability_alertgroup.md

+2 -2
@@ -3,12 +3,12 @@
 page_title: "stackit_observability_alertgroup Data Source - stackit"
 subcategory: ""
 description: |-
-  Observability alert group resource schema. Must have a region specified in the provider configuration.
+  Observability alert group datasource schema. Used to create alerts based on metrics (Thanos). Must have a region specified in the provider configuration.
 ---
 
 # stackit_observability_alertgroup (Data Source)
 
-Observability alert group resource schema. Must have a `region` specified in the provider configuration.
+Observability alert group datasource schema. Used to create alerts based on metrics (Thanos). Must have a `region` specified in the provider configuration.
 
 ## Example Usage
 
@@ -0,0 +1,47 @@
+---
+# generated by https://github.com/hashicorp/terraform-plugin-docs
+page_title: "stackit_observability_logalertgroup Data Source - stackit"
+subcategory: ""
+description: |-
+  Observability log alert group datasource schema. Used to create alerts based on logs (Loki). Must have a region specified in the provider configuration.
+---
+
+# stackit_observability_logalertgroup (Data Source)
+
+Observability log alert group datasource schema. Used to create alerts based on logs (Loki). Must have a `region` specified in the provider configuration.
+
+## Example Usage
+
+```terraform
+data "stackit_observability_logalertgroup" "example" {
+  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  instance_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  name        = "example-log-alert-group"
+}
+```
+
+<!-- schema generated by tfplugindocs -->
+## Schema
+
+### Required
+
+- `instance_id` (String) Observability instance ID to which the log alert group is associated.
+- `name` (String) The name of the log alert group. Is the identifier and must be unique in the group.
+- `project_id` (String) STACKIT project ID to which the log alert group is associated.
+
+### Read-Only
+
+- `id` (String) Terraform's internal resource ID. It is structured as "`project_id`,`instance_id`,`name`".
+- `interval` (String) Specifies the frequency at which rules within the group are evaluated. The interval must be at least 60 seconds and defaults to 60 seconds if not set. Supported formats include hours, minutes, and seconds, either singly or in combination. Examples of valid formats are: '5h30m40s', '5h', '5h30m', '60m', and '60s'.
+- `rules` (Attributes List) (see [below for nested schema](#nestedatt--rules))
+
+<a id="nestedatt--rules"></a>
+### Nested Schema for `rules`
+
+Read-Only:
+
+- `alert` (String) The name of the alert rule. Is the identifier and must be unique in the group.
+- `annotations` (Map of String) A map of key:value. Annotations to add or overwrite for each alert
+- `expression` (String) The LogQL expression to evaluate. Every evaluation cycle this is evaluated at the current time, and all resultant time series become pending/firing alerts.
+- `for` (String) Alerts are considered firing once they have been returned for this long. Alerts which have not yet fired for long enough are considered pending. Default is 0s
+- `labels` (Map of String) A map of key:value. Labels to add or overwrite for each alert
docs/guides/ske_log_alerts.md

+199
@@ -0,0 +1,199 @@
+---
+page_title: "SKE Log Alerts with STACKIT Observability"
+---
+# SKE Log Alerts with STACKIT Observability
+
+## Overview
+
+This guide walks you through setting up log-based alerting in STACKIT Observability using Promtail to ship Kubernetes logs.
+
+1. **Set Up Providers**
+
+Begin by configuring the STACKIT and Kubernetes providers to connect to the STACKIT services.
+
+```hcl
+provider "stackit" {
+  region = "eu01"
+}
+
+provider "kubernetes" {
+  host                   = yamldecode(stackit_ske_kubeconfig.example.kube_config).clusters.0.cluster.server
+  client_certificate     = base64decode(yamldecode(stackit_ske_kubeconfig.example.kube_config).users.0.user.client-certificate-data)
+  client_key             = base64decode(yamldecode(stackit_ske_kubeconfig.example.kube_config).users.0.user.client-key-data)
+  cluster_ca_certificate = base64decode(yamldecode(stackit_ske_kubeconfig.example.kube_config).clusters.0.cluster.certificate-authority-data)
+}
+
+provider "helm" {
+  kubernetes {
+    host                   = yamldecode(stackit_ske_kubeconfig.example.kube_config).clusters.0.cluster.server
+    client_certificate     = base64decode(yamldecode(stackit_ske_kubeconfig.example.kube_config).users.0.user.client-certificate-data)
+    client_key             = base64decode(yamldecode(stackit_ske_kubeconfig.example.kube_config).users.0.user.client-key-data)
+    cluster_ca_certificate = base64decode(yamldecode(stackit_ske_kubeconfig.example.kube_config).clusters.0.cluster.certificate-authority-data)
+  }
+}
+```
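
Reviewer sketch (not part of the commit): the provider blocks in step 1 decode the same kubeconfig eight times. One possible simplification, assuming the kubeconfig contains exactly one cluster and one user entry (as the `.0` indices in the guide imply), is to decode it once into a local value. The local name `kubeconfig` is hypothetical.

```hcl
# Hypothetical refactor: decode the kubeconfig once and reuse the result.
locals {
  kubeconfig = yamldecode(stackit_ske_kubeconfig.example.kube_config)
}

provider "kubernetes" {
  host                   = local.kubeconfig.clusters[0].cluster.server
  client_certificate     = base64decode(local.kubeconfig.users[0].user["client-certificate-data"])
  client_key             = base64decode(local.kubeconfig.users[0].user["client-key-data"])
  cluster_ca_certificate = base64decode(local.kubeconfig.clusters[0].cluster["certificate-authority-data"])
}
```

The `helm` provider's nested `kubernetes` block could reuse the same local in the same way. The bracket syntax for the hyphenated keys is a readability choice; the dotted form used in the guide addresses the same values.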
+
+2. **Create SKE Cluster and Kubeconfig Resource**
+
+Set up a STACKIT SKE Cluster and generate the associated kubeconfig resource.
+
+```hcl
+resource "stackit_ske_cluster" "example" {
+  project_id         = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  name               = "example"
+  kubernetes_version = "1.31"
+  node_pools = [
+    {
+      name               = "standard"
+      machine_type       = "c1.4"
+      minimum            = "3"
+      maximum            = "9"
+      max_surge          = "3"
+      availability_zones = ["eu01-1", "eu01-2", "eu01-3"]
+      os_version_min     = "4081.2.1"
+      os_name            = "flatcar"
+      volume_size        = 32
+      volume_type        = "storage_premium_perf6"
+    }
+  ]
+  maintenance = {
+    enable_kubernetes_version_updates    = true
+    enable_machine_image_version_updates = true
+    start                                = "01:00:00Z"
+    end                                  = "02:00:00Z"
+  }
+}
+
+resource "stackit_ske_kubeconfig" "example" {
+  project_id   = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  cluster_name = stackit_ske_cluster.example.name
+  refresh      = true
+}
+```
+
+3. **Create Observability Instance and Credentials**
+
+Establish a STACKIT Observability instance and its credentials to handle alerts.
+
+```hcl
+locals {
+  alert_config = {
+    route = {
+      receiver        = "EmailStackit",
+      repeat_interval = "1m",
+      continue        = true
+    }
+    receivers = [
+      {
+        name = "EmailStackit",
+        email_configs = [
+          {
+            to = "<email>"
+          }
+        ]
+      }
+    ]
+  }
+}
+
+resource "stackit_observability_instance" "example" {
+  project_id   = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  name         = "example"
+  plan_name    = "Observability-Large-EU01"
+  alert_config = local.alert_config
+}
+
+resource "stackit_observability_credential" "example" {
+  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  instance_id = stackit_observability_instance.example.instance_id
+}
+```
+
+4. **Install Promtail**
+
+Deploy Promtail via Helm to collect logs and forward them to the STACKIT Observability Loki endpoint.
+
+```hcl
+resource "helm_release" "promtail" {
+  name       = "promtail"
+  repository = "https://grafana.github.io/helm-charts"
+  chart      = "promtail"
+  namespace  = kubernetes_namespace.monitoring.metadata.0.name
+  version    = "6.16.4"
+
+  values = [
+    <<-EOF
+    config:
+      clients:
+        # Loki push url is available in the dashboard
+        - url: "https://${stackit_observability_credential.example.username}:${stackit_observability_credential.example.password}@<your-loki-push-url>/instances/${stackit_observability_instance.example.instance_id}/loki/api/v1/push"
+    EOF
+  ]
+}
+```
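
Reviewer sketch (not part of the commit): the `helm_release` in step 4 references `kubernetes_namespace.monitoring`, but the guide as committed only declares `kubernetes_namespace.example` (in step 6). A minimal declaration that would satisfy the reference could look like the following; the namespace name `monitoring` is an assumption inferred from the resource label.

```hcl
# Assumed definition for the namespace referenced by helm_release.promtail.
# The guide does not declare this resource; the name below is hypothetical.
resource "kubernetes_namespace" "monitoring" {
  metadata {
    name = "monitoring"
  }
}
```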
+
+5. **Create Alert Group**
+
+Create a log alert that triggers when a specific pod logs an error message.
+
+```hcl
+resource "stackit_observability_logalertgroup" "example" {
+  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  instance_id = stackit_observability_instance.example.instance_id
+  name        = "TestLogAlertGroup"
+  interval    = "60m"
+  rules = [
+    {
+      alert      = "SimplePodLogAlertCheck"
+      expression = "sum(rate({namespace=\"example\", pod=\"logger\"} |= \"Simulated error message\" [1m])) > 0"
+      for        = "60s"
+      labels = {
+        severity = "critical"
+      },
+      annotations = {
+        summary : "Test Log Alert is working"
+        description : "Test Log Alert"
+      },
+    },
+  ]
+}
+```
+
+6. **Deploy Test Pod**
+
+Launch a pod that emits simulated error logs. This should trigger the alert if everything is set up correctly.
+
+```hcl
+resource "kubernetes_namespace" "example" {
+  metadata {
+    name = "example"
+  }
+}
+
+resource "kubernetes_pod" "logger" {
+  metadata {
+    name      = "logger"
+    namespace = kubernetes_namespace.example.metadata[0].name
+    labels = {
+      app = "logger"
+    }
+  }
+
+  spec {
+    container {
+      name  = "logger"
+      image = "bash"
+      command = [
+        "bash",
+        "-c",
+        <<EOF
+while true; do
+  sleep $(shuf -i 1-3 -n 1)
+  echo "ERROR: $(date) - Simulated error message $(shuf -i 1-100 -n 1)" 1>&2
+done
+EOF
+      ]
+    }
+  }
+}
+```

docs/resources/observability_alertgroup.md

+2 -2
@@ -3,12 +3,12 @@
 page_title: "stackit_observability_alertgroup Resource - stackit"
 subcategory: ""
 description: |-
-  Observability alert group resource schema. Must have a region specified in the provider configuration.
+  Observability alert group resource schema. Used to create alerts based on metrics (Thanos). Must have a region specified in the provider configuration.
 ---
 
 # stackit_observability_alertgroup (Resource)
 
-Observability alert group resource schema. Must have a `region` specified in the provider configuration.
+Observability alert group resource schema. Used to create alerts based on metrics (Thanos). Must have a `region` specified in the provider configuration.
 
 ## Example Usage
 
@@ -0,0 +1,80 @@
+---
+# generated by https://github.com/hashicorp/terraform-plugin-docs
+page_title: "stackit_observability_logalertgroup Resource - stackit"
+subcategory: ""
+description: |-
+  Observability log alert group resource schema. Used to create alerts based on logs (Loki). Must have a region specified in the provider configuration.
+---
+
+# stackit_observability_logalertgroup (Resource)
+
+Observability log alert group resource schema. Used to create alerts based on logs (Loki). Must have a `region` specified in the provider configuration.
+
+## Example Usage
+
+```terraform
+resource "stackit_observability_logalertgroup" "example" {
+  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  instance_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  name        = "example-log-alert-group"
+  interval    = "60m"
+  rules = [
+    {
+      alert      = "example-log-alert-name"
+      expression = "sum(rate({namespace=\"example\", pod=\"logger\"} |= \"Simulated error message\" [1m])) > 0"
+      for        = "60s"
+      labels = {
+        severity = "critical"
+      },
+      annotations = {
+        summary : "example summary"
+        description : "example description"
+      }
+    },
+    {
+      alert      = "example-log-alert-name-2"
+      expression = "sum(rate({namespace=\"example\", pod=\"logger\"} |= \"Another error message\" [1m])) > 0"
+      for        = "60s"
+      labels = {
+        severity = "critical"
+      },
+      annotations = {
+        summary : "example summary"
+        description : "example description"
+      }
+    },
+  ]
+}
+```
+
+<!-- schema generated by tfplugindocs -->
+## Schema
+
+### Required
+
+- `instance_id` (String) Observability instance ID to which the log alert group is associated.
+- `name` (String) The name of the log alert group. Is the identifier and must be unique in the group.
+- `project_id` (String) STACKIT project ID to which the log alert group is associated.
+- `rules` (Attributes List) Rules for the log alert group (see [below for nested schema](#nestedatt--rules))
+
+### Optional
+
+- `interval` (String) Specifies the frequency at which rules within the group are evaluated. The interval must be at least 60 seconds and defaults to 60 seconds if not set. Supported formats include hours, minutes, and seconds, either singly or in combination. Examples of valid formats are: '5h30m40s', '5h', '5h30m', '60m', and '60s'.
+
+### Read-Only
+
+- `id` (String) Terraform's internal resource ID. It is structured as "`project_id`,`instance_id`,`name`".
+
+<a id="nestedatt--rules"></a>
+### Nested Schema for `rules`
+
+Required:
+
+- `alert` (String) The name of the alert rule. Is the identifier and must be unique in the group.
+- `expression` (String) The LogQL expression to evaluate. Every evaluation cycle this is evaluated at the current time, and all resultant time series become pending/firing alerts.
+
+Optional:
+
+- `annotations` (Map of String) A map of key:value. Annotations to add or overwrite for each alert
+- `for` (String) Alerts are considered firing once they have been returned for this long. Alerts which have not yet fired for long enough are considered pending. Default is 0s
+- `labels` (Map of String) A map of key:value. Labels to add or overwrite for each alert
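
Reviewer sketch (not part of the commit): the two rules in the example usage differ only in alert name and matched message, so the `rules` list could plausibly be generated from a map with a `for` expression. The local `error_patterns`, the resource label `generated`, and the group name are hypothetical. Whether the API accepts these alert names is an assumption modeled on the names used in the example. `interval` is omitted here, which according to the schema falls back to the 60-second default.

```hcl
locals {
  # Hypothetical map: alert name => log line fragment to match.
  error_patterns = {
    "example-log-alert-name"   = "Simulated error message"
    "example-log-alert-name-2" = "Another error message"
  }
}

resource "stackit_observability_logalertgroup" "generated" {
  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  instance_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  name        = "generated-log-alert-group"

  # One rule object per map entry; attribute names follow the nested schema.
  rules = [
    for alert_name, pattern in local.error_patterns : {
      alert      = alert_name
      expression = "sum(rate({namespace=\"example\", pod=\"logger\"} |= \"${pattern}\" [1m])) > 0"
      for        = "60s"
      labels = {
        severity = "critical"
      }
      annotations = {
        summary     = "Log alert ${alert_name}"
        description = "Matched log lines containing: ${pattern}"
      }
    }
  ]
}
```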
@@ -0,0 +1,5 @@
+data "stackit_observability_logalertgroup" "example" {
+  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  instance_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  name        = "example-log-alert-group"
+}
@@ -0,0 +1,32 @@
+resource "stackit_observability_logalertgroup" "example" {
+  project_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  instance_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  name        = "example-log-alert-group"
+  interval    = "60m"
+  rules = [
+    {
+      alert      = "example-log-alert-name"
+      expression = "sum(rate({namespace=\"example\", pod=\"logger\"} |= \"Simulated error message\" [1m])) > 0"
+      for        = "60s"
+      labels = {
+        severity = "critical"
+      },
+      annotations = {
+        summary : "example summary"
+        description : "example description"
+      }
+    },
+    {
+      alert      = "example-log-alert-name-2"
+      expression = "sum(rate({namespace=\"example\", pod=\"logger\"} |= \"Another error message\" [1m])) > 0"
+      for        = "60s"
+      labels = {
+        severity = "critical"
+      },
+      annotations = {
+        summary : "example summary"
+        description : "example description"
+      }
+    },
+  ]
+}
