Commit 09457f3

matej-gsquat
authored and committed
init
0 parents  commit 09457f3

84 files changed

+3539
-0
lines changed

1-globalview/courseBase.sh

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
#!/usr/bin/env bash

docker pull quay.io/prometheus/prometheus:v2.16.0
docker pull quay.io/thanos/thanos:v0.26.0

1-globalview/finish.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
# Summary
2+
3+
Congratulations! 🎉🎉🎉
4+
You completed our very first Thanos tutorial. Let's summarize what we learned:
5+
6+
* The most basic installation of Thanos, with Sidecars and a Querier, provides a global view for Prometheus queries.
7+
* Querier operates on the `StoreAPI` gRPC API. It does not need to know whether the backend is Prometheus, OpenTSDB, another Querier, or any other storage, as long as that API is implemented.
8+
* With Thanos you can (and it's recommended to do so!) run multi-replica Prometheus servers. The Thanos Querier `--query.replica-label` flag tells the Querier which label distinguishes replicas.
9+
* Sidecar can dynamically reload the Prometheus configuration as well as the recording and alerting rules in Prometheus.
10+
11+
See next courses for other tutorials about different deployment models and more advanced features of Thanos!
12+
13+
### Feedback
14+
15+
Did you spot a bug or typo in the tutorial, or do you have other feedback for us?
16+
Let us know on https://github.com/thanos-io/thanos or in the #thanos Slack channel linked from https://thanos.io

1-globalview/index.json

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
{
  "title": "Intro: Global View and seamless HA for Prometheus",
  "description": "Learn how to easily transform Prometheus into centralized, highly available monitoring using Thanos.",
  "difficulty": "Beginner",
  "time": "10-15 Minutes",
  "details": {
    "steps": [
      {
        "title": "Initial Prometheus Setup",
        "text": "step1.md",
        "verify": "step1-verify.sh",
        "answer": "step1-answer.md"
      },
      {
        "title": "Thanos Sidecars",
        "text": "step2.md",
        "verify": "step2-verify.sh"
      },
      {
        "title": "Thanos Querier",
        "text": "step3.md",
        "verify": "step3-verify.sh"
      }
    ],
    "intro": {
      "text": "intro.md",
      "courseData": "courseBase.sh",
      "credits": "https://thanos.io"
    },
    "finish": {
      "text": "finish.md",
      "credits": "test"
    }
  },
  "files": [
    "prometheus0_eu1.yml",
    "prometheus0_us1.yml",
    "prometheus1_us1.yml"
  ],
  "environment": {
    "uilayout": "editor-terminal",
    "uisettings": "yaml",
    "showdashboard": true,
    "dashboards": [
      {"name": "Prometheus 0 EU1", "port": 9090},
      {"name": "Prometheus 0 US1", "port": 9091},
      {"name": "Prometheus 1 US1", "port": 9092},
      {"name": "Thanos Query", "port": 29090}
    ]
  },
  "backend": {
    "imageid": "ubuntu"
  }
}

1-globalview/intro.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
# Welcome to the Thanos introduction!
2+
3+
[Thanos](https://thanos.io) is a set of components that can be composed into a highly available metric system with unlimited storage capacity.
4+
It can be added seamlessly on top of existing Prometheus deployments.
5+
6+
Thanos provides a global query view, data backup, and historical data access as its core features.
7+
All three features can be run independently of each other. This allows you to have a subset of Thanos features ready for
8+
immediate benefit or testing, while also making it flexible for gradual adoption in more complex environments.
9+
10+
Thanos works in cloud native environments like Kubernetes as well as in more traditional ones. However, this course uses Docker
11+
containers, which lets us use the pre-built Docker images available [here](https://quay.io/repository/thanos/thanos).
12+
13+
This tutorial takes us from vanilla Prometheus to a basic Thanos deployment that enables:
14+
15+
* Reliable querying of multiple Prometheus instances from a single [Prometheus API endpoint](https://prometheus.io/docs/prometheus/latest/querying/api/#expression-queries).
16+
* Seamless handling of highly available Prometheus (multiple replicas).
17+
18+
Let's jump in! 🤓
19+
20+
### Feedback
21+
22+
Did you spot a bug or typo in the tutorial, or do you have other feedback for us?
23+
Let us know on https://github.com/thanos-io/thanos or in the #thanos Slack channel linked from https://thanos.io
24+
25+
### Contributed by:
26+
27+
* Bartek [@bwplotka](https://bwplotka.dev/)

1-globalview/step1-answer.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
## Answer
2+
3+
**How many series (metrics) do we collect overall across all our Prometheus instances?**
4+
5+
How do we get this information? As you have probably guessed, it's not straightforward. With the current setup, the steps would be:
6+
7+
* Query <a href="{{TRAFFIC_HOST1_9090}}/graph?g0.range_input=1h&g0.expr=sum(prometheus_tsdb_head_series)&g0.tab=1&g1.range_input=5m&g1.expr=prometheus_tsdb_head_series&g1.tab=0">Prometheus-0 EU1</a> for `prometheus_tsdb_head_series`
8+
* Query <a href="{{TRAFFIC_HOST1_9091}}/graph?g0.range_input=1h&g0.expr=sum(prometheus_tsdb_head_series)&g0.tab=1&g1.range_input=5m&g1.expr=prometheus_tsdb_head_series&g1.tab=0">Prometheus-0 US1</a> or <a href="{{TRAFFIC_HOST1_9092}}/graph?g0.range_input=1h&g0.expr=sum(prometheus_tsdb_head_series)&g0.tab=1&g1.range_input=5m&g1.expr=prometheus_tsdb_head_series&g1.tab=0">Prometheus-1 US1</a> for `prometheus_tsdb_head_series`
9+
Both hold the same data (the number of series for each replica), so we just need to pick whichever one is available.
10+
* Sum both results manually.
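
The manual steps above can be sketched as a small shell helper. This is only an illustration, assuming `curl` and `sed` are available and that the instances listen on the ports used in this tutorial. The function names `extract_value`, `query_head_series` and `total_head_series` are made up for this example.

```shell
#!/usr/bin/env bash

# extract_value reads a Prometheus instant-query JSON response on stdin and
# prints the value of the sample it contains. (Hypothetical helper; a real
# script would more likely use a JSON parser such as jq.)
extract_value() {
  sed -n 's/.*"value":\[[^,]*,"\([0-9.]*\)"].*/\1/p'
}

# query_head_series asks one Prometheus instance (host:port) for
# sum(prometheus_tsdb_head_series) through its HTTP query API.
query_head_series() {
  curl -s --get "http://$1/api/v1/query" \
    --data-urlencode 'query=sum(prometheus_tsdb_head_series)' | extract_value
}

# total_head_series adds the result of one EU1 instance to the result of
# one US1 replica. Missing answers are treated as 0.
total_head_series() {
  local eu1 us1
  eu1=$(query_head_series "$1")
  us1=$(query_head_series "$2")
  echo $(( ${eu1:-0} + ${us1:-0} ))
}
```

For example, `total_head_series 172.17.0.1:9090 172.17.0.1:9091` would print the combined count. Having to script this by hand for every question is the inconvenience described here.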
11+
12+
As you can see, this is not very convenient for either humans or automation built on top of metrics (e.g. alerting).
13+
14+
The feature we are missing here is called **Global View** and it might be necessary once you scale out Prometheus to multiple instances.
15+
16+
Great! We now have 3 Prometheus instances running.
17+
18+
In the next steps, we will learn how to install Thanos on top of our initial Prometheus setup to solve the problems shown in the challenge.

1-globalview/step1-verify.sh

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
#!/usr/bin/env bash

curl -s 172.17.0.1:9090/metrics >/dev/null || exit 1
curl -s 172.17.0.1:9091/metrics >/dev/null || exit 1
curl -s 172.17.0.1:9092/metrics >/dev/null || exit 1

echo '"done"'

1-globalview/step1.md

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
1+
# Step 1 - Start initial Prometheus servers
2+
3+
Thanos is meant to scale and extend vanilla Prometheus. This means that you can gradually, without disruption, deploy Thanos on top of your existing Prometheus setup.
4+
5+
Let's start our tutorial by spinning up three Prometheus servers. Why three?
6+
The real advantage of Thanos shows when you need to scale out Prometheus beyond a single replica. Some reasons to scale out might be:
7+
8+
* Adding functional sharding because of high metric cardinality
9+
* Needing high availability for Prometheus, e.g. during rolling upgrades
10+
* Aggregating queries from multiple clusters
11+
12+
For this course, let's imagine the following situation:
13+
14+
![initial-case](https://docs.google.com/drawings/d/e/2PACX-1vQ5n5dAJSJPRXWA9INOViJJy9Ci6TUwlCrDv7_TtV9vE41rFOpg26V3jQv9gf1NQjVWSFyauG5XgzOW/pub?w=1061&h=604)
15+
16+
1. We have one Prometheus server in some `eu1` cluster.
17+
2. We have 2 replica Prometheus servers in some `us1` cluster that scrape the same targets.
18+
19+
Let's start this initial Prometheus setup for now.
20+
21+
## Prometheus Configuration Files
22+
23+
Now, we will prepare configuration files for all Prometheus instances.
24+
25+
Click on a code box to copy its contents.
26+
27+
Switch to the Editor tab, create a `prometheus0_eu1.yml` file, and paste the code below into it.
28+
29+
First, for the EU Prometheus server that scrapes itself:
30+
31+
```
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: eu1
    replica: 0

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['172.17.0.1:9090']
```{{copy}}
44+
45+
46+
For the second cluster, we set up two replicas:
47+
48+
Create a `prometheus0_us1.yml` file and paste the code below into it.
49+
50+
```
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: us1
    replica: 0

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['172.17.0.1:9091','172.17.0.1:9092']
```{{copy}}
63+
64+
65+
Create a `prometheus1_us1.yml` file and paste the code below into it.
66+
67+
```
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: us1
    replica: 1

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['172.17.0.1:9091','172.17.0.1:9092']
```{{copy}}
80+
81+
**NOTE**: Every Prometheus instance must have a globally unique set of identifying labels. These labels are important because they represent a certain "stream" of data (e.g. in the form of TSDB blocks). Compaction and downsampling are performed within those exact external labels, and the Querier relies on them to filter its Store APIs and to offer further sharding options, deduplication, and potential multi-tenancy capabilities. They are not easy to edit retroactively, so it's important to provide a compatible set of external labels up front so that Thanos can aggregate data across all the available instances.
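
As a rough illustration of the uniqueness requirement, the sketch below prints the external label set of each configuration file and reports any set that occurs more than once. It assumes the labels are named `cluster` and `replica` as in this tutorial and uses naive line-based parsing rather than a real YAML parser. The function names are hypothetical.

```shell
#!/usr/bin/env bash

# external_label_set prints "cluster=<c>,replica=<r>" for one Prometheus
# config file. Assumption: each label sits on its own "name: value" line.
external_label_set() {
  awk '
    $1 == "cluster:" { c = $2 }
    $1 == "replica:" { r = $2 }
    END { printf "cluster=%s,replica=%s\n", c, r }
  ' "$1"
}

# duplicate_label_sets prints every label set shared by more than one file.
# Empty output means all label sets are unique.
duplicate_label_sets() {
  local f
  for f in "$@"; do
    external_label_set "$f"
  done | sort | uniq -d
}
```

Running `duplicate_label_sets prometheus0_eu1.yml prometheus0_us1.yml prometheus1_us1.yml` against the three files from this step should print nothing, since each file carries a different `cluster`/`replica` combination.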
82+
83+
## Starting Prometheus Instances
84+
85+
Let's now start three containers representing our three different Prometheus instances.
86+
87+
Please note the extra flags we're passing to Prometheus:
88+
89+
* `--web.enable-admin-api` allows Thanos Sidecar to get metadata from Prometheus like `external labels`.
90+
* `--web.enable-lifecycle` allows Thanos Sidecar to reload Prometheus configuration and rule files if used.
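
To make the lifecycle flag more concrete, here is a minimal sketch of triggering a configuration reload by hand through Prometheus's `/-/reload` HTTP endpoint, which Prometheus only serves when `--web.enable-lifecycle` is set. The helper names `reload_url` and `reload_prometheus` are illustrative, and the address in the usage note assumes the EU1 instance from this tutorial.

```shell
#!/usr/bin/env bash

# reload_url builds the lifecycle reload endpoint for a Prometheus
# instance given as host:port.
reload_url() {
  printf 'http://%s/-/reload\n' "$1"
}

# reload_prometheus sends the reload request and prints the HTTP status
# code returned by Prometheus.
reload_prometheus() {
  curl -s -o /dev/null -w '%{http_code}\n' -X POST "$(reload_url "$1")"
}
```

Once the instance is running, `reload_prometheus 172.17.0.1:9090` would ask it to re-read its configuration file.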
91+
92+
Execute the following commands:
93+
94+
### Prepare "persistent volumes"
95+
96+
```
mkdir -p prometheus0_eu1_data prometheus0_us1_data prometheus1_us1_data
```{{execute}}
99+
100+
### Deploying "EU1"
101+
102+
```
docker run -d --net=host --rm \
    -v $(pwd)/prometheus0_eu1.yml:/etc/prometheus/prometheus.yml \
    -v $(pwd)/prometheus0_eu1_data:/prometheus \
    -u root \
    --name prometheus-0-eu1 \
    quay.io/prometheus/prometheus:v2.14.0 \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/prometheus \
    --web.listen-address=:9090 \
    --web.external-url={{TRAFFIC_HOST1_9090}} \
    --web.enable-lifecycle \
    --web.enable-admin-api && echo "Prometheus EU1 started!"
```{{execute}}
116+
117+
NOTE: We are using a recent Prometheus image so we can take advantage of the latest remote read protocol.
118+
119+
### Deploying "US1"
120+
121+
```
docker run -d --net=host --rm \
    -v $(pwd)/prometheus0_us1.yml:/etc/prometheus/prometheus.yml \
    -v $(pwd)/prometheus0_us1_data:/prometheus \
    -u root \
    --name prometheus-0-us1 \
    quay.io/prometheus/prometheus:v2.14.0 \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/prometheus \
    --web.listen-address=:9091 \
    --web.external-url={{TRAFFIC_HOST1_9091}} \
    --web.enable-lifecycle \
    --web.enable-admin-api && echo "Prometheus 0 US1 started!"
```{{execute}}
135+
136+
and
137+
138+
```
docker run -d --net=host --rm \
    -v $(pwd)/prometheus1_us1.yml:/etc/prometheus/prometheus.yml \
    -v $(pwd)/prometheus1_us1_data:/prometheus \
    -u root \
    --name prometheus-1-us1 \
    quay.io/prometheus/prometheus:v2.14.0 \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/prometheus \
    --web.listen-address=:9092 \
    --web.external-url={{TRAFFIC_HOST1_9092}} \
    --web.enable-lifecycle \
    --web.enable-admin-api && echo "Prometheus 1 US1 started!"
```{{execute}}
152+
153+
## Setup Verification
154+
155+
Once started you should be able to reach all of those Prometheus instances:
156+
157+
* [Prometheus-0 EU1]({{TRAFFIC_HOST1_9090}}/)
158+
* [Prometheus-0 US1]({{TRAFFIC_HOST1_9091}}/)
159+
* [Prometheus-1 US1]({{TRAFFIC_HOST1_9092}}/)
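
The same check can be done from the terminal. The sketch below probes each instance on Prometheus's `/-/ready` endpoint and reports which ones answer. The function names `probe` and `check_instances` are hypothetical, and the addresses in the usage note are the ones used in this tutorial.

```shell
#!/usr/bin/env bash

# probe succeeds if the Prometheus instance at host:port answers on /-/ready.
probe() {
  curl -sf "http://$1/-/ready" >/dev/null
}

# check_instances probes every given address, prints one status line per
# address, and returns non-zero if any instance was unreachable.
check_instances() {
  local failed=0 addr
  for addr in "$@"; do
    if probe "$addr"; then
      echo "OK   $addr"
    else
      echo "FAIL $addr"
      failed=1
    fi
  done
  return "$failed"
}
```

For this setup the call would be `check_instances 172.17.0.1:9090 172.17.0.1:9091 172.17.0.1:9092`.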
160+
161+
## Additional info
162+
163+
Why would one need multiple Prometheus instances?
164+
165+
* High Availability (multiple replicas)
166+
* Scaling ingestion: Functional Sharding
167+
* Multi cluster/environment architecture
168+
169+
## Problem statement: Global view challenge
170+
171+
Let's play with this setup a bit. You are free to query any metrics; however, let's try to fetch one specific piece of information from
172+
our multi-cluster setup: **How many series (metrics) do we collect overall across all our Prometheus instances?**
173+
174+
Tip: Look for the `prometheus_tsdb_head_series` metric.
175+
176+
🕵️‍♂️
177+
178+
Try to get this information from the current setup!
179+
180+
To see the answer to this question click SHOW SOLUTION below.
181+
182+
## Next
183+
184+
Great! We now have 3 Prometheus instances running.
185+
186+
In the next steps, we will learn how to install Thanos on top of our initial Prometheus setup to solve the problems shown in the challenge.

1-globalview/step2-verify.sh

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/usr/bin/env bash

curl -s 172.17.0.1:9090/metrics >/dev/null || exit 1
curl -s 172.17.0.1:9091/metrics >/dev/null || exit 1
curl -s 172.17.0.1:9092/metrics >/dev/null || exit 1

curl -s 172.17.0.1:19090/metrics >/dev/null || exit 1
curl -s 172.17.0.1:19091/metrics >/dev/null || exit 1
curl -s 172.17.0.1:19092/metrics >/dev/null || exit 1

echo '"done"'
