Commit f896e3e

Add prometheus and alertmanager steps to 2.0 and 2.1 k8s tutorial
1 parent cf844c3 commit f896e3e

14 files changed, +559 -135 lines

@@ -0,0 +1,189 @@

Despite CockroachDB's various [built-in safeguards against failure](high-availability.html), it is critical to actively monitor the overall health and performance of a cluster running in production and to create alerting rules that promptly send notifications when there are events that require investigation or intervention.

### Configure Prometheus

Every node of a CockroachDB cluster exports granular timeseries metrics formatted for easy integration with [Prometheus](https://prometheus.io/), an open source tool for storing, aggregating, and querying timeseries data. This section shows you how to orchestrate Prometheus as part of your Kubernetes cluster and pull these metrics into Prometheus for external monitoring.
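
Each node serves these metrics as plain text in the Prometheus exposition format at the `/_status/vars` endpoint. The output has one `name{labels} value` sample per line, plus `# HELP` and `# TYPE` comment lines. The sketch below is minimal and makes two assumptions: that you have port-forwarded the Admin UI to `localhost:8080`, and that no label value contains a space. The helper `metric_value` is hypothetical and prints the value of a single metric from that output:

~~~ shell
# metric_value is a hypothetical helper, not part of CockroachDB.
# It reads Prometheus text-format metrics on stdin and prints the value of
# every sample named $1, with or without a {label} set.
# Assumes label values contain no spaces, so the value is field 2.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                                   # skip HELP/TYPE comment lines
    $1 == name || index($1, name "{") == 1 { print $2 }
  '
}

# Usage, with the Admin UI port-forwarded to localhost:8080:
#   curl -s http://localhost:8080/_status/vars | metric_value sys_uptime
~~~

Filtering with `awk` keeps the helper free of extra dependencies. For anything beyond spot checks, querying Prometheus itself is the better tool.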

This guidance is based on [CoreOS's Prometheus Operator](https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md), which allows a Prometheus instance to be managed using native Kubernetes concepts.
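
In practice, "native Kubernetes concepts" means custom resource types in the `monitoring.coreos.com` API group. Objects created in the steps below, such as `servicemonitor "cockroachdb"`, `prometheus "cockroachdb"`, and `prometheusrule "prometheus-cockroachdb-rules"`, are instances of those types. The sketch below checks which of these types your cluster knows about. It assumes `kubectl` is pointed at your cluster and the operator is already running (step 3 below). The function name is hypothetical:

~~~ shell
# list_monitoring_crds is a hypothetical helper. It prints the names of the
# custom resource definitions that belong to the monitoring.coreos.com
# API group, e.g. prometheuses.monitoring.coreos.com.
list_monitoring_crds() {
  kubectl get crd -o name \
    | sed 's|^.*/||' \
    | grep '\.monitoring\.coreos\.com$'
}

# Usage:
#   list_monitoring_crds
~~~

On a cluster where the operator is running, expect entries such as `prometheuses`, `servicemonitors`, `alertmanagers`, and `prometheusrules` under that group.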

<section class="filter-content" markdown="1" data-scope="gke-hosted">
{{site.data.alerts.callout_info}}
Before starting, make sure the email address associated with your Google Cloud account is part of the `cluster-admin` RBAC group, as shown in [Step 1. Start Kubernetes](#step-1-start-kubernetes).
{{site.data.alerts.end}}
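
One way to confirm this from the command line is `kubectl auth can-i`, which answers `yes` or `no` for a given verb and resource. The steps below create ClusterRoles and ClusterRoleBindings, so the check uses `create clusterrolebindings`. This is a minimal sketch and the wrapper name is hypothetical:

~~~ shell
# can_bind_cluster_roles is a hypothetical wrapper around `kubectl auth can-i`.
# It returns exit status 0 only if the current user may create
# ClusterRoleBindings, which the Prometheus steps in this section require.
can_bind_cluster_roles() {
  [ "$(kubectl auth can-i create clusterrolebindings)" = "yes" ]
}

# Usage:
#   can_bind_cluster_roles && echo "RBAC OK" || echo "revisit Step 1"
~~~
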

</section>

1. From your local workstation, edit the `cockroachdb` service to add the `prometheus: cockroachdb` label:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl label svc cockroachdb prometheus=cockroachdb
    ~~~

    ~~~
    service "cockroachdb" labeled
    ~~~

    This ensures that there is a Prometheus job and monitoring data only for the `cockroachdb` service, not for the `cockroachdb-public` service.

2. Install [CoreOS's Prometheus Operator](https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.20/bundle.yaml):

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.20/bundle.yaml
    ~~~

    ~~~
    clusterrolebinding "prometheus-operator" created
    clusterrole "prometheus-operator" created
    serviceaccount "prometheus-operator" created
    deployment "prometheus-operator" created
    ~~~

3. Confirm that the `prometheus-operator` has started:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl get deploy prometheus-operator
    ~~~

    ~~~
    NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    prometheus-operator   1         1         1            1           1m
    ~~~

4. Use our [`prometheus.yaml`](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/prometheus/prometheus.yaml) file to create the various objects necessary to run a Prometheus instance:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/prometheus.yaml
    ~~~

    ~~~
    clusterrole "prometheus" created
    clusterrolebinding "prometheus" created
    servicemonitor "cockroachdb" created
    prometheus "cockroachdb" created
    ~~~

5. Access the Prometheus UI locally and verify that CockroachDB is feeding data into Prometheus:

    1. Port-forward from your local machine to the pod running Prometheus:

        {% include copy-clipboard.html %}
        ~~~ shell
        $ kubectl port-forward prometheus-cockroachdb-0 9090
        ~~~

    2. Go to [http://localhost:9090](http://localhost:9090) in your browser.

    3. To verify that each CockroachDB node is connected to Prometheus, go to **Status > Targets**. The screen should look like this:

        <img src="{{ 'images/v2.1/kubernetes-prometheus-targets.png' | relative_url }}" alt="Prometheus targets" style="border:1px solid #eee;max-width:100%" />

    4. To verify that data is being collected, go to **Graph**, enter the `sys_uptime` metric in the field, click **Execute**, and then click the **Graph** tab. The screen should look like this:

        <img src="{{ 'images/v2.1/kubernetes-prometheus-graph.png' | relative_url }}" alt="Prometheus graph" style="border:1px solid #eee;max-width:100%" />

    {{site.data.alerts.callout_success}}
    Prometheus auto-completes CockroachDB time series metrics for you, but if you want to see a full listing, with descriptions, port-forward as described in {% if page.secure == true %}[Access the Admin UI](#step-6-access-the-admin-ui){% else %}[Access the Admin UI](#step-5-access-the-admin-ui){% endif %} and then point your browser to [http://localhost:8080/_status/vars](http://localhost:8080/_status/vars).

    For more details on using the Prometheus UI, see the [official Prometheus documentation](https://prometheus.io/docs/introduction/getting_started/).
    {{site.data.alerts.end}}

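
You can also script the **Graph** check from step 5 against the Prometheus HTTP API, which serves instant queries at `/api/v1/query`. The sketch below makes these assumptions:

- The Prometheus port-forward is still running on `localhost:9090`.
- Each node reports one `sys_uptime` series.

The function name is hypothetical. It counts matches in the raw JSON rather than parsing it, so a JSON tool such as `jq` would be more robust.

~~~ shell
# count_uptime_series is a hypothetical helper. It runs an instant query for
# sys_uptime against the Prometheus HTTP API and prints how many series came
# back, which should match the number of CockroachDB nodes being scraped.
# Assumes Prometheus is reachable at $PROM_URL (default http://localhost:9090).
count_uptime_series() {
  local prom_url=${PROM_URL:-http://localhost:9090}
  curl -sG --data-urlencode 'query=sys_uptime' "$prom_url/api/v1/query" \
    | grep -o '"__name__":"sys_uptime"' \
    | wc -l \
    | tr -d ' '
}

# Usage (on a 3-node cluster this should typically print 3):
#   count_uptime_series
~~~
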
### Configure Alertmanager

Active monitoring helps you spot problems early, but it is also essential to send notifications when there are events that require investigation or intervention. This section shows you how to use [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) and CockroachDB's starter [alerting rules](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/prometheus/alert-rules.yaml) to do this.
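
An Alertmanager configuration pairs a routing tree with one or more receivers. The sketch below writes a minimal configuration that sends every alert to a single webhook. It is not the contents of our `alertmanager-config.yaml`: the receiver name `default`, the webhook URL, and the helper function are all hypothetical placeholders. Email, Slack, PagerDuty, and other receiver types are described in the Alertmanager configuration documentation linked in step 2 below.

~~~ shell
# write_alertmanager_config is a hypothetical helper. It writes a minimal
# Alertmanager configuration to the file named in $2 that routes every
# alert to one webhook receiver at the URL given in $1.
write_alertmanager_config() {
  cat > "$2" <<EOF
route:
  receiver: default
  group_by: ['alertname']
receivers:
- name: default
  webhook_configs:
  - url: '$1'
EOF
}

# Usage (placeholder URL):
#   write_alertmanager_config https://example.com/alert-hook alertmanager-config.yaml
~~~
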

1. Download our <a href="https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/alertmanager-config.yaml" download><code>alertmanager-config.yaml</code></a> configuration file.

2. Edit the `alertmanager-config.yaml` file to [specify the desired receivers for notifications](https://prometheus.io/docs/alerting/configuration/). Initially, the file contains a dummy webhook.

3. Add this configuration to the Kubernetes cluster as a secret, renaming it to `alertmanager.yaml` and labeling it to make it easier to find:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl create secret generic alertmanager-cockroachdb --from-file=alertmanager.yaml=alertmanager-config.yaml
    ~~~

    ~~~
    secret "alertmanager-cockroachdb" created
    ~~~

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl label secret alertmanager-cockroachdb app=cockroachdb
    ~~~

    ~~~
    secret "alertmanager-cockroachdb" labeled
    ~~~

    {{site.data.alerts.callout_danger}}
    The name of the secret, `alertmanager-cockroachdb`, must match the name used in the `alertmanager.yaml` manifest applied in the next step. That manifest defines an Alertmanager named `cockroachdb`, and the Prometheus Operator looks for its configuration in a secret named `alertmanager-` followed by that name. If they differ, the Alertmanager instance will start without configuration, and nothing will happen.
    {{site.data.alerts.end}}

4. Use our [`alertmanager.yaml`](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/prometheus/alertmanager.yaml) file to create the various objects necessary to run an Alertmanager instance, including a ClusterIP service so that Prometheus can forward alerts:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/alertmanager.yaml
    ~~~

    ~~~
    alertmanager "cockroachdb" created
    service "alertmanager-cockroachdb" created
    ~~~

5. Verify that Alertmanager is running:

    1. Port-forward from your local machine to the pod running Alertmanager:

        {% include copy-clipboard.html %}
        ~~~ shell
        $ kubectl port-forward alertmanager-cockroachdb-0 9093
        ~~~

    2. Go to [http://localhost:9093](http://localhost:9093) in your browser. The screen should look like this:

        <img src="{{ 'images/v2.1/kubernetes-alertmanager-home.png' | relative_url }}" alt="Alertmanager" style="border:1px solid #eee;max-width:100%" />

6. Ensure that the Alertmanagers are visible to Prometheus by opening [http://localhost:9090/status](http://localhost:9090/status). If you stopped the Prometheus port-forward from the previous section, restart it first. The screen should look like this:

    <img src="{{ 'images/v2.1/kubernetes-prometheus-alertmanagers.png' | relative_url }}" alt="Alertmanagers in Prometheus" style="border:1px solid #eee;max-width:100%" />

7. Add CockroachDB's starter [alerting rules](https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/prometheus/alert-rules.yaml):

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/prometheus/alert-rules.yaml
    ~~~

    ~~~
    prometheusrule "prometheus-cockroachdb-rules" created
    ~~~

8. Ensure that the rules are visible to Prometheus by opening [http://localhost:9090/rules](http://localhost:9090/rules). The screen should look like this:

    <img src="{{ 'images/v2.1/kubernetes-prometheus-alertrules.png' | relative_url }}" alt="Alert rules in Prometheus" style="border:1px solid #eee;max-width:100%" />

9. Verify that the example alert, `TestAlertManager`, is firing by opening [http://localhost:9090/alerts](http://localhost:9090/alerts). The screen should look like this:

    <img src="{{ 'images/v2.1/kubernetes-prometheus-alerts.png' | relative_url }}" alt="Alerts in Prometheus" style="border:1px solid #eee;max-width:100%" />

10. To remove the example alert:

    1. Use the `kubectl edit` command to open the rules for editing:

        {% include copy-clipboard.html %}
        ~~~ shell
        $ kubectl edit prometheusrules prometheus-cockroachdb-rules
        ~~~

    2. Remove the `dummy.rules` block and save the file:

        ~~~
        - name: rules/dummy.rules
          rules:
          - alert: TestAlertManager
            expr: vector(1)
        ~~~
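
After you save the edit, you can confirm from the command line that `TestAlertManager` has stopped firing by asking Prometheus for its active alerts. This minimal sketch assumes the following:

- Your Prometheus release serves the `/api/v1/alerts` endpoint. Older 2.x releases may not.
- The port-forward to `localhost:9090` is still running.

The function name is hypothetical, and it extracts alert names with `grep` instead of a JSON parser.

~~~ shell
# active_alert_names is a hypothetical helper. It lists the distinct
# alertname labels of all active (pending or firing) alerts reported by
# Prometheus at $PROM_URL (default http://localhost:9090).
active_alert_names() {
  local prom_url=${PROM_URL:-http://localhost:9090}
  curl -s "$prom_url/api/v1/alerts" \
    | grep -o '"alertname":"[^"]*"' \
    | sed 's/^"alertname":"//; s/"$//' \
    | sort -u
}

# Usage: shortly after dummy.rules is removed and the rules are reloaded,
# TestAlertManager should no longer appear in the output.
#   active_alert_names
~~~
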

_includes/v2.0/orchestration/start-kubernetes.md (+9 -10)

~~~~ diff
@@ -1,4 +1,4 @@
-## Step 1. Choose your deployment environment
+## Step 1. Start Kubernetes
 
 Choose whether you want to orchestrate CockroachDB with Kubernetes using the hosted Google Kubernetes Engine (GKE) service or manually on Google Compute Engine (GCE) or AWS. The instructions below will change slightly depending on your choice.
 
@@ -8,15 +8,15 @@ Choose whether you want to orchestrate CockroachDB with Kubernetes using the hos
 <button class="filter-button" data-scope="aws-manual">Manual AWS</button>
 </div>
 
-## Step 2. Start Kubernetes
-
 <section class="filter-content" markdown="1" data-scope="gke-hosted">
 
 1. Complete the **Before You Begin** steps described in the [Google Kubernetes Engine Quickstart](https://cloud.google.com/kubernetes-engine/docs/quickstart) documentation.
 
 This includes installing `gcloud`, which is used to create and delete Kubernetes Engine clusters, and `kubectl`, which is the command-line tool used to manage Kubernetes from your workstation.
 
-{{site.data.alerts.callout_success}}The documentation offers the choice of using Google's Cloud Shell product or using a local shell on your machine. Choose to use a local shell if you want to be able to view the CockroachDB Admin UI using the steps in this guide.{{site.data.alerts.end}}
+{{site.data.alerts.callout_success}}
+The documentation offers the choice of using Google's Cloud Shell product or using a local shell on your machine. Choose to use a local shell if you want to be able to view the CockroachDB Admin UI using the steps in this guide.
+{{site.data.alerts.end}}
 
 2. From your local workstation, start the Kubernetes cluster:
 
@@ -33,8 +33,6 @@ Choose whether you want to orchestrate CockroachDB with Kubernetes using the hos
 
 The process can take a few minutes, so don't move on to the next step until you see a `Creating cluster cockroachdb...done` message and details about your cluster.
 
-{% if page.secure == true %}
-
 3. Get the email address associated with your Google Cloud account:
 
 {% include copy-clipboard.html %}
@@ -46,20 +44,21 @@ Choose whether you want to orchestrate CockroachDB with Kubernetes using the hos
 Account: [your.google.cloud.email@example.org]
 ~~~
 
+{{site.data.alerts.callout_danger}}
+This command returns your email address in all lowercase. However, in the next step, you must enter the address using the accurate capitalization. For example, if your address is YourName@example.com, you must use YourName@example.com and not yourname@example.com.
+{{site.data.alerts.end}}
+
 4. [Create the RBAC roles](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#prerequisites_for_using_role-based_access_control) CockroachDB needs for running on GKE, using the address from the previous step:
 
 {% include copy-clipboard.html %}
 ~~~ shell
-$ kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org>
+$ kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org>
 ~~~
 
 ~~~
 clusterrolebinding "cluster-admin-binding" created
 ~~~
 
-
-{% endif %}
-
 </section>
 
 <section class="filter-content" markdown="1" data-scope="gce-manual">
~~~~
