Commit 37a6677

reggeenrqu1queee authored and committed
enabled metrics-collector for ICL
1 parent a148d1f commit 37a6677

13 files changed, +665 -8 lines changed

metrics-collector/README.md

Lines changed: 79 additions & 2 deletions
@@ -2,6 +2,8 @@
 
 Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds
 
+![Dashboard overview](./images/icl-dashboard-overview.png)
+
 ## Installation
 
 ### Capture metrics every n seconds
@@ -17,11 +19,11 @@ $ ibmcloud ce job create \
     --wait
 ```
 
-* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 10 seconds
+* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 30 seconds
 ```
 $ ibmcloud ce jobrun submit \
     --job metrics-collector \
-    --env INTERVAL=10
+    --env INTERVAL=30
 ```
 
 
@@ -57,6 +59,81 @@ One can use the environment variable `COLLECT_DISKUSAGE=true` to also collect th
 
 Once your IBM Cloud Code Engine project has detected a corresponding IBM Cloud Logs instance, which is configured to receive platform logs, you can consume the resource metrics in IBM Cloud Logs. Use the filter `metric:instance-resources` to filter for log lines that print resource metrics for each detected IBM Cloud Code Engine instance that is running in a project.
 
+### Custom dashboard
+
+Follow the steps below to create a custom dashboard in your IBM Cloud Logs instance to gain insights into resource consumption metrics.
+
+![Dashboard overview](./images/icl-dashboard-overview.png)
+
+**Setup instructions:**
+
+* Navigate to the "Custom dashboards" view, hover over the "New" button, and click "Import dashboard"
+
+![New dashboard](./images/icl-dashboard-new.png)
+
+* In the "Import" modal, select the file [./setup/dashboard-code_engine_resource_consumption_metrics.json](./setup/dashboard-code_engine_resource_consumption_metrics.json) located in this repository, and click "Import"
+
+![Import modal](./images/icl-dashboard-import.png)
+
+* Confirm the import by clicking "Import" again
+
+![Import confirmation](./images/icl-dashboard-import-confirm.png)
+
+
+### Logs view
+
+Follow the steps below to create a Logs view in your IBM Cloud Logs instance that allows you to drill into individual instance-resources log lines.
+
+![Logs overview](./images/icl-logs-view-overview.png)
+
+**Setup instructions:**
+
+* Show only the log lines that contain collected resource metrics by filtering with the following query
+```
+app:"codeengine" AND message.metric:"instance-resources"
+```
+
+![Query](./images/icl-logs-view-query.png)
+
+* In the left bar, click "Add Filter" and add the following filters
+  * `Application`
+  * `App`
+  * `Label.Project`
+  * `Message.Component_name`
+
+![Filters](./images/icl-logs-view-filters.png)
+
+* In the top-right corner, click on "Columns" and configure the following columns:
+  * `Timestamp`
+  * `label.Project`
+  * `message.component_type`
+  * `message.component_name`
+  * `message.message`
+  * `Text`
+
+![Columns](./images/icl-logs-view-columns.png)
+
+* Once applied, adjust the column widths appropriately
+
+* In the top-right corner, select `1-line` as view mode
+
+![View](./images/icl-logs-view-mode.png)
+
+* The graph title says "**Count** all grouped by **Severity**". Click on `Severity` and select `message.component_name` instead. Furthermore, select `Max` as the aggregation metric and choose `message.memory.usage` as the aggregation field
+
+![Graph](./images/icl-logs-view-graph.png)
+
+* Save the view
+
+![Save](./images/icl-logs-view-save.png)
+
+* Use the custom Logs view to drill into the resource metrics of individual instances
+
+![Logs overview](./images/icl-logs-view-overview.png)
+
+
+## IBM Log Analysis setup (deprecated)
+
 ### Log lines
 
 Along with a human readable message, like `Captured metrics of app instance 'load-generator-00001-deployment-677d5b7754-ktcf6': 3m vCPU, 109 MB memory, 50 MB ephemeral storage`, each log line passes specific resource utilization details in a structured way allowing to apply advanced filters on them.

metrics-collector/main.go

Lines changed: 13 additions & 6 deletions
@@ -34,9 +34,12 @@ func main() {
 	}
 
 	// If the 'INTERVAL' env var is set then sleep for that many seconds
-	sleepDuration := 10
+	sleepDuration := 30
 	if t := os.Getenv("INTERVAL"); t != "" {
 		sleepDuration, _ = strconv.Atoi(t)
+		if sleepDuration < 30 {
+			sleepDuration = 30
+		}
 	}
 
 	// In daemon mode, collect resource metrics in an endless loop
@@ -111,10 +114,10 @@ func collectInstanceMetrics() {
 
 	// fetches all pods
 	pods := getAllPods(coreClientset, namespace, config)
-
+
 	// fetch all pod metrics
 	podMetrics := getAllPodMetrics(namespace, config)
-
+
 	var wg sync.WaitGroup
 
 	for _, metric := range *podMetrics {
@@ -258,7 +261,7 @@ func getAllPods(coreClientset *kubernetes.Clientset, namespace string, config *r
 
 // Helper function to retrieve all pods from the Kube API
 func obtainDiskUsage(coreClientset *kubernetes.Clientset, namespace string, pod string, container string, config *rest.Config) float64 {
-
+
 	// per default, we do not collect disk space statistics
 	if os.Getenv("COLLECT_DISKUSAGE") != "true" {
 		return 0
@@ -304,12 +307,16 @@ func obtainDiskUsage(coreClientset *kubernetes.Clientset, namespace string, pod
 
 		// Render captured system error messages, in case the stdout stream did not receive any valid content
 		if err != nil {
-			fmt.Println("obtainDiskUsage of pod:" + pod + "/container:" + container + " failed with a stream err - " + err.Error() + " - stderr: '" + errBuf.String() + "'")
+			if err.Error() == "Internal error occurred: failed calling webhook \"validating.webhook.pod-exec-auth-check.codeengine.cloud.ibm.com\": failed to call webhook: Post \"https://validating-webhook-serving.ibm-cfn-system.svc:443/validate/pod-exec?timeout=5s\": EOF" {
+				// Do nothing and silently ignore this issue as it is most likely related to pod terminations
+			} else {
+				fmt.Println("obtainDiskUsage of pod:" + pod + "/container:" + container + " failed with a stream err - " + err.Error() + " - stderr: '" + errBuf.String() + "'")
+			}
 		}
 
 		return float64(0)
 	}
-
+
 	// Parse the output "4000 /" by splitting the words
 	diskUsageOutput := strings.Fields(strings.TrimSuffix(diskUsageOutputStr, "\n"))
 	if len(diskUsageOutput) > 2 {
