Uyuni Health Check Tool Disconnected Solution #9322

Draft: wants to merge 54 commits into master
Changes from 20 commits
Commits
9ac0bcc Initial Uyuni Health Check tool commit (ycedres, Oct 4, 2024)
3fece3d Refactor show stats and errors (ycedres, Oct 10, 2024)
5f681ed Fix pagination of full error logs (ycedres, Oct 10, 2024)
ba22374 Avoid breaking if no errors are found (ycedres, Oct 10, 2024)
b1363b7 Fix error in exporter initialization (ycedres, Oct 11, 2024)
db1d93f Parametrize default grafana time range on startup (ycedres, Oct 18, 2024)
831967c Wait for loki to ingest all jobs (ycedres, Oct 18, 2024)
0d7ed47 Add dashboard for showing error logs (ycedres, Oct 18, 2024)
6acd46e Add static metrics to display in Grafana (ycedres, Oct 29, 2024)
52a29a0 Wait for Promtail to finish parsing log files (ycedres, Nov 6, 2024)
c0d852b Refactor static metrics in supportconfig exporter (ycedres, Nov 7, 2024)
8ae98d4 Clean up unused files and relocation (ycedres, Nov 11, 2024)
e72852d Fix container startup (m-czernek, Nov 28, 2024)
0fc710b Update exporter to serve static data (m-czernek, Dec 5, 2024)
2e2977e Modify the Grafana dashboard to display new static data (m-czernek, Dec 5, 2024)
cd06e22 Fix main codepath and style (m-czernek, Dec 5, 2024)
d59d329 Document a way to execute health check without hacking pythonpath (m-czernek, Dec 6, 2024)
2fbed83 Upgrade promtail to fix memory leak (m-czernek, Dec 9, 2024)
07b0457 Expose CPU count property (m-czernek, Dec 10, 2024)
c2f7e23 Include first alert (m-czernek, Dec 10, 2024)
9b1e570 Add Salt-perf alerts (m-czernek, Dec 12, 2024)
9e3d682 Parse journalctl in promtail (m-czernek, Dec 12, 2024)
fa07385 Add info about memory and fs layout (m-czernek, Jan 3, 2025)
c62bf57 Add RAM table and alerts, display disk layout (m-czernek, Jan 8, 2025)
4eb24c6 Add further alerts (m-czernek, Jan 9, 2025)
6046b31 Provide additional alerts (m-czernek, Jan 13, 2025)
b93ad14 provide more alerts and parse reposync logs (m-czernek, Jan 13, 2025)
abaaa5b Add alerts when disk mount is out of space or has insufficient size (m-czernek, Jan 20, 2025)
ecc3e8c Refactor the config usage (m-czernek, Jan 24, 2025)
d5adc88 [health-check] Code style and flow refactor (m-czernek, Jan 30, 2025)
1bdc8b6 [health-check] Apply linting and formatting rules (m-czernek, Feb 6, 2025)
d19045e Deactivate ingest complete checks for Promtail (ycedres, Feb 14, 2025)
596f436 Merge pull request #9742 from ycedres/promtail-container-adjustments (ycedres, Feb 14, 2025)
1eb2278 Rename and move health check to python (m-czernek, Feb 18, 2025)
650cfee Remove python 3.6 incompatible pattern matching (m-czernek, Feb 18, 2025)
77bb8d4 Merge pull request #9799 from m-czernek/move-health-check (ycedres, Feb 20, 2025)
c335293 Rename scripts (m-czernek, Feb 27, 2025)
5642cc0 Merge pull request #9862 from m-czernek/health-check-rename-scripts (ycedres, Feb 27, 2025)
f4ed364 Remove old unused dashboard for live server (meaksh, Feb 28, 2025)
58c2729 Fix issue building project after health check renaming (meaksh, Feb 28, 2025)
9dacefd Trigger alerts immediately without pending state (meaksh, Feb 28, 2025)
f71164e Dashboard: adjust panels to fit windows size (meaksh, Feb 28, 2025)
db96a62 Allow supportconfig_exporter to handle concurrent HTTP requests (meaksh, Feb 28, 2025)
aa737e5 Fix some cosmetic issues and display grafana URL when finished (meaksh, Feb 28, 2025)
61de837 Rename commands to start/stop and fix issue with date parameters (meaksh, Feb 28, 2025)
94ed04e exporter: prevent uncontrolled growth of JSON (meaksh, Feb 28, 2025)
096af6e Fix problem writting config file (meaksh, Feb 28, 2025)
f601d37 Do cleanup on Dockerfiles. Remove unused (meaksh, Feb 28, 2025)
65b5954 Use opensuse/grafana:11.5.1 image for Grafana (meaksh, Feb 28, 2025)
6c3e875 Add more panels to supportconfig dashboard (meaksh, Feb 28, 2025)
3a22b41 Rename 'Uyuni' from panels and remove date interval preset (meaksh, Mar 3, 2025)
6d7cafe Rename health check CLI to mgr-health-check (meaksh, Mar 3, 2025)
bd7960a Align unit type for between memory panels (meaksh, Mar 3, 2025)
34bd9f1 Merge pull request #9871 from uyuni-project/health-check-skeleton-ext… (meaksh, Mar 3, 2025)
12 changes: 12 additions & 0 deletions health-check/.gitignore
@@ -0,0 +1,12 @@
build
dist
.eggs
*.egg-info
logcli-linux-amd64
promtail-linux-amd64
__pycache__
**/config/exporter/config.yaml
**/config/promtail/config.yaml
**/config/grafana/dashboards/supportconfig_with_logs.json

.vscode/
41 changes: 41 additions & 0 deletions health-check/README.md
@@ -0,0 +1,41 @@
# uyuni-health-check

A tool that provides a dashboard, metrics, and logs from an Uyuni server supportconfig to visualise the server's health status.

## Requirements

* `python3`
* `podman`

## Building and installing

Install the tool locally into a virtual environment:

```
python3 -m venv venv
. venv/bin/activate
pip install .
```

## Getting started

This tool builds and deploys the containers needed to scrape metrics and logs from an Uyuni server supportconfig directory.
Start it with the `run` command, for example:

```
uyuni-health-check -s ~/path/to/supportconfig run --logs --from_datetime=2024-01-01T00:00:00Z --to_datetime=2024-06-01T20:00:00Z
```
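The `--from_datetime` and `--to_datetime` values in the example are UTC timestamps with a trailing `Z`. If you script this call, a minimal sketch like the following can catch a malformed or reversed range before any container starts. The helper names and the exact format string are assumptions based on the example above, not the tool's own validation.

```python
from datetime import datetime, timezone

# Assumed format, inferred from the README example: UTC with a literal "Z".
ISO_UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(value: str) -> datetime:
    """Parse a timestamp such as 2024-01-01T00:00:00Z into an aware UTC datetime."""
    return datetime.strptime(value, ISO_UTC_FORMAT).replace(tzinfo=timezone.utc)


def validate_range(from_datetime: str, to_datetime: str):
    """Hypothetical helper: reject ranges whose start is not before their end."""
    start = parse_utc(from_datetime)
    end = parse_utc(to_datetime)
    if start >= end:
        raise ValueError(
            "--from_datetime ({}) must be earlier than --to_datetime ({})".format(
                from_datetime, to_datetime
            )
        )
    return start, end


if __name__ == "__main__":
    start, end = validate_range("2024-01-01T00:00:00Z", "2024-06-01T20:00:00Z")
    print("Importing logs covering", end - start)
```

`strptime` raises `ValueError` for a malformed timestamp, so both failure modes surface as the same exception type.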

This will create and start the following containers locally:

- uyuni-health-exporter (port `9000`)
- grafana (port `3000`)
- loki (port `9100`)
- promtail (port `9081`)
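To confirm the containers are up without opening a browser, you could probe each port. This sketch assumes the host ports listed above and the usual health paths (`/api/health` for Grafana, `/ready` for Loki and Promtail); the exporter path `/metrics.json` is taken from the Grafana alert definitions in this PR. Note that `config.ini` sets `loki_port = 3100`, so adjust the Loki URL if your deployment maps a different port.

```python
import urllib.error
import urllib.request

# Host ports from the list above; URL paths are assumptions (see lead-in).
ENDPOINTS = {
    "uyuni-health-exporter": "http://localhost:9000/metrics.json",
    "grafana": "http://localhost:3000/api/health",
    "loki": "http://localhost:9100/ready",
    "promtail": "http://localhost:9081/ready",
}


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (HTTPError is a URLError).
        return False


def check_all(endpoints=ENDPOINTS) -> dict:
    """Probe every endpoint once and map each container name to its readiness."""
    return {name: is_ready(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ready in check_all().items():
        print("{:<24} {}".format(name, "ready" if ready else "not reachable"))
```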

After the containers start, visit `http://localhost:3000` and select the `Supportconfig with Logs` dashboard.
If Grafana asks you to log in, the default username and password are both `admin`.
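The dashboard can also be located through Grafana's HTTP search API (`/api/search`) instead of the UI. This is a minimal sketch assuming the default `admin`/`admin` credentials and the dashboard title above; `build_dashboard_search` is a hypothetical helper, not part of the tool.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_dashboard_search(
    base_url: str = "http://localhost:3000",
    title: str = "Supportconfig with Logs",
    user: str = "admin",
    password: str = "admin",
) -> urllib.request.Request:
    """Build a Grafana /api/search request for dashboards matching `title`."""
    query = urllib.parse.urlencode({"query": title, "type": "dash-db"})
    request = urllib.request.Request("{}/api/search?{}".format(base_url, query))
    # Grafana accepts HTTP basic auth for local users.
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    with urllib.request.urlopen(build_dashboard_search(), timeout=5) as response:
        for hit in json.load(response):
            # Each search hit carries a relative dashboard URL.
            print(hit["title"], "http://localhost:3000" + hit["url"])
```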

## Security notes
After you run this tool, and until its containers are destroyed, the Grafana dashboards and the other exposed metrics make metrics and log messages, which may contain sensitive data, available to any non-root user on the system and to anyone with network access to this host.

43 changes: 43 additions & 0 deletions health-check/pyproject.toml
@@ -0,0 +1,43 @@
# SPDX-FileCopyrightText: 2023 SUSE LLC
#
# SPDX-License-Identifier: Apache-2.0

[project]
name = "uyuni-health-check"
description = "Show Uyuni server health metrics and logs"
readme = "README.md"
requires-python = ">=3.6"
classifiers = [
"Programming Language :: Python :: 3",
"Operating System :: OS Independent",
]
dependencies = [
"Click",
"rich",
"requests",
"Jinja2",
"PyYAML",
]
maintainers = [
{name = "Pablo Suárez Hernández", email = "[email protected]"},
]
dynamic = ["version"]

[project.urls]
homepage = "https://github.com/uyuni-project/uyuni"
tracker = "https://github.com/uyuni-project/uyuni/issues"

[project.scripts]
uyuni-health-check = "uyuni_health_check.main:main"

[tool.setuptools]
package-dir = {"" = "src"}

[build-system]
requires = [
"setuptools>=42",
"setuptools_scm[toml]",
"wheel",
]
build-backend = "setuptools.build_meta"

Empty file.
11 changes: 11 additions & 0 deletions health-check/src/uyuni_health_check/config.ini
@@ -0,0 +1,11 @@
[podman]
network_name = health-check-network

[loki]
loki_container_name = uyuni_health_check_loki
loki_port = 3100
jobs = cobbler,postgresql,rhn,apache

[logcli]
logcli_container_name = uyuni_health_check_logcli
logcli_image_name = logcli
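A minimal sketch of reading this file with the standard library's `configparser`, assuming the comma-separated `jobs` value is meant to be split into a list of Loki job names. The helper names are hypothetical; the tool's own configuration code may load it differently.

```python
import configparser

# Copy of the config.ini above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[podman]
network_name = health-check-network

[loki]
loki_container_name = uyuni_health_check_loki
loki_port = 3100
jobs = cobbler,postgresql,rhn,apache

[logcli]
logcli_container_name = uyuni_health_check_logcli
logcli_image_name = logcli
"""


def load_config(text: str = CONFIG_TEXT) -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def loki_jobs(config: configparser.ConfigParser) -> list:
    """Split the comma-separated job list, dropping whitespace and empty items."""
    raw = config.get("loki", "jobs", fallback="")
    return [job.strip() for job in raw.split(",") if job.strip()]


if __name__ == "__main__":
    config = load_config()
    print(config.getint("loki", "loki_port"), loki_jobs(config))
```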
92 changes: 92 additions & 0 deletions health-check/src/uyuni_health_check/config/grafana/alerts.yaml
@@ -0,0 +1,92 @@
apiVersion: 1
groups:
  - orgId: 1
    name: alert-eval
    folder: alerts
    interval: 1m
    rules:
      - uid: ce6i8dhdhj400e
        title: More worker threads than CPUs
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: infinity
            model:
              columns: []
              datasource:
                type: yesoreyeram-infinity-datasource
                uid: infinity
              filters: []
              format: table
              global_query_id: ""
              hide: false
              intervalMs: 1000
              maxDataPoints: 43200
              parser: backend
              refId: A
              root_selector: salt_configuration[name="worker_threads"].value
              source: url
              type: json
              url: http://uyuni_health_check_supportconfig-exporter:9000/metrics.json
              url_options:
                data: ""
                method: GET
          - refId: B
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: infinity
            model:
              columns: []
              datasource:
                type: yesoreyeram-infinity-datasource
                uid: infinity
              filters: []
              format: table
              global_query_id: ""
              hide: false
              intervalMs: 1000
              maxDataPoints: 43200
              parser: backend
              refId: B
              root_selector: hw[name="cpu_count"].value
              source: url
              type: json
              url: http://uyuni_health_check_supportconfig-exporter:9000/metrics.json
              url_options:
                data: ""
                method: GET
          - refId: C
            datasourceUid: __expr__
            model:
              conditions:
                - evaluator:
                    params:
                      - 0
                      - 0
                    type: gt
                  operator:
                    type: and
                  query:
                    params: []
                  reducer:
                    params: []
                    type: avg
                  type: query
              datasource:
                name: Expression
                type: __expr__
                uid: __expr__
              expression: $B - $A < 0
              hide: false
              intervalMs: 1000
              maxDataPoints: 43200
              refId: C
              type: math
        noDataState: NoData
        execErrState: Error
        for: 1m
        isPaused: false
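The rule pulls two values from the exporter's `metrics.json` and fires when `$B - $A < 0`, i.e. when Salt's `worker_threads` exceeds the host's `cpu_count`. The sketch below reproduces that check offline. It assumes `metrics.json` holds top-level `salt_configuration` and `hw` lists of `{"name": ..., "value": ...}` objects, which is what the `root_selector` expressions imply; the real payload may carry additional fields.

```python
import json
import urllib.request
from typing import Any, Optional


def find_value(metrics: dict, section: str, name: str) -> Optional[Any]:
    """Rough equivalent of a selector such as hw[name="cpu_count"].value."""
    for entry in metrics.get(section, []):
        if entry.get("name") == name:
            return entry.get("value")
    return None


def too_many_worker_threads(metrics: dict) -> bool:
    """Evaluate the rule's math expression `$B - $A < 0` on a metrics.json payload."""
    worker_threads = find_value(metrics, "salt_configuration", "worker_threads")  # query A
    cpu_count = find_value(metrics, "hw", "cpu_count")  # query B
    if worker_threads is None or cpu_count is None:
        # The rule sets noDataState: NoData, so missing input is not treated as firing.
        return False
    # Values may arrive as JSON strings, so coerce before comparing.
    return float(cpu_count) - float(worker_threads) < 0


if __name__ == "__main__":
    # Host port mapping is an assumption; inside the podman network the rule uses
    # http://uyuni_health_check_supportconfig-exporter:9000/metrics.json instead.
    with urllib.request.urlopen("http://localhost:9000/metrics.json", timeout=5) as response:
        print(too_many_worker_threads(json.load(response)))
```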
16 changes: 16 additions & 0 deletions health-check/src/uyuni_health_check/config/grafana/dashboard.yaml
@@ -0,0 +1,16 @@
# SPDX-FileCopyrightText: 2023 SUSE LLC
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: 1

providers:
  - name: "Dashboard provider"
    orgId: 1
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: false
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
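With `foldersFromFilesStructure: true`, Grafana derives dashboard folders from the directory layout under `options.path`. A rough sketch of that mapping, assuming only one level of subdirectories and treating files directly under the root as belonging to the default folder (returned as `None` here); `dashboard_folders` is a hypothetical helper for checking a layout before mounting it.

```python
from pathlib import Path
from typing import Dict, Optional


def dashboard_folders(root: str) -> Dict[str, Optional[str]]:
    """Map dashboard JSON files under `root` to the folder each would land in.

    Only the top level and one level of subdirectories are considered; files at
    the top level map to None (Grafana's default folder).
    """
    base = Path(root)
    mapping = {}  # type: Dict[str, Optional[str]]
    for path in sorted(base.glob("*.json")):
        if path.is_file():
            mapping[path.name] = None
    for path in sorted(base.glob("*/*.json")):
        if path.is_file():
            # The subdirectory name becomes the Grafana folder title.
            mapping[str(path.relative_to(base))] = path.parent.name
    return mapping


if __name__ == "__main__":
    for dashboard, folder in dashboard_folders("/var/lib/grafana/dashboards").items():
        print(dashboard, "->", folder or "(default folder)")
```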