Add gpbackup_exporter — a Prometheus exporter for gpbackup history database.#87

Merged
tuhaihe merged 13 commits into apache:main from woblerr:add_gpbackup_exporter on May 12, 2026

Conversation

Collaborator

@woblerr woblerr commented Apr 13, 2026

Added gpbackup_exporter - a Prometheus exporter for collecting metrics from the gpbackup history database (gpbackup_history.db).

It is based on the original gpbackup_exporter project and has been adapted for integration into the cloudberry-backup repository.

Motivation

gpbackup does not expose built-in Prometheus metrics. gpbackup_exporter fills this gap. This allows integrating backup health monitoring into existing Prometheus/Grafana stacks.

Features

gpbackup_exporter exposes the following Prometheus metrics:

Backup metrics:

  • gpbackup_backup_status — backup status (success / failure);
  • gpbackup_backup_deletion_status — backup deletion status;
  • gpbackup_backup_info — backup info (version, compression, plugin, etc.);
  • gpbackup_backup_duration_seconds — backup duration in seconds.

Last backup metrics:

  • gpbackup_backup_since_last_completion_seconds — seconds elapsed since the last completed backup per database and backup type.

Exporter self-metrics:

  • gpbackup_exporter_status — status of the exporter's data retrieval;
  • gpbackup_exporter_build_info — information about gpbackup exporter.

Filtering flags:

  • --gpbackup.db-include / --gpbackup.db-exclude — limit collection to specific databases;
  • --gpbackup.backup-type — limit collection to a specific backup type;
  • --gpbackup.collect-deleted / --gpbackup.collect-failed — include deleted / failed backups;
  • --collect.depth — collect metrics only for backups not older than N days;
  • --web.config.file — TLS and basic authentication support via the Prometheus exporter toolkit.

Verification

Unit tests pass without errors:

$ make unit

[1776080594] TOC Suite - 45/45 specs ••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 2.217417ms PASS
[1776080594] testutils tests - 8/8 specs •••••••• SUCCESS! 552.083µs PASS
[1776080594] Textmsg Suite - 14/14 specs •••••••••••••• SUCCESS! 604.292µs PASS
[1776080594] Gpbckpconfig Suite - 47/47 specs ••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 1.909583ms PASS
[1776080594] utils tests - 118/118 specs •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 103.433167ms PASS
[1776080594] restore tests - 118/118 specs •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••• SUCCESS! 85.166959ms PASS
[1776080594] Cmd Suite - 14/14 specs •••••••••••••• SUCCESS! 3.561667ms PASS
[1776080594] Filepath Suite - 31/31 specs ••••••••••••••••••••••••••••••• SUCCESS! 2.903958ms PASS
[1776080594] Options Suite - 27/27 specs ••••••••••••••••••••••••••• SUCCESS! 6.135125ms PASS
[1776080594] Exporter Suite - 37/37 specs ••••••••••••••••••••••••••••••••••••• SUCCESS! 18.97375ms PASS
[1776080594] History Suite - 8/8 specs •••••••• SUCCESS! 15.766416ms PASS
[1776080594] Report Suite - 34/34 specs •••••••••••••••••••••••••••••••••• SUCCESS! 9.29725ms PASS
[1776080594] s3_plugin tests - 32/32 specs •••••••••••••••••••••••••••••••• SUCCESS! 4.222791ms PASS
[1776080594] backup tests - 585/586 specs •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••S••••••••••••••••••••••P••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••S•• SUCCESS! 172.169042ms PASS

Ginkgo ran 14 suites in 10.961838459s
Test Suite Passed

Open Questions

  1. The exporter uses promslog for logging, which outputs in logfmt or json format — the standard for Prometheus exporters. Other tools in this repository use gplog from cloudberry-go-libs. Should gpbackup_exporter be migrated to gplog for consistency, or is it acceptable to keep the Prometheus logger?

  2. The exporter's Makefile targets use additional -ldflags (GIT_REVISION, GIT_BRANCH, BUILD_DATE) injected into github.com/prometheus/common/version. This is required by the Prometheus exporter convention to populate the gpbackup_exporter_build_info metric. Other tools in the repo use a simpler pattern with a single -X version=... flag. Is this acceptable?

Related links

@woblerr woblerr requested review from robertmu and tuhaihe April 13, 2026 12:30
@tuhaihe tuhaihe requested review from leborchuk and ostinru April 24, 2026 08:53
Comment thread gpbackup_exporter.go Outdated
Comment thread exporter/gpbckp_exporter.go
Comment thread gpbackup_exporter.go
Comment thread exporter/README.md
Comment thread gpbackup_exporter.go

@MisterRaindrop MisterRaindrop left a comment

Suggest adding more reviewers.

Comment thread exporter/gpbckp_exporter.go Outdated
Comment thread exporter/gpbckp_exporter.go
Comment thread end_to_end/exporter_test.go Outdated
@liang8283

Verification

| Step | Result |
| --- | --- |
| make BIN_DIR=/tmp/pr87-bin build | ✅ all 6 binaries compile, no warnings |
| gpbackup_exporter --version | ✅ exit 0, prints version=2.2.0, branch=add_gpbackup_exporter, revision=6a29090, goversion=go1.24.13, tags=gpbackup_exporter |
| gpbackup_exporter --help | ✅ exit 0, all documented flags present |
| ginkgo -r exporter/ | ✅ 37/37 specs pass in 0.09s (matches PR body claim) |
| live scrape vs real /data0/master/gpseg-1/gpbackup_history.db (8 prior metadata-only backups) | ✅ 177-line /metrics, 8× each of gpbackup_backup_{status,info,duration_seconds,deletion_status}, 1× gpbackup_backup_since_last_completion_seconds, 1× gpbackup_exporter_status{database_name="perf_acl_test"} 1, 1× gpbackup_exporter_build_info populated. Durations and labels match gpbackman backup-info output. |

@woblerr woblerr force-pushed the add_gpbackup_exporter branch 4 times, most recently from 3ae938b to de77f35 Compare April 29, 2026 21:14
@woblerr
Collaborator Author

woblerr commented Apr 29, 2026

The description of the gpbackup_exporter_status metric has been updated in de77f35 to match the current behaviour.


@MisterRaindrop MisterRaindrop left a comment

LGTM

woblerr added 12 commits May 7, 2026 11:15
Introduce the Prometheus metrics exporter as a new standalone binary within the repository.

- Create gpbackup_exporter.go entry point with CLI flags and collection loop.
- Port core exporter logic to the new `exporter/` package.
- Adapt type system to use `*history.BackupConfig` directly, aligning with the cloudberry-backup architecture.
- Switch receiver methods to standalone gpbckpconfig helpers.
- Add Prometheus and Kingpin dependencies to `go.mod`.
- Update Makefile to support building, testing, and packaging.
- Port unit tests to Ginkgo/Gomega.
Add an e2e test that runs gpbackup commands and validates that gpbackup_exporter correctly reads the history database and exposes metrics.

Also remove the dead gpbackmanPath assignment from useOldBackupVersion in e2e tests.
Suppress noisy test logger output by writing to bytes.Buffer instead of os.Stdout in exporter unit tests.
SIGINT means "interrupt": the user wants to stop the current operation. SIGTERM means "terminate": a polite request for the program to shut down. So it is correct to treat both as a graceful shutdown.
To prevent a panic with an error like "panic: http: invalid pattern", exit if the endpoint is empty.
Remove unused getDataSuccessStatus.

Clarify include/exclude handling:
* DB present in both dbInclude and dbExclude - warn and emit gpbackup_exporter_status=0.
* DB only in dbExclude - skip, emit no metrics.
* dbInclude empty or DB in dbInclude - process and emit full backup metrics.

Add unit tests.
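The include/exclude rules listed in this commit message can be read as a small decision function. The sketch below is a hypothetical restatement in Go, not the code from the PR: the `decision` type and `decide` function are invented here, and the final case (a database that is absent from a non-empty include list and not excluded) is not covered by the commit message, so skipping it is an assumption.

```go
package main

import (
	"fmt"
	"slices"
)

// decision is a hypothetical enum describing what to do with one database.
type decision int

const (
	process  decision = iota // emit full backup metrics
	skip                     // emit no metrics
	conflict                 // warn and emit gpbackup_exporter_status=0
)

func (d decision) String() string {
	return [...]string{"process", "skip", "conflict"}[d]
}

// decide applies the rules from the commit message to a single database name.
func decide(db string, include, exclude []string) decision {
	inInclude := slices.Contains(include, db)
	inExclude := slices.Contains(exclude, db)
	switch {
	case inInclude && inExclude:
		return conflict
	case inExclude:
		return skip
	case len(include) == 0 || inInclude:
		return process
	default:
		// Assumption: a non-empty include list acts as an allow-list.
		return skip
	}
}

func main() {
	include := []string{"sales", "hr"}
	exclude := []string{"hr", "tmp"}
	for _, db := range []string{"sales", "hr", "tmp", "other"} {
		fmt.Printf("%s: %s\n", db, decide(db, include, exclude))
	}
}
```

Checking the conflict case first matters: otherwise a database listed in both flags would be silently skipped instead of being reported.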
@tuhaihe tuhaihe force-pushed the add_gpbackup_exporter branch from de77f35 to 39a950b Compare May 7, 2026 03:15
@tuhaihe
Member

tuhaihe commented May 11, 2026

Hi @MisterRaindrop @woblerr, could you take another look at this PR? I saw some errors in the unit tests. Thanks!

This aligns the test with the recent behavior change in commit 1035fb8.
`OpenHistoryDB()` now fails fast on missing files during the open stage,
rather than failing later during the first SQL query.
@woblerr
Collaborator Author

woblerr commented May 11, 2026

The failing test is fixed. Thanks.

@tuhaihe tuhaihe merged commit 253159b into apache:main May 12, 2026
10 checks passed
@woblerr woblerr deleted the add_gpbackup_exporter branch May 12, 2026 07:01
4 participants