[management] record pat usage metrics #4888

pascal-fischer · 2025-12-01T17:06:44Z

Describe your changes

This PR records a histogram of how often individual PATs (personal access tokens) are used.

Issue ticket number and link

Stack

Checklist

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

I added/updated documentation for this change
Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

Release Notes

New Features
- Added PAT usage tracking and metrics reporting capabilities for enhanced observability
- Usage metrics are now collected and reported periodically to monitor token activity patterns


coderabbitai · 2025-12-01T17:06:55Z

Walkthrough

This PR introduces PAT (Personal Access Token) usage tracking to the authentication middleware using OpenTelemetry metrics. A new PATUsageTracker component accumulates per-token usage counts and periodically emits metrics. The authentication middleware is wired to invoke usage tracking before rate-limiting checks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Constructor and Handler Updates (`management/server/http/handler.go`, `management/server/http/middleware/auth_middleware.go`, `management/server/http/middleware/auth_middleware_test.go`) | Updated the NewAuthMiddleware call site to pass `appMetrics.GetMeter()` and extended the constructor signature to accept an optional `metric.Meter` parameter. When a meter is provided, a PATUsageTracker is initialized. Tests were updated to pass a nil meter. |
| PAT Usage Tracker Implementation (`management/server/http/middleware/pat_usage_tracker.go`) | New file introducing a concurrency-safe PATUsageTracker that keeps per-token usage counters in memory. A background loop emits the usage distribution every minute through an OpenTelemetry Int64Histogram. Includes graceful shutdown via a Stop() method. |
| Middleware Integration (`management/server/http/middleware/auth_middleware.go`) | Added a patUsageTracker field to the AuthMiddleware struct. When a tracker is available, IncrementUsage(token) is invoked in checkPATFromRequest before the rate-limiting checks. |
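
A minimal sketch of the optional wiring described in the table, assuming a hypothetical `recorder` interface in place of `metric.Meter`/`metric.Int64Histogram` so it runs with only the standard library. `authMiddleware`, `newAuthMiddleware` and `onPAT` are illustrative names, not the PR's actual code: a nil metrics sink leaves the tracker nil, and the call site skips counting.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical stand-in for metric.Int64Histogram, so the
// sketch needs only the standard library.
type recorder interface{ Record(v int64) }

// discard is a no-op recorder used only for this example.
type discard struct{}

func (discard) Record(int64) {}

type patUsageTracker struct {
	mu       sync.Mutex
	counters map[string]int64
	hist     recorder
}

// IncrementUsage bumps the counter for key under the mutex.
func (t *patUsageTracker) IncrementUsage(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counters[key]++
}

type authMiddleware struct {
	tracker *patUsageTracker // nil when no metrics sink was supplied
}

// newAuthMiddleware builds a tracker only when hist is non-nil, mirroring
// the optional meter parameter (the parameter type here is an assumption).
func newAuthMiddleware(hist recorder) *authMiddleware {
	m := &authMiddleware{}
	if hist != nil {
		m.tracker = &patUsageTracker{counters: make(map[string]int64), hist: hist}
	}
	return m
}

// onPAT is a hypothetical hook called for each request carrying a PAT.
func (m *authMiddleware) onPAT(key string) {
	if m.tracker != nil { // tracking is simply skipped when disabled
		m.tracker.IncrementUsage(key)
	}
}

func main() {
	off := newAuthMiddleware(nil)
	off.onPAT("tok")
	on := newAuthMiddleware(discard{})
	on.onPAT("tok")
	fmt.Println(off.tracker == nil, on.tracker.counters["tok"])
}
```

Keeping the nil check at the call site means tests can pass a nil meter, as the updated middleware tests do, without any tracker goroutine being started.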

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Auth as AuthMiddleware
    participant Tracker as PATUsageTracker
    participant OTEL as OpenTelemetry<br/>Metrics

    Client->>Auth: HTTP Request with PAT
    Auth->>Auth: Extract PAT from request
    alt Tracker Available
        Auth->>Tracker: IncrementUsage(token)
        Tracker->>Tracker: Increment in-memory counter
    end
    Auth->>Auth: Rate-limiting checks
    Auth->>Client: Response

    par Background Process
        Tracker->>Tracker: Every 60s: reportLoop triggered
        Tracker->>Tracker: Snapshot counters & reset map
        Tracker->>OTEL: Record usage distribution<br/>to Int64Histogram
        OTEL->>Tracker: Metrics emitted
    end

    Client->>Auth: Stop signal (shutdown)
    Auth->>Tracker: Stop()
    Tracker->>Tracker: Close stopChan

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

patUsageTracker initialization logic: Verify proper error handling when histogram creation fails and ensure nil-safety when meter is not provided
Concurrent access patterns: Review mutex usage in IncrementUsage and reportLoop to ensure thread-safety with the counters map
Background loop lifecycle: Confirm proper goroutine cleanup and context cancellation handling in reportLoop and Stop() method
Metric emission accuracy: Validate that the histogram bucketing and per-minute reporting logic captures expected usage patterns
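
On the last point: because the tracker records one value per token per minute, the histogram describes how many tokens fell into each usage band, not the total request volume. The sketch below uses illustrative bucket bounds (not the SDK defaults or the PR's configuration) and the upper-inclusive convention of OpenTelemetry's explicit-bucket histograms; `bucketCounts` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketCounts assigns each per-token count to the first bucket whose upper
// bound is >= the value (upper-inclusive). The final slot is the overflow
// bucket for values greater than the last bound.
func bucketCounts(bounds []int64, values []int64) []int64 {
	out := make([]int64, len(bounds)+1)
	for _, v := range values {
		i := sort.Search(len(bounds), func(i int) bool { return bounds[i] >= v })
		out[i]++
	}
	return out
}

func main() {
	bounds := []int64{1, 5, 10, 100}
	// One value per token: how many requests it made in the last minute.
	perToken := []int64{1, 1, 3, 7, 250}
	fmt.Println(bucketCounts(bounds, perToken))
}
```

With these bounds the five tokens yield `[2 1 1 0 1]`: two tokens made exactly one request, and one token exceeded 100 requests in the interval.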

Possibly related PRs

[management] add pat rate limiting #4741: Modifies the same NewAuthMiddleware constructor signature and handler.go wiring; both extend AuthMiddleware with additional fields and update the same call sites.

Poem

🐰 Metrics hop through every token's dance,
OpenTelemetry gives us a glance,
Usage tracked with concurrent care,
Each PAT counted, metrics to share,
A minute ticks, the data takes flight! 📊✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Description check | ❓ Inconclusive | The description provides a basic summary but is incomplete: the 'Describe your changes' section is minimal, no issue ticket is linked, no checklist items are marked despite applicable options, and the Docs PR URL field contains only a placeholder. | Mark the relevant checklist item (likely 'Is a feature enhancement'), verify whether an issue ticket exists, and give concrete reasoning for why documentation is not needed. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[management] record pat usage metrics' is concise and directly reflects the main change: implementing PAT usage histogram tracking in the management service. |


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

management/server/http/middleware/pat_usage_tracker.go (1)
52-64: Consider using the stored context for cancellation.

The ctx field is stored but not used in reportLoop. Using ctx.Done() alongside stopChan would allow the tracker to respect context cancellation (e.g., during graceful shutdown):
 func (t *PATUsageTracker) reportLoop() {
 	ticker := time.NewTicker(1 * time.Minute)
 	defer ticker.Stop()

 	for {
 		select {
 		case <-ticker.C:
 			t.reportUsageBuckets()
 		case <-t.stopChan:
 			return
+		case <-t.ctx.Done():
+			return
 		}
 	}
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b77359 and f471d11.

📒 Files selected for processing (4)

management/server/http/handler.go (1 hunks)
management/server/http/middleware/auth_middleware.go (4 hunks)
management/server/http/middleware/auth_middleware_test.go (7 hunks)
management/server/http/middleware/pat_usage_tracker.go (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: Management / Benchmark (amd64, sqlite)
GitHub Check: Management / Benchmark (amd64, postgres)
GitHub Check: Management / Integration (amd64, postgres)
GitHub Check: Management / Integration (amd64, sqlite)
GitHub Check: Management / Benchmark (API) (amd64, sqlite)
GitHub Check: Management / Benchmark (API) (amd64, postgres)
GitHub Check: Management / Unit (amd64, mysql)
GitHub Check: Management / Unit (amd64, sqlite)
GitHub Check: Management / Unit (amd64, postgres)
GitHub Check: Signal / Unit (amd64)
GitHub Check: Relay / Unit (amd64, -race)
GitHub Check: Relay / Unit (386)
GitHub Check: Client / Unit (386)
GitHub Check: Client / Unit (amd64)
GitHub Check: Client (Docker) / Unit
GitHub Check: iOS / Build
GitHub Check: Android / Build
GitHub Check: release_ui_darwin
GitHub Check: release_ui
GitHub Check: release
GitHub Check: Client / Unit
GitHub Check: Linux
GitHub Check: Client / Unit
GitHub Check: Windows
GitHub Check: Darwin
GitHub Check: JS / Lint
GitHub Check: Client / Unit

🔇 Additional comments (4)

management/server/http/middleware/auth_middleware_test.go (1)

211-211: LGTM - Test updates correctly accommodate the new meter parameter.

The nil values appropriately disable PAT usage tracking in these tests, maintaining focus on authentication behavior. Consider adding dedicated unit tests for PATUsageTracker in a separate test file to verify metric recording and the background reporting loop.

Also applies to: 270-270, 322-322, 365-365, 409-409, 473-473, 590-590

management/server/http/middleware/pat_usage_tracker.go (2)

44-49: LGTM - Thread-safe increment implementation.

The mutex properly protects the map access, and the lock scope is appropriately minimal.

67-84: LGTM - Good snapshot-and-reset pattern.

The implementation correctly:

Takes a snapshot under the lock

Resets the map while still holding the lock

Releases the lock before the potentially slow histogram recording

This avoids holding the lock during I/O operations.
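
The same pattern expressed with `maps.Clone` and the `clear` built-in (both Go 1.21+), in the spirit of the later "use map clone and clear" commit. This is a stdlib-only sketch with a hypothetical `tracker` type and a `record` callback standing in for the histogram, not the PR's exact code.

```go
package main

import (
	"fmt"
	"maps"
	"sort"
	"sync"
)

type tracker struct {
	mu       sync.Mutex
	counters map[string]int64
}

// snapshotAndReset copies the counters and empties the live map while
// holding the lock, so the (possibly slow) recording can happen unlocked.
func (t *tracker) snapshotAndReset() map[string]int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	snap := maps.Clone(t.counters)
	clear(t.counters) // empties the map in place
	return snap
}

// report records one value per key and returns the number of unique keys.
func (t *tracker) report(record func(int64)) int {
	snap := t.snapshotAndReset()
	for _, n := range snap {
		record(n) // the lock is no longer held here
	}
	return len(snap)
}

func main() {
	t := &tracker{counters: map[string]int64{"a": 3, "b": 1}}
	var recorded []int64
	unique := t.report(func(n int64) { recorded = append(recorded, n) })
	sort.Slice(recorded, func(i, j int) bool { return recorded[i] < recorded[j] })
	fmt.Println(unique, recorded, len(t.counters))
}
```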

management/server/http/handler.go (1)

102-109: No action needed regarding nil safety of appMetrics.

The appMetrics parameter is guaranteed to be non-nil. The Metrics() function at the call site either returns a valid AppMetrics instance or terminates the program with log.Fatalf() if initialization fails. Additionally, appMetrics is used without nil checks in multiple places (lines 108, 114, 121), confirming it is always expected to be non-nil. No nil safety checks are required at this call site.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-12-01T17:10:11Z

management/server/http/middleware/auth_middleware.go

+	var patUsageTracker *PATUsageTracker
+	if meter != nil {
+		var err error
+		patUsageTracker, err = NewPATUsageTracker(context.Background(), meter)
+		if err != nil {
+			log.Errorf("Failed to create PAT usage tracker: %s", err)
+		}
+	}


⚠️ Potential issue | 🟠 Major

The PATUsageTracker goroutine is never stopped, causing a resource leak.

NewPATUsageTracker spawns a background goroutine via reportLoop(), but Stop() is never called. When the server shuts down, this goroutine will leak.

Consider:

Storing the tracker reference and calling Stop() during graceful shutdown

Using the passed context for cancellation instead of a separate stop channel

🤖 Prompt for AI Agents

In management/server/http/middleware/auth_middleware.go around lines 52-59, the PATUsageTracker created by NewPATUsageTracker spawns a background goroutine via reportLoop() but is never stopped, causing a goroutine/resource leak; store the returned patUsageTracker on the server/middleware struct (or a shutdown registry) and ensure Stop() is called during the server’s graceful shutdown path, or modify NewPATUsageTracker to accept and use the provided request/server context for cancellation so the goroutine is terminated when that context is cancelled; update initialization to propagate a cancellable context and add a shutdown hook that calls patUsageTracker.Stop() (or cancel the context) to cleanly terminate the background loop.

coderabbitai · 2025-12-01T17:10:11Z

management/server/http/middleware/auth_middleware.go

+	if m.patUsageTracker != nil {
+		m.patUsageTracker.IncrementUsage(token)
+	}


⚠️ Potential issue | 🟠 Major

Usage is tracked before token validation, which may record invalid tokens.

IncrementUsage is called before validating the PAT via GetPATInfo. This means:

Invalid/expired tokens will be counted in metrics

Raw (potentially malicious) token strings are stored as map keys

Consider moving the tracking after successful validation (after line 191) using pat.ID instead of the raw token to:

Only track valid tokens

Avoid storing raw token values in memory

-	if m.patUsageTracker != nil {
-		m.patUsageTracker.IncrementUsage(token)
-	}
-
 	if m.rateLimiter != nil {
 		if !m.rateLimiter.Allow(token) {
 			return r, status.Errorf(status.TooManyRequests, "too many requests")
@@ -189,6 +185,10 @@ func (m *AuthMiddleware) checkPATFromRequest(r *http.Request, authHeaderParts []
 	if time.Now().After(pat.GetExpirationDate()) {
 		return r, fmt.Errorf("token expired")
 	}
+
+	if m.patUsageTracker != nil {
+		m.patUsageTracker.IncrementUsage(pat.ID)
+	}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In management/server/http/middleware/auth_middleware.go around lines 174–176, the code increments PAT usage before the token is validated, causing invalid tokens and raw token strings to be recorded; move the patUsageTracker.IncrementUsage call to after successful validation (after line 191) and change it to use the validated PAT's ID (pat.ID) instead of the raw token. Ensure you keep the existing nil check for m.patUsageTracker and call IncrementUsage only when pat != nil and validation succeeded, so metrics only count valid tokens and raw token values are not stored.

coderabbitai · 2025-12-01T17:10:11Z

management/server/http/middleware/pat_usage_tracker.go

+func (t *PATUsageTracker) Stop() {
+	close(t.stopChan)
+}


⚠️ Potential issue | 🟡 Minor

Stop() will panic if called more than once.

Closing an already-closed channel causes a panic. Add protection using sync.Once:

 type PATUsageTracker struct {
 	usageCounters map[string]int64
 	mu            sync.Mutex
 	stopChan      chan struct{}
 	ctx           context.Context
 	histogram     metric.Int64Histogram
+	stopOnce      sync.Once
 }

 // Stop stops the reporting goroutine
 func (t *PATUsageTracker) Stop() {
-	close(t.stopChan)
+	t.stopOnce.Do(func() {
+		close(t.stopChan)
+	})
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In management/server/http/middleware/pat_usage_tracker.go around lines 87 to 89, Stop() currently unconditionally closes t.stopChan which will panic if Stop is called more than once; add a sync.Once field to the PATUsageTracker struct (e.g., stopOnce sync.Once) and change Stop to call t.stopOnce.Do(func(){ close(t.stopChan) }) (ensure stopChan is non-nil or initialized where the struct is created); this guarantees the channel is closed only once and prevents the panic.
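
A minimal runnable check of the `sync.Once` guard, using a hypothetical `tracker` type that holds only the stop channel:

```go
package main

import (
	"fmt"
	"sync"
)

type tracker struct {
	stopChan chan struct{}
	stopOnce sync.Once
}

// Stop is idempotent: only the first call closes stopChan.
func (t *tracker) Stop() {
	t.stopOnce.Do(func() { close(t.stopChan) })
}

// stopped reports whether stopChan has been closed (non-blocking check).
func (t *tracker) stopped() bool {
	select {
	case <-t.stopChan:
		return true
	default:
		return false
	}
}

func main() {
	t := &tracker{stopChan: make(chan struct{})}
	t.Stop()
	t.Stop() // would panic with a bare close(t.stopChan)
	fmt.Println("stopped:", t.stopped())
}
```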

sonarqubecloud · 2025-12-02T16:51:01Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

management/server/http/middleware/pat_usage_tracker.go (1)
84-87: Guard Stop() against double close of stopChan

If Stop() is ever called more than once (e.g., from multiple shutdown paths or tests), close(t.stopChan) will panic because closing an already-closed channel is illegal. This was already noted in an earlier review; the fix is still applicable here.

You can make Stop() idempotent by adding a sync.Once to the struct and using it here:
 type PATUsageTracker struct {
 	usageCounters map[string]int64
 	mu            sync.Mutex
 	stopChan      chan struct{}
+	stopOnce      sync.Once
 	ctx           context.Context
 	histogram     metric.Int64Histogram
 }
@@
 // Stop stops the reporting goroutine
 func (t *PATUsageTracker) Stop() {
-	close(t.stopChan)
+	t.stopOnce.Do(func() {
+		close(t.stopChan)
+	})
 }

🧹 Nitpick comments (2)

management/server/http/middleware/pat_usage_tracker.go (2)
52-65: Consider also stopping on context cancellation in reportLoop

Right now the goroutine only terminates when Stop() closes t.stopChan. If the tracker’s lifetime is meant to follow ctx, you could make shutdown more robust by also listening on t.ctx.Done():
 func (t *PATUsageTracker) reportLoop() {
 	ticker := time.NewTicker(1 * time.Minute)
 	defer ticker.Stop()

 	for {
 		select {
 		case <-ticker.C:
 			t.reportUsageBuckets()
 		case <-t.stopChan:
 			return
+		case <-t.ctx.Done():
+			return
 		}
 	}
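 }

This way the goroutine won’t leak if someone forgets to call Stop() but does cancel the context. This only holds if the tracker is constructed with a cancellable context; the current call site passes context.Background(), which is never cancelled.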

68-81: Metric emission logic is correct; optional hardening around PAT keys

The snapshot/clear pattern under the mutex is safe, and recording each token’s per-minute count into the histogram gives you the intended distribution; logging only the number of unique tokens avoids leaking secrets to logs.

One optional hardening step you might consider is storing an anonymized identifier (e.g., a stable hash or token ID) instead of the raw PAT string as the usageCounters key, so secrets don’t live in long‑lived maps even for a minute. Given you don’t log the keys and clear them every minute, this is more of a defense‑in‑depth improvement than a blocker.
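
If the raw token must be the input (for example, when counting before validation), a stable digest can serve as the key instead. This sketch assumes SHA-256 via `crypto/sha256`; `usageKey` and `tracker` are illustrative names. Where a validated PAT is available, keying by `pat.ID`, as suggested in the earlier review, avoids hashing altogether.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// usageKey derives a stable, non-reversible key from a raw token so the
// secret itself is never stored as a map key.
func usageKey(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

type tracker struct {
	mu       sync.Mutex
	counters map[string]int64
}

// IncrementUsage hashes outside the lock, then bumps the hashed key.
func (t *tracker) IncrementUsage(token string) {
	k := usageKey(token)
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counters[k]++
}

func main() {
	t := &tracker{counters: make(map[string]int64)}
	t.IncrementUsage("secret-a")
	t.IncrementUsage("secret-a")
	t.IncrementUsage("secret-b")
	fmt.Println(len(t.counters), t.counters[usageKey("secret-a")])
}
```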

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68fb6fe and 1800f2e.

📒 Files selected for processing (1)

management/server/http/middleware/pat_usage_tracker.go (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)

GitHub Check: Management / Benchmark (API) (amd64, postgres)
GitHub Check: Management / Benchmark (API) (amd64, sqlite)
GitHub Check: Management / Integration (amd64, postgres)
GitHub Check: Management / Benchmark (amd64, postgres)
GitHub Check: Management / Integration (amd64, sqlite)
GitHub Check: Management / Unit (amd64, postgres)
GitHub Check: Management / Benchmark (amd64, sqlite)
GitHub Check: Management / Unit (amd64, sqlite)
GitHub Check: Management / Unit (amd64, mysql)
GitHub Check: Relay / Unit (386)
GitHub Check: Relay / Unit (amd64, -race)
GitHub Check: Signal / Unit (amd64)
GitHub Check: Client / Unit (amd64)
GitHub Check: Client / Unit (386)
GitHub Check: Client (Docker) / Unit
GitHub Check: Android / Build
GitHub Check: release_ui_darwin
GitHub Check: release
GitHub Check: JS / Lint
GitHub Check: Linux
GitHub Check: Darwin
GitHub Check: Windows
GitHub Check: Client / Unit
GitHub Check: Client / Unit
GitHub Check: Client / Unit

🔇 Additional comments (3)

management/server/http/middleware/pat_usage_tracker.go (3)

13-43: Tracker struct and constructor look correct; metrics wiring is reasonable

The state held in PATUsageTracker is minimal and sufficient, usageCounters is initialized, and reportLoop is started from NewPATUsageTracker so callers can’t forget to start it. Using metric.Int64Histogram with unit "1" to capture per-minute usage counts via the ticker matches typical OpenTelemetry usage patterns.

Please verify in your staging environment that management.pat.usage_distribution actually shows up in your metrics backend with the expected shape once traffic is flowing.

45-50: IncrementUsage is simple and thread-safe

Locking around the map mutation and using defer to unlock is correct; there are no obvious contention or correctness issues here for the intended usage.

3-7: The project's go.mod already declares go 1.24.10, well above the Go 1.21 minimum required for maps.Clone. No action needed.

pascal-fischer added 2 commits December 1, 2025 17:27

record pat usage

40f32e3

extract pat usage tracker

f471d11

coderabbitai bot reviewed Dec 1, 2025


pascal-fischer added 2 commits December 2, 2025 15:12

Merge branch 'main' into feature/record-pat-usage

68fb6fe

use map clone and clear

1800f2e

crn4 approved these changes Dec 2, 2025


coderabbitai bot reviewed Dec 2, 2025


pascal-fischer merged commit a232cf6 into main Dec 2, 2025
56 of 57 checks passed

pascal-fischer deleted the feature/record-pat-usage branch December 2, 2025 17:32

3 participants
