Skip to content

Conversation

@pascal-fischer
Copy link
Collaborator

@pascal-fischer pascal-fischer commented Dec 1, 2025

Describe your changes

This PR records a histogram of how often different pats are used.

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

Release Notes

  • New Features
    • Added PAT usage tracking and metrics reporting capabilities for enhanced observability
    • Usage metrics are now collected and reported periodically to monitor token activity patterns

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 1, 2025

Walkthrough

This PR introduces PAT (Personal Access Token) usage tracking to the authentication middleware using OpenTelemetry metrics. A new PATUsageTracker component accumulates per-token usage counts and periodically emits metrics. The authentication middleware is wired to invoke usage tracking before rate-limiting checks.

Changes

Cohort / File(s) Summary
Constructor and Handler Updates
management/server/http/handler.go, management/server/http/middleware/auth_middleware.go, management/server/http/middleware/auth_middleware_test.go
Updated NewAuthMiddleware call site to pass appMetrics.GetMeter() and extended constructor signature to accept optional metric.Meter parameter. When meter is provided, PATUsageTracker is initialized. Tests updated to pass nil meter argument.
PAT Usage Tracker Implementation
management/server/http/middleware/pat_usage_tracker.go
New file introducing concurrent-safe PATUsageTracker that maintains per-token usage counters in memory. Background loop emits usage distribution metrics via OpenTelemetry Int64Histogram every minute. Includes graceful shutdown via Stop() method.
Middleware Integration
management/server/http/middleware/auth_middleware.go
Added patUsageTracker field to AuthMiddleware struct. IncrementUsage(token) is invoked in checkPATFromRequest prior to rate-limiting checks when tracker is available.

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Auth as AuthMiddleware
    participant Tracker as PATUsageTracker
    participant OTEL as OpenTelemetry<br/>Metrics

    Client->>Auth: HTTP Request with PAT
    Auth->>Auth: Extract PAT from request
    alt Tracker Available
        Auth->>Tracker: IncrementUsage(token)
        Tracker->>Tracker: Increment in-memory counter
    end
    Auth->>Auth: Rate-limiting checks
    Auth->>Client: Response

    par Background Process
        Tracker->>Tracker: Every 60s: reportLoop triggered
        Tracker->>Tracker: Snapshot counters & reset map
        Tracker->>OTEL: Record usage distribution<br/>to Int64Histogram
        OTEL->>Tracker: Metrics emitted
    end

    Client->>Auth: Stop signal (shutdown)
    Auth->>Tracker: Stop()
    Tracker->>Tracker: Close stopChan
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • patUsageTracker initialization logic: Verify proper error handling when histogram creation fails and ensure nil-safety when meter is not provided
  • Concurrent access patterns: Review mutex usage in IncrementUsage and reportLoop to ensure thread-safety with the counters map
  • Background loop lifecycle: Confirm proper goroutine cleanup and context cancellation handling in reportLoop and Stop() method
  • Metric emission accuracy: Validate that the histogram bucketing and per-minute reporting logic captures expected usage patterns

Possibly related PRs

  • [management] add pat rate limiting #4741: Modifies the same NewAuthMiddleware constructor signature and handler.go wiring; both extend AuthMiddleware with additional fields and update the same call sites.

Poem

🐰 Metrics hop through every token's dance,
OpenTelemetry gives us a glance,
Usage tracked with concurrent care,
Each PAT counted, metrics to share,
A minute ticks, the data takes flight! 📊✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The description provides a basic summary but is incomplete: 'Describe your changes' section is minimal, no issue ticket is linked, no checklist items are marked despite applicable options, and the Docs PR URL field contains only a placeholder. Mark the relevant checklist item (likely 'Is a feature enhancement'), verify if an issue ticket exists, and provide concrete reasoning for why documentation is not needed.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title '[management] record pat usage metrics' is concise and directly reflects the main change: implementing PAT usage histogram tracking in the management service.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/record-pat-usage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
management/server/http/middleware/pat_usage_tracker.go (1)

52-64: Consider using the stored context for cancellation.

The ctx field is stored but not used in reportLoop. Using ctx.Done() alongside stopChan would allow the tracker to respect context cancellation (e.g., during graceful shutdown):

 func (t *PATUsageTracker) reportLoop() {
 	ticker := time.NewTicker(1 * time.Minute)
 	defer ticker.Stop()

 	for {
 		select {
 		case <-ticker.C:
 			t.reportUsageBuckets()
 		case <-t.stopChan:
 			return
+		case <-t.ctx.Done():
+			return
 		}
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b77359 and f471d11.

📒 Files selected for processing (4)
  • management/server/http/handler.go (1 hunks)
  • management/server/http/middleware/auth_middleware.go (4 hunks)
  • management/server/http/middleware/auth_middleware_test.go (7 hunks)
  • management/server/http/middleware/pat_usage_tracker.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Signal / Unit (amd64)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
  • GitHub Check: release_ui_darwin
  • GitHub Check: release_ui
  • GitHub Check: release
  • GitHub Check: Client / Unit
  • GitHub Check: Linux
  • GitHub Check: Client / Unit
  • GitHub Check: Windows
  • GitHub Check: Darwin
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
🔇 Additional comments (4)
management/server/http/middleware/auth_middleware_test.go (1)

211-211: LGTM - Test updates correctly accommodate the new meter parameter.

The nil values appropriately disable PAT usage tracking in these tests, maintaining focus on authentication behavior. Consider adding dedicated unit tests for PATUsageTracker in a separate test file to verify metric recording and the background reporting loop.

Also applies to: 270-270, 322-322, 365-365, 409-409, 473-473, 590-590

management/server/http/middleware/pat_usage_tracker.go (2)

44-49: LGTM - Thread-safe increment implementation.

The mutex properly protects the map access, and the lock scope is appropriately minimal.


67-84: LGTM - Good snapshot-and-reset pattern.

The implementation correctly:

  1. Takes a snapshot under the lock
  2. Resets the map while still holding the lock
  3. Releases the lock before the potentially slow histogram recording

This avoids holding the lock during I/O operations.

management/server/http/handler.go (1)

102-109: No action needed regarding nil safety of appMetrics.

The appMetrics parameter is guaranteed to be non-nil. The Metrics() function at the call site either returns a valid AppMetrics instance or terminates the program with log.Fatalf() if initialization fails. Additionally, appMetrics is used without nil checks in multiple places (lines 108, 114, 121), confirming it is always expected to be non-nil. No nil safety checks are required at this call site.

Likely an incorrect or invalid review comment.

Comment on lines +52 to +59
var patUsageTracker *PATUsageTracker
if meter != nil {
var err error
patUsageTracker, err = NewPATUsageTracker(context.Background(), meter)
if err != nil {
log.Errorf("Failed to create PAT usage tracker: %s", err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The PATUsageTracker goroutine is never stopped, causing a resource leak.

NewPATUsageTracker spawns a background goroutine via reportLoop(), but Stop() is never called. When the server shuts down, this goroutine will leak.

Consider:

  1. Storing the tracker reference and calling Stop() during graceful shutdown
  2. Using the passed context for cancellation instead of a separate stop channel
🤖 Prompt for AI Agents
In management/server/http/middleware/auth_middleware.go around lines 52-59, the
PATUsageTracker created by NewPATUsageTracker spawns a background goroutine via
reportLoop() but is never stopped, causing a goroutine/resource leak; store the
returned patUsageTracker on the server/middleware struct (or a shutdown
registry) and ensure Stop() is called during the server’s graceful shutdown
path, or modify NewPATUsageTracker to accept and use the provided request/server
context for cancellation so the goroutine is terminated when that context is
cancelled; update initialization to propagate a cancellable context and add a
shutdown hook that calls patUsageTracker.Stop() (or cancel the context) to
cleanly terminate the background loop.

Comment on lines +174 to +176
if m.patUsageTracker != nil {
m.patUsageTracker.IncrementUsage(token)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Usage is tracked before token validation, which may record invalid tokens.

IncrementUsage is called before validating the PAT via GetPATInfo. This means:

  1. Invalid/expired tokens will be counted in metrics
  2. Raw (potentially malicious) token strings are stored as map keys

Consider moving the tracking after successful validation (after line 191) using pat.ID instead of the raw token to:

  • Only track valid tokens
  • Avoid storing raw token values in memory
-	if m.patUsageTracker != nil {
-		m.patUsageTracker.IncrementUsage(token)
-	}
-
 	if m.rateLimiter != nil {
 		if !m.rateLimiter.Allow(token) {
 			return r, status.Errorf(status.TooManyRequests, "too many requests")
@@ -189,6 +185,10 @@ func (m *AuthMiddleware) checkPATFromRequest(r *http.Request, authHeaderParts []
 	if time.Now().After(pat.GetExpirationDate()) {
 		return r, fmt.Errorf("token expired")
 	}
+
+	if m.patUsageTracker != nil {
+		m.patUsageTracker.IncrementUsage(pat.ID)
+	}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In management/server/http/middleware/auth_middleware.go around lines 174–176,
the code increments PAT usage before the token is validated, causing invalid
tokens and raw token strings to be recorded; move the
patUsageTracker.IncrementUsage call to after successful validation (after line
191) and change it to use the validated PAT's ID (pat.ID) instead of the raw
token. Ensure you keep the existing nil check for m.patUsageTracker and call
IncrementUsage only when pat != nil and validation succeeded, so metrics only
count valid tokens and raw token values are not stored.

Comment on lines +87 to +89
func (t *PATUsageTracker) Stop() {
close(t.stopChan)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stop() will panic if called more than once.

Closing an already-closed channel causes a panic. Add protection using sync.Once:

 type PATUsageTracker struct {
 	usageCounters map[string]int64
 	mu            sync.Mutex
 	stopChan      chan struct{}
 	ctx           context.Context
 	histogram     metric.Int64Histogram
+	stopOnce      sync.Once
 }
 // Stop stops the reporting goroutine
 func (t *PATUsageTracker) Stop() {
-	close(t.stopChan)
+	t.stopOnce.Do(func() {
+		close(t.stopChan)
+	})
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In management/server/http/middleware/pat_usage_tracker.go around lines 87 to 89,
Stop() currently unconditionally closes t.stopChan which will panic if Stop is
called more than once; add a sync.Once field to the PATUsageTracker struct
(e.g., stopOnce sync.Once) and change Stop to call t.stopOnce.Do(func(){
close(t.stopChan) }) (ensure stopChan is non-nil or initialized where the struct
is created); this guarantees the channel is closed only once and prevents the
panic.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 2, 2025

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
management/server/http/middleware/pat_usage_tracker.go (1)

84-87: Guard Stop() against double close of stopChan

If Stop() is ever called more than once (e.g., from multiple shutdown paths or tests), close(t.stopChan) will panic because closing an already-closed channel is illegal. This was already noted in an earlier review; the fix is still applicable here.

You can make Stop() idempotent by adding a sync.Once to the struct and using it here:

 type PATUsageTracker struct {
 	usageCounters map[string]int64
 	mu            sync.Mutex
-	stopChan      chan struct{}
+	stopChan      chan struct{}
+	stopOnce      sync.Once
 	ctx           context.Context
 	histogram     metric.Int64Histogram
 }
@@
 // Stop stops the reporting goroutine
 func (t *PATUsageTracker) Stop() {
-	close(t.stopChan)
+	t.stopOnce.Do(func() {
+		close(t.stopChan)
+	})
 }
🧹 Nitpick comments (2)
management/server/http/middleware/pat_usage_tracker.go (2)

52-65: Consider also stopping on context cancellation in reportLoop

Right now the goroutine only terminates when Stop() closes t.stopChan. If the tracker’s lifetime is meant to follow ctx, you could make shutdown more robust by also listening on t.ctx.Done():

 func (t *PATUsageTracker) reportLoop() {
 	ticker := time.NewTicker(1 * time.Minute)
 	defer ticker.Stop()

 	for {
 		select {
 		case <-ticker.C:
 			t.reportUsageBuckets()
-		case <-t.stopChan:
+		case <-t.stopChan:
 			return
+		case <-t.ctx.Done():
+			return
 		}
 	}
 }

This way the goroutine won’t leak if someone forgets to call Stop() but does cancel the context.


68-81: Metric emission logic is correct; optional hardening around PAT keys

The snapshot/clear pattern under the mutex is safe, and recording each token’s per-minute count into the histogram gives you the intended distribution; logging only the number of unique tokens avoids leaking secrets to logs.

One optional hardening step you might consider is storing an anonymized identifier (e.g., a stable hash or token ID) instead of the raw PAT string as the usageCounters key, so secrets don’t live in long‑lived maps even for a minute. Given you don’t log the keys and clear them every minute, this is more of a defense‑in‑depth improvement than a blocker.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68fb6fe and 1800f2e.

📒 Files selected for processing (1)
  • management/server/http/middleware/pat_usage_tracker.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Signal / Unit (amd64)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Android / Build
  • GitHub Check: release_ui_darwin
  • GitHub Check: release
  • GitHub Check: JS / Lint
  • GitHub Check: Linux
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
  • GitHub Check: Client / Unit
🔇 Additional comments (3)
management/server/http/middleware/pat_usage_tracker.go (3)

13-43: Tracker struct and constructor look correct; metrics wiring is reasonable

The state held in PATUsageTracker is minimal and sufficient, usageCounters is initialized, and reportLoop is started from NewPATUsageTracker so callers can’t forget to start it. Using metric.Int64Histogram with unit "1" to capture per-minute usage counts via the ticker matches typical OpenTelemetry usage patterns.

Please verify in your staging environment that management.pat.usage_distribution actually shows up in your metrics backend with the expected shape once traffic is flowing.


45-50: IncrementUsage is simple and thread-safe

Locking around the map mutation and using defer to unlock is correct; there are no obvious contention or correctness issues here for the intended usage.


3-7: The project's go.mod is already set to go 1.24.10, which well exceeds the Go 1.21 requirement for maps.Clone. No action needed.

@pascal-fischer pascal-fischer merged commit a232cf6 into main Dec 2, 2025
56 of 57 checks passed
@pascal-fischer pascal-fischer deleted the feature/record-pat-usage branch December 2, 2025 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants