-
-
Notifications
You must be signed in to change notification settings - Fork 961
[management] record pat usage metrics #4888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR introduces PAT (Personal Access Token) usage tracking to the authentication middleware using OpenTelemetry metrics. A new PATUsageTracker component accumulates per-token usage counts and periodically emits metrics. The authentication middleware is wired to invoke usage tracking before rate-limiting checks. Changes
Sequence DiagramsequenceDiagram
participant Client as Client
participant Auth as AuthMiddleware
participant Tracker as PATUsageTracker
participant OTEL as OpenTelemetry<br/>Metrics
Client->>Auth: HTTP Request with PAT
Auth->>Auth: Extract PAT from request
alt Tracker Available
Auth->>Tracker: IncrementUsage(token)
Tracker->>Tracker: Increment in-memory counter
end
Auth->>Auth: Rate-limiting checks
Auth->>Client: Response
par Background Process
Tracker->>Tracker: Every 60s: reportLoop triggered
Tracker->>Tracker: Snapshot counters & reset map
Tracker->>OTEL: Record usage distribution<br/>to Int64Histogram
OTEL->>Tracker: Metrics emitted
end
Client->>Auth: Stop signal (shutdown)
Auth->>Tracker: Stop()
Tracker->>Tracker: Close stopChan
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
management/server/http/middleware/pat_usage_tracker.go (1)
52-64: Consider using the stored context for cancellation.The
ctxfield is stored but not used inreportLoop. Usingctx.Done()alongsidestopChanwould allow the tracker to respect context cancellation (e.g., during graceful shutdown):func (t *PATUsageTracker) reportLoop() { ticker := time.NewTicker(1 * time.Minute) defer ticker.Stop() for { select { case <-ticker.C: t.reportUsageBuckets() case <-t.stopChan: return + case <-t.ctx.Done(): + return } } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
management/server/http/handler.go(1 hunks)management/server/http/middleware/auth_middleware.go(4 hunks)management/server/http/middleware/auth_middleware_test.go(7 hunks)management/server/http/middleware/pat_usage_tracker.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Integration (amd64, sqlite)
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Signal / Unit (amd64)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: release_ui_darwin
- GitHub Check: release_ui
- GitHub Check: release
- GitHub Check: Client / Unit
- GitHub Check: Linux
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
🔇 Additional comments (4)
management/server/http/middleware/auth_middleware_test.go (1)
211-211: LGTM - Test updates correctly accommodate the new meter parameter.The nil values appropriately disable PAT usage tracking in these tests, maintaining focus on authentication behavior. Consider adding dedicated unit tests for
PATUsageTrackerin a separate test file to verify metric recording and the background reporting loop.Also applies to: 270-270, 322-322, 365-365, 409-409, 473-473, 590-590
management/server/http/middleware/pat_usage_tracker.go (2)
44-49: LGTM - Thread-safe increment implementation.The mutex properly protects the map access, and the lock scope is appropriately minimal.
67-84: LGTM - Good snapshot-and-reset pattern.The implementation correctly:
- Takes a snapshot under the lock
- Resets the map while still holding the lock
- Releases the lock before the potentially slow histogram recording
This avoids holding the lock during I/O operations.
management/server/http/handler.go (1)
102-109: No action needed regarding nil safety ofappMetrics.The
appMetricsparameter is guaranteed to be non-nil. TheMetrics()function at the call site either returns a validAppMetricsinstance or terminates the program withlog.Fatalf()if initialization fails. Additionally,appMetricsis used without nil checks in multiple places (lines 108, 114, 121), confirming it is always expected to be non-nil. No nil safety checks are required at this call site.Likely an incorrect or invalid review comment.
| var patUsageTracker *PATUsageTracker | ||
| if meter != nil { | ||
| var err error | ||
| patUsageTracker, err = NewPATUsageTracker(context.Background(), meter) | ||
| if err != nil { | ||
| log.Errorf("Failed to create PAT usage tracker: %s", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PATUsageTracker goroutine is never stopped, causing a resource leak.
NewPATUsageTracker spawns a background goroutine via reportLoop(), but Stop() is never called. When the server shuts down, this goroutine will leak.
Consider:
- Storing the tracker reference and calling
Stop()during graceful shutdown - Using the passed context for cancellation instead of a separate stop channel
🤖 Prompt for AI Agents
In management/server/http/middleware/auth_middleware.go around lines 52-59, the
PATUsageTracker created by NewPATUsageTracker spawns a background goroutine via
reportLoop() but is never stopped, causing a goroutine/resource leak; store the
returned patUsageTracker on the server/middleware struct (or a shutdown
registry) and ensure Stop() is called during the server’s graceful shutdown
path, or modify NewPATUsageTracker to accept and use the provided request/server
context for cancellation so the goroutine is terminated when that context is
cancelled; update initialization to propagate a cancellable context and add a
shutdown hook that calls patUsageTracker.Stop() (or cancel the context) to
cleanly terminate the background loop.
| if m.patUsageTracker != nil { | ||
| m.patUsageTracker.IncrementUsage(token) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage is tracked before token validation, which may record invalid tokens.
IncrementUsage is called before validating the PAT via GetPATInfo. This means:
- Invalid/expired tokens will be counted in metrics
- Raw (potentially malicious) token strings are stored as map keys
Consider moving the tracking after successful validation (after line 191) using pat.ID instead of the raw token to:
- Only track valid tokens
- Avoid storing raw token values in memory
- if m.patUsageTracker != nil {
- m.patUsageTracker.IncrementUsage(token)
- }
-
if m.rateLimiter != nil {
if !m.rateLimiter.Allow(token) {
return r, status.Errorf(status.TooManyRequests, "too many requests")
@@ -189,6 +185,10 @@ func (m *AuthMiddleware) checkPATFromRequest(r *http.Request, authHeaderParts []
if time.Now().After(pat.GetExpirationDate()) {
return r, fmt.Errorf("token expired")
}
+
+ if m.patUsageTracker != nil {
+ m.patUsageTracker.IncrementUsage(pat.ID)
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In management/server/http/middleware/auth_middleware.go around lines 174–176,
the code increments PAT usage before the token is validated, causing invalid
tokens and raw token strings to be recorded; move the
patUsageTracker.IncrementUsage call to after successful validation (after line
191) and change it to use the validated PAT's ID (pat.ID) instead of the raw
token. Ensure you keep the existing nil check for m.patUsageTracker and call
IncrementUsage only when pat != nil and validation succeeded, so metrics only
count valid tokens and raw token values are not stored.
| func (t *PATUsageTracker) Stop() { | ||
| close(t.stopChan) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stop() will panic if called more than once.
Closing an already-closed channel causes a panic. Add protection using sync.Once:
type PATUsageTracker struct {
usageCounters map[string]int64
mu sync.Mutex
stopChan chan struct{}
ctx context.Context
histogram metric.Int64Histogram
+ stopOnce sync.Once
} // Stop stops the reporting goroutine
func (t *PATUsageTracker) Stop() {
- close(t.stopChan)
+ t.stopOnce.Do(func() {
+ close(t.stopChan)
+ })
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In management/server/http/middleware/pat_usage_tracker.go around lines 87 to 89,
Stop() currently unconditionally closes t.stopChan which will panic if Stop is
called more than once; add a sync.Once field to the PATUsageTracker struct
(e.g., stopOnce sync.Once) and change Stop to call t.stopOnce.Do(func(){
close(t.stopChan) }) (ensure stopChan is non-nil or initialized where the struct
is created); this guarantees the channel is closed only once and prevents the
panic.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
management/server/http/middleware/pat_usage_tracker.go (1)
84-87: GuardStop()against double close ofstopChanIf
Stop()is ever called more than once (e.g., from multiple shutdown paths or tests),close(t.stopChan)will panic because closing an already-closed channel is illegal. This was already noted in an earlier review; the fix is still applicable here.You can make
Stop()idempotent by adding async.Onceto the struct and using it here:type PATUsageTracker struct { usageCounters map[string]int64 mu sync.Mutex - stopChan chan struct{} + stopChan chan struct{} + stopOnce sync.Once ctx context.Context histogram metric.Int64Histogram } @@ // Stop stops the reporting goroutine func (t *PATUsageTracker) Stop() { - close(t.stopChan) + t.stopOnce.Do(func() { + close(t.stopChan) + }) }
🧹 Nitpick comments (2)
management/server/http/middleware/pat_usage_tracker.go (2)
52-65: Consider also stopping on context cancellation inreportLoopRight now the goroutine only terminates when
Stop()closest.stopChan. If the tracker’s lifetime is meant to followctx, you could make shutdown more robust by also listening ont.ctx.Done():func (t *PATUsageTracker) reportLoop() { ticker := time.NewTicker(1 * time.Minute) defer ticker.Stop() for { select { case <-ticker.C: t.reportUsageBuckets() - case <-t.stopChan: + case <-t.stopChan: return + case <-t.ctx.Done(): + return } } }This way the goroutine won’t leak if someone forgets to call
Stop()but does cancel the context.
68-81: Metric emission logic is correct; optional hardening around PAT keysThe snapshot/
clearpattern under the mutex is safe, and recording each token’s per-minute count into the histogram gives you the intended distribution; logging only the number of unique tokens avoids leaking secrets to logs.One optional hardening step you might consider is storing an anonymized identifier (e.g., a stable hash or token ID) instead of the raw PAT string as the
usageCounterskey, so secrets don’t live in long‑lived maps even for a minute. Given you don’t log the keys and clear them every minute, this is more of a defense‑in‑depth improvement than a blocker.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
management/server/http/middleware/pat_usage_tracker.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: Management / Benchmark (API) (amd64, postgres)
- GitHub Check: Management / Benchmark (API) (amd64, sqlite)
- GitHub Check: Management / Integration (amd64, postgres)
- GitHub Check: Management / Benchmark (amd64, postgres)
- GitHub Check: Management / Integration (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Benchmark (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Signal / Unit (amd64)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Android / Build
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: JS / Lint
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (3)
management/server/http/middleware/pat_usage_tracker.go (3)
13-43: Tracker struct and constructor look correct; metrics wiring is reasonableThe state held in
PATUsageTrackeris minimal and sufficient,usageCountersis initialized, andreportLoopis started fromNewPATUsageTrackerso callers can’t forget to start it. Usingmetric.Int64Histogramwith unit"1"to capture per-minute usage counts via the ticker matches typical OpenTelemetry usage patterns.Please verify in your staging environment that
management.pat.usage_distributionactually shows up in your metrics backend with the expected shape once traffic is flowing.
45-50: IncrementUsage is simple and thread-safeLocking around the map mutation and using
deferto unlock is correct; there are no obvious contention or correctness issues here for the intended usage.
3-7: The project'sgo.modis already set togo 1.24.10, which well exceeds the Go 1.21 requirement formaps.Clone. No action needed.



Describe your changes
This PR records a histogram of how often different pats are used.
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.