Skip to content

Refactor session manager to use simpler concurrency control [2/3]#6631

Closed
glightfoot wants to merge 2 commits into
moby:masterfrom
glightfoot:refactor-session-manager
Closed

Refactor session manager to use simpler concurrency control [2/3]#6631
glightfoot wants to merge 2 commits into
moby:masterfrom
glightfoot:refactor-session-manager

Conversation

@glightfoot
Copy link
Copy Markdown

@glightfoot glightfoot commented Mar 30, 2026

Summary

This PR refactors session.Manager to use simpler, more deterministic concurrency control and removes global condition-variable wakeup behavior that caused unnecessary contention under load.

The implementation replaces shared mutable state guarded by sync.Mutex + sync.Cond with a single-owner state model and explicit per-session wait coordination.

This is part of the changes we run in our fork, along with #6630 and #6632, but broken out into separate PRs to make it easier to review. We have been running this change in our internal fork with no issues.

Motivation

The previous manager implementation relied on:

  • a global mutex around the session map
  • broadcast wakeups for all waiters
  • lock-heavy create/get paths that could amplify contention during high concurrency

Under churn (many waiters + session creation/cancellation), this could lead to avoidable wakeup storms and poor tail latency for unrelated lookups.

What Changed

Session manager concurrency model

  • Introduced internal sessionState with explicit buckets:
    • active sessions
    • reserved session IDs (creation in progress)
    • pending waiters per session ID
  • Added a single-owner state channel (chan *sessionState) and helper methods (withState, tryWithState) for serialized state mutation.
  • Replaced global Cond.Broadcast() wakeups with targeted per-session waiter notification (pendingWait / notifyPending / releasePendingWait).

Session lifecycle handling

  • Added explicit reserve -> activate -> remove lifecycle to prevent duplicate registration races.
  • Refactored HandleHTTPRequest and HandleConn through a shared runSession flow with clear setup/session/cleanup phases.
  • Kept cleanup robust even on failure paths by ensuring reserved/active state is released.

Get behavior improvements

  • Added fast-path non-blocking lookup for active sessions.
  • Preserved no-wait semantics (noWait=true returns immediately if no active session).
  • Added cancellation-aware waiter cleanup so canceled callers do not leak pending wait entries.
  • Improved timeout/cancel error context for easier debugging.

Test Coverage

This PR adds broad test coverage for correctness and concurrency behavior, including:

  • session/manager_test.go
    • no-wait and prefix lookup behavior
    • cancellation while waiting
    • pending waiter cleanup
    • high concurrency lookup/create scenarios
    • wakeup/broadcast stress cases
    • duplicate session handling and HTTP/raw session paths
    • deadline behavior under lock contention
  • session/manager_concurrency_test.go
    • thundering-herd contention scenarios
  • session/group_test.go
    • Any() edge cases, callback errors, and high-concurrency access patterns

Impact

  • Reduces global lock contention in session lookup/registration paths.
  • Eliminates broad broadcast wakeups in favor of targeted waiter signaling.
  • Improves determinism and isolation for concurrent Get, HandleConn, and HandleHTTPRequest operations.
  • Makes session manager behavior easier to reason about and safer under heavy parallel workloads.

Fixes #6633

Signed-off-by: Greg Lightfoot <greg.lightfoot@reddit.com>
@glightfoot glightfoot changed the title Refactor session manager to use simpler concurrency control [2/2] Refactor session manager to use simpler concurrency control [2/3] Mar 30, 2026
@glightfoot glightfoot marked this pull request as ready for review March 30, 2026 18:44
Signed-off-by: Greg Lightfoot <greg.lightfoot@reddit.com>
@tonistiigi
Copy link
Copy Markdown
Member

#6633 (comment)

@tonistiigi tonistiigi closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlocks in auth to repositories

2 participants