Skip to content

kernel: park Timekeeper when idle#13533

Open
lyphyser wants to merge 1 commit into
google:masterfrom
lyphyser:park-timekeeper
Open

kernel: park Timekeeper when idle#13533
lyphyser wants to merge 1 commit into
google:masterfrom
lyphyser:park-timekeeper

Conversation

@lyphyser

Copy link
Copy Markdown
Contributor

AI usage: written by Claude with a TON of back and forth and rewrites until we landed on this relatively minimal design; reviewed all the code in detail

Currently the Timekeeper unconditionally wakes up every minute even when idle.

This PR adds a mechanism to park it and to synchronously calibrate when the timekeeper is parked and either a task starts running or GetTime is called. This is achieved using a reference count mechanism that keeps it alive until tasks running becomes 0 and for the GetTime call duration.

The parking mechanism works by first transitioning to idle if no reference are taken when the timer fires, and then if no tasks run and no GetTime calls are made for a whole interval (one second) then it transitions to parked and stops rearming the timer. This design means that there is no behavior change in the non-idle case, and makes everything much simpler.

Note that I think there is a pre-existing bug in the codebase, which is that GetTime can be called while the timekeeper is stopped, because it is stopped before netstack during save. Once this is fixed, then the code in this commit can be adjust to panic if anything tries to take a reference to the Timekeeper while stopped instead of silently allowing it with miscalibrated clocks.

@konstantin-s-bogom konstantin-s-bogom left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct on a first pass but it's very verbose

Comment thread pkg/sentry/kernel/timekeeper.go Outdated
Comment thread pkg/sentry/kernel/timekeeper.go Outdated
@lyphyser

lyphyser commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

rewrote (or deleted) all comments in the style I would use, which is quite minimal, compared to the LLM's hyperverbose style. The version with verbose comments is still accessible at https://github.com/lyphyser/gvisor/tree/park-timekeeper-verbose

Also added the new lock to the existing lock order comment.

The code has been the result of prompting until it was in the exact shape I wanted with some manual adjustments, and now the comments are all rewritten by me, and the PR explanation also is written by me so effectively only the commit message remains LLM-written. I reread it and agree with it, so it seems fine to leave as-is; alternatively, it could be replace with the PR explanation.

Some more details about how the code evolved:

  • Initially I tried parking immediately upon idle (which would thus result in parking all the time in non-idle usage), and tried several strategies to handle parking continuously without changing previous behavior, but neither me nor the LLM could find a way to make it work (my guess is that it's impossible)
  • Then I settled on parking only when idle for a full interval, which eliminates all those issues, but folded refs != 0 into the updaterState, which was awkward and required to either reach back into the kernel to check if anything was running or a 5-state updaterState, and didn't keep timekeeper active during GetTime cleanly
  • Finally settled into the current design with refs and 4-state updaterState

Timekeeper.startUpdater drove a timer that re-armed every update interval
for the lifetime of the sentry, so a fully idle sandbox (no runnable tasks)
woke the sentry on every tick only to rewrite VDSO clock parameters that
nothing could read.

Make the updater park itself while nothing needs a fresh clock, mirroring the
CPU clock ticker's idle handling. The updater goroutine's state is held in an
atomic, Timekeeper.updaterState: a lifecycle state, updaterStopped, plus three
running states ordered parked < idle < active. updaterState is mutated by the
updater goroutine's tick, by addRef, and by start/stopUpdater; all hold updateMu
except addRef's lockless idle->active CAS, the fast path that re-activates an
idle updater.

Pinning the updater active is a reference count (Timekeeper.refs):

  - addRef pins the updater active until the matching release, waking and
    re-calibrating it if it had parked; release drops the reference. The goroutine
    winds down one step per idle tick (active->idle, then idle->parked) and re-reads
    refs after each store, while addRef adds its reference before reading the state
    -- store-then-load on both sides -- so a referenced updater is never left winding
    down. addRef is lockless when the updater is active (nothing to do) or idle (a
    single idle->active CAS); only waking a parked updater takes updateMu, where it
    re-calibrates (the params have drifted) before returning so the caller never reads
    a stale clock.

  - incRunningTasks takes a reference on the 0->1 task transition and
    decRunningTasks drops it on 1->0, so the updater stays active for as long as
    any task can read the VDSO. The reference is taken before the running count is
    published, so a concurrent resumer that observes the count cannot read a
    drifted clock. incRunningTasks holds runningTasksMu across addRef; the lock
    order is always runningTasksMu -> updateMu (the updater goroutine reads refs
    locklessly and never takes runningTasksMu), so this cannot deadlock, and only
    a task resume from a fully idle sandbox re-calibrates under that lock.

  - GetTime brackets the actual clock read with addRef/release, so the updater
    stays active -- and the clock fresh -- for the whole read, even if the reader
    is descheduled in the middle of it. The read is lockless unless it finds the
    updater parked -- i.e. the sandbox has been idle long enough (a full update
    interval) for the updater to park -- when addRef takes updateMu to wake and
    re-calibrate it.

The kernel drives the updater entirely through addRef/release; the Timekeeper
reads no kernel state. refs is nosave -- at save time no task is running and no
read is in flight, so it is zero, and it is rebuilt as tasks resume and reads
occur.

The VDSO parameter page has a single writer, so the sample-and-write step is
factored into Timekeeper.update() and every refresh holds updateMu across it: the
updater goroutine's tick, startUpdater's activation refresh (only when it starts
with a reference already held), and addRef's parked slow path. updaterState's
idle->active CAS is lockless, but every transition that publishes fresh parameters
(out of parked or stopped) holds updateMu, so a reader that observes a non-parked
state can rely on the parameters being fresh.

updaterStopped is a lifecycle state, distinct from the idle wake path, that keeps
PauseUpdates/Destroy correct: stopUpdater publishes updaterStopped under updateMu
before it closes stop and waits for the goroutine, so an addRef arriving afterward
cannot restart the updater or rewrite the page (it still counts the reference, but
leaves the state stopped). While stopped, a runnable task may execute in
sentry/system mode; startUpdater (via ResumeUpdates) restarts the updater parked,
or active if a reference is already held. Only startUpdater leaves updaterStopped.

The Clocks.Update interface gains a parked bool: on resume from a parked period
the gap was unobserved and at least one update interval long, so the calibrated
clock re-calibrates to the freshly sampled parameters rather than slewing the
accumulated drift forward over a single interval.

The updater starts parked rather than active, activating on the first reference.
A fully idle sentry performs no periodic clock wakeups; freshness on resume and on
internal reads is preserved.

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@konstantin-s-bogom konstantin-s-bogom left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to change the comments, looks a lot more coherent now. Note that our linters may fire because IIRC they look for variable comments like

// $VARIABLE_NAME is blah blah

If so I might end up changing them slightly

@milantracy

Copy link
Copy Markdown
Contributor

@lyphyser can you rebase your change, there some conflicts at pkg/sentry/kernel/kernel.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants