kernel: park Timekeeper when idle#13533
Conversation
konstantin-s-bogom
left a comment
There was a problem hiding this comment.
Looks correct on a first pass but it's very verbose
c5ec4ad to
7e670f1
Compare
|
rewrote (or deleted) all comments in the style I would use, which is quite minimal, compared to the LLM's hyperverbose style. The version with verbose comments is still accessible at https://github.com/lyphyser/gvisor/tree/park-timekeeper-verbose Also added the new lock to the existing lock order comment. The code has been the result of prompting until it was in the exact shape I wanted with some manual adjustments, and now the comments are all rewritten by me, and the PR explanation also is written by me so effectively only the commit message remains LLM-written. I reread it and agree with it, so it seems fine to leave as-is; alternatively, it could be replace with the PR explanation. Some more details about how the code evolved:
|
7e670f1 to
8bd0f04
Compare
Timekeeper.startUpdater drove a timer that re-armed every update interval
for the lifetime of the sentry, so a fully idle sandbox (no runnable tasks)
woke the sentry on every tick only to rewrite VDSO clock parameters that
nothing could read.
Make the updater park itself while nothing needs a fresh clock, mirroring the
CPU clock ticker's idle handling. The updater goroutine's state is held in an
atomic, Timekeeper.updaterState: a lifecycle state, updaterStopped, plus three
running states ordered parked < idle < active. updaterState is mutated by the
updater goroutine's tick, by addRef, and by start/stopUpdater; all hold updateMu
except addRef's lockless idle->active CAS, the fast path that re-activates an
idle updater.
Pinning the updater active is a reference count (Timekeeper.refs):
- addRef pins the updater active until the matching release, waking and
re-calibrating it if it had parked; release drops the reference. The goroutine
winds down one step per idle tick (active->idle, then idle->parked) and re-reads
refs after each store, while addRef adds its reference before reading the state
-- store-then-load on both sides -- so a referenced updater is never left winding
down. addRef is lockless when the updater is active (nothing to do) or idle (a
single idle->active CAS); only waking a parked updater takes updateMu, where it
re-calibrates (the params have drifted) before returning so the caller never reads
a stale clock.
- incRunningTasks takes a reference on the 0->1 task transition and
decRunningTasks drops it on 1->0, so the updater stays active for as long as
any task can read the VDSO. The reference is taken before the running count is
published, so a concurrent resumer that observes the count cannot read a
drifted clock. incRunningTasks holds runningTasksMu across addRef; the lock
order is always runningTasksMu -> updateMu (the updater goroutine reads refs
locklessly and never takes runningTasksMu), so this cannot deadlock, and only
a task resume from a fully idle sandbox re-calibrates under that lock.
- GetTime brackets the actual clock read with addRef/release, so the updater
stays active -- and the clock fresh -- for the whole read, even if the reader
is descheduled in the middle of it. The read is lockless unless it finds the
updater parked -- i.e. the sandbox has been idle long enough (a full update
interval) for the updater to park -- when addRef takes updateMu to wake and
re-calibrate it.
The kernel drives the updater entirely through addRef/release; the Timekeeper
reads no kernel state. refs is nosave -- at save time no task is running and no
read is in flight, so it is zero, and it is rebuilt as tasks resume and reads
occur.
The VDSO parameter page has a single writer, so the sample-and-write step is
factored into Timekeeper.update() and every refresh holds updateMu across it: the
updater goroutine's tick, startUpdater's activation refresh (only when it starts
with a reference already held), and addRef's parked slow path. updaterState's
idle->active CAS is lockless, but every transition that publishes fresh parameters
(out of parked or stopped) holds updateMu, so a reader that observes a non-parked
state can rely on the parameters being fresh.
updaterStopped is a lifecycle state, distinct from the idle wake path, that keeps
PauseUpdates/Destroy correct: stopUpdater publishes updaterStopped under updateMu
before it closes stop and waits for the goroutine, so an addRef arriving afterward
cannot restart the updater or rewrite the page (it still counts the reference, but
leaves the state stopped). While stopped, a runnable task may execute in
sentry/system mode; startUpdater (via ResumeUpdates) restarts the updater parked,
or active if a reference is already held. Only startUpdater leaves updaterStopped.
The Clocks.Update interface gains a parked bool: on resume from a parked period
the gap was unobserved and at least one update interval long, so the calibrated
clock re-calibrates to the freshly sampled parameters rather than slewing the
accumulated drift forward over a single interval.
The updater starts parked rather than active, activating on the first reference.
A fully idle sentry performs no periodic clock wakeups; freshness on resume and on
internal reads is preserved.
Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8bd0f04 to
50a3cd9
Compare
|
@lyphyser can you rebase your change, there some conflicts at pkg/sentry/kernel/kernel.go |
AI usage: written by Claude with a TON of back and forth and rewrites until we landed on this relatively minimal design; reviewed all the code in detail
Currently the Timekeeper unconditionally wakes up every minute even when idle.
This PR adds a mechanism to park it and to synchronously calibrate when the timekeeper is parked and either a task starts running or GetTime is called. This is achieved using a reference count mechanism that keeps it alive until tasks running becomes 0 and for the GetTime call duration.
The parking mechanism works by first transitioning to idle if no reference are taken when the timer fires, and then if no tasks run and no GetTime calls are made for a whole interval (one second) then it transitions to parked and stops rearming the timer. This design means that there is no behavior change in the non-idle case, and makes everything much simpler.
Note that I think there is a pre-existing bug in the codebase, which is that GetTime can be called while the timekeeper is stopped, because it is stopped before netstack during save. Once this is fixed, then the code in this commit can be adjust to panic if anything tries to take a reference to the Timekeeper while stopped instead of silently allowing it with miscalibrated clocks.