
Commit 966609a

rsc authored and gopherbot committed
time: avoid stale receives after Timer/Ticker Stop/Reset return
A proposal discussion in mid-2020 on #37196 decided to change
time.Timer and time.Ticker so that their Stop and Reset methods
guarantee that no old value (corresponding to the previous configuration
of the Timer or Ticker) will be received after the method returns.

The trivial way to do this is to make the Timer/Ticker channels
unbuffered, create a goroutine per Timer/Ticker feeding the channel,
and then coordinate with that goroutine during Stop/Reset.
Since Stop/Reset coordinate with the goroutine and the channel
is unbuffered, there is no possibility of a stale value being sent
after Stop/Reset returns.

Of course, we do not want an extra goroutine per Timer/Ticker,
but that's still a good semantic model: behave like the channels
are unbuffered and fed by a coordinating goroutine.

The actual implementation is more effort but behaves like the model.
Specifically, the timer channel has a 1-element buffer like it always has,
but len(t.C) and cap(t.C) are special-cased to return 0 anyway, so user
code cannot see what's in the buffer except with a receive.
Stop/Reset lock out any stale sends and then clear any pending send
from the buffer.

Some programs will change behavior. For example:

	package main

	import "time"

	func main() {
		t := time.NewTimer(2 * time.Second)
		time.Sleep(3 * time.Second)
		if t.Reset(2*time.Second) != false {
			panic("expected timer to have fired")
		}
		<-t.C
		<-t.C
	}

This program (from #11513) sleeps 3s after setting a 2s timer,
resets the timer, and expects Reset to return false: the Reset is too
late and the send has already occurred. It then expects to receive
two values: the one from before the Reset, and the one from after
the Reset.

With an unbuffered timer channel, it should be clear that no value
can be sent during the time.Sleep, so t.Reset returns true,
indicating that the Reset stopped the timer from going off.
Then there is only one value to receive from t.C: the one from after
the Reset.
In 2015, I used the above example as an argument against this change.

Note that a correct version of the program would be:

	func main() {
		t := time.NewTimer(2 * time.Second)
		time.Sleep(3 * time.Second)
		if !t.Reset(2*time.Second) {
			<-t.C
		}
		<-t.C
	}

This works with either semantics, by heeding t.Reset's result.
The change should not affect correct programs.

However, one way that the change would be visible is when programs
use len(t.C) (instead of a non-blocking receive) to poll whether the timer
has triggered already. We might legitimately worry about breaking such
programs.

In 2020, discussing #37196, Bryan Mills and I surveyed programs using
len on timer channels. These are exceedingly rare to start with; nearly all
the uses are buggy; and all the buggy programs would be fixed by the new
semantics. The details are at [1].

To further reduce the impact of this change, this CL adds a temporary
GODEBUG setting, which we didn't know about yet in 2015 and 2020.
Specifically, asynctimerchan=1 disables the change and is the default
for main programs in modules that use a Go version before 1.23.
We hope to be able to retire this setting after the minimum 2-year window.
Setting asynctimerchan=1 also disables the garbage collection change
from CL 568341, although users shouldn't need to know that since
it is not a semantically visible change (unless we have bugs!).

As an undocumented bonus that we do not officially support,
asynctimerchan=2 disables the channel buffer change but keeps
the garbage collection change. This may help while we are
shaking out bugs in either of them.

Fixes #37196.

[1] #37196 (comment)

Change-Id: I8925d3fb2b86b2ae87fd2acd055011cbf7bd5916
Reviewed-on: https://go-review.googlesource.com/c/go/+/568341
Reviewed-by: Austin Clements <[email protected]>
Auto-Submit: Russ Cox <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
1 parent 0159150 commit 966609a

8 files changed: +382 -160 lines changed

doc/godebug.md

+6 -1
@@ -128,7 +128,12 @@ and the [go command documentation](/cmd/go#hdr-Build_and_test_caching).

 ### Go 1.23

-TODO: `asynctimerchan` setting.
+Go 1.23 changed the channels created by package time to be unbuffered
+(synchronous), which makes correct use of the [`Timer.Stop`](/pkg/time/#Timer.Stop)
+and [`Timer.Reset`](/pkg/time/#Timer.Reset) method results much easier.
+The [`asynctimerchan` setting](/pkg/time/#NewTimer) disables this change.
+There are no runtime metrics for this change.
+This setting may be removed in a future release, Go 1.27 at the earliest.

 Go 1.23 changed the mode bits reported by [`os.Lstat`](/pkg/os#Lstat) and [`os.Stat`](/pkg/os#Stat)
 for reparse points, which can be controlled with the `winsymlink` setting.

src/internal/godebugs/table.go

+1 -1
@@ -25,7 +25,7 @@ type Info struct {
 // Note: After adding entries to this table, update the list in doc/godebug.md as well.
 // (Otherwise the test in this package will fail.)
 var All = []Info{
-	{Name: "asynctimerchan", Package: "time", Opaque: true},
+	{Name: "asynctimerchan", Package: "time", Changed: 23, Old: "1", Opaque: true},
 	{Name: "execerrdot", Package: "os/exec"},
 	{Name: "gocachehash", Package: "cmd/go"},
 	{Name: "gocachetest", Package: "cmd/go"},

src/runtime/chan.go

+47 -1
@@ -323,6 +323,35 @@ func send(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
 	goready(gp, skip+1)
 }

+// timerchandrain removes all elements in channel c's buffer.
+// It reports whether any elements were removed.
+// Because it is only intended for timers, it does not
+// handle waiting senders at all (all timer channels
+// use non-blocking sends to fill the buffer).
+func timerchandrain(c *hchan) bool {
+	// Note: Cannot use empty(c) because we are called
+	// while holding c.timer.sendLock, and empty(c) will
+	// call c.timer.maybeRunChan, which will deadlock.
+	// We are emptying the channel, so we only care about
+	// the count, not about potentially filling it up.
+	if atomic.Loaduint(&c.qcount) == 0 {
+		return false
+	}
+	lock(&c.lock)
+	any := false
+	for c.qcount > 0 {
+		any = true
+		typedmemclr(c.elemtype, chanbuf(c, c.recvx))
+		c.recvx++
+		if c.recvx == c.dataqsiz {
+			c.recvx = 0
+		}
+		c.qcount--
+	}
+	unlock(&c.lock)
+	return any
+}
+
 // Sends and receives on unbuffered or empty-buffered channels are the
 // only operations where one running goroutine writes to the stack of
 // another running goroutine. The GC assumes that stack writes only
@@ -748,16 +777,33 @@ func chanlen(c *hchan) int {
 	if c == nil {
 		return 0
 	}
-	if c.timer != nil {
+	async := debug.asynctimerchan.Load() != 0
+	if c.timer != nil && async {
 		c.timer.maybeRunChan()
 	}
+	if c.timer != nil && !async {
+		// timer channels have a buffered implementation
+		// but present to users as unbuffered, so that we can
+		// undo sends without users noticing.
+		return 0
+	}
 	return int(c.qcount)
 }

 func chancap(c *hchan) int {
 	if c == nil {
 		return 0
 	}
+	if c.timer != nil {
+		async := debug.asynctimerchan.Load() != 0
+		if async {
+			return int(c.dataqsiz)
+		}
+		// timer channels have a buffered implementation
+		// but present to users as unbuffered, so that we can
+		// undo sends without users noticing.
+		return 0
+	}
 	return int(c.dataqsiz)
 }
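The drain loop in timerchandrain walks a circular buffer from the receive index, wrapping at the end. A rough user-space model of that loop is sketched here; the `ring` type and its fields are hypothetical stand-ins for the relevant hchan fields, and the sketch omits the locking and the atomic fast path that the runtime version uses.

```go
package main

import "fmt"

// ring is a hypothetical stand-in for the circular buffer inside hchan.
type ring struct {
	buf    []int
	recvx  int // index of the next element to receive
	qcount int // number of queued elements
}

// drain removes every queued element and reports whether any were removed.
func (r *ring) drain() bool {
	removed := false
	for r.qcount > 0 {
		removed = true
		r.buf[r.recvx] = 0 // plays the role of typedmemclr
		r.recvx++
		if r.recvx == len(r.buf) {
			r.recvx = 0 // wrap around to the start of the buffer
		}
		r.qcount--
	}
	return removed
}

func main() {
	// Two queued elements, stored at indexes 3 and 0 (wrapped).
	r := &ring{buf: []int{7, 0, 0, 9}, recvx: 3, qcount: 2}
	fmt.Println(r.drain(), r.recvx, r.qcount)
	fmt.Println(r.drain()) // nothing left to remove
}
```

A real timer channel has a one-element buffer, so the loop runs at most once there; the general form just keeps the function correct for any buffer size.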

src/runtime/lockrank.go

+33 -30
Generated file; diff not rendered by default.

src/runtime/mklockrank.go

+4 -2
@@ -55,9 +55,11 @@ NONE <
 # Test only
 NONE < testR, testW;

+NONE < timerSend;
+
 # Scheduler, timers, netpoll
 NONE < allocmW, execW, cpuprof, pollCache, pollDesc, wakeableSleep;
-scavenge, sweep, testR, wakeableSleep < hchan;
+scavenge, sweep, testR, wakeableSleep, timerSend < hchan;
 assistQueue,
   cpuprof,
   forcegc,
@@ -81,7 +83,7 @@ NONE < notifyList;
 hchan, notifyList < sudog;

 hchan, pollDesc, wakeableSleep < timers;
-timers < timer < netpollInit;
+timers, timerSend < timer < netpollInit;

 # Semaphores
 NONE < root;

src/runtime/time.go

+117 -24
@@ -36,11 +36,18 @@ type timer struct {
 	// a well-behaved function and not block.
 	//
 	// The arg and seq are client-specified opaque arguments passed back to f.
-	// When used from package time, arg is a channel (for After, NewTicker)
-	// or the function to call (for AfterFunc) and seq is unused (0).
 	// When used from netpoll, arg and seq have meanings defined by netpoll
 	// and are completely opaque to this code; in that context, seq is a sequence
 	// number to recognize and squelch stale function invocations.
+	// When used from package time, arg is a channel (for After, NewTicker)
+	// or the function to call (for AfterFunc) and seq is unused (0).
+	//
+	// Package time does not know about seq, but if this is a channel timer (t.isChan == true),
+	// this file uses t.seq as a sequence number to recognize and squelch
+	// sends that correspond to an earlier (stale) timer configuration,
+	// similar to its use in netpoll. In this usage (that is, when t.isChan == true),
+	// writes to seq are protected by both t.mu and t.sendLock,
+	// so reads are allowed when holding either of the two mutexes.
 	//
 	// The delay argument is nanotime() - t.when, meaning the delay in ns between
 	// when the timer should have gone off and now. Normally that amount is
@@ -69,6 +76,10 @@ type timer struct {
 	// Since writes to whenHeap are protected by two locks (t.mu and t.ts.mu),
 	// it is permitted to read whenHeap when holding either one.
 	whenHeap int64
+
+	// sendLock protects sends on the timer's channel.
+	// Not used for async (pre-Go 1.23) behavior when debug.asynctimerchan.Load() != 0.
+	sendLock mutex
 }

 // init initializes a newly allocated timer t.
@@ -167,7 +178,7 @@ func (t *timer) trace1(op string) {
 		return
 	}
 	bits := [4]string{"h", "m", "z", "c"}
-	for i := range bits {
+	for i := range 3 {
 		if t.state&(1<<i) == 0 {
 			bits[i] = "-"
 		}
@@ -199,6 +210,18 @@ func (t *timer) unlock() {
 	unlock(&t.mu)
 }

+// hchan returns the channel in t.arg.
+// t must be a timer with a channel.
+func (t *timer) hchan() *hchan {
+	if !t.isChan {
+		badTimer()
+	}
+	// Note: t.arg is a chan time.Time,
+	// and runtime cannot refer to that type,
+	// so we cannot use a type assertion.
+	return (*hchan)(efaceOf(&t.arg).data)
+}
+
 // updateHeap updates t.whenHeap as directed by t.state, updating t.state
 // and returning a bool indicating whether the state (and t.whenHeap) changed.
 // The caller must hold t's lock, or the world can be stopped instead.
@@ -309,6 +332,7 @@ func newTimer(when, period int64, f func(arg any, seq uintptr, delay int64), arg
 		racerelease(unsafe.Pointer(&t.timer))
 	}
 	if c != nil {
+		lockInit(&t.sendLock, lockRankTimerSend)
 		t.isChan = true
 		c.timer = &t.timer
 		if c.dataqsiz == 0 {
@@ -372,24 +396,45 @@ func (ts *timers) addHeap(t *timer) {
 	}
 }

-// stop stops the timer t. It may be on some other P, so we can't
-// actually remove it from the timers heap. We can only mark it as stopped.
-// It will be removed in due course by the P whose heap it is on.
-// Reports whether the timer was stopped before it was run.
-func (t *timer) stop() bool {
-	t.lock()
-	t.trace("stop")
+// maybeRunAsync checks whether t needs to be triggered and runs it if so.
+// The caller is responsible for locking the timer and for checking that we
+// are running timers in async mode. If the timer needs to be run,
+// maybeRunAsync will unlock and re-lock it.
+// The timer is always locked on return.
+func (t *timer) maybeRunAsync() {
+	assertLockHeld(&t.mu)
 	if t.state&timerHeaped == 0 && t.isChan && t.when > 0 {
 		// If timer should have triggered already (but nothing looked at it yet),
 		// trigger now, so that a receive after the stop sees the "old" value
 		// that should be there.
+		// (It is possible to have t.blocked > 0 if there is a racing receive
+		// in blockTimerChan, but timerHeaped not being set means
+		// it hasn't run t.maybeAdd yet; in that case, running the
+		// timer ourselves now is fine.)
 		if now := nanotime(); t.when <= now {
 			systemstack(func() {
 				t.unlockAndRun(now) // resets t.when
 			})
 			t.lock()
 		}
 	}
+}
+
+// stop stops the timer t. It may be on some other P, so we can't
+// actually remove it from the timers heap. We can only mark it as stopped.
+// It will be removed in due course by the P whose heap it is on.
+// Reports whether the timer was stopped before it was run.
+func (t *timer) stop() bool {
+	async := debug.asynctimerchan.Load() != 0
+	if !async && t.isChan {
+		lock(&t.sendLock)
+	}
+
+	t.lock()
+	t.trace("stop")
+	if async {
+		t.maybeRunAsync()
+	}
 	if t.state&timerHeaped != 0 {
 		t.state |= timerModified
 		if t.state&timerZombie == 0 {
@@ -399,7 +444,20 @@ func (t *timer) stop() bool {
 	}
 	pending := t.when > 0
 	t.when = 0
+
+	if !async && t.isChan {
+		// Stop any future sends with stale values.
+		// See timer.unlockAndRun.
+		t.seq++
+	}
 	t.unlock()
+	if !async && t.isChan {
+		unlock(&t.sendLock)
+		if timerchandrain(t.hchan()) {
+			pending = true
+		}
+	}
+
 	return pending
 }

@@ -439,8 +497,16 @@ func (t *timer) modify(when, period int64, f func(arg any, seq uintptr, delay in
 	if period < 0 {
 		throw("timer period must be non-negative")
 	}
+	async := debug.asynctimerchan.Load() != 0
+
+	if !async && t.isChan {
+		lock(&t.sendLock)
+	}

 	t.lock()
+	if async {
+		t.maybeRunAsync()
+	}
 	t.trace("modify")
 	t.period = period
 	if f != nil {
@@ -449,20 +515,6 @@ func (t *timer) modify(when, period int64, f func(arg any, seq uintptr, delay in
 		t.seq = seq
 	}

-	if t.state&timerHeaped == 0 && t.isChan && t.when > 0 {
-		// This is a timer for an unblocked channel.
-		// Perhaps it should have expired already.
-		if now := nanotime(); t.when <= now {
-			// The timer should have run already,
-			// but nothing has checked it yet.
-			// Run it now.
-			systemstack(func() {
-				t.unlockAndRun(now) // resets t.when
-			})
-			t.lock()
-		}
-	}
-
 	wake := false
 	pending := t.when > 0
 	t.when = when
@@ -483,7 +535,20 @@ func (t *timer) modify(when, period int64, f func(arg any, seq uintptr, delay in
 	}

 	add := t.needsAdd()
+
+	if !async && t.isChan {
+		// Stop any future sends with stale values.
+		// See timer.unlockAndRun.
+		t.seq++
+	}
 	t.unlock()
+	if !async && t.isChan {
+		if timerchandrain(t.hchan()) {
+			pending = true
+		}
+		unlock(&t.sendLock)
+	}
+
 	if add {
 		t.maybeAdd()
 	}
@@ -936,7 +1001,35 @@ func (t *timer) unlockAndRun(now int64) {
 	if ts != nil {
 		ts.unlock()
 	}
+
+	async := debug.asynctimerchan.Load() != 0
+	if !async && t.isChan {
+		// For a timer channel, we want to make sure that no stale sends
+		// happen after a t.stop or t.modify, but we cannot hold t.mu
+		// during the actual send (which f does) due to lock ordering.
+		// It can happen that we are holding t's lock above, we decide
+		// it's time to send a time value (by calling f), grab the parameters,
+		// unlock above, and then a t.stop or t.modify changes the timer
+		// and returns. At that point, the send needs not to happen after all.
+		// The way we arrange for it not to happen is that t.stop and t.modify
+		// both increment t.seq while holding both t.mu and t.sendLock.
+		// We copied the seq value above while holding t.mu.
+		// Now we can acquire t.sendLock (which will be held across the send)
+		// and double-check that t.seq is still the seq value we saw above.
+		// If not, the timer has been updated and we should skip the send.
+		// We skip the send by reassigning f to a no-op function.
+		lock(&t.sendLock)
+		if t.seq != seq {
+			f = func(any, uintptr, int64) {}
+		}
+	}
+
 	f(arg, seq, delay)
+
+	if !async && t.isChan {
+		unlock(&t.sendLock)
+	}
+
 	if ts != nil {
 		ts.lock()
 	}
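The sequence-number protocol described in the unlockAndRun comment can be modeled with ordinary mutexes. The sketch below is a user-space illustration under stated assumptions, not the runtime code: `seqTimer` and its methods are hypothetical, `sync.Mutex` stands in for the runtime locks, and the two halves of a send are split into `snapshot` and `deliver` so that a stop can be placed between them deterministically.

```go
package main

import (
	"fmt"
	"sync"
)

// seqTimer is a hypothetical model of a channel timer.
type seqTimer struct {
	mu       sync.Mutex // stands in for t.mu (timer configuration)
	sendLock sync.Mutex // stands in for t.sendLock (held across sends)
	seq      uint64     // written only while holding both locks
	c        chan int   // 1-element buffer filled by non-blocking sends
}

// snapshot models the first half of a send: copy seq while holding mu.
func (t *seqTimer) snapshot() uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.seq
}

// deliver models the second half: take sendLock, re-check seq, and send
// only if the timer has not been stopped or modified since snapshot.
func (t *seqTimer) deliver(seq uint64, v int) bool {
	t.sendLock.Lock()
	defer t.sendLock.Unlock()
	if t.seq != seq { // reading under sendLock alone is allowed
		return false // stale configuration: skip the send
	}
	select {
	case t.c <- v:
	default:
	}
	return true
}

// stop bumps seq under both locks, then clears any buffered value.
// It reports whether a buffered value was removed.
func (t *seqTimer) stop() bool {
	t.sendLock.Lock()
	defer t.sendLock.Unlock()
	t.mu.Lock()
	t.seq++
	t.mu.Unlock()
	select {
	case <-t.c:
		return true
	default:
		return false
	}
}

func main() {
	t := &seqTimer{c: make(chan int, 1)}
	s := t.snapshot() // a send has been decided on...
	t.stop()          // ...but stop returns before it happens
	fmt.Println("stale send delivered:", t.deliver(s, 1), "buffered:", len(t.c))

	s = t.snapshot()
	fmt.Println("fresh send delivered:", t.deliver(s, 2), "buffered:", len(t.c))
	fmt.Println("stop drained a value:", t.stop(), "buffered:", len(t.c))
}
```

Holding sendLock across both the seq check and the send is what closes the window: a stop either runs before the check, so the send is skipped, or after the send, so the drain removes the value.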
