Skip to content

feat(pinner): add Close with ErrClosed lifecycle#1150

Open
lidel wants to merge 5 commits into
mainfrom
feat/pinner-close
Open

feat(pinner): add Close with ErrClosed lifecycle#1150
lidel wants to merge 5 commits into
mainfrom
feat/pinner-close

Conversation

@lidel
Copy link
Copy Markdown
Member

@lidel lidel commented Apr 24, 2026

Follow-up to #1146. The Pinner had no lifecycle, so kubo/pebble could close the backing datastore while pinner goroutines were still reading it and hit the use-after-close panic that #1146 papers over with recover().

👉 This PR adds Close() error to the Pinner interface and implements it in dspinner.

Close fires context.AfterFunc on every admitted op's derived ctx, so in-flight Pin, Unpin, Flush, Update, and streaming goroutines that honor ctx bail out promptly with context.Canceled (scalar) or ErrClosed (stream) rather than draining to completion. After Close, every other method fails fast with ErrClosed; streaming methods deliver it as StreamedPin.Err on a single entry then close the channel. Close is idempotent and goroutine-safe.

This means Close returns about as fast as the slowest in-flight op takes to honor its ctx. Hosts with a hard shutdown deadline (e.g. kubo via ipfs/kubo#11329's shutdown.CloseWithCtx) get bounded shutdown for free, and the panic-recovery from #1146 stays as defense in depth.

Breaking change: downstream Pinner implementations must add Close() error. Hosts owning the datastore should call Close on the pinner before closing the datastore.

Kubo PR that uses this:

Pinner gains Close() error. Close waits for every in-flight operation,
including streaming goroutines from RecursiveKeys, DirectKeys, and
InternalPins, to finish. After Close, every other Pinner method fails
fast with a new ErrClosed sentinel; streaming methods surface it as the
Err of a single entry on the returned channel, which is then closed.

dspinner gains the matching implementation: Close is idempotent,
admission is serialised with sync.Mutex + sync.WaitGroup, and stream
sends select on a shutdown channel so a parked consumer cannot stall
Close. The panic-recovery and context guards added in #1146 stay in
place as defence in depth for hosts that do not wire Close correctly.

Hosts that own the backing datastore (e.g. kubo) should call Close on
the pinner before closing the datastore to avoid use-after-close
panics in pebble and similar stores.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 98.64865% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 63.24%. Comparing base (95ad7a5) to head (121e7ad).

Files with missing lines Patch % Lines
pinning/pinner/dspinner/pin.go 98.64% 0 Missing and 1 partial ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1150      +/-   ##
==========================================
+ Coverage   63.11%   63.24%   +0.13%     
==========================================
  Files         267      267              
  Lines       26777    26848      +71     
==========================================
+ Hits        16899    16979      +80     
+ Misses       8158     8152       -6     
+ Partials     1720     1717       -3     
Files with missing lines Coverage Δ
pinning/pinner/pin.go 31.25% <ø> (ø)
pinning/pinner/dspinner/pin.go 68.48% <98.64%> (+4.52%) ⬆️

... and 10 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lidel lidel marked this pull request as ready for review May 12, 2026 14:41
@lidel lidel requested a review from a team as a code owner May 12, 2026 14:41
fix the errClosedChan rationale (buffering is needed because the
send is synchronous and has no reader; there is no goroutine to
leak) and the InternalPins comment that claimed the call was not
tracked by p.wg despite begin() doing wg.Add.

- changelog: condense the two close bullets, promote the
  downstream-impl break to an action-required callout, add #1150
  refs
- pinner.Pinner.Close / ErrClosed godoc: lead with the load-bearing
  verb, drop restatement
- dspinner: errClosedChan, InternalPins, streamIndex send/recover,
  begin, Close, SetAutosync comments shortened or corrected
lidel added a commit to ipfs/kubo that referenced this pull request May 13, 2026
@lidel lidel marked this pull request as draft May 13, 2026 21:52
Close fires context.AfterFunc on every admitted op's derived ctx, so
in-flight Pin/Unpin/streams that honor ctx bail out promptly instead
of forcing kubo's shutdown to wait or trip the watchdog.

- stopCtx + AfterFunc fan-out replaces the chan-based done signal
- begin returns the derived ctx; admitted methods thread it through
- streamIndex send watches one ctx.Done; Close-driven cancel from
  snapshotIndex surfaces as ErrClosed on the stream
lidel added a commit to ipfs/kubo that referenced this pull request May 13, 2026
ipfs/boxo#1150 was reworked to use context fan-out instead of a done
channel. Pinner.Close now cancels every admitted op and waits for
them to return, broadening the shutdown contract from "drain
streams" to "drain everything". Comments and changelog reworded to
match.
@lidel lidel marked this pull request as ready for review May 13, 2026 22:29

require.ErrorIs(t, <-pinReturned, context.Canceled)
require.NoError(t, <-closeReturned)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would testing.Synctest be good here too?

lidel added 2 commits May 14, 2026 17:26
drop the 20ms/50ms timing in TestCloseWaitsForInFlightOperation and
move it into a synctest bubble (gammazero review). Go 1.26 does not
treat sync.RWMutex.Lock waiting behind a durably-held RLock as
durably blocked, so swap the non-ctx-aware blocker from the pinner's
RWMutex to a tiny ctxIgnoringDAGService whose Add parks on a channel
and ignores ctx. Same behavioral assertion (Close must wait on wg
when the cancellation fan-out cannot unstick the in-flight op),
deterministic and ~50x faster.

also drop the duplicated doc paragraph above the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants