
Make Cache.Open return io.ReadSeekCloser to support Range requests #341

Description

@alecthomas

Summary

Change the Cache interface so that Open() returns an io.ReadSeekCloser instead of an io.ReadCloser, in order to support HTTP Range requests when serving cached objects.

For most backends this is trivial. For backends that stream over the network (S3 and the Remote cache client), we introduce a wrapper that supports a single Seek() to set the start offset, followed by purely sequential reads, backed by a range request. The higher-level Range-serving code is written to use exactly this access pattern.

Interface change

In internal/cache/api.go, change:

Open(ctx context.Context, key Key) (io.ReadCloser, http.Header, error)

to return io.ReadSeekCloser. Document that callers serving ranges MUST use a single seek-to-start followed by sequential reads (no seek-to-end probing). Because io.ReadSeekCloser includes every method of io.ReadCloser, any value returned by Open is still assignable to io.ReadCloser, so all existing sequential consumers (http.Fetch, git snapshot/bundle, gomod cacher, cachetest suite, etc.) continue to compile and work unchanged.

Shared seek helper

Add one reusable "seek-once, lazily open at offset, then sequential" wrapper implementing io.ReadSeekCloser, parameterised by an "open underlying stream at offset" function:

  • Holds a pending start offset (default 0).
  • Seek is only meaningful before the first Read: it sets the start offset, resolving io.SeekStart / io.SeekCurrent / io.SeekEnd against the known object size. After reading begins, Seek returns an error.
  • On first Read, lazily opens the underlying stream at the offset, then reads sequentially.
  • Close tears down the underlying stream.

This helper is shared by the S3 and Remote backends (DRY).

Backend changes

  • disk (disk.go): return *os.File directly — already an io.ReadSeekCloser. Signature only.
  • memory (memory.go): wrap the existing *bytes.Reader (already seekable) in a no-op-close wrapper instead of io.NopCloser.
  • noop (noop.go): signature only (always returns a cache miss).
  • s3 (s3.go): implement the helper's "open at offset" using the existing parallelGet / GetObject path, starting from the seek offset instead of 0.
  • remote (remote.go + client/*.go): implement "open at offset" via a ranged GET. This requires:
    • a new Range(start) RequestOption in the client package that sets Range: bytes=start-;
    • client.Open accepting 206 Partial Content in addition to 200 OK.

Server-side Range support

Add single-range support to httputil.ServeCacheHit (shared by the API handler and the generic caching handler). Because the S3/Remote readers only support seek-to-start (not seek-to-end), parse the Range header manually rather than using http.ServeContent (which probes the end via Seek(0, io.SeekEnd)):

  • Use the existing Content-Length header for the object size (no seek-to-end).
  • For a satisfiable single range: Seek(start, io.SeekStart) once, then io.CopyN of end-start+1 bytes, emitting 206 Partial Content, Content-Range: bytes start-end/size, Content-Length, and Accept-Ranges: bytes.
  • For an unsatisfiable range: 416 Range Not Satisfiable with Content-Range: bytes */size.
  • No range / full request: behave as today (advertise Accept-Ranges: bytes).
  • Preserve existing conditional (If-Match / If-None-Match) handling.

This change powers both the API endpoint and, transitively, the Remote backend's ranged reads.

Tiered cache behaviour

The tiered backfill must not commit a truncated object when a range request reads only a slice.

  • Full sequential read from a higher tier: keep today's free tee-backfill into tier 0 (no extra GET).
  • Ranged read from a higher tier (a non-trivial Seek): abandon the tee, cancelling the tier-0 write so the partial entry is discarded, and kick off a singleton full copy. This is a background download of the whole object from the hitting tier into tier 0. It is detached from the request via context.WithoutCancel and deduplicated so that N concurrent range readers trigger at most one copy.
  • A bytes=0- whole-object range (Seek to current position 0 before any read) is treated as a no-op and keeps the cheap tee path.

Mechanics:

  • backfillReadCloser becomes seekable and tracks bytes read. Seek to the current position before reading delegates to the source and keeps teeing; any other Seek cancels the tee, fires the singleton-copy trigger once, then delegates the seek to the source.
  • Singleton copy dedup lives on Tiered via a shared *sync.Map keyed by namespace + "/" + key. Since Tiered.Namespace() returns a fresh value per request, this map (and a namespace field) must be carried through Namespace() by pointer so dedup spans requests.
  • On trigger: LoadOrStore the key; if present, no-op. Otherwise spawn a goroutine that re-Opens the object from the hitting tier (full, unseeked read), writes it to tier 0 via WriteFunc, and deletes the dedup entry on completion. Errors are logged, not returned (best-effort warming).

Consequence: a ranged read against a cold local tier causes two reads from the higher tier (the range plus the deduplicated background full copy). This is the cost of warming tier 0 on range access.

Tests

  • S3 seekable reader: seek-then-sequential-read, error on seek-after-read.
  • ServeCacheHit ranges: 206 + Content-Range, 416 unsatisfiable, Accept-Ranges advertised, full request unchanged.
  • Remote range round-trip (client Range option + 206 handling end-to-end).
  • Tiered: ranged read does not commit a truncated tier-0 entry; ranged read triggers a (deduplicated) full singleton copy that warms tier 0; full read still tees as before.
  • Add a Range case to the cachetest suite so every backend is exercised.

Validation

  • just tasks / go test ./...
  • linters (golangci-lint via the repo's just target)

Out of scope / notes

  • Multi-range (multipart/byteranges) responses are not supported; only single ranges.
  • The "warm tier 0 on range access" copy is best-effort and fire-and-forget.
