Delegate parallel I/O to zarrs via extendr
Problem
pizzarr's parallel I/O is a cross-platform mess. get_parallel_settings() dispatches across six closures to accommodate the matrix of future / parallel::cluster / mclapply × progress bar × OS. R6 objects can't serialize across PSOCK workers, so chunk_getitem is split into part1/part2 — an artifact of R's parallelism model, not the problem itself. crul::HttpClient fails inside workers, so HttpStore carries an AsyncVaried workaround plus a fallback that guesses whether HTTP handle errors came from a worker even when the parallel option didn't propagate (#128). And there's no S3 support at all.
Proposed solution
Delegate chunk I/O, codec execution, and store abstraction to the zarrs Rust crate via extendr. zarrs handles parallel chunk fetch and decode through its own rayon thread pool — parallelism becomes invisible to R. The R6 class hierarchy, indexers, slicing logic, and NestedArray stay in R; only the chunk loop and store I/O move to Rust.
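The division of labor — R computes which chunks a selection touches, one Rust call fetches and decodes them in parallel — can be sketched with plain `std::thread` standing in for zarrs' rayon pool. The chunk keys, the decode step, and the return shape here are toy stand-ins, not the zarrs API:

```rust
use std::thread;

// Toy stand-in for a compressed chunk fetched from a store.
fn fetch_chunk(key: &str) -> Vec<u8> {
    key.bytes().collect()
}

// Toy stand-in for codec execution (e.g. a gzip/blosc decode).
fn decode_chunk(raw: &[u8]) -> Vec<f64> {
    raw.iter().map(|b| *b as f64).collect()
}

/// One entry point, as the extendr bridge would expose it: R passes the
/// chunk keys it computed from the selection; fetch and decode run in
/// parallel inside Rust, invisible to the caller.
fn retrieve_chunks(keys: &[&str]) -> Vec<Vec<f64>> {
    thread::scope(|s| {
        let handles: Vec<_> = keys
            .iter()
            .map(|key| s.spawn(move || decode_chunk(&fetch_chunk(key))))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

In the real bridge zarrs manages its own pool and assembles the decoded selection into one buffer; the point is the boundary — R never sees the parallelism.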
The R-native single-threaded path (DirectoryStore, MemoryStore, plain HttpStore via lapply) is retained permanently as the dependency-free baseline. zarrs sits beside it as the performance and capability tier. This is not a fallback — it is the deliberate split between the simple path and the fast path.
Build and distribution
zarrs 0.23.x requires Rust edition 2024 / rust-version = "1.91". As of April 2026, CRAN's macOS build machines ship rustc 1.84.1 — too old. Rather than pin an older zarrs, we target the latest stable zarrs and defer the CRAN Rust build until CRAN catches up.
Two tiers:
- CRAN tier — Pure R, no Rust compilation. The CRAN tarball has no src/ directory. A tools/cran-build.sh script produces the submission tarball by stripping Rust artifacts from a repo copy. The package builds and works identically to v0.1.x — all I/O uses the R-native code path.
- r-universe tier — Full zarrs via pre-built binaries: filesystem, HTTP, S3, GCS, gzip, blosc, zstd, sharding. No Rust toolchain needed for end users. r-universe builds from the full repo against the latest stable rustc.
End users install from CRAN for the baseline or from r-universe for parallel I/O and cloud storage. No environment variables, no source-build gymnastics — two channels, two audiences.
When CRAN's macOS toolchain reaches rustc ≥ 1.91, we can vendor the zarrs dependency tree and submit a Rust-enabled CRAN build. The architecture is designed for this — it just doesn't ship yet.
Sequencing
Phases 1–4 introduce zarrs alongside the existing R code: extendr scaffolding, read/write paths, r-universe CI, remote store support. The R package passes R CMD check both with Rust (full repo) and without (CRAN tarball from cran-build.sh).
Phase 5 strips the R parallel infrastructure — get_parallel_settings(), part1/part2, the HttpStore workaround, and the pbapply / future / parallel Suggests. Ships to CRAN as a pure-R update. Users never lose parallel capability; they get a better version of it via r-universe.
Design spec: TODO.md. Rust coding conventions: RUST-STYLE.md.
Use cases by tier
- Local read with parallel decompression — r-universe: zarrs; CRAN: R-native sequential
- HTTPS read — r-universe: zarrs parallel + pooled; CRAN: R-native sequential
- S3/GCS read+write — r-universe only (object_store)
- Local write — r-universe: zarrs; CRAN: R-native
- zarr-python codec compatibility — r-universe: full; CRAN: gzip+blosc (R-native codecs)
Implementation
Phase 1 — scaffolding and metadata:
- extendr setup, Cargo.toml targeting zarrs 0.23.x, Makevars, configure script
- tools/cran-build.sh for CRAN tarball production
- .onLoad availability probe (is_zarrs_available())
- Store$get_store_identifier() for dispatch
- Rust functions: zarrs_compiled_features, zarrs_runtime_info, zarrs_set_codec_concurrent_target, zarrs_open_array_metadata, zarrs_node_exists, zarrs_close_store
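A minimal shape for the feature probe: the function name comes from the list above, but the feature flags and return type are illustrative, built on Rust's `cfg!` macro rather than the real Cargo feature set:

```rust
/// Illustrative sketch of zarrs_compiled_features: report which optional
/// capabilities this binary was built with, so .onLoad on the R side can
/// set is_zarrs_available() and gate dispatch. Feature names are assumed.
fn zarrs_compiled_features() -> Vec<&'static str> {
    let mut feats = Vec::new();
    if cfg!(feature = "s3") {
        feats.push("s3");
    }
    if cfg!(feature = "gcs") {
        feats.push("gcs");
    }
    if cfg!(feature = "zstd") {
        feats.push("zstd");
    }
    feats
}
```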
Phase 2 — read path:
- zarrs_retrieve_subset with two-step dtype dispatch (retrieve as stored type, widen in Rust)
- selection_to_ranges() bridge from pizzarr indexers to zarrs ArraySubset ranges
- Dispatch from ZarrArray$get_item; benchmark against R-native path
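The indexer bridge reduces to a coordinate-convention change: pizzarr slices are 1-based and inclusive, zarrs ArraySubset ranges are 0-based and half-open. A stand-alone sketch of that conversion, assuming the real bridge would also validate against the array shape and handle strides:

```rust
use std::ops::Range;

/// Convert per-dimension 1-based inclusive bounds (R convention) into
/// 0-based half-open ranges (Rust/zarrs convention). Returns None for
/// a zero start or out-of-order bounds. Stride handling is omitted.
fn selection_to_ranges(bounds: &[(u64, u64)]) -> Option<Vec<Range<u64>>> {
    bounds
        .iter()
        .map(|&(start, stop)| {
            if start == 0 || stop < start {
                None
            } else {
                Some(start - 1..stop) // 1-based inclusive -> 0-based half-open
            }
        })
        .collect()
}
```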
Phase 3 — write path:
- zarrs_store_subset, zarrs_create_array with codec presets (gzip/blosc/zstd)
- Round-trip read-write tests on local stores
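Codec presets could be as simple as a name-to-default-level table on the Rust side. The names below mirror the list above; the level values are placeholder assumptions, not zarrs defaults:

```rust
/// Illustrative preset table for zarrs_create_array: map a codec name
/// passed from R to a (codec, compression level) pair. Levels assumed.
fn codec_preset(name: &str) -> Option<(&'static str, i32)> {
    match name {
        "gzip" => Some(("gzip", 6)),
        "blosc" => Some(("blosc", 5)),
        "zstd" => Some(("zstd", 3)),
        _ => None,
    }
}
```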
Phase 4 — r-universe and remote stores:
- r-universe CI for --features full,s3,gcs
- HTTP reads via zarrs_http and object_store; S3 reads against public bucket
- Publish r-universe binaries
Phase 5 — simplify R-native:
- Strip get_parallel_settings(), part1/part2, parallel Suggests, HttpStore workaround
- Collapse R-native chunk loop to lapply
- Updated vignettes, install guide, migration notes
- Ship to CRAN as pure-R update
Tradeoffs
Two code paths permanently, by design. Two distribution tiers (CRAN = pure R, r-universe = zarrs binaries). No Rust on CRAN until macOS toolchain catches up. zarrs is under active development and the extendr bridge needs to stay thin to absorb upstream changes.