pizzarr ships in two tiers. The CRAN build is pure R — no Rust
compilation, no system dependencies. It handles local and HTTP Zarr
stores with sequential chunk I/O via lapply. The r-universe
build compiles in the zarrs
Rust crate via extendr, adding
parallel decompression, cloud-native store backends (S3, GCS), and
codecs beyond what R packages provide.
The split exists because CRAN’s macOS build machines ship a Rust
toolchain (rustc 1.84) that is too old for zarrs, which requires rustc
>= 1.91. r-universe builds against the latest stable toolchain, so it
can compile zarrs and distribute pre-built binaries. End users on either
tier install with install.packages() — no Rust toolchain
needed.
pizzarr_compiled_features() lists the feature flags
compiled into the zarrs backend. On the CRAN tier it returns
character(0) with a message; on the r-universe tier it
returns the compiled capabilities:
pizzarr_compiled_features()
#> zarrs backend not available (pure R install).
#> See ?pizzarr_upgrade for the r-universe install.
#> character(0)

The internal flag .pizzarr_env$zarrs_available is a
logical scalar set once at package load. Dispatch logic throughout
pizzarr checks this flag to decide whether to call into Rust or fall
through to the R-native path:
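A minimal sketch of that dispatch pattern — the helper names read_rust() and read_r_native() are hypothetical stand-ins for pizzarr's internal entry points, shown only to illustrate the flag check:

```r
# Illustration only: dispatch on the load-time backend flag.
# read_rust() and read_r_native() are hypothetical internals.
read_dispatch <- function(selection) {
  if (isTRUE(.pizzarr_env$zarrs_available)) {
    read_rust(selection)      # Rust fast path
  } else {
    read_r_native(selection)  # sequential R-native chunk loop
  }
}
```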
pizzarr_upgrade() prints the r-universe install command
when zarrs is not compiled in, or confirms that the backend is already
present:
pizzarr_upgrade()
#> Install pizzarr with the zarrs backend from r-universe:
#>
#> install.packages("pizzarr", repos = "https://zarr-developers.r-universe.dev")

The startup message that CRAN users see on
library(pizzarr) can be silenced with
options(pizzarr.suggest_runiverse = FALSE).
The examples below require the zarrs backend. When this vignette is built without it, the code chunks are not evaluated.
zarrs_node_exists() opens a filesystem store via the
Rust backend, probes for V2 and V3 metadata keys at a given path, and
returns a list with three fields: exists (logical),
node_type (character), and zarr_format
(integer or NULL). The store handle is cached on the Rust side —
subsequent calls to the same store path reuse it without re-opening.
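A minimal sketch, probing the root node of a freshly created store (the argument order and the empty root path are assumptions):

```r
d <- tempfile("zarrs_exists_")
z <- zarr_create(store = d, shape = c(10L, 10L), chunks = c(5L, 5L),
                 dtype = "<f8")
# Probe the root node of the store
info <- zarrs_node_exists(d, "")
str(info)  # list with exists, node_type, zarr_format
```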
The Rust backend holds open store handles in a process-global cache
keyed by normalized path. zarrs_close_store() removes a
handle from the cache and returns TRUE. A second call to
the same path returns FALSE — it was already removed:
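A sketch of that lifecycle, assuming zarrs_close_store() takes the store path as its argument:

```r
d <- tempfile("zarrs_close_")
z <- zarr_create(store = d, shape = c(10L), chunks = c(5L), dtype = "<f8")
invisible(zarrs_open_array_metadata(d, ""))  # populates the handle cache
zarrs_close_store(d)  # TRUE: handle removed from the cache
zarrs_close_store(d)  # FALSE: it was already removed
```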
zarrs_open_array_metadata() opens a Zarr array via zarrs and
returns its metadata as a named list. The store handle is cached, so
repeated calls to the same store are fast. The returned list contains
shape, chunks, dtype,
r_type, fill_value_json,
zarr_format, and order.
V3 arrays work the same way. The zarr_format field
distinguishes V2 from V3:
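A minimal sketch, reading back the metadata of an array at the root of a filesystem store (the empty array_path is an assumption):

```r
d <- tempfile("zarrs_meta_")
z <- zarr_create(store = d, shape = c(100L, 50L), chunks = c(10L, 10L),
                 dtype = "<f8")
meta <- zarrs_open_array_metadata(d, "")
# zarr_format distinguishes a V2 array (2L) from a V3 array (3L)
str(meta[c("shape", "chunks", "dtype", "r_type", "zarr_format")])
```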
The r_type field maps zarrs data types to R-compatible
type families. zarrs numeric types are classified as
"double", "integer", or "logical"
based on what R can represent natively:
Unsupported types (strings, complex) report
"unsupported" and fall back to the R-native code path.
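For example, an int32 array falls in R's native "integer" family (store layout here mirrors the earlier examples; the empty array_path is an assumption):

```r
d <- tempfile("zarrs_rtype_")
z <- zarr_create(store = d, shape = c(10L), chunks = c(5L), dtype = "<i4")
meta <- zarrs_open_array_metadata(d, "")
meta$r_type  # int32 is natively representable in R, so "integer"
```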
zarrs_runtime_info() reports the current zarrs
configuration — the codec concurrency target, thread pool size, how many
store handles are cached, and which features were compiled in:
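A quick look at the returned structure (field names follow the description above; str() just prints whatever the backend reports):

```r
info <- zarrs_runtime_info()
# Codec concurrency target, thread pool size, cached store handles,
# and the compiled feature flags
str(info)
```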
pizzarr_config() is the main interface for viewing and
changing concurrency settings. Called with no arguments it returns the
current state; with arguments it sets the specified values:
# View current settings
pizzarr_config()
# Set codec concurrency to 2 parallel operations per read/write
pizzarr_config(concurrent_target = 2L)
zarrs_runtime_info()$codec_concurrent_target

Three settings are available: concurrent_target (the number of
parallel codec operations per read/write), nthreads (the size of the
zarrs thread pool — set the PIZZARR_NTHREADS environment variable
before starting R to change it), and http_batch_range_requests.

All three settings can also be configured via environment variables
(PIZZARR_NTHREADS, PIZZARR_CONCURRENT_TARGET,
PIZZARR_HTTP_BATCH_RANGE_REQUESTS) or R options
(pizzarr.nthreads, etc.), which are read at package load
time. Environment variables persist across sessions without needing
.Rprofile edits.
The lower-level zarrs_set_codec_concurrent_target()
function is still available for direct use:
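A sketch of direct use, assuming it accepts the same integer target that pizzarr_config(concurrent_target = ...) does:

```r
# Set the codec concurrency target directly (integer argument assumed)
zarrs_set_codec_concurrent_target(2L)
zarrs_runtime_info()$codec_concurrent_target
```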
When the zarrs backend is available and the selection is a contiguous
slice (step == 1), ZarrArray$get_item() dispatches reads to
zarrs automatically. zarrs handles chunk identification, parallel
decompression, and codec execution internally, bypassing pizzarr’s
R-native chunk loop. Scalar integer selections (e.g., selecting a single
row of a matrix) are also eligible — they become length-1 ranges on the
Rust side. Unsupported selections (step > 1 slices, fancy indexing,
MemoryStore) fall through to the R-native path transparently.
d <- tempfile("zarrs_vignette_")
z <- zarr_create(store = d, shape = c(100L, 50L), chunks = c(10L, 10L),
dtype = "<f8")
z$set_item("...", array(as.double(seq_len(5000)), dim = c(100, 50)))
# Re-open and read a subset --- zarrs handles the chunk I/O
z2 <- zarr_open(store = d)
result <- z2$get_item(list(slice(1L, 10L), slice(1L, 5L)))
dim(result$data)

For lower-level access, zarrs_get_subset() reads a
contiguous subset directly via the Rust backend. Ranges are 0-based with
exclusive stop, matching zarrs conventions:
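A sketch against a local store — note the 0-based, exclusive-stop ranges, in contrast to the 1-based slice() calls above (the empty array_path is an assumption):

```r
d <- tempfile("zarrs_subset_")
z <- zarr_create(store = d, shape = c(100L, 50L), chunks = c(10L, 10L),
                 dtype = "<f8")
z$set_item("...", array(as.double(seq_len(5000)), dim = c(100, 50)))
# First 10 rows and first 5 columns: 0-based start, exclusive stop
result <- zarrs_get_subset(d, "", list(c(0L, 10L), c(0L, 5L)), NULL)
str(result)
```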
The optional concurrent_target parameter (or the
pizzarr.concurrent_target R option) controls how many
parallel codec operations zarrs uses within a single read call. Setting
it to 1L disables parallel decompression:
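For example, forcing a serial read over an array that spans multiple chunks (store layout and the empty array_path are assumptions):

```r
d <- tempfile("zarrs_serial_")
z <- zarr_create(store = d, shape = c(100L, 50L), chunks = c(10L, 10L),
                 dtype = "<f8")
z$set_item("...", array(as.double(seq_len(5000)), dim = c(100, 50)))
# concurrent_target = 1L: affected chunks are decoded one at a time
result <- zarrs_get_subset(d, "", list(c(0L, 20L), c(0L, 10L)),
                           concurrent_target = 1L)
```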
When the zarrs backend is available and the store is a writable
filesystem path, zarr_create() dispatches array creation to
zarrs instead of building metadata JSON in R. zarrs validates the
metadata structure, writes it to the store, and the array is ready for
data. The dispatch is transparent — the same zarr_create()
call works on both tiers, and unsupported configurations (MemoryStore,
object dtypes, custom filters) fall through to the R-native path.
The zarr_create() examples earlier in this vignette
already use this path when zarrs is available. The zarrs backend handles
V2 and V3 formats, all 11 numeric data types, and four codec
presets:
zarrs_create_array() provides lower-level access to the
Rust creation path. It accepts V3-style data type names
("float64", "int32", "bool",
etc.) and a codec preset string ("none",
"gzip", "blosc", or "zstd"). The
return value is the same metadata list as
zarrs_open_array_metadata():
d <- tempfile("zarrs_create_direct_")
dir.create(d)
meta <- zarrs_create_array(
store_url = d,
array_path = "",
shape = c(100L, 50L),
chunks = c(10L, 10L),
dtype = "float64",
codec_preset = "gzip",
fill_value = 0.0,
attributes_json = "{}",
zarr_format = 3L
)
str(meta)

The array is immediately usable for reads and writes:
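Continuing from the store created above, a round trip through the high-level API (re-opening the root array with zarr_open() is an assumption about how the store maps back):

```r
z <- zarr_open(store = d)
z$set_item("...", array(as.double(seq_len(5000)), dim = c(100, 50)))
sub <- z$get_item(list(slice(1L, 10L), slice(1L, 5L)))
dim(sub$data)
```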
The zarrs creation path supports four named codec presets. Custom codec configurations fall through to the R-native path.
| Preset | V2 compressor | V3 codec chain | Notes |
|---|---|---|---|
| "none" | null | bytes only | No compression |
| "gzip" | gzip, level 1 | bytes + gzip(1) | Fast, reasonable ratio |
| "blosc" | blosc, lz4, clevel 5 | bytes + blosc(lz4, 5) | Requires blosc feature |
| "zstd" | — | bytes + zstd(3) | V3 only; requires zstd feature |
One difference from the R-native path: zarrs uses the
"gzip" compressor id for V2 arrays, while zarr-python uses
"zlib". Both produce gzip-compatible output, and zarrs
reads either id when opening existing arrays.
The write path mirrors the read path. When the zarrs backend is
available and the selection qualifies (contiguous slices,
filesystem-backed store), ZarrArray$set_item() dispatches
writes to zarrs instead of iterating over chunks in R. zarrs encodes the
data, splits it across the affected chunks, and writes them to disk —
using its internal thread pool for parallel compression when multiple
chunks are involved.
Data type narrowing happens on the Rust side. R doubles narrow to the array’s stored type (float32, int64, uint32, etc.) and R integers narrow to smaller integer types (int16, int8, uint8, uint16) with range checking. An out-of-range value produces an error rather than silent truncation.
d <- tempfile("zarrs_write_vignette_")
z <- zarr_create(store = d, shape = c(20L, 10L), chunks = c(10L, 10L),
dtype = "<f8")
# set_item dispatches to zarrs when eligible
z$set_item("...", array(as.double(1:200), dim = c(20, 10)))
# Read back to confirm
z2 <- zarr_open(store = d)
result <- z2$get_item(list(slice(1L, 5L), slice(1L, 3L)))
result$data

Writing to a subset of an existing array works the same way. zarrs reads the affected chunks, merges the new data, and writes them back:
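Continuing with the array created above, a partial overwrite touches only the chunks that intersect the selection:

```r
# Overwrite a 5x3 corner of the existing 20x10 array
z$set_item(list(slice(1L, 5L), slice(1L, 3L)),
           array(0.0, dim = c(5, 3)))
z$get_item(list(slice(1L, 5L), slice(1L, 3L)))$data
```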
zarrs_set_subset() provides lower-level access to the
Rust write path. Data is a flat vector in R’s native F-order
(column-major) — the Rust backend handles the F-to-C order conversion
internally. The function returns TRUE on success:
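A sketch, assuming the same store/path/ranges argument order as zarrs_get_subset() with the flat F-order data vector last:

```r
d <- tempfile("zarrs_set_subset_")
z <- zarr_create(store = d, shape = c(20L, 10L), chunks = c(10L, 10L),
                 dtype = "<f8")
# 0-based, exclusive-stop ranges; data is a flat F-order vector
ok <- zarrs_set_subset(d, "", list(c(0L, 5L), c(0L, 3L)),
                       as.double(1:15))
ok  # TRUE on success
```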
When the http_sync feature is compiled in, the zarrs
backend can read directly from HTTP/HTTPS Zarr stores using the zarrs_http crate. This
bypasses pizzarr’s R-native crul-based chunk loop, giving
parallel chunk decode on remote data.
HTTP stores are read-only in zarrs — write dispatch
(set_item) falls through to the R-native path
automatically.
The zarrs fast path activates automatically when an
HttpStore-backed array is read with a contiguous selection.
No code changes are needed compared to the R-native path:
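A sketch of the HTTP read path — the URL is illustrative and the HttpStore constructor call is an assumption:

```r
store <- HttpStore$new("https://example.com/data.zarr")  # illustrative URL
z <- zarr_open(store = store)
# Contiguous slice: dispatched to zarrs; a step > 1 selection
# would fall back to the R-native path instead
result <- z$get_item(list(slice(1L, 10L), slice(1L, 10L)))
```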
zarrs_get_subset() also works with HTTP URLs. The store
handle is cached on the Rust side, so repeated reads to the same URL
reuse the connection:
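For example (illustrative URL; the empty array_path is an assumption):

```r
url <- "https://example.com/data.zarr"  # illustrative URL
r1 <- zarrs_get_subset(url, "", list(c(0L, 10L)), NULL)
# The handle for this URL is now cached on the Rust side,
# so this read skips the store re-open
r2 <- zarrs_get_subset(url, "", list(c(10L, 20L)), NULL)
```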
When the s3 feature is compiled in, the zarrs backend
can read from Amazon S3 buckets using the object_store crate with an
async-to-sync adapter. Public buckets work without credentials (unsigned
requests). Authenticated access uses standard AWS environment variables
(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
AWS_REGION).
S3 stores are currently read-only via zarrs — write operations fall through to the R-native path.
# OME-Zarr bonsai dataset on AWS Open Data (V2, zstd, uint8)
s3_url <- "s3://ome-zarr-scivis/v0.4/64x0/bonsai.ome.zarr"
# Read array metadata
meta <- zarrs_open_array_metadata(s3_url, "scale0/bonsai")
str(meta[c("shape", "dtype", "zarr_format")])

# Read a small subset (first 4x4x4 corner)
result <- zarrs_get_subset(s3_url, "scale0/bonsai",
list(c(0L, 4L), c(0L, 4L), c(0L, 4L)), NULL)
str(result)

GCS data hosted on Google Cloud Storage is publicly accessible via HTTPS endpoints. The zarrs HTTP backend reads these directly:
# Pangeo ECCO ocean basins (V2, blosc/lz4, float32)
gcs_url <- "https://storage.googleapis.com/pangeo-data/ECCO_basins.zarr"
meta <- zarrs_open_array_metadata(gcs_url, "basin_mask")
cat("Shape:", paste(meta$shape, collapse = " x "), "\n")
cat("Dtype:", meta$dtype, "\n")

# Read a single basin mask slice
result <- zarrs_get_subset(gcs_url, "basin_mask",
list(c(0L, 1L), c(0L, 90L), c(0L, 90L)), NULL)
cat("Slice dimensions:", paste(result$shape, collapse = " x "), "\n")

Authenticated GCS access via gs:// URLs requires the
gcs compiled feature and GCP credentials (environment
variables or application default credentials). The S3Store
and GcsStore R6 classes provide URL wrappers for high-level
use with zarr_open():
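A sketch using the public bonsai bucket from above — the constructor call and the path argument to zarr_open() are assumptions:

```r
store <- S3Store$new("s3://ome-zarr-scivis/v0.4/64x0/bonsai.ome.zarr")
z <- zarr_open(store = store, path = "scale0/bonsai")
```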
zarrs stores data in C-order (row-major), while R uses F-order (column-major). The Rust backend handles this conversion transparently:

- zarrs_get_subset() returns data in F-order, ready for array(data, dim = shape) with no aperm() needed.
- zarrs_set_subset() accepts F-order data and converts to C-order internally before writing to the store.

The transpose uses cache-blocked tiling for 2D arrays and output-order iteration with incremental index tracking for higher dimensions, matching or exceeding the performance of R's C-level aperm().
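A small round trip showing the point — an R matrix goes in and comes back out with no manual transpose (the empty array_path is an assumption):

```r
d <- tempfile("zarrs_order_")
z <- zarr_create(store = d, shape = c(4L, 3L), chunks = c(4L, 3L),
                 dtype = "<f8")
m <- matrix(as.double(1:12), nrow = 4)  # F-order, as R stores matrices
z$set_item("...", m)
res <- zarrs_get_subset(d, "", list(c(0L, 4L), c(0L, 3L)), NULL)
# Data comes back in F-order: reshape directly, no aperm() needed
identical(array(res$data, dim = res$shape), m)
```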