Description
Problem
The consolidated metadata abstraction is leaky, in the sense that users are often forced to make explicit choices about something that should ideally just be an automatic hidden optimization.
For example, when interacting with zarr via xarray, our users often have to pass consolidated=True/False in order to benefit from it or avoid warnings. This is annoying as it adds boilerplate kwargs to every single xarray.open_zarr() and Dataset.to_zarr() call. It comes up for icechunk, which doesn't need explicit consolidated metadata (as it effectively has its own implementation of consolidated metadata).
Coming up with a general one-size-fits-all rule for consolidated metadata doesn't work - see the differing opinions in pydata/xarray#10122.
The fundamental problem is that whether or not to try to use consolidated metadata is a store-implementation-specific choice. For some stores it's really important (cloud stores), for some it doesn't really matter (local stores), and for some it's not implemented (or not even implementable at all!).
Proposed solution
We teach the Store implementation to know whether or not it wants you to use consolidated metadata, so libraries like xarray can ask the store for its preference.
We could do this by adding a new property to the Store ABC:
from abc import ABC, abstractmethod

class Store(ABC):
    @property
    @abstractmethod
    def supports_consolidated_metadata(self) -> bool:
        """Does the store support consolidated metadata?"""
        ...
This could be False for subclasses by default, but True for e.g. FsspecStore or IcechunkStore.
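A minimal sketch of the "False by default" behaviour, assuming the property is made concrete rather than abstract so that subclasses inherit a default. The class names SketchStore, SketchLocalStore and SketchCloudStore are hypothetical stand-ins for illustration, not the real zarr or icechunk classes:

```python
from abc import ABC


class SketchStore(ABC):
    """Hypothetical stand-in for the Store ABC; not the real zarr class."""

    @property
    def supports_consolidated_metadata(self) -> bool:
        # Concrete (non-abstract) so subclasses inherit False unless they opt in.
        return False


class SketchLocalStore(SketchStore):
    """Inherits the default of False."""


class SketchCloudStore(SketchStore):
    """Opts in, as a high-latency store might."""

    @property
    def supports_consolidated_metadata(self) -> bool:
        return True
```

If the property stays abstract, as in the snippet above, every subclass would instead have to state its preference explicitly; which of the two is preferable is an open design choice.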
That way xarray can learn what the expected value of the consolidated kwarg should be. The user could then override that value by passing consolidated explicitly, but xarray would be able to default to the sensible choice without explicit specification by the user.
I think that should allow xarray to keep reading/writing consolidated metadata by default for stores that benefit from it, whilst not using it for stores which don't, without the user having to understand and specify which is which.
I don't think this requires any changes to the zarr spec, because consolidated metadata is currently not in the spec.
cc @d-v-b @jhamman @shoyer @ianhi @aladinor
P.S. There is a similar issue for passing zarr_version, which could be fixed in a similar way, but I think that deserves its own issue.