Consolidated metadata preferences on a Store-specific basis #2937

Open
@TomNicholas

Problem

The consolidated metadata abstraction is leaky, in the sense that users are often forced to make explicit choices about something that should ideally just be an automatic hidden optimization.

For example, when interacting with zarr via xarray, our users often have to pass consolidated=True/False in order to benefit from consolidated metadata or to avoid warnings. This is annoying, as it adds boilerplate kwargs to every single xarray.open_zarr() and Dataset.to_zarr() call. It comes up with icechunk, which doesn't need explicit consolidated metadata (as it effectively has its own implementation of consolidated metadata).

Coming up with a general one-size-fits-all rule for consolidated metadata doesn't work - see the differing opinions in pydata/xarray#10122.

The problem is that, fundamentally, whether or not to try to use consolidated metadata is a store-implementation-specific choice. For some stores it's really important (cloud stores), for some it doesn't really matter (local stores), and for some it's not implemented (or not even implementable at all!).

Proposed solution

We teach the Store implementation to know whether or not it wants you to use consolidated metadata, so libraries like xarray can ask the store for its preference.

We could do this by adding a new property to the Store ABC:

from abc import ABC, abstractmethod


class Store(ABC):
    @property
    @abstractmethod
    def supports_consolidated_metadata(self) -> bool:
        """Does the store support consolidated metadata?"""
        ...

This could be False for subclasses by default, but True for e.g. FsspecStore or IcechunkStore.
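To make the "False by default, True for some stores" idea concrete, here is a minimal sketch. It assumes a concrete default on the base class rather than @abstractmethod, since an abstract property would force every subclass to implement it. The Store class here is a heavily simplified stand-in, and LocalLikeStore and CloudLikeStore are hypothetical names, not real zarr classes.

```python
from abc import ABC


class Store(ABC):
    """Simplified stand-in for the zarr Store ABC (assumption, not the real class)."""

    @property
    def supports_consolidated_metadata(self) -> bool:
        """Does the store want callers to use consolidated metadata?"""
        # Concrete default: subclasses that say nothing opt out.
        return False


class LocalLikeStore(Store):
    """Hypothetical store that simply inherits the default (False)."""


class CloudLikeStore(Store):
    """Hypothetical store that opts in by overriding the property."""

    @property
    def supports_consolidated_metadata(self) -> bool:
        return True


print(LocalLikeStore().supports_consolidated_metadata)  # False
print(CloudLikeStore().supports_consolidated_metadata)  # True
```

Whether the real property should be abstract or carry a default is a design choice for the maintainers; the sketch only shows that the default-based variant keeps existing third-party subclasses working unchanged.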

That way xarray can learn what the expected value of the consolidated kwarg should be. The user could then override that value by passing consolidated explicitly, but xarray would be able to default to the sensible choice without explicit specification by the user.
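The override logic described above could look roughly like the sketch below. Both FakeStore and resolve_consolidated are hypothetical names introduced for illustration; this is not how xarray is implemented, and treating a store without the property as False is an assumption of the sketch.

```python
from typing import Optional


class FakeStore:
    """Hypothetical stand-in for a zarr store exposing the proposed property."""

    def __init__(self, supports_consolidated_metadata: bool) -> None:
        self.supports_consolidated_metadata = supports_consolidated_metadata


def resolve_consolidated(store: object, consolidated: Optional[bool] = None) -> bool:
    """Pick the effective `consolidated` value (hypothetical helper)."""
    if consolidated is not None:
        # An explicit choice by the user always wins.
        return consolidated
    # Otherwise defer to the store's preference. Stores that lack the
    # property are treated as False here, which is an assumption.
    return bool(getattr(store, "supports_consolidated_metadata", False))


cloud_like = FakeStore(supports_consolidated_metadata=True)
print(resolve_consolidated(cloud_like))                      # True
print(resolve_consolidated(cloud_like, consolidated=False))  # False
```

Using None as the "not specified" sentinel keeps the user-facing kwarg unchanged while letting the store supply the default.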

I think that should allow xarray to keep reading/writing consolidated metadata by default for stores that benefit from it, whilst not using it for stores that don't, without the user having to understand and specify which is which.

I don't think this requires any changes to the zarr spec, because consolidated metadata is currently not in the spec.

cc @d-v-b @jhamman @shoyer @ianhi @aladinor

P.S. There is a similar issue for passing zarr_version, which could be fixed in a similar way, but I think that deserves its own issue.
