-
-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidated metadata preferences on a Store-specific basis #2937
Comments
one small point to add, without wanting to complicate an already complicated issue: NFS is commonly used by folks (including by me!) storing huge zarr archive. and NFS:
so you'd want consolidated metadata, assuming the numbers that @shoyer quotes in pydata/xarray#10122 are representative so purely discriminating by the store may not be sufficient for all cases maaaaybe the code could check for block size and use that to estimate whether it's nfs vs local, but could also be confusing (tbc, this isn't dispositive, just something to weigh) |
which stores cannot support consolidated metadata? |
Interesting @max-sixty - I still think this is a store-level concern though.
Something like that could maybe be done in the implementation of the property / store subclass, but my argument is simply that client code such as xarray should be told what to do by the store.
I don't know, hence why I didn't write an example 😅 I just thought I heard @jhamman say that earlier. I don't think the difference between "does not support", and "could never support" matters for my argument though. |
I'm not sure this is actually something that should be a property of a store?
Instead I feel like your problem would be better solved by a function that takes a store |
Icechunk for one! There is little value to supporting it in the ZipStore as well (though it technically works). |
As a subtle variation of this idea, what consolidated metadata was implemented by specific Stores, not merely something that might be indirectly supported by stores? For example: class ConsolidatedFsspecStore(FsspecStore):
def close(self):
if self._metadata_changed():
consolidate_metadata(self) There is probably something I'm missing about why this wouldn't work, but it could make for a simpler user experience. |
@shoyer one reason why that would be hard today is because our |
@jhamman explained the fundamental issue to me and I think I understand why my initial suggestion of a function that maps stores to bools wouldn't work: if someone (e.g., earthmover) defines their own so I agree that individual store implementations need to carry around information about whether they support consolidated metadata. Here is one idea to solve this:
|
Problem
The consolidated metadata abstraction is leaky, in the sense that users are often forced to make explicit choices about something that should ideally just be an automatic hidden optimization.
For example when interacting with zarr via xarray our users often have to pass
consolidated=True/False
in order to benefit from it or avoid warnings. This is annoying as it adds boilerplate kwargs to every singlexarray.open_zarr()
andDataset.to_zarr()
call. It comes up for icechunk, which doesn't need explicit consolidated metadata (as it effectively has its own implementation of consolidated metadata).Coming up with a general one-size-fits-all rule for consolidated metadata doesn't work - see the differing opinions in pydata/xarray#10122.
The problem is that fundamentally whether or not to try to use consolidated metadata is a store-implementation-specific choice. For some stores it's really important (cloud stores), for some it doesn't really matter (local stores), and for some it's not implemented (or even not even implementable at all!).
Proposed solution
We teach the
Store
implementation to know whether or not it wants you to use consolidated metadata, so libraries like xarray can ask the store for its preference.We could do this by adding a new property to the
Store
ABC:This could be False for subclasses by default, but True for e.g.
FsspecStore
orIcechunkStore
.That way xarray can learn what the expected value of the
consolidated
kwarg should be. The user could then override that value by passingconsolidated
explicitly, but xarray would be able to default to the sensible choice without explicit specification by the user.I think that should allow xarray to keep reading/writing consolidated metadata by default for stores that benefit from it, whilst not use it for stores which don't, without the user having to understand and specify which is which.
I don't think this requires any changes to the zarr spec, because consolidated metadata is currently not in the spec.
cc @d-v-b @jhamman @shoyer @ianhi @aladinor
P.S. There is a similar issue for passing
zarr_version
, which could be fixed in a similar way, but I think that deserves it's own issue.The text was updated successfully, but these errors were encountered: