-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add Index.validate_dataarray_coord #10137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
... when check_default_indexes=False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick thoughts
Functions with a leading underscore are marked by pyright as unused if they are not used from within the module in which they are defined. Also remove unneeded nested import.
Move check_dataarray_coords in xarray.core.coordinates module and rename it to validate_dataarray_coords (name consistent with Index.validate_dataarray_coord). Move CoordinateValidationError from xarray.core.indexes to xarray.core.coordinates module.
Thanks @shoyer for taking a look! This is now ready for (another round of) review. Compared to #10116 this PR still represents a major data model change for DataArray (in the sense that it allows overriding strict enforcement of the DataArray model in certain cases), but I think that the risk is mitigated since it is here done explicitly via @dcherian regarding your #10116 (comment), in the example notebook the DataArray reprs shown without any list of dimensions for the Coordinates section do not strike me much. That said, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Yeah I think it could be improved, but certainly not a blocker here. IMO we can merge. I'm not a huge fan of the method name |
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
for more information, see https://pre-commit.ci
Thanks @benbovy for this change, and apologies for the slow review. My one big question is if The fall-back logic of checking coordinate dimensions could also potentially be implemented on the base |
Instead of adding the ``Index.validate_dataarray_coord`` method that may raise an error, add ``Index.should_add_coord_in_datarray`` method that returns a boolean.
@shoyer I like your suggestion and find it cleaner. In 3e55af0 I replaced the I also removed the fall-back in Is that what you had in mind? |
Also do not add a coordinate in a DataArray indexed from a Dataset if ``index.should_add_coord_in_datarray`` returns False, even if the coordinate variable dimensions are compatible with the DataArray dimensions (note: there's no test for this case yet, I don't know if it represents any real use case).
Yeah this method's a hard one to name. The only thing that comes to mind is |
whats-new.rst
api.rst
Same goal than #10116 using the alternative approach suggested in #10116 (comment) where the propagation (validation) of coordinates in a DataArray is delegated to their index (if any).
I find this approach cleaner than #10116. Here is the same notebook example adapted for this PR.
Index API alternatives
Index.validate_dataarray_coord(self, name: Hashable, var: Variable, dims: set[Hashable]) -> None
Index.validate_dataarray_coords(self, variables: dict[Hashable, Variable], dims: set[Hashable]) -> None
Indexes.group_by_index
while in option 1 we simply need to iterate over_variables
,_coords
and/or_indexes
).Index.validate_dataarray_coords(self, variables: dict[Hashable, Variable], dims: set[Hashable]) -> Coordinates
Option 1, the simplest and working well with IntervalIndex, is currently implemented in this PR.
cc @shoyer @dcherian