-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
interp - Prefer broadcast over reindex when possible #10554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
xarray/core/dataset.py
Outdated
to_broadcast = (var.copy().squeeze(),) + tuple( | ||
dest for index, dest in use_indexers.values() | ||
) | ||
variables[name] = broadcast_variables(*to_broadcast)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the semantics from copies to views. We'll have to manually deepcopy these vars to avoid confusing downstream errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broadcast_variables(*to_broadcast)[0].copy(deep=True)
should do the trick I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that would work. This could be a good opportunity to look for optimizations in reindex if you have the bandwidth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using copy(deep=True)
now. I couldn't see a noticeable difference with the example above.
Last time I followed the reindex path Dask was the bottleneck. Though I'm not very familiar with those functions.
I recall reindex was about the same speed as interpolation a few years ago.
for more information, see https://pre-commit.ci
Co-authored-by: Deepak Cherian <[email protected]>
When a variable is a scalar it is faster to broadcast instead of using reindex. Use that when doing dataset interpolation.
whats-new.rst
Main:
PR: