[FEA] Don't fallback to pandas after simple hasattr check
#17678
Comments
Just wanted to comment that this is an interesting class of optimization that I think would be nice for
I would be fine with not falling back just to do a check that we know will also fail on the other end, but we will have to be careful to handle some special cases and I fear that it'll be hard to have a general solution here without accepting some degradation in correctness (i.e. we'll have some cases that would have succeeded with fallback that no longer will). In particular, in addition to checking whether the attribute exists on the pandas type, we will also have to check whether the pandas type overrides For example, you can access DataFrame columns using its
If our implementation is missing, we will fail the attribute check in cudf, but we will also then proceed to fail to see it in pandas because As Marco says, there may be enough usage of
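The caveat about type-level checks can be illustrated with a small sketch. It assumes the override under discussion is a dynamic `__getattr__`-style hook, such as the one pandas DataFrames use for attribute-style column access. `ColumnFrame`, `PlainFrame`, and `might_have_attribute` are hypothetical names for illustration, not pandas or cudf code.

```python
import inspect


class ColumnFrame:
    """Toy frame that exposes its columns as attributes via __getattr__."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Runs only after normal lookup fails; resolves column names dynamically.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)


class PlainFrame:
    """Toy frame with no dynamic attribute hook."""

    def sum(self):
        return 0


def might_have_attribute(cls, name):
    """Conservative type-level check: True means 'cannot rule it out'."""
    try:
        inspect.getattr_static(cls, name)
        return True  # defined statically on the class or one of its bases
    except AttributeError:
        pass
    # A dynamic hook means instances may still answer for this name.
    return any("__getattr__" in vars(base) for base in cls.__mro__)


frame = ColumnFrame({"price": [1, 2, 3]})
```

Here `hasattr(ColumnFrame, "price")` is false while `hasattr(frame, "price")` is true, so a check against the type alone would wrongly rule the attribute out. A conservative helper like the one above only skips the fallback for types without such a hook.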
thanks @vyasr If it's not possible to do this everywhere, would you consider not doing the fallback for the following methods:
These are just methods that we check whether incoming objects have, and for cuDF, I'd expect
I don't think your original request is impossible, I just think it's harder than simply checking for the existence of the attribute in pandas. I still certainly think it's worth doing in the general case, or at least worth trying to do before we decide to special-case a handful of APIs.
Discovered in Narwhals
If I have a function like:
and run it with cudf.pandas, then this is enough to cause a fallback to pandas - even though cudf would have been perfectly capable of executing it!

This isn't farfetched - you can find examples of libraries doing hasattr checks all over the place, e.g. this one in pymc:
https://github.com/pymc-devs/pymc/blob/ae4a6292aa48d5722dba6ab422e1a3d895ca7bf7/pymc/data.py#L232-L248
Scikit-learn:
https://github.com/scikit-learn/scikit-learn/blob/c9aeb15f8f1c7c54ed4ef27c871f7167e2ce3077/sklearn/feature_selection/_base.py#L129-L131
Plotly:
https://github.com/plotly/plotly.py/blob/231edaa61f8590f715134d42ea0bc2f858dd713e/packages/python/plotly/plotly/express/_imshow.py#L303-L312
I asked about this on Slack, and got the response
As a user, I consider falling back to pandas unnecessarily to be significantly worse than raising a slightly inconsistent AttributeError message - especially when using hasattr, which is when the AttributeError would be caught anyway.

If exception raising needs to be perfectly consistent, perhaps there's a way to still raise it without needing to fall back to pandas? e.g.
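A minimal sketch of one possible approach follows; it is not the snippet originally posted. It assumes the slow type has no dynamic `__getattr__`, so a type-level `hasattr` is enough to decide. `FastFrame`, `SlowFrame`, and `ShortCircuitProxy` are hypothetical names and do not reflect how cudf.pandas is implemented.

```python
class FastFrame:
    """Stand-in for the accelerated type."""

    def sum(self):
        return "fast sum"


class SlowFrame:
    """Stand-in for the fallback type."""

    def sum(self):
        return "slow sum"

    def describe(self):
        return "slow describe"


class ShortCircuitProxy:
    """Toy proxy that skips the fallback when the slow type cannot help."""

    _slow_type = SlowFrame

    def __init__(self):
        self._fast = FastFrame()
        self.fallbacks = 0

    def __getattr__(self, name):
        try:
            return getattr(self._fast, name)
        except AttributeError:
            pass
        # Type-level check only; assumes the slow type has no dynamic __getattr__.
        if not hasattr(self._slow_type, name):
            # Mimic the message the slow type would have produced.
            raise AttributeError(
                f"'{self._slow_type.__name__}' object has no attribute '{name}'"
            )
        self.fallbacks += 1
        return getattr(self._slow_type(), name)


proxy = ShortCircuitProxy()
has_protocol = hasattr(proxy, "__some_protocol__")  # no fallback is triggered
described = proxy.describe()  # genuinely needs the slow path
```

The `hasattr` call fails fast with an error message shaped like the slow type's, while `describe` still reaches the slow path because only the slow type defines it.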