-
-
Notifications
You must be signed in to change notification settings - Fork 18.9k
BUG: Dataframe arithmatic operators don't work with Series using fill_value #61828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
||
|
||
@pytest.mark.parametrize("op", ["add", "sub", "mul", "div", "mod", "truediv", "pow"]) | ||
def test_df_series_fill_value(op): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ill need to take a closer look at this, just because im really skeptical that the fix is this easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, i think the trouble is that in _maybe_align_series_as_frame we will broadcast the 1D object to 2D for numpy dtypes, but not EA dtypes. so can you add a test for non-numpy dtypes and see how it goes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct about the EAs, Ill update the testcases and then change the function to work with other EA types (mainly the ones that work with operators, int and float I believe)
Im closing the PR for now until the additional fixes for EA are deployed |
Reopened to talk about fixes for this specific issue before I get sidetracked by 1D operations again (ignore all the failed checks for now) |
The appropriate fix is going to be in _maybe_align_series_as_frame |
So this was what I was working on locally, and had questions about. I was able to reshape EAs in _maybe_align_series_as_frame and am still working on various places to get the operation smoothed out. But I feel like this issue deviates from the original issue, which is only related to fill_value. As far as I can tell this is not related to that issue so we should probably file it under another and mark the original closed for bookkeeping. I can add another test case which wouldn't require 2D EA operations for the dtype test. (There was original a bunch of brain spew about issues I was currently having, but I'll organize it before reposting if needed) |
Just making sure, do you agree with splitting the 1D part off? |
It looks like the change might have inadvertently changed some behavior that I don't know if I should keep or not. It reverts the error message that is expected in the test_period_add_timestamp_raises test back to what it was pre-resolution-inference according to your comment from a year ago. And it makes the test_add_strings in test_string.py return a success, rather than the xfail that it was supposed to be. test_add_frame unfortunately still fails though so I don't know if I should purposefully break it to keep the actions in line with each other. I read the linked issue but don't think there was a consensus (#28527 ) |
whats the updated exception messsage for the period one? Fixing xfailed tests is a good thing. |
Can you remove the xfail and let’s see how the CI does |
So interestingly, it seems to pass the tests it failed previously while failing the ones it previously succeeded. Do you know if there is a significant difference between the subset of unit tests that are different than the others? (Freethreading, Numpy Dev, Linux-32-bit. Linux-Musl, Pyodide, and Without PyArrow). Alternatively, I can carve out StringArray for now and investigate it as a separate issue |
pandas/core/arrays/arrow/array.py
Outdated
other = self._box_pa(other) | ||
other_NA = self._box_pa(other) | ||
# pyarrow gets upset if you try to join a NullArray | ||
other = other_NA.cast(pa_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it obvious this is always right? e.g. what if self is pa.timestamp("us")
and other is pa.int64()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, I did try to only check for NullArrays, but that returned the error about how it couldn't concatenate the frame in the original add_to_frame testcase.
We could circumvent that by casting the initial df as an object but I didn't want to mess with the test case because I didn't know if that was something it was testing for.
Alternatively, I can just reimplement a check and check for dtypes we'd want to let go through
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is the cause of a bunch of test failures FAILED pandas/tests/extension/test_arrow.py::test_arithmetic_temporal[pa_type11] - pyarrow.lib.ArrowNotImplementedError: Unsupported cast from duration[us] to timestamp using function cast_timestamp
.
are you running the tests locally before committing/pushing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only ran the array folder because the full suite takes a lot of time, Ill be sure to run the full thing going forward. That's on me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ill add official testcases once the build clears CI due to the weird tack-on nature of this bug fix. Just from some local testing, it looks like there is already a preexisting error message for trying to use the add operation on dtypes like Datetime and TimeDelta.
That being said, it looks like the CI is throwing errors on some of the builds but not others again, and what do you know, they're not replicated on my local machine. Would you know who I could talk to to figure out why that is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the exception message in a test needs to be updated thats fine as long as the new one makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
Any pointers for the CI or should I ask it during the meeting tmr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i havent looked too closely, but the CI failiures ll look like cases of "the test needs to be updated to check for the new exception message".
none of the edits to the ArrowEA are necessary, nor is the special-casing for Period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that 2/3 of the unit tests succeed as-is, so it doesn't make sense why only 7 are failing. Especially since the error is about a float being concatenated with a string, which all the other builds are able to do. My guess was that something was different about their set up process.
I modified ArrowEA to address the xfail issue in add_to_frame across environments. With the changes, add_to_string passes, but add_to_frame behaves inconsistently on some CIs (occasionally passing despite xfail). If preferred, I can revert the edits, though that would leave the inconsistency still. |
When i tried this locally i didn't need to modify ArrowEA at all. What breaks without that change? |
Before modifying ArrayEA, these were the failing tests in ArrowEA: pandas/tests/arrays/string_/test_string.py::test_add_frame[string=string[python]] These tests were expected to fail but did not. I was unable to replicate the failures locally, and most CI runs did not encounter the issue; it appeared only in a small subset. I modified ArrayEA to reconcile these differences, but the same CI runs are still encountering issues. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Removed a test which checked for expected error to be raised and a corner case. Added a test case to test multiple operators with Dataframe x Series operations while using fill_value