eicchen
Contributor

@eicchen eicchen commented Jul 10, 2025

Removed a test that checked for an expected error to be raised, along with a corner case. Added a test case that exercises multiple operators for DataFrame x Series operations with fill_value.
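For context, here is a minimal sketch of the kind of call this PR targets. The frame, the Series, and the fill value are made up for illustration. Depending on the pandas version, passing `fill_value` together with a Series may raise `NotImplementedError` or return a result, so the sketch handles both outcomes rather than assuming one.

```python
import pandas as pd

# Hypothetical data: one missing value in column "a".
df = pd.DataFrame({"a": [1.0, None], "b": [3.0, 4.0]})
ser = pd.Series([10.0, 20.0], index=["a", "b"])  # aligned against the columns

# Without fill_value, the missing entry propagates as NaN.
plain = df.mul(ser)

# With fill_value, the intent is for the missing entry to be replaced
# before multiplying. Versions without support raise instead.
try:
    filled = df.mul(ser, fill_value=1.0)
    outcome = "supported"
except NotImplementedError:
    filled = None
    outcome = "not supported"

print(plain)
print(outcome)
```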



@pytest.mark.parametrize("op", ["add", "sub", "mul", "div", "mod", "truediv", "pow"])
def test_df_series_fill_value(op):
Member

I'll need to take a closer look at this, just because I'm really skeptical that the fix is this easy.

Member

OK, I think the trouble is that in _maybe_align_series_as_frame we broadcast the 1D object to 2D for numpy dtypes, but not for EA dtypes. Can you add a test for non-numpy dtypes and see how it goes?
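The distinction above can be illustrated with a simplified sketch. This is not the pandas implementation of `_maybe_align_series_as_frame`; the frame shape and the Series are hypothetical, and the sketch only shows why a NumPy-backed Series is easy to broadcast to 2D while an extension-array-backed one needs separate handling.

```python
import numpy as np
import pandas as pd

frame_shape = (3, 2)  # hypothetical frame: 3 rows, 2 columns

# NumPy-backed Series: the values are a plain ndarray, so they can be
# reshaped into a column and broadcast across the frame's columns.
np_ser = pd.Series([1, 2, 3], dtype="int64")
column = np_ser.to_numpy().reshape(-1, 1)
broadcast = np.broadcast_to(column, frame_shape)

# Extension-array-backed Series (nullable Int64 here): the backing array
# is one-dimensional, so the same reshape trick is not directly available.
ea_ser = pd.Series([1, 2, 3], dtype="Int64")
is_np = isinstance(np_ser.dtype, np.dtype)
is_ea = isinstance(ea_ser.dtype, pd.api.extensions.ExtensionDtype)

print(broadcast.shape, is_np, is_ea)
```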

Contributor Author

You're correct about the EAs. I'll update the test cases and then change the function to work with other EA types (mainly the ones that work with operators; int and float, I believe).

@eicchen
Contributor Author

eicchen commented Jul 15, 2025

I'm closing the PR for now until the additional fixes for EAs are deployed.

@eicchen eicchen closed this Jul 15, 2025
@eicchen eicchen reopened this Aug 19, 2025
@eicchen
Contributor Author

eicchen commented Aug 19, 2025

Reopened to talk about fixes for this specific issue before I get sidetracked by 1D operations again (ignore all the failed checks for now)

@jbrockmendel
Member

The appropriate fix is going to be in _maybe_align_series_as_frame

@eicchen
Contributor Author

eicchen commented Aug 20, 2025

> The appropriate fix is going to be in _maybe_align_series_as_frame

So this is what I was working on locally and had questions about. I was able to reshape EAs in _maybe_align_series_as_frame and am still working through various places to get the operation smoothed out. But I feel this deviates from the original issue, which is only about fill_value. As far as I can tell the EA work is unrelated to it, so we should probably file it under a separate issue and mark the original closed for bookkeeping.

I can add another test case which wouldn't require 2D EA operations for the dtype test.

(There was originally a bunch of brain spew about issues I was having, but I'll organize it before reposting if needed.)

@eicchen
Contributor Author

eicchen commented Aug 20, 2025

Just making sure, do you agree with splitting the 1D part off?

@eicchen
Contributor Author

eicchen commented Aug 21, 2025

It looks like the change might have inadvertently changed some behavior that I don't know if I should keep or not.

It reverts the error message that is expected in the test_period_add_timestamp_raises test back to what it was pre-resolution-inference according to your comment from a year ago.

It also makes test_add_strings in test_string.py pass, rather than xfail as it was supposed to. test_add_frame unfortunately still fails, though, so I don't know if I should purposefully break it to keep the two in line with each other. I read the linked issue (#28527) but don't think there was a consensus.

@jbrockmendel
Member

What's the updated exception message for the period one?

Fixing xfailed tests is a good thing.

@eicchen
Contributor Author

eicchen commented Aug 21, 2025

It is now "cannot add PeriodArray and DatetimeArray", which is in line with the message for everything else.

Here's the code snippet I modified:
[image]

However, contrary to my earlier statement, it looks like test_add_frame doesn't consistently xfail on the pipeline; some jobs fail while others don't. It works as expected locally, so I'm not sure how best to debug this. Do you have any advice?

@jbrockmendel
Member

Can you remove the xfail and let’s see how the CI does

@eicchen
Contributor Author

eicchen commented Aug 23, 2025

> Can you remove the xfail and let’s see how the CI does

So interestingly, it seems to pass the tests it previously failed while failing the ones it previously passed. Do you know if there is a significant difference between that subset of CI jobs (Freethreading, Numpy Dev, Linux-32-bit, Linux-Musl, Pyodide, and Without PyArrow) and the others? Alternatively, I can carve out StringArray for now and investigate it as a separate issue.

other = self._box_pa(other)
other_NA = self._box_pa(other)
# pyarrow gets upset if you try to join a NullArray
other = other_NA.cast(pa_type)
Member

is it obvious this is always right? e.g. what if self is pa.timestamp("us") and other is pa.int64()?

Contributor Author

That's fair. I did try checking only for NullArrays, but that returned the error about not being able to concatenate the frame in the original test_add_frame test case.

We could circumvent that by casting the initial df to object, but I didn't want to mess with the test case because I didn't know whether that was something it was testing for.

Alternatively, I can reimplement the check and test for the dtypes we'd want to let through.

Member

Looks like this is the cause of a bunch of test failures:

FAILED pandas/tests/extension/test_arrow.py::test_arithmetic_temporal[pa_type11] - pyarrow.lib.ArrowNotImplementedError: Unsupported cast from duration[us] to timestamp using function cast_timestamp

are you running the tests locally before committing/pushing?

Contributor Author

I only ran the array folder because the full suite takes a lot of time. I'll be sure to run the full thing going forward. That's on me.

Contributor Author

@eicchen eicchen Aug 26, 2025

I'll add official test cases once the build clears CI, given the weird tacked-on nature of this bug fix. Just from some local testing, it looks like there is already an error message in place for trying to use the add operation on dtypes like Datetime and TimeDelta.

That being said, it looks like the CI is throwing errors on some of the builds but not others again, and what do you know, they're not replicated on my local machine. Would you know who I could talk to to figure out why that is?

Member

If the exception message in a test needs to be updated, that's fine as long as the new one makes sense.

Contributor Author

Sounds good to me.

Any pointers for the CI, or should I ask about it during the meeting tomorrow?

Member

I haven't looked too closely, but the CI failures all look like cases of "the test needs to be updated to check for the new exception message".

none of the edits to the ArrowEA are necessary, nor is the special-casing for Period.

Contributor Author

The issue is that 2/3 of the unit tests succeed as-is, so it doesn't make sense that only 7 are failing, especially since the error is about a float being concatenated with a string, which all the other builds are able to do. My guess is that something is different about their setup process.

@eicchen
Contributor Author

eicchen commented Aug 27, 2025

I modified ArrowEA to address the xfail issue in test_add_frame across environments. With the changes, test_add_strings passes, but test_add_frame behaves inconsistently on some CI jobs (occasionally passing despite the xfail). If preferred, I can revert the edits, though that would still leave the inconsistency.

@jbrockmendel
Member

When I tried this locally I didn't need to modify ArrowEA at all. What breaks without that change?

@eicchen
Contributor Author

eicchen commented Aug 27, 2025

Before I modified ArrowEA, these were the failing tests:

pandas/tests/arrays/string_/test_string.py::test_add_frame[string=string[python]]
pandas/tests/arrays/string_/test_string.py::test_add_frame[string=str[python]]
(Job link: https://github.com/pandas-dev/pandas/actions/runs/17139673229/job/48624140109)

These tests were expected to fail but did not. I was unable to replicate the failures locally, and most CI runs did not encounter the issue; it appeared only in a small subset. I modified ArrowEA to reconcile these differences, but the same CI runs are still encountering issues.
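A test that is "expected to fail but did not" shows up as a CI failure when xfail is strict. The sketch below uses a hypothetical test and sets `strict=True` on the marker explicitly; whether a given project enables strict xfail globally is an assumption to check in its pytest configuration.

```python
import pytest


@pytest.mark.xfail(reason="hypothetical known bug", strict=True)
def test_hypothetical_known_bug():
    # While the bug exists, this assertion fails and pytest reports XFAIL.
    # Once the bug is fixed, the assertion passes; with strict=True pytest
    # reports that unexpected pass (XPASS) as a failure until the marker
    # is removed.
    assert "".join(reversed("ab")) == "ab"
```

If a change fixes the underlying behavior only in some environments, the same marker yields XFAIL in some jobs and a strict XPASS failure in others, which matches the mixed results described above.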

Development

Successfully merging this pull request may close these issues.

BUG: pd.DataFrame.mul has not support fill_value?
2 participants