
fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_objects #384

Open: wants to merge 4 commits into main


joshua-gould

No description provided.

@joshua-gould joshua-gould changed the title fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_object fix KeyError: "None of [Index(['0_x', '1_x', '0_y', '1_y'], dtype='object')] are in the [columns]" in find_objects Jul 18, 2024
@m-albert (Collaborator) commented Jul 23, 2024

@joshua-gould Thanks for your PR!

I was trying to reproduce the error you're fixing and found that it relates to #335.

The following code

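    import dask.array as da
    import dask_image.ndmeasure
    # 10x10 label image in 3x3 chunks with a single labelled pixel
    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute()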

fails when pyarrow is installed in the environment and runs through when it is not.

I'm unsure how to proceed; there might be an error in dask.dataframe that we should reproduce and report upstream.
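Because the outcome depends on the installed stack (pyarrow present or not, dask with or without dask-expr), reports are easier to compare when they list exact versions. A stdlib-only sketch that collects them; the package list and the helper name `installed_versions` are assumptions chosen to match the packages discussed in this thread:

```python
from importlib import metadata

# Distributions relevant to this issue (assumed list, based on this thread).
PACKAGES = ("dask", "dask-expr", "dask-image", "pandas", "pyarrow")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment (e.g. pyarrow absent).
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```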

@jmuhlich

I ran into this issue with find_objects too, and setting dataframe.convert-string to False does not fully fix it for me (dask=2025.2.0, dask-image=2024.5.3; also tested with the latest dask-image from GitHub today). I found that triggering the issue depends on the specific layout of the image and which chunks are all zero. Generally, empty chunks "earlier" in the image (to the left or above?) seem to be problematic. A fully empty label image with more than one chunk in each dimension always errors. The only thing that fixes it for me is reverting to dask=2024.12.1 without dask-expr.

@m-albert (Collaborator) commented Mar 20, 2025

Thanks @jmuhlich for reporting this here!

It seems that the following example (which is also included in the tests added in this PR):

    import dask.array as da
    import dask_image.ndmeasure

    # Single labelled pixel in the first chunk; all other chunks are empty
    test_labels = da.zeros((10, 10), dtype='int', chunks=(3, 3))
    test_labels[0, 0] = 1
    computed_result = dask_image.ndmeasure.find_objects(test_labels).compute(scheduler='single-threaded')

1. fails on main
2. fails on Fix CI test failures #393 (which sets dataframe.convert-string to False)
3. runs through on this PR

In this PR, @joshua-gould works around problems that occur when merging dask dataframes.

Also, here there's a mention of a pandas bug when merging dataframes. The error in this PR might be related to that.
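For context on where the column names in the error come from: when pandas merges two frames on their index and both share column names, it appends the default suffixes "_x" and "_y", turning integer names into strings such as "0_x". A minimal pandas-only sketch; the per-chunk frames are hypothetical bounding boxes that assume one integer-named column per array dimension (0 and 1), matching the names in the error message:

```python
import pandas as pd

# Hypothetical per-chunk bounding boxes: index = label,
# one column per array dimension, each cell a slice.
left = pd.DataFrame({0: [slice(0, 1)], 1: [slice(0, 1)]}, index=[1])
right = pd.DataFrame({0: [slice(3, 5)], 1: [slice(3, 4)]}, index=[2])

# Outer merge on the index: the overlapping names 0 and 1 get suffixes.
merged = left.merge(right, how="outer", left_index=True, right_index=True)
print(list(merged.columns))  # ['0_x', '1_x', '0_y', '1_y']

# If a frame without the suffixed columns is indexed by those names,
# pandas raises a KeyError of the same form as in the PR title.
missing_error = None
try:
    _ = left[["0_x", "1_x", "0_y", "1_y"]]
except KeyError as err:
    missing_error = err
    print(err)
```

This only shows how the names are produced in plain pandas; it does not reproduce the dask-expr behaviour itself.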

I haven't had the time yet to find out what's going wrong in the merge. I think it'd be good to report these results upstream.

Independently of upstream, I think we should incorporate this workaround here.

> I found that triggering the issue depends on the specific layout of the image and which chunks are all zero

@jmuhlich Does the code in this PR fix the problems you mention?

@m-albert m-albert mentioned this pull request Mar 20, 2025
@jakirkham (Member)

Fixed up some conflicts introduced by a recent PR fixing CI: #393

Hope that is ok

Please feel free to tweak further as needed

Labels: None yet
Projects: None yet

4 participants