Contributor

@josephnowak josephnowak commented Jul 8, 2025

I forgot to pass align_chunks through to the to_zarr method on datasets, which made the feature useless for this kind of data structure. I added a specific test to cover this issue.

align_chunks now also works when the data is smaller than a single Zarr chunk (one test was added to cover this scenario).

I modified (again) the error message shown by safe_chunks; it now includes information about the two chunks that overlap with a single Zarr chunk. From what I saw of the error in 10501, the original message was not helping users understand what was happening.
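To illustrate the condition that the error message describes, here is a minimal pure-Python sketch for one dimension. It is not xarray's implementation: the function name find_shared_zarr_chunks and its parameters are hypothetical, and it only assumes regular Zarr chunks and a write that starts at a given offset.

```python
from collections import defaultdict


def find_shared_zarr_chunks(dask_chunks, zarr_chunk_size, offset=0):
    """Map each Zarr chunk touched by more than one Dask chunk to those chunks.

    dask_chunks: sizes of consecutive Dask chunks along one dimension
    zarr_chunk_size: size of the regular Zarr chunks on disk
    offset: index in the Zarr array where the write starts (e.g. on append)
    """
    touched = defaultdict(list)
    start = offset
    for dask_index, size in enumerate(dask_chunks):
        if size == 0:
            continue  # empty chunks write nothing
        stop = start + size
        first = start // zarr_chunk_size
        last = (stop - 1) // zarr_chunk_size
        for zarr_index in range(first, last + 1):
            touched[zarr_index].append(dask_index)
        start = stop
    # Only Zarr chunks written by several Dask chunks are unsafe in parallel
    return {z: ids for z, ids in touched.items() if len(ids) > 1}


# Dask chunks of 3, 3 and 2 elements against Zarr chunks of 4 elements:
print(find_shared_zarr_chunks((3, 3, 2), 4))  # {0: [0, 1], 1: [1, 2]}
# Aligned Dask chunks share no Zarr chunk:
print(find_shared_zarr_chunks((4, 4, 2), 4))  # {}
```

Each key in the result is a Zarr chunk that two or more Dask tasks would write to at the same time, which is the situation that can lose data silently.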

@josephnowak josephnowak marked this pull request as draft July 9, 2025 14:15
@josephnowak
Contributor Author

It looks like the failing tests are not related to this PR.

@josephnowak
Contributor Author

josephnowak commented Jul 29, 2025

Hi @max-sixty, sorry to bother you. I'm not sure if you have some free time to review this PR.

I'm not sure how reviewers are assigned on Xarray in this case: no one else was involved in the issue, so probably no one else was notified to review this.

@lbesnard

lbesnard commented Aug 7, 2025

@josephnowak if it helps the review process, I can confirm that these changes work for me.

@lbesnard

@josephnowak do you think someone else could maybe look at this?

@josephnowak
Contributor Author

josephnowak commented Aug 19, 2025

Hi @dcherian, sorry to bother you. I'm not sure if you have some free time to take a look at this; if you know of someone else who has the time to review this PR, that would be awesome.

@josephnowak
Contributor Author

Hi @lbesnard,

Unfortunately, I don't know of anyone else who could review the PR. The last time I sent a PR to Xarray, it was reviewed within a couple of days, so most of the maintainers are probably busy this month.

As a temporary solution, I think you can copy the function grid_rechunk from my branch and use it directly in your code, as I did here. It is more cumbersome because you need to keep track of the actual chunk structure of your data, but it should help.
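As a rough idea of what keeping track of the chunk structure involves, the sketch below computes chunk sizes that land on the Zarr chunk grid for one dimension. It is not the grid_rechunk function from the branch: the name aligned_chunk_sizes and its parameters are hypothetical, and it assumes regular Zarr chunks and a known write offset.

```python
def aligned_chunk_sizes(length, zarr_chunk_size, offset=0):
    """Return chunk sizes whose boundaries fall on Zarr chunk boundaries.

    length: number of elements to write along the dimension
    zarr_chunk_size: size of the regular Zarr chunks on disk
    offset: index in the Zarr array where the write starts (e.g. on append)
    """
    if length <= 0:
        return ()
    sizes = []
    # The first chunk fills the partially written Zarr chunk, if there is one
    head = min((-offset) % zarr_chunk_size, length)
    if head:
        sizes.append(head)
    # The rest is split into full Zarr chunks plus an optional shorter tail
    full, tail = divmod(length - head, zarr_chunk_size)
    sizes.extend([zarr_chunk_size] * full)
    if tail:
        sizes.append(tail)
    return tuple(sizes)


# Appending 10 elements after 2 existing ones, with Zarr chunks of 4:
print(aligned_chunk_sizes(10, 4, offset=2))  # (2, 4, 4)
# Data smaller than a single Zarr chunk stays in one chunk:
print(aligned_chunk_sizes(1, 4, offset=2))  # (1,)
```

The resulting tuple could then be used to rechunk the data along that dimension before writing, for example through the per-dimension chunk sizes accepted by Dataset.chunk.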

@lbesnard

Oh, I've been using your commit hash, which is great. But I'm using this in a production tool that uses Poetry (TL;DR: pointing the xarray package to this hash fails in my CI/CD pipeline), so I'm a bit blocked at the moment.

It's just creating more work for you, as you always have to rebase. Thanks a lot anyway!

However, I'm surprised no one else seems to have noticed this bug.

Development

Successfully merging this pull request may close these issues.

to_zarr() via dask with silent data loss on append
3 participants