
map_blocks should dispatch to ChunkManager #8545

Open
@TomNicholas

Description


Is your feature request related to a problem?

#7019 generalized most of xarray's internals so that they work with any chunked array type for which we can create a ChunkManagerEntrypoint. Most functions (e.g. apply_ufunc) now dispatch through this entrypoint, but I did not redirect xarray.map_blocks to go through ChunkManagerEntrypoint.
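A minimal sketch of the dispatch pattern, using a toy chunked array type instead of dask or cubed. Apart from the idea of one manager per chunked array type exposing primitives like map_blocks, every name here (ToyChunkedArray, ToyChunkManager, ToyManager, CHUNK_MANAGERS, get_chunk_manager, toy_apply) is hypothetical. xarray's real ChunkManagerEntrypoint has many more methods and discovers managers through package entrypoints.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Callable


class ToyChunkedArray:
    """Hypothetical chunked array: a 1-D array stored as a list of blocks."""

    def __init__(self, blocks: list[list[float]]) -> None:
        self.blocks = blocks

    def compute(self) -> list[float]:
        # Concatenate the blocks into one in-memory list.
        return [x for block in self.blocks for x in block]


class ToyChunkManager(ABC):
    """Hypothetical stand-in for ChunkManagerEntrypoint: one subclass per array type."""

    array_cls: type

    def is_chunked_array(self, data: Any) -> bool:
        return isinstance(data, self.array_cls)

    @abstractmethod
    def map_blocks(self, func: Callable, arr: Any) -> Any:
        """Apply func to every block, returning a new chunked array."""


class ToyManager(ToyChunkManager):
    array_cls = ToyChunkedArray

    def map_blocks(self, func: Callable, arr: ToyChunkedArray) -> ToyChunkedArray:
        return ToyChunkedArray([func(block) for block in arr.blocks])


# Hypothetical registry; xarray discovers real managers through package entrypoints.
CHUNK_MANAGERS: list[ToyChunkManager] = [ToyManager()]


def get_chunk_manager(data: Any) -> ToyChunkManager:
    # Return the first registered manager that recognizes this array type.
    for manager in CHUNK_MANAGERS:
        if manager.is_chunked_array(data):
            return manager
    raise TypeError(f"no chunk manager registered for {type(data).__name__}")


def toy_apply(func: Callable, data: Any) -> Any:
    # Generic code (like apply_ufunc) names no concrete chunked array library:
    # it looks up the manager and calls that manager's primitive.
    return get_chunk_manager(data).map_blocks(func, data)
```

For example, toy_apply(lambda b: [x * 10 for x in b], ToyChunkedArray([[1, 2], [3]])).compute() returns [10, 20, 30]. Supporting another array type means registering another manager; toy_apply itself does not change.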

The ChunkManagerEntrypoint redirection works by dispatching to high-level primitives such as dask.array.apply_gufunc, dask.array.blockwise, and dask.array.map_blocks, each of which a chunked array library can implement in its own way. The current implementation of xarray.map_blocks, however, is much lower-level: it builds a custom dask HighLevelGraph (HLG) directly, so it was not obvious how to swap it out.
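To illustrate why a graph-building implementation is harder to swap out than a call to a primitive, here is a simplified model of building a task graph by hand, written against the classic dask graph format (a dict mapping keys to values, or to tuples whose first element is a callable). It is not dask's HighLevelGraph API, and build_map_graph, _resolve and execute are hypothetical helpers. The hand-written graph has to know the input's key naming scheme, an internal detail of one particular library, whereas a map_blocks call does not.

```python
from __future__ import annotations

from typing import Any, Callable


def build_map_graph(func: Callable, in_name: str, nblocks: int, out_name: str) -> dict:
    """Emit one task per block, keyed like 1-D dask array chunks: (name, i)."""
    # This line bakes in the input's key naming scheme, which is library-specific.
    return {(out_name, i): (func, (in_name, i)) for i in range(nblocks)}


def _resolve(graph: dict, arg: Any) -> Any:
    """Replace an argument that is a key in the graph by its computed value."""
    try:
        if arg in graph:
            return execute(graph, arg)
    except TypeError:  # unhashable literal such as a list: pass it through
        pass
    return arg


def execute(graph: dict, key: Any) -> Any:
    """Tiny recursive evaluator: no caching, no parallelism."""
    value = graph[key]
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        return func(*(_resolve(graph, a) for a in args))
    return value


# Input chunks of a hypothetical 1-D array named "x", split into two blocks.
inputs = {("x", 0): [1, 2], ("x", 1): [3]}
graph = {**inputs, **build_map_graph(lambda b: [v + 1 for v in b], "x", 2, "y")}
result = [execute(graph, ("y", i)) for i in range(2)]
```

Here result is [[2, 3], [4]]. Targeting a different chunked array library with this style of code would mean re-deriving that library's key scheme and task format, instead of simply calling its map_blocks.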

Describe the solution you'd like

I would like to either:

  1. Replace the current internals of xarray.map_blocks with a simple call to ChunkManagerEntrypoint.map_blocks. This would give the cleanest separation of concerns. Presumably there is some obvious reason why this cannot or should not be done, but I have yet to understand what it is (@dcherian or @tomwhite, can you perhaps enlighten me? 🙏).

  2. (More likely) Refactor so that the existing guts of xarray.map_blocks are called only from the dask ChunkManagerEntrypoint, so that a non-dask chunked array (e.g. cubed, but in theory other types too) could specify how it wants to perform map_blocks.
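The two options can be compared with a toy model in which a dataset is a dict mapping variable names to lists of blocks that share one chunking. One structural difference is worth keeping in view: xarray.map_blocks calls the user function on xarray objects (one block of the whole DataArray or Dataset at a time), while dask.array.map_blocks calls it on one bare block of a single array. Everything below (ToyChunkManager, map_blocks_on_dataset, map_blocks_option1, map_blocks_option2) is hypothetical and is not a proposed xarray API.

```python
from __future__ import annotations

from typing import Callable

Block = list  # one chunk of one variable (stand-in for a NumPy block)
ToyDataset = dict  # variable name -> list of blocks; all variables share one chunking


class ToyChunkManager:
    """Hypothetical chunk manager exposing an array-level map_blocks primitive."""

    def map_blocks(self, func: Callable[[Block], Block], blocks: list[Block]) -> list[Block]:
        # Array-level primitive: func sees one bare block of one variable.
        return [func(b) for b in blocks]

    def map_blocks_on_dataset(
        self, func: Callable[[ToyDataset], ToyDataset], ds: ToyDataset
    ) -> ToyDataset:
        # Hypothetical object-level hook for option 2. This default pairs up
        # block i of every variable, calls func on that whole "sub-dataset",
        # and reassembles the per-variable outputs. A backend could override
        # it with its own implementation (e.g. a hand-built task graph).
        if not ds:
            return {}
        nblocks = len(next(iter(ds.values())))
        out: ToyDataset = {}
        for i in range(nblocks):
            result = func({name: blocks[i] for name, blocks in ds.items()})
            for name, block in result.items():
                out.setdefault(name, []).append(block)
        return out


def map_blocks_option1(
    func: Callable[[Block], Block], ds: ToyDataset, manager: ToyChunkManager
) -> ToyDataset:
    # Option 1: the object-level map_blocks is just the array-level primitive,
    # applied variable by variable.
    return {name: manager.map_blocks(func, blocks) for name, blocks in ds.items()}


def map_blocks_option2(
    func: Callable[[ToyDataset], ToyDataset], ds: ToyDataset, manager: ToyChunkManager
) -> ToyDataset:
    # Option 2: hand the whole object to the manager and let it decide how.
    return manager.map_blocks_on_dataset(func, ds)
```

With ds = {"a": [[1, 2], [3]], "b": [[10, 20], [30]]}, a function that adds a and b block-wise, lambda sub: {"total": [x + y for x, y in zip(sub["a"], sub["b"])]}, only fits option 2, which returns {"total": [[11, 22], [33]]}; under option 1 the function never sees both variables at once. In option 2 a dask-backed manager could keep the existing HLG-building code as its override, while other backends supply their own.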

Describe alternatives you've considered

Leaving the status quo in place breaks the clean abstraction and separation of concerns that #7019 introduced.

Additional context

Split off from #8414

Metadata


Assignees: No one assigned

Labels: topic-chunked-arrays (Managing different chunked backends, e.g. dask)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
