Implement dask-specific methods on DataTree #9355

Open
@TomNicholas

What is your issue?

xr.Dataset implements a bunch of dask-specific methods, such as __dask_tokenize__ and __dask_graph__. It also obviously has public methods that involve dask such as .compute() and .load().

In DataTree on the other hand, I haven't yet implemented any methods like these, or even written any tests that involve dask! You can probably still use dask with datatree right now, but from dask's perspective the datatree is presumably merely a set of unconnected Dataset objects.

We could choose to implement methods like .load() as just a mapping over the tree, i.e.

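def load(self):
    # Load each node's data, one Dataset at a time
    for node in self.subtree:
        if node.has_data:
            node.ds.load()
    # Dataset.load() returns the loaded object, so do the same here
    return self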

Most of these methods should already work, or be easy to get working, using map_over_subtree.
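To make the "mapping over the tree" idea concrete without depending on xarray or dask, here is a rough, self-contained sketch. ToyDataset, ToyNode and map_method are hypothetical stand-ins invented for illustration; only the names subtree, has_data, ds and load come from the snippet above. It is not how DataTree or map_over_subtree is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Optional


@dataclass
class ToyDataset:
    """Hypothetical stand-in for xr.Dataset: holds a thunk until loaded."""

    thunk: Callable[[], int]
    value: Optional[int] = None

    def load(self) -> "ToyDataset":
        # Evaluate once, store the result on this object, and return self.
        if self.value is None:
            self.value = self.thunk()
        return self


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DataTree node."""

    ds: Optional[ToyDataset] = None
    children: Dict[str, "ToyNode"] = field(default_factory=dict)

    @property
    def has_data(self) -> bool:
        return self.ds is not None

    @property
    def subtree(self) -> Iterator["ToyNode"]:
        # Depth-first: this node first, then every descendant.
        yield self
        for child in self.children.values():
            yield from child.subtree

    def map_method(self, name: str) -> "ToyNode":
        # Call the named dataset method on every node that holds data.
        for node in self.subtree:
            if node.has_data:
                getattr(node.ds, name)()
        return self

    def load(self) -> "ToyNode":
        return self.map_method("load")


tree = ToyNode(
    ds=ToyDataset(lambda: 1),
    children={"a": ToyNode(ds=ToyDataset(lambda: 2)), "b": ToyNode()},
)
tree.load()
loaded = [n.ds.value for n in tree.subtree if n.has_data]
```

Note that this loads node by node, so each node would trigger its own computation; nothing here lets a scheduler see the whole tree at once.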

Dataset also defines the special double-underscore methods of dask's custom collections protocol:

https://docs.dask.org/en/stable/custom-collections.html

Xarray objects satisfy this collections protocol, so you can call dask.tokenize(xarray_thing), dask.compute(xarray_thing), and likewise dask.visualize and dask.persist.
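For anyone unfamiliar with the protocol, here is a minimal sketch of what a tree-level collection might look like: it merges the per-node task graphs into one graph, so a scheduler would see a single connected collection rather than separate objects. Everything here is an assumption for illustration: ToyLazyNode, ToyLazyTree and toy_get are hypothetical, the code deliberately does not import dask, and toy_get is only a tiny stand-in for a scheduler. Merging also assumes keys are unique across nodes. A real implementation would additionally need __dask_postpersist__, __dask_optimize__, __dask_scheduler__ and __dask_tokenize__, as described in the dask docs linked above.

```python
from operator import add, mul
from typing import Any, Callable, Dict, List, Tuple

Graph = Dict[str, Any]


class ToyLazyNode:
    """Hypothetical leaf: one task graph plus the key of its result."""

    def __init__(self, graph: Graph, key: str):
        self.graph = graph
        self.key = key


class ToyLazyTree:
    """Hypothetical tree-level collection mapping node paths to lazy nodes."""

    def __init__(self, nodes: Dict[str, ToyLazyNode]):
        self.nodes = nodes

    def __dask_graph__(self) -> Graph:
        # Merge every node's graph into a single graph.
        merged: Graph = {}
        for node in self.nodes.values():
            merged.update(node.graph)
        return merged

    def __dask_keys__(self) -> List[str]:
        # One output key per node, in a stable order.
        return [node.key for node in self.nodes.values()]

    def __dask_postcompute__(self) -> Tuple[Callable, tuple]:
        # Return (finalize, extra_args); the caller then invokes
        # finalize(results, *extra_args), with results ordered like the keys.
        return self._rebuild, (list(self.nodes),)

    @staticmethod
    def _rebuild(results: list, paths: List[str]) -> Dict[str, Any]:
        return dict(zip(paths, results))


def toy_get(graph: Graph, keys: List[str]) -> list:
    """Tiny stand-in for a scheduler: handles (func, *args) tasks,
    string keys, and literal values only."""

    def evaluate(x: Any) -> Any:
        if isinstance(x, tuple) and x and callable(x[0]):
            return x[0](*(evaluate(arg) for arg in x[1:]))
        if isinstance(x, str) and x in graph:
            return evaluate(graph[x])
        return x

    return [evaluate(key) for key in keys]


tree = ToyLazyTree(
    {
        "/": ToyLazyNode({"x": 1, "y": (add, "x", 2)}, "y"),
        "/child": ToyLazyNode({"z": (mul, 3, 4)}, "z"),
    }
)
finalize, extra = tree.__dask_postcompute__()
result = finalize(toy_get(tree.__dask_graph__(), tree.__dask_keys__()), *extra)
```

The last three lines mimic, very roughly, the compute step: evaluate all output keys against one merged graph, then hand the results to the finalizer to rebuild a tree-shaped result.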


We could add these, but it would be rather nice if someone who understands the double-underscore dask methods really well just took this on. @darothen helpfully started this in xarray-contrib/datatree#196 but it stalled.

@jrbourbeau are you/Coiled interested in submitting a PR to get xarray.DataTree fully integrated with dask?
