Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save custom .attrs in SpatialElements to disk #839

Open
lucas-diedrich opened this issue Jan 21, 2025 · 8 comments
Open

Save custom .attrs in SpatialElements to disk #839

lucas-diedrich opened this issue Jan 21, 2025 · 8 comments

Comments

@lucas-diedrich
Copy link

lucas-diedrich commented Jan 21, 2025

Hi! Thanks for the great package!

I am currently working with spatialdata to load and parse microscopy files (particularly in the proprietary Carl Zeiss .czi format). For this purpose, it would be extremely useful to persistently store metadata related to the acquisition of the microscopy data (e.g. magnification, resolution, etc.) in the .attrs attribute of the SpatialElement. While I can manipulate the Elements in memory, it is currently not possible to write them to disk (see example)

While it would be possible to write the metadata to a custom tables object, it feels much more natural to have this data directly associated with the image.

Therefore, it would be fantastic if SpatialElements would support writing the .attrs keys to disk.

Example

import spatialdata as sd
import numpy as np 
import tempfile
import os

dir = tempfile.tempdir
save_path = os.path.join(dir, "blobs.zarr")


# Load dataset and set custom metadata
blobs = sd.datasets.blobs()
blobs.images["blobs_image"].attrs["metadata"] = {
    "info1": 1,
    "info2": "2",
    "info3": np.zeros(shape=(3, 3))
}

sd.models.Image2DModel().validate(blobs.images["blobs_image"])

# Metadata is in memory
assert "metadata" in blobs.images["blobs_image"].attrs

# Writing to disk leads to loss of the information 
blobs.write(save_path)
sdata = sd.read_zarr(save_path)
assert "metadata" not in sdata.images["blobs_image"].attrs
@LucaMarconato
Copy link
Member

Hi, thanks for reaching out. In principle we could extend the way in which we write SpatialData.attrs to disk

# read attrs metadata
also for the various elements (shapes, points; for tables as you observed the uns are already serialized).

What do you think @giovp @timtreis @melonora?

The only issue that I may see is that in our APIs we currently only take care of forwarding the metadata that we define in the format (for instance after cropping an element, the information on the coordinate system are passed to the new cropped object). Not sure if we should ignore the rest of values of .attrs and have the user manually copying them, or if there should be an automatic function. In such case, should we shallow copy or deep copy?

@LucaMarconato
Copy link
Member

LucaMarconato commented Jan 21, 2025

Also, it is a good idea to explore what libraries like pandas, xarray and dask do. To this regard, the situation is a bit unfortunate:

@timtreis
Copy link
Member

timtreis commented Jan 23, 2025

Hey @lucas-diedrich, if I'm not mistaken, SpatialData has an .attrs level at the highest level, so not directly associated with the individual elements, that is persistent (see #711). Until we make a decision on this discussion here, you could probably convert your "metadata" slot into a dict with the key being the element you're annotating and the value being your info.

@lucas-diedrich
Copy link
Author

Hi Tim, hi Luca, thanks for being so responsive and your input!

Thanks, that's a great suggestion, I'll do that in the meantime.

@giovp
Copy link
Member

giovp commented Jan 24, 2025

soo I'm trying to get back in the dev discussion of the ecosystem and this is a first attempt

I think if we support the serialization of attrs the way uns does it, we might encounter some challenges in the decision of what we serialize. For adata.uns for example, it supported some serialization, but then issues raised during the years because people started to want more support, or encountered bugs.
Even for the example above,

blobs.images["blobs_image"].attrs["metadata"] = {
    "info1": 1,
    "info2": "2",
    "info3": np.zeros(shape=(3, 3))
}

an humble array already raises a bunch of options on how should that be serialized.....

so I guess my position is not against it in principle, but we need to undersatnd what we will get into...

@LucaMarconato
Copy link
Member

an humble array already raises a bunch of options on how should that be serialized.....
so I guess my position is not against it in principle, but we need to undersatnd what we will get into...

@giovp good point! Maybe we could overcome problems like this by requiring .attrs to be serializable from and to JSON. pandas requires this (it's a bit of a hidden requirement that I found at some point while navigating the issues in pandas; I can't find the exact reference for it). If we require this, we could allow users to store any object, as long as there is a JSON representation that we can construct during IO. To achieve this, we could subclass JSONEncoder, or maybe require the implementation of some methods that we can discover, such to_json() and from_json() (more on this [here](https://stackoverflow.com/questions/3768895/how-to-make-a-class-json-serializable)).

Or in alternative, for the time being, we should maybe endorse the workaround suggested by @timtreis and choose not implement this feature. The rationale for this would be that the serialization of .attrs seems actually not to be implemented in pandas as well: pandas-dev/pandas#20521 pandas-dev/pandas#53577 pandas-dev/pandas#51012 pandas-dev/pandas#34596.

@LucaMarconato
Copy link
Member

I found it, this is the requirement for .attrs in pandas to be serializable to JSON: pandas-dev/pandas#51012 (comment).

@lucas-diedrich
Copy link
Author

lucas-diedrich commented Jan 24, 2025

Maybe for context: At least for my specific use case the metadata is indeed expected to be json serializable. For your reference, I attached some example data from two different parsers (carl-zeiss pyCZIlibrw for .czi files, openslide for .mirax data)

czi.json
openslide.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants