@timtreis timtreis commented Sep 30, 2025

Other downstream functions, such as #1036, need a function that robustly detects where the tissue is in the image and how many separate tissue samples there are.

This function:
a) implements two algorithms for identifying the tissue (Otsu and Felzenszwalb)
b) handles arbitrary channel input
c) heuristically tries to distinguish actual samples from artifacts (dirt, the Visium frame, etc.). As a fallback, one can pass in the expected number of samples, which should be more robust
d) adds the mask back to the sdata object with the same structure and transformations as the original image
e) does everything in dask, so it's quite fast
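
To make points a) and c) concrete, here is a minimal sketch of the general idea: threshold with Otsu's method, label connected components, and keep only the expected number of largest components. It is not the code from this PR. The helper names (`otsu_threshold`, `keep_largest`), the synthetic image, and the use of plain NumPy/SciPy instead of dask are all assumptions made for illustration, and it assumes a brightfield-like image where tissue is darker than the background.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(gray, bins=256):
    """Upper edge of the histogram bin that maximises between-class variance."""
    hist, edges = np.histogram(gray.ravel(), bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                    # pixel count at or below each bin
    w1 = w0[-1] - w0                        # pixel count above each bin
    s0 = np.cumsum(hist * centers)          # intensity sum at or below each bin
    m0 = s0 / np.maximum(w0, 1)             # mean of the darker class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the brighter class
    between = w0 * w1 * (m0 - m1) ** 2
    return float(edges[1:][np.argmax(between)])


def keep_largest(mask, n_samples):
    """Label connected components and keep the n_samples largest ones."""
    labels, n_found = ndimage.label(mask)
    if n_found == 0:
        return labels
    sizes = np.bincount(labels.ravel())[1:]  # skip background label 0
    keep = np.argsort(sizes)[::-1][:n_samples] + 1
    out = np.zeros_like(labels)
    for new_id, old_id in enumerate(keep, start=1):
        out[labels == old_id] = new_id
    return out


# Synthetic grayscale image: bright background, two tissue pieces, one speck of dirt.
image = np.full((100, 100), 0.9)
image[10:40, 10:40] = 0.3
image[60:90, 55:85] = 0.3
image[2:4, 2:4] = 0.2

mask = image < otsu_threshold(image)      # assumption: tissue is darker than background
tissue = keep_largest(mask, n_samples=2)  # the speck is dropped as the smallest component
```

Passing the expected sample count sidesteps any size heuristic: the speck is thresholded as foreground but discarded because only the two largest components are kept.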

import squidpy as sq

sdata = sq.datasets.visium_hne_sdata()

sq.exp.im.detect_tissue(
    sdata,
    image_key="hne",
)

sdata
SpatialData object, with associated Zarr store: /Users/tim.treis/.cache/squidpy/visium_hne_sdata.zarr
├── Images
│     └── 'hne': DataTree[cyx] (3, 11757, 11291), (3, 5878, 5645), (3, 2939, 2822), (3, 1469, 1411)
├── Labels
│     └── 'hne_tissue': DataTree[yx] (11757, 11291), (5878, 5645), (2939, 2822), (1469, 1411)
├── Shapes
│     └── 'spots': GeoDataFrame shape: (2688, 2) (2D shapes)
└── Tables
      └── 'adata': AnnData (2688, 18078)
with coordinate systems:
    ▸ 'global', with elements:
        hne (Images), hne_tissue (Labels), spots (Shapes)
with the following elements not in the Zarr store:
    ▸ hne_tissue (Labels)
(
    sdata
    .pl.render_images("hne")
    .pl.render_labels("hne_tissue", fill_alpha=0, contour_px=10, outline_alpha=1)
    .pl.show()
)
[Image: the rendered hne image with the hne_tissue contour overlaid]

Todo

  • Manual tests on a range of different inputs (IHC, H&E, DAPI, multichannel, etc.)
  • Test with multiple samples in the same image
  • Write unit tests for functions

@timtreis timtreis linked an issue Sep 30, 2025 that may be closed by this pull request

@selmanozleyen selmanozleyen left a comment

Hi, here is some initial feedback. I will go into more detail tomorrow.

I'd actually prefer calling this module experimental instead of exp (similar to the {anndata,scanpy}.experimental modules), because I was wondering what it was until I saw this.

If the mask is saved to the SpatialData object, it will inherit the scale_factors
of the image, if present.
**kwargs
Optional keyword arguments:

I am strongly against using kwargs. I can easily imagine a user mistyping felzenszwalb_params and wondering why it doesn't change the results :D. We need errors for cases like these.
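
A small sketch of the failure mode described here, using two hypothetical functions (`detect_loose` and `detect_strict` are not part of squidpy, and the default dict is made up). With `**kwargs`, a misspelled key is silently ignored; with an explicit keyword-only parameter, Python raises a `TypeError` at call time.

```python
DEFAULTS = {"scale": 1.0}


def detect_loose(image, **kwargs):
    # Hypothetical signature: unknown keys are accepted and never looked at.
    return kwargs.get("felzenszwalb_params", DEFAULTS)


def detect_strict(image, *, felzenszwalb_params=None):
    # Hypothetical signature: only the declared keyword is accepted.
    return felzenszwalb_params if felzenszwalb_params is not None else DEFAULTS


# Note the typo: "felzenswalb" instead of "felzenszwalb".
loose_result = detect_loose(None, felzenswalb_params={"scale": 5.0})

try:
    detect_strict(None, felzenswalb_params={"scale": 5.0})
    strict_error = None
except TypeError as err:
    strict_error = str(err)
```

In the loose variant the caller gets the defaults back without any warning, which is exactly the "why doesn't this change the results" situation.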


@flying-sheep flying-sheep Oct 1, 2025

Agreed, it makes no sense to arbitrarily nest them in the docs and collect them into a dict in the code.

I think having the advanced parameters tucked away in the data class signatures, the way they are now, is already ideal.

“Optional keyword arguments” are keyword arguments that have a default, so often all of them.
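
For illustration, a sketch of what "advanced parameters in a data class signature" can look like. The class name `FelzenszwalbParams`, its fields and defaults (borrowed from the parameter names of scikit-image's `felzenszwalb`), and the function `detect_tissue_sketch` are assumptions for this example, not the PR's actual definitions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FelzenszwalbParams:
    # Hypothetical advanced options, each with a documented default.
    scale: float = 1.0
    sigma: float = 0.8
    min_size: int = 20


def detect_tissue_sketch(image, *, method="otsu", felzenszwalb_params=FelzenszwalbParams()):
    # Stand-in body: just report the configuration that would be used.
    return {"method": method, **asdict(felzenszwalb_params)}


config = detect_tissue_sketch(
    None,
    method="felzenszwalb",
    felzenszwalb_params=FelzenszwalbParams(min_size=50),
)

try:
    FelzenszwalbParams(min_sise=50)  # typo in the field name
    typo_error = None
except TypeError as err:
    typo_error = str(err)
```

The main signature stays short, the advanced options are discoverable through the data class, and a misspelled option fails loudly instead of being dropped.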

Comment on lines +63 to +65
sdata: sd.SpatialData,
image_key: str,
scale: str = "auto",

@flying-sheep flying-sheep Oct 1, 2025

Always use a * if a function signature has more than 2 or 3 parameters in total. Usually you can just put it after the parameters that have no default; in some cases it makes sense to let the first optional parameter be specified by keyword or by position.

Suggested change
- sdata: sd.SpatialData,
- image_key: str,
- scale: str = "auto",
+ sdata: sd.SpatialData,
+ image_key: str,
+ *,
+ scale: str = "auto",

I’d personally do this to make sdata a mandatory positional parameter, but people can still do image_key="..." if they want:

Suggested change
- sdata: sd.SpatialData,
- image_key: str,
- scale: str = "auto",
+ sdata: sd.SpatialData,
+ /,
+ image_key: str,
+ *,
+ scale: str = "auto",
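
A quick sketch of how the `/` and `*` markers in the second suggestion behave at call time. The function `detect` is a hypothetical stand-in with the same parameter layout, and the string arguments are placeholders rather than real SpatialData objects.

```python
def detect(sdata, /, image_key, *, scale="auto"):
    # sdata: positional-only; image_key: positional or keyword; scale: keyword-only.
    return (sdata, image_key, scale)


by_position = detect("sdata", "hne")
by_keyword = detect("sdata", image_key="hne", scale="scale0")

errors = []
for bad_call in (
    lambda: detect(sdata="sdata", image_key="hne"),  # sdata cannot be passed by keyword
    lambda: detect("sdata", "hne", "scale0"),        # scale cannot be passed by position
):
    try:
        bad_call()
    except TypeError as err:
        errors.append(str(err))
```

Both rejected calls raise `TypeError`, so new optional parameters can later be added after `*` without breaking existing positional call sites.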

Successfully merging this pull request may close these issues.

Function to automatically generate tissue masks in H&E
3 participants