
Zarr (AnnData) sparse matrices#6

Open
Artur-man wants to merge 10 commits into Bioconductor:devel from Artur-man:sparse_matrix

@Artur-man Artur-man commented May 1, 2026

Hi @hpages,

I gave this another try, and after some experiments with read_zarr_array and h5mread I was able to mimic H5SparseMatrix. Please ignore this if you were already working on it.

  • The codebase of ZarrSparseMatrix and ZarrADMatrix is almost identical to that of their H5 counterparts (H5SparseMatrix and H5ADMatrix). The PR tests CSC matrices from both zarr v2 and v3 AnnData datasets.

  • This PR introduces a number of utilities that do not exist in Rarr, e.g. to detect groups and arrays, since they are needed to validate the existence of the arrays that make up CSR and CSC matrices.

  • zarr_mread is an auxiliary function that mimics the h5mread package (I am guessing there is no need for a separate package like that for zarr).

  • Please also check this PR in anndataR, which introduces delayed support:
    Add support for DelayedArray reading scverse/anndataR#387

  • Going back to the possibility of having native sparse matrix support in Zarr, I was not able to find any such utility (@Bisaloo?). Apart from anndata, backedarray has zarr support too but the sparse encoding looks identical: https://pypi.org/project/backedarray/

ZarrArray/TODO

Lines 12 to 17 in 498be8c

o Add support for sparse arrays. Does Zarr have native support for sparse
arrays or do we need to use the same approach as in H5SparseMatrixSeed?
If the latter, then Artür already has a working version of that in
https://github.com/BIMSBbioinfo/ZarrArray
See his comment in the spatialdata-devel channel on Zulip:
https://community-bioc.zulipchat.com/#narrow/channel/507643-spatialdata-devel/topic/anndataR-zarr/near/561372916

  • Please find below the comparison of h5ad and zarr delayed matrices.
> store <- file.path(td, "example_v2.zarr")
> name <- "layers/csc_counts"
> ZarrSparseMatrix(store, name)
<100 x 50> sparse ZarrSparseMatrix object of type "double":
        [,1]  [,2]  [,3] ... [,49] [,50]
  [1,]     3     1     2   .     1     0
  [2,]     2     0     1   .     4     6
  [3,]     5     0     2   .     1     2
  [4,]     1     1     1   .     4     4
  [5,]     0     1     2   .     1     3
   ...     .     .     .   .     .     .
 [96,]     4     2     1   .     1     1
 [97,]     1     0     2   .     3     1
 [98,]     1     0     4   .     2     4
 [99,]     1     5     4   .     3     1
[100,]     6     3     1   .     2     5
> hdf5_file <- system.file("extdata", "example.h5ad", package = "anndataR")
> name <- "layers/csc_counts"
> H5SparseMatrix(hdf5_file, name)
<100 x 50> sparse H5SparseMatrix object of type "double":
        [,1]  [,2]  [,3] ... [,49] [,50]
  [1,]     3     1     2   .     1     0
  [2,]     2     0     1   .     4     6
  [3,]     5     0     2   .     1     2
  [4,]     1     1     1   .     4     4
  [5,]     0     1     2   .     1     3
   ...     .     .     .   .     .     .
 [96,]     4     2     1   .     1     1
 [97,]     1     0     2   .     3     1
 [98,]     1     0     4   .     2     4
 [99,]     1     5     4   .     3     1
[100,]     6     3     1   .     2     5
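
The group and array detection mentioned in the list above can be illustrated with a minimal base-R sketch for a local directory store in Zarr v2 layout. The helper names (`node_path`, `is_zarr_v2_group`, `is_zarr_v2_array`, `has_sparse_components`) are hypothetical and are not the functions added in this PR. The sketch only assumes that Zarr v2 marks groups with a `.zgroup` file and arrays with a `.zarray` file, and that AnnData stores a sparse matrix as a group holding `data`, `indices` and `indptr` arrays. Zarr v3, which uses a `zarr.json` file per node, is not covered.

```r
# Hypothetical helpers; local directory stores in Zarr v2 layout only.
node_path <- function(store, name = "") {
  if (nzchar(name)) file.path(store, name) else store
}

# A v2 group is a directory containing a '.zgroup' metadata file.
is_zarr_v2_group <- function(store, name = "") {
  file.exists(file.path(node_path(store, name), ".zgroup"))
}

# A v2 array is a directory containing a '.zarray' metadata file.
is_zarr_v2_array <- function(store, name) {
  file.exists(file.path(node_path(store, name), ".zarray"))
}

# An AnnData sparse matrix is a group with 'data', 'indices' and 'indptr' arrays.
has_sparse_components <- function(store, name) {
  components <- c("data", "indices", "indptr")
  is_zarr_v2_group(store, name) &&
    all(vapply(components,
               function(x) is_zarr_v2_array(store, file.path(name, x)),
               logical(1)))
}

# Mock store for illustration: placeholder metadata files only, no chunks.
store <- file.path(tempdir(), "mock_v2.zarr")
grp <- file.path(store, "layers", "csc_counts")
for (x in c("data", "indices", "indptr")) {
  dir.create(file.path(grp, x), recursive = TRUE, showWarnings = FALSE)
  writeLines("{}", file.path(grp, x, ".zarray"))  # placeholder, not valid metadata
}
for (g in c(store, file.path(store, "layers"), grp)) {
  writeLines('{"zarr_format": 2}', file.path(g, ".zgroup"))
}

has_sparse_components(store, "layers/csc_counts")  # all three arrays present
has_sparse_components(store, "layers")             # a group, but no component arrays
```

A real validator would presumably also read the group attributes (AnnData records the encoding type and shape there), which needs a JSON parser and is left out of this sketch.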

CC @HelenaLC @Bisaloo, so we can discuss details if needed.


Artur-man commented May 3, 2026

I am now also adding a higher-level class for the feature/observation matrices of AnnData-zarr stores, similar to H5ADMatrix. Should I call it ZarrAnnDataMatrix or ZarrADMatrix?

> store <- file.path(td, "example_v2.zarr")
> ZarrADMatrix(store)
<100 x 50> sparse ZarrADMatrix object of type "double":
          Cell000   Cell001   Cell002 ...   Cell048   Cell049
Gene000 1.3469266 0.6892936 1.1073005   .  0.747668  0.000000
Gene001 1.0636960 0.0000000 0.6996704   .  1.695299  1.948128
Gene002 1.7479203 0.0000000 1.1073005   .  0.747668  1.100338
Gene003 0.6670749 0.6892936 0.6996704   .  1.695299  1.611508
Gene004 0.0000000 0.6892936 1.1073005   .  0.747668  1.388235
    ...         .         .         .   .         .         .
Gene095 1.5673896 1.0934708 0.6996704   . 0.7476680 0.6944417
Gene096 0.6670749 0.0000000 1.1073005   . 1.4670000 0.6944417
Gene097 0.6670749 0.0000000 1.6198547   . 1.1706656 1.6115083
Gene098 0.6670749 1.7853285 1.6198547   . 1.4670000 0.6944417
Gene099 1.9007897 1.3805084 0.6996704   . 1.1706656 1.7939160
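
For readers unfamiliar with the layout behind these sparse classes, the following base-R sketch shows how a subset of columns can be recovered from a CSC triplet (`data`, `indices`, `indptr`) using the 0-based convention AnnData writes to disk. It illustrates the general CSC idea only and is not the code in this PR; `read_csc_columns` is a hypothetical helper and the toy matrix is made up.

```r
# Toy 4 x 3 matrix in CSC form (0-based 'indices' and 'indptr'):
#      [,1] [,2] [,3]
# [1,]    1    0    4
# [2,]    0    0    5
# [3,]    2    3    0
# [4,]    0    0    6
data    <- c(1, 2, 3, 4, 5, 6)            # nonzero values, column by column
indices <- c(0L, 2L, 2L, 0L, 1L, 3L)      # 0-based row index of each value
indptr  <- c(0L, 2L, 3L, 6L)              # 0-based column boundaries in 'data'
n_row   <- 4L

# Hypothetical helper: return the selected columns (1-based) as a dense matrix.
read_csc_columns <- function(data, indices, indptr, n_row, cols) {
  out <- matrix(0, nrow = n_row, ncol = length(cols))
  for (k in seq_along(cols)) {
    j <- cols[k]
    n <- indptr[j + 1L] - indptr[j]   # number of stored values in column j
    if (n == 0L) next
    idx <- indptr[j] + seq_len(n)     # 1-based positions in 'data'/'indices'
    out[indices[idx] + 1L, k] <- data[idx]
  }
  out
}

res <- read_csc_columns(data, indices, indptr, n_row, cols = c(3L, 1L))
print(res)
```

With an on-disk backend, only the `idx` ranges of `data` and `indices` would need to be fetched for the requested columns, which is what makes partial reads of a CSC matrix cheap along the column axis.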

@Artur-man Artur-man changed the title ZarrSparseMatrix Zarr (AnnData) sparse matrices May 3, 2026