Please describe your wishes and possible alternatives to achieve the desired result.
Since #504, AnnData supports nullable int and bool columns in obs. Support for strings is planned in #679.
However, this only works if the nullable columns are represented as the appropriate pandas Array extension type.
For instance, this:
import anndata
import numpy as np
import pandas as pd
adata = anndata.AnnData(
    X=None,
    obs=pd.DataFrame().assign(
        test_int=np.array([1, 2, None, 3]),
        test_bool=[True, False, None, False],
    ),
)
adata.write_h5ad("test.h5ad")
fails with TypeError: Can't implicitly convert non-string objects to strings.
After converting the columns to pandas arrays, the object can be saved:
for c in adata.obs.columns:
    adata.obs[c] = pd.array(adata.obs[c].values)
adata.write_h5ad("test.h5ad")
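To make the effect of that conversion visible, here is a minimal sketch using only pandas and NumPy (no AnnData involved). The stand-in `obs` DataFrame and the `before`/`after` names are illustrative only. It relies on `pd.array` inferring a nullable dtype from object-typed values when no dtype is given.

```python
import numpy as np
import pandas as pd

# Stand-in for adata.obs: both columns are stored with object dtype
# because they mix ints/bools with None.
obs = pd.DataFrame(
    {
        "test_int": np.array([1, 2, None, 3], dtype=object),
        "test_bool": np.array([True, False, None, False], dtype=object),
    }
)
before = obs.dtypes.astype(str).to_dict()

# Same conversion as in the workaround: let pd.array infer the dtype.
for c in obs.columns:
    obs[c] = pd.array(obs[c].values)

after = obs.dtypes.astype(str).to_dict()
print(before)
print(after)
```

After the loop, the columns should carry the nullable `Int64` and `boolean` dtypes, with `None` represented as `pd.NA`.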
Unfortunately, pandas extension arrays are not widely known, and None values can end up in adata.obs for various reasons (for instance scverse/scirpy#434).
I was wondering if such columns should be automatically converted to the appropriate pandas array, e.g. on save?
Or maybe there should be an equivalent to AnnData.strings_to_categoricals that can be called to sanitize such columns?
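As a rough sketch of what such a sanitizing step could look like: `sanitize_nullable_columns` below is a hypothetical helper, not an existing AnnData function. It assumes that only object-dtype columns holding exclusively ints or exclusively bools (plus missing values) should be converted, and it leaves everything else untouched.

```python
import pandas as pd
from pandas.api.types import infer_dtype, is_object_dtype


def sanitize_nullable_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of AnnData).

    Returns a copy of `df` in which object-dtype columns that contain only
    integers or only booleans, apart from missing values, are converted to
    the pandas nullable "Int64" / "boolean" extension dtypes.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not is_object_dtype(col.dtype):
            continue  # already typed, nothing to do
        # infer_dtype ignores missing values when skipna=True
        kind = infer_dtype(col, skipna=True)
        if kind == "integer":
            out[name] = pd.array(col.to_numpy(), dtype="Int64")
        elif kind == "boolean":
            out[name] = pd.array(col.to_numpy(), dtype="boolean")
        # anything else (strings, mixed, all-missing) is left as-is
    return out
```

It could be called explicitly as `adata.obs = sanitize_nullable_columns(adata.obs)` before writing. Whether it should run automatically on save is the open question above.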