StrInt

The software implementation of the method described in "Deciphering more accurate cell-cell interactions by modeling cells and their interactions".

Prerequisites

numpy, pandas==1.5.2
scipy, scanpy, umap
loess
smurf-imputation

Installation

pip install pyStrint

Input file format

1. DataFrame format

Spatial Transcriptomics (ST) Count Data
- st_exp: dataframe with spots as rows and genes as columns
Spatial coordinates
- st_coord: dataframe with spots as rows, x and y axes as columns
Cell-type deconvoluted spatial matrix
- st_decon: dataframe with spots as rows and cell types as columns
Single-cell RNA-seq Count Data
- sc_exp: dataframe with cells as rows and genes as columns
Single-cell RNA-seq Metadata
- sc_meta: dataframe with cells as rows and cell types as columns
- cell_type_key: column name of the cell type identity in sc_meta
Single-cell RNA-seq distribution Data
- sc_distribution: dataframe with cells as rows and genes as columns
Ligand and Receptor Data (optional)
- lr_df: user-provided dataframe with ligand-receptor pairs as rows; ligand, receptor, and weight as columns
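As a rough illustration of the layouts listed above, the following sketch builds toy inputs with pandas and checks that their row and column labels line up. All spot, cell, gene, and cell-type names, the coordinate column names x and y, and the random values are made up for illustration; sc_distribution here is only a placeholder copy of the counts, not a real distribution estimate.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical labels, for illustration only.
genes = ["GeneA", "GeneB", "GeneC"]
spots = ["spot_1", "spot_2"]
cells = ["cell_1", "cell_2", "cell_3", "cell_4"]
cell_types = ["T_cell", "B_cell"]

# ST counts: spots as rows, genes as columns.
st_exp = pd.DataFrame(rng.poisson(5, size=(len(spots), len(genes))),
                      index=spots, columns=genes)

# Spatial coordinates: spots as rows, x and y as columns.
st_coord = pd.DataFrame({"x": [0.0, 1.0], "y": [0.0, 0.0]}, index=spots)

# Deconvolution result: spots as rows, cell types as columns.
st_decon = pd.DataFrame([[0.7, 0.3], [0.2, 0.8]],
                        index=spots, columns=cell_types)

# scRNA-seq counts: cells as rows, genes as columns.
sc_exp = pd.DataFrame(rng.poisson(3, size=(len(cells), len(genes))),
                      index=cells, columns=genes)

# scRNA-seq metadata: cells as rows, one column holding the cell type.
cell_type_key = "celltype"
sc_meta = pd.DataFrame(
    {cell_type_key: ["T_cell", "T_cell", "B_cell", "B_cell"]}, index=cells)

# Placeholder only: same shape and labels as sc_exp.
sc_distribution = sc_exp.astype(float)

# Basic consistency checks on the labels shared between tables.
aligned = all([
    st_exp.index.equals(st_coord.index),
    st_exp.index.equals(st_decon.index),
    sc_exp.index.equals(sc_meta.index),
    sc_exp.index.equals(sc_distribution.index),
    sc_exp.columns.equals(sc_distribution.columns),
    set(sc_meta[cell_type_key]) <= set(st_decon.columns),
])
print("inputs aligned:", aligned)
```

Checking label alignment before conversion tends to surface mismatched spot or cell identifiers early, which are otherwise easy to miss.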

Convert to adata format

sc_adata, st_adata, sc_distribution, lr_df = pp.prep_adata(sc_exp = sc_exp, st_exp = st_exp, sc_distribution = sc_smurf, 
                            sc_meta = sc_meta, st_coord = st_coord, SP = species)

2. Adata format

Spatial Transcriptomics (ST) Count Data
- st_adata: adata.X with spots as rows and genes as columns
- st_adata.obs: dataframe with spots as rows, spot coordinates x and y as columns
Cell-type deconvoluted spatial matrix
- st_decon: dataframe with spots as rows and cell types as columns
Single-cell RNA-seq Count Data
- sc_adata: adata.X with cells as rows and genes as columns
- sc_adata.obs: dataframe with cells as rows and cell types as columns
Single-cell RNA-seq distribution Data
- sc_distribution: dataframe with cells as rows and genes as columns

Usages

Prep object

obj = spamint.spaMint(save_path = outDir, st_adata = st_adata, weight = st_decon, 
                 sc_distribution = sc_distribution, sc_adata = sc_adata, cell_type_key = 'celltype', 
                 st_tp = st_tp)
obj.prep()

Parameters

save_path: output directory in which to save results
st_adata: Spatial Transcriptomics (ST) count data; adata.X with spots as rows and genes as columns
- st_adata.obs: dataframe with spots as rows, spot coordinates x and y as columns
weight: cell-type deconvoluted spatial dataframe with spots as rows and cell types as columns
sc_distribution: single-cell RNA-seq distribution dataframe with cells as rows and genes as columns
sc_adata: single-cell RNA-seq count data; adata.X with cells as rows and genes as columns
- sc_adata.obs: dataframe with cells as rows and cell types as columns
cell_type_key: column name of the cell type in sc_adata.obs
st_tp: ST sequencing platform; choose from st (ST legacy), visium (10X Visium), or slide-seq (any single-cell resolution data)
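Because st_tp is a plain string, a typo would only show up once the object is built. A minimal sketch of checking it up front is shown here; the helper check_st_tp and the ST_PLATFORMS table are hypothetical and not part of the package, and only the three labels come from the parameter description.

```python
# Platform labels taken from the st_tp parameter description.
ST_PLATFORMS = {
    "st": "ST legacy",
    "visium": "10X Visium",
    "slide-seq": "any single-cell resolution data",
}


def check_st_tp(st_tp: str) -> str:
    """Hypothetical helper: return st_tp unchanged if it is a documented choice."""
    if st_tp not in ST_PLATFORMS:
        choices = ", ".join(sorted(ST_PLATFORMS))
        raise ValueError(f"st_tp must be one of: {choices}; got {st_tp!r}")
    return st_tp


st_tp = check_st_tp("visium")
print(st_tp, "->", ST_PLATFORMS[st_tp])
```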

Initial process (Cell selection)

sc_agg_meta = obj.select_cells(p = 0.1, mean_num_per_spot = 10, max_rep = 3, repeat_penalty = 10)

p: proportion of the interface similarity used during cell selection.
mean_num_per_spot: average number of cells per spot.
max_rep: maximum number of repetitions for cell selection.
repeat_penalty: when a cell has been picked this many times, its probability of being picked again is halved. Recommended to be near (st_exp.shape[0]*num_per_spot/sc_exp.shape[0]) * 10
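The recommended repeat_penalty value can be worked out from the dataset sizes. The sketch below applies the formula from the description; the function name suggest_repeat_penalty and the example sizes are hypothetical.

```python
def suggest_repeat_penalty(n_spots: int, num_per_spot: float, n_cells: int) -> float:
    """Apply the recommendation (n_spots * num_per_spot / n_cells) * 10.

    n_spots corresponds to st_exp.shape[0] and n_cells to sc_exp.shape[0].
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return (n_spots * num_per_spot / n_cells) * 10


# Hypothetical sizes: 1000 spots, 10 cells per spot, 5000 single cells.
penalty = suggest_repeat_penalty(n_spots=1000, num_per_spot=10, n_cells=5000)
print(penalty)  # 20.0
```

With these sizes each single cell would be needed twice on average to fill all spots, so the suggested penalty is ten times that, 20.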

Refinement process (Gradient descent)

refine_sc_exp, sc_agg_meta = obj.gradient_descent(alpha = 1, beta = 0.001, gamma = 0.001,
                delta = 0.1, eta = 0.0005,
                init_sc_embed = None,
                iteration = 20, k = 2, W_HVG = 2,
                left_range = 0, right_range = 8, steps = 1, dim = 2)

alpha, beta, gamma, delta: hyperparameters for the loss function.

alpha: weight of the term that maintains the expression similarity between cells and their respective gamma distribution models, default: 1.

beta: weight for adjusting cell locations based on cell-cell affinity, default: 0.001.

gamma: weight for optimizing interface profile similarity between pseudo-spots and their corresponding ST spots, default: 0.001.

delta: weight of the regularization term, default: 0.1.

eta float, default: 0.0005
Learning rate for gradient descent.

init_sc_embed DataFrame, optional, default: None
Initial embedding for single-cell data.

iteration int, optional, default: 20
The number of iterations for optimization.

k int, optional, default: 2
The number of neighbors in each adjacent spot.

W_HVG int, optional, default: 2
Weight for highly variable genes.

left_range int, optional, default: 0
right_range int, optional, default: 8
The index range for the neighbor number in the embedding process; for index i, the actual neighbor number is (i+1)*10.

steps int, optional, default: 1
The number of iterations for each neighbor number.

dim int, optional, default: 2
The embedding dimension of the reconstruction.
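To make the left_range and right_range description concrete, the sketch below lists the neighbor numbers implied by (i+1)*10. It assumes the index i runs over range(left_range, right_range), that is, with right_range excluded; the README does not say whether the upper bound is inclusive, and the function name neighbor_numbers is hypothetical.

```python
from typing import List


def neighbor_numbers(left_range: int = 0, right_range: int = 8) -> List[int]:
    """Neighbor numbers (i + 1) * 10 for i in range(left_range, right_range).

    Assumption: right_range is treated as an exclusive upper bound.
    """
    return [(i + 1) * 10 for i in range(left_range, right_range)]


print(neighbor_numbers())      # [10, 20, 30, 40, 50, 60, 70, 80]
print(neighbor_numbers(2, 5))  # [30, 40, 50]
```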

More details are in demo_tutorial.ipynb.
The tutorial file can be downloaded at: https://drive.google.com/drive/folders/1FYa4hzg3vVo6y2BOzlJbXhPTmdEcjD4O?usp=sharing
