Commit 66a3e7e

minor changes to notebooks
1 parent 033af93 commit 66a3e7e

2 files changed: 35 additions, 19 deletions

docs/content/tutorials/GLM_PCA_analysis.ipynb renamed to docs/content/tutorials/snmCATseq_glmPCA_analysis.ipynb

Lines changed: 26 additions & 14 deletions
@@ -5,10 +5,14 @@
  "id": "ee0fddbe-1c52-44c1-b2bc-b3f4165209cf",
  "metadata": {},
  "source": [
- "# Tutorial for snmC2Tseq\n",
- "Using the single-cell methylation data from snmC2Tseq [1], we re-processed the dataset to extract smaller bins (10kb), yielding non-Gaussian data. The data is losely following a Beta distribution, and we thus employ GLM-PCA with Beta distribution to find a lower-dimension representation.\n",
- "<br/>\n",
- "[1]: https://www.sciencedirect.com/science/article/pii/S2666979X22000271?via%3Dihub"
+ "# GLM-PCA analysis of single-cell methylome data\n",
+ "\n",
+ "\n",
+ "\n",
+ "Starting from the publicly available single-cell methylation data from [snmCATseq](https://www.sciencedirect.com/science/article/pii/S2666979X22000271?via%3Dihub), we re-processed a subset of the data to extract methylated/unmethylated reads in small genomic bins (10kb), yielding non-Gaussian data. \n",
+ "\n",
+ "\n",
+ "The data loosely follows a Beta distribution, and we therefore employ GLM-PCA with a Beta distribution to find a lower-dimensional representation.\n"
  ]
  },
  {
{
@@ -40,7 +44,11 @@
  "metadata": {},
  "source": [
  "## Read data\n",
- "Data needs to be processed as demonstrated in \"snmC2Tseq_prepcessing.ipynb\". We here load it."
+ "\n",
+ "\n",
+ "Starting from the files provided in our [figshare repository](), the data first needs to be processed as demonstrated in \"snmCATseq_preprocessing.ipynb\". Here we load the output.\n",
+ "\n",
+ "To check whether the glmPCA results help identify clusters of cells, we also load the cell-type metadata provided in Supplementary Table 5 of the original manuscript.\n"
  ]
  },
  {
@@ -83,9 +91,13 @@
  "metadata": {},
  "source": [
  "## Sincei: GLM-PCA\n",
- "Two Beta families have been designed: Beta and SigmoidBeta. SigmoidBeta employs a logit-transformed saturated parameters, which removes all optimisation constraints and is thus much more stable.\n",
- "<br/>\n",
- "Several hyper-parameters are hard-coded, but do not vastly influence the analysis. The only hyper-parameter which can change results fondamentally is the learning rate. 2.5 is a good choice for sigmoid_beta, but this needs to be changed for other distributions.2**8"
+ "\n",
+ "To aid in this analysis, we provide two Beta families within the glmPCA framework: **Beta** and **SigmoidBeta**. \n",
+ "\n",
+ "SigmoidBeta employs logit-transformed saturated parameters, which removes all optimisation constraints and is therefore much more stable.\n",
+ "\n",
+ "\n",
+ "Several hyper-parameters are hard-coded but do not strongly influence the analysis. The only hyper-parameter that can fundamentally change the results is the learning rate: 2.5 is a good choice for sigmoid_beta, but it needs to be changed for other distributions."
  ]
  },
  {
@@ -175,19 +187,19 @@
  "_umap_params = {'n_neighbors':15, 'min_dist': 0.3, 'n_epochs': 1000}\n",
  "\n",
  "metric = 'cosine'\n",
+ "\n",
  "_umap_clf = umap.UMAP(metric=metric, verbose=True, **_umap_params)\n",
  "umap_embeddings = pd.DataFrame(\n",
  " _umap_clf.fit_transform(X_project.detach().numpy()), \n",
  " columns=['UMAP 1', 'UMAP 2']\n",
  ").reset_index()\n",
- "umap_embeddings['label'] = cell_labels['snmC2T-seq Baseline Cluster'].values\n",
+ "umap_embeddings['label'] = cell_labels['snmCAT-seq Baseline Cluster'].values\n",
  "\n",
  "g = sns.relplot(data=umap_embeddings, x='UMAP 1', y='UMAP 2',hue='label')\n",
  "figure_name = 'UMAP_glm_pca_%s_metric_%s_%s%s'%(\n",
  " n_pc, \n",
  " metric,\n",
- " family,\n",
- " '_' + method if 'beta' in family else ''\n",
+ " family\n",
  ")"
  ]
  },
@@ -266,9 +278,9 @@
  ],
  "metadata": {
  "kernelspec": {
- "display_name": "sincei",
+ "display_name": "Python 3 (ipykernel)",
  "language": "python",
- "name": "sincei"
+ "name": "python3"
  },
  "language_info": {
  "codemirror_mode": {
@@ -280,7 +292,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.9.16"
+ "version": "3.8.16"
  }
  },
  "nbformat": 4,

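Reviewer's sketch for the SigmoidBeta paragraph in the tutorial text above: the idea of removing optimisation constraints through a logit transform can be illustrated on a toy problem. This is not sincei's implementation. It assumes a mean/precision parameterisation of the Beta distribution with a fixed, made-up precision `nu`, made-up data, and an arbitrary learning rate unrelated to the 2.5 quoted for sigmoid_beta. All function names are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function mapping any real t into (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def beta_log_pdf(x, mu, nu):
    """Log-density of Beta(mu * nu, (1 - mu) * nu) at x in (0, 1)."""
    a, b = mu * nu, (1.0 - mu) * nu
    return (math.lgamma(nu) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def neg_log_lik(theta, data, nu=10.0):
    """Negative log-likelihood as a function of the unconstrained theta."""
    mu = sigmoid(theta)  # the mean stays in (0, 1) for any real theta
    return -sum(beta_log_pdf(x, mu, nu) for x in data)


def fit_theta(data, nu=10.0, lr=0.05, n_steps=500, eps=1e-5):
    """Plain gradient descent on theta using a central finite difference."""
    theta = 0.0
    for _ in range(n_steps):
        grad = (neg_log_lik(theta + eps, data, nu)
                - neg_log_lik(theta - eps, data, nu)) / (2.0 * eps)
        theta -= lr * grad  # no projection or clipping step is required
    return theta


# Made-up methylation fractions, strictly inside (0, 1).
data = [0.12, 0.20, 0.15, 0.30, 0.18]
theta_hat = fit_theta(data)
mu_hat = sigmoid(theta_hat)
print(f"theta = {theta_hat:.3f}, mean = {mu_hat:.3f}")
```

Because the optimiser only ever updates `theta`, which may take any real value, the Beta mean cannot leave (0, 1) regardless of the step size. A step that is too large can still make the descent diverge, which is consistent with the notebook singling out the learning rate as the sensitive hyper-parameter.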
docs/content/tutorials/snmC2Tseq_preprocessing.ipynb renamed to docs/content/tutorials/snmCATseq_preprocessing.ipynb

Lines changed: 9 additions & 5 deletions
@@ -5,8 +5,12 @@
  "id": "f1dc587b-e8b8-4beb-ab8b-6010e774d52a",
  "metadata": {},
  "source": [
- "# Pre-process snmC2Tseq data for later use in Percolate\n",
- "<b>NOTES</b>:\n",
+ "# Pre-process snmCATseq data for later use in sincei\n",
+ "\n",
+ "This notebook uses a subset of data from the snmCAT-seq protocol presented in [Luo et al. (2022)](https://www.sciencedirect.com/science/article/pii/S2666979X22000271).\n",
+ "\n",
+ "\n",
+ "**NOTES**:\n",
  "- unmeth is not a proper column name: it actually corresponds to the total number of reads. This can be seen from the ratio of meth/unmeth, which maxes out at 0.5."
  ]
  },
@@ -1352,9 +1356,9 @@
  ],
  "metadata": {
  "kernelspec": {
- "display_name": "sincei",
+ "display_name": "Python 3 (ipykernel)",
  "language": "python",
- "name": "sincei"
+ "name": "python3"
  },
  "language_info": {
  "codemirror_mode": {
@@ -1366,7 +1370,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.9.16"
+ "version": "3.8.16"
  }
  },
  "nbformat": 4,

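Reviewer's sketch for the NOTES bullet in the preprocessing notebook above: taking the note at face value (the `unmeth` column holds the total read count), a per-bin methylation fraction would be `meth` divided by that column, not `meth / (meth + unmeth)`. The counts below are made up and the helper name is hypothetical. Only the column names `meth` and `unmeth` come from the note.

```python
def methylation_fraction(meth, total):
    """Fraction of methylated reads in a bin, or None if the bin has no reads."""
    if total <= 0:
        return None
    return meth / total


# Made-up bins; "unmeth" is assumed to store the TOTAL read count per bin.
bins = [
    {"meth": 0, "unmeth": 4},
    {"meth": 3, "unmeth": 6},
    {"meth": 5, "unmeth": 5},
]

# Treating "unmeth" as unmethylated reads can never exceed 0.5 when meth <= total.
naive = [b["meth"] / (b["meth"] + b["unmeth"]) for b in bins]

# Treating "unmeth" as the total covers the full [0, 1] range.
fractions = [methylation_fraction(b["meth"], b["unmeth"]) for b in bins]

print(naive)
print(fractions)
```

Renaming the column to something like `total` at load time would avoid the ambiguity, but that is a suggestion rather than something this commit does.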