|
5 | 5 | "id": "ee0fddbe-1c52-44c1-b2bc-b3f4165209cf",
|
6 | 6 | "metadata": {},
|
7 | 7 | "source": [
|
8 |
| - "# Tutorial for snmC2Tseq\n", |
9 |
| - "Using the single-cell methylation data from snmC2Tseq [1], we re-processed the dataset to extract smaller bins (10kb), yielding non-Gaussian data. The data is loosely following a Beta distribution, and we thus employ GLM-PCA with Beta distribution to find a lower-dimension representation.\n", |
10 |
| - "<br/>\n", |
11 |
| - "[1]: https://www.sciencedirect.com/science/article/pii/S2666979X22000271?via%3Dihub" |
| 8 | + "# GLM-PCA analysis of single-cell methylome data\n", |
| 9 | + "\n", |
| 10 | + "\n", |
| 11 | + "\n", |
| 12 | + "Starting from the publicly available single-cell methylation data from [snmCATseq](https://www.sciencedirect.com/science/article/pii/S2666979X22000271?via%3Dihub), we re-processed a subset of the data to extract methylated/unmethylated reads in small genomic bins (10 kb), yielding non-Gaussian data.\n", |
| 13 | + "\n", |
| 14 | + "\n", |
| 15 | + "The data loosely follow a Beta distribution, so we employ GLM-PCA with a Beta family to find a lower-dimensional representation.\n" |
12 | 16 | ]
|
13 | 17 | },
|
14 | 18 | {
|
|
40 | 44 | "metadata": {},
|
41 | 45 | "source": [
|
42 | 46 | "## Read data\n",
|
43 |
| - "Data needs to be processed as demonstrated in \"snmC2Tseq_prepcessing.ipynb\". We here load it." |
| 47 | + "\n", |
| 48 | + "\n", |
| 49 | + "Starting from the files provided in our [figshare repository](), the data must first be processed as demonstrated in \"snmCATseq_prepcessing.ipynb\". Here we load the output.\n", |
| 50 | + "\n", |
| 51 | + "To check whether the glmPCA results help identify clusters of cells, we also load the cell-type metadata provided in Supplementary Table 5 of the original manuscript.\n" |
44 | 52 | ]
|
45 | 53 | },
|
46 | 54 | {
|
|
83 | 91 | "metadata": {},
|
84 | 92 | "source": [
|
85 | 93 | "## Sincei: GLM-PCA\n",
|
86 |
| - "Two Beta families have been designed: Beta and SigmoidBeta. SigmoidBeta employs logit-transformed saturated parameters, which removes all optimisation constraints and is thus much more stable.\n", |
87 |
| - "<br/>\n", |
88 |
| - "Several hyper-parameters are hard-coded, but do not vastly influence the analysis. The only hyper-parameter which can change results fundamentally is the learning rate. 2.5 is a good choice for sigmoid_beta, but this needs to be changed for other distributions.2**8" |
| 94 | + "\n", |
| 95 | + "To aid in this analysis, we provide two Beta families within the glmPCA framework: **Beta** and **SigmoidBeta**. \n", |
| 96 | + "\n", |
| 97 | + "SigmoidBeta employs logit-transformed saturated parameters, which removes all optimisation constraints and therefore makes the optimisation much more stable.\n", |
| 98 | + "\n", |
| 99 | + "\n", |
| 100 | + "Several hyper-parameters are hard-coded but do not strongly influence the analysis. The only hyper-parameter that can fundamentally change the results is the learning rate: 2.5 is a good choice for sigmoid_beta, but it needs to be adjusted for other distributions." |
89 | 101 | ]
|
90 | 102 | },
|
91 | 103 | {
|
|
175 | 187 | "_umap_params = {'n_neighbors':15, 'min_dist': 0.3, 'n_epochs': 1000}\n",
|
176 | 188 | "\n",
|
177 | 189 | "metric = 'cosine'\n",
|
| 190 | + "\n", |
178 | 191 | "_umap_clf = umap.UMAP(metric=metric, verbose=True, **_umap_params)\n",
|
179 | 192 | "umap_embeddings = pd.DataFrame(\n",
|
180 | 193 | " _umap_clf.fit_transform(X_project.detach().numpy()), \n",
|
181 | 194 | " columns=['UMAP 1', 'UMAP 2']\n",
|
182 | 195 | ").reset_index()\n",
|
183 |
| - "umap_embeddings['label'] = cell_labels['snmC2T-seq Baseline Cluster'].values\n", |
| 196 | + "umap_embeddings['label'] = cell_labels['snmCAT-seq Baseline Cluster'].values\n", |
184 | 197 | "\n",
|
185 | 198 | "g = sns.relplot(data=umap_embeddings, x='UMAP 1', y='UMAP 2',hue='label')\n",
|
186 | 199 | "figure_name = 'UMAP_glm_pca_%s_metric_%s_%s'%(\n",
|
187 | 200 | " n_pc, \n",
|
188 | 201 | " metric,\n",
|
189 |
| - " family,\n", |
190 |
| - " '_' + method if 'beta' in family else ''\n", |
| 202 | + " family\n", |
191 | 203 | ")"
|
192 | 204 | ]
|
193 | 205 | },
|
|
266 | 278 | ],
|
267 | 279 | "metadata": {
|
268 | 280 | "kernelspec": {
|
269 |
| - "display_name": "sincei", |
| 281 | + "display_name": "Python 3 (ipykernel)", |
270 | 282 | "language": "python",
|
271 |
| - "name": "sincei" |
| 283 | + "name": "python3" |
272 | 284 | },
|
273 | 285 | "language_info": {
|
274 | 286 | "codemirror_mode": {
|
|
280 | 292 | "name": "python",
|
281 | 293 | "nbconvert_exporter": "python",
|
282 | 294 | "pygments_lexer": "ipython3",
|
283 |
| - "version": "3.9.16" |
| 295 | + "version": "3.8.16" |
284 | 296 | }
|
285 | 297 | },
|
286 | 298 | "nbformat": 4,
|
|