-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathCITE-seq_Workflow.Rmd
412 lines (250 loc) · 11 KB
/
CITE-seq_Workflow.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
---
title: "Reproducible workflow for the analysis of **CITE-seq data** for the project"
subtitle: "\"Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus\""
author: "Yuri Kotliarov, Matthew Mule, Andrew Martins, John S. Tsang"
output:
html_notebook:
toc: yes
---
## Introduction
The following workflow was used to analyze the CITE-seq data and generate the CITE-seq related figures for our paper "Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in lupus patients" published in Nature Medicine in 2020.
<br/>
## Prepare the working directory
Some steps of the workflow require data generated for the first part of the analysis of flow cytometry and microarrays data (see Workflow.Rmd).
Before running the workflow make sure you have the following subfloders in the working directory:
```
data
citeseq
--data
--figures
--R
--results
generated_data
```
* Download the archived CITE-seq data folder from and unpack it into the ```citeseq/data folder```.
* Download the archived directory ```citeseq``` with R code and unpack it into the citeseq folder.
* Make sure you have the ```.Rprofile``` file in the working directory. Modify the ```PROJECT_DIR``` variable with the full path to the working directory.
* Create empty folders generated_data and figure_generation.
* If you are going to use the Singularity container download it from . It should be located outside of the working directory.
## R packages used
Our pipeline requires the following R packages:
```
# CRAN:
install.packages(c("plyr", "tidyverse", "data.table", "pROC", "MASS", "limma", "mclust", "corrplot", "ggraph", "circlize", "pals", "ggsignif", "ggridges", "viridis", "clustree", "cowplot"))
# Bioconductor:
install.packages("BiocManager")
BiocManager::install()
BiocManager::install(c("SingleCellExperiment", "scater", "scran", "fgsea", "tmod", "ComplexHeatmap"))
# Seurat ver. 2.3.4
source("https://z.umn.edu/archived-seurat")
```
<br/>
### Notes
If while running some scripts you get X11 forwarding related error, check if you have X11 forwarding turned on (in Putty it is in Connection-SSH-X11) and turn it off.
Sometime an error may appear due to a conflict between R packages. In this case restart R and rerun the script that previously returned the error.
<br/>
---
<br/>
# **0. Before running this workflow always change the working directory to ```citeseq```**.
```{r}
setwd("citeseq")
```
<br/>
# 1. Read Seurat object with demultiplexed data and filtered for singlet cells from H1N1 day0 samples only
Input data are stored in ```data``` folder. These files or other RDS files generated on further steps can be downloaded from the [Figshare repository](https://doi.org/10.35092/yhjc.c.4753772).
* RDS file with Seurat object
* ```data/H1_day0_demultilexed_singlets.RDS```
* RDS file with negative control cells
* ```data/neg_control_object.rds```
```{r}
source("R/dsbnorm_prot_scrannorm_rna.R")
```
Intermediate RDS files containing lists of Seurat objects with normalized data for individual batches are stored in ```data/normalization_data```.
Output RDS file is stored as ```data/H1_day0_scranNorm_adtbatchNorm.rds```.
<br/>
# 2. Clustering all cells by ADT data at multiple resolution
The clustering was performed using as an input the matrix of distances between cells using ADT data only.
## Input data
Input data are stored in ```data``` folder:
* RDS file with Seurat object with RNA-seq data SCRAN-normalized and ADT data batch-normalized and batch-corrected * ```data/H1_day0_scranNorm_adtbatchNorm.rds```
The calculations require OVER 64gb of RAM and was run on a high-performance cluster. The output file can be downloaded from the [Figshare repository](https://doi.org/10.35092/yhjc.c.4753772).
```{r}
source("R/cluster_cells_ADT_distance_matrix.r")
```
Output:
* Table of clusters assignment for each cell:
* ```./data/H1N1_full_clustered_p3dist_multires_metadata.rds```.
* Seurat object with clustering assignment:
* ```./data/H1_day0_scranNorm_adtbatchNorm_dist_clustered.rds```.
<br/>
# 3. Clusters annotation
Generate a heatmap of average expression of selected protein markers in each of the cell clusters derived from the three different clustering resolutions
```{r}
source("R/adt_clustering_clusteraverages_all.levels.r")
```
The generated table in ```results/cluster_nodes.txt``` is used add cluster labelse and manual annotation of cluster cell types.
The generated figures are stored in ```figures/cluster_annotation``` and used for cluster labeling and annotation.
<br/>
# 4. Filter cells and generate final heatmap
After annotating the clusters we found that at resolution=3 clusters 34, 36, 38, 39 contain mostly doublet cells. We remove cells in these clusters from the Seurat object.
```{r}
source("R/remove_doublet_clusters.r")
```
Output file:
```./data/H1_day0_scranNorm_adtbatchNorm_dist_clustered_filtered.rds```
<br/>
The following script uses manual annottion table in ```data/clustree_node_labels_withCellTypeLabels.txt``` and clusters at first three resolutions to generate Figure 4c.
```{r}
source("R/adt_clustering_clusteraverages_3levels_figure.r")
```
The generated heatmap is stored in ```figures/cluster_annotation```.
<br/>
# 5. PCA and tSNE analysis of the ADT data
```{r}
source("R/compute_pca.r")
```
Output file: ```./data/H1_day0_scranNorm_adtbatchNorm_dist_clustered_PCA.rds```
PCA elbow plot is stored in ```figures/TSNE```. Using this plot We determined the number of principal components to use for tSNE as **7**.
<br/>
```{r}
source("R/compute_tsne.r")
```
This script for tSNE computation at multiple perplexity was run on high-performance cluster. The output file can be downloaded from the [Figshare repository](https://doi.org/10.35092/yhjc.c.4753772).
Output file: ```./data/H1_day0_scranNorm_adtbatchNorm_dist_clustered_TSNE.rds```
<br/>
# 6. Label and visualise the clusters
Re-label the clusters and create the final object.
```{r}
source("R/clustree_labels.r")
```
Output file: ```./data/H1_day0_scranNorm_adtbatchNorm_dist_clustered_TSNE_labels.rds```.
The output file can be downloaded from the [Figshare repository](https://doi.org/10.35092/yhjc.c.4753772).
*This Seurat object was used for all further analysis.*
<br/>
Generate tSNE plots to check for batch effect and show the first-level clusters for Figure 4a
```{r}
source("R/tsne_figures.r")
```
The generated figures are stored in ```figures/TSNE```.
<br/>
Generate the clustree for Figure 4b with nodes colored by level 1 cluters
```{r}
source("R/clustree_vertical_clr_level1.r")
```
The generated figure is stored in ```figures/cluster_annotation```.
<br/>
Generate the supplemental figure with the distribution of denoised and background rescaled counts for the surface proteins
```{r}
source("R/adt_clustering_protein_distribution.r")
```
The generated figures are stored in ```figures/cluster_annotation```.
<br/>
# 7. Test various gene signatures comparing low and high responders
Generate the list of signatures:
```{r}
source("R/make_list_of_signatures.r")
```
Output file with the list of signatures is saved as ```sig/sig.list.RDS```.
<br/>
Check enrichemnt of BTM modules in CD40act gene signature:
```{r}
source("R/CD40act_genes_HG.test.r")
```
Output plot is saved in ```figures/CD40act```.
<br/>
Test gene signatures comparing low vs. high responders using cells in pseudo-bulk and all cell clusters:
```{r}
source("R/test_sigs_clusters.r")
```
Output file with the list of signatures is saved as ```results/test_sig.genes_in_clusters_high_vs_low_responders.txt```.
<br/>
Compute scores for selected gene signatures in pseudo-bulk and all cell clusters:
```{r}
source("R/sig_scores_calc_by_cluster.r")
```
Output files are saved in ```results/sig_scores```.
<br/>
# 8. Visualize the results of gene signatures tests
Generate box-plot for pseudo-bulk for gene signatures with significant difference between low and high responders:
```{r}
source("R/sig_scores_boxplot_pseudobulk.r")
```
Output plots (for Figure 4d,e) are saved in ```results/sig_test_boxplots```.
<br/>
Generate box-plot for level 1 clusters for gene signatures with significant difference between low and high responders:
```{r}
source("R/sig_scores_boxplot_by_clusters.r")
```
Output plots (for Fig. 5a,b,e and Ext. Data Fig. 8f) are saved in ```results/sig_test_boxplots```.
<br/>
Generate clustree plots for selected gene signatures indicating significance of difference between low and high responders:
```{r}
source("R/clustree_vertical_test_sigs.r")
```
Output plots (for Fig. 5a,b,e and Ext. Data Fig. 8f) are saved in ```results/sig_test_clustree```.
<br/>
Generate box-plots for cluster C9 (pDC) for selected gene signatures:
```{r}
source("R/sig_scores_boxplot_C9.r")
```
Output plots (for Fig. 5c) are saved in ```results/sig_test_boxplots```.
<br/>
Drop-out test for selected gene signatures and selected cluster combinations:
```{r}
source("R/test_sigs_clusters_OUT.r")
```
Result table is saved as ```results/test_sig.genes_in_clusters_high_vs_low_responders_OUT.txt```.
Output plots (for Ext. Data Fig. 8g) are saved in ```figures/drop_out_test```.
<br/>
# 9. Manual gating of CD20+CD38++ B cells
```{r}
source("R/hand_gating.r")
```
Output table of cell frequencies is saved as ```results/CITEseq_CD38hi_cell_data.txt```.
Output plots (for Ext. Data Fig. 8d,e) are saved in ```results/hand_gating```.
<br/>
# 10. Compile scores for multiple signatures and perform correlation analysis
Compute scores of SLE-Sig signature for microarray day 0 data:
```{r}
source("R/SLE.sig_MA_sig_score.r")
```
Output file is saved as ```results/sig_scores/scores_SLE.sig_MA.txt```.
<br/>
Compile scores for selected gene signatures for CITE-seq and microarray data plus CD38++ cell frequency from flow cytometry data:
```{r}
source("R/scores_for_correlation.r")
```
Output table of cell frequencies is saved as ```results/CITEseq_CD38hi_cell_data.txt```.
<br/>
Generate scatter plot of CD40act score in C3.1.0 cell cluster vs. CD20=CD38++ B cells frequency from flow cytometry (Figure 5f):
```{r}
source("R/CD40act_vs_CD38hi.flow.r")
```
Output plot is saved in ```figures/sig_ranked_correlation```.
<br/>
Generate scatter plots of correlation between selected signature scores in selected clusters (for Figure 6a):
```{r}
source("R/sig_correlation_scatterplots.r")
```
Output plots are saved in ```figures/sig_ranked_correlation```.
<br/>
Generate a correlation plot (for FIgure 6a):
```{r}
source("R/scores_correlation_heatmap_custom_order.r")
```
Output plot is saved in ```figures/sig_ranked_correlation```.
<br/>
# 11. Differential Expression of selected surface proteins in pDC between low and high responders
```{r}
source("R/CD86_HLA-DR_vs_response.r")
```
Output plot is saved in ```figures/adt_vs_response```.
<br/>
# 12. Comparing the TGSig and SLE-Sig scores in females versus males
```{r}
source("R/TGSig_SLE.sig_vs_sex.r")
```
Output plot is saved in ```figures/male_vs_female```.
<br/>
<br/>
<br/>