forked from MonashBioinformaticsPlatform/LFQ-Analyst
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLFQ_report.Rmd
138 lines (93 loc) · 4.65 KB
/
LFQ_report.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
---
title: "LFQ-Analyst report"
date: "`r format(Sys.time(), '%d %B, %Y')`"
params:
data: NA
alpha: NA
lfc: NA
num_signif: NA
tested_contrasts: NA
numbers_input: NA
coverage_input: NA
pca_input: NA
correlation_input: NA
missval_input: NA
detect_input: NA
imputation_input: NA
p_hist_input: NA
heatmap_input: NA
dep: NA
pg_width: NA
cvs_input: NA
output:
pdf_document:
fig_caption: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(opts.label="kill_prefix") # Remove line number and Comment "##" from printing
```
## Method details
The raw data files were analyzed using MaxQuant to obtain protein identifications and their respective label-free quantification values using in-house standard parameters. Of note, the data were normalization based on the assumption that the majority of proteins do not change between the different conditions.
Statistical analysis was performed using an in-house generated R script based on the ProteinGroup.txt file. First, contaminant proteins, reverse sequences and proteins identified “only by site” were filtered out. In addition, proteins that have been only identified by a single peptide and proteins not identified/quantified consistantly in same condition have been removed as well. The LFQ data was converted to log2 scale, samples were grouped by conditions and missing values were imputed using the ‘Missing not At Random’ (MNAR) method, which uses random draws from a left-shifted Gaussian distribution of 1.8 StDev (standard deviation) apart with a width of 0.3. Protein-wise linear models combined with empirical Bayes statistics were used for the differential expression analyses. The _limma_ package from R Bioconductor was used to generate a list of differentially expressed proteins for each pair-wise comparison. A cutoff of the _adjusted p-value_ of 0.05 (Benjamini-Hochberg method) along with a |log2 fold change| of 1 has been applied to determine significantly regulated proteins in each pairwise comparison.
### Quick summary of parameters used:
* Tested pairwise comparisons = <span style="color:blue">`r params$tested_contrasts`</span>
* Adjusted _p-value_ cutoff <= <span style="color:blue">`r params$alpha`</span>
* Log fold change cutoff >= <span style="color:blue">`r params$lfc`</span>
## Results
#### MaxQuant result output contains `r nrow(params$data)` proteins groups of which _`r nrow(params$dep())`_ proteins were reproducibly quantified.
#### `r params$num_signif` proteins differ significantly between samples.
\pagebreak
## Exploratory Analysis (QC Plots)
#### Principle Component Analysis (PCA) plot
```{r pca_plot, echo=FALSE, fig.height= 4, fig.align='center', warning=FALSE}
print(params$pca_input())
```
\pagebreak
#### Sample Correlation matrix
```{r correlation_heatmap, echo=FALSE, fig.keep='first',fig.align='center'}
print(params$correlation_input())
```
\pagebreak
#### Sample Coefficient of variation (CVs)
```{r sample_cv, echo=FALSE, warning=FALSE, message=FALSE, fig.align='center'}
print(params$cvs_input())
```
\pagebreak
### Proteomics Experiment Summary
Protein quantified per sample (after pre-processing).
```{r numbers, echo=FALSE, warning=FALSE, results='hide', message=FALSE }
print(params$numbers_input())
```
\pagebreak
Protein overlap in all samples.
```{r coverage, echo=FALSE, warning=FALSE}
print(params$coverage_input())
```
\pagebreak
## Missing Value handling
#### Missing value heatmap
A heatmap for proteins with missing value in each dataset. Each row represent a protein with missing value in one or more replicate. Each replicate is clustered based on presence of missing values in the sample.
```{r missing_value_heatmap, echo=FALSE, message=FALSE, results='hide',warning=FALSE, tidy=TRUE}
params$missval_input()
```
\pagebreak
#### Missing value distribution
Protein expression distribution before and after imputation. The plot showing the effect of imputation on protein expression distribution.
```{r imputation_effect, echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
print(params$imputation_input())
```
\pagebreak
## Differential Expression Analysis (Results Plots)
#### Heatmap
A plot representing an overview of expression of all significant (differencially expressed) proteins (rows) in all samples (columns).
```{r heatmap_2, echo=FALSE, warning=FALSE, results='hide', fig.keep='first',fig.align='center'}
print(params$heatmap_input())
```
\pagebreak
#### Volcano Plots
```{r volcano, echo=FALSE, warning=FALSE, comment=NA,fig.align='center'}
for(i in params$tested_contrasts){
# print(paste0('volcano_plot_',i,sep=""))
print(plot_volcano_new(params$dep(),contrast = i,label_size = 2, add_names = F))
}
```