---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "vignettes/figures/README-",
out.width = "100%"
)
options(tibble.print_min = 5, tibble.print_max = 5)
```
# SmCCNet: A Comprehensive Tool for Multi-Omics Network Inference <a href=""><img src="vignettes/figures/logo.jpg" align="right" height="98" /></a>
<!-- badges: start -->
[](https://cran.r-project.org/web/packages/SmCCNet/index.html)
<!-- badges: end -->
**Note:** if you use SmCCNet in published research, please cite:
> Liu, W., Vu, T., Konigsberg, I. R., Pratte, K. A., Zhuang, Y., & Kechris, K. J. (2023). SmCCNet 2.0: an Upgraded R package for Multi-omics Network Inference. bioRxiv, 2023-11.

> Shi, W. J., Zhuang, Y., Russell, P. H., Hobbs, B. D., Parker, M. M., Castaldi, P. J., ... & Kechris, K. (2019). Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics, 35(21), 4336-4343.
## Overview
SmCCNet is a framework for integrating one or more types of omics data with a quantitative or binary phenotype. It is based on sparse multiple canonical correlation analysis (SmCCA) and sparse partial least squares discriminant analysis (SPLSDA), and it aims to find relationships between omics data and a specific phenotype. The framework uses LASSO (Least Absolute Shrinkage and Selection Operator) sparsity constraints, which allow it to select a small subset of relevant features from the data.

The algorithm has two modes: weighted and unweighted. In the weighted mode, it uses different scaling factors for the pairwise correlations between data types, while in the unweighted mode all scaling factors are equal. Because the scaling factors determine which correlation structures (for example, between omics types or between omics data and the phenotype) the algorithm prioritizes, the choice of mode affects how the resulting networks should be interpreted.
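To make the role of the scaling factors concrete, the sketch below evaluates a weighted sum of pairwise canonical correlations, i.e. the sum over pairs of `a[i, j] * cor(X_i w_i, X_j w_j)`, for fixed weight vectors. It assumes this correlation form of the criterion for illustration and treats the phenotype as an extra data block; `smcca_objective()` is a hypothetical helper, not a SmCCNet function, and it does not perform the sparse optimization itself.

```r
# Weighted sum of pairwise canonical correlations for FIXED weight vectors.
# smcca_objective() is a hypothetical helper (not part of SmCCNet); it only
# evaluates the criterion and does not perform the sparse optimization.
smcca_objective <- function(blocks, weights, scaling) {
  k <- length(blocks)
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      u <- blocks[[i]] %*% weights[[i]]  # canonical variate of block i
      v <- blocks[[j]] %*% weights[[j]]  # canonical variate of block j
      total <- total + scaling[i, j] * cor(u, v)[1, 1]
    }
  }
  total
}

set.seed(1)
n <- 50
omics1 <- scale(matrix(rnorm(n * 4), n))
omics2 <- scale(matrix(rnorm(n * 3), n))
pheno <- scale(matrix(rnorm(n), n))
blocks <- list(omics1, omics2, pheno)  # phenotype treated as a third block
w <- list(rep(0.5, 4), rep(1 / sqrt(3), 3), 1)

# Unweighted mode: every pairwise correlation counts equally
a_unweighted <- matrix(1, 3, 3)
# Weighted mode (illustrative values): omics-phenotype pairs weigh 10x more
a_weighted <- matrix(c(0, 1, 10,
                       1, 0, 10,
                       10, 10, 0), 3, 3, byrow = TRUE)
smcca_objective(blocks, w, a_unweighted)
smcca_objective(blocks, w, a_weighted)
```

Only the upper triangle of the scaling matrix is used, so each pair of blocks is counted once; raising the omics-phenotype entries shifts the criterion toward phenotype-related correlation structure.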
SmCCNet's workflow consists of four main steps:

1. **Determine Sparsity Penalties**: The user selects sparsity penalties for omics feature selection based on study needs, prior knowledge, or a K-fold cross-validation procedure. Cross-validation helps keep the selected features generalizable and guards against overfitting.
2. **Subsample and Apply SmCCA**: Omics features are randomly subsampled and analyzed using SmCCA with the chosen penalties. This is repeated many times; each subsample yields a feature relationship matrix, and these matrices are averaged to form a similarity matrix.
3. **Identify Multi-Omics Networks**: Hierarchical clustering with tree cutting is applied to the similarity matrix to identify multiple subnetworks that are relevant to the phenotype.
4. **Prune and Summarize Networks**: Finally, each identified subnetwork is pruned (features are ranked with the PageRank algorithm and only the top-ranked ones are kept) and summarized (with NetSHy or principal components), highlighting the most phenotype-relevant features.
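Steps 2 and 3 above can be sketched in base R on toy data. This is an illustration under stated assumptions, not SmCCNet code: the hypothetical `stand_in_weights()` replaces SmCCA with absolute feature-phenotype correlations rescaled to unit length, features left out of a subsample get weight 0, and the cut height is an arbitrary illustrative value (it plays a role analogous to the `CutHeight` argument in the Usage examples).

```r
# Toy sketch of workflow steps 2-3 in base R. Assumptions (not SmCCNet code):
# * stand_in_weights() replaces SmCCA: it returns absolute feature-phenotype
#   correlations rescaled to unit length;
# * features left out of a subsample get weight 0 for that subsample.
set.seed(123)
n <- 60
p <- 20
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("f", 1:p)))
y <- X[, 1] + X[, 2] + rnorm(n)

# Hypothetical stand-in for the SmCCA canonical weights of one subsample
stand_in_weights <- function(X_sub, y) {
  w <- abs(cor(X_sub, y))[, 1]
  w / sqrt(sum(w^2))
}

n_sub <- 100     # number of subsamples (cf. subSampNum)
sub_frac <- 0.7  # proportion of features drawn in each subsample
W <- matrix(0, n_sub, p, dimnames = list(NULL, colnames(X)))
for (b in seq_len(n_sub)) {
  idx <- sample(p, size = round(sub_frac * p))
  W[b, idx] <- stand_in_weights(X[, idx, drop = FALSE], y)
}

# Each subsample gives a relationship matrix w_b %o% w_b; averaging them
# (crossprod(W) equals the sum of w_b %o% w_b over subsamples) gives the
# similarity matrix.
A <- crossprod(W) / n_sub
diag(A) <- 0
A <- A / max(A)  # rescale so the strongest similarity is 1

# Step 3: hierarchical clustering on 1 - similarity, then cut the tree
hc <- hclust(as.dist(1 - A), method = "complete")
modules <- cutree(hc, h = 0.9)
table(modules)
```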
# SmCCNet Key Features
There are three major computational algorithms, used depending on the number of omics datasets and the phenotype type (quantitative or binary):

- Sparse Multiple Canonical Correlation Analysis (SmCCA)
- Sparse Partial Least Squares Discriminant Analysis (SPLSDA)
- Sparse Canonical Correlation Analysis (SCCA)
Unlock the Power of SmCCNet with These Key Features:

- 🧬 **Multi-Omics Network Inference**
    - With Quantitative Phenotype (SmCCA)
    - With Binary Phenotype (SmCCA + SPLSDA)
- 📊 **Single-Omics Network Inference**
    - With Quantitative Phenotype (SCCA)
    - With Binary Phenotype (SPLSDA)
- 🚀 **Automation Simplified**
    - Automated SmCCNet with a Single Line of Code
# SmCCNet Network Visualization
The final network generated by SmCCNet can be visualized in two ways:

- With the Shiny application, by uploading the final .Rdata file to the [SmCCNet Visualization Application](https://smccnet.shinyapps.io/smccnetnetwork/).
- With [Cytoscape](https://cytoscape.org/), through the R package [RCy3](https://www.bioconductor.org/packages/release/bioc/html/RCy3.html).
# SmCCNet Workflow
## General Workflow
```{r,echo = FALSE,out.width='100%'}
knitr::include_graphics("vignettes/figures/smccnetworkflow.jpg")
```
## Multi-Omics SmCCNet with Quantitative Phenotype
```{r,echo = FALSE,out.width='100%'}
knitr::include_graphics("vignettes/figures/SmCCNet-Quant.jpg")
```
## Multi-Omics SmCCNet with Binary Phenotype
```{r,echo = FALSE,out.width='100%'}
knitr::include_graphics("vignettes/figures/SmCCNet-Binary.jpg")
```
## Single-Omics SmCCNet
```{r,echo = FALSE,out.width='100%'}
knitr::include_graphics("vignettes/figures/single-omics-smccnet.jpg")
```
## SmCCNet Example Output Product
```{r,echo = FALSE,out.width='100%'}
knitr::include_graphics("vignettes/figures/example_network_continuous.jpg")
```
# Package Functions
The older version of the SmCCNet package included four exported functions:

- **getRobustPseudoWeights()**: Compute aggregated (SmCCA) canonical weights.
- **getAbar()**: Calculate similarity matrix based on canonical weights.
- **getMultiOmicsModules()**: Perform hierarchical tree cutting on the similarity matrix and extract clades with multi-omics features.
- **plotMultiOmicsNetwork()**: Plot (pruned or full) multi-omics subnetworks.
In the updated package, all of these functions except **getAbar()** have been retired, and new functions have been added to perform single- or multi-omics SmCCNet with a quantitative or binary phenotype. Their use is illustrated in the package vignettes:

- **aggregateCVSingle()**: Saves the cross-validation results as a table in the working directory and recommends a choice of penalty terms.
- **classifierEval()**: Evaluates a binary classifier's performance with respect to a user-selected metric (accuracy, AUC, precision, recall, or F1 score).
- **dataPreprocess()**: A simple pipeline to preprocess the data before running SmCCNet (centering, scaling, coefficient-of-variation filtering, and regressing out covariates).
- **fastAutoSmCCNet()**: Automated SmCCNet. It identifies the type of problem (single-omics vs. multi-omics) and the type of analysis (CCA for a quantitative phenotype vs. PLS for a binary phenotype) from the input data. It automatically preprocesses the data and chooses the scaling factors, subsampling percentage, and optimal penalty terms, then runs the complete SmCCNet pipeline without requiring additional information from the user. It saves all subnetwork information to a user-provided directory and returns the global network and evaluation information. Refer to the automated SmCCNet vignette for more information.
- **getCanWeightsMulti()**: Runs Sparse Multiple Canonical Correlation Analysis (SmCCA) and returns the canonical weights.
- **getCanCorMulti()**: Computes the canonical correlation value for SmCCA given canonical weight vectors and scaling factors.
- **getRobustWeightsMulti()**: SmCCNet algorithm for multi-omics data with a quantitative phenotype; computes the canonical weights for SmCCA.
- **getRobustWeightsMultiBinary()**: SmCCNet algorithm for multi-omics data with a binary phenotype. First, SmCCA is used to identify relationships between the omics data types (excluding the phenotype). Then, among the highly connected omics features selected in the first step, SPLSDA is used to identify relationships between these features and the phenotype(s). For a binary outcome, the sparse PLSDA algorithm first computes PLS by treating the outcome as continuous and extracts multiple latent factors, then fits a logistic regression on the latent factors and weights each latent factor by its regression coefficient.
- **getRobustWeightsSingle()**: Computes aggregated (SCCA) canonical weights for single-omics data with a quantitative phenotype.
- **getRobustWeightsSingleBinary()**: Computes aggregated (SPLSDA) feature weights for single-omics data with a binary phenotype.
- **getOmicsModules()**: Performs hierarchical tree cutting on the similarity matrix and extracts clades with omics features.
- **networkPruning()**: Extracts summarization scores (the first three NetSHy or regular principal components) for a specified network module with a given network size. The omics features are ranked with the PageRank algorithm, and the top $m$ features (where $m$ is the specified subnetwork size) are included in the final subnetwork used to compute the summarization score. The correlation between each PC score and the phenotype of interest is calculated and stored, as is the correlation between each individual omics feature and the phenotype (supplied through the **Pheno** argument). The final subnetwork adjacency matrix is saved to the user-specified working directory.
- **scalingFactorInput()**: Given the annotation of the omics data types, interactively prompts the user to supply the scaling factors that the SmCCNet algorithm uses to prioritize the correlation structure of interest. All scaling factor values should be numeric and nonnegative.
- **summarizeNetSHy()**: Implements NetSHy network summarization, a hybrid approach that summarizes a network while accounting for its topology through the Laplacian matrix.
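The ranking-then-truncation idea described for **networkPruning()** can be illustrated with a base-R sketch. The helpers `page_rank()` (weighted PageRank by power iteration, damping 0.85) and `prune_module()` are hypothetical and not SmCCNet functions; the toy adjacency matrix is simply the absolute feature correlation matrix, and the summary uses only a regular first principal component, whereas the package can use NetSHy and reports the first three components.

```r
# Sketch of PageRank-based pruning followed by a PC1 summary (base R only).
# page_rank() and prune_module() are hypothetical helpers, not SmCCNet functions.

# Weighted PageRank by power iteration
page_rank <- function(A, d = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(A)
  out_deg <- rowSums(A)
  # Column-stochastic transition matrix: column i holds node i's edge
  # weights divided by its weighted degree
  P <- t(A / ifelse(out_deg > 0, out_deg, 1))
  P[, out_deg == 0] <- 1 / n  # isolated nodes jump uniformly
  r <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    r_new <- (1 - d) / n + d * as.vector(P %*% r)
    if (sum(abs(r_new - r)) < tol) break
    r <- r_new
  }
  setNames(r_new, rownames(A))
}

# Keep the top-m features by PageRank and summarize them with PC1
prune_module <- function(X, A, y, m) {
  pr <- page_rank(A)
  keep <- names(sort(pr, decreasing = TRUE))[seq_len(min(m, length(pr)))]
  pc1 <- prcomp(X[, keep, drop = FALSE], center = TRUE, scale. = TRUE)$x[, 1]
  list(features = keep, pc1 = pc1, cor_pheno = cor(pc1, y))
}

# Toy module: 10 features, two of them related to each other and to y
set.seed(42)
n <- 80
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("g", 1:10)))
X[, 2] <- X[, 1] + rnorm(n, sd = 0.3)
y <- X[, 1] + rnorm(n)
A <- abs(cor(X))  # toy adjacency: absolute feature correlations
diag(A) <- 0
res <- prune_module(X, A, y, m = 5)
res$features
res$cor_pheno
```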
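The PLS-then-logistic-regression idea described for **getRobustWeightsMultiBinary()** can be illustrated with a non-sparse base-R sketch. Everything here is an assumption for illustration: no sparsity penalty is applied, PLS1 is computed by NIPALS on standardized data, and per-feature weights are formed by combining each component's PLS weight vector with that component's logistic-regression coefficient (a simplification, since later weight vectors refer to the deflated data). `pls1_nipals()` and `pls_logistic_weights()` are hypothetical helpers, not SmCCNet functions.

```r
# Non-sparse sketch of "PLS on a continuous-coded outcome, then logistic
# regression on the latent factors". Hypothetical helpers, not SmCCNet code.
pls1_nipals <- function(X, y, ncomp) {
  X <- scale(X)
  y <- as.numeric(scale(y))  # treat the binary outcome as continuous
  W <- matrix(0, ncol(X), ncomp)
  scores <- matrix(0, nrow(X), ncomp)
  for (k in seq_len(ncomp)) {
    w <- crossprod(X, y)[, 1]
    w <- w / sqrt(sum(w^2))                   # unit-length weight vector
    t_k <- as.vector(X %*% w)                 # latent factor (score)
    p_k <- crossprod(X, t_k)[, 1] / sum(t_k^2)
    X <- X - outer(t_k, p_k)                  # deflate X
    y <- y - t_k * sum(t_k * y) / sum(t_k^2)  # deflate y
    W[, k] <- w
    scores[, k] <- t_k
  }
  list(weights = W, scores = scores)
}

pls_logistic_weights <- function(X, y_binary, ncomp = 2) {
  pls <- pls1_nipals(X, y_binary, ncomp)
  lv <- pls$scores
  colnames(lv) <- paste0("LV", seq_len(ncomp))
  fit <- glm(y_binary ~ lv, family = binomial())
  beta <- coef(fit)[-1]                       # one coefficient per latent factor
  # Weight each component's PLS weights by its logistic coefficient
  w <- abs(as.vector(pls$weights %*% beta))
  setNames(w / sqrt(sum(w^2)), colnames(X))
}

set.seed(11)
n <- 100
X <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("m", 1:8)))
y_binary <- as.integer(X[, 1] - X[, 2] + rnorm(n) > 0)
feature_w <- pls_logistic_weights(X, y_binary, ncomp = 2)
round(feature_w, 3)
```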
## Installation
```{r, eval = FALSE}
# Install package
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("KechrisLab/SmCCNet")
# Load package
library(SmCCNet)
```
## Usage
Below are examples of how to run Automated SmCCNet on the simulated example dataset shipped with the package, which contains two omics datasets (`X1`, `X2`) and one quantitative phenotype (`Y`). We cover four cases in total: single- or multi-omics data combined with either a quantitative or a binary phenotype (the binary phenotype is obtained by splitting `Y` at its median). Covariate adjustment with the regress-out approach is available through **dataPreprocess()**. To run the pipeline step by step or to learn more about the algorithms used, please refer to the SmCCNet single- or multi-omics vignettes.
```{r, message=FALSE, warning=FALSE, eval=FALSE}
library(SmCCNet)
set.seed(123)
data("ExampleData")
# Binary phenotype: 1 if Y is above its median, 0 otherwise
Y_binary <- ifelse(Y > quantile(Y, 0.5), 1, 0)

# single-omics with binary phenotype
result <- fastAutoSmCCNet(X = list(X1), Y = as.factor(Y_binary),
                          Kfold = 3,
                          subSampNum = 100, DataType = c('Gene'),
                          saving_dir = getwd(), EvalMethod = 'auc',
                          summarization = 'NetSHy',
                          CutHeight = 1 - 0.1^10, ncomp_pls = 5)

# single-omics with quantitative phenotype
result <- fastAutoSmCCNet(X = list(X1), Y = Y, Kfold = 3,
                          preprocess = FALSE,
                          subSampNum = 50, DataType = c('Gene'),
                          saving_dir = getwd(), summarization = 'NetSHy',
                          CutHeight = 1 - 0.1^10)

# multi-omics with binary phenotype
result <- fastAutoSmCCNet(X = list(X1, X2), Y = as.factor(Y_binary),
                          Kfold = 3, subSampNum = 50,
                          DataType = c('Gene', 'miRNA'),
                          CutHeight = 1 - 0.1^10,
                          saving_dir = getwd(),
                          EvalMethod = 'auc',
                          summarization = 'NetSHy',
                          BetweenShrinkage = 5,
                          ncomp_pls = 3)

# multi-omics with quantitative phenotype
result <- fastAutoSmCCNet(X = list(X1, X2), Y = Y,
                          Kfold = 3, subSampNum = 50,
                          DataType = c('Gene', 'miRNA'),
                          CutHeight = 1 - 0.1^10,
                          saving_dir = getwd(),
                          summarization = 'NetSHy',
                          BetweenShrinkage = 5)
```
Global network information is stored in the object `result`, and subnetwork information is saved to the user-provided directory (`saving_dir`). For more information about using Cytoscape to visualize the subnetworks, please refer to section 3.1 of the multi-omics vignette.
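The preprocessing steps that **dataPreprocess()** is described as performing (coefficient-of-variation filtering, regressing out covariates, centering and scaling) can be sketched in base R. `preprocess_sketch()` is a hypothetical helper, not the package function; the step order, the quantile-based CV cutoff, and the toy covariates (`age`, `sex`) are assumptions for illustration.

```r
# Hypothetical helper sketching the preprocessing steps listed for
# dataPreprocess(); the step order and the CV cutoff rule are assumptions.
preprocess_sketch <- function(X, covariates = NULL, cv_quantile = 0) {
  # 1. Coefficient-of-variation filter on the raw (positive) scale:
  #    keep features whose CV is at or above the given quantile
  cv <- apply(X, 2, sd) / abs(colMeans(X))
  X <- X[, cv >= quantile(cv, cv_quantile), drop = FALSE]
  # 2. Regress out covariates: keep the residuals of every feature
  #    from one multivariate linear model on the covariates
  if (!is.null(covariates)) {
    X <- residuals(lm(X ~ ., data = covariates))
  }
  # 3. Center and scale each feature
  scale(X)
}

set.seed(7)
n <- 40
covars <- data.frame(age = rnorm(n, 60, 8),
                     sex = factor(sample(c("F", "M"), n, replace = TRUE)))
X <- matrix(rnorm(n * 6, mean = 10), n, 6,
            dimnames = list(NULL, paste0("g", 1:6)))
X[, 1] <- X[, 1] + 0.1 * covars$age  # feature driven partly by age
Xp <- preprocess_sketch(X, covariates = covars, cv_quantile = 0.5)
dim(Xp)
```

After the regress-out step, every retained feature is uncorrelated with the covariates, so downstream networks are not driven by, for example, age.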
## Getting help
If you encounter a bug, please file an issue with a reproducible example on [GitHub](https://github.com/KechrisLab/SmCCNet/issues). For questions and other discussion, please use [community.rstudio.com](https://community.rstudio.com/).

---

This package is developed by [KechrisLab](https://kechrislab.github.io/). For further questions about the package, please contact Dr. Katerina Kechris or Weixuan Liu.