-SmCCNet is a framework that integrates single or multiple omics data types with a quantitative or binary phenotype of interest. It offers a streamlined setup process that can be tailored manually or configured automatically. The algorithm is based on sparse multiple canonical correlation analysis (SmCCA) and is designed for \(T\) omics data types \(X_1, X_2, ..., X_T\) along with a quantitative phenotype \(Y\). SmCCA identifies canonical weights \(w_1, w_2, ..., w_T\) that maximize the sum of pairwise canonical correlations between the omics data and \(Y\), subject to certain constraints. In SmCCNet, LASSO (Least Absolute Shrinkage and Selection Operator) is used as the sparsity constraint function.
+SmCCNet is a framework for integrating one or more types of omics data with a quantitative or binary phenotype. It is based on sparse multiple canonical correlation analysis (SmCCA) and sparse partial least squares discriminant analysis (SPLSDA), and aims to find relationships between omics data and a specific phenotype. The framework uses LASSO (Least Absolute Shrinkage and Selection Operator) as the sparsity constraint, which lets it select a sparse subset of relevant features.
-The algorithm can operate in both weighted and unweighted modes, depending on whether \(a_{i,j}\) and \(b_i\) (scaling factors) are equal or not. When \(a_{i,j}\) and \(b_i\) are not all equal, it corresponds to the weighted version; otherwise, it corresponds to the unweighted version, where \(a_{i,j} = b_i = 1\) for all \(i\) and \(j\).
+The algorithm has two modes: weighted and unweighted. In the weighted mode, it uses different scaling factors for each data type, while in the unweighted mode, all scaling factors are equal. The choice of mode affects how the data is analyzed and interpreted.
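For orientation, the role of the scaling factors can be made concrete by writing out the objective. The display below is a sketch consistent with the description in this README (canonical weights \(w_t\), scaling factors \(a_{i,j}\) and \(b_i\), sparsity penalties \(c_t\)). The unit-norm constraint and the LASSO penalty \(P_t(w) = \lVert w \rVert_1\) are assumptions about the exact form, as is the convention that the columns of each \(X_t\) and \(Y\) are standardized so that inner products behave like correlations.

$$
(w_1, \dots, w_T) = \arg\max_{\tilde{w}_1, \dots, \tilde{w}_T}
\left( \sum_{i < j} a_{i,j} \, \tilde{w}_i^{\top} X_i^{\top} X_j \tilde{w}_j
+ \sum_{i=1}^{T} b_i \, \tilde{w}_i^{\top} X_i^{\top} Y \right)
\quad \text{subject to } \lVert \tilde{w}_t \rVert_2 = 1, \; P_t(\tilde{w}_t) \le c_t, \; t = 1, \dots, T.
$$

Setting \(a_{i,j} = b_i = 1\) for all \(i\) and \(j\) gives the unweighted mode; any other choice gives the weighted mode.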
-The sparsity penalties \(c_t\) determine the number of features included in each subnetwork. SmCCNet follows a workflow that involves creating a network similarity matrix using SmCCA canonical weights from repeated subsampled omics data and the phenotype. It then identifies multi-omics modules relevant to the phenotype. The subsampling scheme enhances network robustness by analyzing a subset of omics features multiple times and aggregating results from each subsampling step. Below are the four steps of the SmCCNet workflow:
+SmCCNet's workflow consists of four main steps:
+**Determine Sparsity Penalties**: The user selects sparsity penalties for omics feature selection based on study needs or prior knowledge, or through a K-fold cross-validation procedure. Cross-validation helps the chosen penalties generalize to similar independent data sets and guards against overfitting.
-- Step I: Determine SmCCA sparsity penalties $c_t$. The user can select the penalties for omics feature selection based on the study purpose and/or prior knowledge. Alternatively, one can pick sparsity penalties based on a K-fold cross-validation (CV) procedure that minimizes the total prediction error. The K-fold CV procedure helps the selected penalties generalize to similar independent data sets and prevents over-fitting.
-- Step II: Randomly subsample omics features without replacement, apply SmCCA with chosen penalties, and compute a feature relationship matrix for each subset. Repeat the process many times and define the similarity matrix to be the average of all feature relationship matrices.
-- Step III: Apply hierarchical tree cutting to the similarity matrix to find the multi-omics networks. This step simultaneously identifies multiple subnetworks.
-- Step IV: Prune and summarize each network with our network pruning algorithm.
+**Subsample and Apply SmCCA**: Omics features are randomly subsampled without replacement and analyzed using SmCCA with the chosen penalties. Each repetition yields a feature relationship matrix, and the similarity matrix is the average of these matrices.
+
+**Identify Multi-Omics Networks**: Hierarchical tree cutting is applied to the similarity matrix to identify multiple subnetworks that are relevant to the phenotype.
+
+**Prune and Summarize Networks**: Finally, each identified network is pruned and summarized with a network pruning algorithm.
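The subsample-and-average step can be illustrated with a small sketch. This is not the SmCCNet implementation and does not call the package: `placeholder_weights` is a hypothetical stand-in that soft-thresholds each feature's covariance with the phenotype in place of a real SmCCA solve, and `averaged_similarity`, `penalty`, `subsample_frac`, and `n_rounds` are illustrative names. Taking the relationship matrix as the absolute outer product of the weight vector is likewise an assumption. Hierarchical tree cutting and pruning are not shown.

```python
import random


def placeholder_weights(columns, phenotype, penalty):
    """Hypothetical stand-in for SmCCA (NOT the real solver).

    Soft-thresholds each feature's covariance with the phenotype and
    rescales the result to unit length, giving a sparse weight vector.
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    raw = []
    for col in columns:
        x_mean = sum(col) / n
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(col, phenotype)) / n
        raw.append(cov)
    # Soft-thresholding zeroes out weights whose magnitude is below the penalty.
    shrunk = [max(abs(r) - penalty, 0.0) * (1 if r > 0 else -1) for r in raw]
    norm = sum(s * s for s in shrunk) ** 0.5
    return [s / norm for s in shrunk] if norm > 0 else shrunk


def averaged_similarity(features, phenotype, penalty, subsample_frac, n_rounds, seed=0):
    """Average |w_i * w_j| over repeated feature subsamples (without replacement).

    `features` is a list of columns, one list of sample values per feature.
    """
    rng = random.Random(seed)
    p = len(features)
    k = max(2, int(subsample_frac * p))
    total = [[0.0] * p for _ in range(p)]
    for _ in range(n_rounds):
        idx = rng.sample(range(p), k)  # feature indices, drawn without replacement
        w = placeholder_weights([features[i] for i in idx], phenotype, penalty)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                total[i][j] += abs(w[a] * w[b])
    return [[v / n_rounds for v in row] for row in total]


# Toy data: features 0 and 1 track the phenotype, features 2-5 are noise.
rng = random.Random(1)
phenotype = [rng.gauss(0, 1) for _ in range(50)]
features = [[y + rng.gauss(0, 0.1) for y in phenotype] for _ in range(2)]
features += [[rng.gauss(0, 1) for _ in range(50)] for _ in range(4)]

sim = averaged_similarity(features, phenotype, penalty=0.3,
                          subsample_frac=0.7, n_rounds=200)
```

On this toy data the entry for the two phenotype-related features should dominate the entries between noise features, which is the structure that a later clustering step would pick up.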
# SmCCNet Key Features
+There are three major computational algorithms, used for different numbers of datasets and phenotype modalities: