|
| 1 | +# Beta Diversity Analysis |
| 2 | +This tutorial shows how to use R functions and Python scripts to estimate the beta diversity of microbiomes from MetaPhlAn profiles. |
| 3 | + |
| 4 | +## R-based method |
| 5 | + |
| 6 | +#### R packages required |
| 7 | + |
| 8 | +* [vegan](https://cran.r-project.org/web/packages/vegan/index.html) |
| 9 | +* [ggplot2](https://ggplot2.tidyverse.org/) |
| 10 | +* [ape](https://cran.r-project.org/web/packages/ape/index.html) |
| 11 | +* [tidyverse](https://www.tidyverse.org/packages/) |
| 12 | + |
| 13 | +#### Beta diversity analysis, visualization and significance assessment |
| 14 | + |
| 15 | +Open a new R script and load our function-packed R script, from which you can use the relevant modules. |
| 16 | + |
| 17 | +```{r} |
| 18 | +>source(file = "path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/scripts/functions/beta_diversity_funcs.R") |
| 19 | +``` |
| 20 | + |
| 21 | +Load a [matrix table](../example_data/matrix_species_relab.tsv) of species relative abundances quantified by MetaPhlAn and a [metadata table](../example_data/metadata_of_matrix_species_relab.tsv) which matches the matrix table row by row; that is, the same row in both tables refers to the same sample. |
| 22 | + |
| 23 | +```{r} |
| 24 | +>matrix <- read.csv("path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/example_data/matrix_species_relab.tsv", |
| 25 | + header = TRUE, |
| 26 | + sep = "\t") |
| 27 | +>metadata <- read.csv("path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/example_data/metadata_of_matrix_species_relab.tsv", |
| 28 | + header = TRUE, |
| 29 | + sep = "\t") |
| 30 | +``` |
| 31 | + |
| 32 | +Now, suppose you would like to test whether samples segregate significantly by a variable of interest while adjusting for covariables such as BMI and disease status. Here, we use the function `est_permanova`, which implements [PERMANOVA](https://rdrr.io/rforge/vegan/man/adonis.html) analysis, with the following arguments: |
| 33 | + * `mat`: the loaded matrix from metaphlan-style table, [dataframe]. |
| 34 | + * `md`: the metadata table pairing with the matrix, [dataframe]. |
| 35 | + * `variable`: specify the variable for testing, [string]. |
| 36 | + * `covariables`: give a vector of covariables for adjustment, [vector]. |
| 37 | + * `nper`: the number of permutations, [int], default: [999]. |
| 38 | + * `to_rm`: a vector of values in the "variable" column; rows with these values are removed before the analysis. |
| 39 | + * `by_method`: "terms" assesses the significance of each term sequentially; "margin" assesses the marginal effect of each term. |
| 40 | + |
| 41 | +Here, we show an example that tests the variable *condom use* while adjusting for the covariables *antibiotics use*, *HIV status*, *BMI*, *diet* and *inflammatory bowel disease*, which might help explain the inter-individual variation in gut microbiome composition. |
| 42 | + |
| 43 | +```{r} |
| 44 | +>est_permanova(mat = matrix, |
| 45 | + md = metadata, |
| 46 | + variable = "condom_use", |
| 47 | + covariables = c("Antibiotics_6mo", "HIV_status", "inflammatory_bowel_disease", "BMI_kg_m2_WHO", "diet"), |
| 48 | + nper = 999, |
| 49 | + to_rm = c("no_receptive_anal_intercourse"), |
| 50 | + by_method = "margin") |
| 51 | +
|
| 52 | + Df SumOfSqs R2 F Pr(>F) |
| 53 | +condom_use 4 1.2161 0.08194 1.5789 0.008 ** |
| 54 | +Antibiotics_6mo 2 0.4869 0.03281 1.2643 0.160 |
| 55 | +HIV_status 1 0.3686 0.02484 1.9146 0.030 * |
| 56 | +inflammatory_bowel_disease 1 0.2990 0.02015 1.5529 0.066 . |
| 57 | +BMI_kg_m2_WHO 5 1.8376 0.12382 1.9087 0.002 ** |
| 58 | +diet 3 0.8579 0.05781 1.4853 0.036 * |
| 59 | +Residual 49 9.4347 0.63571 |
| 60 | +Total 65 14.8412 1.00000 |
| 61 | +--- |
| 62 | +Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 |
| 63 | +``` |
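For readers who want to see what a marginal PERMANOVA looks like outside the wrapper, the following is a minimal sketch that calls `vegan::adonis2` directly on toy data. It assumes, based only on the PERMANOVA link above, that `est_permanova` does something comparable internally; the toy matrix, the metadata columns `group` and `bmi`, and all values are hypothetical.

```r
library(vegan)

set.seed(1)

# Toy relative-abundance matrix: 12 samples (rows) x 4 species (columns).
# Hypothetical values, not the tutorial's example data.
toy_mat <- matrix(runif(48), nrow = 12)
toy_mat <- toy_mat / rowSums(toy_mat)

# Toy metadata, matched to the matrix row by row.
toy_md <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  bmi   = rep(c("normal", "overweight"), times = 6)
)

# Bray-Curtis dissimilarities between samples.
bc <- vegdist(toy_mat, method = "bray")

# Test `group` while adjusting for `bmi`; by = "margin" assesses each
# term after accounting for all other terms in the model.
res <- adonis2(bc ~ bmi + group,
               data = toy_md,
               permutations = 999,
               by = "margin")
print(res)
```

With `by = "margin"` the order of terms in the formula does not affect the result, whereas with `by = "terms"` each term is tested sequentially and the order matters.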
| 64 | + |
| 65 | +Next, to visualize how samples segregate based on microbiome beta diversity, we can use the function `pcoa_plot`, which takes the following arguments: |
| 66 | + * `mat`: the loaded matrix from metaphlan-style table, [dataframe]. |
| 67 | + * `md`: the metadata table pairing with the matrix, [dataframe]. |
| 68 | + * `dist_method`: the method for calculating beta diversity, [string]. default: ["bray"]. For other methods, refer to [vegdist()](https://rdrr.io/cran/vegan/man/vegdist.html). |
| 69 | + * `fsize`: the font size of labels, [int]. default: [11] |
| 70 | + * `dsize`: the dot size of scatter plot, [int]. default: [3] |
| 71 | + * `fstyle`: the font style, [string]. default: ["Arial"] |
| 72 | + * `variable`: specify the variable name based on which to group samples, [string]. |
| 73 | + * `to_rm`: a vector of values in the "variable" column; rows with these values are excluded before the analysis. |
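
For intuition about the default `dist_method = "bray"`: the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. Below is a minimal base-R sketch on two made-up relative-abundance profiles; the helper name `bray_curtis` and the values are hypothetical and not part of the tutorial's scripts.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Two made-up samples described by the relative abundances of three species.
sample_1 <- c(0.6, 0.4, 0.0)
sample_2 <- c(0.1, 0.4, 0.5)

# Absolute differences sum to 1.0 and all abundances sum to 2.0.
bc_value <- bray_curtis(sample_1, sample_2)
print(bc_value)  # 0.5
```

A value of 0 means the two samples have identical profiles, and a value of 1 means they share no species.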
| 74 | + |
| 75 | +Below, we showcase how to inspect the beta diversity of microbiomes from the angle of six different variables. |
| 76 | + |
| 77 | +```{r} |
| 78 | +>pcoa_condom_use <- pcoa_plot(mat = matrix, |
| 79 | + md = metadata, |
| 80 | + dist_method = "bray", |
| 81 | + fsize = 11, |
| 82 | + dsize = 3, |
| 83 | + fstyle = "Arial", |
| 84 | + variable = "condom_use", |
| 85 | + to_rm = c("no_receptive_anal_intercourse")) |
| 86 | +>pcoa_STI <- pcoa_plot(mat = matrix, |
| 87 | + md = metadata, |
| 88 | + dist_method = "bray", |
| 89 | + fsize = 11, |
| 90 | + dsize = 3, |
| 91 | + fstyle = "Arial", |
| 92 | + variable = "STI") |
| 93 | +>pcoa_number_of_partners <- pcoa_plot(mat = matrix, |
| 94 | + md = metadata, |
| 95 | + dist_method = "bray", |
| 96 | + fsize = 11, |
| 97 | + dsize = 3, |
| 98 | + fstyle = "Arial", |
| 99 | + variable = "number_partners") |
| 100 | +>pcoa_rai <- pcoa_plot(mat = matrix, |
| 101 | + md = metadata, |
| 102 | + dist_method = "bray", |
| 103 | + fsize = 11, |
| 104 | + dsize = 3, |
| 105 | + fstyle = "Arial", |
| 106 | + variable = "receptive_anal_intercourse") |
| 107 | +>pcoa_oral_sex <- pcoa_plot(mat = matrix, |
| 108 | + md = metadata, |
| 109 | + dist_method = "bray", |
| 110 | + fsize = 11, |
| 111 | + dsize = 3, |
| 112 | + fstyle = "Arial", |
| 113 | + variable = "oral.sex") |
| 114 | +>pcoa_lubricant_use <- pcoa_plot(mat = matrix, |
| 115 | + md = metadata, |
| 116 | + dist_method = "bray", |
| 117 | + fsize = 11, |
| 118 | + dsize = 3, |
| 119 | + fstyle = "Arial", |
| 120 | + variable = "lubricant") |
| 121 | +
|
| 122 | +>ggarrange(pcoa_rai, pcoa_lubricant_use, pcoa_STI, |
| 123 | + pcoa_oral_sex, pcoa_number_of_partners, pcoa_condom_use, |
| 124 | + nrow = 2, ncol = 3) |
| 125 | +``` |
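To illustrate what a PCoA plot involves under the hood, here is a minimal sketch on toy data. It is not the implementation of `pcoa_plot`: it swaps in base R's `cmdscale` (classical multidimensional scaling, which is equivalent to PCoA) together with `vegan::vegdist` and `ggplot2`. The toy matrix and the `group` column are hypothetical.

```r
library(vegan)
library(ggplot2)

set.seed(1)

# Toy relative-abundance matrix: 12 samples (rows) x 4 species (columns).
# Hypothetical values, not the tutorial's example data.
toy_mat <- matrix(runif(48), nrow = 12)
toy_mat <- toy_mat / rowSums(toy_mat)
toy_md <- data.frame(group = rep(c("a", "b", "c"), each = 4))

# Bray-Curtis dissimilarities, then classical MDS (PCoA) on two axes.
bc <- vegdist(toy_mat, method = "bray")
ord <- cmdscale(bc, k = 2, eig = TRUE)

# Percentage of variation per axis, relative to the positive eigenvalues.
pos_eig <- ord$eig[ord$eig > 0]
pct <- round(100 * ord$eig[1:2] / sum(pos_eig), 1)

# One point per sample, coloured by the grouping variable.
plot_df <- data.frame(PC1 = ord$points[, 1],
                      PC2 = ord$points[, 2],
                      group = toy_md$group)
p <- ggplot(plot_df, aes(x = PC1, y = PC2, colour = group)) +
  geom_point(size = 3) +
  labs(x = paste0("PC1 (", pct[1], "%)"),
       y = paste0("PC2 (", pct[2], "%)")) +
  theme_bw(base_size = 11)
print(p)
```

Samples that lie close together in the plot have similar species profiles under the chosen dissimilarity.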
| 126 | + |
| 127 | + |
| 128 | + |
| 129 | +## Python-based method |
| 130 | + |
| 131 | +## A method mixing R and Python |