|
| 1 | +# Alpha Diversity Analysis |
| 2 | +This tutorial is to use R-based functions to estimate shannon index and richness of microbiomes using metaphlan profiles. Estimates are further visualized and tested for statistical significance associating with metadata. |
| 3 | + |
| 4 | +#### R packages required |
| 5 | +* [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) |
| 6 | +* [mia](https://www.bioconductor.org/packages/release/bioc/html/mia.html) |
| 7 | +* [ggpubr](https://cran.r-project.org/web/packages/ggpubr/readme/README.html) |
| 8 | +* [ggplot2](https://ggplot2.tidyverse.org/) |
| 9 | + |
| 10 | +#### Alpha diversity estimation and visualization |
| 11 | + |
| 12 | +Open a new working R script, and load our funtion-packed R script from which you can use relavant modules. |
| 13 | + |
| 14 | +```{r} |
| 15 | +>source(file = "path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/scripts/functions/alpha_diversity_funcs.R") |
| 16 | +``` |
| 17 | + |
| 18 | +Load a merged metaphlan profile which contains metadata and taxonomic abundances. Here, we are going to use an example file for demostration. |
| 19 | + |
| 20 | +```{r} |
| 21 | +>mpa_df <- data.frame(read.csv("path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/example_data/>merged_abundance_table_species_sgb_md.tsv", |
| 22 | + header = TRUE, |
| 23 | + sep = "\t")) |
| 24 | +``` |
| 25 | +|sample|P057|P054|P052|...|P049| |
| 26 | +|----|----|----|---|---|---|---| |
| 27 | +|sexual_orientation|MSM|MSM|MSM|...|Non-MSM| |
| 28 | +|HIV_status|negative|positive|positive|...|negative| |
| 29 | +|ethnicity|Caucasian|Caucasian|Caucasian|...|Caucasian| |
| 30 | +|antibiotics_6month|Yes|No|No|...|No| |
| 31 | +|BMI_kg_m2_WHO|ObeseClassI|Overweight|Normal|...|Overweight| |
| 32 | +|Methanomassiliicoccales_archaeon|0.0|0.0|0.0|...|0.01322| |
| 33 | +|...|...|...|...|...|...| |
| 34 | +|Methanobrevibacter_smithii|0.0|0.0|0.0|...|0.19154| |
| 35 | + |
| 36 | +Next, we convert the dataframe to [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) data structure to continue the analysis using function `SE_converter` which needs to specify three arguments: |
| 37 | + * `md_rows`: a vector specifying the range of rows indicating metadata. Note: 1-based. |
| 38 | + * `tax_starting_row`: an interger corresponding to the row where taxonomic abundances start. |
| 39 | + * `mpa_md`: a metaphlan table wedged with metadata, in the form of dataframe. |
| 40 | + |
| 41 | +```{r} |
| 42 | +>SE <- SE_converter(md_rows = 1:5, |
| 43 | + tax_starting_row = 6, |
| 44 | + mpa_md = mpa_df) |
| 45 | +``` |
| 46 | +```{r} |
| 47 | +>SE |
| 48 | +class: SummarizedExperiment |
| 49 | +dim: 1676 139 |
| 50 | +metadata(0): |
| 51 | +assays(1): relative_abundance |
| 52 | +rownames(1676): Methanomassiliicoccales_archaeon|t__SGB376 |
| 53 | + GGB1567_SGB2154|t__SGB2154 ... Entamoeba_dispar|t__EUK46681 |
| 54 | + Blastocystis_sp_subtype_1|t__EUK944036 |
| 55 | +rowData names(1): species |
| 56 | +colnames(139): P057 P054 ... KHK16 KHK11 |
| 57 | +colData names(5): sexual_orientation HIV_status ethnicity |
| 58 | + antibiotics_6month BMI_kg_m2_WHO |
| 59 | +``` |
| 60 | + |
| 61 | +Now, we can use function `est_alpha_diversity` to estimate shannon index and richness for each metagenomic sample. |
| 62 | + |
| 63 | +```{r} |
| 64 | +>alpha_df <- est_alpha_diversity(se_data = SE) |
| 65 | +>alpha_df |
| 66 | +``` |
| 67 | +||sexual_orientation|HIV_status|ethnicity|antibiotics_6month|BMI_kg_m2_WHO|observed|shannon| |
| 68 | +|----|----|----|-----|------|-----|-----|-----| |
| 69 | +|P057|MSM|negative|Caucasian|Yes|ObeseClassI|134|3.1847| |
| 70 | +|P054|MSM|positive|Caucasian|No|Overweight|141|2.1197| |
| 71 | +|...|...|...|...|...|...|...|...| |
| 72 | +|P052|MSM|positive|Caucasian|No|Normal|152|2.5273| |
| 73 | + |
| 74 | +To compare the difference of alpha diversities between groups, we can use function `make_boxplot` with arguments: |
| 75 | +* `df`: The dataframe containing microbiome alpha diversities, e.g. `shannon` and `observed` with categorical metadata. |
| 76 | +* `xlabel`: the column name one will put along x-axis. |
| 77 | +* `ylabel`: the index estimate one will put along y-axis. |
| 78 | +* `font_size`: the font size, default: [11] |
| 79 | +* `jitter_width`: the jitter width, default: [0.2] |
| 80 | +* `dot_size`: the dot size inside the boxplot, default: [1] |
| 81 | +* `font_style`: the font style, default: `Arial` |
| 82 | +* `pal`: a list of color codes for pallete, e.g. c(#888888, #eb2525). The order corresponds the column order of boxplot. |
| 83 | +* `stats`: wilcox rank-sum test. default: `TRUE` |
| 84 | + |
| 85 | +```{r} |
| 86 | +>shannon <- make_boxplot(alpha_df, "sexual_orientation", "shannon", stats = TRUE, pal = c("#888888", "#eb2525")) |
| 87 | +>richness <- make_boxplot(alpha_df, "sexual_orientation", "observed", stats = TRUE, pal = c("#888888", "#eb2525")) |
| 88 | +>multi_plot <- ggplot2::ggarrange(shannon, richness, ncol = 2) |
| 89 | +>ggplot2::ggsave(file = "shannon_richness.svg", plot = multi_plot, width = 4, height = 5) |
| 90 | +``` |
| 91 | + |
| 92 | + |
| 93 | +**Note:** The figure displayed above had been edited using [inkscape](https://inkscape.org/) on the base of the crude output in order to enhance the readability and aesthetic sense. |
0 commit comments