Skip to content

Commit 4ad3362

Browse files
committed
doc for alpha diversity has been added.
1 parent 32bc14d commit 4ad3362

File tree

5 files changed

+1794
-8
lines changed

5 files changed

+1794
-8
lines changed

README.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1 +1,8 @@
11
# KunDH-2023-CRM-MSM_metagenomics
2+
Computational workflows for reproducing analysis in the study of Huang et al., 2023
3+
4+
#### Table of contents
5+
1. [Setup]()
6+
2. [General usage]()
7+
3. Detailed tutorials
8+
* [Alpha diversity analysis](./docs/alpha_diversity_analysis.md)

docs/alpha_diversity_analysis.md

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
# Alpha Diversity Analysis
2+
This tutorial is to use R-based functions to estimate shannon index and richness of microbiomes using metaphlan profiles. Estimates are further visualized and tested for statistical significance associating with metadata.
3+
4+
#### R packages required
5+
* [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html)
6+
* [mia](https://www.bioconductor.org/packages/release/bioc/html/mia.html)
7+
* [ggpubr](https://cran.r-project.org/web/packages/ggpubr/readme/README.html)
8+
* [ggplot2](https://ggplot2.tidyverse.org/)
9+
10+
#### Alpha diversity estimation and visualization
11+
12+
Open a new working R script, and load our funtion-packed R script from which you can use relavant modules.
13+
14+
```{r}
15+
>source(file = "path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/scripts/functions/alpha_diversity_funcs.R")
16+
```
17+
18+
Load a merged metaphlan profile which contains metadata and taxonomic abundances. Here, we are going to use an example file for demostration.
19+
20+
```{r}
21+
>mpa_df <- data.frame(read.csv("path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/example_data/>merged_abundance_table_species_sgb_md.tsv",
22+
header = TRUE,
23+
sep = "\t"))
24+
```
25+
|sample|P057|P054|P052|...|P049|
26+
|----|----|----|---|---|---|---|
27+
|sexual_orientation|MSM|MSM|MSM|...|Non-MSM|
28+
|HIV_status|negative|positive|positive|...|negative|
29+
|ethnicity|Caucasian|Caucasian|Caucasian|...|Caucasian|
30+
|antibiotics_6month|Yes|No|No|...|No|
31+
|BMI_kg_m2_WHO|ObeseClassI|Overweight|Normal|...|Overweight|
32+
|Methanomassiliicoccales_archaeon|0.0|0.0|0.0|...|0.01322|
33+
|...|...|...|...|...|...|
34+
|Methanobrevibacter_smithii|0.0|0.0|0.0|...|0.19154|
35+
36+
Next, we convert the dataframe to [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) data structure to continue the analysis using function `SE_converter` which needs to specify three arguments:
37+
* `md_rows`: a vector specifying the range of rows indicating metadata. Note: 1-based.
38+
* `tax_starting_row`: an interger corresponding to the row where taxonomic abundances start.
39+
* `mpa_md`: a metaphlan table wedged with metadata, in the form of dataframe.
40+
41+
```{r}
42+
>SE <- SE_converter(md_rows = 1:5,
43+
tax_starting_row = 6,
44+
mpa_md = mpa_df)
45+
```
46+
```{r}
47+
>SE
48+
class: SummarizedExperiment
49+
dim: 1676 139
50+
metadata(0):
51+
assays(1): relative_abundance
52+
rownames(1676): Methanomassiliicoccales_archaeon|t__SGB376
53+
GGB1567_SGB2154|t__SGB2154 ... Entamoeba_dispar|t__EUK46681
54+
Blastocystis_sp_subtype_1|t__EUK944036
55+
rowData names(1): species
56+
colnames(139): P057 P054 ... KHK16 KHK11
57+
colData names(5): sexual_orientation HIV_status ethnicity
58+
antibiotics_6month BMI_kg_m2_WHO
59+
```
60+
61+
Now, we can use function `est_alpha_diversity` to estimate shannon index and richness for each metagenomic sample.
62+
63+
```{r}
64+
>alpha_df <- est_alpha_diversity(se_data = SE)
65+
>alpha_df
66+
```
67+
||sexual_orientation|HIV_status|ethnicity|antibiotics_6month|BMI_kg_m2_WHO|observed|shannon|
68+
|----|----|----|-----|------|-----|-----|-----|
69+
|P057|MSM|negative|Caucasian|Yes|ObeseClassI|134|3.1847|
70+
|P054|MSM|positive|Caucasian|No|Overweight|141|2.1197|
71+
|...|...|...|...|...|...|...|...|
72+
|P052|MSM|positive|Caucasian|No|Normal|152|2.5273|
73+
74+
To compare the difference of alpha diversities between groups, we can use function `make_boxplot` with arguments:
75+
* `df`: The dataframe containing microbiome alpha diversities, e.g. `shannon` and `observed` with categorical metadata.
76+
* `xlabel`: the column name one will put along x-axis.
77+
* `ylabel`: the index estimate one will put along y-axis.
78+
* `font_size`: the font size, default: [11]
79+
* `jitter_width`: the jitter width, default: [0.2]
80+
* `dot_size`: the dot size inside the boxplot, default: [1]
81+
* `font_style`: the font style, default: `Arial`
82+
* `pal`: a list of color codes for pallete, e.g. c(#888888, #eb2525). The order corresponds the column order of boxplot.
83+
* `stats`: wilcox rank-sum test. default: `TRUE`
84+
85+
```{r}
86+
>shannon <- make_boxplot(alpha_df, "sexual_orientation", "shannon", stats = TRUE, pal = c("#888888", "#eb2525"))
87+
>richness <- make_boxplot(alpha_df, "sexual_orientation", "observed", stats = TRUE, pal = c("#888888", "#eb2525"))
88+
>multi_plot <- ggplot2::ggarrange(shannon, richness, ncol = 2)
89+
>ggplot2::ggsave(file = "shannon_richness.svg", plot = multi_plot, width = 4, height = 5)
90+
```
91+
![Alpha Diversity Plots](../images/alpha_diversity_plots.png)
92+
93+
**Note:** The figure displayed above had been edited using [inkscape](https://inkscape.org/) on the base of the crude output in order to enhance the readability and aesthetic sense.

0 commit comments

Comments
 (0)