Commit 5f360c2

committed
added python-based beta diversity analysis
1 parent 5a775eb commit 5f360c2

9 files changed

+3146
-0
lines changed

docs/beta_diversity_analysis.md

Lines changed: 65 additions & 0 deletions
@@ -128,4 +128,69 @@ Below, we are showcasing how to inspect the beta diversity of microbiomes from t
128128

129129
## Python-based method
130130

131+
#### Python packages required
132+
* [scikit-bio >= 0.5.6](https://scikit.bio/)
133+
* [pandas >= 1.3.5](https://pandas.pydata.org/)
134+
* [numpy >= 1.23.5](https://numpy.org/)
135+
* [matplotlib >= 3.5.0](https://matplotlib.org/)
136+
* [seaborn >= 0.11.2](https://seaborn.pydata.org/)
137+
138+
#### Beta diversity analysis with PCoA plotting integrating up to three variables
139+
Here, we introduce a Python script, `multi_variable_pcoa_plot.py`, located in `path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/scripts`, which performs PCoA:
140+
141+
```{python}
142+
usage: multi_variable_pcoa_plot.py [-h] [--abundance_table [ABUNDANCE_TABLE]] [--metadata [METADATA]] [--transformation [TRANSFORMATION]] [--metric [METRIC]] [--amplifier [AMPLIFIER]] [--sample_column [SAMPLE_COLUMN]] [--variable1 [VARIABLE1]] [--variable2 [VARIABLE2]] [--variable3 [VARIABLE3]]
143+
[--marker_palette [MARKER_PALETTE]] [--marker_shapes [MARKER_SHAPES]] [--marker_sizes [MARKER_SIZES]] [--output_figure [OUTPUT_FIGURE]] [--test [TEST]] [--df_opt [DF_OPT]] [--font_style [FONT_STYLE]] [--font_size [FONT_SIZE]]
144+
145+
This program is to do PCoA analysis on microbial taxonomic or functional abundance data integrating maximum three variables together.
146+
147+
optional arguments:
148+
-h, --help show this help message and exit
149+
--abundance_table [ABUNDANCE_TABLE]
150+
Input the merged abundance table generated by MetaPhlAn.
151+
--metadata [METADATA]
152+
Input a tab-delimited metadata file.
153+
--transformation [TRANSFORMATION]
154+
Specify the transformation function applied on data points in the original table. For abundance table, you can choose <sqrt>/<log>. Default setting is <None>.
155+
--metric [METRIC] Specify the metric used for calculating beta diversity when the input is an abundance table: <braycurtis>/<unweighted_unifrac>/<jaccard>/<weighted_unifrac>. Default setting is <braycurtis>.
156+
--amplifier [AMPLIFIER]
157+
Specify how much you want to amplify your original data points. For example, <--amplifier 100> multiplies every original data point by 100. Default is 1.
158+
--sample_column [SAMPLE_COLUMN]
159+
Specify the header of column containing metagenome sample names in the metadata file.
160+
--variable1 [VARIABLE1]
161+
Specify the header of the variable in the metadata table you want to assess. This variable will be represented by colors.
162+
--variable2 [VARIABLE2]
163+
Specify the header of second variable in the metadata table you want to assess. This variable will be represented by marker shapes.
164+
--variable3 [VARIABLE3]
165+
Specify the header of the third variable in the metadata table you want to assess. This variable will be represented by marker sizes.
166+
--marker_palette [MARKER_PALETTE]
167+
Input a tab-delimited mapping file where 1st column contains group names and 2nd column contains color codes. default: [None] (automatic handling)
168+
--marker_shapes [MARKER_SHAPES]
169+
Input a tab-delimited mapping file where 1st column contains group names and 2nd column contains marker shapes. default: [None] (automatic handling)
170+
--marker_sizes [MARKER_SIZES]
171+
Input a tab-delimited mapping file where values are group names and keys are marker size. default: [None] (automatic handling)
172+
--output_figure [OUTPUT_FIGURE]
173+
Specify the name for the output figure. For example, output_figure.svg
174+
--test [TEST] Specify an output file for saving permanova test results. For example, project_name
175+
--df_opt [DF_OPT] Specify the output name for saving coordinates (PC1 and PC2) for each sample. For example, project_name_coordinates.tsv
176+
--font_style [FONT_STYLE]
177+
Specify the font style which is composed by font family and font type, delimited with a comma. default: [sans-serif,Arial]
178+
--font_size [FONT_SIZE]
179+
Specify the font size. default: [11]
180+
181+
examples:
182+
multi_variable_pcoa_plot.py --abundance_table <merged_metaphlan_table> --metadata <metadata> --sample_column <sample_header> --variable1 <variable1_name> --variable2 <variable2_name> --variable3 <variable3_name> --output_figure <output.png>
183+
```
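
The `--transformation`, `--amplifier` and `--metric` options interact, and a small worked example may make that easier to see. The sketch below is not the script's code: it is a pure-Python illustration that assumes the amplifier is applied before the transformation, uses `log1p` as a stand-in for the `<log>` option (how the script handles zeros is not stated here), and uses two made-up abundance profiles.

```python
import math


def transform(values, transformation=None, amplifier=1.0):
    """Multiply each value by `amplifier`, then apply the optional transformation.

    Assumption: amplification happens before transformation.
    """
    scaled = [v * amplifier for v in values]
    if transformation is None:
        return scaled
    if transformation == "sqrt":
        return [math.sqrt(v) for v in scaled]
    if transformation == "log":
        # log1p keeps zero abundances at zero (an assumption of this sketch)
        return [math.log1p(v) for v in scaled]
    raise ValueError(f"unknown transformation: {transformation}")


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical relative abundances (%) of four taxa in two samples
sample_a = [60.0, 30.0, 10.0, 0.0]
sample_b = [20.0, 30.0, 0.0, 50.0]

raw = bray_curtis(sample_a, sample_b)
amplified = bray_curtis(transform(sample_a, None, 100), transform(sample_b, None, 100))
sqrt_bc = bray_curtis(transform(sample_a, "sqrt"), transform(sample_b, "sqrt"))
sqrt_amp = bray_curtis(transform(sample_a, "sqrt", 100), transform(sample_b, "sqrt", 100))
log_bc = bray_curtis(transform(sample_a, "log"), transform(sample_b, "log"))
log_amp = bray_curtis(transform(sample_a, "log", 100), transform(sample_b, "log", 100))

print(raw)  # prints 0.5
print(amplified, sqrt_bc, sqrt_amp, log_bc, log_amp)
```

Because Bray-Curtis is a ratio, multiplying every value by the same constant leaves the untransformed and square-root results unchanged. Under the assumptions of this sketch, the amplifier only changes the result when combined with the log transformation.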
184+
185+
To demonstrate the usage of `multi_variable_pcoa_plot.py`, we will draw a PCoA plot based on the [microbiome composition of samples](../example_data/mvpp_mpa_species_relab.tsv.bz2) from 11 populations grouped as *W (Westernization)*, *NW (Non-Westernization)*, *NWU (Non-Westernization(Urban))* and *MSM (Men-having-sex-with-men)*. Different populations will be assigned custom colors using a [color map file](../example_data/mvpp_color_map.tsv), and the *MSM* population will be highlighted with a larger marker size using a [marker size map file](../example_data/mvpp_marker_size_map.tsv). The metadata for each sample is provided in a [metadata file](../example_data/mvpp_metadata.tsv).
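
The mapping files are plain two-column, tab-delimited text, so their layout can be checked before running the script. The following is a minimal sketch rather than the script's own parser: `read_mapping` is a hypothetical helper, and the three rows are copied from `mvpp_color_map.tsv`.

```python
import csv
import io


def read_mapping(handle):
    """Read a two-column tab-delimited file into a {group: value} dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or incomplete lines instead of failing on them
        if len(row) < 2 or not row[0].strip():
            continue
        mapping[row[0].strip()] = row[1].strip()
    return mapping


# In-memory stand-in for the first lines of a color map file
example = io.StringIO("SWE\t#544179\nITA\t#6166B3\nMSM\t#888888\n")
palette = read_mapping(example)
print(palette["MSM"])  # prints #888888
```

For a file on disk, the same helper can be called on a handle opened with `open("mvpp_color_map.tsv", newline="")`.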
186+
187+
Example command:
188+
~~~
189+
$ multi_variable_pcoa_plot.py --abundance_table mvpp_mpa_species_relab.tsv --metadata mvpp_metadata.tsv --sample_column sample --variable1 country --variable2 westernization --variable3 country --output_figure mvpp_pcoa.png --test mvpp_permanova.tsv --df_opt mvpp_coordinates_df.tsv --marker_palette mvpp_color_map.tsv --marker_sizes mvpp_marker_size_map.tsv
190+
~~~
191+
192+
![Multiple variable PCoA plot](../images/mvpp_pcoa.png)
193+
194+
As optional outputs, `multi_variable_pcoa_plot.py` also generates unadjusted PERMANOVA test results (e.g. [mvpp_permanova.tsv](../example_data/mvpp_permanova.tsv)) and the PC1 and PC2 coordinates of each sample (e.g. [mvpp_coordinates.tsv](../example_data/mvpp_coordinates.tsv)), which can be used for other kinds of visualization that we will discuss shortly below.
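
For readers unfamiliar with PERMANOVA, the sketch below shows what a one-factor test computes from a distance matrix: a pseudo-F statistic and a permutation p-value. It is a pure-Python illustration on a made-up six-sample matrix, not the script's implementation (which is not shown in this section), and the function names are hypothetical.

```python
import random


def pseudo_f(dist, labels):
    """Pseudo-F statistic for one grouping factor from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all sample pairs, divided by N
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group
    ss_within = 0.0
    for group in groups:
        members = [i for i, lab in enumerate(labels) if lab == group]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:])
        ss_within += pairs / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, p-value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation
    return observed, (hits + 1) / (permutations + 1)


# Toy data: two groups of three samples, close within a group, far between groups
labels = ["W", "W", "W", "NW", "NW", "NW"]
dist = [
    [0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9) for j in range(6)]
    for i in range(6)
]

f_stat, p_value = permanova(dist, labels)
print(f_stat, p_value)
```

With only six samples, few distinct label arrangements exist, so the p-value cannot become very small however well the groups separate. The same limit applies to any small group in real data.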
195+
131196
## A method mixing R and Python

example_data/mvpp_color_map.tsv

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
SWE #544179
2+
ITA #6166B3
3+
USA #32C1CD
4+
JPN #17D7A0
5+
FJI #E02401
6+
MDG #F78812
7+
TZA #AB6D23
8+
GHA #FFCCD2
9+
ETH #FF5C58
10+
MSM #888888
11+
CHN #000000
