Commit 8f1a94f

added mosaic plot tutorial
1 parent f3e0462 commit 8f1a94f

6 files changed, +223 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ Computational workflows for reproducing analysis in the study of Huang et al., 2
 * [Beta diversity analysis](./docs/beta_diversity_analysis.md)
 * [Co-presence analysis](./docs/copresence_analysis.md)
 * [ComplexHeatmap plotting](./docs/make_ComplexHeatmap.md)
+* [Make mosaic plot](./docs/make_mosaic_plot.md)

docs/make_mosaic_plot.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Make mosaic plot
This tutorial shows how to use a Python script to draw a mosaic plot that visualizes the frequency distribution of two categorical variables.

#### Python packages required

* [Pandas](https://pandas.pydata.org/)
* [SciPy](https://scipy.org/)
* [Matplotlib](https://matplotlib.org/)
* [statsmodels](https://www.statsmodels.org/stable/index.html)

#### Drawing a mosaic plot using `mosaic_plot.py`

Use the Python script [mosaic_plot.py](../scripts/mosaic_plot.py), located in `path_to_the_package/KunDH-2023-CRM-MSM_metagenomics/scripts/`, together with the example table [two_variable_mosaic.tsv](../example_data/two_variable_mosaic.tsv). Each row of the table is a species associated with either *MSM* or *Non-MSM* individuals, labeled by whether it was identified as Gram-negative.

```{python}
usage: mosaic_plot.py [-h] [--input [INPUT]] [--facecolor_map [FACECOLOR_MAP]] [--font_style [FONT_STYLE]] [--output [OUTPUT]]

This program draws a mosaic plot.

optional arguments:
  -h, --help            show this help message and exit
  --input [INPUT]       Input a tab-delimited file containing two variables for each individual subject.
  --facecolor_map [FACECOLOR_MAP]
                        Specify the path to a tab-delimited file mapping each level of the first variable to a face color.
  --font_style [FONT_STYLE]
                        Specify the font style as font family and font type, delimited by a comma. default: [sans-serif,Arial]
  --output [OUTPUT]     Specify the output figure name.

examples: mosaic_plot.py --input input_file.tsv --facecolor_map facecolor_mapfile.tsv --output mosaic_plot.png
```

Example command:
```{bash}
mosaic_plot.py --input two_variable_mosaic.tsv --facecolor_map facecolor_map.tsv --output mosaic_plot.png
```
![Mosaic plot](../images/)

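The plot title reports a p-value from Fisher's exact test on the 2x2 contingency table of the two variables; `mosaic_plot.py` obtains it with `pandas.crosstab` and `scipy.stats.fisher_exact`. The dependency-free sketch below illustrates the same computation on made-up toy rows. The helper names `contingency_2x2` and `fisher_exact_two_sided` are hypothetical and are not part of the script, and the two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is an assumption about the usual definition rather than a copy of SciPy's implementation.

```python
from collections import Counter
from math import comb

# Toy (variable1, variable2) pairs; hypothetical data, not the example table.
rows = [
    ("MSM", "Yes"), ("MSM", "No"), ("MSM", "No"), ("MSM", "No"),
    ("Non-MSM", "Yes"), ("Non-MSM", "Yes"), ("Non-MSM", "Yes"), ("Non-MSM", "No"),
]


def contingency_2x2(pairs):
    """Count co-occurrences; rows and columns follow sorted level names."""
    counts = Counter(pairs)
    levels1 = sorted({first for first, _ in pairs})
    levels2 = sorted({second for _, second in pairs})
    if len(levels1) != 2 or len(levels2) != 2:
        raise ValueError("both variables must have exactly two levels")
    return [[counts[(r, c)] for c in levels2] for r in levels1]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


table = contingency_2x2(rows)
p_value = fisher_exact_two_sided(table)
print(table, p_value)
```

With sorted level names, the columns are ordered `No`, `Yes`, so each inner list of `table` holds the `No` count followed by the `Yes` count for one group.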
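*Note*
The face colors of the mosaic plot are specified in a two-column, tab-delimited file, as in the example [mapping file](../example_data/facecolor_map.tsv): the first column is a level of the first variable (e.g. `MSM`) and the second column is its color (e.g. `grey`).
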
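As a minimal sketch of how such a mapping file can be turned into per-tile colors, the snippet below parses two tab-delimited columns and assigns the same face color to both tiles of each group. The function names `read_facecolor_map` and `build_properties` are hypothetical, and the in-memory text stands in for `facecolor_map.tsv`; the script itself does the equivalent inline.

```python
import csv
import io


def read_facecolor_map(handle):
    """Return {level: color} from a two-column, tab-delimited stream."""
    colors = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        colors[row[0].strip()] = row[1].strip()
    return colors


def build_properties(colors, second_levels):
    """Give every (group, level) tile the face color of its group."""
    return {
        (group, level): {"facecolor": color, "edgecolor": "white"}
        for group, color in colors.items()
        for level in second_levels
    }


# Stand-in for the contents of the example mapping file.
example = io.StringIO("MSM\tgrey\nNon-MSM\tred\n")
facecolors = read_facecolor_map(example)
props = build_properties(facecolors, ["No", "Yes"])
print(props[("MSM", "Yes")])
```

A dictionary keyed by `(group, level)` tuples is the shape that the `properties` argument of `statsmodels.graphics.mosaicplot.mosaic` receives in the script.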

example_data/facecolor_map.tsv

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
MSM	grey
Non-MSM	red

example_data/two_variable_mosaic.tsv

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
SGB	Sexuality	Gram_negative
SGB1667	MSM	No
SGB15090	MSM	Yes
SGB1662	MSM	No
SGB1676	MSM	No
SGB5066	MSM	Yes
SGB1653	MSM	No
SGB1666	MSM	No
SGB13999	MSM	Yes
SGB15107	MSM	Yes
SGB9333	MSM	No
SGB1675	MSM	No
SGB1668	MSM	No
SGB1883_group	MSM	No
SGB5904	MSM	No
SGB1928	MSM	No
SGB1437	MSM	No
SGB6817	MSM	Yes
SGB9274	MSM	No
SGB1657	MSM	No
SGB2230	MSM	No
SGB2240	MSM	No
SGB4367	MSM	Yes
SGB5967	MSM	No
SGB6973	MSM	No
SGB15260	MSM	Yes
SGB5858	MSM	No
SGB1699	MSM	No
SGB2238	MSM	No
SGB1626	MSM	No
SGB15225	MSM	Yes
SGB5239	MSM	Yes
SGB15244	MSM	Yes
SGB15106	MSM	Yes
SGB3677	MSM	No
SGB4886	MSM	Yes
SGB1680	MSM	No
SGB15322	MSM	Yes
SGB1474	MSM	No
SGB1475	MSM	No
SGB1472	MSM	No
SGB9243	MSM	No
SGB1672_group	MSM	No
SGB1701	MSM	No
SGB2241	MSM	No
SGB15093	MSM	Yes
SGB1663	MSM	No
SGB1614	MSM	No
SGB9209	MSM	No
SGB4893	MSM	Yes
SGB1636_group	MSM	No
SGB1644	MSM	No
SGB4910	MSM	Yes
SGB1673	MSM	No
SGB4909	MSM	Yes
SGB15370	MSM	Yes
SGB2239	MSM	No
SGB1677	MSM	No
SGB1617_group	MSM	No
SGB1861	Non-MSM	No
SGB4820	Non-MSM	Yes
SGB2286	Non-MSM	No
SGB1790	Non-MSM	No
SGB1965	Non-MSM	No
SGB9340	Non-MSM	No
SGB17256	Non-MSM	Yes
SGB1798	Non-MSM	No
SGB15342	Non-MSM	Yes
SGB1613	Non-MSM	No
SGB5190	Non-MSM	Yes
SGB9262	Non-MSM	No
SGB15120	Non-MSM	Yes
SGB1877	Non-MSM	No
SGB9347	Non-MSM	No
SGB1815	Non-MSM	No
SGB17248	Non-MSM	Yes
SGB4584	Non-MSM	Yes
SGB15132	Non-MSM	Yes
SGB1862	Non-MSM	No
SGB4557	Non-MSM	Yes
SGB1836_group	Non-MSM	No
SGB2290	Non-MSM	No
SGB8601	Non-MSM	No
SGB4269	Non-MSM	Yes
SGB17237	Non-MSM	Yes
SGB4608	Non-MSM	Yes
SGB14993_group	Non-MSM	Yes
SGB2071	Non-MSM	No
SGB1934	Non-MSM	No
SGB4422	Non-MSM	Yes
SGB5792	Non-MSM	No
SGB4714	Non-MSM	Yes
SGB1814	Non-MSM	No
SGB2303	Non-MSM	No
SGB2318	Non-MSM	No
SGB2301	Non-MSM	No
SGB4705	Non-MSM	Yes
SGB4874	Non-MSM	Yes

images/mosaic_plot.png

14.4 KB

scripts/mosaic_plot.py

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
#!/usr/bin/env python

"""
NAME: mosaic_plot.py
DESCRIPTION: mosaic_plot.py is a python script for visualizing proportions of data points along two variables.
DATE: 29.11.2023
AUTHOR: Kun D. Huang
"""


import pandas as pd
from scipy.stats import fisher_exact
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
import matplotlib
import sys
import argparse
import textwrap



def make_mosaic_plot(two_variable_file, facecolor_dict, output_fig, font_style = "sans-serif,Arial"):
    # font_style is "family,type", e.g. "sans-serif,Arial"
    font_family, font_type = font_style.split(",")
    matplotlib.rcParams['font.family'] = font_family
    matplotlib.rcParams['font.sans-serif'] = font_type
    two_variable_df = pd.read_csv(two_variable_file, sep = "\t", index_col = False)
    # The table must have exactly three columns: feature ID, variable 1, variable 2
    features, variable1, variable2 = two_variable_df.columns
    # 2x2 contingency table of the two variables, tested with Fisher's exact test
    cont_df = pd.crosstab(two_variable_df[variable1], two_variable_df[variable2])
    res = fisher_exact(cont_df, alternative = "two-sided")
    # Label every tile with its count
    label_dict = {}
    for idx in cont_df.index.to_list():
        for col in cont_df.columns.to_list():
            label_dict[(idx, col)] = cont_df.loc[idx, col]
    labelizer = lambda k:label_dict[k]

    # Variable 2 must have exactly two levels; both tiles of a variable 1 level share one color
    variable2_0, variable2_1 = sorted(set(two_variable_df[variable2].to_list()))
    props = {}
    for variable in facecolor_dict:
        props[(variable, variable2_0)] = {"facecolor": facecolor_dict[variable], "edgecolor": "white"}
        props[(variable, variable2_1)] = {"facecolor": facecolor_dict[variable], "edgecolor": "white"}
    mosaic(two_variable_df, [variable1, variable2], labelizer = labelizer, properties = props, title = " P-value: "+ str(res[1]) + " (Fisher's exact test)")
    plt.savefig(output_fig)

if __name__ == "__main__":
    def read_args(args):
        # This function is to parse arguments

        parser = argparse.ArgumentParser(formatter_class=argparse.RawDescriptionHelpFormatter,
                                        description = textwrap.dedent('''\
                                        This program draws a mosaic plot.
                                        '''),
                                        epilog = textwrap.dedent('''\
                                        examples: mosaic_plot.py --input input_file.tsv --facecolor_map facecolor_mapfile.tsv --output mosaic_plot.png
                                        '''))
        parser.add_argument('--input',
                            nargs = '?',
                            help = 'Input a tab-delimited file containing two variables for each individual subject.',
                            type = str,
                            default = None)

        # The original default pointed to an unrelated SCFA pathway database; no default is set here.
        parser.add_argument('--facecolor_map',
                            nargs = '?',
                            help = 'Specify the path to a tab-delimited file mapping each level of the first variable to a face color.',
                            type = str,
                            default = None)

        parser.add_argument('--font_style',
                            nargs = '?',
                            help = 'Specify the font style as font family and font type, delimited by a comma. default: [sans-serif,Arial]',
                            default = 'sans-serif,Arial')

        parser.add_argument('--output',
                            nargs = '?',
                            help = 'Specify the output figure name.',
                            type = str,
                            default = None)

        return vars(parser.parse_args(args))

    pars = read_args(sys.argv[1:])
    if pars["input"] is None or pars["facecolor_map"] is None or pars["output"] is None:
        sys.exit("--input, --facecolor_map and --output are all required.")
    # Read the two-column mapping file: level of variable 1 -> face color
    facecolor_dict = {}
    with open(pars['facecolor_map']) as handle:
        for line in handle:
            fields = line.rstrip().split("\t")
            if len(fields) >= 2:
                facecolor_dict[fields[0]] = fields[1]
    make_mosaic_plot(pars["input"], facecolor_dict , pars["output"], font_style = pars["font_style"])
