PySEAT may conflict with some numpy versions. We recommend numpy 1.22.4 and pyseat 0.0.1.3.
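A quick way to confirm that an environment matches the recommended versions is to query the installed package metadata. The sketch below uses only the Python standard library. The helper name `check_versions` is illustrative and not part of PySEAT, and it assumes the packages are installed under the distribution names `numpy` and `pyseat`.

```python
from importlib import metadata

# Versions recommended in this README.
RECOMMENDED = {"numpy": "1.22.4", "pyseat": "0.0.1.3"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: (installed, expected)} for every mismatch.

    `installed` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    # Print one line per package that differs from the recommendation.
    for name, (installed, wanted) in check_versions(RECOMMENDED).items():
        print(f"{name}: installed {installed}, recommended {wanted}")
```

An empty result means both packages match the recommended versions.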
To reproduce the results, first download data.zip and unzip it into the /data directory.
To start the analysis from the GCN, first run /script_procedure/step1_compute_distance to compute the distance matrix of the GCN. This may take some time. To save time, you can directly use sp_d.tsv in the /data directory, which was precomputed by step1_compute_distance.
The prior structure is generated by the scripts in /script_GCN_d3/GCN_tree. Run them before the FMT, NSCLC, and Anti_analysis scripts, which depend on the prior structure.
- metadata.tsv (metadata from GutMeta; disease should be in the header for phenotype comparison)
- abd.tsv (header: species, index: sample name)
  read by abd_profile.input_profile()
- GCN.tsv (header: KO, index: species)
  read by GCN.input_GCN()
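Before running a phenotype comparison, it can help to verify that the metadata header really contains a disease column. The following is a minimal standard-library sketch, not the loader used by this repository: the helper names `tsv_header` and `has_column` are hypothetical, and the file location in the usage comment is an assumption.

```python
import csv


def tsv_header(lines):
    """Return the column names from the first row of tab-separated text.

    `lines` may be an open file or any iterable of strings.
    """
    return next(csv.reader(lines, delimiter="\t"))


def has_column(lines, column="disease"):
    """True if the header row contains the given column name."""
    return column in tsv_header(lines)


# Usage with a file (path is an assumption; adjust to your layout):
# with open("data/metadata.tsv", newline="", encoding="utf-8") as handle:
#     if not has_column(handle):
#         raise SystemExit("metadata.tsv has no 'disease' column")
```

Accepting any iterable of lines keeps the check easy to try on a few in-memory rows before pointing it at a real file.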
Check the abundance difference for each taxon (including NAFLD 16S OTUs).
- completeness
  Compute the module completeness of each taxon (including NAFLD 16S OTUs).
- test_diff
  Test completeness enrichment based on the prior GCN structure. Run this after the GCN_tree script in script_GCN_d3.
- GCN_tree
  Make the prior GCN tree structure.
- SE_diff / NFR_diff
  Check the SE/nFR difference between the disease and healthy groups.
- distribution_se
  Plot the SE distribution for the disease and healthy groups.
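To make the completeness step in the list above concrete, the sketch below uses one common definition of module completeness: the fraction of a module's KOs that are present in a taxon's GCN row. This definition, the function name, and the toy data are assumptions for illustration only; the repository script may compute completeness differently (for example, by accounting for alternative KOs within a module step).

```python
def module_completeness(taxon_kos, module_kos):
    """Fraction of the module's KOs that the taxon carries (0.0 to 1.0)."""
    module_kos = set(module_kos)
    if not module_kos:
        raise ValueError("module must contain at least one KO")
    return len(module_kos & set(taxon_kos)) / len(module_kos)


# Toy GCN: taxon -> set of KOs with non-zero copy number (made-up values).
gcn = {
    "taxon_A": {"K00001", "K00002", "K00003"},
    "taxon_B": {"K00001"},
}
# Toy module definition (made-up KO list).
module = ["K00001", "K00002", "K00003", "K00004"]

# Completeness of the toy module in every taxon.
completeness = {
    taxon: module_completeness(kos, module) for taxon, kos in gcn.items()
}
```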
Scripts for the analysis of the two FMT datasets.
- analysis_se
  Multiple regression on SE value, days after FMT, and fraction at each cluster/super-cluster.
- analysis_nfr
  Multiple regression on nFR, days after FMT, and fraction at each cluster/super-cluster.
- analysis_se/analysis_nfr
  Check the SE/nFR difference between the control and exposed groups at each cluster/super-cluster.
- analysis_se_exposed/analysis_nfr_exposed
  Check the SE/nFR difference at each cluster/super-cluster between the control group and the six participants in the exposed group who exhibited a bloom of the opportunistic pathogen Enterobacter cloacae complex at the E7 timepoint.
- merge
  Merge and plot the difference test results for nFR and SE in the control and exposed groups.
- merge_exposed
  Merge and plot the difference test results for nFR and SE of the six samples and the control group.
- boxplot
  Draw a boxplot of SE at each cluster/super-cluster.
- step0_NAFLD
  An example of comparing keystone clusters of taxa on the NAFLD dataset.
- step1_compute_distance
  An example of computing taxa distance and KO distance from the GCN.
- step2_cluster_analysis
  An example of analyzing keystone clusters and keystone taxa for metagenomic abundance profiles in cMD by constructing the posterior structure.
- step3_count_support
  An example of checking for valid keystone-taxon enterotypes, i.e. those supported by more than one network of size larger than 10.
- utils
  a. log_effect
     An example of computing lCFR and showing the distribution of lCFR values and of CFR values without log and normalization.
  b. nestedness_experiment
     An example of testing nestedness in comparison with the NULL experiments of lCFR.
  c. evaluation
     An example of evaluating the features of the GCN.
- draw_*
  Scripts used to plot the results of the previous steps.
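The validity rule described for step3_count_support can be read as: keep a keystone-taxon enterotype only if more than one network with size larger than 10 supports it. A minimal sketch of that counting logic follows. The record layout (pairs of enterotype name and network size), the function name, and the example values are assumptions, not the script's actual data structures.

```python
from collections import Counter


def valid_enterotypes(records, min_size=10, min_support=1):
    """Return enterotypes supported by more than `min_support` networks
    whose size is strictly larger than `min_size`."""
    support = Counter(
        enterotype for enterotype, size in records if size > min_size
    )
    return {name for name, count in support.items() if count > min_support}


# Made-up example: (keystone-taxon enterotype, network size).
records = [
    ("taxon_X", 25),
    ("taxon_X", 14),
    ("taxon_Y", 30),
    ("taxon_Y", 8),
    ("taxon_Z", 11),
]

valid = valid_enterotypes(records)
```

Both thresholds are strict comparisons, matching the wording "more than one" and "larger than 10"; a network of exactly 10 nodes does not count as support.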
- SE
  Test the difference in SE between the response and non-response groups at each cluster/super-cluster, and compute the FR S score for each sample.
- sig_SE
  Test the difference in SE between the response and non-response groups at the SIG1/SIG2 clusters proposed in the original study, and compute the S score for each sample.
- distribution
  Plot the SE distribution for the response and non-response groups.
- combination
  Compute the combined S score for each sample.
- The R script
  Used to produce the analysis in the original study; provided by https://github.com/valerioiebba/TOPOSCORE/tree/main.
GCN_tree result is required
abundance difference result is required
- run.ipynb
  Plot keystone result.
- run.ipynb
  Find eigen species and plot the result.
GCN_tree result and SE values are required
- CRC_recurrent_ROC.ipynb
  Predict CRC.
- IBD_ROC.ipynb
  Predict IBD.