This repository contains data and scripts for reproducing the results in the following manuscript:
The Effect of Genome Graph Expressiveness on the Discrepancy Between Genome Graph Distance and String Set Distance
plot_GTED_EMED.ipynb contains the scripts needed to analyze the results presented in the manuscript.
scripts/ contains the scripts needed to
- Sample flow decompositions from input graphs
- Compute FGTED between graphs
- Construct MSA/dBG graphs in the format expected by the FGTED solver
- See pipeline_tcr.sh and pipeline_hbv.sh for scripts to produce pair-wise FGTED.
- See pipeline_diameter.sh for scripts to produce sampled diameters.
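
As a rough illustration of what sampling a flow decomposition means, here is a minimal sketch. It is not the code in scripts/; the function name `sample_flow_decomposition`, the edge-dictionary input format, and the toy graph are all assumptions made for this example. It assumes an acyclic graph with integer edge flows that satisfy flow conservation, a source with no incoming edges, and a sink with no outgoing edges. Each walk picks the next edge with probability proportional to its remaining flow and removes one unit of flow along the way.

```python
import random
from collections import defaultdict


def sample_flow_decomposition(flow, source, sink, seed=None):
    """Randomly split an integer s-t flow on a DAG into weighted paths.

    flow: dict mapping (u, v) edges to non-negative integer flow values
          (hypothetical input format, chosen for this sketch).
    Returns a dict mapping each path (tuple of nodes) to its weight.
    """
    rng = random.Random(seed)
    residual = dict(flow)
    out_edges = defaultdict(list)
    for (u, v) in flow:
        out_edges[u].append(v)

    paths = defaultdict(int)
    # Keep walking while the source still has flow to send.
    while any(residual[(source, v)] > 0 for v in out_edges[source]):
        node, path = source, [source]
        while node != sink:
            # Only edges with remaining flow are candidates.
            choices = [v for v in out_edges[node] if residual[(node, v)] > 0]
            weights = [residual[(node, v)] for v in choices]
            nxt = rng.choices(choices, weights=weights, k=1)[0]
            residual[(node, nxt)] -= 1  # consume one unit of flow
            path.append(nxt)
            node = nxt
        paths[tuple(path)] += 1  # identical walks are merged into one path
    return dict(paths)


# Toy graph with total flow 3 from "s" to "t".
example_flow = {
    ("s", "a"): 2, ("s", "b"): 1,
    ("a", "t"): 1, ("a", "b"): 1,
    ("b", "t"): 2,
}
decomposition = sample_flow_decomposition(example_flow, "s", "t", seed=0)
for path, weight in decomposition.items():
    print(weight, "->".join(path))
```

Different seeds can yield different decompositions of the same flow, which is why a single graph can correspond to many string sets.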
- Data: MSA graphs and dBG4 graphs constructed on 50 string sets are contained in TCR_graphs
  - The instructions to simulate TCR repertoires are included in data/
  - TCR_all_EMED.csv -- contains all pair-wise EMED
  - TCR_sampled_diameters_msa.csv -- contains all sampled diameters on MSA graphs
  - TCR_sampled_diameters_dbg4.csv -- contains all sampled diameters on dBG4 graphs
  - TCR_fgted_logs -- contains all output from Gurobi for solving FGTED
  - TCR_summary.csv -- contains summary stats for all pairs of MSA graphs
  - TCR_FGTED.csv -- contains FGTED between pairs of dBG4 graphs
- Data: MSA graphs constructed on 9 string sets are contained in HBV_graphs
  - HBV_all_EMED.csv -- contains all pair-wise EMED
  - HBV_sampled_diameters.txt -- contains all sampled diameters
  - HBV_fgted_logs -- contains all output from Gurobi for solving FGTED
  - HBV_summary.csv -- contains summary stats for all pairs
- Scripts: see pipeline_hbv.sh for scripts to produce pair-wise FGTED
  - see GTED_MSA_HBV_solver.py for details
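
For readers unfamiliar with the dBG4 graphs listed under the TCR data, the following is a minimal sketch of building a de Bruijn graph from a string set. It assumes that "dBG4" means a de Bruijn graph with k = 4; the function name `build_de_bruijn_graph`, the edge-multiplicity representation, and the toy strings are hypothetical and do not reflect the file format expected by the FGTED solver in this repository.

```python
from collections import Counter


def build_de_bruijn_graph(strings, k=4):
    """Build a de Bruijn graph as a Counter of edge multiplicities.

    Nodes are (k-1)-mers. Every k-mer occurrence in the input strings
    adds one unit to the edge from its prefix to its suffix.
    """
    edges = Counter()
    for s in strings:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


# Toy string set (not taken from the repository's data).
edges = build_de_bruijn_graph(["ACGTAC", "ACGTT"], k=4)
for (u, v), count in sorted(edges.items()):
    print(u, "->", v, "x", count)
```

Storing multiplicities on edges keeps the information about how often each k-mer occurs, which a plain edge set would lose.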