Katie Jenike, Sam Kovaka, Shujun Ou, Stephen Hwang, Srividya Ramakrishnan, Ben Langmead, Zach Lippman, Michael Schatz
An alignment-free pan-genome viewer
Please note: installation instructions and pre-processing scripts are a work in progress.
git clone --recursive https://github.com/kjenike/panagram.git
cd panagram
pip install .
The --recursive option is required to install the KMC dependency. If you forget to include it, you can update the repository with the command git submodule update --init.
Installation may fail if pip or setuptools is out of date. To update both, run:
pip install --upgrade pip
pip install --upgrade setuptools
Requires Python version >=3.7, pip, samtools, and tabix. All other dependencies should be installed automatically via pip.
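Before installing, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of panagram: the function name check_prerequisites is hypothetical, and it only checks the Python version and whether each tool is on your PATH, not whether the tool versions are compatible.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7), tools=("pip", "samtools", "tabix")):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: it only tests the interpreter version and
    whether each tool can be found on PATH.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {found} is older than the required {wanted}")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

If the script prints nothing, the basic requirements appear to be in place.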
Panagram relies on KMC to build its kmer index. KMC should be installed automatically, but it is possible for the KMC installation to fail while panagram itself installs successfully. In this case panagram view can be run, but panagram index will return an error. You may be able to debug the KMC installation by running make -C KMC py_kmc_api and fixing any errors it reports, then re-running pip install -v . once the errors are fixed.
Panagram runs in two steps: a pre-processing step (the index command) and a viewing step (the view command).
Usage: Anchor KMC bitvectors to reference FASTA files to create pan-kmer bitmap
usage: panagram index [-h] <config.toml>
See the example config.toml file for more details on the layout. The config must include paths to all of the FASTA files and, optionally, any annotations in GFF format.
Panagram may fail to index datasets with more than 32 genomes. This is not a fundamental limitation, and we are working on fixing it.
Currently genome IDs should only contain alphanumeric characters and underscores due to KMC requirements.
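The two constraints above (ID characters and genome count) can be checked before indexing. This is a minimal sketch, not panagram's own validation: check_genome_ids is a hypothetical helper, "alphanumeric" is assumed to mean ASCII letters and digits, and the 32-genome threshold is treated only as a warning because indexing larger datasets may still succeed.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
VALID_ID = re.compile(r"[A-Za-z0-9_]+")
# Datasets larger than this may fail to index (see the note above).
MAX_GENOMES = 32


def check_genome_ids(ids):
    """Return (bad_ids, too_many) for a list of genome IDs.

    bad_ids  -- IDs containing characters other than letters, digits, '_'
    too_many -- True if the dataset exceeds MAX_GENOMES
    """
    bad_ids = [genome for genome in ids if not VALID_ID.fullmatch(genome)]
    too_many = len(ids) > MAX_GENOMES
    return bad_ids, too_many


if __name__ == "__main__":
    bad, too_many = check_genome_ids(["ecoli", "ecoli_k12", "e.coli-2"])
    print("invalid IDs:", bad)
    print("more than 32 genomes:", too_many)
```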
Usage: Display panagram viewer
usage: panagram view [-h] <index_dir/> [genome] [chrom] [start] [end]
index_dir Panagram index directory
genome Initial anchor genome (optional)
chrom Initial chromosome (optional)
start Initial start coordinate (optional)
end Initial end coordinate (optional)
--ndebug Run server in production mode (important for a public-
facing server)
--port str Server port (default: 8050)
--host str Server address (default: 127.0.0.1)
--url_base str A local URL prefix to use app-wide (passed to
Dash.dash(url_base_pathname=...)) (default: /)
--bookmarks str Bed file with bookmarked regions (default: None)
Runs a local Dash server. Browser can be viewed at http://127.0.0.1:8050/ by default.
Usage:
usage: panagram bitdump [-h] [-v bool] index_dir coords step
Query pan-kmer bitmap generated by "panagram index"
index_dir Panagram index directory
coords Coordinates to query in chr:start-end format
step Spacing between output kmers (optimized for multiples
of 100) (default: 1)
-v bool, --verbose bool
Output the full bitmap (default: False)
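The coords argument uses the chr:start-end format. If you are generating bitdump queries from a script, a small parser can catch malformed coordinates early. This is a sketch under assumptions: parse_coords is a hypothetical helper, panagram's own parser may accept other variants, and the coordinate base (0- or 1-based) is not stated here, so the values are passed through unchanged.

```python
import re

# chrom may itself contain ':'; start and end are plain integers.
COORD_RE = re.compile(r"(?P<chrom>.+):(?P<start>\d+)-(?P<end>\d+)")


def parse_coords(coords):
    """Split 'chr:start-end' into (chrom, start, end) or raise ValueError."""
    match = COORD_RE.fullmatch(coords)
    if match is None:
        raise ValueError(f"expected chr:start-end, got {coords!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return match["chrom"], start, end


if __name__ == "__main__":
    print(parse_coords("chr1:100-200"))
```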
First download the example_data.zip bacterial data from: http://data.schatz-lab.org/panagram/
Unzip the archive to find 5 bacterial genomes plus their annotations:
unzip example_data.zip
To run, first index the genomes:
cd example_data
panagram index conf.toml
It is important that any annotation files are correctly formatted. GFF format is supported. If you run into any problems, we strongly suggest that you first check the annotation format. This can be done with command-line tools like gff3validator or online here: https://genometools.org/cgi-bin/gff3validator.cgi
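For a quick first pass before running a full validator, the basic shape of a GFF file can be checked in a few lines. This is a rough sketch and not a substitute for gff3validator: find_gff_problems is a hypothetical helper that only checks for nine tab-separated columns and for integer start and end values with start <= end.

```python
def find_gff_problems(lines, max_problems=10):
    """Return a list of (line_number, message) for basic GFF shape errors."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (lineno, f"expected 9 tab-separated columns, found {len(fields)}")
            )
        elif not (fields[3].isdigit() and fields[4].isdigit()):
            problems.append((lineno, "start and end must be integers"))
        elif int(fields[3]) > int(fields[4]):
            problems.append((lineno, "start is greater than end"))
        if len(problems) >= max_problems:
            break
    return problems


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
        "chr1 src gene 1 100 . + . ID=g2\n",  # spaces instead of tabs
    ]
    for lineno, message in find_gff_problems(example):
        print(f"line {lineno}: {message}")
```

To check a real file, pass an open file handle, for example find_gff_problems(open(path)).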
Then you can run panagram to visualize the results (from the example_data directory):
panagram view .
From there, you can view the results in your web browser at http://127.0.0.1:8050/
Panagram uses Dash to serve the plotly visualizations. By default the dedicated web server runs on localhost (127.0.0.1) on port 8050, but you can reverse proxy to a different port and path using a web server such as nginx.
For nginx, first edit your nginx configuration file to add the following (be very careful with the use of the slash ('/') character):
location /panagram {
proxy_pass http://127.0.0.1:8050;
}
Then restart nginx with:
systemctl stop nginx
systemctl start nginx
For a secure public-facing server, be sure to run with the option panagram view --ndebug to disable debug mode.
You may also wish to change the base URL path with the --url_base option, for example to something like --url_base /panagram/. The port and host name can be specified with the --port and --host options.
Finally, you will need to run panagram using panagram view <dir>. You will probably want to run this in a loop in case it needs to be restarted, such as:
until panagram view --ndebug .; do echo "restarting"; sleep 1; done
We will optimize this process in future releases.
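If you would rather supervise the viewer from Python than from a shell loop, the same restart logic can be sketched as follows. This is an illustration, not part of panagram: run_until_success is a hypothetical helper that mirrors the shell until loop by rerunning the command until it exits with status 0, with an optional cap on restarts that the shell version does not have.

```python
import subprocess
import sys
import time


def run_until_success(cmd, delay=1.0, max_restarts=None):
    """Rerun cmd until it exits with status 0; return the restart count.

    Mirrors: until CMD; do echo "restarting"; sleep 1; done
    max_restarts is an added safety cap (None means retry forever).
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts
        if max_restarts is not None and restarts >= max_restarts:
            raise RuntimeError(f"{cmd[0]} still failing after {restarts} restarts")
        print("restarting", file=sys.stderr)
        time.sleep(delay)
        restarts += 1


# Example call (not executed here; requires panagram and an index in "."):
# run_until_success(["panagram", "view", "--ndebug", "."])
```

For a long-running public server, a process manager such as a systemd service would be a more robust alternative to either loop.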
k = 12
prefix = "."
processes = 5
lowres_step = 100
chr_bin_kbp = 200
gff_anno_types = ["exon", "CDS"]
[kmc]
memory = 10
processes = 5
threads = 4
[fasta]
ecoli = "FASTAS/ecoli_GCF_001612495.1_ASM161249v1_genomic.fna"
ecoli_k12 = "FASTAS/ecoli_k12_GCF_000005845.2_ASM584v2_genomic.fna"
klebsiella = "FASTAS/klebsiella_GCF_000240185.1_ASM24018v2_genomic.fna"
salmonella = "FASTAS/salmonella_GCF_016117835.1_ASM1611783v1_genomic.fna"
shigella = "FASTAS/shigella_GCF_000006925.2_ASM692v2_genomic.fna"
[gff]
ecoli = "gffs/ecoli_GCF_001612495.1_ASM161249v1_genomic.gff"
ecoli_k12 = "gffs/ecoli_k12_GCF_000005845.2_ASM584v2_genomic.gff"
klebsiella = "gffs/klebsiella_GCF_000240185.1_ASM24018v2_genomic.gff"
salmonella = "gffs/salmonella_GCF_016117835.1_ASM1611783v1_genomic.gff"
shigella = "gffs/shigella_GCF_000006925.2_ASM692v2_genomic.gff"
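With many genomes, writing the [fasta] and [gff] tables by hand becomes error-prone. The sketch below generates those two tables from Python dictionaries. It is an illustration only: render_genome_tables is a hypothetical helper, and the rule that every [gff] key must also appear under [fasta] is an assumption based on the example above, not a documented requirement.

```python
import json


def render_genome_tables(fastas, gffs=None):
    """Render the [fasta] and optional [gff] tables as TOML text.

    fastas, gffs -- dicts mapping genome ID to file path.
    Assumption: every GFF key must match a FASTA key, as in the example.
    """
    gffs = gffs or {}
    unknown = sorted(set(gffs) - set(fastas))
    if unknown:
        raise ValueError(f"GFF entries without a matching FASTA entry: {unknown}")
    # json.dumps yields a double-quoted string that is also a valid
    # TOML basic string for ordinary file paths.
    lines = ["[fasta]"]
    lines += [f"{name} = {json.dumps(path)}" for name, path in fastas.items()]
    if gffs:
        lines += ["", "[gff]"]
        lines += [f"{name} = {json.dumps(path)}" for name, path in gffs.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_genome_tables(
        {"ecoli": "FASTAS/ecoli.fna", "shigella": "FASTAS/shigella.fna"},
        {"ecoli": "gffs/ecoli.gff"},
    ))
```

The output can be appended to a file that already contains the top-level settings and the [kmc] table.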
The dev branch, which is under active development, currently uses Snakemake. It is straightforward to use: you just need a TSV file listing the samples and their corresponding FASTA files.
Example tsv file:
name fasta gff id anchor
ecoli FASTAS/ecoli_GCF_001612495.1_ASM161249v1_genomic.fna ANNO/ecoli_GCF_001612495.1_ASM161249v1_genomic.gff 0 True
ecoli_k12 FASTAS/ecoli_k12_GCF_000005845.2_ASM584v2_genomic.fna ANNO/ecoli_k12_GCF_000005845.2_ASM584v2_genomic.gff 1 True
klebsiella FASTAS/klebsiella_GCF_000240185.1_ASM24018v2_genomic.fna ANNO/klebsiella_GCF_000240185.1_ASM24018v2_genomic.gff 2 True
salmonella FASTAS/salmonella_GCF_016117835.1_ASM1611783v1_genomic.fna ANNO/salmonella_GCF_016117835.1_ASM1611783v1_genomic.gff 3 True
shigella FASTAS/shigella_GCF_000006925.2_ASM692v2_genomic.fna ANNO/shigella_GCF_000006925.2_ASM692v2_genomic.gff 4 True
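A sample sheet like the one above is easy to break with a missing field, so a quick check before launching the workflow can save time. This is a sketch under assumptions: check_sample_sheet is a hypothetical helper, the file is assumed to be tab-delimited, and the five column names are taken from the example header. The dev-branch workflow may have different or looser requirements.

```python
import csv
import io

# Column names taken from the example header above.
EXPECTED_COLUMNS = ["name", "fasta", "gff", "id", "anchor"]


def check_sample_sheet(text):
    """Return (line_number, message) tuples for malformed rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != EXPECTED_COLUMNS:
        return [(1, "header does not match the expected columns")]
    problems = []
    for lineno, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                (lineno, f"expected {len(EXPECTED_COLUMNS)} fields, found {len(row)}")
            )
    return problems


if __name__ == "__main__":
    sheet = (
        "name\tfasta\tgff\tid\tanchor\n"
        "ecoli\tFASTAS/ecoli.fna\tANNO/ecoli.gff\t0\tTrue\n"
        "klebsiella\tFASTAS/kleb.fna\tANNO/kleb.gff\tTrue\n"  # id is missing
    )
    for lineno, message in check_sample_sheet(sheet):
        print(f"line {lineno}: {message}")
```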
- Right now, there is a bug (issue #7) when indexing very large genomes with very large chromosomes. We are actively working to fix this.
- Indexing sometimes fails when working with more than 32 genomes.
- Mash dendrogram leaf placement is not always perfect.
- Installing on a Mac can be tricky. We will need to include a more detailed list of dependencies.
- Add a row for gene coverage (rather than just gene density) for the third tab.
- Update the step size in the control panel.
- Add the actual sequence.