SplitMEM can be used on Windows and Linux platforms.
cd src && make THREADS=16
Usage:
./SplitMEM
Usage: ./SplitMEM [options]
Available options:
--inputfile/-i FILE input file name (required)
--outputfile/-o FILE output file name (required)
--kmer/-k k K-mer size (required)
--method/-m use method to construct SA {SAIS, DIV, PARDIV} (default: DIV)
--threads/-t t use t threads to construct SA (only effected in PARDIV)
--nocache/-n do not use cache in program
--help/-h print help message
--version/-v print program version
Sample usage: ./SplitMEM -i 1.fasta -k 10 -o 1.ans
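For scripted runs, the option list above can be wrapped in a small helper. The following is only a sketch: the function name `build_splitmem_command`, the default binary path, and the choice to reject `threads` for methods other than PARDIV are assumptions made for illustration, not part of SplitMEM itself. It also assumes that each option takes its value as a separate argument, as in the sample usage.

```python
# Hypothetical helper (not shipped with SplitMEM): builds an argument list
# from the options documented in the help text above.
METHODS = ("SAIS", "DIV", "PARDIV")


def build_splitmem_command(inputfile, outputfile, k, method="DIV",
                           threads=None, nocache=False, binary="./SplitMEM"):
    """Return the SplitMEM command line as a list of strings."""
    if method not in METHODS:
        raise ValueError("method must be one of %s" % ", ".join(METHODS))
    if int(k) <= 0:
        raise ValueError("k must be a positive integer")

    # The three required options.
    cmd = [binary, "-i", str(inputfile), "-o", str(outputfile), "-k", str(k)]

    # DIV is the documented default, so only pass -m for other methods.
    if method != "DIV":
        cmd += ["-m", method]

    # The help text says -t only has an effect with PARDIV; this sketch
    # treats any other combination as a mistake (an assumption).
    if threads is not None:
        if method != "PARDIV":
            raise ValueError("threads only takes effect with method PARDIV")
        cmd += ["-t", str(threads)]

    if nocache:
        cmd.append("-n")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the SplitMEM binary.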
Contact:
- wym6912 [email protected]
Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform
Uwe Baier, Timo Beller, Enno Ohlebusch
Algorithm: These are the algorithms proposed in "Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform". Submitted to BIOINF-2015-1242. Please cite the paper above if you use one of these algorithms.
Requirements:
- a modern C++11-ready compiler such as gcc version 4.7
- the Succinct Data Structure Library (sdsl-lite) (version d533b2600950b4f878cb063ca0cd1bf340c53df4, maybe newer)
Install:
- Install the SDSL by the commands:
git clone git@github.com:simongog/sdsl-lite.git sdsl-lite
cd sdsl-lite
./install.sh [PATH]
- Save the path of the sdsl into the variable SDSLLITE e.g. by:
export SDSLLITE=[PATH to SDSL]
- Build the executables by the command:
make
cst_based:
Has theoretical running time
Example: ./cst_based.x 10 input.fa output
with
10 = k
input.fa = the input file in FASTA
output = the output filename
Will produce the file output containing the de Bruijn graph.
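As a reminder of what the graph represents, here is a minimal sketch of a plain (uncompressed) de Bruijn graph in which nodes are k-mers and an edge links two k-mers that occur consecutively in a sequence. It is not the algorithm used by cst_based or bwt_based, and it says nothing about their output format; the function name `de_bruijn_edges` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(sequences, k):
    """Map each k-mer to the sorted list of k-mers that directly follow it."""
    edges = defaultdict(set)
    for seq in sequences:
        # Slide a window of length k over the sequence; consecutive windows
        # overlap in k-1 characters and form one edge.
        for i in range(len(seq) - k):
            edges[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return {node: sorted(successors) for node, successors in edges.items()}
```

A compressed de Bruijn graph merges every non-branching chain of such nodes into a single node, which is why the tools above can represent it in much less space.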
bwt_based:
Has the same theoretical running time as cst_based.
The space requirement is roughly 1.5n bytes plus the size of the compressed de Bruijn graph.
Example: ./bwt_based.x input.fa example kfile.txt
with
input.fa = the input file in FASTA
example = the output filename
kfile.txt = a file containing the values of k
Will produce two files for each value of k in kfile.txt:
example.k*k*.dot
example.k*k*.start_nodes.txt
The *.dot files contain the de Bruijn graph, and the *.start_nodes.txt files contain the list of start nodes.
Notes:
The programs should compile with commit d533b2600950b4f878cb063ca0cd1bf340c53df4 of the SDSL, but may also work with newer versions of the SDSL.
Limitations: The input file must be in FASTA format. In particular, the input may not contain the 0-byte or the 1-byte. Newlines, as well as all characters between '>' and the next newline, will be removed. k must be smaller than the shortest sequence.
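These limitations can be checked before running the tools. The sketch below is a hypothetical pre-flight check (the name `check_fasta` is not part of this repository). It assumes that every header line starts with '>' and that the sequence length is counted after header lines and line breaks are dropped, as described above.

```python
def check_fasta(data, k):
    """Validate raw FASTA bytes against the limitations listed above.

    Returns the list of sequence lengths, or raises ValueError.
    """
    # The input may not contain the 0-byte or the 1-byte.
    if b"\x00" in data or b"\x01" in data:
        raise ValueError("input contains a 0-byte or 1-byte")

    lengths = []
    current = None  # length of the sequence being read, None before the first header
    for line in data.splitlines():
        if line.startswith(b">"):
            # Header line: close the previous sequence and start a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)

    if not lengths:
        raise ValueError("no FASTA records found")
    # k must be smaller than the shortest sequence.
    if k >= min(lengths):
        raise ValueError("k=%d is not smaller than the shortest sequence (%d)"
                         % (k, min(lengths)))
    return lengths
```

For a file on disk, read it in binary mode first, for example `check_fasta(open("input.fa", "rb").read(), 10)`.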