Dockerfile with Kraken2 and a script to automate classification
git clone [email protected]:FelipeMelis/automatic_kk2_classifier.git
https://drive.google.com/drive/folders/1zb2SZ8-lsBAuc-xGdSo-4M2066W3Zy2D
docker build -t bacterial_classifier .
docker run -it -v $PWD:/DATA --rm bacterial_classifier
- Choose a database from the index webpage: https://benlangmead.github.io/aws-indexes/k2
- Download the selected database to the kraken2 root folder
- Extract its content to a folder with a name related to the DB
- Modify config.py and add the name of the DB and its path
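The extraction step above can be sketched in Python as follows. This is a minimal sketch, not part of this repo: the function names are hypothetical, and it assumes the downloaded archive is a gzip-compressed tar with the index files at its top level. The three .k2d file names are the ones a built Kraken2 database normally contains.

```python
import tarfile
from pathlib import Path

# Files that a built Kraken2 database directory is expected to contain.
REQUIRED_K2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def extract_index(archive_path, kraken2_root, db_folder_name):
    """Extract an index archive into <kraken2_root>/<db_folder_name>.

    Hypothetical helper. Only use it on archives from a trusted source,
    since extractall() writes whatever paths the archive contains.
    """
    target = Path(kraken2_root) / db_folder_name
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return target


def missing_index_files(db_path):
    """Return the required Kraken2 files that are absent from db_path."""
    return [
        name for name in REQUIRED_K2_FILES
        if not (Path(db_path) / name).is_file()
    ]
```

An empty list from missing_index_files() suggests the folder is safe to register in config.py; anything else usually means the archive was extracted to the wrong level.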
1.- Download the viral database to /root/kraken2/ and extract it to k2_viral
2.- Modify config.py (provided in this repo) and add "viral": "/root/kraken2/k2_viral" to the Python dict
DATABASE_TYPE = {
    "refseq": "/root/kraken2/minikraken2_v1_8GB",
    "silva": "/root/kraken2/16S_SILVA138_k2db",
    "greengenes": "/root/kraken2/16S_Greengenes_k2db",
    "rdp": "/root/kraken2/16S_RDP_k2db",
    "viral": "/root/kraken2/k2_viral"  # new database
}
3.- The database will appear as a choice for option -d and will be ready to use under that name
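One way a new dict key can show up as a -d choice is to build the argparse choices from the dict keys. The sketch below illustrates that idea; it is an assumption about how run_classifier.py might be wired, not a copy of it, and build_parser is a hypothetical name. Only two of the databases are repeated here to keep it short.

```python
import argparse

# Same shape as the dict in config.py; paths copied from the example above.
DATABASE_TYPE = {
    "refseq": "/root/kraken2/minikraken2_v1_8GB",
    "viral": "/root/kraken2/k2_viral",
}


def build_parser(databases):
    """Build a parser whose -d choices are the keys of the database dict."""
    parser = argparse.ArgumentParser(
        description="Script that runs kraken2 classifier"
    )
    parser.add_argument(
        "-d", "--database_type",
        required=True,
        choices=sorted(databases),
        help=f"DB type: {sorted(databases)}",
    )
    return parser


# Selecting a database by name resolves to the path stored in the dict.
args = build_parser(DATABASE_TYPE).parse_args(["-d", "viral"])
db_path = DATABASE_TYPE[args.database_type]
```

With this layout, adding a key to the dict is enough to make it selectable; a name that is not in the dict is rejected by argparse before kraken2 runs.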
usage: run_classifier.py [-h] -l PATH_TO_LIBRARY -t LIBRARY_TYPE -c COMPRESSION_TYPE -d DATABASE_TYPE -o OUTPUT_FOLDER -th TAX_HIERARCHY -cpus CPU_NUMBER
Script that runs kraken2 classifier
optional arguments:
  -h, --help            show this help message and exit
  -l PATH_TO_LIBRARY, --path_to_library PATH_TO_LIBRARY
                        path to the file that contains library reads
  -t LIBRARY_TYPE, --library_type LIBRARY_TYPE
                        library type: ['paired', 'single-end', 'fasta']
  -c COMPRESSION_TYPE, --compression_type COMPRESSION_TYPE
                        compression type: ['gzip', 'bzip2', 'none']
  -d DATABASE_TYPE, --database_type DATABASE_TYPE
                        DB type: ['refseq', 'silva', 'greengenes', 'rdp', 'viral']
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        path to output folder
  -th TAX_HIERARCHY, --tax_hierarchy TAX_HIERARCHY
                        tax_hierarchy: P, C, O, F, G, S, S1
  -cpus CPU_NUMBER, --cpu_number CPU_NUMBER
                        Number of cpus to be used
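To give a feel for how these options could translate into a kraken2 call, here is a rough sketch. It is not the code in run_classifier.py: the function name and the output file names (kraken2.out, kraken2.report) are hypothetical, the reads are passed as a list as an assumption, and the -th option is left out because its mapping is not described here. The kraken2 flags themselves (--db, --threads, --paired, --gzip-compressed, --bzip2-compressed, --output, --report) are standard Kraken2 options.

```python
def build_kraken2_command(reads, library_type, compression_type,
                          db_path, output_folder, cpus):
    """Assemble a kraken2 argument list from run_classifier.py-style options."""
    cmd = ["kraken2", "--db", db_path, "--threads", str(cpus)]

    # Paired libraries need --paired; single-end and fasta need no extra flag.
    if library_type == "paired":
        cmd.append("--paired")

    # Compression flags; 'none' adds nothing.
    if compression_type == "gzip":
        cmd.append("--gzip-compressed")
    elif compression_type == "bzip2":
        cmd.append("--bzip2-compressed")

    # Hypothetical output file names inside the chosen output folder.
    cmd += ["--output", f"{output_folder}/kraken2.out",
            "--report", f"{output_folder}/kraken2.report"]

    # Read files go last, in the order given.
    cmd += list(reads)
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True) inside the container, where kraken2 is on the PATH.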
Add the following lines to the Dockerfile for each new database:
WORKDIR /root/kraken2
RUN wget <link_to_kraken2_index_tar.gz>
RUN mkdir <kraken2_index_folder_name>
RUN tar -xzvf <kraken2_index_tar.gz> -C <kraken2_index_folder_name>
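When several databases are added, the four lines above can be generated instead of typed by hand. The helper below is a small sketch with a hypothetical name; it assumes the archive file name is the last path segment of the download link, and the URL in the usage lines is a placeholder, not a real index link.

```python
def dockerfile_lines(index_url, folder_name, kraken2_root="/root/kraken2"):
    """Return the Dockerfile lines that fetch and unpack one Kraken2 index."""
    # wget saves the file under the last segment of the URL.
    archive = index_url.rstrip("/").rsplit("/", 1)[-1]
    return [
        f"WORKDIR {kraken2_root}",
        f"RUN wget {index_url}",
        f"RUN mkdir {folder_name}",
        f"RUN tar -xzvf {archive} -C {folder_name}",
    ]


# Placeholder URL for illustration only.
for line in dockerfile_lines("https://example.org/k2_viral.tar.gz", "k2_viral"):
    print(line)
```

The folder name passed here should match the path registered in config.py, otherwise the -d option will point at a directory that does not exist in the image.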