automatic_kk2_classifier

Dockerfile with kraken2 and a script for automatization

Installation

Clone repositorie

git clone git@github.com:FelipeMelis/automatic_kk2_classifier.git

Download test from Google drive to root directory

https://drive.google.com/drive/folders/1zb2SZ8-lsBAuc-xGdSo-4M2066W3Zy2D

Build docker image

docker build -t bacterial_classifier .

Run docker image and mount current dir

docker run -it -v $PWD:/DATA --rm bacterial_classifier

Updating databases

Go to kraken2 index webpage

Index webpage https://benlangmead.github.io/aws-indexes/k2
Download selected database to kraken2 root folder
Extract the content to a folder with a name related to the DB
Modify the config.py and add the name of the DB and path

Example

Download Viral database to /root/kraken2/ and extract to k2_viral

2.- Modify config.py (provided in this repo) and add "viral": "/root/kraken2/k2_viral" to the python dict

DATABASE_TYPE = {
    "refseq": "/root/kraken2/minikraken2_v1_8GB",
    "silva": "/root/kraken2/16S_SILVA138_k2db",
    "greengenes": "/root/kraken2/16S_Greengenes_k2db",
    "rdp": "/root/kraken2/16S_RDP_k2db",
    "viral": "/root/kraken2/k2_viral" # new database
}

3.- The database will appear at option -d and will be ready to use with that name

usage: run_classifier.py [-h] -l PATH_TO_LIBRARY -t LIBRARY_TYPE -c COMPRESSION_TYPE -d DATABASE_TYPE -o OUTPUT_FOLDER -th TAX_HIERARCHY -cpus CPU_NUMBER

Script that runs kraken2 classifier

optional arguments:
  -h, --help            show this help message and exit
  -l PATH_TO_LIBRARY, --path_to_library PATH_TO_LIBRARY
                        path to the file that contains library reads
  -t LIBRARY_TYPE, --library_type LIBRARY_TYPE
                        library type: ['paired', 'single-end', 'fasta']
  -c COMPRESSION_TYPE, --compression_type COMPRESSION_TYPE
                        compression type: ['gzip', 'bzip2', 'none']
  -d DATABASE_TYPE, --database_type DATABASE_TYPE
                        DB type: ['refseq', 'silva', 'greengenes', 'rdp', 'viral']
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        path to output folder
  -th TAX_HIERARCHY, --tax_hierarchy TAX_HIERARCHY
                        tax_hierarchy: P, C, O, F, G, S, S1
  -cpus CPU_NUMBER, --cpu_number CPU_NUMBER
                        Number of cpus to be used

Dockerfile example

Add the following lines for each new database

WORKDIR /root/kraken2
RUN wget <link_to_kraken2_index_tar.gz>
RUN mkdir <kraken2_index_folder_name> 
RUN tar -xzvf <kraken2_index_tar.gz> -C <kraken2_index_folder_name>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

automatic_kk2_classifier

Installation

Clone repositorie

Download test from Google drive to root directory

Build docker image

Run docker image and mount current dir

Updating databases

Go to kraken2 index webpage

Example

Dockerfile example

Files

README.md

Latest commit

History

README.md

File metadata and controls

automatic_kk2_classifier

Installation

Clone repositorie

Download test from Google drive to root directory

Build docker image

Run docker image and mount current dir

Updating databases

Go to kraken2 index webpage

Example

Dockerfile example