A DeePaC plugin for real-time analysis of Illumina sequencing runs. Captures HiLive2 output and uses deep neural nets to detect novel pathogens directly from NGS reads.
We recommend having a look at:
-
DeePaC main repo: https://gitlab.com/dacs-hpi/deepac
- tutorial
- original bacterial and viral datasets
- code and documentation
-
HiLive2 repo: https://gitlab.com/rki_bioinformatics/HiLive2.
- extensive tutorial
- code and documentation
DeePaC-Live ships new, updated models for bacterial pathogenic potential and viral infectious potential prediction.
The Illumina models are trained on 25-250bp subreads to ensure high performance over the whole sequencing run.
The Nanopore models are trained on 250bp subreads corresponding to just around 0.5s of sequencing.
To fetch the models, install DeePaC or DeePaC-Live and use deepac getmodels --fetch
. In the created directory, you will find the following models ready for inference:
- illu-bac-res18.h5 : an Illumina bacterial model
- illu-vir-res18.h5 : an Illumina viral model
- nano-bac-res18.h5 : a Nanopore bacterial model
- illu-vir-res18.h5 : a Nanopore viral model
We recommend using Bioconda (based on the conda
package manager) or custom Docker images based on official Tensorflow images.
Alternatively, a pip
installation is possible as well.
You can install DeePaC-Live with bioconda
. Set up the bioconda channel first (channel ordering is important):
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
We recommend setting up an isolated conda
environment:
# python 3.6, 3.7 and 3.8 are supported
conda create -n my_env python=3.8
conda activate my_env
and then:
# For GPU support (recommended)
conda install tensorflow-gpu deepaclive
# Basic installation (CPU-only)
conda install deepaclive
Highly recommended: download and compile the latest deepac-live custom models:
deepac getmodels --fetch
If you want to install the DeePaC plugins as well (not necessary), use:
#Note: those models were not designed for reads shorter than 250bp. Performance may be unstable.
conda install deepacvir deepacstrain
Requirements:
- install Docker on your host machine.
- For GPU support, you have to install the NVIDIA Docker support as well.
See TF Docker installation guide and the NVIDIA Docker support installation guide for details. The guide below assumes you have Docker 19.03 or above.
You can then pull the desired image:
# Basic installation - CPU only
docker pull dacshpi/deepaclive:0.3.2
# For GPU support
docker pull dacshpi/deepaclive:0.3.2-gpu
And run it:
# Basic installation - CPU only
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm dacshpi/deepaclive:0.3.2-gpu deepac-live --help
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm dacshpi/deepaclive:0.3.2-gpu deepac-live test
# With GPU support
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm --gpus all dacshpi/deepaclive:0.3.2-gpu deepac-live test
# If you want to use the shell inside the container
docker run -it -v $(pwd):/deepac -u $(id -u):$(id -g) --rm --gpus all dacshpi/deepaclive:0.3.2-gpu bash
The image ships deepaclive
and the main deepac
package along the deepac-vir
and deepac-strain
plugins. See the basic usage guide below for more deepaclive commands.
Optional: download and compile the latest deepac-live custom models:
docker run -v $(pwd):/deepac -u $(id -u):$(id -g) --rm --gpus all dacshpi/deepaclive:0.3.2-gpu deepac --fetch
For more information about the usage of the NVIDIA container toolkit (e.g. selecting the GPUs to use), consult the User Guide.
The dacshpi/deepaclive:latest
corresponds to the latest version of the CPU build. We recommend using explicit version tags instead.
We recommend setting up an isolated conda
environment (see above). Alternatively, you can use a virtualenv
virtual environment (note that deepac requires python 3):
# use -p to use the desired python interpreter (python 3.6 or higher required)
virtualenv -p /usr/bin/python3 my_env
source my_env/bin/activate
You can then install DeePaC with pip
. For GPU support, you need to install CUDA and CuDNN manually first (see TensorFlow installation guide for details).
Then you can do the same as above:
pip install deepaclive
Optional: download and compile the latest deepac-live custom models:
deepac getmodels --fetch
If you want to install the DeePaC plugins as well (not necessary), use:
#Note: those models were not designed for reads shorter than 250bp. Performance may be unstable.
pip install deepacvir deepacstrain
# Run locally: deepac-live Illumina models
deepac-live local -C -m illu-bac-res18.h5 -s 25,50,75,100,133,158,183,208 -l 100 -i hilive-out -o temp -I temp -O output -B ACAG-TCGA,undetermined
deepac-live local -C -m illu-vir-res18.h5 -s 25,50,75,100,133,158,183,208 -l 100 -i hilive-out -o temp -I temp -O output -B ACAG-TCGA,undetermined
# Run locally: custom model
deepac-live local -C -m custom_model.h5 -s 25,50,75,100,133,158,183,208 -l 100 -i hilive-out -o temp -I temp -O output -B ACAG-TCGA,undetermined
# Run locally: built-in model for bacteria (not recommended)
deepac-live local -c deepac -m rapid -s 25,50,75,100,133,158,183,208 -l 100 -i hilive-out -o temp -I temp -O output -B ACAG-TCGA,undetermined
# Run locally: built-in model for viruses (not recommended)
deepac-live local -c deepacvir -m rapid -s 25,50,75,100,133,158,183,208 -l 100 -i hilive-out -o temp -I temp -O output -B ACAG-TCGA,undetermined
# Setup sender on the source machine
deepac-live sender -s 25,50,75,100,133,158,183,208 -l 100 -A -i hilive-out -o temp -r [email protected]:~/rem-temp -k privatekey -B ACAG-TCGA,undetermined
# Setup receiver on the target machine
deepac-live receiver -C -m illu-vir-res18.h5 -m rapid -s 25,50,75,100,133,158,183,208 -l 100 -I rem-temp -O output -B ACAG-TCGA,undetermined
# Setup an ensemble on the target machine
deepac-live refilter -s 25,50,75,100,133,158,183,208 -l 100 -i rem-temp -I output_1,output_2 -O final_output -B ACAG-TCGA,undetermined
# Use another threshold
deepac-live refilter -s 25,50,75,100,133,158,183,208 -l 100 -i rem-temp -I output_1 -O final_output -t 0.75 -B ACAG-TCGA,undetermined
Datasets are available here: .
You can find the scripts and data files used in the paper for dataset preprocessing and benchmarking here.