This repository contains a customized Dockerfile for the Ensembl Variant Effect Predictor (VEP) tool. The image is based on the official Ensembl VEP image and adds the following modifications, which allow VEP to run with the loftee plugin:
- Addition of sqlite3 and its Perl wrapper, along with the required dependencies
- Addition of samtools
- Addition of the loftee.sql database
See the loftee source for more details.
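As a quick sanity check of the additions listed above, a small helper can report which command-line tools are missing from a container. This is only a sketch: the function name `missing_tools` is hypothetical, and it assumes the tools are exposed on `PATH` under the names `sqlite3` and `samtools`. It does not cover the Perl wrapper, which is a Perl module rather than an executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list that is
# not available on PATH. Returns 0 if all are present, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example (run inside the container; tool names are assumptions):
# missing_tools sqlite3 samtools || echo "some tools are missing" >&2
```

The function prints nothing when every tool is found, so it can be used directly in a conditional.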
To build the image, run the following command:
make build-local
The lifecycle of this image is bound to the lifecycle of the official Ensembl VEP image. A GitHub Actions workflow checks for the latest version of the Ensembl VEP image and, if a new version is available, automatically opens a PR against this repository. This behavior is defined in pr.yaml.
When the PR is merged, a new image build is triggered by artifact.yaml.
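The decision at the heart of that workflow, whether the upstream release is newer than the one pinned here, can be sketched in a few lines of shell. This is an illustration only, not the logic in pr.yaml: the function name `release_is_newer` is hypothetical, and it assumes releases are plain integers such as 113 or 114.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the upstream release number is
# greater than the pinned one. Exits with 2 if either argument is not a
# non-negative integer, and with 1 if no update is needed.
release_is_newer() {
    pinned="$1"
    upstream="$2"
    for value in "$pinned" "$upstream"; do
        case "$value" in
            ''|*[!0-9]*) return 2 ;;
        esac
    done
    [ "$upstream" -gt "$pinned" ]
}

# Example:
# release_is_newer 113 114 && echo "open a PR to bump the base image"
```

Returning a distinct exit code for malformed input keeps a parsing problem from being mistaken for "no update needed".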
Note: The default image is hosted on the Open Targets Google Cloud.
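Because each Ensembl release has its own VEP release, the underlying cache data needs to be updated as well. The following script downloads the plugins, cache, and supporting files for a given release and copies them to GCP: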
#!/usr/bin/env bash
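CACHE_DIR='path to local folder'
# Destination for the upload step at the end (placeholder, set to your own gs:// path):
CACHE_TARGET_GCP='gs://path to target bucket'
ENSEMBL_RELEASE='114'
mkdir -p "${CACHE_DIR}"
cd "${CACHE_DIR}"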
# Clone VEP plugins, check out release:
git clone https://github.com/Ensembl/VEP_plugins
cd VEP_plugins
# Release branches in VEP_plugins are named release/<version>:
git checkout "release/${ENSEMBL_RELEASE}"
cd "${CACHE_DIR}"
# Download cache:
wget "https://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}/variation/indexed_vep_cache/homo_sapiens_vep_${ENSEMBL_RELEASE}_GRCh38.tar.gz" -P "${CACHE_DIR}/"
# Extract tar:
tar xzf "homo_sapiens_vep_${ENSEMBL_RELEASE}_GRCh38.tar.gz"
# Some plugins and offline options require access to the FASTA file:
wget "https://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz" -P "${CACHE_DIR}/"
# Re-compress with bgzip (block gzip) so the file can be indexed for random access:
gzip -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
bgzip Homo_sapiens.GRCh38.dna.primary_assembly.fa
# GERP conservation scores require the corresponding bigWig (.bw) file:
wget "https://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}/compara/conservation_scores/91_mammals.gerp_conservation_score/gerp_conservation_scores.homo_sapiens.GRCh38.bw" -P "${CACHE_DIR}/"
# Copy datasets to GCP (CACHE_TARGET_GCP must be set to the destination gs:// path):
gsutil -m cp -r "${CACHE_DIR}/VEP_plugins" "${CACHE_TARGET_GCP}/"
gsutil -m cp -r "${CACHE_DIR}/homo_sapiens" "${CACHE_TARGET_GCP}/"
gsutil -m cp "${CACHE_DIR}/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz" "${CACHE_TARGET_GCP}/"
gsutil -m cp "${CACHE_DIR}/gerp_conservation_scores.homo_sapiens.GRCh38.bw" "${CACHE_TARGET_GCP}/"
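Before uploading, it may be worth confirming that every artifact the script produces is actually present in the cache directory. The following is a minimal sketch: the function name `check_cache_dir` is hypothetical, while the file and directory names are the ones used in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected artifact that is missing from the
# given cache directory. Returns 0 when everything is present, 1 otherwise.
check_cache_dir() {
    cache_dir="$1"
    missing=0
    for entry in \
        VEP_plugins \
        homo_sapiens \
        Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
        gerp_conservation_scores.homo_sapiens.GRCh38.bw
    do
        if [ ! -e "${cache_dir}/${entry}" ]; then
            echo "missing: ${entry}"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_cache_dir "${CACHE_DIR}" || exit 1
```

The check only tests for existence; it does not validate file contents or sizes.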