Annotation Column Versions

Dave Lawrence edited this page Feb 20, 2024 · 15 revisions

Background

We need to change how annotation works, to be able to upgrade it on different systems.

This is driven by:

settings.ANNOTATION["GRCh37"]["columns_version"] = 3

This affects how the VEP command line is built, via ColumnVEPField.min_vep_columns_version and ColumnVEPField.max_vep_columns_version (and a few if statements).
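A minimal sketch of the idea, not the actual VariantGrid code: each field carries an optional minimum and maximum columns version, and only fields whose range includes the configured columns_version contribute to the VEP command line. The `VepField` class, the `fields_for_version` helper, the treatment of `None` as "no bound", and the example bounds are all assumptions made for illustration; only the two attribute names and the settings key come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VepField:
    """Hypothetical stand-in for ColumnVEPField (only the two bounds are from the wiki)."""
    name: str
    min_vep_columns_version: Optional[int] = None  # assumption: None means no lower bound
    max_vep_columns_version: Optional[int] = None  # assumption: None means no upper bound


def fields_for_version(fields, columns_version):
    """Return the fields whose [min, max] version range includes columns_version."""
    selected = []
    for field in fields:
        low = field.min_vep_columns_version
        high = field.max_vep_columns_version
        if low is not None and columns_version < low:
            continue  # field was introduced in a later columns version
        if high is not None and columns_version > high:
            continue  # field was retired before this columns version
        selected.append(field)
    return selected


# Illustrative data only: these bounds are invented for the example.
FIELDS = [
    VepField("always_present"),
    VepField("legacy_only", max_vep_columns_version=1),
    VepField("mavedb_score", min_vep_columns_version=3),
]

# Mirrors the shape of settings.ANNOTATION["GRCh37"]["columns_version"].
ANNOTATION = {"GRCh37": {"columns_version": 3}}

active = fields_for_version(FIELDS, ANNOTATION["GRCh37"]["columns_version"])
print([field.name for field in active])
```

Keeping the version bounds on the field records means a new columns version mostly needs new rows rather than new branching code, which is presumably why only "a few if statements" remain.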

Columns Version 1

This is for legacy versions before columns_version 2, i.e. what VG3 is using.

Columns Version 2

May 2022, for new pathogenicity prediction tools.

Columns Version 3

December 2023

Install VEP version 110

In the VEP plugins directory:

# Replace with bugfixed version
rm MaveDB.pm
wget https://raw.githubusercontent.com/Ensembl/VEP_plugins/main/MaveDB.pm

This added new fields: alphamissense_rankscore, gnomad_faf95, gnomad_faf99, gnomad_fafmax_faf95_max, gnomad_fafmax_faf99_max, gnomad_mid_af, gnomad_non_par, gnomad_xy_ac, gnomad_xy_af, gnomad_xy_an, gnomad_hemi_count, mavedb_score, mavedb_urn

Data

To make sure the files are there:

python3 manage.py vep_data_check

To check the contents of the files, go to the directory that holds the GRCh37/GRCh38 annotation data (something like cd /data/annotation/VEP/annotation_data)

Then run:

md5sum -c ${VARIANTGRID_DIR}/annotation/annotation_data/md5sum_check/vep_110_columns_version3_md5sum.txt
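If `md5sum` is not available, the same check can be approximated in Python. This is a rough sketch, assuming the manifest uses the standard `md5sum` layout of one `<hash>  <relative path>` pair per line; `md5_of_file` and `check_manifest` are hypothetical helper names, not part of VariantGrid.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large annotation files are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path, base_dir="."):
    """Return {relative_path: "OK" | "FAILED" | "MISSING"} for every manifest entry."""
    results = {}
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with a leading '*'
        target = Path(base_dir) / name
        if not target.is_file():
            results[name] = "MISSING"
        elif md5_of_file(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

Calling `check_manifest(manifest, base_dir)` with the manifest path above and the annotation data directory returns a dict that can be filtered for any entry that is not "OK".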

Upgrade steps:

  • Pull / migrate
  • ./annotation/annotation_data/cdot_update.sh

Gene Annotation Release to match VEP 110

RefSeq

cd /tmp
wget https://github.com/SACGF/cdot/releases/download/data_v0.2.22/cdot-0.2.22.GCF_000001405.40_GRCh38.p14_genomic.110.gff.json.gz
python3 manage.py import_gene_annotation --annotation-consortium=RefSeq --genome-build=GRCh38 --release refseq_110_grch38 --json-file /tmp/cdot-0.2.22.GCF_000001405.40_GRCh38.p14_genomic.110.gff.json.gz

Ensembl


python3 manage.py import_gene_annotation --annotation-consortium=Ensembl --genome-build=GRCh38 --release ensembl_110_grch38 --json-file /tmp/cdot-0.2.22.ensembl.Homo_sapiens.GRCh38.110.gff3.json.gz

New ontology

python3 manage.py ontology_import --hgnc_sync --mondo ${OD}/mondo.json --hpo ${OD}/hp.owl --biomart ${OD}/mart_export.txt --phenotype_to_genes ${OD}/phenotype_to_genes.txt --gencc ${OD}/gencc-submissions.csv

Gene Annotation

python3 manage.py import_dbnsfp_gene_annotation --dbnsfp-version=4.5 /data/annotation/incoming/dbNSFP4.5_gene.complete.gz

In Admin, create a new gene annotation version using the latest data for GRCh37/GRCh38 (remember to match the gene annotation release).

Then, in Admin, double-check the latest annotation version: it should have been automatically linked to the new gene annotation version.

In settings files:

Change columns_version = 3

Edit the

ANNOTATION[BUILD_GRCH37]["vep_config"].update({
from annotation.tasks.annotation_scheduler_task import annotation_scheduler
annotation_scheduler(active=False)  # Runs in the background, so the system can still be used as is