Annotation Column Versions
Annotation columns are versioned so that annotation can be upgraded independently on different systems.
The version in use is driven by:
settings.ANNOTATION["GRCh37"]["columns_version"] = 3
This affects how the VEP command line is built, via ColumnVEPField.min_vep_columns_version and ColumnVEPField.max_vep_columns_version (and a few if statements).
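As a rough illustration of how a minimum/maximum columns version could gate which fields end up on the VEP command line, here is a minimal sketch. `VEPFieldSketch` and `fields_for_columns_version` are hypothetical stand-ins, not the real `ColumnVEPField` model; the inclusive bounds and the `None` means "no limit" convention are assumptions, as are the field names `always_present` and `legacy_only`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VEPFieldSketch:
    """Hypothetical stand-in for ColumnVEPField (only the two bounds come from the wiki)."""
    name: str
    min_vep_columns_version: Optional[int] = None  # assumed: None means no lower bound
    max_vep_columns_version: Optional[int] = None  # assumed: None means no upper bound


def fields_for_columns_version(fields, columns_version):
    """Return the fields whose (assumed inclusive) version range contains columns_version."""
    selected = []
    for field in fields:
        too_old = (field.min_vep_columns_version is not None
                   and columns_version < field.min_vep_columns_version)
        too_new = (field.max_vep_columns_version is not None
                   and columns_version > field.max_vep_columns_version)
        if not (too_old or too_new):
            selected.append(field)
    return selected


FIELDS = [
    VEPFieldSketch("always_present"),                          # illustrative name
    VEPFieldSketch("legacy_only", max_vep_columns_version=1),  # illustrative name
    VEPFieldSketch("mavedb_score", min_vep_columns_version=3),
]

names_v3 = [f.name for f in fields_for_columns_version(FIELDS, 3)]
print(names_v3)
```

With `columns_version = 3`, the legacy-only field drops out and the field introduced in version 3 is included.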
This is for legacy versions before columns_version 2, i.e. what VG3 is using.
May 2022, for new pathogenicity prediction tools.
December 2023
Install VEP version 110
In plugins dir,
# Replace with bugfixed version
rm MaveDB.pm
wget https://raw.githubusercontent.com/Ensembl/VEP_plugins/main/MaveDB.pm
This added new fields: alphamissense_rankscore, gnomad_faf95, gnomad_faf99, gnomad_fafmax_faf95_max, gnomad_fafmax_faf99_max, gnomad_mid_af, gnomad_non_par, gnomad_xy_ac, gnomad_xy_af, gnomad_xy_an, gnomad_hemi_count, mavedb_score, mavedb_urn
To make sure the files are there:
python3 manage.py vep_data_check
To check the contents of the files, go to the directory where the GRCh37/GRCh38 annotation data is (something like cd /data/annotation/VEP/annotation_data)
Then run:
md5sum -c ${VARIANTGRID_DIR}/annotation/annotation_data/md5sum_check/vep_110_columns_version3_md5sum.txt
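If `md5sum` is not available, the same check can be approximated in Python. This is a minimal sketch, assuming the checksum file uses the usual `md5sum` layout of one `<hash>  <relative path>` pair per line; `md5_of_file` and `check_md5sums` are hypothetical helpers, not part of VariantGrid.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large annotation files are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5sums(checksum_file, base_dir="."):
    """Return {relative_path: "OK" | "FAILED" | "MISSING"} for each listed file."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum prefixes binary-mode entries with '*'
        target = Path(base_dir) / name
        if not target.is_file():
            results[name] = "MISSING"
        elif md5_of_file(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

Calling `check_md5sums(path_to_checksum_file, ".")` from the annotation data directory returns a status per file, so anything other than "OK" is easy to spot.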
Upgrade steps:
- Pull / migrate
./annotation/annotation_data/cdot_update.sh
cd /tmp
wget https://github.com/SACGF/cdot/releases/download/data_v0.2.22/cdot-0.2.22.GCF_000001405.40_GRCh38.p14_genomic.110.gff.json.gz
python3 manage.py import_gene_annotation --annotation-consortium=RefSeq --genome-build=GRCh38 --release refseq_110_grch38 --json-file /tmp/cdot-0.2.22.GCF_000001405.40_GRCh38.p14_genomic.110.gff.json.gz
python3 manage.py import_gene_annotation --annotation-consortium=Ensembl --genome-build=GRCh38 --release ensembl_110_grch38 --json-file /tmp/cdot-0.2.22.ensembl.Homo_sapiens.GRCh38.110.gff3.json.gz
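The two import commands above differ only in consortium, release name and JSON file, so they can be generated from one helper. This is a minimal sketch: `import_gene_annotation_cmd` is a hypothetical helper, and the `<consortium>_<release>_<build>` naming pattern is inferred from the two examples above rather than documented. Only flags that appear in those commands are used.

```python
import shlex


def import_gene_annotation_cmd(consortium, genome_build, release_number, json_file):
    """Build the argv list for manage.py import_gene_annotation."""
    # Assumed naming pattern, e.g. "refseq_110_grch38"
    release = f"{consortium.lower()}_{release_number}_{genome_build.lower()}"
    return [
        "python3", "manage.py", "import_gene_annotation",
        f"--annotation-consortium={consortium}",
        f"--genome-build={genome_build}",
        "--release", release,
        "--json-file", json_file,
    ]


cmd = import_gene_annotation_cmd(
    "RefSeq", "GRCh38", 110,
    "/tmp/cdot-0.2.22.GCF_000001405.40_GRCh38.p14_genomic.110.gff.json.gz",
)
print(shlex.join(cmd))  # shell-quoted form, for copy/paste or logging
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the VariantGrid directory, which avoids shell quoting problems.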
New ontology
python3 manage.py ontology_import --hgnc_sync --mondo ${OD}/mondo.json --hpo ${OD}/hp.owl --biomart ${OD}/mart_export.txt --phenotype_to_genes ${OD}/phenotype_to_genes.txt --gencc ${OD}/gencc-submissions.csv
Gene Annotation
python3 manage.py import_dbnsfp_gene_annotation --dbnsfp-version=4.5 /data/annotation/incoming/dbNSFP4.5_gene.complete.gz
In Admin, create a new gene annotation version using the latest data for GRCh37/GRCh38 (remember to match the gene annotation release).
Then, in Admin, double check the latest annotation version: it should have been automatically linked to the gene annotation version.
In settings files:
Change columns_version = 3
Edit the
ANNOTATION[BUILD_GRCH37]["vep_config"].update({
from annotation.tasks.annotation_scheduler_task import annotation_scheduler
annotation_scheduler(active=False)  # Runs in the background, so the system can still be used as is in the meantime