Releases: bioinfo-chru-strasbourg/howard

v0.12.1.1

31 Dec 12:10
c08a91e

A few updates and fixes.

Updates

  • Improve transcripts table creation
  • Improve Docker image:
    • Using Micromamba instead of mamba
    • Reduce size by combining layers and cleaning caches

Fixes

  • Fix tag of howard tool and docker images

v0.12.0

16 Dec 15:36
064468c

This release introduces 'BigWig' annotation, prioritization options, a transcripts view and configuration files in YAML format, and improves samples management, INFO/tags renaming, annotation databases generation and operations, and Python packages stability.

News

  • Add annotation method 'BigWig'
  • Add prioritization options:
    • SQL syntax available to define filters
    • New 'Class' prioritization field
  • New transcripts view:
    • Create a transcripts view, using a structure built from multiple source types (e.g. snpEff, external annotation databases)
    • Mapping between multiple transcript ID sources (e.g. refSeq, Ensembl)
    • Transcripts prioritization, using the same prioritization process as for variants
    • Export transcripts table as a file, in multiple formats such as VCF, TSV, Parquet
  • Export with a specific sample list
  • Rename or remove INFO/tags before exporting
  • Configuration and parameters files in YAML format allowed
  • Add dynamic transcript column for NOMEN calculation (using transcript prioritization column)
  • Add plugins:
    • 'update_databases'
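
As an illustration of the SQL-defined prioritization filters listed above, here is a minimal sketch. It is not HOWARD's implementation: it uses Python's built-in sqlite3 instead of DuckDB, and the table layout, the filter tuples and the 'PZClass' column name are assumptions made for the example.

```python
import sqlite3

# Hypothetical filters: (SQL condition, score, flag, class).
FILTERS = [
    ("DP < 30", 0, "FILTERED", "LOW_QUALITY"),
    ("CLNSIG LIKE '%pathogenic%'", 100, "PASS", "PATHOGENIC"),
]


def prioritize_sql(variants, filters):
    """Apply each SQL condition to an in-memory variants table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE variants (id TEXT, DP INTEGER, CLNSIG TEXT, "
        "PZScore INTEGER DEFAULT 0, PZFlag TEXT DEFAULT 'PASS', "
        "PZClass TEXT DEFAULT '')"
    )
    con.executemany(
        "INSERT INTO variants (id, DP, CLNSIG) VALUES (?, ?, ?)", variants
    )
    for condition, score, flag, variant_class in filters:
        # The condition is trusted configuration text, not user input.
        con.execute(
            "UPDATE variants SET PZScore = PZScore + ?, "
            "PZFlag = CASE WHEN ? = 'FILTERED' THEN 'FILTERED' ELSE PZFlag END, "
            "PZClass = ? WHERE " + condition,
            (score, flag, variant_class),
        )
    rows = con.execute(
        "SELECT id, PZScore, PZFlag, PZClass FROM variants ORDER BY id"
    ).fetchall()
    con.close()
    return rows


example = [("v1", 50, "Likely_pathogenic"), ("v2", 10, "benign")]
result = prioritize_sql(example, FILTERS)
```

Each filter is a plain SQL condition, so any expression the database engine accepts can be used to select the variants that receive a given score, flag and class.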

Updates

  • Improve snpEff annotations operations
  • New option 'uniquify' for dbNSFP generation, and identification of column types
  • Management, check and export of samples columns
  • Improve query type mode
  • Improve splice annotation
  • Improve NOMEN generation

Fixes

  • Genotype format detection
  • Fix packages releases
  • Fix parameters and configuration files options
  • Fix calculations list and parametrization
  • Fix empty file export
  • Fix BED annotation with parquet method
  • More explicit log messages

v0.11.0

12 Jul 10:27
d35f922

This release introduces a splice annotation tool, and updates the DuckDB Python package to improve stability.

News

  • Add splice tool with docker image
  • Add snpSift tool to annotate with VCF databases
  • Add quick annotation tool option (e.g. --annotation_parquet, --annotation_snpsift)
  • Add database generation from gene annotations database
  • Add plugins:
    • 'genebe' (GeneBe annotation using REST API)
    • 'minimalize' (Minimalize a VCF file, such as removing INFO/Tags or samples)

Updates

  • DuckDB 1.0.0 stable Snow Duck (Anas Nivis) release
  • Add API Documentation
  • Improve tests

Fixes

  • Paths parameters check fixed (genome and genomes-folders)
  • Fix snpEff download error with databases list

v0.10.0

08 May 16:55
0726681

This release is a refactor of HOWARD (Highly Open Workflow for Annotation & Ranking toward genomic variant Discovery) in Python, using Parquet and duckDB.

HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, translates files in multiple formats (e.g. vcf, tsv, parquet) and generates variants statistics.

See README and GitHub for more explanations.

HOWARD 0.9.15.6

21 Sep 16:23

HOWARD

HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, translates vcf format and generates variants statistics.

HOWARD annotation is mainly based on ANNOVAR and snpEff tools to annotate, using available databases (see ANNOVAR and snpEff) and home made databases. It also uses BCFTOOLS to annotate variants with a VCF file. ANNOVAR and snpEff databases are automatically downloaded if needed.

HOWARD calculation harmonizes allele frequency (VAF), extracts Nomen (transcript, cNomen, pNomen...) from HGVS fields with an optional list of personalized transcripts, generates VaRank format barcode.
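
To make the idea of harmonizing allele frequency concrete, here is a minimal sketch, not HOWARD's actual code. It assumes the caller reports the frequency in one of three common sample fields ('AF' as a fraction, 'FREQ' as a percentage string, or 'AD' as comma-separated allelic depths); the function name and the field priority are hypothetical.

```python
from typing import Optional


def harmonize_vaf(sample: dict) -> Optional[float]:
    """Return a variant allele frequency in [0, 1], or None if unavailable."""
    # 1. Frequency already given as a fraction (e.g. 'AF': '0.42').
    af = sample.get("AF")
    if af not in (None, "."):
        return float(af)
    # 2. Frequency given as a percentage string (e.g. 'FREQ': '42%').
    freq = sample.get("FREQ")
    if freq not in (None, "."):
        return float(freq.rstrip("%")) / 100
    # 3. Allelic depths 'ref,alt[,alt2...]': first alternate allele over total.
    ad = sample.get("AD")
    if ad not in (None, "."):
        depths = [int(depth) for depth in ad.split(",")]
        total = sum(depths)
        if len(depths) >= 2 and total > 0:
            return depths[1] / total
    return None
```

Whatever the caller, downstream steps then read a single VAF value instead of handling each field convention separately.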

HOWARD prioritization algorithm uses profiles to flag variants (as passed or filtered), calculate a prioritization score, and automatically generate a comment for each variant (example: 'polymorphism identified in dbSNP. associated to Lung Cancer. Found in ClinVar database'). Prioritization profiles are defined in a configuration file. A profile is defined as a list of annotation/value pairs, using wildcards and comparison options (contains, lower than, greater than, equal...). Annotation fields may be quality values (usually from callers, such as 'GQ', 'DP') or other annotation fields provided by annotation tools, such as HOWARD itself (example: COSMIC, Clinvar, 1000genomes, PolyPhen, SIFT). Multiple profiles can be used simultaneously, which is useful to define multiple validation/prioritization levels (example: 'standard', 'stringent', 'rare variants', 'low allele frequency').
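
The profile mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not HOWARD's configuration format: the rule dictionaries, the comparator names and the 'PZComment' key are hypothetical, while 'PZFlag' and 'PZScore' are the field names used in the example commands of this page.

```python
import fnmatch

# Comparison options: each takes the annotation value and the rule value.
COMPARATORS = {
    "contains": lambda value, ref: str(ref) in str(value),
    "match": lambda value, ref: fnmatch.fnmatchcase(str(value), str(ref)),
    "lt": lambda value, ref: float(value) < float(ref),
    "gt": lambda value, ref: float(value) > float(ref),
    "eq": lambda value, ref: str(value) == str(ref),
}

# Hypothetical 'somatic' profile: a list of annotation/value rules.
PROFILE = [
    {"field": "DP", "op": "lt", "value": 30,
     "score": 0, "flag": "FILTERED", "comment": "Low depth"},
    {"field": "CLNSIG", "op": "contains", "value": "pathogenic",
     "score": 100, "flag": "PASS", "comment": "Found in ClinVar database"},
]


def prioritize(annotations: dict, profile: list) -> dict:
    """Flag a variant, sum its score and collect comments from matching rules."""
    score, flag, comments = 0, "PASS", []
    for rule in profile:
        value = annotations.get(rule["field"])
        if value is None:
            continue  # annotation absent: the rule does not apply
        if COMPARATORS[rule["op"]](value, rule["value"]):
            score += rule["score"]
            if rule["flag"] == "FILTERED":
                flag = "FILTERED"  # a single filtering rule is enough
            comments.append(rule["comment"])
    return {"PZFlag": flag, "PZScore": score, "PZComment": ". ".join(comments)}
```

Running several profiles then amounts to calling prioritize once per profile and storing each result under a profile-specific name.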

HOWARD translates VCF format into TSV format, by sorting variants using specific fields (example: 'prioritization score', 'allele frequency', 'gene symbol'), including/excluding annotations/fields, including/excluding variants, adding fixed columns.
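
The sorting and field ordering of the translation step can be sketched as below. This is a simplified illustration, not HOWARD's code: it assumes the sort specification has the form 'field:type:order' (with 'n' meaning numeric) and that 'ALL' stands for every remaining column, as suggested by the --sort and --fields values used in the example commands; the function names are hypothetical.

```python
import csv
import io


def parse_sort(spec):
    """Parse items such as 'PZScore:n:DESC' into (field, numeric, descending)."""
    keys = []
    for item in spec.split(","):
        field, kind, order = (item.split(":") + ["", ""])[:3]
        keys.append((field, kind == "n", order.upper() == "DESC"))
    return keys


def sort_variants(rows, spec):
    """Multi-key sort: apply stable sorts from the last key to the first."""
    rows = list(rows)
    for field, numeric, descending in reversed(parse_sort(spec)):
        convert = float if numeric else str
        rows.sort(key=lambda row: convert(row[field]), reverse=descending)
    return rows


def order_fields(columns, fields):
    """Put requested fields first; 'ALL' expands to the remaining columns."""
    requested = fields.split(",")
    explicit = [name for name in requested if name != "ALL"]
    ordered = []
    for name in requested:
        if name == "ALL":
            ordered.extend(c for c in columns if c not in explicit)
        elif name in columns:
            ordered.append(name)
    return ordered


def to_tsv(rows, columns):
    """Write rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


variants = [
    {"GENE": "EGFR", "NOMEN": "c.1A>T", "PZFlag": "FILTERED", "PZScore": "50"},
    {"GENE": "TP53", "NOMEN": "c.2C>G", "PZFlag": "PASS", "PZScore": "9"},
    {"GENE": "KRAS", "NOMEN": "c.3G>A", "PZFlag": "PASS", "PZScore": "100"},
]
columns = order_fields(list(variants[0]), "NOMEN,PZFlag,PZScore,ALL")
ranked = sort_variants(variants, "PZFlag::DESC,PZScore:n:DESC")
tsv = to_tsv(ranked, columns)
```

The numeric type matters here: sorted as text, a score of '9' would rank above '100'.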

HOWARD generates statistics files with a specific algorithm, snpEff and BCFTOOLS.

HOWARD is multithreaded through the number of variants and by database (data-scaling).

Getting Started

To build, set up and create a persistent CLI (running container), the docker-compose command builds images and launches services as containers.

$ docker-compose up

A setup container (HOWARD-setup) automatically downloads the databases required to annotate a HOWARD VCF example with ANNOVAR and snpEff. Host data and databases folders (default ${HOME}/HOWARD), the assembly, and the databases to download are configured in the .env file. See HOWARD, ANNOVAR and snpEff documentation for custom databases download.

A Command Line Interface container (HOWARD-CLI) is started with host data and databases folders mounted. Execute a command, or connect to the CLI as a terminal, and let's start with HOWARD!

Using a HOWARD VCF example, this command:

  • 1- annotates with HGVS (variation identification), outcome and location (functional annotation), and clinical databases (ClinVar and Cosmic),
  • 2- calculates the Variant Allele Frequency (VAF), a genotype barcode (BARCODE), and processes HGVS to extract NOMEN information,
  • 3- prioritizes variations according to prioritization rules specific to somatic focus (quality, functional and clinical annotations),
  • 4- translates into TSV format, with a specific field order for the first 3 columns (ALL for the rest), and a sorting to focus on interesting variations (flagged as PASS, with best score),
  • 5- generates the final file in the host data folder (e.g. ${HOME}/HOWARD/data/example.howard.tsv)
$ docker exec HOWARD-CLI HOWARD --input=/tool/docs/example.vcf --output=/data/example.howard.tsv --annotation=snpeff,hgvs,symbol,outcome,location,CLINVAR,CLINVAR_CLNDN,COSMIC --calculation=VAF,BARCODE,NOMEN --prioritization=SOMATIC --translation=TSV --fields=NOMEN,PZFlag,PZScore,ALL --sort=PZFlag::DESC,PZScore:n:DESC
$ docker exec -ti HOWARD-CLI bash
[data]# HOWARD --help

Docker

The HOWARD image provides a container that runs on CentOS and includes yum modules and other tool dependencies:

  • Java [1.8]
  • bcftools/htslib [1.12]
  • ANNOVAR [2019Oct24]
  • snpEff [5.0e]

Docker Build - Image

The Dockerfile provided with this package provides everything that is needed to build the image. The build system must have Docker installed in order to build the image.

$ cd ${HOME}/HOWARD
$ docker build -t howard:latest .

Docker Run - Container

The container host must have Docker installed in order to run the image as a container. Then the image can be pulled and a container can be started directly. Any standard Docker switches may be provided on the command line when running a container.

$ docker run howard:latest

Mount Data and Databases volumes

In order to make data and databases persistent, host volumes can be mounted. Content may also be copied directly into the running container using a docker cp ... command.

-v ${HOME}/HOWARD/data:/data
-v ${HOME}/HOWARD/databases:/databases

Run as a terminal

To execute commands directly in a container, start the HOWARD container with a terminal interface:

$ docker run --name howard --entrypoint=bash -ti howard:latest

Example

Run HOWARD as a single command.

$ docker run --rm -v ${HOME}/HOWARD/data:/data -v ${HOME}/HOWARD/databases:/databases howard:latest --input=/tool/docs/example.vcf --output=/data/example.howard.tsv --annotation=snpeff,hgvs,symbol,outcome,location,CLINVAR,CLINVAR_CLNDN,COSMIC --calculation=VAF,BARCODE,NOMEN --prioritization=SOMATIC --translation=TSV --fields=NOMEN,PZFlag,PZScore,ALL --sort=PZFlag::DESC,PZScore:n:DESC

Database download

Databases are downloaded automatically by using the annotation configuration file, or options in command line (--annovar_databases, --snpeff_databases, assembly...).

Use a VCF file, such as the HOWARD VCF example, to download ANNOVAR and snpEff databases (WITHOUT multithreading; "ALL" for all databases, "core" for core databases, "snpeff" for snpEff database, or a list of databases, or ANNOVAR code). Use this command multiple times for all needed databases and assemblies (such as hg19, hg38, mm9).

$ docker run howard:latest --input=/tool/docs/example.vcf --output=/tool/docs/example.annotated.vcf --annotation=ALL,snpeff --thread=1 --verbose

Note: For home made databases, refer to config.annotation.ini file to construct and configure your own database.

Note: Beware of proxy configuration!

HOWARD 0.9.15.4

12 Apr 23:38

HOWARD

HOWARD annotates and prioritizes genetic variations, calculates and normalizes annotations, translates vcf format and generates variants statistics.

HOWARD annotation is mainly based on ANNOVAR and snpEff tools to annotate, using available databases (see ANNOVAR and snpEff) and home made databases. It also uses BCFTOOLS to annotate variants with a VCF file. ANNOVAR and snpEff databases are automatically downloaded if needed.

HOWARD calculation harmonizes allele frequency (VAF), extracts Nomen (transcript, cNomen, pNomen...) from HGVS fields with an optional list of personalized transcripts, generates VaRank format barcode.

HOWARD prioritization algorithm uses profiles to flag variants (as passed or filtered), calculate a prioritization score, and automatically generate a comment for each variant (example: 'polymorphism identified in dbSNP. associated to Lung Cancer. Found in ClinVar database'). Prioritization profiles are defined in a configuration file. A profile is defined as a list of annotation/value pairs, using wildcards and comparison options (contains, lower than, greater than, equal...). Annotation fields may be quality values (usually from callers, such as 'GQ', 'DP') or other annotation fields provided by annotation tools, such as HOWARD itself (example: COSMIC, Clinvar, 1000genomes, PolyPhen, SIFT). Multiple profiles can be used simultaneously, which is useful to define multiple validation/prioritization levels (example: 'standard', 'stringent', 'rare variants', 'low allele frequency').

HOWARD translates VCF format into TSV format, by sorting variants using specific fields (example: 'prioritization score', 'allele frequency', 'gene symbol'), including/excluding annotations/fields, including/excluding variants, adding fixed columns.

HOWARD generates statistics files with a specific algorithm, snpEff and BCFTOOLS.

HOWARD is multithreaded through the number of variants and by database (data-scaling).

Getting Started

To build, set up and create a persistent CLI (running container), the docker-compose command builds images and launches services as containers.

$ docker-compose up

A setup container (HOWARD-setup) automatically downloads the databases required to annotate a HOWARD VCF example with ANNOVAR and snpEff. Host data and databases folders (default ${HOME}/HOWARD), the assembly, and the databases to download are configured in the .env file. See HOWARD, ANNOVAR and snpEff documentation for custom databases download.

A Command Line Interface container (HOWARD-CLI) is started with host data and databases folders mounted. Execute a command, or connect to the CLI as a terminal, and let's start with HOWARD!

Using a HOWARD VCF example, this command:

  • 1- annotates with HGVS (variation identification), outcome and location (functional annotation), and clinical databases (ClinVar and Cosmic),
  • 2- calculates the Variant Allele Frequency (VAF), a genotype barcode (BARCODE), and processes HGVS to extract NOMEN information,
  • 3- prioritizes variations according to prioritization rules specific to somatic focus (quality, functional and clinical annotations),
  • 4- translates into TSV format, with a specific field order for the first 3 columns (ALL for the rest), and a sorting to focus on interesting variations (flagged as PASS, with best score),
  • 5- generates the final file in the host data folder (e.g. ${HOME}/HOWARD/data/example.howard.tsv)
$ docker exec HOWARD-CLI HOWARD --input=/tool/docs/example.vcf --output=/data/example.howard.tsv --annotation=hgvs,symbol,outcome,location,CLINVAR,CLINVAR_CLNDN,COSMIC --calculation=VAF,BARCODE,NOMEN --prioritization=SOMATIC --translation=TSV --fields=NOMEN,PZFlag,PZScore,ALL --sort=PZFlag::DESC,PZScore:n:DESC
$ docker exec -ti HOWARD-CLI bash
[data]# HOWARD --help

Docker

The HOWARD image provides a container that runs on CentOS and includes yum modules and other tool dependencies:

  • Java [1.8]
  • bcftools/htslib [1.12]
  • ANNOVAR [2019Oct24]
  • snpEff [5.0e]

Docker Build - Image

The Dockerfile provided with this package provides everything that is needed to build the image. The build system must have Docker installed in order to build the image.

$ cd ${HOME}/HOWARD
$ docker build -t howard:latest .

Docker Run - Container

The container host must have Docker installed in order to run the image as a container. Then the image can be pulled and a container can be started directly. Any standard Docker switches may be provided on the command line when running a container.

$ docker run howard:latest

Mount Data and Databases volumes

In order to make data and databases persistent, host volumes can be mounted. Content may also be copied directly into the running container using a docker cp ... command.

-v ${HOME}/HOWARD/data:/data
-v ${HOME}/HOWARD/databases:/databases

Run as a terminal

To execute commands directly in a container, start the HOWARD container with a terminal interface:

$ docker run --name howard --entrypoint=bash -ti howard:latest

Example

Run HOWARD as a single command.

$ docker run --rm -v ${HOME}/HOWARD/data:/data -v ${HOME}/HOWARD/databases:/databases howard:latest --input=/tool/docs/example.vcf --output=/data/example.howard.tsv --annotation=hgvs,symbol,outcome,location,CLINVAR,CLINVAR_CLNDN,COSMIC --calculation=VAF,BARCODE,NOMEN --prioritization=SOMATIC --translation=TSV --fields=NOMEN,PZFlag,PZScore,ALL --sort=PZFlag::DESC,PZScore:n:DESC

Database download

Databases are downloaded automatically by using the annotation configuration file, or options in command line (--annovar_databases, --snpeff_databases, assembly...).

Use a VCF file, such as the HOWARD VCF example, to download ANNOVAR and snpEff databases (WITHOUT multithreading; "ALL" for all databases, "core" for core databases, "snpeff" for snpEff database, or a list of databases, or ANNOVAR code). Use this command multiple times for all needed databases and assemblies (such as hg19, hg38, mm9).

$ docker run howard:latest --input=/tool/docs/example.vcf --output=/tool/docs/example.annotated.vcf --annotation=ALL,snpeff --thread=1 --verbose

Note: For home made databases, refer to config.annotation.ini file to construct and configure your own database.

Note: Beware of proxy configuration!