Overview
MD5Sum: 106d05940e873de3b46df58eba130e13
Documentation
Documentation for dragen-germline-pipeline v4.3.6
Dockstore
ICAv2
Tenant: umccr-prod
Bundles Generated
Bundle Name: dragen_germline_pipeline_with_validation_data__4_3_6__20250207031159 / Bundle Version v10_r4__20250207031159
Description
This bundle has been generated by the release of workflows/dragen-germline-pipeline/4.3.6/dragen-germline-pipeline__4.3.6.cwl. The pipeline can be found at https://github.com/umccr/cwl-ica/releases/tag/dragen-germline-pipeline/4.3.6__20250207031159.
Version Description
Bundle version description is currently redundant while we cannot append versions to bundles. Regardless - the bundle version is v10_r4
Bundle ID: bd1b6841-5bc0-4506-a7db-72a209f3e685
- Bundle Link
Pipeline Project ID: 5844391a-69db-4b52-86b5-6a0d55c2386f
Pipeline Project Name: pipelines
Pipeline ID: 6b5f6a0c-5d21-43a9-971c-516209927bdc
Pipeline Code: dragen-germline-pipeline__4_3_6__20250207031159
Projects
- development
- staging
Datasets
- dragen_hash_table_chm13_v2_v10_r4_graph_cnv_hla_rna
- dragen_hash_table_chm13_v2_v10_r4_linear_cnv_hla_rna_methylated_combined
- dragen_hash_table_hg38_alt_masked_v10_r4_graph_cnv_hla_rna
- dragen_hash_table_hg38_alt_masked_v10_r4_linear_cnv_hla_rna_methylated_combined
- wgs_validation_fastq__cups_pair_8
- wgs_validation_fastq__2016_249_17_MH_P033
- wgs_validation_fastq__2016_249_18_WH_P025
- wgs_validation_fastq__B_ALL_Case_10
- wgs_validation_fastq_Diploid_Never_Responder
- wgs_validation_fastq_SBJ00303
- wgs_validation_fastq_SEQC50
- wgs_validation_fastq_SFRC01073
Bundle Name: dragen_germline_pipeline_prod__4_3_6__20250207031159 / Bundle Version v10_r4__20250207031159
Description
This bundle has been generated by the release of workflows/dragen-germline-pipeline/4.3.6/dragen-germline-pipeline__4.3.6.cwl. The pipeline can be found at https://github.com/umccr/cwl-ica/releases/tag/dragen-germline-pipeline/4.3.6__20250207031159.
Version Description
Bundle version description is currently redundant while we cannot append versions to bundles. Regardless - the bundle version is v10_r4
Bundle ID: d44f42b3-b920-4790-89f4-9d8e1384eefa
- Bundle Link
Pipeline Project ID: 5844391a-69db-4b52-86b5-6a0d55c2386f
Pipeline Project Name: pipelines
Pipeline ID: 6b5f6a0c-5d21-43a9-971c-516209927bdc
Pipeline Code: dragen-germline-pipeline__4_3_6__20250207031159
Projects
- production
Datasets
- dragen_hash_table_chm13_v2_v10_r4_graph_cnv_hla_rna
- dragen_hash_table_chm13_v2_v10_r4_linear_cnv_hla_rna_methylated_combined
- dragen_hash_table_hg38_alt_masked_v10_r4_graph_cnv_hla_rna
- dragen_hash_table_hg38_alt_masked_v10_r4_linear_cnv_hla_rna_methylated_combined
Visual Overview
Inputs Template
Yaml
# yaml-language-server: $schema=https://github.com/umccr/cwl-ica/releases/download/dragen-germline-pipeline%2F4.3.6__20250207031159/dragen-germline-pipeline__4.3.6__20250207031159.schema.json
# bam input (Optional)
# Docs: Input a normal BAM file for the variant calling stage
bam_input:
  class: File
  location: icav2://project_id/path/to/file
# cnv enable self normalization (Optional)
# Docs: Enable CNV self normalization.
# Self Normalization requires that the DRAGEN hash table be generated with the enable-cnv=true option.
cnv_enable_self_normalization: false
# cram input (Optional)
# Docs: Input a normal CRAM file for the variant calling stage
cram_input:
  class: File
  location: icav2://project_id/path/to/file
# cram reference (Optional)
# Docs: Path to the reference fasta file for the CRAM input.
# Required only if the input is a CRAM file and its reference is not the one in the tarball
cram_reference:
  class: File
  location: icav2://project_id/path/to/file
# dbsnp annotation (Optional)
# Docs: In Germline, Tumor-Normal somatic, or Tumor-Only somatic modes,
# DRAGEN can look up variant calls in a dbSNP database and add annotations for any matches that it finds there.
# To enable the dbSNP database search, set the --dbsnp option to the full path to the dbSNP database
# VCF or .vcf.gz file, which must be sorted in reference order.
dbsnp_annotation:
  class: File
  location: icav2://project_id/path/to/file
# deduplicate minimum quality (Optional)
# Docs: Specifies the Phred quality score below which a base should be excluded from the quality score
# calculation used for choosing among duplicate reads.
dedup_min_qual: string
# enable cnv calling (Optional)
# Docs: Enable CNV processing in the DRAGEN Host Software.
enable_cnv: false
# enable duplicate marking (Optional)
# Docs: Mark identical alignments as duplicates
enable_duplicate_marking: false
# enable hla (Optional)
# Docs: Enable HLA typing by setting --enable-hla flag to true
enable_hla: false
# enable map align (Optional)
# Docs: Enabled by default since --enable-variant-caller option is set to true.
# Set this value to false if using bam_input
enable_map_align: false
# enable map align output (Optional)
# Docs: Whether the output BAM files should be present in the output
enable_map_align_output: false
# enable pgx (Optional)
# Docs: Enable star allele caller. This also turns on other PGx callers such as CYP2D6, CYP2B6
enable_pgx: false
# enable sv (Optional)
# Docs: Enable/disable structural variant
# caller. Default is false.
enable_sv: false
# enable targeted (Optional)
# Docs: Enable targeted variant calling for repetitive regions
enable_targeted: false
# fastq list (Optional)
# Docs: CSV file that contains a list of FASTQ files
# to process.
# Read1File and Read2File may be presigned urls or use this in conjunction with
# the fastq_list_mount_paths inputs.
fastq_list:
  class: File
  location: icav2://project_id/path/to/file
# fastq list rows (Optional)
# Docs: Alternative to providing a file, one can instead provide a list of 'fastq-list-row' objects
fastq_list_rows:
  - rgid: string
    rglb: string
    rgsm: string
    lane: string
    read_1:
      class: File
      location: icav2://project_id/path/to/file
    read_2:
      class: File
      location: icav2://project_id/path/to/file
# hla allele frequency file (Optional)
# Docs: Use the population-level HLA allele frequency file to break ties if one or more HLA alleles produce the same or similar results.
# The input HLA allele frequency file must be in CSV format and contain the HLA alleles and the occurrence frequency in population.
# If --hla-allele-frequency-file is not specified, DRAGEN automatically uses hla_classI_allele_frequency.csv from /opt/edico/config/.
# Population-level allele frequencies can be obtained from the Allele Frequency Net database.
hla_allele_frequency_file:
  class: File
  location: icav2://project_id/path/to/file
# hla bed file (Optional)
# Docs: Use the HLA region BED input file to specify the region to extract HLA reads from.
# DRAGEN HLA Caller parses the input file for regions within the BED file, and then
# extracts reads accordingly to align with the HLA allele reference.
hla_bed_file:
  class: File
  location: icav2://project_id/path/to/file
# hla min reads (Optional)
# Docs: Set the minimum number of reads to align to HLA alleles to ensure sufficient coverage and perform HLA typing.
# The default value is 1000, which is the suggested value for WES samples. If using samples with less coverage, you can use a
# lower threshold value.
hla_min_reads: string
# hla reference file (Optional)
# Docs: Use the HLA allele reference file to specify the reference alleles to align against.
# The input HLA reference file must be in FASTA format and contain the protein sequence separated into exons.
# If --hla-reference-file is not specified, DRAGEN uses hla_classI_ref_freq.fasta from /opt/edico/config/.
# The reference HLA sequences are obtained from the IMGT/HLA database.
hla_reference_file:
  class: File
  location: icav2://project_id/path/to/file
# hla tiebreaker threshold (Optional)
# Docs: If more than one allele has a similar number of reads aligned and there is not a clear indicator for the best allele,
# the alleles are considered as ties. The HLA Caller places the tied alleles into a candidate set for tie breaking based
# on the population allele frequency. If an allele has more than the specified fraction of reads aligned (normalized to
# the top hit), then the allele is included into the candidate set for tie breaking. The default value is 0.97.
hla_tiebreaker_threshold: string
# hla zygosity threshold (Optional)
# Docs: If the minor allele at a given locus has fewer reads mapped than a fraction of the read count of the major allele,
# then the HLA Caller infers homozygosity for the given HLA-I gene. You can use this option to specify the fraction value.
# The default value is 0.15.
hla_zygosity_threshold: string
# license instance id location (Optional)
# Default value: /opt/instance-identity
# Docs: You may wish to supply your own instance identity file.
# Optional value, default set to /opt/instance-identity
# which is a path inside the dragen container
lic_instance_id_location:
  class: File
  location: icav2://project_id/path/to/file
# output format (Optional)
# Docs: For mapping and aligning, the output is sorted and compressed into BAM format by default before saving to disk.
# You can control the output format from the map/align stage with the --output-format <SAM|BAM|CRAM> option.
output_format: SAM
# output prefix (Required)
# Docs: The prefix given to all output files
output_prefix: string
# qc coverage ignore overlaps (Optional)
# Docs: Set to true to resolve all of the alignments for each fragment and avoid double-counting any
# overlapping bases. This might result in marginally longer run times.
# This option also requires setting --enable-map-align=true.
qc_coverage_ignore_overlaps: false
# qc coverage region 1 (Optional)
# Docs: Generates coverage region report using bed file 1.
qc_coverage_region_1:
  class: File
  location: icav2://project_id/path/to/file
# qc coverage region 2 (Optional)
# Docs: Generates coverage region report using bed file 2.
qc_coverage_region_2:
  class: File
  location: icav2://project_id/path/to/file
# qc coverage region 3 (Optional)
# Docs: Generates coverage region report using bed file 3.
qc_coverage_region_3:
  class: File
  location: icav2://project_id/path/to/file
# reference tar (Required)
# Docs: Path to ref data tarball
reference_tar:
  class: File
  location: icav2://project_id/path/to/file
# repeat genotype enable (Optional)
# Docs: Enable DRAGEN repeat expansion detection
repeat_genotype_enable: false
# repeat genotype specs (Optional)
# Docs: Specifies the full path to the JSON file that contains the repeat variant catalog (specification) describing the loci to call.
# --repeat-genotype-specs is required for ExpansionHunter.
# If the option is not provided,
# DRAGEN attempts to autodetect the applicable catalog file from /opt/edico/repeat-specs/ based on the reference provided.
repeat_genotype_specs:
  class: File
  location: icav2://project_id/path/to/file
# repeat genotype use catalog (Optional)
# Docs: The repeat-specification (also called variant catalog) JSON file defines the repeat regions for ExpansionHunter to analyze.
# Default repeat-specification for some pathogenic and polymorphic repeats are in the /opt/edico/repeat-specs/ directory,
# based on the reference genome used with DRAGEN. Users can choose between any of the three default repeat-specification files
# packaged with DRAGEN using <default|default_plus_smn|expanded>
repeat_genotype_use_catalog: default
# sample sex (Optional)
# Docs: Specifies the sex of a sample
sample_sex: male
# sv call regions bed (Optional)
# Docs: Specifies a BED file containing the set of regions to call.
sv_call_regions_bed:
  class: File
  location: icav2://project_id/path/to/file
# sv discovery (Optional)
# Docs: Enable SV discovery. This flag can be set to false only when --sv-forcegt-vcf is used.
# When set to false, SV discovery is disabled and only the forced genotyping input variants
# are processed. The default is true.
sv_discovery: false
# sv enable liquid tumor mode (Optional)
# Docs: Enable liquid tumor mode.
sv_enable_liquid_tumor_mode: false
# sv exome (Optional)
# Docs: Set to true to configure the variant caller for targeted sequencing inputs,
# which includes disabling high depth filters.
# In integrated mode, the default is to autodetect targeted sequencing input,
# and in standalone mode the default is false.
sv_exome: false
# sv forcegt vcf (Optional)
# Docs: Specify a VCF of structural variants for forced genotyping. The variants are scored and emitted
# in the output VCF even if not found in the sample data.
# The variants are merged with any additional variants discovered directly from the sample data.
sv_forcegt_vcf:
  class: File
  location: icav2://project_id/path/to/file
# sv output contigs (Optional)
# Docs: Set to true to have assembled contig sequences output in a VCF file. The default is false.
sv_output_contigs: false
# sv region (Optional)
# Docs: Limit the analysis to a specified region of the genome for debugging purposes.
# This option can be specified multiple times to build a list of regions.
# The value must be in the format "chr:startPos-endPos".
sv_region: string
# sv use overlap pair evidence (Optional)
# Docs: Allow overlapping read pairs to be considered as evidence.
# By default, DRAGEN uses autodetect on the fraction of overlapping read pairs if <20%.
sv_se_overlap_pair_evidence: false
# sv tin contam tolerance (Optional)
# Docs: Set the Tumor-in-Normal (TiN) contamination tolerance level.
# You can enter any value between 0-1. The default maximum TiN contamination tolerance is 0.15.
sv_tin_contam_tolerance: string
# vc decoy contigs (Optional)
# Docs: The --vc-decoy-contigs option specifies a comma-separated list of contigs to skip during variant calling.
# This option can be set in the configuration file.
vc_decoy_contigs: string
# vc emit ref confidence (Optional)
# Docs: A genomic VCF (gVCF) file contains information on variants and positions determined to be homozygous to the reference genome.
# For homozygous regions, the gVCF file includes statistics that indicate how well reads support the absence of variants or
# alternative alleles. To enable gVCF output, set to GVCF. By default, contiguous runs of homozygous reference calls with similar
# scores are collapsed into blocks (hom-ref blocks). Hom-ref blocks save disk space and processing time of downstream analysis tools.
# DRAGEN recommends using the default mode. To produce unbanded output, set --vc-emit-ref-confidence to BP_RESOLUTION.
vc_emit_ref_confidence: string
# vc enable baf (Optional)
# Docs: Enable or disable B-allele frequency output. Enabled by default.
vc_enable_baf: false
# vc enable decoy contigs (Optional)
# Docs: If --vc-enable-decoy-contigs is set to true, variant calls on the decoy contigs are enabled.
# The default value is false.
vc_enable_decoy_contigs: false
# vc enable gatk acceleration (Optional)
# Docs: If set to true, the variant caller runs in GATK mode
# (concordant with GATK 3.7 in germline mode and GATK 4.0 in somatic mode).
vc_enable_gatk_acceleration: false
# vc enable phasing (Optional)
# Docs: The --vc-enable-phasing option enables variants to be phased when possible. The default value is true.
vc_enable_phasing: false
# vc enable roh (Optional)
# Docs: Enable or disable the ROH caller by setting this option to true or false. Enabled by default for human autosomes only.
vc_enable_roh: false
# vc enable sex chr diploid (Optional)
# Docs: For male samples in germline calling mode, DRAGEN calls potential mosaic variants in non-PAR regions of sex chromosomes.
# A variant is called as mosaic when the allele frequency (FORMAT/AF) is below 85% or if multiple alt alleles are called,
# suggesting incompatibility with the haploid assumption. The GT field for bi-allelic mosaic variants is "0/1",
# denoting a mixture of reference and alt alleles, as opposed to the regular GT of "1" for haploid variants.
# The GT field for multi-allelic mosaic variants is "1/2" in VCF.
# You can disable the calling of mosaic variants by setting --vc-enable-sex-chr-diploid to false.
vc_enable_sex_chr_diploid: false
# vc enable vcf output (Optional)
# Docs: The --vc-enable-vcf-output option enables VCF file output during a gVCF run. The default value is false.
vc_enable_vcf_output: false
# vc forcegt vcf (Optional)
# Docs: DRAGEN supports force genotyping (ForceGT) for Germline SNV variant calling.
# To use ForceGT, use the --vc-forcegt-vcf option with a list of small variants to force genotype.
# The input list of small variants can be a .vcf or .vcf.gz file.
# The current limitations of ForceGT are as follows:
# * ForceGT is supported for Germline SNV variant calling in the V3 mode.
# The V1, V2, and V2+ modes are not supported.
# * ForceGT is not supported for Somatic SNV variant calling.
# * ForceGT variants do not propagate through Joint Genotyping.
vc_forcegt_vcf:
  class: File
  location: icav2://project_id/path/to/file
# vc haploid call af threshold (Optional)
# Docs: Option --vc-haploid-call-af-threshold=<af_threshold> to control threshold.
# * Diploid model is applied to haploid (chrX/Y, non-PAR) regions in male samples.
# * Variants with only one alt allele and with AF>=85% are rewritten to haploid calls.
# * The potential mosaic calls with AF<85% will have GT of "0/1" and an INFO tag of
# "MOSAIC" will be added.
vc_haploid_call_af_threshold: string
# vc hard filter (Optional)
# Docs: DRAGEN provides post-VCF variant filtering based on annotations present in the VCF records.
# However, due to the nature of DRAGEN's algorithms, which incorporate the hypothesis of correlated errors
# from within the core of variant caller, the pipeline has improved capabilities in distinguishing
# the true variants from noise, and therefore the dependency on post-VCF filtering is substantially reduced.
# For this reason, the default post-VCF filtering in DRAGEN is very simple
vc_hard_filter: string
# vc max reads per active region (Optional)
# Docs: Specifies the maximum number of reads covering a given active region.
# Default is 10000 for the germline workflow
vc_max_reads_per_active_region: string
# vc max reads per raw region (Optional)
# Docs: Specifies the maximum number of reads covering a given raw region.
# Default is 30000 for the germline workflow
vc_max_reads_per_raw_region: string
# vc ml enable recalibration (Optional)
# Docs: DRAGEN employs machine learning-based variant recalibration (DRAGEN-ML) for germline SNV VC.
# Variant calling accuracy is improved using powerful and efficient machine learning techniques that augment the variant caller,
# by exploiting more of the available read and context information that does not easily integrate into the Bayesian processing
# used by the haplotype variant caller.
vc_ml_enable_recalibration: false
# vc remove all soft clips (Optional)
# Docs: If set to true, the variant caller does not use soft clips of reads to determine variants.
vc_remove_all_soft_clips: false
# vc roh blacklist bed (Optional)
# Docs: If provided, the ROH caller ignores variants that are contained in any region in the blacklist BED file.
# DRAGEN distributes blacklist files for all popular human genomes and automatically selects a blacklist to
# match the genome in use, unless this option is used to explicitly select a file.
vc_roh_blacklist_bed:
  class: File
  location: icav2://project_id/path/to/file
# vc target bed (Optional)
# Docs: This is an optional command line input that restricts processing of the small variant caller,
# target bed related coverage, and callability metrics to regions specified in a BED file.
vc_target_bed:
  class: File
  location: icav2://project_id/path/to/file
# vc target bed padding (Optional)
# Docs: This is an optional command line input that can be used to pad all of the target
# BED regions with the specified value.
# For example, if a BED region is 1:1000-2000 and a padding value of 100 is used,
# it is equivalent to using a BED region of 1:900-2100 and a padding value of 0.
# Any padding added to --vc-target-bed-padding is used by the small variant caller
# and by the target bed coverage/callability reports. The default padding is 0.
vc_target_bed_padding: string
# vc target coverage (Optional)
# Docs: The --vc-target-coverage option specifies the target coverage for down-sampling.
# The default value is 500 for germline mode and 50 for somatic mode.
vc_target_coverage: string
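The padding arithmetic described in the `vc_target_bed_padding` comment of the YAML template can be sketched in Python. This is only an illustration of the worked example in the docs (region 1:1000-2000 with a padding of 100); the helper name `pad_region` is hypothetical and not part of DRAGEN, and clamping the start at zero is an assumption.

```python
from typing import Tuple


def pad_region(chrom: str, start: int, end: int, padding: int) -> Tuple[str, int, int]:
    """Widen a region by `padding` bases on both sides (hypothetical helper)."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # Assumption: coordinates cannot go below zero, so the padded start is clamped.
    return chrom, max(0, start - padding), end + padding


# The example from the docs: 1:1000-2000 padded by 100 behaves like 1:900-2100.
print(pad_region("1", 1000, 2000, 100))  # ('1', 900, 2100)
```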
Json
{
"bam_input": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"cnv_enable_self_normalization": false,
"cram_input": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"cram_reference": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"dbsnp_annotation": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"dedup_min_qual": "string",
"enable_cnv": false,
"enable_duplicate_marking": false,
"enable_hla": false,
"enable_map_align": false,
"enable_map_align_output": false,
"enable_pgx": false,
"enable_sv": false,
"enable_targeted": false,
"fastq_list": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"fastq_list_rows": [
{
"rgid": "string",
"rglb": "string",
"rgsm": "string",
"lane": "string",
"read_1": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"read_2": {
"class": "File",
"location": "icav2://project_id/path/to/file"
}
}
],
"hla_allele_frequency_file": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"hla_bed_file": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"hla_min_reads": "string",
"hla_reference_file": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"hla_tiebreaker_threshold": "string",
"hla_zygosity_threshold": "string",
"lic_instance_id_location": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"output_format": "SAM",
"output_prefix": "string",
"qc_coverage_ignore_overlaps": false,
"qc_coverage_region_1": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"qc_coverage_region_2": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"qc_coverage_region_3": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"reference_tar": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"repeat_genotype_enable": false,
"repeat_genotype_specs": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"repeat_genotype_use_catalog": "default",
"sample_sex": "male",
"sv_call_regions_bed": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"sv_discovery": false,
"sv_enable_liquid_tumor_mode": false,
"sv_exome": false,
"sv_forcegt_vcf": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"sv_output_contigs": false,
"sv_region": "string",
"sv_se_overlap_pair_evidence": false,
"sv_tin_contam_tolerance": "string",
"vc_decoy_contigs": "string",
"vc_emit_ref_confidence": "string",
"vc_enable_baf": false,
"vc_enable_decoy_contigs": false,
"vc_enable_gatk_acceleration": false,
"vc_enable_phasing": false,
"vc_enable_roh": false,
"vc_enable_sex_chr_diploid": false,
"vc_enable_vcf_output": false,
"vc_forcegt_vcf": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"vc_haploid_call_af_threshold": "string",
"vc_hard_filter": "string",
"vc_max_reads_per_active_region": "string",
"vc_max_reads_per_raw_region": "string",
"vc_ml_enable_recalibration": false,
"vc_remove_all_soft_clips": false,
"vc_roh_blacklist_bed": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"vc_target_bed": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"vc_target_bed_padding": "string",
"vc_target_coverage": "string"
}
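Most keys in the templates above are optional; only `output_prefix` and `reference_tar` are marked as required. The following is a minimal sketch, assuming the inputs document is assembled in Python before submission, of building a small inputs object and checking those two keys. The helper name `missing_required` is hypothetical, and the values are the placeholders from the template rather than real ICAv2 URIs.

```python
import json

# Input IDs marked "(Required)" in the template.
REQUIRED_INPUTS = ("output_prefix", "reference_tar")


def missing_required(inputs: dict) -> list:
    """Return the required input IDs that are absent or null (hypothetical helper)."""
    return [key for key in REQUIRED_INPUTS if inputs.get(key) is None]


# Placeholder values copied from the template; replace them with real values.
inputs = {
    "output_prefix": "string",
    "reference_tar": {
        "class": "File",
        "location": "icav2://project_id/path/to/file",
    },
}

problems = missing_required(inputs)
if problems:
    raise SystemExit("Missing required inputs: " + ", ".join(problems))

# Serialize in the same shape as the JSON template.
print(json.dumps(inputs, indent=2))
```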
Outputs Template
{
"dragen_bam_out": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"dragen_germline_output_directory": {
"class": "Directory",
"location": "icav2://project_id/path/to/dir/"
},
"dragen_vcf_out": {
"class": "File",
"location": "icav2://project_id/path/to/file"
},
"multiqc_output_directory": {
"class": "Directory",
"location": "icav2://project_id/path/to/dir/"
}
}
Overrides Template
Zipped workflow
[
"workflow.cwl#dragen-germline-pipeline--4.3.6/dragen_qc_step",
"workflow.cwl#dragen-germline-pipeline--4.3.6/run_dragen_germline_step"
]
Packed workflow
[
"#main/dragen_qc_step",
"#main/run_dragen_germline_step"
]
Inputs
bam input
ID: bam_input
Optional: True
Type: File
Docs:
Input a normal BAM file for the variant calling stage
cnv enable self normalization
ID: cnv_enable_self_normalization
Optional: True
Type: boolean
Docs:
Enable CNV self normalization.
Self Normalization requires that the DRAGEN hash table be generated with the enable-cnv=true option.
cram input
ID: cram_input
Optional: True
Type: File
Docs:
Input a normal CRAM file for the variant calling stage
cram reference
ID: cram_reference
Optional: True
Type: File
Docs:
Path to the reference fasta file for the CRAM input.
Required only if the input is a CRAM file and its reference is not the one in the tarball
dbsnp annotation
ID: dbsnp_annotation
Optional: True
Type: File
Docs:
In Germline, Tumor-Normal somatic, or Tumor-Only somatic modes,
DRAGEN can look up variant calls in a dbSNP database and add annotations for any matches that it finds there.
To enable the dbSNP database search, set the --dbsnp option to the full path to the dbSNP database
VCF or .vcf.gz file, which must be sorted in reference order.
deduplicate minimum quality
ID: dedup_min_qual
Optional: True
Type: int
Docs:
Specifies the Phred quality score below which a base should be excluded from the quality score
calculation used for choosing among duplicate reads.
enable cnv calling
ID: enable_cnv
Optional: True
Type: boolean
Docs:
Enable CNV processing in the DRAGEN Host Software.
enable duplicate marking
ID: enable_duplicate_marking
Optional: True
Type: boolean
Docs:
Mark identical alignments as duplicates
enable hla
ID: enable_hla
Optional: True
Type: boolean
Docs:
Enable HLA typing by setting --enable-hla flag to true
enable map align
ID: enable_map_align
Optional: True
Type: boolean
Docs:
Enabled by default since --enable-variant-caller option is set to true.
Set this value to false if using bam_input
enable map align output
ID: enable_map_align_output
Optional: True
Type: boolean
Docs:
Whether the output BAM files should be present in the output
enable pgx
ID: enable_pgx
Optional: True
Type: boolean
Docs:
Enable star allele caller. This also turns on other PGx callers such as CYP2D6, CYP2B6
enable sv
ID: enable_sv
Optional: True
Type: boolean
Docs:
Enable/disable structural variant
caller. Default is false.
enable targeted
ID: enable_targeted
Optional: True
Type: boolean
Docs:
Enable targeted variant calling for repetitive regions
fastq list
ID: fastq_list
Optional: True
Type: File
Docs:
CSV file that contains a list of FASTQ files
to process.
Read1File and Read2File may be presigned urls or use this in conjunction with
the fastq_list_mount_paths inputs.
fastq list rows
ID: fastq_list_rows
Optional: True
Type: fastq-list-row[]
Docs:
Alternative to providing a file, one can instead provide a list of 'fastq-list-row' objects
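As a rough sketch of the `fastq_list_rows` alternative, the rows can be built as plain objects with the keys shown in the inputs template (`rgid`, `rglb`, `rgsm`, `lane`, `read_1`, `read_2`). The helper names `file_ref` and `make_fastq_list_row` are hypothetical, and the sample values and paths below are made up for illustration; the template only shows placeholders for these fields.

```python
def file_ref(location: str) -> dict:
    """Wrap a URI in the File object shape used throughout the template."""
    return {"class": "File", "location": location}


def make_fastq_list_row(rgid, rglb, rgsm, lane, read_1: str, read_2: str) -> dict:
    """Build one row with the keys shown in the template (hypothetical helper)."""
    return {
        "rgid": rgid,
        "rglb": rglb,
        "rgsm": rgsm,
        "lane": lane,  # passed through unchanged; the template shows only a placeholder
        "read_1": file_ref(read_1),
        "read_2": file_ref(read_2),
    }


# Illustrative values only; none of these identifiers come from the release.
fastq_list_rows = [
    make_fastq_list_row(
        rgid="example_rgid",
        rglb="example_library",
        rgsm="example_sample",
        lane="1",
        read_1="icav2://project_id/path/to/sample_R1.fastq.gz",
        read_2="icav2://project_id/path/to/sample_R2.fastq.gz",
    )
]

print(fastq_list_rows[0]["read_1"]["location"])
```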
hla allele frequency file
ID: hla_allele_frequency_file
Optional: True
Type: File
Docs:
Use the population-level HLA allele frequency file to break ties if one or more HLA alleles produce the same or similar results.
The input HLA allele frequency file must be in CSV format and contain the HLA alleles and the occurrence frequency in population.
If --hla-allele-frequency-file is not specified, DRAGEN automatically uses hla_classI_allele_frequency.csv from /opt/edico/config/.
Population-level allele frequencies can be obtained from the Allele Frequency Net database.
hla bed file
ID: hla_bed_file
Optional: True
Type: File
Docs:
Use the HLA region BED input file to specify the region to extract HLA reads from.
DRAGEN HLA Caller parses the input file for regions within the BED file, and then
extracts reads accordingly to align with the HLA allele reference.
hla min reads
ID: hla_min_reads
Optional: True
Type: int
Docs:
Set the minimum number of reads to align to HLA alleles to ensure sufficient coverage and perform HLA typing.
The default value is 1000, which is the suggested value for WES samples. If using samples with less coverage, you can use a
lower threshold value.
hla reference file
ID: hla_reference_file
Optional: True
Type: File
Docs:
Use the HLA allele reference file to specify the reference alleles to align against.
The input HLA reference file must be in FASTA format and contain the protein sequence separated into exons.
If --hla-reference-file is not specified, DRAGEN uses hla_classI_ref_freq.fasta from /opt/edico/config/.
The reference HLA sequences are obtained from the IMGT/HLA database.
hla tiebreaker threshold
ID: hla_tiebreaker_threshold
Optional: True
Type: float
Docs:
If more than one allele has a similar number of reads aligned and there is not a clear indicator for the best allele,
the alleles are considered as ties. The HLA Caller places the tied alleles into a candidate set for tie breaking based
on the population allele frequency. If an allele has more than the specified fraction of reads aligned (normalized to
the top hit), then the allele is included into the candidate set for tie breaking. The default value is 0.97.
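The candidate-set rule described above can be illustrated with a few lines of Python. This is a sketch of the rule as worded in the docs, not DRAGEN's implementation; the function name `tiebreak_candidates`, the allele names and the read counts are all made up for the example.

```python
def tiebreak_candidates(read_counts: dict, threshold: float = 0.97) -> list:
    """Alleles whose read count, normalized to the top hit, exceeds the threshold."""
    if not read_counts:
        return []
    top = max(read_counts.values())
    if top == 0:
        return []
    # Normalize every count to the top hit and keep those above the threshold.
    return sorted(allele for allele, n in read_counts.items() if n / top > threshold)


# Made-up counts: 980 / 1000 = 0.98 clears the default 0.97, 600 / 1000 does not.
counts = {"A*01:01": 1000, "A*02:01": 980, "A*03:01": 600}
print(tiebreak_candidates(counts))  # ['A*01:01', 'A*02:01']
```

Lowering the threshold widens the candidate set, so more alleles are resolved by population frequency rather than by read count alone.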
hla zygosity threshold
ID: hla_zygosity_threshold
Optional: True
Type: float
Docs:
If the minor allele at a given locus has fewer reads mapped than a fraction of the read count of the major allele,
then the HLA Caller infers homozygosity for the given HLA-I gene. You can use this option to specify the fraction value.
The default value is 0.15.
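The zygosity rule reduces to a single comparison. The snippet below is a sketch of the rule as described, with a hypothetical function name `infers_homozygous` and made-up read counts; it is not taken from DRAGEN itself.

```python
def infers_homozygous(major_reads: int, minor_reads: int, threshold: float = 0.15) -> bool:
    """True when the minor allele has fewer reads than threshold * major allele reads."""
    return minor_reads < threshold * major_reads


# Made-up counts: with the default 0.15, the cut-off for 1000 major reads is 150.
print(infers_homozygous(1000, 100))  # True
print(infers_homozygous(1000, 400))  # False
```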
license instance id location
ID: lic_instance_id_location
Optional: True
Type: ['File', 'string']
Docs:
You may wish to supply your own instance identity file.
Optional value, default set to /opt/instance-identity
which is a path inside the dragen container
output format
ID: output_format
Optional: True
Type: [ SAM | BAM | CRAM ]
Docs:
For mapping and aligning, the output is sorted and compressed into BAM format by default before saving to disk.
You can control the output format from the map/align stage with the --output-format <SAM|BAM|CRAM> option.
output prefix
ID: output_prefix
Optional: False
Type: string
Docs:
The prefix given to all output files
qc coverage ignore overlaps
ID: qc_coverage_ignore_overlaps
Optional: True
Type: boolean
Docs:
Set to true to resolve all of the alignments for each fragment and avoid double-counting any
overlapping bases. This might result in marginally longer run times.
This option also requires setting --enable-map-align=true.
qc coverage region 1
ID: qc_coverage_region_1
Optional: True
Type: File
Docs:
Generates coverage region report using bed file 1.
qc coverage region 2
ID: qc_coverage_region_2
Optional: True
Type: File
Docs:
Generates coverage region report using bed file 2.
qc coverage region 3
ID: qc_coverage_region_3
Optional: True
Type: File
Docs:
Generates coverage region report using bed file 3.
reference tar
ID: reference_tar
Optional: False
Type: File
Docs:
Path to ref data tarball
repeat genotype enable
ID: repeat_genotype_enable
Optional: True
Type: boolean
Docs:
Enable DRAGEN repeat expansion detection
repeat genotype specs
ID: repeat_genotype_specs
Optional: True
Type: ['File', 'string']
Docs:
Specifies the full path to the JSON file that contains the repeat variant catalog (specification) describing the loci to call.
--repeat-genotype-specs is required for ExpansionHunter.
If the option is not provided,
DRAGEN attempts to autodetect the applicable catalog file from /opt/edico/repeat-specs/ based on the reference provided.
repeat genotype use catalog
ID: repeat_genotype_use_catalog
Optional: True
Type: [ default | default_plus_smn | expanded ]
Docs:
The repeat-specification (also called variant catalog) JSON file defines the repeat regions for ExpansionHunter to analyze.
Default repeat-specification for some pathogenic and polymorphic repeats are in the /opt/edico/repeat-specs/ directory,
based on the reference genome used with DRAGEN. Users can choose between any of the three default repeat-specification files
packaged with DRAGEN using <default|default_plus_smn|expanded>
sample sex
ID: sample_sex
Optional: True
Type: [ male | female ]
Docs:
Specifies the sex of a sample
sv call regions bed
ID: sv_call_regions_bed
Optional: True
Type: File
Docs:
Specifies a BED file containing the set of regions to call.
sv discovery
ID: sv_discovery
Optional: True
Type: boolean
Docs:
Enable SV discovery. This flag can be set to false only when --sv-forcegt-vcf is used.
When set to false, SV discovery is disabled and only the forced genotyping input variants
are processed. The default is true.
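The constraint between sv_discovery and sv_forcegt_vcf can be checked before submitting a run. The following is a minimal sketch, not part of the pipeline: the function name check_sv_discovery is hypothetical, and it assumes that an unset input is represented as None.

```python
from typing import Optional


def check_sv_discovery(sv_discovery: Optional[bool],
                       sv_forcegt_vcf: Optional[str]) -> None:
    """Raise if SV discovery is disabled without a forced-genotyping VCF.

    Hypothetical helper: None stands for "input not set", in which case
    the documented default (discovery enabled) applies.
    """
    if sv_discovery is False and sv_forcegt_vcf is None:
        raise ValueError(
            "sv_discovery can be set to false only when sv_forcegt_vcf is provided"
        )


# Discovery left at its default, no forced-genotyping VCF: accepted.
check_sv_discovery(None, None)
# Discovery disabled together with a forced-genotyping VCF: accepted.
check_sv_discovery(False, "known_svs.vcf.gz")
```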
sv enable liquid tumor mode
ID: sv_enable_liquid_tumor_mode
Optional: True
Type: boolean
Docs:
Enable liquid tumor mode.
sv exome
ID: sv_exome
Optional: True
Type: boolean
Docs:
Set to true to configure the variant caller for targeted sequencing inputs,
which includes disabling high depth filters.
In integrated mode, the default is to autodetect targeted sequencing input,
and in standalone mode the default is false.
sv forcegt vcf
ID: sv_forcegt_vcf
Optional: True
Type: File
Docs:
Specify a VCF of structural variants for forced genotyping. The variants are scored and emitted
in the output VCF even if not found in the sample data.
The variants are merged with any additional variants discovered directly from the sample data.
sv output contigs
ID: sv_output_contigs
Optional: True
Type: boolean
Docs:
Set to true to have assembled contig sequences output in a VCF file. The default is false.
sv region
ID: sv_region
Optional: True
Type: string
Docs:
Limit the analysis to a specified region of the genome for debugging purposes.
This option can be specified multiple times to build a list of regions.
The value must be in the format "chr:startPos-endPos".
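A value for sv_region can be sanity-checked against the "chr:startPos-endPos" format before a run is launched. This is a minimal sketch under stated assumptions: parse_sv_region is a hypothetical helper, the positions are assumed to be non-negative integers with start not greater than end, and the contig name is allowed to contain colons itself (the last colon is treated as the separator).

```python
import re

# Assumed shape: <contig>:<start>-<end>. The greedy contig group lets names
# that contain ':' still parse, because the final ':<digits>-<digits>' wins.
_REGION_RE = re.compile(r"^(?P<chrom>\S+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_sv_region(value: str) -> tuple[str, int, int]:
    """Split a 'chr:startPos-endPos' string into (chrom, start, end)."""
    match = _REGION_RE.match(value)
    if match is None:
        raise ValueError(
            f"sv_region must look like 'chr:startPos-endPos', got {value!r}"
        )
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start ({start}) is greater than end ({end})")
    return match["chrom"], start, end


print(parse_sv_region("chr1:1000-2000"))
```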
sv use overlap pair evidence
ID: sv_se_overlap_pair_evidence
Optional: True
Type: boolean
Docs:
Allow overlapping read pairs to be considered as evidence.
By default, DRAGEN uses autodetect on the fraction of overlapping read pairs if <20%.
sv tin contam tolerance
ID: sv_tin_contam_tolerance
Optional: True
Type: float
Docs:
Set the Tumor-in-Normal (TiN) contamination tolerance level.
You can enter any value between 0-1. The default maximum TiN contamination tolerance is 0.15.
vc decoy contigs
ID: vc_decoy_contigs
Optional: True
Type: string
Docs:
The --vc-decoy-contigs option specifies a comma-separated list of contigs to skip during variant calling.
This option can be set in the configuration file.
vc emit ref confidence
ID: vc_emit_ref_confidence
Optional: True
Type: string
Docs:
A genomic VCF (gVCF) file contains information on variants and positions determined to be homozygous to the reference genome.
For homozygous regions, the gVCF file includes statistics that indicate how well reads support the absence of variants or
alternative alleles. To enable gVCF output, set to GVCF. By default, contiguous runs of homozygous reference calls with similar
scores are collapsed into blocks (hom-ref blocks). Hom-ref blocks save disk space and processing time of downstream analysis tools.
DRAGEN recommends using the default mode. To produce unbanded output, set --vc-emit-ref-confidence to BP_RESOLUTION.
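To illustrate what collapsing hom-ref calls into blocks means, the sketch below merges contiguous positions whose scores fall into the same band. It is an illustration only and not DRAGEN's actual banding rule: the function band_hom_ref, the use of a single quality score per position, and the band width of 10 are all assumptions made for the example.

```python
def band_hom_ref(calls, band_width=10):
    """Collapse per-position hom-ref calls into (start, end, min_score) blocks.

    calls: list of (position, score) tuples sorted by position.
    band_width: hypothetical width of a score band; positions are merged
    only if they are adjacent and their scores share a band.
    """
    blocks = []
    for pos, score in calls:
        band = score // band_width
        last = blocks[-1] if blocks else None
        if last and last["end"] + 1 == pos and last["band"] == band:
            # Extend the current block and track its lowest score.
            last["end"] = pos
            last["min_score"] = min(last["min_score"], score)
        else:
            blocks.append(
                {"start": pos, "end": pos, "band": band, "min_score": score}
            )
    return [(b["start"], b["end"], b["min_score"]) for b in blocks]


per_base = [(1, 32), (2, 35), (3, 38), (4, 51), (5, 55), (7, 56)]
print(band_hom_ref(per_base))
```

In this toy input, six per-base records become three blocks, which is the disk-space saving the banded mode aims for. BP_RESOLUTION corresponds to keeping the per-base list as it is.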
vc enable baf
ID: vc_enable_baf
Optional: True
Type: boolean
Docs:
Enable or disable B-allele frequency output. Enabled by default.
vc enable decoy contigs
ID: vc_enable_decoy_contigs
Optional: True
Type: boolean
Docs:
If --vc-enable-decoy-contigs is set to true, variant calls on the decoy contigs are enabled.
The default value is false.
vc enable gatk acceleration
ID: vc_enable_gatk_acceleration
Optional: True
Type: boolean
Docs:
If set to true, the variant caller runs in GATK mode
(concordant with GATK 3.7 in germline mode and GATK 4.0 in somatic mode).
vc enable phasing
ID: vc_enable_phasing
Optional: True
Type: boolean
Docs:
The --vc-enable-phasing option enables variants to be phased when possible. The default value is true.
vc enable roh
ID: vc_enable_roh
Optional: True
Type: boolean
Docs:
Enable or disable the ROH caller by setting this option to true or false. Enabled by default for human autosomes only.
vc enable sex chr diploid
ID: vc_enable_sex_chr_diploid
Optional: True
Type: boolean
Docs:
For male samples in germline calling mode, DRAGEN calls potential mosaic variants in non-PAR regions of sex chromosomes.
A variant is called as mosaic when the allele frequency (FORMAT/AF) is below 85% or if multiple alt alleles are called,
suggesting incompatibility with the haploid assumption. The GT field for bi-allelic mosaic variants is "0/1",
denoting a mixture of reference and alt alleles, as opposed to the regular GT of "1" for haploid variants.
The GT field for multi-allelic mosaic variants is "1/2" in VCF.
You can disable the calling of mosaic variants by setting --vc-enable-sex-chr-diploid to false.
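The genotype rule described above can be written as a small decision function. This is a sketch of the documented behaviour only, not DRAGEN code: classify_non_par_call is a hypothetical name, AF is taken as a fraction between 0 and 1, and the 0.85 default mirrors the 85% figure quoted in the text.

```python
def classify_non_par_call(af: float, n_alt_alleles: int,
                          af_threshold: float = 0.85) -> tuple[str, bool]:
    """Return (GT, is_mosaic) for a variant in a male non-PAR chrX/Y region.

    af: allele frequency (FORMAT/AF) as a fraction.
    n_alt_alleles: number of alt alleles called at the site (at least 1).
    """
    if n_alt_alleles < 1:
        raise ValueError("expected at least one alt allele")
    if n_alt_alleles > 1:
        # Multiple alt alleles are incompatible with the haploid assumption.
        return "1/2", True
    if af >= af_threshold:
        # Single alt allele with high AF: regular haploid call.
        return "1", False
    # Single alt allele with low AF: potential mosaic variant.
    return "0/1", True


print(classify_non_par_call(0.97, 1))
print(classify_non_par_call(0.40, 1))
print(classify_non_par_call(0.97, 2))
```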
vc enable vcf output
ID: vc_enable_vcf_output
Optional: True
Type: boolean
Docs:
The --vc-enable-vcf-output option enables VCF file output during a gVCF run. The default value is false.
vc forcegt vcf
ID: vc_forcegt_vcf
Optional: True
Type: File
Docs:
DRAGEN supports force genotyping (ForceGT) for Germline SNV variant calling.
To use ForceGT, use the --vc-forcegt-vcf option with a list of small variants to force genotype.
The input list of small variants can be a .vcf or .vcf.gz file.
The current limitations of ForceGT are as follows:
- ForceGT is supported for Germline SNV variant calling in the V3 mode.
The V1, V2, and V2+ modes are not supported.
- ForceGT is not supported for Somatic SNV variant calling.
- ForceGT variants do not propagate through Joint Genotyping.
vc haploid call af threshold
ID: vc_haploid_call_af_threshold
Optional: True
Type: float
Docs:
Use --vc-haploid-call-af-threshold=<af_threshold> to control the allele frequency (AF) threshold for haploid calls.
- Diploid model is applied to haploid (chrX/Y, non-PAR) regions in male samples.
- Variants with only one alt allele and with AF>=85% are rewritten to haploid calls.
- The potential mosaic calls with AF<85% will have GT of "0/1" and an INFO tag of
"MOSAIC" will be added.
vc hard filter
ID: vc_hard_filter
Optional: True
Type: string
Docs:
DRAGEN provides post-VCF variant filtering based on annotations present in the VCF records.
However, due to the nature of DRAGEN's algorithms, which incorporate the hypothesis of correlated errors
from within the core of variant caller, the pipeline has improved capabilities in distinguishing
the true variants from noise, and therefore the dependency on post-VCF filtering is substantially reduced.
For this reason, the default post-VCF filtering in DRAGEN is very simple.
vc max reads per active region
ID: vc_max_reads_per_active_region
Optional: True
Type: int
Docs:
Specifies the maximum number of reads covering a given active region.
The default is 10000 for the germline workflow.
vc max reads per raw region
ID: vc_max_reads_per_raw_region
Optional: True
Type: int
Docs:
Specifies the maximum number of reads covering a given raw region.
The default is 30000 for the germline workflow.
vc ml enable recalibration
ID: vc_ml_enable_recalibration
Optional: True
Type: boolean
Docs:
DRAGEN employs machine learning-based variant recalibration (DRAGEN-ML) for germline SNV VC.
Variant calling accuracy is improved using powerful and efficient machine learning techniques that augment the variant caller,
by exploiting more of the available read and context information that does not easily integrate into the Bayesian processing
used by the haplotype variant caller.
vc remove all soft clips
ID: vc_remove_all_soft_clips
Optional: True
Type: boolean
Docs:
If set to true, the variant caller does not use soft clips of reads to determine variants.
vc roh blacklist bed
ID: vc_roh_blacklist_bed
Optional: True
Type: File
Docs:
If provided, the ROH caller ignores variants that are contained in any region in the blacklist BED file.
DRAGEN distributes blacklist files for all popular human genomes and automatically selects a blacklist to
match the genome in use, unless this option is used to explicitly select a file.
vc target bed
ID: vc_target_bed
Optional: True
Type: File
Docs:
This is an optional command line input that restricts processing of the small variant caller,
target bed related coverage, and callability metrics to regions specified in a BED file.
vc target bed padding
ID: vc_target_bed_padding
Optional: True
Type: int
Docs:
This is an optional command line input that can be used to pad all of the target
BED regions with the specified value.
For example, if a BED region is 1:1000-2000 and a padding value of 100 is used,
it is equivalent to using a BED region of 1:900-2100 and a padding value of 0.
Any padding added to --vc-target-bed-padding is used by the small variant caller
and by the target bed coverage/callability reports. The default padding is 0.
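The padding arithmetic in the example above can be reproduced with a few lines of code. This is a minimal sketch rather than DRAGEN's implementation: pad_region is a hypothetical helper, and clamping the padded start at 0 is an assumption (clamping the end to the contig length is not modelled).

```python
def pad_region(chrom: str, start: int, end: int,
               padding: int = 0) -> tuple[str, int, int]:
    """Widen a target region by `padding` bases on each side.

    Assumption: the start is clamped at 0 so padding never produces a
    negative coordinate.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    return chrom, max(0, start - padding), end + padding


# Region 1:1000-2000 with a padding of 100 behaves like 1:900-2100 with no padding.
print(pad_region("1", 1000, 2000, padding=100))
```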
vc target coverage
ID: vc_target_coverage
Optional: True
Type: int
Docs:
The --vc-target-coverage option specifies the target coverage for down-sampling.
The default value is 500 for germline mode and 50 for somatic mode.
Steps
dragen qc step
ID: dragen-germline-pipeline--4.3.6/dragen_qc_step
Step Type: tool
Docs:
The dragen qc step - this takes in an array of dirs
run dragen germline step
ID: dragen-germline-pipeline--4.3.6/run_dragen_germline_step
Step Type: tool
Docs:
Runs the dragen germline workflow on the FPGA.
Takes in either a fastq list as a file or a fastq_list_rows schema object
Outputs
dragen bam out
ID: dragen-germline-pipeline--4.3.6/dragen_bam_out
Optional: True
Output Type: File
Docs:
The output bam file, exists only if --enable-map-align-output is set to true
dragen germline output directory
ID: dragen-germline-pipeline--4.3.6/dragen_germline_output_directory
Optional: False
Output Type: Directory
Docs:
The output directory containing all germline output files
dragen vcf out
ID: dragen-germline-pipeline--4.3.6/dragen_vcf_out
Optional: True
Output Type: File
Docs:
The output germline vcf file
multiqc output directory
ID: dragen-germline-pipeline--4.3.6/multiqc_output_directory
Optional: False
Output Type: Directory
Docs:
The output directory for multiqc