singleton.wdl inputs and outputs

DAG (simplified)

---
title: singleton.wdl
---
flowchart TD
  subgraph "`**Upstream of Phasing**`"
    subgraph "per-movie"
      ubam[/"HiFi uBAM"/] --> pbmm2_align["pbmm2 align"]
      pbmm2_align --> pbsv_discover["PBSV discover"]
    end
    pbmm2_align --> merge_read_stats["merge read statistics"]
    pbmm2_align --> samtools_merge["samtools merge"]
    samtools_merge --> mosdepth["mosdepth"]
    samtools_merge --> paraphase["Paraphase"]
    samtools_merge --> hificnv["HiFiCNV"]
    samtools_merge --> trgt["TRGT"]
    samtools_merge --> trgt_dropouts["TR coverage dropouts"]
    samtools_merge --> deepvariant["DeepVariant"]
    pbsv_discover --> pbsv_call["PBSV call"]
  end
  subgraph "`**Phasing and Downstream**`"
    deepvariant --> hiphase["HiPhase"]
    trgt --> hiphase
    pbsv_call --> hiphase
    hiphase --> bcftools_roh["bcftools roh"]
    hiphase --> bcftools_stats["bcftools stats\n(small variants)"]
    hiphase --> sv_stats["SV stats"]
    hiphase --> cpg_pileup["5mCpG pileup"]
    hiphase --> starphase["StarPhase"]
    hiphase --> pharmcat["PharmCAT"]
    starphase --> pharmcat
  end
  subgraph "`**Tertiary Analysis**`"
    hiphase --> slivar_small_variants["slivar small variants"]
    hiphase --> svpack["svpack filter and annotate"]
    svpack --> slivar_svpack["slivar svpack tsv"]
  end
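
The dependency structure in the diagram can also be written as a plain mapping and sorted topologically, which helps when checking which steps must finish before another can start. This is an illustrative sketch only: the node names are copied from the diagram, `DEPENDS_ON` and `run_order` are hypothetical names, and the real scheduling is done by the WDL execution engine rather than by code like this.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step from the diagram; its value is the set of steps it waits on.
# (Hypothetical representation; the workflow itself does not expose this mapping.)
DEPENDS_ON = {
    "pbmm2_align": {"ubam"},
    "pbsv_discover": {"pbmm2_align"},
    "merge_read_stats": {"pbmm2_align"},
    "samtools_merge": {"pbmm2_align"},
    "mosdepth": {"samtools_merge"},
    "paraphase": {"samtools_merge"},
    "hificnv": {"samtools_merge"},
    "trgt": {"samtools_merge"},
    "trgt_dropouts": {"samtools_merge"},
    "deepvariant": {"samtools_merge"},
    "pbsv_call": {"pbsv_discover"},
    "hiphase": {"deepvariant", "trgt", "pbsv_call"},
    "bcftools_roh": {"hiphase"},
    "bcftools_stats": {"hiphase"},
    "sv_stats": {"hiphase"},
    "cpg_pileup": {"hiphase"},
    "starphase": {"hiphase"},
    "pharmcat": {"hiphase", "starphase"},
    "slivar_small_variants": {"hiphase"},
    "svpack": {"hiphase"},
    "slivar_svpack": {"svpack"},
}

# One valid execution order; many others exist because independent steps can be swapped.
run_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Any order produced this way places HiPhase after DeepVariant, TRGT, and PBSV call, and PharmCAT after both HiPhase and StarPhase, matching the arrows in the diagram.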


Inputs

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| String | sample_id | Unique identifier for the sample | Alphanumeric characters, periods, dashes, and underscores are allowed. |
| String? | sex | Sample sex | Used by HiFiCNV and TRGT for genotyping. Allosome karyotype will default to XX unless sex is specified as "MALE". |
| Array[File] | hifi_reads | Array of paths to HiFi reads in unaligned BAM format. | |
| File | ref_map_file | TSV containing reference genome file paths; must match backend | |
| String? | phenotypes | Comma-delimited list of HPO terms. | Human Phenotype Ontology (HPO) phenotypes associated with the cohort. If omitted, tertiary analysis will be skipped. |
| File? | tertiary_map_file | TSV containing tertiary analysis file paths and thresholds; must match backend | AF/AC/nhomalt thresholds can be modified, but this will affect performance. If omitted, tertiary analysis will be skipped. |
| Boolean | gpu | Use GPU when possible. Default: false | GPU support |
| String | backend | Backend where the workflow will be executed. ["GCP", "Azure", "AWS-AGC", "AWS-HealthOmics", "HPC"] | |
| String? | zones | Zones where compute will take place; required if backend is set to 'AWS' or 'GCP'. | Determining available zones in GCP |
| String? | gpuType | GPU type to use; required if gpu is set to true for cloud backends; must match backend | Available GPU types |
| String? | container_registry | Container registry where workflow images are hosted. Default: "" | If omitted, PacBio's public registry will be used. Custom container_registry must be set if backend is set to 'AWS-HealthOmics'. |
| Boolean | preemptible | Where possible, run tasks preemptibly. [true, false]. Default: true | If set to true, run tasks preemptibly where possible. If set to false, on-demand VMs will be used for every task. Ignored if backend is set to HPC. |
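
Several inputs depend on one another (zones and gpuType on backend, container_registry on 'AWS-HealthOmics', tertiary analysis on two optional inputs). The sketch below checks only the rules stated in the table. It makes several assumptions: `check_inputs` and `tertiary_will_run` are hypothetical helpers, not part of the workflow; keys are shown without the workflow-name prefix that a real WDL inputs JSON normally carries; 'AWS' in the zones rule is read as any backend whose name starts with "AWS"; and "cloud backends" is read as every backend except HPC.

```python
import re

# Backend names exactly as listed in the inputs table.
BACKENDS = {"GCP", "Azure", "AWS-AGC", "AWS-HealthOmics", "HPC"}
# Alphanumeric characters, periods, dashes, and underscores.
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def check_inputs(inputs):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not SAMPLE_ID_PATTERN.fullmatch(str(inputs.get("sample_id", ""))):
        problems.append("sample_id: only alphanumerics, '.', '-', and '_' are allowed")

    backend = inputs.get("backend")
    if backend not in BACKENDS:
        problems.append(f"backend: expected one of {sorted(BACKENDS)}")
        return problems  # the remaining rules all depend on a valid backend

    # Assumption: 'AWS' in the table covers both AWS-AGC and AWS-HealthOmics.
    if (backend == "GCP" or backend.startswith("AWS")) and not inputs.get("zones"):
        problems.append("zones: required for this backend")
    # Assumption: every backend other than HPC counts as a cloud backend.
    if inputs.get("gpu", False) and backend != "HPC" and not inputs.get("gpuType"):
        problems.append("gpuType: required when gpu is true on a cloud backend")
    if backend == "AWS-HealthOmics" and not inputs.get("container_registry"):
        problems.append("container_registry: must be set for AWS-HealthOmics")
    return problems


def tertiary_will_run(inputs):
    """Tertiary analysis is skipped if either optional input is omitted."""
    return bool(inputs.get("phenotypes")) and bool(inputs.get("tertiary_map_file"))
```

For example, `check_inputs({"sample_id": "HG002", "backend": "HPC"})` returns an empty list, while the same sample on "GCP" without zones reports one problem.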


Outputs

Alignments, Coverage, and QC

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| String | workflow_name | Workflow name | |
| String | workflow_version | Workflow version | |
| File | stats_file | Table of summary statistics | |
| File | bam_stats | BAM stats | Per-read length and read quality |
| File | read_length_plot | Read length plot | |
| File? | read_quality_plot | Read quality plot | |
| File | merged_haplotagged_bam | Merged, haplotagged alignments | Includes unmapped reads |
| File | merged_haplotagged_bam_index | | |
| File | mosdepth_summary | Summary of aligned read depth | |
| File | mosdepth_region_bed | Median aligned read depth by 500 bp windows | |
| File | mosdepth_region_bed_index | | |
| File | mosdepth_depth_distribution_plot | | |
| File | mapq_distribution_plot | Distribution of mapping quality per alignment | |
| File | mg_distribution_plot | Distribution of gap-compressed identity score per alignment | |
| String | stat_num_reads | Number of reads | |
| String | stat_read_length_mean | Mean read length | |
| String | stat_read_length_median | Median read length | |
| String | stat_read_quality_mean | Mean read quality | |
| String | stat_read_quality_median | Median read quality | |
| String | stat_mapped_read_count | Count of reads mapped to reference | |
| String | stat_mapped_percent | Percent of reads mapped to reference | |
| String | inferred_sex | Inferred sex | Sex is inferred based on relative depth of chrY alignments. |
| String | stat_mean_depth | Mean depth | |

Small Variants (<50 bp)

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | phased_small_variant_vcf | Phased small variant VCF | |
| File | phased_small_variant_vcf_index | | |
| File | small_variant_gvcf | Small variant GVCF | Can be used for joint-calling. |
| File | small_variant_gvcf_index | | |
| File | small_variant_stats | Small variant stats | Generated by bcftools stats. |
| String | stat_small_variant_SNV_count | SNV count | (PASS variants) |
| String | stat_small_variant_INDEL_count | INDEL count | (PASS variants) |
| String | stat_small_variant_TSTV_ratio | Ts/Tv ratio | (PASS variants) |
| String | stat_small_variant_HETHOM_ratio | Het/Hom ratio | (PASS variants) |
| File | snv_distribution_plot | Distribution of SNVs by REF, ALT | |
| File | indel_distribution_plot | Distribution of indels by size | |
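
As a reminder of what the Ts/Tv ratio measures, the sketch below classifies SNVs by their REF and ALT alleles and divides transitions (A<->G, C<->T) by transversions (all other substitutions). It is a generic illustration and not the workflow's implementation: the reported value is generated by bcftools stats, and `ts_tv_ratio` and its input format are hypothetical.

```python
# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Compute Ts/Tv from (REF, ALT) pairs of single-nucleotide variants.

    Returns None when there are no transversions, since the ratio is undefined.
    """
    pairs = [(ref.upper(), alt.upper()) for ref, alt in snvs]
    transitions = sum(1 for pair in pairs if pair in TRANSITIONS)
    transversions = len(pairs) - transitions
    if transversions == 0:
        return None
    return transitions / transversions
```

The function assumes every pair is a true SNV with REF different from ALT; filtering to PASS variants would happen before calling it.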

Structural Variants (≥50 bp)

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | phased_sv_vcf | Phased structural variant VCF | |
| File | phased_sv_vcf_index | Index for phased structural variant VCF | |
| String | stat_sv_DUP_count | Structural variant DUP count | (PASS variants) |
| String | stat_sv_DEL_count | Structural variant DEL count | (PASS variants) |
| String | stat_sv_INS_count | Structural variant INS count | (PASS variants) |
| String | stat_sv_INV_count | Structural variant INV count | (PASS variants) |
| String | stat_sv_BND_count | Structural variant BND count | (PASS variants) |
| File | bcftools_roh_out | ROH calling | bcftools roh |
| File | bcftools_roh_bed | Generated from above, without filtering | |

Copy Number Variants (≥100 kb)

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | cnv_vcf | CNV VCF | |
| File | cnv_vcf_index | Index for CNV VCF | |
| File | cnv_copynum_bedgraph | CNV copy number BEDGraph | |
| File | cnv_depth_bw | CNV depth BigWig | |
| File | cnv_maf_bw | CNV MAF BigWig | |
| String | stat_cnv_DUP_count | Count of DUP events | (PASS variants) |
| String | stat_cnv_DEL_count | Count of DEL events | (PASS variants) |
| String | stat_cnv_DUP_sum | Sum of DUP bp | (PASS variants) |
| String | stat_cnv_DEL_sum | Sum of DEL bp | (PASS variants) |

Tandem Repeat Genotyping

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | phased_trgt_vcf | Phased TRGT VCF | |
| File | phased_trgt_vcf_index | | |
| File | trgt_spanning_reads | TRGT spanning reads | |
| File | trgt_spanning_reads_index | | |
| File | trgt_coverage_dropouts | TRGT coverage dropouts | |
| String | stat_trgt_genotyped_count | Count of genotyped sites | |
| String | stat_trgt_uncalled_count | Count of ungenotyped sites | |

Variant Phasing

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | phase_stats | Phasing stats | |
| File | phase_blocks | Phase blocks | |
| File | phase_haplotags | Per-read haplotag assignment | |
| String | stat_phased_basepairs | Count of bp within phase blocks | |
| String | stat_phase_block_ng50 | Phase block NG50 | |
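
For readers unfamiliar with NG50, the sketch below applies the general definition to phase blocks: sort blocks from longest to shortest and report the length of the block at which the running total first reaches half of the genome size. This is a generic illustration, not the workflow's own calculation: `phase_block_ng50` is a hypothetical helper, and the genome size used for the reported statistic is not stated here, so `genome_size` is an assumed parameter.

```python
def phase_block_ng50(block_lengths, genome_size):
    """Return the NG50 of the given block lengths against genome_size.

    Returns 0 when the blocks together cover less than half of genome_size,
    in which case NG50 is not defined.
    """
    covered = 0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        # Compare 2 * covered to avoid fractional arithmetic on genome_size / 2.
        if 2 * covered >= genome_size:
            return length
    return 0
```

With blocks of 5, 4, 3, 2, and 1 bp against a 20 bp genome, the running total reaches 10 bp only after adding the 3 bp block, so the NG50 is 3.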

Variant Calling in Dark Regions

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | paraphase_output_json | Paraphase output JSON | |
| File | paraphase_realigned_bam | Paraphase realigned BAM | |
| File | paraphase_realigned_bam_index | | |
| File? | paraphase_vcfs | Paraphase VCFs | Compressed as .tar.gz |

5mCpG Methylation Calling

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File? | cpg_hap1_bed | CpG hap1 BED | |
| File? | cpg_hap1_bed_index | | |
| File? | cpg_hap2_bed | CpG hap2 BED | |
| File? | cpg_hap2_bed_index | | |
| File? | cpg_combined_bed | CpG combined BED | |
| File? | cpg_combined_bed_index | | |
| File? | cpg_hap1_bw | CpG hap1 BigWig | |
| File? | cpg_hap2_bw | CpG hap2 BigWig | |
| File? | cpg_combined_bw | CpG combined BigWig | |
| String | stat_cpg_hap1_count | Hap1 CpG count | |
| String | stat_cpg_hap2_count | Hap2 CpG count | |
| String | stat_cpg_combined_count | Combined CpG count | |

PGx Typing

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File | pbstarphase_json | PBstarPhase JSON | Haplotype calls for PGx loci |
| File? | pharmcat_match_json | PharmCAT match JSON | |
| File? | pharmcat_phenotype_json | PharmCAT phenotype JSON | |
| File? | pharmcat_report_html | PharmCAT report HTML | |
| File? | pharmcat_report_json | PharmCAT report JSON | |

Tertiary Analysis

| Type | Name | Description | Notes |
| --- | --- | --- | --- |
| File? | pedigree | Pedigree file in PLINK PED format | |
| File? | tertiary_small_variant_filtered_vcf | Filtered, annotated small variant VCF | |
| File? | tertiary_small_variant_filtered_vcf_index | | |
| File? | tertiary_small_variant_filtered_tsv | Filtered, annotated small variant calls | |
| File? | tertiary_small_variant_compound_het_vcf | Filtered, annotated compound heterozygous small variant VCF | |
| File? | tertiary_small_variant_compound_het_vcf_index | | |
| File? | tertiary_small_variant_compound_het_tsv | Filtered, annotated compound heterozygous small variant calls | |
| File? | tertiary_sv_filtered_vcf | Filtered, annotated structural variant VCF | |
| File? | tertiary_sv_filtered_vcf_index | | |
| File? | tertiary_sv_filtered_tsv | Filtered, annotated structural variant TSV | |