🔧 accept cram input
🔧 Svaba CRAM adjustments
dmiller15 committed Sep 7, 2022
1 parent 2f92e5f commit a9de7f5
Showing 3 changed files with 44 additions and 36 deletions.
14 changes: 7 additions & 7 deletions docs/GERMLINE_SV_README.md
@@ -6,8 +6,8 @@

The Kids First Data Resource Center (KFDRC) Germline Structural Variant (SV)
Caller Workflow is a common workflow language (CWL) implementation to generate
SV calls from an aligned reads BAM file. The workflow makes use of Manta and
SvABA to call variants then annotates these variants using AnnotSV.
SV calls from an aligned reads BAM or CRAM file. The workflow makes use of
Manta and SvABA to call variants then annotates these variants using AnnotSV.

## Relevant Softwares and Versions

@@ -50,8 +50,8 @@ and has built-in support for case-control experiments (e.g. tumor/normal, or
trios or quads). In case/control mode, any number of cases and controls (but
min of 1 case) can be input, and will jointly assemble all sequences together.
If both a case and control are present, variants are output separately in
"somatic" and "germline" VCFs. If only a single BAM is present (input with the
-t flag), a single SV and a single indel VCF will be emitted.
"somatic" and "germline" VCFs. If only a single BAM/CRAM is present (input with
the -t flag), a single SV and a single indel VCF will be emitted.

A BWA-MEM index reference genome must also be supplied with -G.

@@ -65,10 +65,10 @@ potential pathogenicity and ii) filter out SV potential false positives.
## Input Files

At the moment the workflow uses only a few inputs:
- `germline_bam`: The germline BAM input that has been aligned to a reference
genome.
- `germline_reads`: The germline BAM/CRAM input that has been aligned to a
reference genome.
- `indexed_reference_fasta`: The reference genome fasta (and associated
indices) to which the germline BAM was aligned.
indices) to which the germline BAM/CRAM was aligned.
- `annotsv_annotations_dir`: These annotations are simply those from the
install-human-annotation installation process run during AnnotSV installation
(see: https://github.com/lgmgeo/AnnotSV/#quick-installation). Specifically
42 changes: 24 additions & 18 deletions tools/svaba.cwl
@@ -19,7 +19,7 @@ doc: |
trios or quads). In case/control mode, any number of cases and controls (but
min of 1 case) can be input, and will jointly assemble all sequences together.
If both a case and control are present, variants are output separately in
"somatic" and "germline" VCFs. If only a single BAM is present (input with the
"somatic" and "germline" VCFs. If only a single SAM is present (input with the
-t flag), a single SV and a single indel VCF will be emitted.

A BWA-MEM index reference genome must also be supplied with -G.
@@ -30,9 +30,15 @@ requirements:
- class: ResourceRequirement
ramMin: $(inputs.ram * 1000)
coresMin: $(inputs.cores)
baseCommand: ['svaba','run']
baseCommand: []
arguments:
- position: 0
shellQuote: false
valueFrom: >
seq_cache_populate.pl -r $PWD/ref_cache $(inputs.reference_genome.path)
&& export REF_CACHE=$PWD/ref_cache/%2s/%2s/%s
&& svaba run
- position: 11
valueFrom: "--g-zip"
inputs:
tumor_bams:
@@ -41,9 +47,9 @@ inputs:
items: File
inputBinding:
prefix: '--tumor-bam'
secondaryFiles: [{pattern: '^.bai', required: true}]
secondaryFiles: [{pattern: '^.bai', required: false}, {pattern: '.bai', required: false}, {pattern: '^.crai', required: false}, {pattern: '.crai', required: false}]
inputBinding:
position: 0
position: 11
doc: "Case BAM/CRAM/SAM file (eg tumor). Can input multiple."
normal_bams:
type:
@@ -52,26 +58,26 @@ inputs:
items: File
inputBinding:
prefix: '--normal-bam'
secondaryFiles: [{pattern: '^.bai', required: true}]
secondaryFiles: [{pattern: '^.bai', required: false}, {pattern: '.bai', required: false}, {pattern: '^.crai', required: false}, {pattern: '.crai', required: false}]
inputBinding:
position: 0
position: 11
doc: "Control BAM/CRAM/SAM file (eg normal). Can input multiple. Optional."
reference_genome: { type: 'File', secondaryFiles: [{pattern: '.fai', required: true}, {pattern: '.64.amb', required: true}, {pattern: '.64.ann', required: true}, {pattern: '.64.bwt', required: true}, {pattern: '.64.pac', required: true}, {pattern: '.64.sa', required: true}], inputBinding: { prefix: '--reference-genome', position: 0 }, doc: "Path to indexed reference genome to be used by BWA-MEM." }
reference_genome: { type: 'File', secondaryFiles: [{pattern: '.fai', required: true}, {pattern: '.64.amb', required: true}, {pattern: '.64.ann', required: true}, {pattern: '.64.bwt', required: true}, {pattern: '.64.pac', required: true}, {pattern: '.64.sa', required: true}], inputBinding: { prefix: '--reference-genome', position: 11 }, doc: "Path to indexed reference genome to be used by BWA-MEM." }

dbsnp_vcf: { type: 'File?', inputBinding: { prefix: '--dbsnp-vcf', position: 0 }, doc: "DBsnp database (VCF) to compare indels against" }
region_file: { type: 'File?', inputBinding: { prefix: '--region-file', position: 0 }, doc: "Run on targeted intervals. Accepts BED file" }
blacklist: { type: 'File?', inputBinding: { prefix: '--blacklist', position: 0 }, doc: "BED-file with blacklisted regions to not extract any reads from." }
germline_sv_database: { type: 'File?', inputBinding: { prefix: '--germline-sv-database', position: 0 }, doc: "BED file containing sites of known germline SVs. Used as additional filter for somatic SV detection." }
dbsnp_vcf: { type: 'File?', inputBinding: { prefix: '--dbsnp-vcf', position: 11 }, doc: "DBsnp database (VCF) to compare indels against" }
region_file: { type: 'File?', inputBinding: { prefix: '--region-file', position: 11 }, doc: "Run on targeted intervals. Accepts BED file" }
blacklist: { type: 'File?', inputBinding: { prefix: '--blacklist', position: 11 }, doc: "BED-file with blacklisted regions to not extract any reads from." }
germline_sv_database: { type: 'File?', inputBinding: { prefix: '--germline-sv-database', position: 11 }, doc: "BED file containing sites of known germline SVs. Used as additional filter for somatic SV detection." }

germline: { type: 'boolean?', inputBinding: { prefix: '--germline', position: 0}, doc: "Sets recommended settings for case-only analysis (eg germline). (-I, -L5, assembles NM >= 3 reads)" }
rules: { type: 'boolean?', inputBinding: { valueFrom: "$(self ? '--rules all' : '')", position: 0 }, doc: "Default behavior is just assemble clipped/discordant/unmapped/gapped reads. Override?" }
highly_parallel: { type: 'boolean?', inputBinding: { prefix: '--hp', position: 0 }, doc: "Highly parallel. Don't write output until completely done. More memory, but avoids all thread-locks." }
no_interchrom_lookup: { type: 'boolean?', inputBinding: { prefix: '--no-interchrom-lookup', position: 0 }, doc: "Set true to not do mate-region lookup if mates are mapped to different chromosome." }
germline: { type: 'boolean?', inputBinding: { prefix: '--germline', position: 11}, doc: "Sets recommended settings for case-only analysis (eg germline). (-I, -L5, assembles NM >= 3 reads)" }
rules: { type: 'boolean?', inputBinding: { valueFrom: "$(self ? '--rules all' : '')", position: 11 }, doc: "Default behavior is just assemble clipped/discordant/unmapped/gapped reads. Override?" }
highly_parallel: { type: 'boolean?', inputBinding: { prefix: '--hp', position: 11 }, doc: "Highly parallel. Don't write output until completely done. More memory, but avoids all thread-locks." }
no_interchrom_lookup: { type: 'boolean?', inputBinding: { prefix: '--no-interchrom-lookup', position: 11 }, doc: "Set true to not do mate-region lookup if mates are mapped to different chromosome." }

mate_lookup_min: { type: 'int?', inputBinding: { prefix: '--mate-lookup-min', position: 0 }, doc: "Minimum number of somatic reads required to attempt mate-region lookup" }
mate_lookup_min: { type: 'int?', inputBinding: { prefix: '--mate-lookup-min', position: 11 }, doc: "Minimum number of somatic reads required to attempt mate-region lookup" }

output_basename: { type: 'string', inputBinding: { prefix: '--id-string', position: 0 }, doc: "String specifying the analysis ID to be used as part of ID common." }
cores: { type: 'int?', default: 16, inputBinding: { prefix: '--threads', position: 0 }, doc: "Use NUM threads to run svaba." }
output_basename: { type: 'string', inputBinding: { prefix: '--id-string', position: 11 }, doc: "String specifying the analysis ID to be used as part of ID common." }
cores: { type: 'int?', default: 16, inputBinding: { prefix: '--threads', position: 11 }, doc: "Use NUM threads to run svaba." }
ram: { type: 'int?', default: 16, doc: "Minimum ram to allocate to the task." }
outputs:
alignments: { type: 'File', outputBinding: { glob: "*.alignments.txt.gz" }, doc: "An ASCII plot of variant-supporting contigs and the BWA-MEM alignment of reads to the contigs" }
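
Note on the new `arguments` entry in tools/svaba.cwl: before `svaba run`, the command now populates a local cache with `seq_cache_populate.pl` and exports `REF_CACHE=$PWD/ref_cache/%2s/%2s/%s`, presumably so that CRAM input can be decoded against a local copy of the reference. The Python sketch below illustrates how such a template is commonly understood to map a sequence MD5 onto a cache path (`%2s` takes two characters of the digest, `%s` takes the rest). The function names and the example digest are hypothetical, and the expansion rule is an assumption about the htslib convention rather than something this commit states.

```python
import hashlib
import re


def sequence_md5(sequence: str) -> str:
    """MD5 of a sequence, assuming the SAM M5 convention:
    whitespace removed and letters upper-cased before hashing."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def ref_cache_path(template: str, md5_hex: str) -> str:
    """Expand a REF_CACHE-style template (assumed rule):
    each %Ns consumes N characters of the digest, %s consumes the remainder."""
    remaining = md5_hex

    def take(match: re.Match) -> str:
        nonlocal remaining
        width = match.group(1)
        count = int(width) if width else len(remaining)
        piece, remaining = remaining[:count], remaining[count:]
        return piece

    return re.sub(r"%(\d*)s", take, template)


if __name__ == "__main__":
    # Hypothetical digest, used only to show the directory layout.
    digest = "0123456789abcdef0123456789abcdef"
    print(ref_cache_path("/work/ref_cache/%2s/%2s/%s", digest))
```

With this layout every reference sequence lands in its own file two directory levels deep, which keeps any single directory from growing too large.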
24 changes: 13 additions & 11 deletions workflows/kfdrc-germline-sv-wf.cwl
@@ -11,8 +11,8 @@ doc: |

The Kids First Data Resource Center (KFDRC) Germline Structural Variant (SV)
Caller Workflow is a common workflow language (CWL) implementation to generate
SV calls from an aligned reads BAM file. The workflow makes use of Manta and
SvABA to call variants then annotates these variants using AnnotSV.
SV calls from an aligned reads BAM or CRAM file. The workflow makes use of
Manta and SvABA to call variants then annotates these variants using AnnotSV.

## Relevant Softwares and Versions

@@ -55,8 +55,8 @@ doc: |
trios or quads). In case/control mode, any number of cases and controls (but
min of 1 case) can be input, and will jointly assemble all sequences together.
If both a case and control are present, variants are output separately in
"somatic" and "germline" VCFs. If only a single BAM is present (input with the
-t flag), a single SV and a single indel VCF will be emitted.
"somatic" and "germline" VCFs. If only a single BAM/CRAM is present (input with
the -t flag), a single SV and a single indel VCF will be emitted.

A BWA-MEM index reference genome must also be supplied with -G.

@@ -70,10 +70,10 @@ doc: |
## Input Files

At the moment the workflow uses only a few inputs:
- `germline_bam`: The germline BAM input that has been aligned to a reference
genome.
- `germline_reads`: The germline BAM/CRAM input that has been aligned to a
reference genome.
- `indexed_reference_fasta`: The reference genome fasta (and associated
indices) to which the germline BAM was aligned.
indices) to which the germline BAM/CRAM was aligned.
- `annotsv_annotations_dir`: These annotations are simply those from the
install-human-annotation installation process run during AnnotSV installation
(see: https://github.com/lgmgeo/AnnotSV/#quick-installation). Specifically
@@ -136,8 +136,10 @@ inputs:
{class: File, path: 6063901d357c3a53540ca81e, name: Homo_sapiens_assembly38.fasta.64.bwt},
{class: File, path: 6063901c357c3a53540ca801, name: Homo_sapiens_assembly38.fasta.64.pac},
{class: File, path: 60639015357c3a53540ca7a9, name: Homo_sapiens_assembly38.fasta.64.sa}]}
germline_bam: {type: 'File', secondaryFiles: [{pattern: '^.bai', required: false},
{pattern: '.bai', required: false}], doc: "Input BAM file", "sbg:fileTypes": "BAM"}
germline_reads: {type: 'File', secondaryFiles: [{pattern: '^.bai', required: false},
{pattern: '.bai', required: false}, {pattern: '^.crai', required: false}, {
pattern: '.crai', required: false}], doc: "Input BAM file", "sbg:fileTypes": "BAM,\
\ CRAM"}

annotsv_annotations_dir: {type: 'File', doc: "TAR.GZ'd Directory containing AnnotSV\
\ annotations", "sbg:fileTypes": "TAR, TAR.GZ, TGZ", "sbg:suggestedValue": {
@@ -178,7 +180,7 @@ steps:
run: ../tools/svaba.cwl
in:
tumor_bams:
source: germline_bam
source: germline_reads
valueFrom: $([self])
reference_genome: indexed_reference_fasta
germline:
@@ -197,7 +199,7 @@ steps:
in:
reference: indexed_reference_fasta
input_normal_reads:
source: germline_bam
source: germline_reads
valueFrom: $([self])
output_basename:
source: output_basename
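
Note on the `secondaryFiles` patterns used for `germline_reads`, `tumor_bams`, and `normal_bams`: all four patterns are marked `required: false`, so either index naming style is accepted for BAM or CRAM input. The Python sketch below shows how CWL resolves such patterns as described in the CWL specification: each leading `^` removes one file extension from the primary path, then the rest of the pattern is appended. The helper names and sample file names are hypothetical and only illustrate the rule.

```python
def secondary_file_path(primary: str, pattern: str) -> str:
    """Resolve one CWL secondaryFiles pattern against a primary file path."""
    path = primary
    while pattern.startswith("^"):
        pattern = pattern[1:]
        dot = path.rfind(".")
        # Strip an extension only if the dot belongs to the file name itself.
        if dot > path.rfind("/"):
            path = path[:dot]
    return path + pattern


# The four patterns this commit lists for reads inputs.
INDEX_PATTERNS = ["^.bai", ".bai", "^.crai", ".crai"]


def candidate_indexes(primary: str) -> list[str]:
    """All index paths a CWL runner would look for next to the primary file."""
    return [secondary_file_path(primary, p) for p in INDEX_PATTERNS]


if __name__ == "__main__":
    for reads in ("sample.bam", "sample.cram"):  # hypothetical file names
        print(reads, candidate_indexes(reads))
```

Because none of the patterns is required, a missing index is not caught at staging time; it would only surface when the tool itself tries to open the index.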
