📝 updated readme to reflect new updates
🔨 added param to allow for cram re-mapping
🔨 edited all related tools and subwf to accept new cram input
migbro committed Aug 22, 2022
1 parent 74fb0da commit 637e33f
Showing 10 changed files with 168 additions and 64 deletions.
@@ -41,8 +41,8 @@ For more information see: https://github.com/kids-first/kf-alignment-workflow#ou
## Sentieon Alignment: Similarities and Differences

The two workflows start identically; both workflows start by splitting the
-input BAMs into read group (RG) BAMs using samtools split then convert those RG
-BAMs into FASTQ files using biobambam2 bamtofastq. After FASTQ creation, the
+input SAMs/BAMs/CRAMs (Alignment/Map files, or AMs) into read group (RG) AMs using samtools split then convert those RG
+AMs into FASTQ files using biobambam2 bamtofastq. After FASTQ creation, the
two workflows diverge in software usage. Whereas the KFDRC GATK pipeline uses a
wide variety of tools (bwa, sambamba, samblaster, GATK, Picard, and samtools)
to generate the realigned CRAMs, the KFDRC Sentieon pipeline uses exclusively
15 changes: 8 additions & 7 deletions readme.md
@@ -11,7 +11,7 @@ this can be used later on for further analysis in joint trio genotyping and subs
This workflow is the current production workflow, equivalent to this [Cavatica public app](https://cavatica.sbgenomics.com/public/apps#cavatica/apps-publisher/kfdrc-alignment-workflow) and supersedes the [old workflow](https://github.com/kids-first/kf-alignment-workflow/tree/1.0.0) and [public app](https://cavatica.sbgenomics.com/public/apps#kids-first-drc/kids-first-drc-alignment-workflow/kfdrc-alignment-bam2cram2gvcf/); however outputs are considered equivalent.

## Input Agnostic Alignment Workflow
-Workflow for the alignment or realignment of input BAMs, PE reads, and/or SE reads; conditionally generate gVCF and metrics.
+Workflow for the alignment or realignment of input SAMs/BAMs/CRAMs (Alignment/Map files, or AMs), PE reads, and/or SE reads; conditionally generate gVCF and metrics.

This workflow is an all-in-one workflow for handling any kind of reads inputs: BAM inputs, PE reads
and mates inputs, SE reads inputs, or any combination of these. The workflow will naively attempt
@@ -68,6 +68,7 @@ to `true`; no additonal inputs are required.
input_se_rgs_list: { type: 'string[]?', doc: "List of RG strings to use in SE processing" }
run_bam_processing: { type: boolean, doc: "BAM processing will be run. Requires: input_bam_list" }
run_pe_reads_processing: { type: boolean, doc: "PE reads processing will be run. Requires: input_pe_reads_list, input_pe_mates_list, input_pe_rgs_list" }
+cram_reference: { type: 'File?', doc: "If aligning from cram, need to provided reference used to generate that cram" }
run_se_reads_processing: { type: boolean, doc: "SE reads processing will be run. Requires: input_se_reads_list, input_se_rgs_list" }
# IF WGS or CREATE gVCF
wgs_calling_interval_list: { type: 'File?', doc: "WGS interval list used to aid scattering Haplotype caller" }
@@ -113,18 +114,18 @@ to `true`; no additonal inputs are required.
#### Detailed Input Information:
The pipeline is built to handle three distinct input types:
-1. BAMs
+1. SAMs/BAMs/CRAMs (Alignment/Map files, or AMs)
1. PE Fastqs
1. SE Fastqs
-Additionally, the workflow supports these three in any combination. You can have PE Fastqs and BAMs,
-PE Fastqs and SE Fastqs, BAMS and PE Fastqs and SE Fastqs, etc. Each of these three classes will be
+Additionally, the workflow supports these three in any combination. You can have PE Fastqs and AMs,
+PE Fastqs and SE Fastqs, AMs and PE Fastqs and SE Fastqs, etc. Each of these three classes will be
processed and aligned separately and the resulting BWA aligned bams will be merged into a final BAM
before performing steps like BQSR and Metrics collection.
-##### BAM Inputs
-The BAM processing portion of the pipeline is the simplest when it comes to inputs. You may provide
-a single BAM or many BAMs. The input for BAMs is a file list. In Cavatica or other GUI interfaces,
+##### Alignment/Map Inputs
+The Alignment/Map processing portion of the pipeline is the simplest when it comes to inputs. You may provide
+a single Alignment/Map file or many AMs. The input for AMs is a file list. In Cavatica or other GUI interfaces,
simply select the files you wish to process. For command line interfaces such as cwltool, your input
should look like the following.
```json
3 changes: 3 additions & 0 deletions subworkflows/kfdrc_process_bam.cwl
@@ -11,6 +11,8 @@ inputs:
secondaryFiles: ['.64.amb', '.64.ann', '.64.bwt', '.64.pac', '.64.sa', '.64.alt', '^.dict']
sample_name: string
min_alignment_score: int?
+cram_reference: { type: 'File?', doc: "If aligning from cram, need to provided reference used to generate that cram" }

outputs:
unsorted_bams:
type:
@@ -35,5 +37,6 @@ steps:
sample_name: sample_name
input_rgbam: samtools_split/bam_files
min_alignment_score: min_alignment_score
+cram_reference: cram_reference
scatter: [input_rgbam]
out: [unsorted_bams] #+1 Nesting File[][]
3 changes: 3 additions & 0 deletions subworkflows/kfdrc_process_bamlist.cwl
@@ -13,6 +13,8 @@ inputs:
sample_name: string
conditional_run: int
min_alignment_score: int?
+cram_reference: { type: 'File?', doc: "If aligning from cram, need to provided reference used to generate that cram" }

outputs:
unsorted_bams:
type:
@@ -32,6 +34,7 @@ steps:
indexed_reference_fasta: indexed_reference_fasta
sample_name: sample_name
min_alignment_score: min_alignment_score
+cram_reference: cram_reference
scatter: input_bam
out: [unsorted_bams] #+2 Nesting File[][][]

7 changes: 4 additions & 3 deletions subworkflows/kfdrc_rgbam_to_realnbam.cwl
@@ -10,6 +10,8 @@ inputs:
secondaryFiles: ['.64.amb', '.64.ann', '.64.bwt', '.64.pac', '.64.sa', '.64.alt', '^.dict']
sample_name: string
min_alignment_score: int?
+cram_reference: { type: 'File?', doc: "If aligning from cram, need to provided reference used to generate that cram" }

outputs:
unsorted_bams:
type: File[]
@@ -19,8 +21,8 @@ steps:
bamtofastq_chomp:
run: ../tools/bamtofastq_chomp.cwl
in:
-input_bam: input_rgbam
-# sample: sample_name
+input_align: input_rgbam
+reference: cram_reference
out: [output, rg_string]

expression_updatergsample:
@@ -37,7 +39,6 @@ steps:
reads: bamtofastq_chomp/output
interleaved:
default: true
-# rg: bamtofastq_chomp/rg_string
rg: expression_updatergsample/rg_str
min_alignment_score: min_alignment_score
scatter: [reads]
5 changes: 4 additions & 1 deletion subworkflows/rgbam_to_bwa_payload.cwl
@@ -8,6 +8,8 @@ requirements:
inputs:
input_rgbam: File
sample_name: string
+cram_reference: { type: 'File?', doc: "Fasta file if input is cram", secondaryFiles: [.fai] }

outputs:
bwa_payload:
type:
@@ -42,7 +44,8 @@ steps:
bamtofastq:
run: ../tools/biobambam_bamtofastq.cwl
in:
-input_bam: input_rgbam
+input_align: input_rgbam
+reference: cram_reference
out: [output]

clt_prepare_bwa_payload:
43 changes: 24 additions & 19 deletions tools/bamtofastq_chomp.cwl
@@ -23,33 +23,38 @@ arguments:
valueFrom: |-
set -eo pipefail

-samtools view -H $(inputs.input_bam.path) | grep ^@RG > rg.txt
+samtools view -H $(inputs.input_align.path) | grep ^@RG > rg.txt

-if [ $(inputs.input_bam.size) -gt $(inputs.max_size) ]; then
-bamtofastq tryoq=1 filename=$(inputs.input_bam.path) | split -dl 680000000 - reads-
+EXT=$(inputs.input_align.nameext.toLowerCase().substr(1))
+
+if [ $(inputs.input_align.size) -gt $(inputs.max_size) ]; then
+bamtofastq tryoq=1 filename=$(inputs.input_align.path) inputformat=$EXT ${
+if (inputs.reference != null){
+return "reference=" + inputs.reference.path;
+}
+else{
+return "";
+}
+} | split -dl 680000000 - reads-
ls reads-* | xargs -i mv {} {}.fq
else
-bamtofastq tryoq=1 filename=$(inputs.input_bam.path) > reads-00.fq
+bamtofastq tryoq=1 filename=$(inputs.input_align.path) inputformat=$EXT ${
+if (inputs.reference != null){
+return "reference=" + inputs.reference.path;
+}
+else{
+return "";
+}
+} > reads-00.fq
fi
inputs:
-input_bam: { type: File, doc: "Input bam file" }
-max_size: { type: long, default: 20000000000, doc: "The maximum size (in bytes) that an input bam can be before the FASTQ is split" }
-# sample: { type: string, doc: "String name of the sample used to relabel the rg string" }
+input_align: { type: File, doc: "Input alignment file" }
+max_size: { type: 'long?', default: 20000000000, doc: "The maximum size (in bytes) that an input bam can be before the FASTQ is split" }
+reference: { type: 'File?', doc: "Fasta file if input is cram", secondaryFiles: [.fai] }

outputs:
output: { type: 'File[]', outputBinding: { glob: '*.fq' } }
rg_string:
-# type: string
type: File
outputBinding:
glob: rg.txt
-# loadContents: true
-# outputEval:
-# ${
-# var arr = self[0].contents.split('\n')[0].split('\t');
-# for (var i=1; i<arr.length; i++){
-# if (arr[i].startsWith('SM')){
-# arr[i] = 'SM:' + inputs.sample;
-# }
-# }
-# return arr.join('\\t');
-# }
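
The command construction introduced in tools/bamtofastq_chomp.cwl can be read as the following minimal JavaScript sketch. It only mirrors the expressions shown in the diff (`nameext.toLowerCase().substr(1)` and the conditional `reference=` argument). The helper name `buildBamtofastqCommand` and the sample paths are hypothetical and are not part of the commit. The real tool evaluates these pieces as CWL parameter references and expressions inside a shell script.

```javascript
// Illustrative sketch only: buildBamtofastqCommand is a hypothetical helper,
// not part of the repository. It mirrors the expressions in the diff above.
function buildBamtofastqCommand(inputAlign, reference) {
  // CWL's nameext keeps the leading dot (".cram"), so lowercase it and strip
  // the dot to get the value passed as inputformat=.
  const ext = inputAlign.nameext.toLowerCase().substr(1);
  const parts = [
    "bamtofastq",
    "tryoq=1",
    "filename=" + inputAlign.path,
    "inputformat=" + ext,
  ];
  // reference= is appended only when the optional File input was supplied.
  if (reference != null) {
    parts.push("reference=" + reference.path);
  }
  return parts.join(" ");
}

// Example with made-up paths.
const cmd = buildBamtofastqCommand(
  { path: "/data/sample.CRAM", nameext: ".CRAM" },
  { path: "/ref/genome.fasta" }
);
console.log(cmd);
```

Because the format is taken from the file extension, an input with a compound suffix such as `.sam.gz` would produce `inputformat=gz` under this logic, so inputs presumably need a plain `.sam`, `.bam`, or `.cram` suffix.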
10 changes: 8 additions & 2 deletions tools/biobambam_bamtofastq.cwl
@@ -18,9 +18,15 @@ arguments:
- position: 0
shellQuote: false
valueFrom: |-
-bamtofastq tryoq=1 filename=$(inputs.input_bam.path) > reads-00.fq
+bamtofastq tryoq=1 inputformat=$(inputs.input_align.nameext.toLowerCase().substr(1))
+- position: 2
+shellQuote: false
+valueFrom: |-
+> reads-00.fq
inputs:
-input_bam: { type: File, doc: "Input bam file" }
+input_align: { type: File, doc: "Input alignment file", inputBinding: { position: 1, prefix: "filename=", separate: false } }
+reference: { type: 'File?', doc: "Fasta file if input is cram", secondaryFiles: [.fai],
+inputBinding: { position: 1, prefix: "reference=", separate: false } }
outputs:
output: { type: 'File', outputBinding: { glob: '*.fq' } }
