alignment optimization suggestions #47
Comments
I believe you're missing the merge step from the proposed setup above. The PCAWG project had success using the biobambam2 suite of tools. With those, the setup would be: bamtofastq | BWA | bamsormadup | bammerge
After some testing this is what is recommended: Extract RG info from header && Example command line looks like this:
We went back to Picard SamToFastq because samtools bam2fq did not do a great job with the conversion: a significant number of reads were converted as single-end for some reason, which might affect the final outcome. The "Extract RG info from header" part is some Linux maneuvering to get the RG info from the input BAM header and pass it to BWA through the -R parameter, since in the original setup the RG information was added to the output BAM by Picard MergeBamAlignment. For one random pilot sample tested, the average duration of a single BWA job dropped from ~35 min to ~28 min, and duplicate marking is done on the fly, saving the nearly 5 hours needed to run Picard MarkDuplicates.
@adamstruck The merge step is not included in the pipe because the alignment step is scattered by read group, so the merge comes later as a separate step in the current flow.
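For anyone following along, here is a minimal sketch of what the "Extract RG info from header" step could look like. It is not the command used in the workflow. It assumes the header text has already been obtained (for example from `samtools view -H`), and the function name `read_group_args` and the sample header are made up for illustration. The only thing it relies on is that BWA's -R option takes the read group line with tabs written as a literal `\t`.

```python
def read_group_args(header_text):
    """Return one BWA -R string per @RG line found in a SAM/BAM header.

    header_text is the plain-text header (assumed to come from something
    like `samtools view -H input.bam`). Hypothetical helper, not part of
    the workflow discussed in this thread.
    """
    args = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            # BWA expects tabs spelled as the two characters backslash + t.
            args.append(line.replace("\t", "\\t"))
    return args


# Made-up header for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sample1\tPL:ILLUMINA\n"
    "@RG\tID:rg2\tSM:sample1\tPL:ILLUMINA\n"
)

for arg in read_group_args(example_header):
    print(arg)
```

Since the alignment is scattered by read group, each job would use only the string that matches its own read group.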
Inputs with a single read group take twice as long to align because they run with 18 threads. An expression tool should be added to autodetect the number of read groups in the input BAM and set the thread count accordingly. Done here: bogdang989@b4a6228
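A rough sketch of the autodetection idea, for illustration only. It is not the code from the linked commit. It assumes the goal is to split a fixed thread budget evenly across the scattered BWA jobs, so that a single read group input gets the whole budget. The names `count_read_groups` and `threads_per_job` and the budget of 36 in the example are assumptions.

```python
def count_read_groups(header_text):
    """Count @RG lines in a plain-text SAM/BAM header."""
    return sum(1 for line in header_text.splitlines()
               if line.startswith("@RG\t"))


def threads_per_job(header_text, total_threads):
    """Split a thread budget evenly across read groups (assumed policy).

    Falls back to one job when the header has no @RG line, and never
    returns fewer than one thread.
    """
    jobs = max(1, count_read_groups(header_text))
    return max(1, total_threads // jobs)


# Made-up headers; the budget of 36 threads is an assumption.
one_rg = "@HD\tVN:1.6\n@RG\tID:rg1\tSM:sample1\n"
two_rg = one_rg + "@RG\tID:rg2\tSM:sample1\n"

print(threads_per_job(one_rg, 36))
print(threads_per_job(two_rg, 36))
```

Integer division keeps the total at or below the budget. Any leftover threads simply go unused.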
@adamstruck I tested biobambam bamtofastq and it does have an option to output interleaved FASTQ. Even better, it has an option to restore the original quality scores from before recalibration, which fully allows the use of samtools split instead of Picard RevertSam.
Extract RG info from header &&
Added this change here: bogdang989@0e9d63c
I still haven't tested bamsormadup.
@adamstruck do you know whether |
BWA Mem (scattered by read group)
Current setup:
Picard SamToFastq | BWA | Picard MergeBamAlignment
Proposed setup:
Samtools bam2fq | BWA |
Samblaster | (mark duplicates here and remove picard_markduplicates; it makes sense to mark duplicates only within the same read group, and piping it with BWA saves a lot of time) |
Sambamba view (sam to bam) |
Sambamba Sort