6 changes: 6 additions & 0 deletions CITATIONS.md
@@ -33,6 +33,10 @@

> Felix Krueger, Simon R. Andrews, Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications, Bioinformatics, Volume 27, Issue 11, 1 June 2011, Pages 1571–1572, doi: [10.1093/bioinformatics/btr167](https://doi.org/10.1093/bioinformatics/btr167)

- [BWA-MEM](https://arxiv.org/abs/1303.3997v2)

> Li H: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 2013. doi: 10.48550/arXiv.1303.3997

- [bwa-meth](https://arxiv.org/abs/1401.1129)

> Pedersen, Brent S. and Eyring, Kenneth and De, Subhajyoti and Yang, Ivana V. and Schwartz, David A. Fast and accurate alignment of long bisulfite-seq reads, arXiv:1401.1129, doi: [10.48550/arXiv.1401.1129](https://doi.org/10.48550/arXiv.1401.1129)
@@ -49,6 +53,8 @@

> Daley, T., Smith, A. Predicting the molecular complexity of sequencing libraries. Nat Methods 10, 325–327 (2013). doi: [10.1038/nmeth.2375](https://doi.org/10.1038/nmeth.2375)

- [rastair](https://bitbucket.org/bsblabludwig/rastair/src/master/)

- [Samtools](https://doi.org/10.1093/gigascience/giab008)

> Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li, Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, doi: [10.1093/gigascience/giab008](https://doi.org/10.1093/gigascience/giab008)
32 changes: 16 additions & 16 deletions README.md
@@ -32,26 +32,26 @@ On release, automated continuous integration tests run the pipeline on a full-si

## Pipeline Summary

The pipeline allows you to choose between running either [Bismark](https://github.com/FelixKrueger/Bismark) or [bwa-meth](https://github.com/brentp/bwa-meth) / [MethylDackel](https://github.com/dpryan79/methyldackel).
The pipeline allows you to choose between running [Bismark](https://github.com/FelixKrueger/Bismark), [bwa-meth](https://github.com/brentp/bwa-meth) / [MethylDackel](https://github.com/dpryan79/methyldackel), or [BWA-MEM](https://github.com/lh3/bwa) plus [rastair](https://bitbucket.org/bsblabludwig/rastair/src/master/) for TAPS data processing. rastair can also be used with bwa-meth aligned reads by setting `--aligner bwameth` and adding the `--taps` flag.

Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for alignment), `--aligner bismark_hisat` or `--aligner bwameth`. For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), which implements the baseline tool `bwa-meth` in a performant method using fq2bam (BWA-MEM + GATK) as a backend for processing on GPU. To use this option, include the `gpu` profile along with `--aligner bwameth`.
Choose between workflows by using `--aligner bismark` (default, uses bowtie2 for alignment), `--aligner bismark_hisat`, `--aligner bwameth` or `--aligner bwamem`. For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html) and the [Parabricks implementation of bwa-mem (fq2bam)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam.html), which implement the baseline tools `bwa-meth` and `bwa-mem` on GPU. To use this option, include the `gpu` profile along with `--aligner bwameth` or `--aligner bwamem`.

Note: For faster CPU runs with BWA-Meth, enable the BWA-MEM2 algorithm using `--use_mem2`. The GPU pathway (Parabricks) requires `-profile gpu` and a container runtime (Docker, Singularity, or Podman); Conda/Mamba are not supported for the GPU module.

| Step | Bismark workflow | bwa-meth workflow |
| -------------------------------------------- | ------------------------ | --------------------- |
| Generate Reference Genome Index _(optional)_ | Bismark | bwa-meth |
| Merge re-sequenced FastQ files | cat | cat |
| Raw data QC | FastQC | FastQC |
| Adapter sequence trimming | Trim Galore! | Trim Galore! |
| Align Reads | Bismark (bowtie2/hisat2) | bwa-meth |
| Deduplicate Alignments | Bismark | Picard MarkDuplicates |
| Extract methylation calls | Bismark | MethylDackel |
| Sample report | Bismark | - |
| Summary Report | Bismark | - |
| Alignment QC | Qualimap _(optional)_ | Qualimap _(optional)_ |
| Sample complexity | Preseq _(optional)_ | Preseq _(optional)_ |
| Project Report | MultiQC | MultiQC |
| Step | Bismark workflow | bwa-meth workflow | bwa-mem + TAPS workflow |
| -------------------------------------------- | ------------------------ | --------------------- | ------------------------------- |
| Generate Reference Genome Index _(optional)_ | Bismark | bwa-meth | bwa index |
| Merge re-sequenced FastQ files | cat | cat | cat |
| Raw data QC | FastQC | FastQC | FastQC |
| Adapter sequence trimming | Trim Galore! | Trim Galore! | Trim Galore! |
| Align Reads | Bismark (bowtie2/hisat2) | bwa-meth | bwa mem |
| Deduplicate Alignments | Bismark | Picard MarkDuplicates | Picard MarkDuplicates |
| Extract methylation calls | Bismark | MethylDackel | TAPS subworkflow (rastair) |
| Sample report | Bismark | - | - |
| Summary Report | Bismark | - | - |
| Alignment QC | Qualimap _(optional)_ | Qualimap _(optional)_ | Qualimap _(optional)_ |
| Sample complexity | Preseq _(optional)_ | Preseq _(optional)_ | Preseq _(optional)_ |
| Project Report | MultiQC | MultiQC | MultiQC |

Optional targeted sequencing analysis is available via `--run_targeted_sequencing` and `--target_regions_file`; see the [usage documentation](https://nf-co.re/methylseq/usage) for details.

2 changes: 1 addition & 1 deletion assets/samplesheet.csv
@@ -2,4 +2,4 @@ sample,fastq_1,fastq_2,genome
SRR389222_sub1,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz,,
SRR389222_sub2,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz,,
SRR389222_sub3,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub3.fastq.gz,,
Ecoli_10K_methylated,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R2.fastq.gz,
Ecoli_10K_methylated,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R1.fastq.gz,https://github.com/nf-core/test-datasets/raw/methylseq/testdata/Ecoli_10K_methylated_R2.fastq.gz,
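The samplesheet rows above keep four comma-separated columns (`sample,fastq_1,fastq_2,genome`), with trailing fields left empty for single-end reads or an unspecified genome. A minimal sketch of a sanity check on that shape is given below. The function name `check_samplesheet` is hypothetical, and the check is an illustration only, not the pipeline's own validation.

```shell
# Sketch: verify that a samplesheet has the expected header and 4 fields per row.
# check_samplesheet is a hypothetical helper, not part of the pipeline.
check_samplesheet() {
    awk -F',' '
        BEGIN { bad = 0 }
        # The first line must match the header used in assets/samplesheet.csv
        NR == 1 && $0 != "sample,fastq_1,fastq_2,genome" {
            print "unexpected header: " $0
            bad = 1
        }
        # Every non-empty data row must have exactly four fields (empty ones count)
        NR > 1 && NF > 0 && NF != 4 {
            printf "line %d: expected 4 fields, got %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

It could be called as `check_samplesheet assets/samplesheet.csv`; a non-zero exit status signals a malformed sheet.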
3 changes: 3 additions & 0 deletions conf/base.config
@@ -81,4 +81,7 @@ process {
    withName: PARABRICKS_FQ2BAMMETH {
        memory = { 100.GB * task.attempt }
    }
    withName: PARABRICKS_FQ2BAM {
        memory = { 100.GB * task.attempt }
    }
}
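The `memory` closure for `PARABRICKS_FQ2BAM` multiplies a 100 GB base by `task.attempt`, so each retry asks for proportionally more memory. The arithmetic can be sketched as follows; the helper name `memory_for_attempt` is hypothetical, and whether a task is retried at all depends on the error strategy configured elsewhere, which this hunk does not show.

```shell
# Sketch: memory requested on a given attempt when the request is 100 GB * attempt.
# memory_for_attempt is a hypothetical helper used only for illustration.
memory_for_attempt() {
    attempt="$1"
    echo "$((100 * attempt)) GB"
}

for attempt in 1 2 3; do
    echo "attempt ${attempt}: $(memory_for_attempt "$attempt")"
done
```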
5 changes: 5 additions & 0 deletions conf/modules/gatk_removeduplicates.config
@@ -0,0 +1,5 @@
process {
    withName: GATK4_REMOVEDUPLICATES {
        ext.args = "--REMOVE_DUPLICATES true --TAG_DUPLICATE_SET_MEMBERS true"
    }
}
8 changes: 8 additions & 0 deletions conf/modules/picard_addorreplacereadgroups.config
@@ -0,0 +1,8 @@
process {
    withName: PICARD_ADDORREPLACEREADGROUPS {
        ext.args = "--RGID 1 --RGLB lib1 --RGPL illumina --RGPU unit1 --RGSM sample1"
    }
}



18 changes: 18 additions & 0 deletions conf/modules/picard_removeduplicates.config
@@ -0,0 +1,18 @@
process {
    withName: PICARD_REMOVEDUPLICATES {
        ext.args = "--ASSUME_SORTED true --REMOVE_DUPLICATES true --VALIDATION_STRINGENCY LENIENT --PROGRAM_RECORD_ID 'null' --TMP_DIR tmp"
        ext.prefix = { "${meta.id}.dedup.sorted" }
        publishDir = [
            [
                path: { "${params.outdir}/${params.aligner}/deduplicated/picard_metrics" },
                pattern: "*.metrics.txt",
                mode: params.publish_dir_mode
            ],
            [
                path: { "${params.outdir}/${params.aligner}/deduplicated" },
                pattern: "*.bam",
                mode: params.publish_dir_mode
            ]
        ]
    }
}
20 changes: 20 additions & 0 deletions conf/modules/rastair_call.config
@@ -0,0 +1,20 @@
process {
    withName: RASTAIR_CALL {
        ext.args = [
            // Pending the resolution of the mbias_parse process
            params.trim_OT ?: '0,0,10,0', // [r1_start, r1_end, r2_start, r2_end]
            params.trim_OB ?: '0,0,10,0'  // [r1_start, r1_end, r2_start, r2_end]
        ].join(" ").trim()
        publishDir = [
            [
                path: { "${params.outdir}/rastair/call" },
                mode: params.publish_dir_mode,
                pattern: "*.txt"
            ], [
                path: { "${params.outdir}/rastair/call" },
                mode: params.publish_dir_mode,
                pattern: "*.gz"
            ]
        ]
    }
}
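The `ext.args` expression for `RASTAIR_CALL` falls back to `'0,0,10,0'` whenever `params.trim_OT` or `params.trim_OB` is unset or empty, then joins the two values with a single space. A rough shell analogue of that default-and-join logic is sketched below. The function `rastair_trim_args` and its positional arguments are hypothetical stand-ins for the two parameters; the sketch mirrors only the string construction and says nothing about how rastair interprets the values.

```shell
# Sketch: mimic the Groovy Elvis defaults (?:) with shell ${var:-default} expansion.
# $1 stands in for params.trim_OT and $2 for params.trim_OB (hypothetical mapping).
rastair_trim_args() {
    ot="${1:-0,0,10,0}"   # [r1_start, r1_end, r2_start, r2_end]
    ob="${2:-0,0,10,0}"   # [r1_start, r1_end, r2_start, r2_end]
    printf '%s %s\n' "$ot" "$ob"
}

rastair_trim_args               # both defaults
rastair_trim_args "5,0,10,0"    # override the first value only
```

Like the Elvis operator, `${1:-...}` treats an empty string the same as a missing value, so passing `""` also yields the default.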
11 changes: 11 additions & 0 deletions conf/modules/rastair_mbias.config
@@ -0,0 +1,11 @@
process {
    withName: RASTAIR_MBIAS {
        publishDir = [
            [
                path: { "${params.outdir}/rastair/mbias" },
                mode: params.publish_dir_mode,
                pattern: "*.txt"
            ]
        ]
    }
}
11 changes: 11 additions & 0 deletions conf/modules/rastair_mbias_parser.config
@@ -0,0 +1,11 @@
process {
    withName: RASTAIR_MBIAS_PARSER {
        publishDir = [
            [
                path: { "${params.outdir}/rastair/mbias_parser" },
                mode: params.publish_dir_mode,
                pattern: "*rastair_mbias_processed*"
            ]
        ]
    }
}
11 changes: 11 additions & 0 deletions conf/modules/rastair_methylkit.config
@@ -0,0 +1,11 @@
process {
    withName: CONVERT_TO_METHYLKIT {
        publishDir = [
            [
                path: { "${params.outdir}/rastair/methylkit" },
                mode: params.publish_dir_mode,
                pattern: "*.txt.gz"
            ]
        ]
    }
}
4 changes: 4 additions & 0 deletions conf/subworkflows/fasta_index_bismark_bwameth_bwamem.config
@@ -0,0 +1,4 @@
includeConfig "../modules/gunzip.config"
includeConfig "../modules/bismark_genomepreparation.config"
includeConfig "../modules/samtools_faidx.config"
includeConfig "../modules/bwameth_index.config"
2 changes: 2 additions & 0 deletions conf/subworkflows/fastq_align_dedup_bwamem.config
@@ -0,0 +1,2 @@
includeConfig "../modules/picard_addorreplacereadgroups.config"
includeConfig "../modules/picard_removeduplicates.config"
2 changes: 0 additions & 2 deletions conf/subworkflows/fastq_align_dedup_bwameth.config
@@ -5,8 +5,6 @@ includeConfig "../modules/samtools_sort.config"
includeConfig "../modules/samtools_flagstat.config"
includeConfig "../modules/samtools_stats.config"
includeConfig "../modules/picard_markduplicates.config"
includeConfig "../modules/methyldackel_extract.config"
includeConfig "../modules/methyldackel_mbias.config"

process {

2 changes: 2 additions & 0 deletions conf/subworkflows/methyldackel.config
@@ -0,0 +1,2 @@
includeConfig "../modules/methyldackel_extract.config"
includeConfig "../modules/methyldackel_mbias.config"
4 changes: 4 additions & 0 deletions conf/subworkflows/taps_conversion.config
@@ -0,0 +1,4 @@
includeConfig "../modules/rastair_call.config"
includeConfig "../modules/rastair_mbias.config"
includeConfig "../modules/rastair_mbias_parser.config"
includeConfig "../modules/rastair_methylkit.config"
Binary file added docs/images/nf-core-methylseq-taps.png
57 changes: 56 additions & 1 deletion docs/output.md
@@ -6,7 +6,7 @@ This document describes the output produced by the methylseq pipeline.

Most of the plots are taken from the MultiQC report, which summarizes results at the end of the pipeline.

> NOTE: nf-core/methylseq contains two workflows - one for Bismark, one for bwa-meth. The results files produced will vary depending on which variant is run.
> NOTE: nf-core/methylseq contains three alignment workflows: one for Bismark, one for bwa-meth and one for bwa-mem. In addition, a separate subworkflow uses rastair to process conversion rates from TAPS data (a protocol that reads methylation positively, converting modified rather than unmodified cytosines). The result files produced will vary depending on which variant is run.

The output directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

@@ -85,6 +85,47 @@ bwameth/
└── logs
```

#### bwa-mem

```
bwamem/
├── bwamem
│   ├── alignments
│   └── deduplicated
├── fastqc
│   ├── Ecoli_10K_methylated_1_fastqc.html
│   ├── Ecoli_10K_methylated_2_fastqc.html
│   └── zips
├── multiqc
│   └── bwamem
├── pipeline_info
│   ├── execution_report_2024-12-13_05-36-34.html
│   ├── execution_timeline_2024-12-13_05-36-34.html
│   ├── execution_trace_2024-12-13_05-36-34.txt
│   ├── nf_core_methylseq_software_mqc_versions.yml
│   ├── params_2024-12-13_05-36-43.json
│   └── pipeline_dag_2024-12-13_05-36-34.html
└── trimgalore
    ├── fastqc
    └── logs
```

#### rastair

```
rastair/
├── call
│   └── Ecoli_10K_methylated.markdup.sorted_CpG.rastair_call.tsv
├── mbias
│   └── Ecoli_10K_methylated.markdup.sorted_CpG.rastair_mbias.tsv
├── mbias_parser
│   ├── Ecoli_10K_methylated.markdup.sorted_CpG.rastair_mbias_processed.txt
│   ├── Ecoli_10K_methylated.markdup.sorted_CpG.rastair_mbias_processed.csv
│   └── Ecoli_10K_methylated.markdup.sorted_CpG.rastair_mbias_processed.pdf
└── methylkit
    └── Ecoli_10K_methylated.markdup.sorted_CpG.rastair_methylkit.txt.gz
```
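After a TAPS run, the four rastair subdirectories shown above (`call`, `mbias`, `mbias_parser`, `methylkit`) should exist under the results directory. A small sketch of a completeness check follows, assuming the default layout from the module configs; `check_rastair_outputs` is a hypothetical helper, and `results/rastair` is a placeholder path.

```shell
# Sketch: report which of the expected rastair output directories are missing.
# check_rastair_outputs is a hypothetical helper; the directory names come from
# the publishDir paths in the rastair module configs.
check_rastair_outputs() {
    base="$1"      # e.g. results/rastair
    missing=0
    for sub in call mbias mbias_parser methylkit; do
        if [ ! -d "${base}/${sub}" ]; then
            printf 'missing: %s\n' "${base}/${sub}"
            missing=1
        fi
    done
    return "$missing"
}

check_rastair_outputs results/rastair || echo "rastair outputs incomplete"
```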

### Detailed Output Descriptions

### Reference Genome Preparation
@@ -174,6 +215,9 @@ _Note that bismark can use either use Bowtie2 (default) or HISAT2 as alignment t
- `logs/samtools_stats/sample_stats.txt`
- Summary file giving lots of metrics about the aligned BAM file.

**bwa-mem output directory: `results/bwamem/alignments/`**
# TODO

### Deduplication

This step removes alignments with identical mapping position to avoid technical duplication in the results. Note that it is skipped if `--save_align_intermeds`, `--skip_deduplication` or `--rrbs` is specified when running the pipeline.
@@ -196,6 +240,9 @@ This step removes alignments with identical mapping position to avoid technical
- `logs/sample.sorted.markDups_metrics.txt`
- Log file giving summary statistics about deduplication.

**bwa-mem output directory: `results/bwamem/deduplicated/`**
# TODO

### Methylation Extraction

The methylation extractor step takes a BAM file with aligned reads and generates files containing cytosine methylation calls. It produces a few different output formats, described below.
@@ -231,6 +278,9 @@ Filename abbreviations stand for the following reference alignment strands:
- `sample.bedGraph`
- Methylation statuses in [bedGraph](http://genome.ucsc.edu/goldenPath/help/bedgraph.html) format.

**bwa-mem / TAPS workflow output directory: `results/rastair/`**
# TODO

### Targeted Sequencing

If `--run_targeted_sequencing` is set to `true`, the pipeline performs additional analysis for targeted sequencing experiments.
@@ -241,6 +291,7 @@ BedGraph files are filtered using the BED file passed to `--target_regions_file`

**Bismark output directory: `results/bismark/methylation_calls/bedGraph/`**
**bwa-meth output directory: `results/methyldackel/`**
# TODO: implement filtering by `--target_regions_file` in bwa-mem

- `*.targeted.bedGraph`
- Methylation statuses in [bedGraph](http://genome.ucsc.edu/goldenPath/help/bedgraph.html) format, limited to the positions in the target regions BED file.
@@ -254,6 +305,10 @@ BedGraph files are filtered using the BED file passed to `--target_regions_file`
- `*.CollectHsMetrics.coverage_metrics`
- Text-based statistics, also shown in the MultiQC report.


### Rastair
# TODO

### Bismark Reports

Bismark generates an HTML report describing results for each sample, as well as a summary report for the whole run.
13 changes: 12 additions & 1 deletion docs/usage.md
@@ -102,7 +102,17 @@ nextflow run nf-core/methylseq --aligner bwameth --use_mem2 --input samplesheet.
nextflow run nf-core/methylseq --aligner bwameth --use_mem2 -profile gpu --input samplesheet.csv --genome GRCh38
```

- `Parabricks/FQ2BAMMETH` (GPU-based): For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), which implements the baseline tool `bwa-meth` in a performant method using fq2bam (BWA-MEM + GATK) as a backend for processing on GPU. To use this option, include the `gpu` profile (as in `--profile gpu`) along with `--aligner bwameth`.
- `Parabricks/FQ2BAMMETH` (GPU-based): For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-meth (fq2bammeth)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam_meth.html), which implements the baseline tool `bwa-meth` on GPU. To use this option, include the `gpu` profile (as in `-profile gpu`) along with `--aligner bwameth`.

### Workflow: BWA-MEM

The third workflow uses [BWA-MEM](https://github.com/lh3/bwa) as the alignment tool and [rastair](https://bitbucket.org/bsblabludwig/rastair/src/master/) to process the aligned TAPS reads.

bwa-mem aligner options:

- Standard `bwa-mem` (CPU-based): Invoked via `--aligner bwamem`, this option uses the traditional BWA-MEM aligner and runs on CPU.

- `Parabricks/FQ2BAM` (GPU-based): For higher performance, the pipeline can leverage the [Parabricks implementation of bwa-mem (fq2bam)](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam.html), which implements the baseline tool `bwa-mem` on GPU. To use this option, include the `gpu` profile (as in `-profile gpu`) along with `--aligner bwamem`.

> [!NOTE]
> The Parabricks module does not support Conda/Mamba. Use Docker, Singularity, or Podman.
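
The bwa-mem options above can be sketched as concrete invocations. The helper below only prints candidate commands so they can be inspected before running. The samplesheet name, `--genome GRCh38`, `--outdir results` and the `docker` profile are placeholders chosen for illustration, and `methylseq_cmd` is a hypothetical function. Profiles are passed with Nextflow's single-dash `-profile` option.

```shell
# Sketch: print candidate nf-core/methylseq commands instead of running them.
# methylseq_cmd is hypothetical; input, genome, outdir and profile are placeholders.
methylseq_cmd() {
    aligner="$1"   # e.g. bwamem or bwameth
    profile="$2"   # e.g. docker or docker,gpu
    shift 2        # any remaining arguments are extra flags such as --taps
    cmd="nextflow run nf-core/methylseq -profile ${profile} --aligner ${aligner}"
    cmd="${cmd} --input samplesheet.csv --genome GRCh38 --outdir results"
    if [ "$#" -gt 0 ]; then
        cmd="${cmd} $*"
    fi
    printf '%s\n' "$cmd"
}

# CPU run: BWA-MEM alignment, with rastair handling the TAPS data
methylseq_cmd bwamem docker
# GPU run through Parabricks fq2bam (needs a container runtime, not Conda/Mamba)
methylseq_cmd bwamem docker,gpu
# rastair on bwa-meth aligned reads
methylseq_cmd bwameth docker --taps
```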
@@ -258,6 +268,7 @@ For a detailed list of different options available, please refer to the official

- [Bismark](https://felixkrueger.github.io/Bismark/options/genome_preparation/)
- [bwa-meth](https://github.com/brentp/bwa-meth)
- [bwa-mem](https://github.com/lh3/bwa)

### Running the `test` profile
