
Commit 6c2b504 (1 parent: 4f31bd8)

Upgrading meta construct to also allow single-end data types to be passed through the pipeline
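The core of this change is a `single_end` flag carried in each sample's meta map, which lets downstream modules branch between single-end and paired-end command lines. A minimal sketch of what the parsed samplesheet channel now emits, based on `modules/input_check.nf` below (sample names and paths are illustrative):

```groovy
// Paired-end sample: meta map plus a two-element read list
[ [sample_id:'S100', single_end:false, library_id:'S100', readgroup_id:'AACYTCLM5.1.S100'],
  [ file('S100_R1.fastq.gz'), file('S100_R2.fastq.gz') ] ]

// Single-end sample: same meta shape, one-element read list
[ [sample_id:'S200', single_end:true, library_id:'S200', readgroup_id:'AACYTCLM5.1.S200'],
  [ file('S200_R1.fastq.gz') ] ]
```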

14 files changed (+213, -75 lines)

conf/modules.config (+8)

```diff
@@ -42,4 +42,12 @@ process {
             enabled: false
         ]
     }
+    withName: 'VSEARCH_FASTQFILTER|VSEARCH_FASTQMERGE' {
+        publishDir = [
+            path: { "${params.outdir}/biobloom" },
+            mode: params.publish_dir_mode,
+            enabled: false
+        ]
+    }
+
 }
```
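Both VSEARCH processes are registered with `enabled: false`, so their intermediate FastQ output is not copied to the results directory by default. If you do want those files published, a user config passed via `-c` could override the block — a hypothetical override sketch (the target path `vsearch` is illustrative):

```groovy
// my.config, loaded with: nextflow run ... -c my.config
process {
    withName: 'VSEARCH_FASTQFILTER|VSEARCH_FASTQMERGE' {
        publishDir = [
            path: { "${params.outdir}/vsearch" },
            mode: params.publish_dir_mode,
            enabled: true
        ]
    }
}
```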

docs/installation.md (+1, -1)

```diff
@@ -26,4 +26,4 @@ The path specified with `--outdir` can then be given to the pipeline during norm
 
 If you run on anything other than a local system, this pipeline requires a site-specific configuration file to be able to talk to your cluster or compute infrastructure. Nextflow supports a wide range of such infrastructures, including Slurm, LSF and SGE - but also Kubernetes and AWS. For more information, see [here](https://www.nextflow.io/docs/latest/executor.html).
 
-Site-specific config-files for our pipeline ecosystem are stored centrally on [github](https://github.com/marchoeppner/configs). Please talk to us if you want to add your system
+Site-specific config-files for our pipeline ecosystem are stored centrally on [github](https://github.com/marchoeppner/nf-configs). Please talk to us if you want to add your system
```

docs/usage.md (+14, -4)

````diff
@@ -31,16 +31,26 @@ In this example, both `--reference_base` and the choice of software provisioning
 
 # Options
 
-## `--input samplesheet.csv` [default = null]
+## `--input samples.csv` [default = null]
 
 This pipeline expects a CSV-formatted sample sheet to properly pull various meta data through the processes. The required format looks as follows:
 
 ```
-sample_id,library_id,readgroup_id,R1,R2
-S100,S100,AACYTCLM5.1.S100,/home/marc/projects/gaba/data/S100_R1.fastq.gz,/home/marc/projects/gaba/data/S100_R2.fastq.gz
+sample_id,library_id,readgroup_id,single_end,R1,R2
+S100,S100,AACYTCLM5.1.S100,false,/home/marc/projects/gaba/data/S100_R1.fastq.gz,/home/marc/projects/gaba/data/S100_R2.fastq.gz
 ```
+The columns `sample_id` and `library_id` should be self-explanatory.
 
-If you are unsure about the readgroup ID, just make sure that it is unique for the combination of library, flowcell and lane. Typically it would be constructed from these components - and the easiest way to get it is from the FastQ file itself (header of read 1, for example).
+If you are uncertain about `readgroup_id`, just make sure that it is unique for the combination of library, flowcell and lane. Typically it would be constructed from these components - and the easiest way to get it is from the FastQ file itself (header of read 1, for example).
+
+```
+@VL00316:70:AACYTCLM5:1:1101:18686:1038 1:N:0:AAGCGGTGAA+AACCTAGACG
+```
+For a hypothetical library called "LIB100", this can be turned into the readgroup ID `AACYTCLM5.1.LIB100` - where `AACYTCLM5` is the ID of the flowcell, `1` is the lane on that flowcell and `LIB100` is the identifier of the library.
+
+The `single_end` column is included prospectively, to enable support for non-paired-end sequencing technologies such as Ion Torrent or PacBio/ONT (TBD). For the moment, you can simply put "false" here.
+
+`R1` and `R2` designate the full path(s) to the read data. This can either be a local path on your (shared) file system or data in the cloud, accessed via e.g. S3, Google buckets or FTP.
 
 ## `--genome tomato` [default = tomato]
 
````
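For reference, once single-end inputs are accepted end to end, a single-end row would plausibly set the flag to `true` and leave `R2` empty — a hypothetical example, not yet covered by the docs above, although the `input_check.nf` parser in this commit never reads `R2` when `single_end` is true:

```
sample_id,library_id,readgroup_id,single_end,R1,R2
S200,S200,AACYTCLM5.1.S200,true,/home/marc/projects/gaba/data/S200_R1.fastq.gz,
```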

modules/biobloomtools/categorizer/main.nf (+3, -3)

```diff
@@ -10,10 +10,10 @@ process BIOBLOOMTOOLS_CATEGORIZER {
         'quay.io/biocontainers/biobloomtools:2.3.5--h4056dc3_2' }"
 
     input:
-    tuple val(meta), path(r1), path(r2)
+    tuple val(meta), path(reads)
 
     output:
-    tuple val(meta), path(r1_trim), path(r2_trim), emit: reads
+    tuple val(meta), path('*noMatch*.fq.gz'), emit: reads
     path('versions.yml'), emit: versions
     path("*summary.tsv"), emit: results
 
@@ -23,7 +23,7 @@ process BIOBLOOMTOOLS_CATEGORIZER {
     r2_trim = filtered + '_noMatch_2.fq.gz'
 
     """
-    biobloomcategorizer -p $filtered -t ${task.cpus} -n --fq --gz_out -i -e -f "${params.bloomfilter}" $r1 $r2
+    biobloomcategorizer -p $filtered -t ${task.cpus} -n --fq --gz_out -i -e -f "${params.bloomfilter}" $reads
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
```

modules/cat_fastq/main.nf (+36)

```diff
@@ -0,0 +1,36 @@
+process CAT_FASTQ {
+    tag "$meta.sample_id"
+    label 'process_single'
+
+    conda 'conda-forge::sed=4.7'
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
+        'ubuntu:20.04' }"
+
+    input:
+    tuple val(meta), path(reads, stageAs: 'input*/*')
+
+    output:
+    tuple val(meta), path('*.merged.fastq.gz'), emit: reads
+    path 'versions.yml', emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def prefix = meta.sample_id
+    def readList = reads instanceof List ? reads.collect { r -> r.toString() } : [reads.toString()]
+
+    def read1 = []
+    def read2 = []
+    readList.eachWithIndex { v, ix -> (ix & 1 ? read2 : read1) << v }
+    """
+    cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
+    cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
+    END_VERSIONS
+    """
+}
```

Note: the concatenation uses `cat` rather than `zcat` — gzip streams are concatenable as-is, whereas `zcat` would write decompressed text into a `.gz`-named file.
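`CAT_FASTQ` splits the staged read list on index parity, which assumes the files arrive ordered R1, R2, R1, R2, … across lanes. A standalone Groovy sketch of that split logic, using made-up file names:

```groovy
def readList = ['input1/a_R1.fastq.gz', 'input1/a_R2.fastq.gz',
                'input2/b_R1.fastq.gz', 'input2/b_R2.fastq.gz']
def read1 = []
def read2 = []
// even indices (0, 2, ...) collect forward reads, odd indices reverse reads
readList.eachWithIndex { v, ix -> (ix & 1 ? read2 : read1) << v }
assert read1 == ['input1/a_R1.fastq.gz', 'input2/b_R1.fastq.gz']
assert read2 == ['input1/a_R2.fastq.gz', 'input2/b_R2.fastq.gz']
```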

modules/fastp/main.nf (+48, -26)

```diff
@@ -1,44 +1,66 @@
 process FASTP {
-    publishDir "${params.outdir}/Processing/FastP", mode: 'copy'
+    tag "${meta.sample_id}"
 
     label 'short_parallel'
 
-    tag "${meta.sample_id}|${meta.library_id}|${meta.readgroup_id}"
-
     conda 'bioconda::fastp=0.23.4'
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
         'https://depot.galaxyproject.org/singularity/fastp:0.23.4--hadf994f_2' :
         'quay.io/biocontainers/fastp:0.23.4--hadf994f_2' }"
 
     input:
-    tuple val(meta), path(r1), path(r2)
+    tuple val(meta), path(reads)
 
     output:
-    tuple val(meta), path(r1_trim), path(r2_trim), emit: reads
+    tuple val(meta), path('*trimmed.fastq.gz'), emit: reads
     path("*.json"), emit: json
     path('versions.yml'), emit: versions
 
     script:
+
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: reads[0].getBaseName()
+
+    r1 = reads[0]
+
     suffix = '_trimmed.fastq.gz'
-    r1_trim = file(r1).getBaseName() + suffix
-    r2_trim = file(r2).getBaseName() + suffix
-    json = file(r1).getBaseName() + '.fastp.json'
-    html = file(r2).getBaseName() + '.fastp.html'
-
-    """
-    fastp -c --in1 $r1 --in2 $r2 \
-        --out1 $r1_trim \
-        --out2 $r2_trim \
-        --detect_adapter_for_pe \
-        -w ${task.cpus} \
-        -j $json \
-        -h $html \
-        --length_required 35
-
-    cat <<-END_VERSIONS > versions.yml
-    "${task.process}":
-        fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
-    END_VERSIONS
-
-    """
+
+    json = prefix + '.fastp.json'
+    html = prefix + '.fastp.html'
+
+    if (meta.single_end) {
+        r1_trim = r1.getBaseName() + suffix
+        """
+        fastp --in1 ${r1} \
+            --out1 $r1_trim \
+            -w ${task.cpus} \
+            -j $json \
+            -h $html $args
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
+        END_VERSIONS
+        """
+    } else {
+        r2 = reads[1]
+        r1_trim = r1.getBaseName() + suffix
+        r2_trim = r2.getBaseName() + suffix
+        """
+        fastp --in1 ${r1} --in2 ${r2} \
+            --out1 $r1_trim \
+            --out2 $r2_trim \
+            --detect_adapter_for_pe \
+            -w ${task.cpus} \
+            -j $json \
+            -h $html \
+            $args
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
+        END_VERSIONS
+
+        """
+    }
 }
```
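With this refactor, the previously hard-coded `--length_required 35` disappears from the command line and extra flags flow in through `task.ext.args` instead. A sketch of how that option could be restored, assuming `ext.args` is wired up in `conf/modules.config` in the usual nf-core style (not shown in this commit):

```groovy
process {
    withName: 'FASTP' {
        ext.args = '--length_required 35'
    }
}
```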

modules/input_check.nf (+19, -14)

```diff
@@ -8,30 +8,35 @@ workflow INPUT_CHECK {
 
     main:
     samplesheet
-        .splitCsv(header:true, sep: ',')
-        .map { fastq_channel(it) }
+        .splitCsv(header:true, sep:',')
+        .map { row -> fastq_channel(row) }
         .set { reads }
 
     emit:
     reads // channel: [ val(meta), [ reads ] ]
 }
 
+// Function to get list of [ meta, [ fastq_1, fastq_2 ] ]
 def fastq_channel(LinkedHashMap row) {
-    // create meta map
-    def meta = [:]
-    meta.sample_id = row.sample_id
-    meta.readgroup_id = row.readgroup_id
-    meta.library_id = row.library_id
 
-    // add path(s) of the fastq file(s) to the meta map
-    def fastqMeta = []
+    def meta = [:]
+    meta.sample_id = row.sample_id
+    meta.single_end = row.single_end.toBoolean() // CSV fields are strings; "false" would otherwise be truthy
+    meta.library_id = row.library_id
+    meta.readgroup_id = row.readgroup_id
+
+    def array = []
     if (!file(row.R1).exists()) {
         exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.R1}"
     }
-    if (!file(row.R2).exists()) {
-        exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.R2}"
+    if (meta.single_end) {
+        array = [ meta, [ file(row.R1) ] ]
+    } else {
+        if (!file(row.R2).exists()) {
+            exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.R2}"
+        }
+        array = [ meta, [ file(row.R1), file(row.R2) ] ]
     }
-    fastqMeta = [ meta, file(row.R1), file(row.R2) ]
-
-    return fastqMeta
+
+    return array
 }
```

Note: `row.single_end` is converted with `toBoolean()` and the `def` declarations are kept, so the flag is a real boolean and the function does not leak variables into the script binding.
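A quick way to confirm the new meta map during development is to tap the channel right after parsing — a hypothetical debugging snippet, assuming `INPUT_CHECK` is fed the samplesheet as a file channel:

```groovy
INPUT_CHECK( Channel.fromPath(params.input) )
INPUT_CHECK.out.reads.view { meta, reads ->
    "${meta.sample_id}: single_end=${meta.single_end}, files=${reads.size()}"
}
```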

modules/ptrimmer/main.nf (+36, -16)

```diff
@@ -1,33 +1,53 @@
 process PTRIMMER {
-    publishDir "${params.outdir}/${meta.sample_id}/VSEARCH/PTRIMMER", mode: 'copy'
-
     label 'short_serial'
 
-    tag "${meta.sample_id}|${meta.library_id}|${meta.readgroup_id}"
+    tag "${meta.sample_id}"
 
     conda 'bioconda::ptrimmer=1.3.3'
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
         'https://depot.galaxyproject.org/singularity/ptrimmer:1.3.3--h50ea8bc_5' :
         'quay.io/biocontainers/ptrimmer:1.3.3--h50ea8bc_5' }"
 
     input:
-    tuple val(meta), path(r1), path(r2)
+    tuple val(meta), path(reads)
     path(amplicon_txt)
 
     output:
-    tuple val(meta), path(r1_trimmed), path(r2_trimmed), emit: reads
+    tuple val(meta), path('*ptrimmed.fastq.gz'), emit: reads
     path('versions.yml'), emit: versions
 
     script:
-    r1_trimmed = r1.getBaseName() + '_ptrimmed.fastq'
-    r2_trimmed = r2.getBaseName() + '_ptrimmed.fastq'
-
-    """
-    ptrimmer -t pair -a $amplicon_txt -f $r1 -d $r1_trimmed -r $r2 -e $r2_trimmed
-    cat <<-END_VERSIONS > versions.yml
-    "${task.process}":
-        Ptrimmer: \$(ptrimmer --help 2>&1 | grep Version | sed -e "s/Version: //g")
-    END_VERSIONS
-
-    """
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: meta.sample_id
+
+    r1 = reads[0]
+    r1_trimmed = prefix + '_1.ptrimmed.fastq'
+    r1_trimmed_gz = r1_trimmed + '.gz'
+
+    if (meta.single_end) {
+        """
+        ptrimmer $args -t single -a $amplicon_txt -f $r1 -d $r1_trimmed
+        gzip $r1_trimmed
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            Ptrimmer: \$(ptrimmer --help 2>&1 | grep Version | sed -e "s/Version: //g")
+        END_VERSIONS
+        """
+    } else {
+        r2 = reads[1]
+        r2_trimmed = prefix + '_2.ptrimmed.fastq'
+        r2_trimmed_gz = r2_trimmed + '.gz'
+
+        """
+        ptrimmer $args -t pair -a $amplicon_txt -f $r1 -d $r1_trimmed -r $r2 -e $r2_trimmed
+        gzip $r1_trimmed
+        gzip $r2_trimmed
+
+        cat <<-END_VERSIONS > versions.yml
+        "${task.process}":
+            Ptrimmer: \$(ptrimmer --help 2>&1 | grep Version | sed -e "s/Version: //g")
+        END_VERSIONS
+        """
+    }
 }
```

Note: the conda spec reads `ptrimmer=1.3.3` (the stray trailing dot was a typo). PTrimmer writes uncompressed FastQ, hence the explicit `gzip` calls so the output matches the `*ptrimmed.fastq.gz` glob.

modules/vsearch/fastqfilter/main.nf (+6, -2)

```diff
@@ -19,11 +19,15 @@ process VSEARCH_FASTQFILTER {
     filtered = fq.getBaseName() + '.filtered.fasta'
 
     """
-    vsearch -fastq_filter $fq -fastq_maxee 0.5 -relabel Filtered -fastaout $filtered
+    vsearch -fastq_filter $fq \
+        -fastq_maxee 0.5 \
+        --threads ${task.cpus} \
+        -relabel Filtered \
+        -fastaout $filtered
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
-        vsearch: \$(vsearch --version 2>&1 | head -n1 | sed -e "s/vsearch //g" -e "s/,.*//")
+        vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
     END_VERSIONS
     """
 }
```

modules/vsearch/fastqmerge/main.nf (+2, -1)

```diff
@@ -20,12 +20,13 @@ process VSEARCH_FASTQMERGE {
 
     """
     vsearch --fastq_merge $fwd --reverse $rev \
+        --threads ${task.cpus} \
         --fastqout $merged \
         --fastq_eeout
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
-        vsearch: \$(vsearch --version 2>&1 | head -n1 | sed -e "s/vsearch //g" -e "s/,.*//")
+        vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
     END_VERSIONS
     """
 }
```

modules/vsearch/fastxuniques/main.nf (+5, -2)

```diff
@@ -19,11 +19,14 @@ process VSEARCH_FASTXUNIQUES {
     derep = fa.getBaseName() + '.unique.fasta'
 
     """
-    vsearch -fastx_uniques $fa -sizeout -relabel ${meta.sample_id}_Unique -fastaout $derep
+    vsearch -fastx_uniques $fa \
+        -sizeout -relabel ${meta.sample_id}_Unique \
+        -fastaout $derep \
+        --threads ${task.cpus}
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
-        vsearch: \$(vsearch --version 2>&1 | head -n1 | sed -e "s/vsearch //g" -e "s/,.*//")
+        vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
     END_VERSIONS
     """
 }
```

Note: the command ends without a trailing backslash, so the line continuation cannot accidentally swallow the following `cat` heredoc.

subworkflows/bwamem2/main.nf (+1, -2)

```diff
@@ -33,7 +33,6 @@ workflow BWAMEM2_WORKFLOW {
         reads,
         fasta
     )
-
     ch_versions = ch_versions.mix(BWAMEM2_MEM.out.versions)
 
     // Group BAM files by sample, in case of multi-lane setup
@@ -95,4 +94,4 @@ workflow BWAMEM2_WORKFLOW {
     vcf = FREEBAYES.out.vcf
     reports = ch_reports
     bam = SAMTOOLS_AMPLICONCLIP.out.bam
-}
\ No newline at end of file
+}
```
