
Commit 673fa2b

Adding initial documentation

1 parent c97275b commit 673fa2b

File tree: 6 files changed, +111 / -11 lines changed


docs/installation.md (+12)

@@ -1,5 +1,17 @@
 # Installation
 
+## Installing the references
+
+This pipeline requires locally stored genomes in FASTA format. To build these, run:
+
+```
+nextflow run marchoeppner/gmo-check -profile standard,singularity --run_name build_refs --outdir /path/to/references
+```
+
+If you do not have singularity on your system, you can also specify docker, podman or conda for software provisioning - see the [usage information](usage.md).
+
+The path specified with `--outdir` can then be given to the pipeline during normal execution as `--reference_base`.
+
 ## Site-specific config file
 
 This pipeline requires a site-specific configuration file to be able to talk to your local cluster or compute infrastructure. Nextflow supports a wide
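The commit only references such a site-specific file without showing one. As an illustrative sketch (the executor, queue and path values here are hypothetical, not part of this commit), a minimal site config could look like:

```
// Hypothetical site-specific Nextflow config
params {
    // where the pipeline references were installed
    reference_base = "/path/to/references"
}

process {
    // submit jobs to the local scheduler instead of running them in-process
    executor = "slurm"
    queue    = "normal"
}
```

Such a file would typically be wired up as a named profile (like the `lsh` profile used in the usage examples) so it can be selected with `-profile`.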

docs/usage.md (+66 -1)

@@ -1,4 +1,69 @@
 # Usage information
 
-## Basic execution
+# Running the pipeline
 
+A basic execution of the pipeline looks as follows:
+
+a) Without a site-specific config file
+
+```
+nextflow run marchoeppner/gmo-check -profile standard,singularity --input samples.csv --genome tomato --reference_base /path/to/references --run_name pipeline-test
+```
+
+where `--reference_base` corresponds to the location in which you have [installed](installation.md) the pipeline references.
+
+In this example, the pipeline will assume it runs on a single computer with the singularity container engine available. Other options to provision software are:
+
+`-profile standard,docker`
+
+`-profile standard,podman`
+
+`-profile standard,conda`
+
+b) With a site-specific config file
+
+```
+nextflow run marchoeppner/gmo-check -profile lsh --input samples.csv --genome tomato --run_name pipeline-test
+```
+
+# Options
+
+## `--input samplesheet.csv` [default = null]
+
+This pipeline expects a CSV-formatted sample sheet to properly pull various metadata through the processes. The required format looks as follows:
+
+```
+sample_id,library_id,readgroup_id,R1,R2
+S100,S100,AACYTCLM5.1.S100,/home/marc/projects/gaba/data/S100_R1.fastq.gz,/home/marc/projects/gaba/data/S100_R2.fastq.gz
+```
+
+If you are unsure about the read group ID, just make sure that it is unique for the combination of library, flowcell and lane. Typically it is constructed from these components - and the easiest way to get it is from the FastQ file itself (the header of read 1, for example).
+
+## `--genome tomato` [default = tomato]
+
+The name of the pre-configured genome to analyze against. This parameter controls not only the mapping reference (if you use a mapping-based analysis), but also which internally pre-configured configuration files are used. Currently, only one genome can be analyzed per pipeline run.
+
+Available options:
+
+- tomato
+
+## `--run_name Fubar` [default = null]
+
+A mandatory name for this run, to be included with the result files.
+
+## `--email [email protected]` [default = null]
+
+An email address to which the MultiQC report is sent after pipeline completion. This requires the executing system to have `sendmail` configured.
+
+## `--tools vsearch` [default = vsearch]
+
+This pipeline supports two completely independent tool chains:
+
+- `vsearch` uses a simple "metagenomics-like" amplicon processing workflow to produce dereplicated sequences from the short reads, which are then searched for pre-defined patterns against a built-in BLAST database.
+
+- `bwa2` uses a classical variant calling approach, with parameters similar to what one would find in cancer analysis, to detect low-frequency SNPs in mixed samples.
+
+You can specify either one, or both: `--tools 'vsearch,bwa2'`
+
+## `--reference_base` [default = null]
+
+The location where the pipeline references are installed on your system. This will typically be pre-set in your site-specific config file and is only needed when you run without one.
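The read group construction described under `--input` above can be sketched in shell. The header below is made up for illustration, following the common Illumina read name scheme (`instrument:run:flowcell:lane:tile:x:y`); only the resulting ID `AACYTCLM5.1.S100` appears in the commit's sample sheet example:

```shell
# First header line of R1 for a hypothetical sample S100
# (in practice: zcat S100_R1.fastq.gz | head -n 1)
header="@A01234:12:AACYTCLM5:1:1101:1000:2000 1:N:0:ACGTACGT"

# Fields 3 and 4 of the colon-separated read name are flowcell and lane
flowcell=$(echo "$header" | cut -d: -f3)
lane=$(echo "$header" | cut -d: -f4)

# Combine flowcell, lane and sample ID into a unique read group ID
echo "${flowcell}.${lane}.S100"
# → AACYTCLM5.1.S100
```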

modules/bwamem2/index/main.nf (+6 -3)

@@ -1,16 +1,19 @@
 process BWAMEM2_INDEX {
 
-tag "${meta.genome}"
+tag "${meta.id}"
 
 label 'medium_serial'
 
-publishDir "${params.outdir}/${meta.genome}", mode 'copy'
+conda 'bioconda::samtools=1.19.2 bioconda::bwa-mem2=2.2.1'
+container 'quay.io/biocontainers/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0'
+
+publishDir "${params.outdir}/${meta.id}", mode: 'copy'
 
 input:
 tuple val(meta),path(fasta)
 
 output:
-//path('*'), emit: bwa_index
+path('*'), emit: bwa_index
 path("versions.yml"), emit: versions
 
 script:

modules/samtools/dict/main.nf (+6 -5)

@@ -1,16 +1,17 @@
 process SAMTOOLS_DICT {
+
+tag "${meta.id}"
+
 conda 'bioconda::samtools=1.19.2'
 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
 'https://depot.galaxyproject.org/singularity/samtools:1.19.2--h50ea8bc_0' :
 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_0' }"
 
-publishDir "${params.outdir}/${meta.genome}", mode 'copy'
-
-tag "${fasta}"
+publishDir "${params.outdir}/${meta.id}", mode: 'copy'
 
 input:
-tuple val(meta), path(fasta)
-
+tuple val(meta),path(fasta)
+
 output:
 tuple val(meta), path(dict), emit: dict
 path("versions.yml"), emit: versions

modules/samtools/faidx/main.nf (+2 -2)

@@ -4,13 +4,13 @@ process SAMTOOLS_FAIDX {
 
 label 'short_serial'
 
-publishDir "${params.outdir}/${meta.id}", mode 'copy'
-
 conda 'bioconda::samtools=1.19.2'
 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
 'https://depot.galaxyproject.org/singularity/samtools:1.19.2--h50ea8bc_0' :
 'quay.io/biocontainers/samtools:1.19.2--h50ea8bc_0' }"
 
+publishDir "${params.outdir}/${meta.id}", mode: 'copy'
+
 input:
 tuple val(meta),path(fasta)

workflows/build_references.nf (+19)

@@ -1,10 +1,13 @@
 include { GUNZIP } from "./../modules/gunzip"
 include { SAMTOOLS_FAIDX } from "./../modules/samtools/faidx"
+include { SAMTOOLS_DICT } from "./../modules/samtools/dict"
+include { BWAMEM2_INDEX } from "./../modules/bwamem2/index"
 
 genomes = params.references.genomes.keySet()
 
 genome_list = []
 
+// Get all the configured genomes
 genomes.each { genome ->
 def meta = [:]
 meta.id = genome.toString()
@@ -14,22 +17,38 @@ genomes.each { genome ->
 
 ch_genomes = Channel.fromList(genome_list)
 
+// Workflow starts here
 workflow BUILD_REFERENCES {
 
 main:
 
+// Check if any of the fasta files are gzipped
 ch_genomes.branch {
 compressed: it[1].toString().contains(".gz")
 uncompressed: !it[1].toString().contains(".gz")
 }.set { ch_genomes_branched }
 
+// unzip all the compressed fasta files
 GUNZIP(
 ch_genomes_branched.compressed
 )
 
+// merge all fasta files back into one channel
 ch_fasta = ch_genomes_branched.uncompressed.mix(GUNZIP.out.gunzip)
 
+// Index the fasta file(s)
 SAMTOOLS_FAIDX(
 ch_fasta
 )
+
+// Create a sequence dictionary for the fasta file(s)
+SAMTOOLS_DICT(
+ch_fasta
+)
+
+// Create the BWA2 index for the fasta file(s)
+BWAMEM2_INDEX(
+ch_fasta
+)
+
 }
