docs/developer.md (+58 -1)
@@ -144,4 +144,61 @@ You'll note that some obvious errors/warnings are omitted. This behavior is cont
144
144
145
145
## Sending report emails
146
146
147
-
This template is set up to send the final QC report via Email (--email [email protected]). This requires for sendmail to be configured on the executing node/computer.
147
+
This template is set up to send the final QC report via email (--email [email protected]). This requires sendmail to be configured on the executing node/computer.
148
+
149
+
## Adding new genomes and targets
150
+
151
+
This pipeline uses a JSON-formatted config file to keep track of the supported analyses. The most basic form looks as follows:
152
+
153
+
```JSON
154
+
{
155
+
"rules": {
156
+
"vsearch-blast": {
157
+
"payload": [
158
+
{
159
+
"format": "JSON",
160
+
"name": "GABA Mutation in SIGAD3",
161
+
"target": "SiGAD3|NM_001246898.2",
162
+
"matcher": "AAAG-TGGA",
163
+
"positive_report": "Diese Probe enthält eine GABA Mutation in SIGAD3. Nachweis erbracht über: Amplicon Analyse.",
164
+
"negative_report": "Für diese Probe konnte keine GABA Mutation in SIGAD3 nachgewiesen werden."
165
+
}
166
+
]
167
+
168
+
},
169
+
"bwa-freebayes": {
170
+
"payload": [
171
+
{
172
+
"format": "VCF",
173
+
"target": "1:14834-14836",
174
+
"name": "GABA Mutation in SIGAD3",
175
+
"matcher": "1\t14834\t.\tGTG\tGTTG",
176
+
"positive_report": "Diese Probe enthält eine GABA Mutation in SIGAD3. Nachweis erbracht über: Varianten Analyse.",
177
+
"negative_report": "Für diese Probe konnte keine GABA Mutation in SIGAD3 nachgewiesen werden."
178
+
}
179
+
]
180
+
}
181
+
}
182
+
}
183
+
```
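The `matcher` of the `bwa-freebayes` entry reads like the first five tab-separated columns of a VCF record (CHROM, POS, ID, REF, ALT). The source does not state how the pipeline evaluates this string, so the following is only a minimal sketch under that assumption; `split_matcher` and `VCF_COLUMNS` are hypothetical names introduced for illustration.

```python
import json

# Assumed column names: the first five fixed columns of a VCF record.
VCF_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT")


def split_matcher(matcher):
    """Split a tab-separated matcher string into named VCF-style fields."""
    fields = matcher.split("\t")
    if len(fields) != len(VCF_COLUMNS):
        raise ValueError(
            f"expected {len(VCF_COLUMNS)} tab-separated fields, got {len(fields)}"
        )
    return dict(zip(VCF_COLUMNS, fields))


# The JSON escape \t decodes to a real tab character when the file is parsed.
entry = json.loads(r'{"matcher": "1\t14834\t.\tGTG\tGTTG"}')
print(split_matcher(entry["matcher"]))
```

Read this way, the three-base REF allele `GTG` starting at position 14834 covers the same interval as the entry's `target` value `1:14834-14836`.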
184
+
185
+
This file is reference-genome specific and lives in `assets/genome/NAME_OF_SPECIES/rules.json` ([example](../assets/genomes/tomato/rules.json)).
186
+
187
+
The rule set knows two types of rules:
188
+
189
+
- `vsearch-blast` - for analyses that use assembled and clustered amplicons to find patterns in a BLAST database.
190
+
191
+
- `bwa-freebayes` - for analyses that use read alignment and variant calling against a reference genome.
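Before committing a new or edited `rules.json`, a quick structural check can catch misspelled rule types or incomplete payload entries. The sketch below is not part of the pipeline; `check_rules` is a hypothetical helper, and the set of expected keys is inferred from the example entries above rather than from the pipeline code, so the pipeline itself may require more or fewer fields.

```python
import json

# The two rule types named in this document.
KNOWN_RULE_TYPES = {"vsearch-blast", "bwa-freebayes"}

# Assumption: keys inferred from the example entries, not from the pipeline code.
ASSUMED_KEYS = {
    "format", "name", "target", "matcher", "positive_report", "negative_report"
}


def check_rules(config):
    """Return a list of human-readable problems found in a parsed rules config."""
    problems = []
    for rule_type, rule in config.get("rules", {}).items():
        if rule_type not in KNOWN_RULE_TYPES:
            problems.append(f"unknown rule type: {rule_type}")
            continue
        for index, entry in enumerate(rule.get("payload", [])):
            missing = sorted(ASSUMED_KEYS - set(entry))
            if missing:
                problems.append(
                    f"{rule_type} payload[{index}] is missing: {', '.join(missing)}"
                )
    return problems


# Small in-memory example; a real check would json.load() the rules file instead.
example = json.loads(
    '{"rules": {"vsearch-blast": {"payload": [{"format": "JSON"}]},'
    ' "unknown-tool": {"payload": []}}}'
)
for problem in check_rules(example):
    print(problem)
```

An empty list means the file matches the structure shown above; it does not prove that the matchers or targets are biologically correct.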
192
+
193
+
To add new targets to an already established reference genome:
194
+
195
+
- Add new elements to the appropriate payload block in the rules.json manifest, following the example structure above
196
+
- If you want to enable the vsearch-blast tool chain, make sure that the built-in [Blast Database](../assets/blastdb.fasta.gz) contains the required target motif(s) (usually a gene of interest).
197
+
- Add the necessary primer information to the Ptrimmer config (amplicon.txt)
198
+
- Add the primer sequences to the cutadapt fasta file (primers.fa)
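The first step of the list above (extending a payload block) can also be scripted. The following is a minimal sketch, assuming the manifest structure shown earlier; `add_target` is a hypothetical helper, and every value in `new_entry` is a placeholder that must be replaced with real target data. The BLAST database and primer files still need to be updated separately.

```python
import copy
import json


def add_target(config, rule_type, entry):
    """Return a copy of config with entry appended to the payload of rule_type."""
    updated = copy.deepcopy(config)
    rule = updated.setdefault("rules", {}).setdefault(rule_type, {})
    rule.setdefault("payload", []).append(dict(entry))
    return updated


# Placeholder values only; none of these describe a real target.
new_entry = {
    "format": "JSON",
    "name": "PLACEHOLDER_NAME",
    "target": "PLACEHOLDER_TARGET",
    "matcher": "PLACEHOLDER_MATCHER",
    "positive_report": "PLACEHOLDER_POSITIVE_TEXT",
    "negative_report": "PLACEHOLDER_NEGATIVE_TEXT",
}

config = {"rules": {"vsearch-blast": {"payload": []}}}
updated = add_target(config, "vsearch-blast", new_entry)

# ensure_ascii=False keeps non-ASCII report text (e.g. umlauts) readable on disk.
print(json.dumps(updated, indent=2, ensure_ascii=False))
```

Working on a deep copy leaves the original configuration untouched, which makes it easy to compare both versions before writing the file back.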
199
+
200
+
To add new reference genomes and matching target rules:
201
+
202
+
- Add the necessary information about the new reference genome into the [resources.config](../conf/resources.config) file, including a download link.
203
+
- Create a new species folder under /assets/genome
204
+
- Add the relevant files as described above for adding new targets
In this example, both `--reference_base` and the choice of software provisioning are already set in your local configuration and don't have to provided as command line argument.
41
+
In this example, both `--reference_base` and the choice of software provisioning are already set in your site-specific configuration and do not have to be provided as command-line arguments.
31
42
32
-
# Options
43
+
## Options
33
44
34
-
## `--input samples.csv`[default = null]
45
+
### `--input samples.csv` [default = null]
35
46
36
47
This pipeline expects a CSV-formatted sample sheet to properly pull various metadata through the processes. The required format looks as follows:
@@ -52,23 +63,23 @@ The `single_end` column is prospectively included to enable support for non-pair
52
63
53
64
`R1` and `R2` designate the full path(s) to the read data. This can either be a local path on your (shared) file system or data in the cloud which you access via, e.g., S3, Google buckets or FTP.
54
65
55
-
## `--genome tomato`[default = tomato]
66
+
### `--genome tomato` [default = tomato]
56
67
57
68
The name of the pre-configured genome to analyze against. This parameter controls not only the mapping reference (if you use a mapping-based analysis), but also which internally pre-configured configuration files are used. Currently, only one genome can be analyzed per pipeline run.
58
69
59
70
Available options:
60
71
61
72
- tomato
62
73
63
-
## `--run_name Fubar`[default = null]
74
+
### `--run_name Fubar` [default = null]
64
75
65
76
A mandatory name for this run, to be included with the result files.
An email address to which the MultiQC report is sent after pipeline completion. This requires the executing system to have [sendmail](https://rimuhosting.com/support/settingupemail.jsp?mta=sendmail) configured.
70
81
71
-
## `--tools vsearch`[default = vsearch]
82
+
### `--tools vsearch` [default = vsearch]
72
83
73
84
This pipeline supports two completely independent tool chains:
74
85
@@ -80,16 +91,32 @@ You can specify either one, or both: `--tools 'vsearch,bwa2'`
80
91
81
92
Which tool chain is the best choice? Well, technically both options give near-identical results. So in this case `vsearch` would be the better option since it runs significantly faster. However, this pipeline is designed to (theoretically) handle many more types of genetic variants, not all of which are necessarily detectable without proper variant calling. This is why the `bwa2` option exists - future proofing.
82
93
83
-
## `--reference_base`[default = null ]
94
+
### `--reference_base` [default = null]
84
95
85
96
The location of where the pipeline references are installed on your system. This will typically be pre-set in your site-specific config file and is only needed when you run without one.
86
97
87
-
## `--outdir results`[default = results]
98
+
### `--outdir results` [default = results]
88
99
89
100
The location where the results are stored. Usually this will be `results` in the location from where you run the nextflow process. However, this option also accepts any other path in your file system(s).
The minimum percentage of reads supporting a SNP at a given site for the SNP to be considered. The default of 1% is chosen to be able to detect low levels of contribution but may need some tweaking depending on your exact sequencing setup and coverage.
The minimum percentage of reads supporting a SNP at a given site for the SNP to be considered. The default of 1% is chosen to be able to detect low levels of contribution but may need some tweaking depending on your exact sequencing setup and coverage.
107
+
108
+
## Resources
109
+
110
+
The following options can be set to control resource usage outside of a site-specific [config](https://github.com/marchoeppner/nf-configs) file.
111
+
112
+
### `--max_cpus`[ default = 16]
113
+
114
+
The maximum number of CPUs a single job can request. This is typically the maximum number of cores available on a compute node or your local (development) machine.
115
+
116
+
### `--max_memory`[ default = 128.GB ]
117
+
118
+
The maximum amount of memory a single job can request. This is typically the maximum amount of RAM available on a compute node or your local (development) machine, minus a few percent to prevent the machine from running out of memory while running basic background tasks.
119
+
120
+
### `--max_time`[ default = 240.h ]
121
+
122
+
The maximum allowed run/wall time a single job can request. This is mostly relevant for environments where run time is restricted, such as a computing cluster with an active resource manager or possibly some cloud environments.
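The source does not show how the pipeline applies these three limits internally. A common pattern is to cap each per-job request at the configured maximum, which the following sketch illustrates. `cap_request` and the constant names are hypothetical; only the default values (16 CPUs, 128 GB, 240 h) come from the option descriptions above.

```python
# Defaults taken from the documented options; names are illustrative only.
MAX_CPUS = 16
MAX_MEMORY_GB = 128
MAX_TIME_H = 240


def cap_request(cpus, memory_gb, time_h,
                max_cpus=MAX_CPUS, max_memory_gb=MAX_MEMORY_GB, max_time_h=MAX_TIME_H):
    """Return the requested resources, each limited to its configured maximum."""
    return {
        "cpus": min(cpus, max_cpus),
        "memory_gb": min(memory_gb, max_memory_gb),
        "time_h": min(time_h, max_time_h),
    }


# A job asking for more CPUs and time than allowed is scaled back to the limits.
print(cap_request(cpus=32, memory_gb=64, time_h=300))
```

Under this assumption, lowering `--max_cpus` or `--max_memory` on a small development machine keeps jobs from requesting more than the machine can provide.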