# Usage information

# Running the pipeline

A basic execution of the pipeline looks as follows:

a) Without a site-specific config file

```
nextflow run marchoeppner/gmo-check -profile standard,singularity --input samples.csv --genome tomato --reference_base /path/to/references --run_name pipeline-test
```
where `--reference_base` points to the location in which you have [installed](installation.md) the pipeline references.

In this example, the pipeline assumes that it runs on a single computer with the Singularity container engine available. Other options to provision the software are:

`-profile standard,docker`

`-profile standard,podman`

`-profile standard,conda`
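
For example, the same run can be executed with Docker instead of Singularity by only swapping the profile (all other options stay unchanged):

```
nextflow run marchoeppner/gmo-check -profile standard,docker --input samples.csv --genome tomato --reference_base /path/to/references --run_name pipeline-test
```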

b) With a site-specific config file

```
nextflow run marchoeppner/gmo-check -profile lsh --input samples.csv --genome tomato --run_name pipeline-test
```
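
A site-specific config file bundles settings such as the reference location, the container engine and the scheduler under a named profile (here `lsh`), so they do not have to be passed on the command line. The exact contents depend on your site; a minimal, hypothetical sketch could look like this (paths, queue and scheduler are assumptions to adapt to your system):

```
// Hypothetical site profile for the `lsh` example above
params.reference_base = '/path/to/references'

singularity.enabled = true

process {
    executor = 'slurm'
    queue    = 'all'
}
```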

# Options

## `--input samplesheet.csv` [default = null]

This pipeline expects a CSV-formatted sample sheet to carry sample metadata through the individual processes. The required format looks as follows:

```
sample_id,library_id,readgroup_id,R1,R2
S100,S100,AACYTCLM5.1.S100,/home/marc/projects/gaba/data/S100_R1.fastq.gz,/home/marc/projects/gaba/data/S100_R2.fastq.gz
```

If you are unsure about the read group ID, make sure that it is unique for the combination of library, flowcell and lane. Typically it is constructed from these components, and the easiest way to obtain them is from the FastQ file itself (the header of read 1, for example).
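
For example, you can peek at the header of the first read in a gzipped FastQ file and assemble the readgroup ID from the flowcell, lane and sample name (the file name and header values below are illustrative; Illumina headers follow the `@instrument:run:flowcell:lane:tile:x:y` pattern):

```
# Print the header of the first read
zcat S100_R1.fastq.gz | head -n 1
# Illustrative output: @A01234:55:AACYTCLM5:1:1101:1234:5678 1:N:0:ACGT
# flowcell = AACYTCLM5, lane = 1, sample = S100  ->  readgroup ID: AACYTCLM5.1.S100
```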

## `--genome tomato` [default = tomato]

The name of the pre-configured genome to analyze against. This parameter controls not only the mapping reference (if you use a mapping-based analysis), but also which built-in configuration files are used. Currently, only one genome can be analyzed per pipeline run.

Available options:

- tomato

## `--run_name Fubar` [default = null]

A mandatory name for this run, to be included with the result files.

## `--email [email protected]` [default = null]

An email address to which the MultiQC report is sent after pipeline completion. This requires the executing system to have `sendmail` configured.

## `--tools vsearch` [default = vsearch]

This pipeline supports two completely independent tool chains:

- `vsearch` uses a simple "metagenomics-like" amplicon processing workflow to produce dereplicated sequences from the short reads, which are then searched for pre-defined patterns against a built-in BLAST database.

- `bwa2` uses a classical variant calling approach, with parameters similar to what one would find in cancer analysis, to detect low-frequency SNPs in mixed samples.

You can specify either one or both: `--tools 'vsearch,bwa2'`
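
Put together, a run using both tool chains could look like this (paths and run name are placeholders, as in the examples above):

```
nextflow run marchoeppner/gmo-check -profile standard,singularity --input samples.csv --genome tomato --reference_base /path/to/references --run_name pipeline-test --tools 'vsearch,bwa2'
```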

## `--reference_base` [default = null]

The location where the pipeline references are installed on your system. This will typically be pre-set in your site-specific config file and is only needed when you run without one.