
Usage and config file parameters

Håkon Kaspersen edited this page Jan 27, 2021 · 1 revision

Usage

If you use either of the assembly tracks of the pipeline, the read files must follow a specific naming convention so that the pipeline can match them. In general, avoid dots (.) in read file names except in the extension (.fastq.gz). Short reads must follow this naming convention:

*_R{1,2}.fastq.gz

where "*" is the sample ID. Long reads must follow this naming convention:

*.fastq.gz

where "*" must exactly match the sample ID ("*") used in the corresponding short-read file names.
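To check file names before starting a run, the naming rules above can be sketched in Python. This is a hypothetical helper, not the pipeline's own matching code: the function names pair_reads and incomplete_samples are illustrative, and it assumes short and long reads sit in separate directories (as params.reads and params.longreads suggest). It groups files by sample ID, rejects names with extra dots, and lists samples that lack R1, R2 or a long-read file.

```python
import re

# Short reads: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz (no dots in <sample>)
SHORT_RE = re.compile(r"^(?P<sample>[^.]+)_R(?P<mate>[12])\.fastq\.gz$")
# Long reads: <sample>.fastq.gz, where <sample> must equal the short-read sample ID
LONG_RE = re.compile(r"^(?P<sample>[^.]+)\.fastq\.gz$")


def pair_reads(short_names, long_names):
    """Group file names by sample ID: {sample: {"R1": ..., "R2": ..., "long": ...}}."""
    samples = {}
    for name in short_names:
        m = SHORT_RE.match(name)
        if m is None:
            raise ValueError(f"short-read file does not follow *_R{{1,2}}.fastq.gz: {name}")
        samples.setdefault(m["sample"], {})["R" + m["mate"]] = name
    for name in long_names:
        m = LONG_RE.match(name)
        if m is None:
            raise ValueError(f"long-read file does not follow *.fastq.gz: {name}")
        samples.setdefault(m["sample"], {})["long"] = name
    return samples


def incomplete_samples(samples, required=("R1", "R2", "long")):
    """Return the sorted sample IDs missing any of the required files."""
    return sorted(s for s, files in samples.items() if not all(k in files for k in required))
```

For example, with isolate01_R1.fastq.gz, isolate01_R2.fastq.gz, isolate02_R1.fastq.gz, isolate02_R2.fastq.gz and a single long-read file isolate01.fastq.gz, incomplete_samples reports isolate02, and a long-read file named isolate.01.fastq.gz is rejected because of the extra dot.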

To run the pipeline, copy the main.config or the plasmap.config file and edit it to suit your needs. Then run one of the following:

To run Ellipsis main pipeline:
path/to/ellipsis.sh main config_file.config output_folder

To run PlasMap pipeline:
path/to/ellipsis.sh plasmap plasmap.config output_folder

Java is automatically activated and deactivated.

Config file description

Workflow

  • params.track: Which workflow to run: "hybrid", "short_assembly" or "no_assembly".

General

  • params.reads: The path to the Illumina reads. The path must match the read files themselves, not just the directory that holds them (see the example in the config file).
  • params.longreads: The path to the directory that holds the long reads. See naming convention mentioned above.
  • params.assemblies: The path to the directory holding the assemblies if params.track = "no_assembly".
  • params.*db: Paths to the databases used by the respective programs, one parameter per database.
  • params.chrom: If true, the chromosome is included in the annotations downstream.
  • params.trim: If true, both long and short reads are trimmed (with Canu and Trim Galore, respectively).
  • params.illumina_filtering: If true, use Illumina reads as a quality reference for filtering long reads.
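The workflow and general settings above can be sanity-checked before a run. The sketch below is a hypothetical Python helper operating on a plain dict of params.* values, not part of Ellipsis. Only two rules come from this page: the three allowed track names, and that params.assemblies is needed for "no_assembly". The track-to-input mapping for "hybrid" and "short_assembly" and the glob heuristic for params.reads are assumptions.

```python
VALID_TRACKS = {"hybrid", "short_assembly", "no_assembly"}

# Assumed inputs per track. Only no_assembly -> assemblies is stated on this
# page; the other entries are inferred from the track names.
REQUIRED_INPUTS = {
    "hybrid": ("reads", "longreads"),
    "short_assembly": ("reads",),
    "no_assembly": ("assemblies",),
}

BOOLEAN_FLAGS = ("chrom", "trim", "illumina_filtering")


def check_params(params):
    """Return a list of problems found in a dict of params.* values (empty if none)."""
    track = params.get("track")
    if track not in VALID_TRACKS:
        return [f"track must be one of {sorted(VALID_TRACKS)}, got {track!r}"]

    problems = []
    for key in REQUIRED_INPUTS[track]:
        if not params.get(key):
            problems.append(f"track {track!r} needs params.{key}")

    # Heuristic: a path that matches files usually contains glob characters.
    reads = params.get("reads")
    if reads and not any(ch in reads for ch in "*?{"):
        problems.append("params.reads should match the read files (e.g. a glob), not just the directory")

    for flag in BOOLEAN_FLAGS:
        if flag in params and not isinstance(params[flag], bool):
            problems.append(f"params.{flag} should be true or false")
    return problems
```

A hybrid run with reads set to a pattern such as data/*_R{1,2}.fastq.gz and a long-read pattern passes with no problems, whereas setting reads to a bare directory such as data/ is flagged.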

Program-specific settings

  • params.phred_score: Phred quality score cutoff used when trimming Illumina reads (default: 15).
  • params.genomesize: Approximate genome size of the organism.
  • params.sequencer: Long-read sequencing technology, either nanopore or pacbio.
  • params.minlen: Minimum length of long reads to keep.
  • params.keep_percent: Discard the lowest-quality long reads until only this percentage remains.
  • params.target_bases: Maximum number of bases to keep after filtering.
  • params.mode: Unicycler assembly mode (conservative, normal or bold); see the Unicycler documentation for details.
  • params.min_fasta_length: All contigs below this size threshold will be removed.
  • params.prokka_additional: Any additional Prokka options. Add them exactly as you would type them when running Prokka on the command line, for example --proteins <path>.
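To show how minlen, keep_percent, target_bases and min_fasta_length interact, here is a simplified Python sketch. The names mirror options of the long-read filter Filtlong, but the scoring is an assumption: each read has a hypothetical mean_quality value, keep_percent is taken as a percentage of bases, and reads are kept best-first until the budget is reached. This is an illustration of the idea, not what the pipeline's tools actually compute.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    length: int
    mean_quality: float  # hypothetical per-read quality score; higher is better


def filter_long_reads(reads, minlen, keep_percent, target_bases=None):
    """Keep the best reads until keep_percent of bases (and at most target_bases) remain."""
    # 1. Drop reads shorter than minlen.
    long_enough = [r for r in reads if r.length >= minlen]
    # 2. Base budget: keep_percent of the remaining bases, capped by target_bases.
    budget = sum(r.length for r in long_enough) * keep_percent / 100
    if target_bases is not None:
        budget = min(budget, target_bases)
    # 3. Take reads in order of decreasing quality while they fit in the budget.
    selected, used = [], 0
    for r in sorted(long_enough, key=lambda r: r.mean_quality, reverse=True):
        if used + r.length > budget:
            break
        selected.append(r)
        used += r.length
    return selected


def drop_short_contigs(contigs, min_fasta_length):
    """Remove contigs shorter than min_fasta_length from a {name: sequence} dict."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_fasta_length}
```

With reads of 5000, 3000, 500 and 2000 bp, minlen = 1000 removes the 500 bp read, and keep_percent = 70 then leaves a budget of 7000 of the remaining 10000 bases. Adding target_bases = 6000 tightens that budget further, so fewer reads survive.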