
3. Subworkflows

Håkon Kaspersen edited this page Feb 5, 2025 · 6 revisions

Subworkflows are smaller, optional workflows that can be run alongside either of the assembly tracks in Assemblage. Each one is activated by adding its parameter to the nextflow run command.

Ellipsis

Ellipsis is a genome characterization workflow, focusing on resistance, virulence, plasmids, and annotation. The workflow identifies which contigs are part of the chromosome and which are likely to be plasmids. It also reports results per identified replicon, which makes it easy to see where in the genome a specific gene is located. This is especially powerful when coupled with the hybrid assembly main workflow, as you are likely to work with complete genomes.

How to run

Add the parameter --ellipsis to either main workflow in Assemblage.

You also need to provide a csv file containing the paths to the databases used in Ellipsis with the --databases parameter. This file needs to have the headers name and path, with one row per database:

name,path
resfinder,path_to_resfinder_db
virulencefinder,path_to_virulencefinder_db
plasmidfinder,path_to_plasmidfinder_db
mobsuite,path_to_mobsuite_db
bakta,path_to_bakta_db

If any of the names are wrong or any of the paths do not exist, the run is terminated with an error message stating where the error is located.
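To catch such mistakes before launching a run, the file can be checked locally first. The following is a minimal sketch, not the pipeline's own validation: the function name check_databases_csv is hypothetical, and the set of accepted names is simply taken from the example file above.

```python
import csv
from pathlib import Path

# Database names copied from the example CSV in this section.
EXPECTED_NAMES = {"resfinder", "virulencefinder", "plasmidfinder", "mobsuite", "bakta"}


def check_databases_csv(csv_path):
    """Return a list of human-readable problems found in the databases CSV."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # The header row must be exactly: name,path
        if reader.fieldnames != ["name", "path"]:
            return [f"expected headers name,path but found {reader.fieldnames}"]
        # Data rows start on line 2 of the file, after the header.
        for line_no, row in enumerate(reader, start=2):
            name = (row["name"] or "").strip()
            path = (row["path"] or "").strip()
            if name not in EXPECTED_NAMES:
                problems.append(f"line {line_no}: unknown database name '{name}'")
            elif not path or not Path(path).exists():
                problems.append(f"line {line_no}: path for '{name}' does not exist: '{path}'")
    return problems
```

An empty list means every row has a known name and an existing path. Otherwise, each entry points to the offending line of the csv file.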

A species parameter also needs to be supplied, e.g. --species "'Escherichia coli'". Note the single quotes inside the double quotes: VirulenceFinder will not recognize the parameter from nextflow unless it is wrapped in both double and single quotes.
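The effect of the nested quotes can be illustrated with Python's shlex module, which splits text according to POSIX shell quoting rules. This sketch only shows what the shell hands over to nextflow. How the value is passed on to VirulenceFinder inside the pipeline is not shown here, and the helper name shell_tokens is hypothetical.

```python
import shlex


def shell_tokens(command_fragment):
    """Split a fragment the way a POSIX shell would before starting a program."""
    return shlex.split(command_fragment)


# As written in this section: the outer double quotes are consumed by the
# shell, and the inner single quotes remain part of the value.
print(shell_tokens("--species \"'Escherichia coli'\""))

# Double quotes only: the value arrives as one argument with no quotes left.
print(shell_tokens('--species "Escherichia coli"'))

# No quotes at all: the species name is split into two separate arguments.
print(shell_tokens("--species Escherichia coli"))
```

Only the first form delivers a value that still carries its own quotes, so the two-word species name remains quoted after the shell has processed the command line.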

Output

The output of Ellipsis is written to the output directory specified with the --out_dir parameter in either of the main workflows. The contig results can be found in the file ellipsis_report.tsv, which can be imported into Excel or any other program that reads tab-separated text.
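The report can also be read programmatically. The sketch below uses only Python's standard csv module and makes no assumption about which columns ellipsis_report.tsv contains, since they are not listed in this section. The function name load_report is hypothetical.

```python
import csv


def load_report(tsv_path):
    """Read a tab-separated report and return (column_names, rows).

    Each row is a dict keyed by the column names found in the header line.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows
```

Calling load_report on the ellipsis_report.tsv file inside the --out_dir directory returns the header names, which is a quick way to see which columns are available before filtering rows.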

Parameters

--ellipsis:             Use to activate the Ellipsis subworkflow
--databases file.csv:   Specify the path to the csv file containing the paths to all the databases
--species:              The species of the input genomes, used in VirulenceFinder. Needs to be wrapped in double and single quotes, e.g. "'Escherichia coli'"
--mincov:               Minimum coverage for gene identification, used in the *finder processes. Default: 0.6
--identity_threshold:   Minimum identity for gene identification, used in the *finder processes. Default: 0.8
--output_mge_reports:   Use to output the mge reports from MOB-suite
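
When runs are launched from a script, the parameters above can be assembled into an argument list and appended to the nextflow run command of either main workflow. This is a sketch under stated assumptions: the function name ellipsis_args is hypothetical, and the list is meant for a launcher that passes arguments directly without a shell (for example subprocess.run with a list), so the single quotes around the species are added literally and no outer double quotes are needed.

```python
def ellipsis_args(databases_csv, species, mincov=None,
                  identity_threshold=None, output_mge_reports=False):
    """Build the Ellipsis-specific arguments for a nextflow run command."""
    args = [
        "--ellipsis",
        "--databases", str(databases_csv),
        # Literal single quotes, matching what the shell passes on when
        # the value is typed as "'Escherichia coli'".
        "--species", f"'{species}'",
    ]
    # Leave out the thresholds unless given, so the pipeline defaults
    # (0.6 coverage, 0.8 identity) stay in effect.
    if mincov is not None:
        args += ["--mincov", str(mincov)]
    if identity_threshold is not None:
        args += ["--identity_threshold", str(identity_threshold)]
    if output_mge_reports:
        args.append("--output_mge_reports")
    return args
```

For example, ellipsis_args("databases.csv", "Escherichia coli", output_mge_reports=True) yields the flags needed to activate Ellipsis with the default thresholds and the MOB-suite mge reports.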
