3. Subworkflows
Subworkflows are smaller, optional workflows that can be run together with either of the assembly tracks in Assemblage. Each one is activated by adding its parameter to the nextflow run command.
Ellipsis is a genome characterization workflow, focusing on resistance, virulence, plasmids, and annotation. The workflow identifies which contigs are part of the chromosome, and which are likely to be plasmids. Further, it will provide data per replicon identified, which makes it easy to see where in the genome a specific gene is located. This is especially powerful when coupled with the hybrid assembly main workflow, as you are likely to work with complete genomes.
Add the parameter --ellipsis to either main workflow in Assemblage.
You also need to provide a CSV file containing the paths to the databases used in Ellipsis, using the --databases parameter.
This file needs to have the headers name and path, with one row for each of the following databases:
name,path
resfinder,path_to_resfinder_db
virulencefinder,path_to_virulencefinder_db
plasmidfinder,path_to_plasmidfinder_db
mobsuite,path_to_mobsuite_db
bakta,path_to_bakta_db
If any of the names is wrong or any of the paths does not exist, the run is terminated with an error message stating where the error is located.
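Because a faulty file stops the run, it can be convenient to check it beforehand. The following is a minimal sketch of such a pre-flight check, not the validation Assemblage itself performs. It assumes that the five names in the example above are the complete set Ellipsis expects; the function name check_databases_csv is made up for illustration.

```python
import csv
from pathlib import Path

# Names taken from the example file above; assumed to be the full expected set.
EXPECTED_NAMES = {"resfinder", "virulencefinder", "plasmidfinder", "mobsuite", "bakta"}


def check_databases_csv(csv_path):
    """Return a list of human-readable problems found in the databases file."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # The file must carry exactly the two headers 'name' and 'path'.
        if set(reader.fieldnames or []) != {"name", "path"}:
            return [f"expected headers 'name,path', found {reader.fieldnames}"]
        seen = set()
        # Data rows start on line 2, directly after the header line.
        for line_no, row in enumerate(reader, start=2):
            name = (row["name"] or "").strip()
            path = (row["path"] or "").strip()
            if name not in EXPECTED_NAMES:
                problems.append(f"line {line_no}: unknown database name '{name}'")
            elif not path or not Path(path).exists():
                problems.append(f"line {line_no}: path does not exist: '{path}'")
            seen.add(name)
    # Report every expected database that has no row at all.
    for missing in sorted(EXPECTED_NAMES - seen):
        problems.append(f"missing row for database '{missing}'")
    return problems
```

An empty list means the file looks consistent with the format described above; otherwise each entry points to the offending line or missing database.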
A species parameter also needs to be supplied, e.g. --species "'Escherichia coli'". Note the extra single quotes inside the double quotes: VirulenceFinder will not recognize the parameter from Nextflow unless it is wrapped in both double and single quotes.
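To see what the nested quoting does at the shell level, the sketch below uses Python's shlex module, which follows POSIX shell quoting rules. It only illustrates how the command line is split into arguments; it does not model what Nextflow or VirulenceFinder do with the value afterwards.

```python
import shlex

# The command-line fragment as it would be typed in a POSIX shell.
typed = """--species "'Escherichia coli'" """

# The outer double quotes are consumed by the shell, while the inner
# single quotes remain part of the value that is passed on.
tokens = shlex.split(typed)
print(tokens)  # ['--species', "'Escherichia coli'"]

# With double quotes only, the value is passed on without any quotes.
plain = shlex.split('--species "Escherichia coli"')
print(plain)  # ['--species', 'Escherichia coli']
```

In both cases the species name stays a single argument; the difference is whether the single quotes survive as part of the value.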
The output of Ellipsis is written to the output directory specified with the --out_dir parameter in either of the main workflows.
The contig results can be found in the file ellipsis_report.tsv, which can be imported into Excel or any other program that reads tab-separated text.
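The report can also be read programmatically. The sketch below assumes, based on the .tsv extension, that the file is tab-separated with a header row. The column names are not listed here, so the code takes them from the file itself; read_report and count_by are illustrative helper names, not part of Assemblage.

```python
import csv
from collections import Counter


def read_report(path):
    """Read a tab-separated report into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, column):
    """Count how many rows share each value in the given column."""
    return Counter(row[column] for row in rows)
```

Calling read_report on the path to ellipsis_report.tsv inside the output directory returns one dictionary per row. count_by can then summarize any column that exists in the header, for example to count rows per replicon if the report has such a column.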
--ellipsis: Use to activate the Ellipsis subworkflow.
--databases file.csv: Specify the path to the CSV file containing the paths to all the databases.
--species: The species of the input genomes, used in VirulenceFinder. Needs to be wrapped in double and single quotes, e.g. "'Escherichia coli'".
--mincov: Minimum coverage for gene identification, used in the *finder processes. Default: 0.6
--identity_threshold: Minimum identity for gene identification, used in the *finder processes. Default: 0.8
--output_mge_reports: Use to output the MGE reports from MOB-suite.
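As a rough illustration of what the two thresholds mean, the sketch below filters hypothetical gene hits using the default values. It is not the logic of the *finder tools: the Hit class, the gene names, the 0 to 1 scale, and the use of "greater than or equal" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    coverage: float  # fraction of the reference gene covered (assumed 0.0 to 1.0)
    identity: float  # fraction of identical positions (assumed 0.0 to 1.0)


def passes(hit, mincov=0.6, identity_threshold=0.8):
    """Keep a hit only if it meets both thresholds (defaults as listed above)."""
    return hit.coverage >= mincov and hit.identity >= identity_threshold


# Made-up hits: geneB fails on coverage, geneC fails on identity.
hits = [Hit("geneA", 0.95, 0.99), Hit("geneB", 0.55, 0.99), Hit("geneC", 0.90, 0.75)]
kept = [h.gene for h in hits if passes(h)]
print(kept)  # ['geneA']
```

Raising either value makes gene identification stricter; lowering it admits more partial or divergent matches.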