nf-core · ochkalova · Mar 9, 2026 · Oct 28, 2025
diff --git a/.nf-core.yml b/.nf-core.yml
@@ -2,6 +2,8 @@ lint:
   files_exist:
     - conf/igenomes.config
     - conf/igenomes_ignored.config
+  nextflow_config:
+    - params.input
   files_unchanged:
     - .github/PULL_REQUEST_TEMPLATE.md
 nf_core_version: 3.5.1

diff --git a/README.md b/README.md
@@ -21,85 +21,185 @@
 
 ## Introduction
 
-**nf-core/seqsubmit** is a bioinformatics pipeline that submits data to public archives such as [ENA](https://www.ebi.ac.uk/ena/browser/home)
+**nf-core/seqsubmit** is a Nextflow pipeline for submitting sequence data to [ENA](https://www.ebi.ac.uk/ena/browser/home).
+The pipeline currently supports three submission modes, handled by two workflows. Each workflow requires its own input samplesheet structure:
 
-Pipeline will have several modes
+- `mags`: submits Metagenome-Assembled Genomes (MAGs) with the `GENOMESUBMIT` workflow
+- `bins`: submits bins with the `GENOMESUBMIT` workflow
+- `metagenomic_assemblies`: submits assemblies with the `ASSEMBLYSUBMIT` workflow
 
-- `mags` for MAGs submission with **genome_submitter** wf
-- `bins` for bins submission with **genome_submitter** wf
-- `assemblies` for assembly submission with **assembly_submitter** wf
+![seqsubmit workflow diagram](assets/seqsubmit_schema.png)
 
 ## Requirements
 
-- Webin account registered https://www.ebi.ac.uk/ena/submit/webin/login
-- Raw reads submitted into [INSDC](https://www.insdc.org/)
+- [Nextflow](https://www.nextflow.io/) `>=25.04.0`
+- Webin account registered at https://www.ebi.ac.uk/ena/submit/webin/login
+- Raw reads used to assemble the contigs must already be submitted to [INSDC](https://www.insdc.org/), and their accessions must be available
 
 Setup your environment secrets before running the pipeline:
 
 `nextflow secrets set WEBIN_ACCOUNT "Webin-XXX"`
 
 `nextflow secrets set WEBIN_PASSWORD "XXX"`
 
-Make sure you update with your authorised credentials.
+Make sure you update the commands above with your authorised credentials.
 
-## genome_submitter
+## Input samplesheets
 
-Workflow to submit MAGs and/or bins to ENA.
+### `mags` and `bins` modes (`GENOMESUBMIT`)
 
-It takes input `samplesheet.csv` with fields required for [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader). Fields described in [docs](https://github.com/EBI-Metagenomics/genome_uploader/blob/main/README.md#input-tsv-and-fields).
-For now workflow converts CSV into required TSV.
+The input must follow `assets/schema_input_genome.json`.
 
-_Future implementation will consider missing fields (for example completeness and contamination) and would run steps to fill in the gaps._
+Required columns:
 
-<!-- TODO nf-core:
-   Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-   major pipeline sections and the types of output it produces. You're giving an overview to someone new
-   to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+- `sample`
+- `fasta` (must end with `.fa.gz` or `.fasta.gz`)
+- `accession`
+- `assembly_software`
+- `binning_software`
+- `binning_parameters`
+- `stats_generation_software`
+- `metagenome`
+- `environmental_medium`
+- `broad_environment`
+- `local_environment`
+- `co-assembly`
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-     workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples.   -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+Columns that are required for now but will become optional in the near future:
+
+- `completeness`
+- `contamination`
+- `genome_coverage`
+- `rRNA_presence`
+- `NCBI_lineage`
+
+These fields are metadata required by the [genome_uploader](https://github.com/EBI-Metagenomics/genome_uploader) package. They are described in its [docs](https://github.com/EBI-Metagenomics/genome_uploader/blob/main/README.md#input-tsv-and-fields).
+
+Example `samplesheet_genome.csv`:
+
+```csv
+sample,fasta,accession,assembly_software,binning_software,binning_parameters,stats_generation_software,completeness,contamination,genome_coverage,metagenome,co-assembly,broad_environment,local_environment,environmental_medium,rRNA_presence,NCBI_lineage
+lachnospira_eligens,data/bin_lachnospira_eligens.fa.gz,SRR24458089,spades_v3.15.5,metabat2_v2.6,default,CheckM2_v1.0.1,61.0,0.21,32.07,sediment metagenome,false,marine,cable_bacteria,marine_sediment,false,d__Bacteria;p__Proteobacteria;s_unclassified_Proteobacteria
+```
+
+### `metagenomic_assemblies` mode (`ASSEMBLYSUBMIT`)
+
+The input must follow `assets/schema_input_assembly.json`.
+
+Required columns:
+
+- `sample`
+- `fasta` (must end with `.fa.gz` or `.fasta.gz`)
+- `run_accession`
+- `assembler`
+- `assembler_version`
+
+At least one of the following must be provided per row:
+
+- reads (`fastq_1`, optional `fastq_2` for paired-end)
+- `coverage`
+
+If `coverage` is missing and reads are provided, the workflow calculates average coverage with `coverm`.
+
+Example `samplesheet_assembly.csv`:
+
+```csv
+sample,fasta,fastq_1,fastq_2,coverage,run_accession,assembler,assembler_version
+assembly_1,data/contigs_1.fasta.gz,data/reads_1.fastq.gz,data/reads_2.fastq.gz,,ERR011322,SPAdes,3.15.5
+assembly_2,data/contigs_2.fasta.gz,,,42.7,ERR011323,MEGAHIT,1.2.9
+```
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-     Explain what rows and columns represent. For instance (please edit as appropriate):
-First, prepare a samplesheet with your input data that looks as follows:
-`samplesheet.csv`:
-```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+### Required parameters:
+
+| Parameter            | Description                                                                       |
+| -------------------- | --------------------------------------------------------------------------------- |
+| `--mode`             | Type of the data to be submitted. Options: `[mags, bins, metagenomic_assemblies]` |
+| `--input`            | Path to the samplesheet describing the data to be submitted                       |
+| `--outdir`           | Path to the output directory for pipeline results                                 |
+| `--submission_study` | ENA study accession (PRJ/ERP) to submit the data to                               |
+| `--centre_name`      | Name of the submitter's organisation                                              |
+
+### Optional parameters:
+
+| Parameter           | Description                                                                              |
+| ------------------- | ---------------------------------------------------------------------------------------- |
+| `--upload_tpa`      | Flag to control the type of assembly study (third party assembly or not). Default: false |
+| `--test_upload`     | Upload to TEST ENA server instead of LIVE. Default: false                                |
+| `--webincli_submit` | If set to false, submissions will be validated, but not submitted. Default: true         |
+
+General command template:
+
+```bash
+nextflow run nf-core/seqsubmit \
+   -profile <docker/singularity/...> \
+   --mode <mags|bins|metagenomic_assemblies> \
+   --input <samplesheet.csv> \
+   --centre_name <your_centre> \
+   --submission_study <your_study> \
+   --outdir <outdir>
+```
+
+Validation run (submission to the ENA TEST server) in `mags` mode:
+
+```bash
+nextflow run nf-core/seqsubmit \
+   -profile docker \
+   --mode mags \
+   --input assets/samplesheet_genomes.csv \
+   --submission_study <your_study> \
+   --centre_name TEST_CENTER \
+   --webincli_submit true \
+   --test_upload true \
+   --outdir results/validate_mags
 ```
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
--->
 
-Now, you can run the pipeline using:
+Validation run (submission to the ENA TEST server) in `metagenomic_assemblies` mode:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+```bash
+nextflow run nf-core/seqsubmit \
+   -profile docker \
+   --mode metagenomic_assemblies \
+   --input assets/samplesheet_assembly.csv \
+   --submission_study <your_study> \
+   --centre_name TEST_CENTER \
+   --webincli_submit true \
+   --test_upload true \
+   --outdir results/validate_assemblies
+```
+
+Live submission example:
 
 ```bash
 nextflow run nf-core/seqsubmit \
-   -profile <docker/singularity/.../institute> \
-   --input samplesheet.csv \
-   --outdir <OUTDIR>
+   -profile docker \
+   --mode metagenomic_assemblies \
+   --input assets/samplesheet_assembly.csv \
+   --submission_study PRJEB98843 \
+   --centre_name <your_centre> \
+   --test_upload false \
+   --webincli_submit true \
+   --outdir results/live_assembly
 ```
 
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
 
 For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/seqsubmit/usage) and the [parameter documentation](https://nf-co.re/seqsubmit/parameters).
 
-<!-- TODO nf-core:
 ## Pipeline output
 
-To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/seqsubmit/results) tab on the nf-core website pipeline page.
-For more details about the output files and reports, please refer to the
-[output documentation](https://nf-co.re/seqsubmit/output).
--->
+Key output locations in `--outdir`, grouped under a subdirectory named after the selected `--mode`:
+
+- `<mode>/upload/manifests/`: generated manifest files for submission
+- `<mode>/upload/webin_cli/`: ENA Webin CLI reports
+- `<mode>/multiqc/`: MultiQC summary report
+- `pipeline_info/`: execution reports, trace, DAG, and software versions
+
+For full details, see the [output documentation](https://nf-co.re/seqsubmit/output).
 
 ## Credits
 

diff --git a/assets/samplesheet_assembly.csv b/assets/samplesheet_assembly.csv
@@ -0,0 +1,4 @@
+sample,fasta,fastq_1,fastq_2,coverage,run_accession,assembler,assembler_version
+sample1,tests/data/contigs.fasta.gz,tests/data/fastq_1.fastq,tests/data/fastq_2.fastq,,ERR000001,SPAdes,3.15
+sample2,tests/data/invalid_assembly.fasta.gz,,,45,ERR000002,Velvet,1.2.10
+sample3,tests/data/contigs.fasta.gz,,,30,ERR000003,MEGAHIT,1.2.9
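
The rows in this samplesheet exercise the "reads or coverage" rule described in the README: a row either supplies `coverage` directly or supplies reads so that coverage can be computed. The Python sketch below illustrates that rule on two of the rows above. It is only an illustration under stated assumptions, not the pipeline's validation code (which relies on the JSON schema); the helper names `coverage_source` and `classify` are hypothetical.

```python
import csv
import io

# Two rows copied from assets/samplesheet_assembly.csv for illustration.
SAMPLESHEET = """\
sample,fasta,fastq_1,fastq_2,coverage,run_accession,assembler,assembler_version
sample1,tests/data/contigs.fasta.gz,tests/data/fastq_1.fastq,tests/data/fastq_2.fastq,,ERR000001,SPAdes,3.15
sample3,tests/data/contigs.fasta.gz,,,30,ERR000003,MEGAHIT,1.2.9
"""


def coverage_source(row):
    """Say where coverage would come from for one row (hypothetical helper).

    Returns "provided" when the coverage column is filled in, "from_reads"
    when only reads are given, and raises ValueError when both are missing.
    """
    if (row.get("coverage") or "").strip():
        return "provided"
    if (row.get("fastq_1") or "").strip():
        return "from_reads"
    raise ValueError(f"{row.get('sample')}: either reads or coverage must be provided")


def classify(text):
    """Map each sample name to its coverage source."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["sample"]: coverage_source(row) for row in reader}


if __name__ == "__main__":
    for sample, source in classify(SAMPLESHEET).items():
        print(sample, source)
```

Rows classified as `from_reads` correspond to the case where, per the README, the workflow calculates average coverage with `coverm`.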
diff --git a/assets/samplesheet.csv → assets/samplesheet_genomes.csv b/assets/samplesheet.csv → assets/samplesheet_genomes.csv
diff --git a/assets/schema_input_assembly.json b/assets/schema_input_assembly.json
@@ -0,0 +1,114 @@
+{
+    "$schema": "https://json-schema.org/draft/2020-12/schema",
+    "$id": "https://raw.githubusercontent.com/nf-core/seqsubmit/main/assets/schema_input_assembly.json",
+    "title": "nf-core/seqsubmit pipeline - params.input schema",
+    "description": "Schema for the sample sheet provided with params.input if params.mode is set to 'metagenomic_assemblies'",
+    "type": "array",
+    "items": {
+        "type": "object",
+        "properties": {
+            "sample": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Sample must be provided and cannot contain spaces",
+                "meta": ["id"]
+            },
+            "fasta": {
+                "type": "string",
+                "format": "file-path",
+                "exists": true,
+                "pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.f(ast)?a\\.gz$",
+                "errorMessage": "FASTA file must be provided and have extension '.fa.gz' or '.fasta.gz'",
+                "description": "Metagenomic assembly FASTA file"
+            },
+            "fastq_1": {
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "format": "file-path",
+                        "exists": true,
+                        "pattern": "^\\S+\\.(fq|fastq)(\\.gz)?$"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "FASTQ file must have extension '.fq' or '.fastq' (optionally gzipped)",
+                "description": "Forward reads if paired-end or single-end reads FASTQ file"
+            },
+            "fastq_2": {
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "format": "file-path",
+                        "exists": true,
+                        "pattern": "^\\S+\\.(fq|fastq)(\\.gz)?$"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "FASTQ file for reverse reads must have extension '.fq' or '.fastq' (optionally gzipped)",
+                "description": "Reverse reads FASTQ file if paired-end. Leave empty for single-end reads"
+            },
+            "coverage": {
+                "anyOf": [
+                    {
+                        "type": "number",
+                        "minimum": 0
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ],
+                "errorMessage": "Coverage must be a non-negative number or empty",
+                "description": "Estimated value of assembly coverage"
+            },
+            "run_accession": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Accession must be provided and cannot contain spaces",
+                "description": "Accession of the run used to generate the assembly"
+            },
+            "assembler": {
+                "type": "string",
+                "pattern": "^\\S+$",
+                "errorMessage": "Assembler must be provided and cannot contain spaces",
+                "description": "Name of the assembler software used to generate the assembly"
+            },
+            "assembler_version": {
+                "anyOf": [{ "type": "string" }, { "type": "number" }],
+                "pattern": "^\\S+$",
+                "errorMessage": "Assembler version must be provided and cannot contain spaces",
+                "description": "Version of the assembler software used to generate the assembly"
+            }
+        },
+        "required": ["sample", "fasta", "run_accession", "assembler", "assembler_version"],
+        "anyOf": [
+            {
+                "properties": {
+                    "fastq_1": {
+                        "type": "string",
+                        "minLength": 1
+                    }
+                },
+                "required": ["fastq_1"]
+            },
+            {
+                "properties": {
+                    "coverage": {
+                        "type": "number",
+                        "minimum": 0
+                    }
+                },
+                "required": ["coverage"]
+            }
+        ],
+        "errorMessage": {
+            "anyOf": "Either reads or coverage must be provided in the sample sheet for each assembly"
+        }
+    }
+}
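
To see which file names the `fasta` and `fastq_1`/`fastq_2` patterns in this schema accept, the regular expressions can be tried out directly. The sketch below copies the two patterns from the schema (with JSON escaping removed) into Python's `re` module; the function names and the sample paths are hypothetical, and the real validation is performed by the schema validator rather than by this code.

```python
import re

# Patterns copied from assets/schema_input_assembly.json, JSON escaping removed.
FASTA_PATTERN = re.compile(r"^([\S\s]*/)?[^\s/]+\.f(ast)?a\.gz$")
FASTQ_PATTERN = re.compile(r"^\S+\.(fq|fastq)(\.gz)?$")


def is_valid_fasta(path):
    """True if the path ends in .fa.gz or .fasta.gz with no whitespace in the file name."""
    return FASTA_PATTERN.search(path) is not None


def is_valid_fastq(path):
    """True if the path has no whitespace and ends in .fq or .fastq, optionally gzipped."""
    return FASTQ_PATTERN.search(path) is not None


if __name__ == "__main__":
    for candidate in ["data/contigs_1.fasta.gz", "bin.fa.gz", "contigs.fna.gz", "contigs.fasta"]:
        print(candidate, is_valid_fasta(candidate))
    for candidate in ["tests/data/fastq_1.fastq", "reads.fq.gz", "reads.fastq.bz2"]:
        print(candidate, is_valid_fastq(candidate))
```

Under these patterns an uncompressed FASTA file or a `.fna.gz` file is rejected, while FASTQ files are accepted with or without gzip compression.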
diff --git a/assets/schema_input.json → assets/schema_input_genome.json b/assets/schema_input.json → assets/schema_input_genome.json
@@ -1,8 +1,8 @@
 {
     "$schema": "https://json-schema.org/draft/2020-12/schema",
-    "$id": "https://raw.githubusercontent.com/nf-core/seqsubmit/main/assets/schema_input.json",
+    "$id": "https://raw.githubusercontent.com/nf-core/seqsubmit/main/assets/schema_input_genome.json",
     "title": "nf-core/seqsubmit pipeline - params.input schema",
-    "description": "Schema for the file provided with params.input",
+    "description": "Schema for the file provided with params.input if params.mode is set to 'mags' or 'bins'",
     "type": "array",
     "items": {
         "type": "object",

diff --git a/assets/seqsubmit_schema.png b/assets/seqsubmit_schema.png
diff --git a/conf/modules.config b/conf/modules.config
@@ -20,15 +20,15 @@ process {
 
     withName: 'GENOME_UPLOAD' {
         publishDir = [
-            path: { "${params.outdir}/upload/manifests" },
+            path: { "${params.outdir}/${params.mode}/upload/manifests" },
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
     }
 
     withName: 'ENA_WEBIN_CLI' {
         publishDir = [
-            path: { "${params.outdir}/upload/webin_cli" },
+            path: { "${params.outdir}/${params.mode}/upload/webin_cli" },
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
@@ -37,10 +37,13 @@ process {
     withName: 'MULTIQC' {
         ext.args   = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
         publishDir = [
-            path: { "${params.outdir}/multiqc" },
+            path: { "${params.outdir}/${params.mode}/multiqc" },
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
     }
 
+    withName: 'GENERATE_ASSEMBLY_MANIFEST|ENA_WEBIN_CLI|REGISTERSTUDY' {
+        ext.args = { params.test_upload ? "--test" : "" }
+    }
 }
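
The effect of these config changes on where results land, and on the extra argument passed to the upload-related processes, can be sketched in Python. This is only a model of the string interpolation shown in the diff, assuming `outdir`, `mode`, and `test_upload` carry the values of the corresponding `params`; the function names `publish_dirs` and `upload_args` are hypothetical.

```python
from pathlib import PurePosixPath


def publish_dirs(outdir, mode):
    """Mirror the publishDir paths from conf/modules.config for one run."""
    base = PurePosixPath(outdir) / mode
    return {
        "GENOME_UPLOAD": str(base / "upload" / "manifests"),
        "ENA_WEBIN_CLI": str(base / "upload" / "webin_cli"),
        "MULTIQC": str(base / "multiqc"),
    }


def upload_args(test_upload):
    """Mirror ext.args: pass --test only when the TEST server is requested."""
    return "--test" if test_upload else ""


if __name__ == "__main__":
    for process, path in publish_dirs("results", "mags").items():
        print(process, path)
    print(repr(upload_args(True)))
```

Because the mode is part of the path, runs in different modes that share one `--outdir` write to separate subdirectories.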