new process: picard/samtofastq#10298
Conversation
4b93cb6 to
268ec05
Compare
There was a problem hiding this comment.
Pull request overview
Adds a new nf-core module process PICARD_SAMTOFASTQ under modules/nf-core/picard/samtofastq to convert SAM/BAM inputs into FASTQ output.
Changes:
- Introduces the
PICARD_SAMTOFASTQprocess implementation (conda/container, inputs/outputs, stub). - Adds module metadata (
meta.yml) describing inputs/outputs and tool information.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| modules/nf-core/picard/samtofastq/main.nf | New Nextflow process definition for running Picard SamToFastq and emitting reads + version info. |
| modules/nf-core/picard/samtofastq/meta.yml | New module metadata describing the process interface and Picard tool details. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - bam: | ||
| type: file | ||
| description: | | ||
| SAM or BAM file |
There was a problem hiding this comment.
The bam input in the metadata lacks a pattern (and ontologies) field, unlike other modules in this repo (including other Picard modules). Adding a pattern such as *.{bam,sam} makes the interface clearer and helps with automated validation/linting.
| SAM or BAM file | |
| SAM or BAM file | |
| pattern: "*.{bam,sam}" | |
| ontologies: ["EDAM:format_2572", "EDAM:format_2573"] |
| @@ -0,0 +1,54 @@ | |||
| process PICARD_SAMTOFASTQ { | |||
| tag "$meta.id" | |||
There was a problem hiding this comment.
tag "$meta.id" does not interpolate meta.id in Groovy (only $meta would). Use ${meta.id} so the tag shows the sample id instead of the literal .id suffix in logs.
| tag "$meta.id" | |
| tag "${meta.id}" |
| conda "bioconda::picard=3.1.0" | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/picard:3.1.0--hdfd78af_0' : | ||
| 'biocontainers/picard:3.1.0--hdfd78af_0' }" |
There was a problem hiding this comment.
Other modules/nf-core/picard/* modules standardize on conda "${moduleDir}/environment.yml" plus the Wave container URL for Picard (e.g. bedtointervallist/main.nf). This module hard-codes a conda package string and a different container source/version; consider aligning with the established Picard module pattern to keep dependency management consistent.
| conda "bioconda::picard=3.1.0" | |
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | |
| 'https://depot.galaxyproject.org/singularity/picard:3.1.0--hdfd78af_0' : | |
| 'biocontainers/picard:3.1.0--hdfd78af_0' }" | |
| conda "${moduleDir}/environment.yml" | |
| container "https://wave.seqera.io/library/biocontainers/picard:3.1.0--hdfd78af_0" |
| A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) | ||
| data and formats such as SAM/BAM/CRAM and VCF. | ||
| homepage: https://broadinstitute.github.io/picard/ | ||
| documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360036510672-FastqToSam-Picard- |
There was a problem hiding this comment.
The Picard documentation: URL points to the FastqToSam page, not SamToFastq. Update it to the correct SamToFastq documentation to avoid misleading module users.
| documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360036510672-FastqToSam-Picard- | |
| documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-SamToFastq-Picard- |
| - picard: | ||
| description: | | ||
| A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) | ||
| data and formats such as SAM/BAM/CRAM and VCF. | ||
| homepage: https://broadinstitute.github.io/picard/ | ||
| documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360036510672-FastqToSam-Picard- | ||
| tool_dev_url: https://github.com/broadinstitute/picard | ||
| licence: ["MIT"] | ||
| input: |
There was a problem hiding this comment.
The metadata is missing the identifier: biotools:picard_tools field that other Picard modules include under tools.picard (e.g. picard/fastqtosam/meta.yml). Adding it keeps tool metadata consistent across the Picard module set.
| output: | ||
| tuple val(meta), path("*.fastq"), emit: reads | ||
| tuple val("${task.process}"), val('picard'), eval("picard SamToFastq --version 2>&1 | sed -n 's/.*Version://p'"), topic: versions, emit: versions_picard |
There was a problem hiding this comment.
The process declares path("*.fastq") as output, but the script writes ${prefix}.fastq.gz (and _1/_2.fastq.gz for paired-end). This mismatch will cause Nextflow to not capture outputs; update the output glob to match the actual filenames/extensions.
| """ | ||
| echo "picard \\ | ||
| -Xmx${avail_mem}M \\ | ||
| SamToFastq \\ | ||
| ${args} \\ | ||
| --INPUT ${bam} \\ | ||
| ${output}" | ||
| touch ${output} | ||
| """ |
There was a problem hiding this comment.
The stub uses touch ${output}, but output contains Picard CLI flags (e.g. --FASTQ file ... --SECOND_END_FASTQ file ...). This will fail (touch sees --FASTQ as an option) and it won’t create the expected output files for stub runs; touch only the actual fastq filenames instead.
| process PICARD_SAMTOFASTQ { | ||
| tag "$meta.id" | ||
| label 'process_medium' | ||
|
|
||
| conda "bioconda::picard=3.1.0" | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/picard:3.1.0--hdfd78af_0' : | ||
| 'biocontainers/picard:3.1.0--hdfd78af_0' }" | ||
|
|
||
| input: | ||
| tuple val(meta), path(bam) | ||
|
|
||
| output: | ||
| tuple val(meta), path("*.fastq"), emit: reads | ||
| tuple val("${task.process}"), val('picard'), eval("picard SamToFastq --version 2>&1 | sed -n 's/.*Version://p'"), topic: versions, emit: versions_picard | ||
|
|
There was a problem hiding this comment.
This module introduces new behavior but doesn’t include an nf-test test suite (compare to other Picard modules such as modules/nf-core/picard/cleansam/tests). Adding at least a basic test for single-end and paired-end output naming would prevent regressions.
| - picard: | ||
| type: string | ||
| description: The tool name | ||
| - "picard BedToIntervalList --version 2>&1 | sed -n 's/.*Version://p'": |
There was a problem hiding this comment.
The versions_picard metadata command is for BedToIntervalList, but this module runs SamToFastq. This should be updated so the recorded version command matches the actual tool invocation.
| - "picard BedToIntervalList --version 2>&1 | sed -n 's/.*Version://p'": | |
| - "picard SamToFastq --version 2>&1 | sed -n 's/.*Version://p'": |
| - fastq: | ||
| type: file | ||
| description: fastq files | ||
| pattern: "*.{fastq}" |
There was a problem hiding this comment.
The reads output is described as a single fastq file with pattern *.{fastq}, but the process produces gzipped FASTQ(s) (*.fastq.gz, potentially _1/_2). Align the metadata with nf-core conventions by using globbed output keys (e.g. "*.fastq.gz") and an appropriate pattern that covers single- and paired-end outputs.
| pattern: "*.{fastq}" | |
| pattern: "*.fastq.gz" |
Add a new process
PICARD_SAMTOFASTQ. Resolves #4176.I started creating this module when pytest was used for testing, and I haven't been able to get nf-test working in my dev environment. If someone could take over generating tests for this PR that would be appreciated!
If it's helpful, here is the test main.nf file I initially created for use with pytest: https://github.com/nf-core/modules/blob/119444e25af2544ffc43da16b8c47e92cd7dfb00/tests/modules/nf-core/picard/samtofastq/main.nf
PR checklist
Closes ##4176
[ ] If necessary, include test data in your PR.topic: versions- See version_topicslabelnf-core modules test <MODULE> --profile dockernf-core modules test <MODULE> --profile singularitynf-core modules test <MODULE> --profile conda[ ]nf-core subworkflows test <SUBWORKFLOW> --profile docker[ ]nf-core subworkflows test <SUBWORKFLOW> --profile singularity[ ]nf-core subworkflows test <SUBWORKFLOW> --profile conda