Skip to content

new process: picard/samtofastq#10298

Draft
kelly-sovacool wants to merge 4 commits intonf-core:masterfrom
kelly-sovacool:picard/samtofastq
Draft

new process: picard/samtofastq#10298
kelly-sovacool wants to merge 4 commits intonf-core:masterfrom
kelly-sovacool:picard/samtofastq

Conversation

@kelly-sovacool
Copy link
Member

@kelly-sovacool kelly-sovacool commented Mar 2, 2026

Add a new process PICARD_SAMTOFASTQ. Resolves #4176.

I started creating this module when pytest was used for testing, and I haven't been able to get nf-test working in my dev environment. If someone could take over generating tests for this PR that would be appreciated!

If it's helpful, here is the test main.nf file I initially created for use with pytest: https://github.com/nf-core/modules/blob/119444e25af2544ffc43da16b8c47e92cd7dfb00/tests/modules/nf-core/picard/samtofastq/main.nf

PR checklist

Closes ##4176

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • [ ] If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile conda

@mashehu mashehu force-pushed the picard/samtofastq branch from 4b93cb6 to 268ec05 Compare March 2, 2026 15:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new nf-core module process PICARD_SAMTOFASTQ under modules/nf-core/picard/samtofastq to convert SAM/BAM inputs into FASTQ output.

Changes:

  • Introduces the PICARD_SAMTOFASTQ process implementation (conda/container, inputs/outputs, stub).
  • Adds module metadata (meta.yml) describing inputs/outputs and tool information.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.

File Description
modules/nf-core/picard/samtofastq/main.nf New Nextflow process definition for running Picard SamToFastq and emitting reads + version info.
modules/nf-core/picard/samtofastq/meta.yml New module metadata describing the process interface and Picard tool details.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- bam:
type: file
description: |
SAM or BAM file
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bam input in the metadata lacks a pattern (and ontologies) field, unlike other modules in this repo (including other Picard modules). Adding a pattern such as *.{bam,sam} makes the interface clearer and helps with automated validation/linting.

Suggested change
SAM or BAM file
SAM or BAM file
pattern: "*.{bam,sam}"
ontologies: ["EDAM:format_2572", "EDAM:format_2573"]

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,54 @@
process PICARD_SAMTOFASTQ {
tag "$meta.id"
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tag "$meta.id" does not interpolate meta.id in Groovy (only $meta would). Use ${meta.id} so the tag shows the sample id instead of the literal .id suffix in logs.

Suggested change
tag "$meta.id"
tag "${meta.id}"

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +8
conda "bioconda::picard=3.1.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/picard:3.1.0--hdfd78af_0' :
'biocontainers/picard:3.1.0--hdfd78af_0' }"
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other modules/nf-core/picard/* modules standardize on conda "${moduleDir}/environment.yml" plus the Wave container URL for Picard (e.g. bedtointervallist/main.nf). This module hard-codes a conda package string and a different container source/version; consider aligning with the established Picard module pattern to keep dependency management consistent.

Suggested change
conda "bioconda::picard=3.1.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/picard:3.1.0--hdfd78af_0' :
'biocontainers/picard:3.1.0--hdfd78af_0' }"
conda "${moduleDir}/environment.yml"
container "https://wave.seqera.io/library/biocontainers/picard:3.1.0--hdfd78af_0"

Copilot uses AI. Check for mistakes.
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS)
data and formats such as SAM/BAM/CRAM and VCF.
homepage: https://broadinstitute.github.io/picard/
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360036510672-FastqToSam-Picard-
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Picard documentation: URL points to the FastqToSam page, not SamToFastq. Update it to the correct SamToFastq documentation to avoid misleading module users.

Suggested change
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360036510672-FastqToSam-Picard-
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-SamToFastq-Picard-

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +16
- picard:
description: |
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS)
data and formats such as SAM/BAM/CRAM and VCF.
homepage: https://broadinstitute.github.io/picard/
documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360036510672-FastqToSam-Picard-
tool_dev_url: https://github.com/broadinstitute/picard
licence: ["MIT"]
input:
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata is missing the identifier: biotools:picard_tools field that other Picard modules include under tools.picard (e.g. picard/fastqtosam/meta.yml). Adding it keeps tool metadata consistent across the Picard module set.

Copilot uses AI. Check for mistakes.
Comment on lines +13 to +15
output:
tuple val(meta), path("*.fastq"), emit: reads
tuple val("${task.process}"), val('picard'), eval("picard SamToFastq --version 2>&1 | sed -n 's/.*Version://p'"), topic: versions, emit: versions_picard
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process declares path("*.fastq") as output, but the script writes ${prefix}.fastq.gz (and _1/_2.fastq.gz for paired-end). This mismatch will cause Nextflow to not capture outputs; update the output glob to match the actual filenames/extensions.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +53
"""
echo "picard \\
-Xmx${avail_mem}M \\
SamToFastq \\
${args} \\
--INPUT ${bam} \\
${output}"
touch ${output}
"""
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stub uses touch ${output}, but output contains Picard CLI flags (e.g. --FASTQ file ... --SECOND_END_FASTQ file ...). This will fail (touch sees --FASTQ as an option) and it won’t create the expected output files for stub runs; touch only the actual fastq filenames instead.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +16
process PICARD_SAMTOFASTQ {
tag "$meta.id"
label 'process_medium'

conda "bioconda::picard=3.1.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/picard:3.1.0--hdfd78af_0' :
'biocontainers/picard:3.1.0--hdfd78af_0' }"

input:
tuple val(meta), path(bam)

output:
tuple val(meta), path("*.fastq"), emit: reads
tuple val("${task.process}"), val('picard'), eval("picard SamToFastq --version 2>&1 | sed -n 's/.*Version://p'"), topic: versions, emit: versions_picard

Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module introduces new behavior but doesn’t include an nf-test test suite (compare to other Picard modules such as modules/nf-core/picard/cleansam/tests). Adding at least a basic test for single-end and paired-end output naming would prevent regressions.

Copilot uses AI. Check for mistakes.
- picard:
type: string
description: The tool name
- "picard BedToIntervalList --version 2>&1 | sed -n 's/.*Version://p'":
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The versions_picard metadata command is for BedToIntervalList, but this module runs SamToFastq. This should be updated so the recorded version command matches the actual tool invocation.

Suggested change
- "picard BedToIntervalList --version 2>&1 | sed -n 's/.*Version://p'":
- "picard SamToFastq --version 2>&1 | sed -n 's/.*Version://p'":

Copilot uses AI. Check for mistakes.
- fastq:
type: file
description: fastq files
pattern: "*.{fastq}"
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reads output is described as a single fastq file with pattern *.{fastq}, but the process produces gzipped FASTQ(s) (*.fastq.gz, potentially _1/_2). Align the metadata with nf-core conventions by using globbed output keys (e.g. "*.fastq.gz") and an appropriate pattern that covers single- and paired-end outputs.

Suggested change
pattern: "*.{fastq}"
pattern: "*.fastq.gz"

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

new module: picard/samtofastq

2 participants