diff --git a/.editorconfig b/.editorconfig index dd9ffa5..72dda28 100644 --- a/.editorconfig +++ b/.editorconfig @@ -28,10 +28,6 @@ indent_style = unset [/assets/email*] indent_size = unset -# ignore Readme -[README.md] -indent_style = unset - -# ignore python +# ignore python and markdown [*.{py,md}] indent_style = unset diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index ae5bb47..f3b747d 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/detaxizer, the standard workflow is 1. Check that there isn't already an issue about your idea in the [nf-core/detaxizer issues](https://github.com/nf-core/detaxizer/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/detaxizer repository](https://github.com/nf-core/detaxizer) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -40,7 +40,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint <pipeline-directory>` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards: 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflows/ci.yml`.
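For orientation, here is a minimal sketch of such a new step as a DSL2 module; the module name `MYTOOL`, its command line, and the output pattern are hypothetical, while the `label` and `${task.cpus}` usage follow the resource-label conventions described later in this file:

```
// modules/local/mytool.nf: a hypothetical module illustrating the conventions above
process MYTOOL {
    label 'process_low'              // CPU/memory/time defaults come from conf/base.config

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.out.txt"), emit: results

    script:
    """
    mytool --threads ${task.cpus} --input ${reads} > ${meta.id}.out.txt
    """
}
```

A default for any new `params.mytool_option` would then go into `nextflow.config`, after which `nf-core pipelines schema build` adds it to `nextflow_schema.json`.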
@@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards: Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`. ### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 890a6fe..4c58ef6 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,7 +17,8 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/deta - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/detaxizer/tree/master/.github/CONTRIBUTING.md) - [ ] If necessary, also make a PR on the nf-core/detaxizer _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`). - [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index a87b5ab..c37b6f5 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,19 +1,37 @@ name: nf-core AWS full size tests -# This workflow is triggered on published releases. +# This workflow is triggered on PRs opened against the master branch. # It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: - run-tower: + run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/detaxizer' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/detaxizer' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest steps: - - name: Launch workflow via tower + - uses: octokit/request-action@v2.x + id: check_approvals + if: github.event_name != 'workflow_dispatch' + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - id: test_variables + if: github.event_name != 'workflow_dispatch' + run: | + JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}' + CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length') + test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required + - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 # TODO nf-core: You can customise AWS full pipeline tests as required # Add full size test data (but still relatively small datasets for few samples) @@ -33,7 +51,7 @@ jobs: - uses: actions/upload-artifact@v4 with: - name: Tower debug log file + name: Seqera Platform debug log file path: | - tower_action_*.log - tower_action_*.json + seqera_platform_action_*.log + seqera_platform_action_*.json diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index e7e2180..c23849f 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -5,13 +5,13 @@ name: nf-core AWS test on: workflow_dispatch: jobs: - run-tower: + run-platform: name: Run AWS tests if: github.repository == 'nf-core/detaxizer' runs-on: ubuntu-latest steps: - # Launch workflow using Tower CLI tool action - - name: Launch workflow via tower + # Launch workflow using Seqera Platform CLI tool action + - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} @@ -27,7 +27,7 @@ jobs: - uses: actions/upload-artifact@v4 with: - name: Tower debug log file + name: Seqera Platform debug log file path: | - tower_action_*.log - tower_action_*.json + seqera_platform_action_*.log + seqera_platform_action_*.json diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index a7a64b8..9ba7226 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,9 +7,12 @@ on: pull_request: release: types: [published] + workflow_dispatch: env: NXF_ANSI_LOG: false + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace 
}}/.singularity concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -17,47 +20,68 @@ concurrency: jobs: test: - name: Run pipeline with test data + name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }})" # Only run on push if this is the nf-core dev branch (merged PRs) if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/detaxizer') }}" runs-on: ubuntu-latest strategy: matrix: NXF_VER: - - "23.04.0" + - "24.04.2" - "latest-everything" + profile: + - "conda" + - "docker" + - "singularity" + test_name: + - "test" + - "test_blastn" + - "test_filter_preprocessed" + isMaster: + - ${{ github.base_ref == 'master' }} + # Exclude conda and singularity on dev + exclude: + - isMaster: false + profile: "conda" + - isMaster: false + profile: "singularity" steps: - name: Check out pipeline code - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + - name: Set up Nextflow + uses: nf-core/setup-nextflow@v2 with: version: "${{ matrix.NXF_VER }}" - - name: Disk space cleanup - uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 + - name: Set up Apptainer + if: matrix.profile == 'singularity' + uses: eWaterCycle/setup-apptainer@main - - name: Run pipeline with test data + - name: Set up Singularity + if: matrix.profile == 'singularity' run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results + mkdir -p $NXF_SINGULARITY_CACHEDIR + mkdir -p $NXF_SINGULARITY_LIBRARYDIR - profiles: - name: Run workflow profile - # Only run on push if this is the nf-core dev branch (merged PRs) - if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/detaxizer') }} - runs-on: ubuntu-latest - strategy: - matrix: - # Run remaining test profiles with minimum nextflow version - profile: [test_skip_blastn, test_filter_preprocessed] - steps: - - name: Check out pipeline code - uses: actions/checkout@v4 + - name: Set up Miniconda + if: matrix.profile == 'conda' + uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3 + with: + miniconda-version: "latest" + auto-update-conda: true + conda-solver: libmamba + channels: conda-forge,bioconda - - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + - name: Set up Conda + if: matrix.profile == 'conda' + run: | + echo $(realpath $CONDA)/condabin >> $GITHUB_PATH + echo $(realpath python) >> $GITHUB_PATH + + - name: Clean up Disk space + uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - name: Run pipeline with ${{ matrix.profile }} test profile + - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }}" run: | - nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},${{ matrix.profile }} --outdir ./results diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 08622fd..713dc3e 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run 
the workflow when: # - dispatched manually @@ -8,12 +8,14 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." required: true default: "dev" pull_request: types: - opened + - edited + - synchronize branches: - master pull_request_target: @@ -28,15 +30,20 @@ jobs: runs-on: ubuntu-latest steps: - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + uses: nf-core/setup-nextflow@v2 - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - name: Disk space cleanup + uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 + + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: "3.11" + python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -49,24 +56,64 @@ jobs: echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} - - name: Run the downloaded pipeline + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + + - name: Run the downloaded pipeline (stub) + id: stub_run_pipeline + continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results + - name: Run the downloaded pipeline (stub run not supported) + id: run_pipeline + if: ${{ job.steps.stub_run_pipeline.status == failure() }} + env: + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images + NXF_SINGULARITY_HOME_MOUNT: true + run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> 
${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were downloaded at runtime. The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml index 1aa8eb4..28a0a5f 100644 --- a/.github/workflows/fix-linting.yml +++ b/.github/workflows/fix-linting.yml @@ -13,7 +13,7 @@ jobs: runs-on: ubuntu-latest steps: # Use the @nf-core-bot token to check out so we can push later - - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 with: token: ${{ secrets.nf_core_bot_auth_token }} @@ -32,9 +32,9 @@ jobs: GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }} # Install and run pre-commit - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: 3.11 + python-version: "3.12" - name: Install pre-commit run: pip install pre-commit diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 073e187..a502573 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines.
on: push: @@ -14,13 +14,12 @@ jobs: pre-commit: runs-on: ubuntu-latest steps: - - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Set up Python 3.11 - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - name: Set up Python 3.12 + uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: 3.11 - cache: "pip" + python-version: "3.12" - name: Install pre-commit run: pip install pre-commit @@ -32,27 +31,42 @@ jobs: runs-on: ubuntu-latest steps: - name: Check out pipeline code - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4 + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - name: Install Nextflow - uses: nf-core/setup-nextflow@v1 + uses: nf-core/setup-nextflow@v2 - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: - python-version: "3.11" + python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} @@ -60,7 +74,7 @@ jobs: - name: Upload linting log file artifact if: ${{ always() }} - uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4 + uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4 with: name: linting-logs path: | diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index b706875..42e519b 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@f6b0bace624032e30a85a8fd9c1a7f8f611f5737 # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index d468aea..c6ba35d 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") 
| .topics[]' | awk '{print "#"$0}' | tr '\n' ' ' >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: @@ -25,13 +25,13 @@ jobs: Please see the changelog: ${{ github.event.release.html_url }} - ${{ steps.get_topics.outputs.GITHUB_OUTPUT }} #nfcore #openscience #nextflow #bioinformatics + ${{ steps.get_topics.outputs.topics }} #nfcore #openscience #nextflow #bioinformatics send-tweet: runs-on: ubuntu-latest steps: - - uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5 + - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 with: python-version: "3.10" - name: Install dependencies diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 0000000..e8aafe4 --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. + +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). 
+ # diff --git a/.gitignore b/.gitignore index 5124c9a..a42ce01 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ results/ testing/ testing* *.pyc +null/ diff --git a/.gitpod.yml b/.gitpod.yml index 105a182..4611863 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index 78a3fd2..4f8b530 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,5 +1,21 @@ -repository_type: pipeline +bump_version: null lint: files_unchanged: - .github/CONTRIBUTING.md - .github/PULL_REQUEST_TEMPLATE.md +nf_core_version: 3.0.2 +org_path: null +repository_type: pipeline +template: + author: Jannik Seidel + description: A pipeline to identify (and remove) certain sequences from raw genomic + data. Default taxon to identify (and remove) is Homo sapiens. Removal + is optional. + force: false + is_nfcore: true + name: detaxizer + org: nf-core + outdir: . + skip_features: null + version: 1.1.0 +update: null diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index af57081..9e9f0e1 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -3,8 +3,11 @@ repos: rev: "v3.1.0" hooks: - id: prettier + additional_dependencies: + - prettier@3.2.5 + - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.0.3" hooks: - id: editorconfig-checker alias: ec diff --git a/CHANGELOG.md b/CHANGELOG.md index 3f30cab..2aa959d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,78 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
+## v1.1.0 - Kombjuudr - [2024-11-08] + +### `Added` + +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Added bbduk to the classification step (kraken2 as default, both can be run together) (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Added `--fasta_bbduk` parameter to provide a fasta file with contaminants (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Rewrote summary step of classification to be usable with bbduk and/or kraken2 (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Made preprocessing with fastp optional and added the parameter `--fastp_eval_duplication` to turn on duplication removal (off as default, was on/not changeable in v1.0.0) (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Optionally the removed reads can now be written to the output folder (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Added optional classification of filtered and removed reads via kraken2 (by @jannikseidelQBiC) +- [PR #39](https://github.com/nf-core/detaxizer/pull/39) - Added generation of input samplesheet for nf-core/mag, nf-core/taxprofiler (by @Joon-Klaps) + +#### Parameters + +Added parameters: + +| Parameter | +| ----------------------------------------- | +| `--fasta_bbduk` | +| `--preprocessing` | +| `--output_removed_reads` | +| `--classification_kraken2` | +| `--classification_bbduk` | +| `--kraken2confidence_filtered` | +| `--kraken2confidence_removed` | +| `--classification_kraken2_post_filtering` | +| `--fastp_eval_duplication` | +| `--bbduk_kmers` | + +Changed default values of parameters: + +| Parameter | Old default value | New default value | +| -------------------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | +| `--fastp_cut_mean_quality` | 15 | 1 | +| `--kraken2db` | 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz' | 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240605.tar.gz' | +| `--kraken2confidence` | 0.05 | 0.00 | +| `--tax2filter` | 'Homo' | 'Homo sapiens' | +| `--cutoff_tax2filter` | 2 | 0 | +| `--cutoff_tax2keep` | 0.5 | 0.0 | + +### `Changed` + +- [PR #42](https://github.com/nf-core/detaxizer/pull/42) - Template update for nf-core/tools 3.0.2, for details read [this blog post](https://nf-co.re/blog/2024/tools-3_0_0#important-template-updates) + +### `Fixed` + +- [PR #33](https://github.com/nf-core/detaxizer/pull/33) - Addition of quotation marks in `parse_kraken2report.nf` prevents failure of the pipeline when using a taxon with space (e.g. 
Homo sapiens) with the `--tax2filter` parameter (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Made validation via blastn optional by default (by @jannikseidelQBiC) +- [PR #34](https://github.com/nf-core/detaxizer/pull/34) - Changed parameter `--fasta` to `--fasta_blastn` (by @jannikseidelQBiC) + +### `Dependencies` + +Updated and added dependencies + +| Tool | Previous version | Current version | +| ------- | ---------------- | --------------- | +| bbmap | - | 39.10 | +| blastn | 2.14.1 | 2.15.0 | +| multiQC | 1.21 | 1.25.1 | +| kraken2 | 2.1.2 | 2.1.3 | +| seqkit | 2.8.0 | 2.8.2 | + +### `Deprecated` + +| Parameter | New parameter | Reason | +| --------------- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | +| `--fasta` | `--fasta_blastn` | Introduction of fasta_bbduk; necessary to further distinguish the two parameters | +| `--skip_blastn` | `--validation_blastn` | blastn is now to be enabled on purpose; too resource intensive for a default setting | +| `--max_cpus` | - | New behavior of [nextflow](https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits), `resourceLimits` can now be set via a config | +| `--max_memory` | - | New behavior of [nextflow](https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits), `resourceLimits` can now be set via a config | +| `--max_time` | - | New behavior of [nextflow](https://www.nextflow.io/docs/latest/reference/process.html#resourcelimits), `resourceLimits` can now be set via a config | + ## v1.0.0 - Kobbfarbad - [2024-03-26] Initial release of nf-core/detaxizer, created with the [nf-core](https://nf-co.re/) template. diff --git a/CITATIONS.md b/CITATIONS.md index ddff8d5..99290f7 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -10,6 +10,10 @@ ## Pipeline tools +- [bbmap](https://sourceforge.net/projects/bbmap/) + + > Bushnell B. (2022) BBMap, URL: http://sourceforge.net/projects/bbmap/ + - [blastn](https://blast.ncbi.nlm.nih.gov/Blast.cgi) > Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990). doi:10.1016/s0022-2836(05)80360-2. 
diff --git a/README.md b/README.md index 5c6598f..eeeed18 100644 --- a/README.md +++ b/README.md @@ -9,26 +9,26 @@ [![GitHub Actions Linting Status](https://github.com/nf-core/detaxizer/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/detaxizer/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/detaxizer/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.10877147-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.10877147) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/) [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) -[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/nf-core/detaxizer) +[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/detaxizer) [![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23detaxizer-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/detaxizer)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core) ## Introduction -**nf-core/detaxizer** is a bioinformatics pipeline that checks for the presence of a specific taxon in (meta)genomic fastq files and offers the option to filter out this taxon or taxonomic subtree. The process begins with preprocessing (adapter trimming, quality cutting and optional length and quality filtering) using fastp and quality assessment via FastQC, followed by taxon classification with kraken2, and employs blastn for validation of the reads associated with the identified taxa. Users must provide a samplesheet to indicate the fastq files and, if utilizing the validation step, a fasta file for creating the blastn database to verify the targeted taxon. +**nf-core/detaxizer** is a bioinformatics pipeline that checks for the presence of a specific taxon in (meta)genomic fastq files and offers the option to filter out this taxon or taxonomic subtree. The process begins with quality assessment via FastQC and optional preprocessing (adapter trimming, quality cutting and optional length and quality filtering) using fastp, followed by taxonomic classification with kraken2 and/or bbduk, and optionally employs blastn for validation of the reads associated with the identified taxa. 
Users must provide a samplesheet to indicate the fastq files and, if utilizing bbduk in the classification and/or the validation step, fasta files for use with bbduk and/or for creating the blastn database to verify the targeted taxon. ![detaxizer metro workflow](docs/images/Detaxizer_metro_workflow.png) 1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) -2. Pre-processing ([`fastp`](https://github.com/OpenGene/fastp)) -3. Classification of reads ([`Kraken2`](https://ccb.jhu.edu/software/kraken2/)) +2. Optional pre-processing ([`fastp`](https://github.com/OpenGene/fastp)) +3. Classification of reads ([`Kraken2`](https://ccb.jhu.edu/software/kraken2/) and/or [`bbduk`](https://sourceforge.net/projects/bbmap/)) 4. Optional validation of searched taxon/taxa ([`blastn`](https://blast.ncbi.nlm.nih.gov/Blast.cgi)) -5. Optional filtering of the searched taxon/taxa from the reads (either from the raw files or the preprocessed reads, using either the output from kraken2 or blastn) -6. Summary of the processes (how many reads were initially present after preprocessing, how many were classified as the `tax2filter` plus potential taxonomic subtree and optionally how many were validated) +5. Optional filtering of the searched taxon/taxa from the reads (either from the raw files or the preprocessed reads, using either the output from the classification (kraken2 and/or bbduk) or blastn) +6. Summary of the processes (how many reads were classified and, optionally, how many were validated) 7. Present QC for raw reads ([`MultiQC`](http://multiqc.info/)) ## Usage @@ -55,8 +55,7 @@ nextflow run nf-core/detaxizer \ ``` > [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). +> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/detaxizer/usage) and the [parameter documentation](https://nf-co.re/detaxizer/parameters). @@ -66,6 +65,11 @@ To see the results of an example test run with a full size dataset refer to the For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/detaxizer/output). +Generated samplesheets from the directory `<outdir>/downstream_samplesheets/` can be used for the pipelines: + +- [nf-core/mag](https://nf-co.re/mag) +- [nf-core/taxprofiler](https://nf-co.re/taxprofiler) + ## Credits nf-core/detaxizer was originally written by [Jannik Seidel](https://github.com/jannikseidelQBiC) at the [Quantitative Biology Center (QBiC)](http://qbic.life/). @@ -74,6 +78,8 @@ We thank the following people for their extensive assistance in the development - [Daniel Straub](https://github.com/d4straub) +This work was initially funded by the German Center for Infection Research (DZIF). + ## Contributions and Support If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
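To make the warning above concrete: a `-c` custom config can tune configuration such as per-process resources, while pipeline parameters must go via the CLI or `-params-file`. A minimal sketch (the file name `custom.config` is illustrative; the process name `KRAKEN2_KRAKEN2` appears in `conf/modules.config` in this diff):

```
// custom.config, passed with: nextflow run nf-core/detaxizer -c custom.config ...
process {
    withName: 'KRAKEN2_KRAKEN2' {
        cpus   = 8        // illustrative values
        memory = '32.GB'
    }
}
// Do not set params.* in a -c config; per the warning above, supply parameters
// on the CLI (e.g. --kraken2db) or via a -params-file instead.
```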
diff --git a/assets/email_template.html b/assets/email_template.html index 2172935..02191bf 100644 --- a/assets/email_template.html +++ b/assets/email_template.html @@ -4,7 +4,7 @@ - + nf-core/detaxizer Pipeline Report diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml index 10f6037..5f2acd9 100644 --- a/assets/multiqc_config.yml +++ b/assets/multiqc_config.yml @@ -1,7 +1,7 @@ report_comment: > - This report has been generated by the nf-core/detaxizer + This report has been generated by the nf-core/detaxizer analysis pipeline. For information about how to interpret these results, please see the - documentation. + documentation. report_section_order: "nf-core-detaxizer-methods-description": order: -1000 diff --git a/assets/schema_input.json b/assets/schema_input.json index d7e71f0..f1407c9 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/detaxizer/master/assets/schema_input.json", "title": "nf-core/detaxizer pipeline - params.input schema", "description": "Schema for the file provided with params.input", diff --git a/conf/base.config b/conf/base.config index ff31d38..7623804 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,9 +10,10 @@ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + // TODO nf-core: Check the defaults for all processes + cpus = { 1 * task.attempt } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 @@ -25,30 +26,30 @@ process { // adding in your local modules too. 
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 72.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 600.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' diff --git a/conf/igenomes_ignored.config b/conf/igenomes_ignored.config new file mode 100644 index 0000000..b4034d8 --- /dev/null +++ b/conf/igenomes_ignored.config @@ -0,0 +1,9 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for iGenomes paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Empty genomes dictionary to use when igenomes is ignored. +---------------------------------------------------------------------------------------- +*/ + +params.genomes = [:] diff --git a/conf/modules.config b/conf/modules.config index 33912c7..7e81ecd 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -25,7 +25,8 @@ process { "--cut_front", "--cut_tail", "--cut_mean_quality ${params.fastp_cut_mean_quality}", - "--length_required ${params.reads_minlength}" + "--length_required ${params.reads_minlength}", + params.fastp_eval_duplication ? 
"" : "--dont_eval_duplication" ].join(' ').trim() publishDir = [ [ @@ -53,6 +54,27 @@ process { ] } + withName: BBMAP_BBDUK { + ext.args = ["ordered=t", + "k=${params.bbduk_kmers}" + ].join(' ').trim() + publishDir = [ + path: { "${params.outdir}/bbduk/" }, + mode: params.publish_dir_mode, + pattern: '*.log', + enabled: params.save_intermediates + ] + } + + withName: ISOLATE_BBDUK_IDS { + publishDir = [ + path: { "${params.outdir}/bbduk/ids" }, + mode: params.publish_dir_mode, + pattern: '*.bbduk.txt', + enabled: params.save_intermediates + ] + } + withName: KRAKEN2_KRAKEN2 { ext.args = ["--use-names", "--confidence ${params.kraken2confidence}", @@ -64,9 +86,34 @@ process { pattern: '*kraken2*', enabled: params.save_intermediates ] + } + withName: KRAKEN2_POST_CLASSIFICATION_FILTERED { + ext.args = ["--use-names", + "--confidence ${params.kraken2confidence_filtered}", + "--report-zero-counts" + ].join(' ').trim() + publishDir = [ + path: { "${params.outdir}/kraken2/filtered" }, + mode: params.publish_dir_mode, + pattern: '*kraken2*', + enabled: true + ] } + withName: KRAKEN2_POST_CLASSIFICATION_REMOVED { + ext.args = ["--use-names", + "--confidence ${params.kraken2confidence_removed}", + "--report-zero-counts" + ].join(' ').trim() + publishDir = [ + path: { "${params.outdir}/kraken2/removed" }, + mode: params.publish_dir_mode, + pattern: '*kraken2*', + enabled: true + ] + } + withName: PARSE_KRAKEN2REPORT { publishDir = [ path: { "${params.outdir}/kraken2/taxonomy" }, @@ -76,7 +123,7 @@ process { ] } - withName: ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN { + withName: ISOLATE_KRAKEN2_IDS { publishDir = [ path: { "${params.outdir}/kraken2/isolated" }, mode: params.publish_dir_mode, @@ -84,6 +131,16 @@ process { enabled: params.save_intermediates ] + } + + withName: MERGE_IDS { + publishDir = [ + path: { "${params.outdir}/classification/ids" }, + mode: params.publish_dir_mode, + pattern: '*ids.txt', + enabled: params.save_intermediates + ] + } withName: BLAST_MAKEBLASTDB { ext.args = '-dbtype "nucl"' @@ -109,28 +166,34 @@ process { } withName: RENAME_FASTQ_HEADERS_AFTER { - publishDir = [ - path: { "${params.outdir}/filter/" }, + publishDir = [[ + path: { "${params.outdir}/filter/filtered" }, mode: params.publish_dir_mode, - pattern: '*.fastq.gz', + pattern: '*filtered.fastq.gz', enabled: true - ] + ], + [ + path: { "${params.outdir}/filter/removed" }, + mode: params.publish_dir_mode, + pattern: '*removed.fastq.gz', + enabled: params.output_removed_reads + ]] } - withName: SUMMARY_BLASTN { + withName: SUMMARY_CLASSIFICATION { publishDir = [ - path: { "${params.outdir}/blast/summary" }, + path: { "${params.outdir}/classification/summary" }, mode: params.publish_dir_mode, - pattern: '*.blastn_summary.tsv', + pattern: '*.classification_summary.tsv', enabled: params.save_intermediates ] } - withName: SUMMARY_KRAKEN2 { + withName: SUMMARY_BLASTN { publishDir = [ - path: { "${params.outdir}/kraken2/summary" }, + path: { "${params.outdir}/blast/summary" }, mode: params.publish_dir_mode, - pattern: '*.kraken2_summary.tsv', + pattern: '*.blastn_summary.tsv', enabled: params.save_intermediates ] } @@ -143,7 +206,6 @@ process { enabled: true ] } - withName: 'MULTIQC' { ext.args = { params.multiqc_title ? 
"--title \"$params.multiqc_title\"" : '' } publishDir = [ diff --git a/conf/test.config b/conf/test.config index 3eb4977..db30da4 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,28 +10,34 @@ ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data - input = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/samplesheets/samplesheet.csv" + input = params.pipelines_testdata_base_path + 'detaxizer/samplesheets/samplesheet.csv' // Genome references - fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/host_reference/genome.hg38.chr21_10000bp_region.fa" + fasta_bbduk = params.pipelines_testdata_base_path + 'detaxizer/host_reference/genome.hg38.chr21_10000bp_region.fa' - // Kraken2 test db - kraken2db = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/test_data/minigut_kraken.tgz" - kraken2confidence = 0.00 - tax2filter = 'unclassified' + // Run bbduk and kraken2 + classification_bbduk = true + classification_kraken2 = true - blast_coverage = 40.0 - blast_identity = 40.0 + // Kraken2 test db + kraken2db = params.pipelines_testdata_base_path + 'detaxizer/test_data/minigut_kraken.tgz' + kraken2confidence = 0.00 + tax2filter = 'unclassified' - enable_filter = true + enable_filter = true + generate_downstream_samplesheets = true + generate_pipeline_samplesheets = "taxprofiler,mag" } diff --git a/conf/test_blastn.config b/conf/test_blastn.config new file mode 100644 index 0000000..eef76f2 --- /dev/null +++ b/conf/test_blastn.config @@ -0,0 +1,45 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test + with validation through blastn. 
+ + Use as follows: + nextflow run nf-core/detaxizer -profile test_blastn,<docker/singularity> --outdir <OUTDIR> + +---------------------------------------------------------------------------------------- +*/ + +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + +params { + config_profile_name = 'Test profile enabling blastn validation step' + config_profile_description = 'Minimal test dataset to check pipeline function when using blastn' + + + // Input data + input = params.pipelines_testdata_base_path + "detaxizer/samplesheets/samplesheet.csv" + + // Kraken2 test db + kraken2db = params.pipelines_testdata_base_path + "detaxizer/test_data/minigut_kraken.tgz" + kraken2confidence = 0.00 + tax2filter = 'unclassified' + + // Genome references + fasta_blastn = params.pipelines_testdata_base_path + "detaxizer/host_reference/genome.hg38.chr21_10000bp_region.fa" + + // Workflow parameters + enable_filter = true + validation_blastn = true + + // samplesheet generation + generate_downstream_samplesheets = true + generate_pipeline_samplesheets = "taxprofiler,mag" +} diff --git a/conf/test_filter_preprocessed.config b/conf/test_filter_preprocessed.config index 4550297..5e9de80 100644 --- a/conf/test_filter_preprocessed.config +++ b/conf/test_filter_preprocessed.config @@ -10,28 +10,32 @@ ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile filter preprocessed' config_profile_description = 'Minimal test dataset to check pipeline function when using preprocessed reads in the filtering step' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data - input = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/samplesheets/samplesheet.csv" + input = params.pipelines_testdata_base_path + "detaxizer/samplesheets/samplesheet.csv" // Kraken2 test db - kraken2db = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/test_data/minigut_kraken.tgz" - kraken2confidence = 0.00 - tax2filter = 'unclassified' - - // Genome references - fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/host_reference/genome.hg38.chr21_10000bp_region.fa" - + kraken2db = params.pipelines_testdata_base_path + "detaxizer/test_data/minigut_kraken.tgz" + kraken2confidence = 0.00 + tax2filter = 'unclassified' // Workflow parameters - enable_filter = true - filter_trimmed = true + enable_filter = true + preprocessing = true + filter_trimmed = true + + // samplesheet generation + generate_downstream_samplesheets = true + generate_pipeline_samplesheets = "taxprofiler,mag" }
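The test profiles above now cap resources via `resourceLimits` rather than the removed `--max_cpus`/`--max_memory`/`--max_time` parameters (see the `Deprecated` table in the CHANGELOG). Users can set their own ceilings the same way; a minimal sketch with illustrative values (the file name `limits.config` is hypothetical):

```
// limits.config, applied with: nextflow run nf-core/detaxizer -c limits.config ...
process {
    resourceLimits = [
        cpus: 8,
        memory: '64.GB',
        time: '24.h'
    ]
}
```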
fasta_bbduk = "s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" // Kraken2 test db - kraken2db = "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240112.tar.gz" - kraken2confidence = 0.00 - tax2filter = 'Homo' + kraken2db = "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz" + kraken2confidence = 0.00 + tax2filter = 'Homo sapiens' - blast_coverage = 40.0 - blast_identity = 40.0 + classification_bbduk = true + classification_kraken2 = true - enable_filter = true + enable_filter = true + classification_kraken2_post_filtering = true + output_removed_reads = true + + // samplesheet generation + generate_downstream_samplesheets = true + generate_pipeline_samplesheets = "taxprofiler,mag" } diff --git a/conf/test_skip_blastn.config b/conf/test_skip_blastn.config deleted file mode 100644 index a2de757..0000000 --- a/conf/test_skip_blastn.config +++ /dev/null @@ -1,33 +0,0 @@ -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Nextflow config file for running minimal tests -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Defines input files and everything required to run a fast and simple pipeline test skipping blastn. - - Use as follows: - nextflow run nf-core/detaxizer -profile test_skip_blastn, --outdir - ----------------------------------------------------------------------------------------- -*/ - -params { - config_profile_name = 'Test profile skipping blastn step' - config_profile_description = 'Minimal test dataset to check pipeline function when skipping blastn' - - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - - // Input data - input = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/samplesheets/samplesheet.csv" - - // Kraken2 test db - kraken2db = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/test_data/minigut_kraken.tgz" - kraken2confidence = 0.00 - tax2filter = 'unclassified' - - // Workflow parameters - enable_filter = true - skip_blastn = true -} diff --git a/docs/images/Detaxizer_metro_workflow.png b/docs/images/Detaxizer_metro_workflow.png index 4d40d5c..9141ff5 100644 Binary files a/docs/images/Detaxizer_metro_workflow.png and b/docs/images/Detaxizer_metro_workflow.png differ diff --git a/docs/images/Detaxizer_metro_workflow.svg b/docs/images/Detaxizer_metro_workflow.svg index e871717..023c3f7 100644 --- a/docs/images/Detaxizer_metro_workflow.svg +++ b/docs/images/Detaxizer_metro_workflow.svg @@ -1,4 +1,4 @@ -
[v1.0.0 metro map text: Fastq / fastq.gz inputs (short reads single-end and paired-end, long reads); FastQC; fastp; kraken2; Blastn; Summarizer; Filter; Filtered results; kraken2 DB; Blastn DB; tsv; Summary; panels: QC, preprocessing and classification; Exclusive OR; Optional validation; Optional filtering; Databases for classification and validation; Summary]
\ No newline at end of file
+[v1.1.0 metro map text: Fastq / fastq.gz inputs (short reads single-end and paired-end, long reads); FastQC; fastp; kraken2; bbduk; blastn; Summarizer; Filter; filtered reads; removed reads; kraken2 reports for filtered and removed reads; txt; tsv; summary; panels: Standard route; Exclusive OR; Optional filtering; Optional routes; Classification]
\ No newline at end of file
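
The test profiles earlier in this diff drop the old `max_cpus`/`max_memory`/`max_time` parameters in favour of the `resourceLimits` process directive. As a minimal sketch of the same mechanism from the user side (file name and limit values are illustrative, not part of this PR), resources can be capped via a custom config passed with `-c`:

```bash
# Minimal sketch, assuming the resourceLimits directive used by the
# test profiles above. File name and limit values are illustrative.
cat > custom_resources.config <<'EOF'
process {
    resourceLimits = [
        cpus: 8,
        memory: '32.GB',
        time: '12.h'
    ]
}
EOF

nextflow run nf-core/detaxizer \
    -profile docker \
    -c custom_resources.config \
    --input samplesheet.csv \
    --outdir results
```
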
diff --git a/docs/output.md b/docs/output.md
index 9292836..1cfee72 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -11,15 +11,17 @@ The directories listed below will be created in the results directory after the
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
 - [FastQC](#fastqc) - Raw read QC - Output not in the results directory by default
-- [fastp](#fastp) - Preprocessing of raw reads
-- [kraken2](#kraken2) - Classification of the preprocessed reads and extracting the searched taxa from the results
-- [blastn](#blastn) - Validation of the reads classified as the searched taxa and extracting ids of validated reads
-- [filter](#filter) - (Optional) filtering of the raw or preprocessed reads using either the read ids from kraken2 output or blastn output
+- [fastp](#fastp) - (Optional) preprocessing of raw reads
+- [kraken2](#kraken2) - Classification of the (preprocessed) reads and extracting the searched taxa from the results
+- [bbduk](#bbduk) - Classification of the (preprocessed) reads
+- [classification](#classification) - Preparation of the read IDs for filtering and/or validation
+- [blastn](#blastn) - (Optional) validation of the reads classified as the searched taxa and extracting ids of validated reads
+- [filter](#filter) - (Optional) filtering of the raw or preprocessed reads using either the read ids from kraken2 and/or bbduk output or blastn output
 - [summary](#summary) - The summary of the classification and the optional validation
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
-Only the filtering results, the summary, MultiQC and pipeline information are shown by default in the results folder.
+Only the filtering results, the summary, MultiQC and pipeline information are shown by default in the results folder. Also, if the outputs from the filter are classified using kraken2, a kraken2 folder, containing a `filtered/` and a `removed/` folder, will be shown.
 
 ### FastQC
 
@@ -51,19 +53,55 @@
 kraken2 classifies the reads. The important files are `*.classifiedreads.txt`, `
 Output files
 
-- `kraken2/`: Contains the output from the classification step.
+- `kraken2/`: Contains the output from the kraken2 classification steps.
+  - `filtered/`: Contains the classification of the filtered reads (post-filtering).
+    - `.classifiedreads.txt`: The whole kraken2 output for filtered reads.
+    - `.kraken2.report.txt`: Statistics on how many reads were assigned to which taxon/taxonomic group in the filtered reads.
   - `isolated/`: Contains the isolated lines and ids for the taxon/taxa mentioned in the `tax2filter` parameter.
     - `.classified.txt`: The whole kraken2 output for the taxon/taxa mentioned in the `tax2filter` parameter.
     - `.ids.txt`: The ids from the whole kraken2 output assigned to the taxon/taxa mentioned in the `tax2filter` parameter.
+  - `removed/`: Contains the classification of the removed reads (post-filtering).
+    - `.classifiedreads.txt`: The whole kraken2 output for removed reads.
+    - `.kraken2.report.txt`: Statistics on how many reads were assigned to which taxon/taxonomic group in the removed reads.
   - `summary/`: Summary of the kraken2 process.
     - `.kraken2_summary.tsv`: Contains three columns: column 1 is the sample name, column 2 the number of lines in the untouched kraken2 output and column 3 the number of lines in the isolated output.
   - `taxonomy/`: Contains the list of taxa to filter/to assess for.
     - `taxa_to_filter.txt`: Contains the taxon ids of all taxa to assess the data for or to filter out.
   - `.classifiedreads.txt`: The whole kraken2 output for all reads.
-  - `.kraken2.report.txt`: Statistics on how many reads where assigned to which taxon/taxonomic group.
+  - `.kraken2.report.txt`: Statistics on how many reads were assigned to which taxon/taxonomic group.
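
The kraken2 outputs listed above are plain tab-separated text, so they can be sanity-checked directly from the shell. A small sketch (sample name and paths are illustrative, not taken from this PR; the six-column report layout is standard kraken2):

```bash
# Sketch: inspect the kraken2 outputs described above. Sample name and
# paths are illustrative. A kraken2 report has six tab-separated columns:
# percentage, clade read count, direct read count, rank code, taxid, name.

# Count the read IDs isolated for the taxa in tax2filter:
wc -l < results/kraken2/isolated/sample1.ids.txt

# Pull the Homo sapiens line (NCBI taxid 9606) out of the report:
awk -F'\t' '$5 == 9606' results/kraken2/sample1.kraken2.report.txt
```
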
+### bbduk
+
+bbduk classifies the reads by k-mer matching against a reference.
+As soon as one k-mer of a read matches the reference, the read is classified.
+The important files are `*.bbduk.log` and `ids/*.bbduk.txt`.
+`` can be replaced by `_longReads`, `_R1` or left as `` depending on the cases mentioned in [fastp](#fastp).
+
+Output files + +- `bbduk/`: Contains the output from the bbduk classification step. + - `ids/`: Contains the files with the IDs classified by bbduk. + - `.bbduk.txt`: Contains the classified IDs per sample. + - `.bbduk.log`: Contains statistics on the bbduk run. + +
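
Under the hood, this step is a k-mer containment check against the `fasta_bbduk` reference. A rough standalone equivalent, assuming bbduk's standard `in=`/`ref=`/`out=`/`outm=` options (k-mer size and file names are illustrative, not part of this PR):

```bash
# Sketch of the classification bbduk performs here: reads sharing at
# least one k-mer with the reference go to outm (classified as
# contamination), all other reads go to out. k value and file names
# are examples.
bbduk.sh \
    in=sample1.fastq.gz \
    ref=host_reference.fa \
    out=sample1.uncontaminated.fastq.gz \
    outm=sample1.contaminated.fastq.gz \
    k=31
```
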
+
+### classification
+
+This folder contains either the merged IDs from [bbduk](#bbduk) and [kraken2](#kraken2) or, if only one tool was run, the IDs from that tool. It also includes the summary files of the classification step.
+
+Output files + +- `classification/`: Contains the results and the summaries of the classification step. + - `ids/`: Contains either the merged ID files of the classification step or the ones from one classification tool. + - `.ids.txt`: Contains the classified IDs. + - `summary/`: Contains the summary files of either the classification step or the ones from one classification tool. - `.classification_summary.tsv`: Contains the count of reads classified. +
+
 
 ### blastn
 
 blastn can validate the reads classified by kraken2 as the taxon/taxa to be assessed/to be filtered. To reduce the computational burden, only the highest-scoring hit per input sequence is returned. If more information is needed, this can be changed via the `max_hsps` and `max_target_seqs` flags in the `modules.config` file (a sketch follows the summary table below).
@@ -89,17 +127,20 @@
 In this folder, the filtered and re-renamed reads can be found. This result has
 
 Output files
 
 - `filter/`: Folder containing the filtered and re-renamed reads.
-  - `_filtered.fastq.gz`: The filtered reads, `` can stay as `` for single-end short reads, take the pattern `_{R1,R2}` for paired-end reads and `_longReads` for long reads.
+  - `filtered/`: Folder containing the decontaminated reads
+    - `_filtered.fastq.gz`: The filtered reads, `` can stay as `` for single-end short reads, take the pattern `_{R1,R2}` for paired-end reads and `_longReads` for long reads.
+  - `removed/`: Folder containing the removed reads (optional)
+    - `_removed.fastq.gz`: The removed reads, `` can stay as `` for single-end short reads, take the pattern `_{R1,R2}` for paired-end reads and `_longReads` for long reads.
 
 ### summary
 
-The summary file lists all statistics of kraken2 and blastn per sample. It is a combination of the summary files of kraken2 and blastn and can be used for a quick overview of the pipeline run. If blastn is skipped, then only the statistics of kraken2 is shown.
+The summary file lists all statistics of kraken2 and/or bbduk (and optionally blastn) per sample. It is a combination of the summary files of the classification step and blastn and can be used for a quick overview of the pipeline run. By default, only the summary of the classification step is shown.
 
-| | kraken2 | isolatedkraken2 | blastn_unique_ids | blastn_lines | filteredblastn_unique_ids | filteredblastn_lines |
-| --- | --- | --- | --- | --- | --- | --- |
-| `` (For short reads it is the same as in the `samplesheet.csv`, for long reads it is `_longReads`) | Read IDs in kraken2 output | Read IDs in the isolated kraken2 output | Number of unique IDs in blastn output, should be the same as blastn_lines | Number of lines in the blastn output | Number of IDs in the blastn output after the filtering for identity and coverage, should be the same as filteredblastn_lines | Number of lines in the blastn output after the filtering for identity and coverage |
+| | classified with \* | blastn_unique_ids | blastn_lines | filteredblastn_unique_ids | filteredblastn_lines |
+| --- | --- | --- | --- | --- | --- |
+| `` (For short reads it is the same as in the `samplesheet.csv`, for long reads it is `_longReads`) | Number of IDs classified in the classification step | Number of unique IDs in blastn output, should be the same as blastn_lines | Number of lines in the blastn output | Number of IDs in the blastn output after the filtering for identity and coverage, should be the same as filteredblastn_lines | Number of lines in the blastn output after the filtering for identity and coverage |
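
As mentioned in the blastn section above, the single-hit default can be relaxed through the module options. A hedged sketch of such an override (the process selector and flag values are assumptions, not part of this PR):

```bash
# Sketch: ask blastn for more than the single best hit by overriding
# the module's ext.args in a custom config. Selector name and values
# are assumptions, not taken from this PR.
cat > blastn_hits.config <<'EOF'
process {
    withName: 'BLAST_BLASTN' {
        ext.args = '-max_hsps 5 -max_target_seqs 5'
    }
}
EOF

nextflow run nf-core/detaxizer -profile docker \
    -c blastn_hits.config \
    --input samplesheet.csv \
    --outdir results \
    --validation_blastn
```
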
Output files @@ -125,6 +166,33 @@ The summary file lists all statistics of kraken2 and blastn per sample. It is a Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see . +### Downstream samplesheets + +The pipeline can also generate input files for the following downstream +pipelines: + +- [nf-core/taxprofiler](https://nf-co.re/taxprofiler) +- [nf-core/mag](https://nf-co.re/mag) + +
+Output files + +- `downstream_samplesheets/` + - `taxprofiler.csv`: Filled out nf-core/taxprofiler `--input` csv with paths to reads saved in the results directory + - `mag-pe.csv`: Filled out nf-core/mag `--input` csv for paired-end reads with paths to reads saved in the results directory + - `mag-se.csv`: Filled out nf-core/mag `--input` csv for single-end reads with paths to reads saved in the results directory + +
+
+:::warning
+Any generated downstream samplesheet is provided as 'best effort' and is not guaranteed to work straight out of the box!
+It may not be complete (e.g. some columns may need to be manually filled in).
+:::
+
+:::warning
+Detaxizer can process long reads independently of short reads. nf-core/mag (as of 3.1.0) can only take short reads, or short plus long reads, but not standalone long reads as input (this is being worked on). Standalone long reads will not be included in the nf-core/mag samplesheets.
+:::
+
 ### Pipeline information
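
A minimal invocation that produces these samplesheets, using the two parameters introduced in this PR's test profiles (input and output paths are illustrative):

```bash
# Sketch: generate downstream samplesheets for nf-core/taxprofiler and
# nf-core/mag. Both parameters appear in this PR; paths are examples.
nextflow run nf-core/detaxizer \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --generate_downstream_samplesheets \
    --generate_pipeline_samplesheets taxprofiler,mag

# The sheets land in results/downstream_samplesheets/ and, per the
# warnings above, may still need manual completion:
head results/downstream_samplesheets/taxprofiler.csv
```
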
diff --git a/docs/usage.md b/docs/usage.md
index 60948d6..fb32df7 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -6,7 +6,7 @@
 ## Introduction
 
-nf-core/detaxizer is a pipeline to assess raw (meta)genomic data for contaminations and optionally filter reads which were classified as contamination. Default taxa classified as contamination are **_Homo_** and **_Homo sapiens_**.
+nf-core/detaxizer is a pipeline to assess raw (meta)genomic data for contamination and optionally filter reads which were classified as contamination. The default taxon classified as contamination is **_Homo sapiens_**.
 
 ## Samplesheet input
 
@@ -46,15 +46,22 @@ The databases used by detaxizer have an influence on the amount of false positiv
 The task of decontamination has to be balanced out between false positives and false negatives depending on what is needed in your use case.
 
+> [!NOTE]
+> Be aware that the `tax2filter` (default _Homo sapiens_) has to be in the provided kraken2 database (if kraken2 is used) and that the reference for bbduk (provided by the `fasta_bbduk` parameter) should contain the taxa to filter/assess if you want to assess/remove the same taxa as in `tax2filter`. This overlap in the databases is not checked by the pipeline. To filter out/assess taxa with bbduk only, the `tax2filter` parameter is not needed, but a fasta file with references of these taxa has to be provided.
+
 ### kraken2
 
-To reduce false negatives a larger kraken2 database should be used. This comes at costs in terms of hardware requirements. For the largest kraken2 standard database (which can be found [here](https://benlangmead.github.io/aws-indexes/k2)) at least 100 GB of memory should be available, depending on the size of your data the required memory may be higher. For standard decontamination tasks the Standard-8 database can be used (which is the default), but it should always be kept in mind that this may lead to false negatives to some extend.
+To reduce false negatives, a larger kraken2 database should be used. This comes at a cost in terms of hardware requirements. For the largest kraken2 standard database (which can be found [here](https://benlangmead.github.io/aws-indexes/k2)) at least 100 GB of memory should be available; depending on the size of your data, the required memory may be higher. For standard decontamination tasks the Standard-8 GB database can be used (which is the default), but it should always be kept in mind that this may lead to false negatives to some extent.
+
+To build your own database refer to [this site](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#custom-databases).
+
+### bbduk
 
-Also, pangenome databases of the organism(s) classified as contamination could increase the amount of true positives while reducing the hardware requirements. For human such a database can be found [here](https://zenodo.org/doi/10.5281/zenodo.8339731). Such a database will increase false positives, unless a custom database is built together with the data of the organisms not classified as contamination. To build your own database refer to [this site](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#custom-databases).
+bbduk uses a fasta file which contains sequences from the taxon/taxa classified as contamination. The default is the `GRCh38` human reference genome. Provide a custom file using the `fasta_bbduk` parameter.
 
 ### blastn
 
-The blastn database is built from a fasta file. Default is the `GRCh38` human reference genome. To decrease the amount of false negatives in this step or include different taxa, a database of several taxa can be used. The fasta containing desired sequences has to be provided to the pipeline by using the `fasta` parameter.
+The blastn database is built from a fasta file. The default is the `GRCh38` human reference genome. To decrease the amount of false negatives in this step or to include different taxa, a database of several taxa can be used. The fasta containing the desired sequences has to be provided to the pipeline by using the `fasta_blastn` parameter.
 
 ## Running the pipeline
 
@@ -89,9 +96,9 @@ The above pipeline run specified with a params file in yaml format:
 
 ```bash
 nextflow run nf-core/detaxizer -profile docker -params-file params.yaml
 ```
 
-with `params.yaml` containing:
+with:
 
-```yaml
+```yaml title="params.yaml"
 input: './samplesheet.csv'
 outdir: './results/'
 <...>
 ```
 
@@ -101,13 +108,21 @@ You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-c
 
 Before and after (if using the filter) the execution of the pipeline the headers inside the `.fastq.gz` files are renamed. This step is necessary to avoid difficulties with different header formats in the pipeline. The renamed headers will never be shown to you, except when looking into the work directory. Only the original read headers are shown in the results.
 
-To change the taxon or taxonomic subtree which is classified by kraken2 as contamination use the `tax2filter` parameter (default `Homo`). The taxon has to be in the kraken2 database used, which can be specified using the `kraken2db` parameter.
+To change the taxon or taxonomic subtree which is classified by kraken2 as contamination, use the `tax2filter` parameter (default `Homo sapiens`). The taxon has to be in the kraken2 database used, which can be specified using the `kraken2db` parameter.
+
+To change what is classified by `bbduk`, a fasta containing the sequences of the contaminant taxon/taxa has to be provided using the `fasta_bbduk` parameter.
+
+If you want to run `bbduk`, use the `--classification_bbduk` flag. To run both classification steps and use the merged output for filtering, use both flags (`--classification_kraken2` and `--classification_bbduk`).
 
-To change the organism(s) which should be validated as contamination(s) by blasting against a database, you have to provide a fasta from which the blastn database is built using the `fasta` parameter. Also, if just one reference genome is needed for blastn and it is in [igenomes.config](../conf/igenomes.config) use the according name (e.g. `'GRCh38'`) as `genome` parameter.
+To change the organism(s) which should be validated as contamination(s) by blasting against a database, you have to provide a fasta from which the blastn database is built using the `fasta_blastn` parameter. Also, if just one reference genome is needed for blastn and it is in `igenomes.config`, use the corresponding name (e.g. `'GRCh38'`) as the `genome` parameter.
 
-Skipping blastn can be done by using `--skip_blastn`.
+blastn can be turned on using the `validation_blastn` parameter.
 
-Optionally enabling the filter can be done by using `--enable_filter`. There are two options for the input of the filter, either the raw reads or the preprocessed ones. The first is the default option. Also, for the definition of the reads to be filtered by their IDs two options are available. Either the default is taken, the output from the `blastn` step, or using the output from the `kraken2` step. If `blastn` is skipped, the classified read IDs of `kraken2` are automatically used in the filtering step.
+The filter can optionally be enabled by using `--enable_filter`. There are two options for the input of the filter: either the raw reads or the preprocessed ones, the first being the default. Also, for the definition of the reads to be filtered by their IDs, two options are available: either the output from the classification step (kraken2), which is the default, or the output from the `blastn` step.
+
+If you want to output the removed reads, use `--output_removed_reads`.
+
+Optional classification of the filtered (and removed) reads can be done using `--classification_kraken2_post_filtering`. This uses the kraken2 database provided by `kraken2db`.
 
 ### Updating the pipeline
 
@@ -169,6 +184,8 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
   - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
 - `apptainer`
   - A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
+- `wave`
+  - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
 - `conda`
   - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
 
@@ -210,14 +227,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config
 If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs).
 
-## Azure Resource Requests
-
-To be used with the `azurebatch` profile by specifying the `-profile azurebatch`.
-We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required.
-
-Note that the choice of VM size depends on your quota and the overall workload during the analysis.
-For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes).
-
 ## Running in the background
 
 Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
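
Putting the flags documented above together, a run that classifies with both tools, filters the reads, keeps the removed reads and re-classifies the filter output could look like this (database and reference paths are placeholders; all flags appear in this PR):

```bash
# Sketch combining the usage flags documented above: classify with
# kraken2 and bbduk, filter, keep the removed reads, and run kraken2
# again on the filtered/removed reads. Paths are placeholders.
nextflow run nf-core/detaxizer \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --kraken2db k2_standard_08gb.tar.gz \
    --classification_kraken2 \
    --classification_bbduk \
    --fasta_bbduk host_reference.fa \
    --enable_filter \
    --output_removed_reads \
    --classification_kraken2_post_filtering
```
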
diff --git a/main.nf b/main.nf index defb3da..43bb5bb 100644 --- a/main.nf +++ b/main.nf @@ -9,15 +9,13 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { DETAXIZER } from './workflows/detaxizer' +include { NFCORE_DETAXIZER } from './workflows/detaxizer' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_detaxizer_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_detaxizer_pipeline' @@ -27,43 +25,15 @@ include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_deta ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -// -// WORKFLOW: Run main analysis pipeline depending on type of input -// -workflow NFCORE_DETAXIZER { - - take: - samplesheet // channel: samplesheet read in from --input - - main: - - // - // WORKFLOW: Run pipeline - // - DETAXIZER ( - samplesheet - ) - - emit: - multiqc_report = DETAXIZER.out.multiqc_report // channel: /path/to/multiqc_report.html - -} -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - RUN MAIN WORKFLOW -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ workflow { main: - // // SUBWORKFLOW: Run initialisation tasks // PIPELINE_INITIALISATION ( params.version, - params.help, params.validate_params, params.monochrome_logs, args, @@ -77,7 +47,6 @@ workflow { NFCORE_DETAXIZER ( PIPELINE_INITIALISATION.out.samplesheet ) - // // SUBWORKFLOW: Run completion tasks // diff --git a/modules.json b/modules.json index ba13e42..4d4f88c 100644 --- a/modules.json +++ b/modules.json @@ -5,34 +5,40 @@ "https://github.com/nf-core/modules.git": { "modules": { "nf-core": { + "bbmap/bbduk": { + "branch": "master", + "git_sha": "a1abf90966a2a4016d3c3e41e228bfcbd4811ccc", + "installed_by": ["modules"], + "patch": "modules/nf-core/bbmap/bbduk/bbmap-bbduk.diff" + }, "blast/blastn": { "branch": "master", - "git_sha": "209e5a3e2753c5e628736a662c877c20f341ee15", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "blast/makeblastdb": { "branch": "master", - "git_sha": "a01c66c96e0bc610ad126e7adc4a94cd4acd1b48", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "fastp": { "branch": "master", - "git_sha": "95cf5fe0194c7bf5cb0e3027a2eb7e7c89385080", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "fastqc": { "branch": "master", - "git_sha": "f4ae1d942bd50c5c0b9bd2de1393ce38315ba57c", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "kraken2/kraken2": { "branch": "master", - "git_sha": "653218e79ffa76fde20319e9062f8b8da5cf7555", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "multiqc": { "branch": "master", - "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a", + "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d", "installed_by": ["modules"] } } @@ -41,17 +47,17 @@ "nf-core": { "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082", "installed_by": ["subworkflows"] }, 
"utils_nfcore_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c", "installed_by": ["subworkflows"] } } diff --git a/modules/local/filter.nf b/modules/local/filter.nf index 0d973b4..c559780 100644 --- a/modules/local/filter.nf +++ b/modules/local/filter.nf @@ -2,17 +2,18 @@ process FILTER { tag "$meta.id" label 'process_high' - conda "bioconda::seqkit=2.8.0" + conda "bioconda::seqkit=2.8.2" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/seqkit:2.8.0--h9ee0642_0': - 'biocontainers/seqkit:2.8.0--h9ee0642_0'}" + 'https://depot.galaxyproject.org/singularity/seqkit:2.8.2--h9ee0642_0': + 'biocontainers/seqkit:2.8.2--h9ee0642_0'}" input: tuple val(meta), path(fastq), path(ids_to_remove) output: - tuple val(meta), path('*.fastq.gz') , emit: filtered - path "versions.yml" , emit: versions + tuple val(meta), path('*filtered_renamed.fastq.gz') , emit: filtered + tuple val(meta), path('*removed_renamed.fastq.gz') , optional: true , emit: removed + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when @@ -28,9 +29,15 @@ process FILTER { do COUNTER=\$((COUNTER+1)) seqkit grep -v -f \${array2[\$(COUNTER-1)]} \$element -o \$(echo ${meta.id})_R\$(echo \$COUNTER)_filtered_renamed.fastq.gz + if [ "${params.output_removed_reads}" == "true" ]; then + seqkit grep -f \${array2[\$(COUNTER-1)]} \$element -o \$(echo ${meta.id})_R\$(echo \$COUNTER)_removed_renamed.fastq.gz + fi done else seqkit grep -v -f ${ids_to_remove} ${fastq} -o ${meta.id}_filtered_renamed.fastq.gz + if [ "${params.output_removed_reads}" == "true" ]; then + seqkit grep -f ${ids_to_remove} ${fastq} -o ${meta.id}_removed_renamed.fastq.gz + fi fi cat <<-END_VERSIONS > versions.yml diff --git a/modules/local/isolate_bbduk_ids.nf b/modules/local/isolate_bbduk_ids.nf new file mode 100644 index 0000000..e389b6e --- /dev/null +++ b/modules/local/isolate_bbduk_ids.nf @@ -0,0 +1,34 @@ +process ISOLATE_BBDUK_IDS { + tag "$meta.id" + label 'process_single' + + conda "bioconda::seqkit=2.8.2" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/seqkit:2.8.2--h9ee0642_0': + 'biocontainers/seqkit:2.8.2--h9ee0642_0'}" + + input: + tuple val(meta), path(contamination) + + output: + tuple val(meta), path('*.bbduk.txt') , emit: classified_ids + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + """ + if [ "$meta.single_end" == "true" ]; then + seqkit seq -n $contamination > ${meta.id}.bbduk.txt + else + seqkit seq -n ${contamination[0]} > read_ids1.txt + seqkit seq -n ${contamination[1]} > read_ids2.txt + awk '!seen[\$0]++' read_ids1.txt read_ids2.txt > ${meta.id}.bbduk.txt + fi + cat <<-END_VERSIONS > versions.yml + "${task.process}": + seqkit: \$(seqkit version | sed -E 's/.*v([0-9]+\\.[0-9]+\\.[0-9]+).*/\\1/') + END_VERSIONS + """ +} diff --git a/modules/local/isolate_ids_from_kraken2_to_blastn.nf b/modules/local/isolate_kraken2_ids.nf similarity index 96% rename from modules/local/isolate_ids_from_kraken2_to_blastn.nf rename to modules/local/isolate_kraken2_ids.nf index 65c7c77..4fd6cf9 100644 --- a/modules/local/isolate_ids_from_kraken2_to_blastn.nf +++ b/modules/local/isolate_kraken2_ids.nf @@ -1,4 +1,4 @@ -process ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN { +process ISOLATE_KRAKEN2_IDS { tag "$meta.id" label 'process_single' @@ -68,7 +68,7 @@ process ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN { filterList.append(line[1]) outfile.write("\\t".join(line)) - with open('${meta.id}.ids.txt', 'w') as outfile: + with open('${meta.id}.kraken2.ids.txt', 'w') as outfile: for entry in filterList: outfile.write(entry+"\\n") diff --git a/modules/local/merge_ids.nf b/modules/local/merge_ids.nf new file mode 100644 index 0000000..cfae824 --- /dev/null +++ b/modules/local/merge_ids.nf @@ -0,0 +1,33 @@ +process MERGE_IDS { + tag "$meta.id" + label 'process_high' + + conda "conda-forge::gawk=5.3.0" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gawk:5.3.0' : + 'biocontainers/gawk:5.3.0' }" + + input: + tuple val(meta), path(ids) + + output: + tuple val(meta), path('*ids.txt') , emit: classified_ids + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + """ + stringarray=($ids) + if [ "\${#stringarray[@]}" == 1 ]; then + cat \${stringarray[0]} > ${meta.id}.ids.txt + else + awk '!seen[\$0]++' \${stringarray[0]} \${stringarray[1]} > ${meta.id}.ids.txt + fi + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gawk: \$(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//') + END_VERSIONS + """ +} diff --git a/modules/local/parse_kraken2report.nf b/modules/local/parse_kraken2report.nf index b21d65c..ed2bdba 100644 --- a/modules/local/parse_kraken2report.nf +++ b/modules/local/parse_kraken2report.nf @@ -19,7 +19,7 @@ process PARSE_KRAKEN2REPORT { script: """ - parse_kraken2report.py -i $kraken2report -t $params.tax2filter + parse_kraken2report.py -i $kraken2report -t "$params.tax2filter" cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/local/prepare_fasta4blastn.nf b/modules/local/prepare_fasta4blastn.nf index 32dd3a2..28614a1 100644 --- a/modules/local/prepare_fasta4blastn.nf +++ b/modules/local/prepare_fasta4blastn.nf @@ -2,10 +2,10 @@ process PREPARE_FASTA4BLASTN { tag "$meta.id" label 'process_single' - conda "bioconda::seqkit=2.8.0" + conda "bioconda::seqkit=2.8.2" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/seqkit:2.8.0--h9ee0642_0': - 'biocontainers/seqkit:2.8.0--h9ee0642_0'}" + 'https://depot.galaxyproject.org/singularity/seqkit:2.8.2--h9ee0642_0': + 'biocontainers/seqkit:2.8.2--h9ee0642_0'}" input: tuple val(meta), path(trimmedreads), path(kraken2results) diff --git a/modules/local/rename_fastq_headers_after.nf b/modules/local/rename_fastq_headers_after.nf index 98ba51f..febe847 100644 --- a/modules/local/rename_fastq_headers_after.nf +++ b/modules/local/rename_fastq_headers_after.nf @@ -2,17 +2,18 @@ process RENAME_FASTQ_HEADERS_AFTER { tag "$meta.id" label 'process_medium' - conda "bioconda::seqkit=2.8.0" + conda "bioconda::seqkit=2.8.2" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/seqkit:2.8.0--h9ee0642_0': - 'biocontainers/seqkit:2.8.0--h9ee0642_0'}" + 'https://depot.galaxyproject.org/singularity/seqkit:2.8.2--h9ee0642_0': + 'biocontainers/seqkit:2.8.2--h9ee0642_0'}" input: - tuple val(meta), path(fastqfiltered), path(renamedHeaders) - + tuple val(meta) , path(fastqfiltered), path(renamedHeaders) + tuple val(meta2), path(fastqremoved) output: - tuple val(meta), path('*.fastq.gz'), emit: fastq - path "versions.yml" , emit: versions + tuple val(meta), path('*_filtered.fastq.gz') , emit: fastq + tuple val(meta), path('*_removed.fastq.gz') , optional: true , emit: fastq_removed + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when @@ -20,15 +21,24 @@ process RENAME_FASTQ_HEADERS_AFTER { script: """ if [ "$meta.single_end" == "true" ]; then - gzip -d $renamedHeaders + gzip -f -d $renamedHeaders seqkit replace -p '^(.+)\$' -r '{kv}' -k *_headers.txt $fastqfiltered -o ${meta.id}_filtered.fastq.gz + if [ "$meta2" != "empty" ]; then + seqkit replace -p '^(.+)\$' -r '{kv}' -k *_headers.txt $fastqremoved -o ${meta.id}_removed.fastq.gz + fi rm *_headers.txt else - gzip -d ${renamedHeaders[0]} + gzip -f -d ${renamedHeaders[0]} seqkit replace -p '^(.+)\$' -r '{kv}' -k *_headers_fw.txt ${fastqfiltered[0]} -o ${meta.id}_R1_filtered.fastq.gz + if [ "$meta2" != "empty" ]; then + seqkit replace -p '^(.+)\$' -r '{kv}' -k *_headers_fw.txt ${fastqremoved[0]} -o ${meta.id}_R1_removed.fastq.gz + fi rm *_headers_fw.txt - gzip -d ${renamedHeaders[1]} + gzip -f -d ${renamedHeaders[1]} seqkit replace -p '^(.+)\$' -r '{kv}' -k *_headers_rv.txt ${fastqfiltered[1]} -o ${meta.id}_R2_filtered.fastq.gz + if [ "$meta2" != "empty" ]; then + seqkit replace -p '^(.+)\$' -r '{kv}' -k *_headers_rv.txt ${fastqremoved[1]} -o ${meta.id}_R2_removed.fastq.gz + fi rm *_headers_rv.txt fi cat <<-END_VERSIONS > versions.yml diff --git a/modules/local/summarizer.nf b/modules/local/summarizer.nf index bbeae42..5829f25 100644 --- a/modules/local/summarizer.nf +++ b/modules/local/summarizer.nf @@ -28,18 +28,18 @@ process SUMMARIZER { version_output = subprocess.getoutput('python --version') return version_output.split()[1] - files_kraken2 = glob.glob('*.kraken2_summary.tsv') + files_classified = glob.glob('*.classification_summary.tsv') files_blastn = glob.glob('*.blastn_summary.tsv') - kraken2_dfs = [pd.read_csv(file, sep="\\t", index_col=0) for file in files_kraken2] - df_kraken2 = pd.concat(kraken2_dfs) + classified_dfs = [pd.read_csv(file, sep="\\t", index_col=0) for file in files_classified] + df_classified = pd.concat(classified_dfs) if files_blastn != []: blastn_dfs = [pd.read_csv(file, sep="\\t", index_col=0) for file in files_blastn] df_blastn = 
pd.concat(blastn_dfs) - summary_df = df_kraken2.join(df_blastn) + summary_df = df_classified.join(df_blastn) summary_df.to_csv("summary.tsv", sep="\\t") else: - summary_df = df_kraken2 + summary_df = df_classified summary_df.to_csv("summary.tsv", sep="\\t") # Generate the version.yaml for MultiQC diff --git a/modules/local/summary_classification.nf b/modules/local/summary_classification.nf new file mode 100644 index 0000000..50faec8 --- /dev/null +++ b/modules/local/summary_classification.nf @@ -0,0 +1,56 @@ +process SUMMARY_CLASSIFICATION { + tag "$meta.id" + label 'process_single' + + conda "conda-forge::python=3.10.4 pandas=1.5.2" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pandas:1.5.2' : + 'biocontainers/pandas:1.5.2' }" + input: + tuple val(meta),path(classification) + + output: + tuple val(meta), path("*.classification_summary.tsv") , emit: summary + path("versions.yml") , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + """ + #!/usr/bin/env python + import subprocess + import pandas as pd + + def get_version(): + version_output = subprocess.getoutput('python --version') + return version_output.split()[1] + + with open("${classification}", 'r') as fp: + lines = len(fp.readlines()) + if ("${params.classification_kraken2}" == 'false' and "${params.classification_bbduk}" == 'false') or ("${params.classification_kraken2}" == 'true' and "${params.classification_bbduk}" == 'false'): + classified_dict = { + "classified with kraken2": [str(lines)] + } + elif ("${params.classification_kraken2}" == 'false' and "${params.classification_bbduk}" == 'true'): + classified_dict = { + "classified with bbduk": [str(lines)] + } + elif ("${params.classification_kraken2}" == 'true' and "${params.classification_bbduk}" == 'true'): + classified_dict = { + "classified with kraken2 and bbduk": [str(lines)] + } + print(classified_dict) + df = pd.DataFrame(classified_dict) + index_name = "${meta.id}".replace("_R1","") + df.index = [index_name] + + df.to_csv(index_name + ".classification_summary.tsv",sep='\\t') + + # Generate the version.yaml for MultiQC + with open('versions.yml', 'w') as f: + f.write(f'"{subprocess.getoutput("echo ${task.process}")}":\\n') + f.write(f' python: {get_version()}\\n') + f.write(f' pandas: {pd.__version__}\\n') + """ +} diff --git a/modules/local/summary_kraken2.nf b/modules/local/summary_kraken2.nf deleted file mode 100644 index 8cf09b2..0000000 --- a/modules/local/summary_kraken2.nf +++ /dev/null @@ -1,75 +0,0 @@ -process SUMMARY_KRAKEN2 { - tag "$meta.id" - label 'process_single' - - conda "conda-forge::python=3.10.4 pandas=1.5.2" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/pandas:1.5.2' : - 'biocontainers/pandas:1.5.2' }" - input: - tuple val(meta),path(kraken2) - - - output: - tuple val(meta), path("*.kraken2_summary.tsv") , emit: summary - path("versions.yml") , emit: versions - - when: - task.ext.when == null || task.ext.when - - script: - """ - #!/usr/bin/env python - import subprocess - import pandas as pd - - def sort_list_of_files_by_pattern(kraken2_complete_list): - kraken2_dict = { - "kraken2": [], - "isolatedkraken2": [] - } - for entry in kraken2_complete_list: - if 'classifiedreads.txt' in entry: - kraken2_dict["kraken2"].append(entry) - else: - kraken2_dict["isolatedkraken2"].append(entry) - - return kraken2_dict - - def calculate_lines_of_file(path): - lines = 0 - with open(path,'r') as f: - for line in f: - if line == '\\n': - pass - else: - lines += 1 - return lines - - def get_version(): - version_output = subprocess.getoutput('python --version') - return version_output.split()[1] - - list_files = "${kraken2}".split(" ") - kraken2_dict = sort_list_of_files_by_pattern(list_files) - kraken2_dict_lines = { - "kraken2": 0, - "isolatedkraken2": 0 - } - for key in kraken2_dict.keys(): - for entry in kraken2_dict[key]: - kraken2_dict_lines[key] += calculate_lines_of_file(entry) - kraken2_dict_lines[key] = [ kraken2_dict_lines[key] ] - df = pd.DataFrame(kraken2_dict_lines) - - df.index = ["${meta.id}"] - - df.to_csv("${meta.id}.kraken2_summary.tsv",sep='\\t') - - # Generate the version.yaml for MultiQC - with open('versions.yml', 'w') as f: - f.write(f'"{subprocess.getoutput("echo ${task.process}")}":\\n') - f.write(f' python: {get_version()}\\n') - f.write(f' pandas: {pd.__version__}\\n') - """ -} diff --git a/modules/nf-core/bbmap/bbduk/bbmap-bbduk.diff b/modules/nf-core/bbmap/bbduk/bbmap-bbduk.diff new file mode 100644 index 0000000..982d9b4 --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/bbmap-bbduk.diff @@ -0,0 +1,61 @@ +Changes in module 'nf-core/bbmap/bbduk' +'modules/nf-core/bbmap/bbduk/meta.yml' is unchanged +Changes in 'bbmap/bbduk/main.nf': +--- modules/nf-core/bbmap/bbduk/main.nf ++++ modules/nf-core/bbmap/bbduk/main.nf +@@ -1,6 +1,6 @@ + process BBMAP_BBDUK { + tag "$meta.id" +- label 'process_medium' ++ label 'process_high' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? +@@ -12,9 +12,10 @@ + path contaminants + + output: +- tuple val(meta), path('*.fastq.gz'), emit: reads +- tuple val(meta), path('*.log') , emit: log +- path "versions.yml" , emit: versions ++ tuple val(meta), path('*.uncontaminated.fastq.gz') , emit: reads ++ tuple val(meta), path('*.contaminated.fastq.gz') , emit: contaminated_reads ++ tuple val(meta), path('*.log') , emit: log ++ path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when +@@ -23,7 +24,8 @@ + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def raw = meta.single_end ? "in=${reads[0]}" : "in1=${reads[0]} in2=${reads[1]}" +- def trimmed = meta.single_end ? "out=${prefix}.fastq.gz" : "out1=${prefix}_1.fastq.gz out2=${prefix}_2.fastq.gz" ++ def trimmed = meta.single_end ? "out=${prefix}.uncontaminated.fastq.gz" : "out1=${prefix}_1.uncontaminated.fastq.gz out2=${prefix}_2.uncontaminated.fastq.gz" ++ def contaminated_reads = meta.single_end ? "outm=${prefix}.contaminated.fastq.gz" : "outm=${prefix}_1.contaminated.fastq.gz outm2=${prefix}_2.contaminated.fastq.gz" + def contaminants_fa = contaminants ? 
"ref=$contaminants" : '' + """ + maxmem=\$(echo \"$task.memory\"| sed 's/ GB/g/g') +@@ -31,6 +33,7 @@ + -Xmx\$maxmem \\ + $raw \\ + $trimmed \\ ++ $contaminated_reads \\ + threads=$task.cpus \\ + $args \\ + $contaminants_fa \\ +@@ -44,7 +47,7 @@ + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" +- def output_command = meta.single_end ? "echo '' | gzip > ${prefix}.fastq.gz" : "echo '' | gzip > ${prefix}_1.fastq.gz ; echo '' | gzip > ${prefix}_2.fastq.gz" ++ def output_command = meta.single_end ? "echo '' | gzip > ${prefix}.uncontaminated.fastq.gz ; echo '' | gzip > ${prefix}.contaminated.fastq.gz" : "echo '' | gzip > ${prefix}_1.uncontaminated.fastq.gz ; echo '' | gzip > ${prefix}_2.uncontaminated.fastq.gz ; echo '' | gzip > ${prefix}_1.contaminated.fastq.gz ; echo '' | gzip > ${prefix}_2.contaminated.fastq.gz" + """ + touch ${prefix}.bbduk.log + $output_command + +'modules/nf-core/bbmap/bbduk/environment.yml' is unchanged +'modules/nf-core/bbmap/bbduk/tests/main.nf.test' is unchanged +'modules/nf-core/bbmap/bbduk/tests/main.nf.test.snap' is unchanged +'modules/nf-core/bbmap/bbduk/tests/nextflow.config' is unchanged +'modules/nf-core/bbmap/bbduk/tests/tags.yml' is unchanged +************************************************************ diff --git a/modules/nf-core/bbmap/bbduk/environment.yml b/modules/nf-core/bbmap/bbduk/environment.yml new file mode 100644 index 0000000..a2f6550 --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/environment.yml @@ -0,0 +1,5 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::bbmap=39.10 diff --git a/modules/nf-core/bbmap/bbduk/main.nf b/modules/nf-core/bbmap/bbduk/main.nf new file mode 100644 index 0000000..75dd722 --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/main.nf @@ -0,0 +1,60 @@ +process BBMAP_BBDUK { + tag "$meta.id" + label 'process_high' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bbmap:39.10--h92535d8_0': + 'biocontainers/bbmap:39.10--h92535d8_0' }" + + input: + tuple val(meta), path(reads) + path contaminants + + output: + tuple val(meta), path('*.uncontaminated.fastq.gz') , emit: reads + tuple val(meta), path('*.contaminated.fastq.gz') , emit: contaminated_reads + tuple val(meta), path('*.log') , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def raw = meta.single_end ? "in=${reads[0]}" : "in1=${reads[0]} in2=${reads[1]}" + def trimmed = meta.single_end ? "out=${prefix}.uncontaminated.fastq.gz" : "out1=${prefix}_1.uncontaminated.fastq.gz out2=${prefix}_2.uncontaminated.fastq.gz" + def contaminated_reads = meta.single_end ? "outm=${prefix}.contaminated.fastq.gz" : "outm=${prefix}_1.contaminated.fastq.gz outm2=${prefix}_2.contaminated.fastq.gz" + def contaminants_fa = contaminants ? "ref=$contaminants" : '' + """ + maxmem=\$(echo \"$task.memory\"| sed 's/ GB/g/g') + bbduk.sh \\ + -Xmx\$maxmem \\ + $raw \\ + $trimmed \\ + $contaminated_reads \\ + threads=$task.cpus \\ + $args \\ + $contaminants_fa \\ + &> ${prefix}.bbduk.log + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset") + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def output_command = meta.single_end ? 
"echo '' | gzip > ${prefix}.uncontaminated.fastq.gz ; echo '' | gzip > ${prefix}.contaminated.fastq.gz" : "echo '' | gzip > ${prefix}_1.uncontaminated.fastq.gz ; echo '' | gzip > ${prefix}_2.uncontaminated.fastq.gz ; echo '' | gzip > ${prefix}_1.contaminated.fastq.gz ; echo '' | gzip > ${prefix}_2.contaminated.fastq.gz" + """ + touch ${prefix}.bbduk.log + $output_command + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset") + END_VERSIONS + """ +} diff --git a/modules/nf-core/bbmap/bbduk/meta.yml b/modules/nf-core/bbmap/bbduk/meta.yml new file mode 100644 index 0000000..5665a26 --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/meta.yml @@ -0,0 +1,60 @@ +name: bbmap_bbduk +description: Adapter and quality trimming of sequencing reads +keywords: + - trimming + - adapter trimming + - quality trimming + - fastq +tools: + - bbmap: + description: BBMap is a short read aligner, as well as various other bioinformatic + tools. + homepage: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/ + documentation: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/ + licence: ["UC-LBL license (see package)"] + identifier: biotools:bbmap +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. + - - contaminants: + type: file + description: | + Reference files containing adapter and/or contaminant sequences for sequence kmer matching +output: + - reads: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fastq.gz": + type: file + description: The trimmed/modified fastq reads + pattern: "*fastq.gz" + - log: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.log": + type: file + description: Bbduk log file + pattern: "*bbduk.log" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@MGordon09" +maintainers: + - "@MGordon09" diff --git a/modules/nf-core/bbmap/bbduk/tests/main.nf.test b/modules/nf-core/bbmap/bbduk/tests/main.nf.test new file mode 100644 index 0000000..0f3e818 --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/tests/main.nf.test @@ -0,0 +1,169 @@ +nextflow_process { + + name "Test Process BBMAP_BBDUK" + script "../main.nf" + process "BBMAP_BBDUK" + config "./nextflow.config" + + tag "modules" + tag "modules_nfcore" + tag "bbmap" + tag "bbmap/bbduk" + + test("sarscov2 - single end fastq - fastq") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ] + input[1] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.log.get(0).get(1)).getText().contains("Input is being processed as unpaired")}, + { assert snapshot(process.out.reads, + process.out.versions).match() } + ) + } + + } + + test("sarscov2 - paired end fastq - fastq") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ] + input[1] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.log.get(0).get(1)).getText().contains("Input is being processed as paired")}, + { assert snapshot(process.out.reads, + process.out.versions).match() } + + ) + } + } + + test("sarscov2 - single end w/ contams [fastq,fasta] - fastq") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ] + input[1] = [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/transcriptome.fasta', checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert process.out.reads.get(0).get(1).endsWith("test.trim.fastq.gz") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("Input is being processed as unpaired")}, + { assert snapshot(process.out.versions).match() } + + ) + } + } + + test("sarscov2 - paired end w/ contams [fastq,fasta] - fastq") { + + when { + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ] + input[1] = [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/transcriptome.fasta', checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert process.out.reads.get(0).get(1).get(0).endsWith("test.trim_1.fastq.gz") }, + { assert process.out.reads.get(0).get(1).get(1).endsWith("test.trim_2.fastq.gz") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("Input is being processed as paired")}, + { 
assert snapshot(process.out.versions).match() } + + ) + } + } + + test("sarscov2 - single end fastq - fastq - stub") { + + options "-stub" + when { + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ] + input[1] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + + ) + } + + } + + test("sarscov2 - paired end fastq - fastq - stub") { + + options "-stub" + when { + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ] + input[1] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + +} diff --git a/modules/nf-core/bbmap/bbduk/tests/main.nf.test.snap b/modules/nf-core/bbmap/bbduk/tests/main.nf.test.snap new file mode 100644 index 0000000..258593e --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/tests/main.nf.test.snap @@ -0,0 +1,183 @@ +{ + "sarscov2 - paired end fastq - fastq": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test.trim_1.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec", + "test.trim_2.fastq.gz:md5,2ebae722295ea66d84075a3b042e2b42" + ] + ] + ], + [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-19T15:58:21.643811233" + }, + "sarscov2 - single end w/ contams [fastq,fasta] - fastq": { + "content": [ + [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-19T15:58:37.601191377" + }, + "sarscov2 - paired end w/ contams [fastq,fasta] - fastq": { + "content": [ + [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-19T15:59:00.943639834" + }, + "sarscov2 - paired end fastq - fastq - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test.trim_1.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test.trim_2.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.trim.bbduk.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ], + "log": [ + [ + { + "id": "test", + "single_end": false + }, + "test.trim.bbduk.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test.trim_1.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test.trim_2.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "versions": [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-19T15:59:42.05304912" + }, + "sarscov2 - single end fastq - fastq - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.trim.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + 
"test.trim.bbduk.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ], + "log": [ + [ + { + "id": "test", + "single_end": true + }, + "test.trim.bbduk.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": true + }, + "test.trim.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-19T15:59:25.580963139" + }, + "sarscov2 - single end fastq - fastq": { + "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.trim.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + [ + "versions.yml:md5,6d196411d38a3c3011a38e1a87c9203c" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-19T15:58:00.420305019" + } +} \ No newline at end of file diff --git a/modules/nf-core/bbmap/bbduk/tests/nextflow.config b/modules/nf-core/bbmap/bbduk/tests/nextflow.config new file mode 100644 index 0000000..44c775d --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/tests/nextflow.config @@ -0,0 +1,8 @@ +process { + + withName: BBMAP_BBDUK { + ext.args = 'trimq=10 qtrim=r' + ext.prefix = { "${meta.id}.trim" } + } + +} diff --git a/modules/nf-core/bbmap/bbduk/tests/tags.yml b/modules/nf-core/bbmap/bbduk/tests/tags.yml new file mode 100644 index 0000000..16d6171 --- /dev/null +++ b/modules/nf-core/bbmap/bbduk/tests/tags.yml @@ -0,0 +1,2 @@ +bbmap/bbduk: + - "modules/nf-core/bbmap/bbduk/**" diff --git a/modules/nf-core/blast/blastn/environment.yml b/modules/nf-core/blast/blastn/environment.yml index cb9b15d..777e097 100644 --- a/modules/nf-core/blast/blastn/environment.yml +++ b/modules/nf-core/blast/blastn/environment.yml @@ -1,7 +1,5 @@ -name: blast_blastn channels: - conda-forge - bioconda - - defaults dependencies: - - bioconda::blast=2.14.1 + - bioconda::blast=2.15.0 diff --git a/modules/nf-core/blast/blastn/main.nf b/modules/nf-core/blast/blastn/main.nf index 2613e54..68b43ba 100644 --- a/modules/nf-core/blast/blastn/main.nf +++ b/modules/nf-core/blast/blastn/main.nf @@ -4,8 +4,8 @@ process BLAST_BLASTN { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/blast:2.14.1--pl5321h6f7f691_0': - 'biocontainers/blast:2.14.1--pl5321h6f7f691_0' }" + 'https://depot.galaxyproject.org/singularity/blast:2.15.0--pl5321h6f7f691_1': + 'biocontainers/blast:2.15.0--pl5321h6f7f691_1' }" input: tuple val(meta) , path(fasta) diff --git a/modules/nf-core/blast/blastn/meta.yml b/modules/nf-core/blast/blastn/meta.yml index a0d64dd..0f5e41b 100644 --- a/modules/nf-core/blast/blastn/meta.yml +++ b/modules/nf-core/blast/blastn/meta.yml @@ -13,39 +13,42 @@ tools: documentation: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=Blastdocs doi: 10.1016/S0022-2836(05)80360-2 licence: ["US-Government-Work"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - fasta: - type: file - description: Input fasta file containing queries sequences - pattern: "*.{fa,fasta,fa.gz,fasta.gz}" - - meta2: - type: map - description: | - Groovy Map containing db information - e.g. 
[ id:'test2', single_end:false ] - - db: - type: directory - description: Directory containing the blast database - pattern: "*" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: Input fasta file containing queries sequences + pattern: "*.{fa,fasta,fa.gz,fasta.gz}" + - - meta2: + type: map + description: | + Groovy Map containing db information + e.g. [ id:'test2', single_end:false ] + - db: + type: directory + description: Directory containing the blast database + pattern: "*" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - txt: - type: file - description: File containing blastn hits - pattern: "*.txt" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.txt": + type: file + description: File containing blastn hits + pattern: "*.txt" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@joseespinosa" - "@drpatelh" diff --git a/modules/nf-core/blast/blastn/tests/main.nf.test b/modules/nf-core/blast/blastn/tests/main.nf.test index 02ecfab..aacc93c 100644 --- a/modules/nf-core/blast/blastn/tests/main.nf.test +++ b/modules/nf-core/blast/blastn/tests/main.nf.test @@ -15,7 +15,7 @@ nextflow_process { script "../../makeblastdb/main.nf" process { """ - input[0] = [ [id:'test2'], file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true) ] + input[0] = [ [id:'test2'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] """ } } @@ -29,7 +29,7 @@ nextflow_process { } process { """ - input[0] = [ [id:'test'], file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true) ] + input[0] = [ [id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] input[1] = BLAST_MAKEBLASTDB.out.db """ } @@ -53,7 +53,7 @@ nextflow_process { } process { """ - input[0] = [ [id:'test'], file(params.test_data['sarscov2']['genome']['genome_fasta_gz'], checkIfExists: true) ] + input[0] = [ [id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.gz', checkIfExists: true) ] input[1] = BLAST_MAKEBLASTDB.out.db """ } diff --git a/modules/nf-core/blast/blastn/tests/main.nf.test.snap b/modules/nf-core/blast/blastn/tests/main.nf.test.snap index d1b5f3f..dd8b775 100644 --- a/modules/nf-core/blast/blastn/tests/main.nf.test.snap +++ b/modules/nf-core/blast/blastn/tests/main.nf.test.snap @@ -2,7 +2,7 @@ "versions": { "content": [ [ - "versions.yml:md5,2d5ffadc7035672f6a9e00b01d1751ea" + "versions.yml:md5,faf2471d836ebbf24d96d3e1f8720b17" ] ], "timestamp": "2023-12-11T07:20:03.54997013" @@ -10,7 +10,7 @@ "versions_zipped": { "content": [ [ - "versions.yml:md5,2d5ffadc7035672f6a9e00b01d1751ea" + "versions.yml:md5,faf2471d836ebbf24d96d3e1f8720b17" ] ], "timestamp": "2023-12-11T07:20:12.925782708" diff --git a/modules/nf-core/blast/makeblastdb/environment.yml b/modules/nf-core/blast/makeblastdb/environment.yml index a20783b..777e097 100644 --- a/modules/nf-core/blast/makeblastdb/environment.yml +++ b/modules/nf-core/blast/makeblastdb/environment.yml @@ -1,7 +1,5 @@ -name: blast_makeblastdb channels: - conda-forge - bioconda 
- - defaults dependencies: - bioconda::blast=2.15.0 diff --git a/modules/nf-core/blast/makeblastdb/meta.yml b/modules/nf-core/blast/makeblastdb/meta.yml index 9ed6390..826e62e 100644 --- a/modules/nf-core/blast/makeblastdb/meta.yml +++ b/modules/nf-core/blast/makeblastdb/meta.yml @@ -12,30 +12,33 @@ tools: documentation: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=Blastdocs doi: 10.1016/S0022-2836(05)80360-2 licence: ["US-Government-Work"] + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - fasta: - type: file - description: Input fasta file - pattern: "*.{fa,fasta,fa.gz,fasta.gz}" + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: Input fasta file + pattern: "*.{fa,fasta,fa.gz,fasta.gz}" output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - db: - type: directory - description: Output directory containing blast database files - pattern: "*" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${meta.id}: + type: directory + description: Output directory containing blast database files + pattern: "*" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@joseespinosa" - "@drpatelh" diff --git a/modules/nf-core/blast/makeblastdb/tests/main.nf.test b/modules/nf-core/blast/makeblastdb/tests/main.nf.test index 983b165..21caf14 100644 --- a/modules/nf-core/blast/makeblastdb/tests/main.nf.test +++ b/modules/nf-core/blast/makeblastdb/tests/main.nf.test @@ -12,12 +12,9 @@ nextflow_process { test("Should build a blast db folder from a fasta file") { when { - params { - outdir = "$outputDir" - } process { """ - input[0] = [ [id:'test'], file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true) ] + input[0] = [ [id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) ] """ } } @@ -25,25 +22,28 @@ nextflow_process { then { assertAll( { assert process.success }, - { assert process.out.db - with(process.out.db) { - assert size() == 1 - with(get(0).get(1)) { - File folder = new File(get(0).get(1)) - File[] listOfFiles = folder.listFiles() - listOfFiles = listOfFiles.sort { it.name } - assert listOfFiles.length == 9 - assert snapshot("${get(0).get(1)}/${listOfFiles[0].name}").match("genome.fasta") - assert snapshot("${get(0).get(1)}/${listOfFiles[1].name}").match("genome.fasta.ndb") - assert snapshot("${get(0).get(1)}/${listOfFiles[2].name}").match("genome.fasta.nhr") - assert snapshot("${get(0).get(1)}/${listOfFiles[5].name}").match("genome.fasta.not") - assert snapshot("${get(0).get(1)}/${listOfFiles[6].name}").match("genome.fasta.nsq") - assert snapshot("${get(0).get(1)}/${listOfFiles[7].name}").match("genome.fasta.ntf") - assert snapshot("${get(0).get(1)}/${listOfFiles[8].name}").match("genome.fasta.nto") - } - } - }, - { assert process.out.versions } + { + assert process.out.db.size() == 1 + + def all_files = ( new File(process.out.db[0][1]) ).listFiles() + def stable_file_names = [ + 'genome.fasta', + 'genome.fasta.ndb', + 'genome.fasta.nhr', + 'genome.fasta.not', + 'genome.fasta.nsq', + 
'genome.fasta.ntf', + 'genome.fasta.nto' + ] + + def stable_files = all_files.findAll { it.name in stable_file_names }.toSorted() + + assert snapshot( + all_files.collect { it.name }.toSorted(), + stable_files, + process.out.versions[0] + ).match() + } ) } @@ -52,12 +52,9 @@ nextflow_process { test("Should build a blast db folder from a zipped fasta file") { when { - params { - outdir = "$outputDir" - } process { """ - input[0] = [ [id:'test'], file(params.test_data['sarscov2']['genome']['genome_fasta_gz'], checkIfExists: true) ] + input[0] = [ [id:'test'], file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta.gz', checkIfExists: true) ] """ } } @@ -65,25 +62,28 @@ nextflow_process { then { assertAll( { assert process.success }, - { assert process.out.db - with(process.out.db) { - assert size() == 1 - with(get(0).get(1)) { - File folder = new File(get(0).get(1)) - File[] listOfFiles = folder.listFiles() - listOfFiles = listOfFiles.sort { it.name } - assert listOfFiles.length == 10 - assert snapshot("${get(0).get(1)}/${listOfFiles[0].name}").match("gz_genome.fasta") - assert snapshot("${get(0).get(1)}/${listOfFiles[2].name}").match("gz_genome.fasta.ndb") - assert snapshot("${get(0).get(1)}/${listOfFiles[3].name}").match("gz_genome.fasta.nhr") - assert snapshot("${get(0).get(1)}/${listOfFiles[6].name}").match("gz_genome.fasta.not") - assert snapshot("${get(0).get(1)}/${listOfFiles[7].name}").match("gz_genome.fasta.nsq") - assert snapshot("${get(0).get(1)}/${listOfFiles[8].name}").match("gz_genome.fasta.ntf") - assert snapshot("${get(0).get(1)}/${listOfFiles[9].name}").match("gz_genome.fasta.nto") - } - } - }, - { assert process.out.versions } + { + assert process.out.db.size() == 1 + + def all_files = ( new File(process.out.db[0][1]) ).listFiles() + def stable_file_names = [ + 'genome.fasta', + 'genome.fasta.ndb', + 'genome.fasta.nhr', + 'genome.fasta.not', + 'genome.fasta.nsq', + 'genome.fasta.ntf', + 'genome.fasta.nto' + ] + + def stable_files = all_files.findAll { it.name in stable_file_names }.toSorted() + + assert snapshot( + all_files.collect { it.name }.toSorted(), + stable_files, + process.out.versions[0] + ).match() + } ) } diff --git a/modules/nf-core/blast/makeblastdb/tests/main.nf.test.snap b/modules/nf-core/blast/makeblastdb/tests/main.nf.test.snap index b6f040e..d6db277 100644 --- a/modules/nf-core/blast/makeblastdb/tests/main.nf.test.snap +++ b/modules/nf-core/blast/makeblastdb/tests/main.nf.test.snap @@ -1,86 +1,63 @@ { - "genome.fasta": { - "content": [ - "genome.fasta:md5,6e9fe4042a72f2345f644f239272b7e6" - ], - "timestamp": "2023-11-07T12:52:38.457245596" - }, - "gz_genome.fasta.ntf": { - "content": [ - "genome.fasta.ntf:md5,de1250813f0c7affc6d12dac9d0fb6bb" - ], - "timestamp": "2023-11-07T12:58:02.121840034" - }, - "genome.fasta.not": { - "content": [ - "genome.fasta.not:md5,1e53e9d08f1d23af0299cfa87478a7bb" - ], - "timestamp": "2023-11-07T12:55:33.862012946" - }, - "genome.fasta.nhr": { - "content": [ - "genome.fasta.nhr:md5,f4b4ddb034fd3dd7b25c89e9d50c004e" - ], - "timestamp": "2023-11-07T12:55:33.857994517" - }, - "gz_genome.fasta.nhr": { - "content": [ - "genome.fasta.nhr:md5,f4b4ddb034fd3dd7b25c89e9d50c004e" - ], - "timestamp": "2023-11-07T12:58:02.102407993" - }, - "genome.fasta.ntf": { - "content": [ - "genome.fasta.ntf:md5,de1250813f0c7affc6d12dac9d0fb6bb" - ], - "timestamp": "2023-11-07T12:55:33.877288786" - }, - "gz_genome.fasta.not": { - "content": [ - "genome.fasta.not:md5,1e53e9d08f1d23af0299cfa87478a7bb" - ], - "timestamp": 
"2023-11-07T12:58:02.108135313" - }, - "gz_genome.fasta.ndb": { - "content": [ - "genome.fasta.ndb:md5,0d553c830656469211de113c5022f06d" - ], - "timestamp": "2023-11-07T12:58:02.094305556" - }, - "gz_genome.fasta.nsq": { - "content": [ - "genome.fasta.nsq:md5,982cbc7d9e38743b9b1037588862b9da" - ], - "timestamp": "2023-11-07T12:58:02.115010863" - }, - "genome.fasta.nto": { - "content": [ - "genome.fasta.nto:md5,33cdeccccebe80329f1fdbee7f5874cb" - ], - "timestamp": "2023-11-07T12:55:33.890761822" - }, - "gz_genome.fasta.nto": { - "content": [ - "genome.fasta.nto:md5,33cdeccccebe80329f1fdbee7f5874cb" - ], - "timestamp": "2023-11-07T12:58:02.12931429" - }, - "genome.fasta.ndb": { - "content": [ - "genome.fasta.ndb:md5,0d553c830656469211de113c5022f06d" - ], - "timestamp": "2023-11-07T12:55:33.853303997" - }, - "genome.fasta.nsq": { - "content": [ - "genome.fasta.nsq:md5,982cbc7d9e38743b9b1037588862b9da" - ], - "timestamp": "2023-11-07T12:55:33.866667927" - }, - "gz_genome.fasta": { - "content": [ - "genome.fasta:md5,6e9fe4042a72f2345f644f239272b7e6" - ], - "timestamp": "2023-11-07T12:58:02.081764854" + "Should build a blast db folder from a fasta file": { + "content": [ + [ + "genome.fasta", + "genome.fasta.ndb", + "genome.fasta.nhr", + "genome.fasta.nin", + "genome.fasta.njs", + "genome.fasta.not", + "genome.fasta.nsq", + "genome.fasta.ntf", + "genome.fasta.nto" + ], + [ + "genome.fasta:md5,6e9fe4042a72f2345f644f239272b7e6", + "genome.fasta.ndb:md5,0d553c830656469211de113c5022f06d", + "genome.fasta.nhr:md5,f4b4ddb034fd3dd7b25c89e9d50c004e", + "genome.fasta.not:md5,1e53e9d08f1d23af0299cfa87478a7bb", + "genome.fasta.nsq:md5,982cbc7d9e38743b9b1037588862b9da", + "genome.fasta.ntf:md5,de1250813f0c7affc6d12dac9d0fb6bb", + "genome.fasta.nto:md5,33cdeccccebe80329f1fdbee7f5874cb" + ], + "versions.yml:md5,cb63396fd8d8f4df57913b63452d6ba8" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-08-09T15:40:32.52079" + }, + "Should build a blast db folder from a zipped fasta file": { + "content": [ + [ + "genome.fasta", + "genome.fasta.gz", + "genome.fasta.ndb", + "genome.fasta.nhr", + "genome.fasta.nin", + "genome.fasta.njs", + "genome.fasta.not", + "genome.fasta.nsq", + "genome.fasta.ntf", + "genome.fasta.nto" + ], + [ + "genome.fasta:md5,6e9fe4042a72f2345f644f239272b7e6", + "genome.fasta.ndb:md5,0d553c830656469211de113c5022f06d", + "genome.fasta.nhr:md5,f4b4ddb034fd3dd7b25c89e9d50c004e", + "genome.fasta.not:md5,1e53e9d08f1d23af0299cfa87478a7bb", + "genome.fasta.nsq:md5,982cbc7d9e38743b9b1037588862b9da", + "genome.fasta.ntf:md5,de1250813f0c7affc6d12dac9d0fb6bb", + "genome.fasta.nto:md5,33cdeccccebe80329f1fdbee7f5874cb" + ], + "versions.yml:md5,cb63396fd8d8f4df57913b63452d6ba8" + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-08-09T15:40:37.45154" } } \ No newline at end of file diff --git a/modules/nf-core/fastp/environment.yml b/modules/nf-core/fastp/environment.yml index 70389e6..26d4aca 100644 --- a/modules/nf-core/fastp/environment.yml +++ b/modules/nf-core/fastp/environment.yml @@ -1,7 +1,5 @@ -name: fastp channels: - conda-forge - bioconda - - defaults dependencies: - bioconda::fastp=0.23.4 diff --git a/modules/nf-core/fastp/main.nf b/modules/nf-core/fastp/main.nf index 4fc19b7..e1b9f56 100644 --- a/modules/nf-core/fastp/main.nf +++ b/modules/nf-core/fastp/main.nf @@ -10,6 +10,7 @@ process FASTP { input: tuple val(meta), path(reads) path adapter_fasta + val discard_trimmed_pass val save_trimmed_fail val save_merged @@ -18,9 
+19,9 @@ process FASTP { tuple val(meta), path('*.json') , emit: json tuple val(meta), path('*.html') , emit: html tuple val(meta), path('*.log') , emit: log - path "versions.yml" , emit: versions tuple val(meta), path('*.fail.fastq.gz') , optional:true, emit: reads_fail tuple val(meta), path('*.merged.fastq.gz'), optional:true, emit: reads_merged + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when @@ -30,6 +31,8 @@ process FASTP { def prefix = task.ext.prefix ?: "${meta.id}" def adapter_list = adapter_fasta ? "--adapter_fasta ${adapter_fasta}" : "" def fail_fastq = save_trimmed_fail && meta.single_end ? "--failed_out ${prefix}.fail.fastq.gz" : save_trimmed_fail && !meta.single_end ? "--failed_out ${prefix}.paired.fail.fastq.gz --unpaired1 ${prefix}_1.fail.fastq.gz --unpaired2 ${prefix}_2.fail.fastq.gz" : '' + def out_fq1 = discard_trimmed_pass ?: ( meta.single_end ? "--out1 ${prefix}.fastp.fastq.gz" : "--out1 ${prefix}_1.fastp.fastq.gz" ) + def out_fq2 = discard_trimmed_pass ?: "--out2 ${prefix}_2.fastp.fastq.gz" // Added soft-links to original fastqs for consistent naming in MultiQC // Use single ended for interleaved. Add --interleaved_in in config. if ( task.ext.args?.contains('--interleaved_in') ) { @@ -59,7 +62,7 @@ process FASTP { fastp \\ --in1 ${prefix}.fastq.gz \\ - --out1 ${prefix}.fastp.fastq.gz \\ + $out_fq1 \\ --thread $task.cpus \\ --json ${prefix}.fastp.json \\ --html ${prefix}.fastp.html \\ @@ -81,8 +84,8 @@ process FASTP { fastp \\ --in1 ${prefix}_1.fastq.gz \\ --in2 ${prefix}_2.fastq.gz \\ - --out1 ${prefix}_1.fastp.fastq.gz \\ - --out2 ${prefix}_2.fastp.fastq.gz \\ + $out_fq1 \\ + $out_fq2 \\ --json ${prefix}.fastp.json \\ --html ${prefix}.fastp.html \\ $adapter_list \\ @@ -103,14 +106,16 @@ process FASTP { stub: def prefix = task.ext.prefix ?: "${meta.id}" def is_single_output = task.ext.args?.contains('--interleaved_in') || meta.single_end - def touch_reads = is_single_output ? "${prefix}.fastp.fastq.gz" : "${prefix}_1.fastp.fastq.gz ${prefix}_2.fastp.fastq.gz" - def touch_merged = (!is_single_output && save_merged) ? "touch ${prefix}.merged.fastq.gz" : "" + def touch_reads = (discard_trimmed_pass) ? "" : (is_single_output) ? "echo '' | gzip > ${prefix}.fastp.fastq.gz" : "echo '' | gzip > ${prefix}_1.fastp.fastq.gz ; echo '' | gzip > ${prefix}_2.fastp.fastq.gz" + def touch_merged = (!is_single_output && save_merged) ? "echo '' | gzip > ${prefix}.merged.fastq.gz" : "" + def touch_fail_fastq = (!save_trimmed_fail) ? "" : meta.single_end ? "echo '' | gzip > ${prefix}.fail.fastq.gz" : "echo '' | gzip > ${prefix}.paired.fail.fastq.gz ; echo '' | gzip > ${prefix}_1.fail.fastq.gz ; echo '' | gzip > ${prefix}_2.fail.fastq.gz" """ - touch $touch_reads + $touch_reads + $touch_fail_fastq + $touch_merged touch "${prefix}.fastp.json" touch "${prefix}.fastp.html" touch "${prefix}.fastp.log" - $touch_merged cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/nf-core/fastp/meta.yml b/modules/nf-core/fastp/meta.yml index c22a16a..159404d 100644 --- a/modules/nf-core/fastp/meta.yml +++ b/modules/nf-core/fastp/meta.yml @@ -11,62 +11,100 @@ tools: documentation: https://github.com/OpenGene/fastp doi: 10.1093/bioinformatics/bty560 licence: ["MIT"] + identifier: biotools:fastp input: - - meta: - type: map - description: | - Groovy Map containing sample information. Use 'single_end: true' to specify single ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads. - e.g. 
[ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. If you wish to run interleaved paired-end data, supply as single-end data - but with `--interleaved_in` in your `modules.conf`'s `ext.args` for the module. - - adapter_fasta: - type: file - description: File in FASTA format containing possible adapters to remove. - pattern: "*.{fasta,fna,fas,fa}" - - save_trimmed_fail: - type: boolean - description: Specify true to save files that failed to pass trimming thresholds ending in `*.fail.fastq.gz` - - save_merged: - type: boolean - description: Specify true to save all merged reads to the a file ending in `*.merged.fastq.gz` + - - meta: + type: map + description: | + Groovy Map containing sample information. Use 'single_end: true' to specify single-ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads. + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. If you wish to run interleaved paired-end data, supply as single-end data + but with `--interleaved_in` in your `modules.conf`'s `ext.args` for the module. + - - adapter_fasta: + type: file + description: File in FASTA format containing possible adapters to remove. + pattern: "*.{fasta,fna,fas,fa}" + - - discard_trimmed_pass: + type: boolean + description: | + Specify true to not write any reads that pass trimming thresholds. + This can be used to use fastp for the output report only. + - - save_trimmed_fail: + type: boolean + description: Specify true to save files that failed to pass trimming thresholds + ending in `*.fail.fastq.gz` + - - save_merged: + type: boolean + description: Specify true to save all merged reads to a file ending in `*.merged.fastq.gz` output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - reads: - type: file - description: The trimmed/modified/unmerged fastq reads - pattern: "*fastp.fastq.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fastp.fastq.gz": + type: file + description: The trimmed/modified/unmerged fastq reads + pattern: "*fastp.fastq.gz" - json: - type: file - description: Results in JSON format - pattern: "*.json" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.json": + type: file + description: Results in JSON format + pattern: "*.json" - html: - type: file - description: Results in HTML format - pattern: "*.html" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.html": + type: file + description: Results in HTML format + pattern: "*.html" - log: - type: file - description: fastq log file - pattern: "*.log" - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.log": + type: file + description: fastp log file + pattern: "*.log" - reads_fail: - type: file - description: Reads the failed the preprocessing - pattern: "*fail.fastq.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g.
[ id:'test', single_end:false ] + - "*.fail.fastq.gz": + type: file + description: Reads that failed the preprocessing + pattern: "*fail.fastq.gz" - reads_merged: - type: file - description: Reads that were successfully merged - pattern: "*.{merged.fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.merged.fastq.gz": + type: file + description: Reads that were successfully merged + pattern: "*.{merged.fastq.gz}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@kevinmenden" diff --git a/modules/nf-core/fastp/tests/main.nf.test b/modules/nf-core/fastp/tests/main.nf.test index 6f1f489..30dbb8a 100644 --- a/modules/nf-core/fastp/tests/main.nf.test +++ b/modules/nf-core/fastp/tests/main.nf.test @@ -10,221 +10,290 @@ nextflow_process { test("test_fastp_single_end") { when { - params { - outdir = "$outputDir" - } + process { """ - adapter_fasta = [] - save_trimmed_fail = false - save_merged = false - input[0] = Channel.of([ [ id:'test', single_end:true ], [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = false + input[4] = false """ } } then { - def html_text = [ "Q20 bases:12.922000 K (92.984097%)", - "single end (151 cycles)" ] - def log_text = [ "Q20 bases: 12922(92.9841%)", - "reads passed filter: 99" ] - def read_lines = ["@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1", - "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT", - "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE - { assert path(process.out.reads.get(0).get(1)).linesGzip.contains(read_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { assert snapshot(process.out.json).match("test_fastp_single_end_json") }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { file(it[1]).getName() } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_single_end-_match") - }, - { assert snapshot(process.out.versions).match("versions_single_end") } + { assert path(process.out.html.get(0).get(1)).getText().contains("single end (151 cycles)") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 99") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads_fail, + process.out.reads_merged, + process.out.versions).match() } ) } } - test("test_fastp_single_end-stub") { - - options '-stub' + test("test_fastp_paired_end") { when { - params { - outdir = "$outputDir" - } + process { """ adapter_fasta = [] + save_trimmed_pass = true save_trimmed_fail = false save_merged = false input[0] = Channel.of([ - [ 
id:'test', single_end:true ], - [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = false + input[4] = false """ } } then { + assertAll( + { assert process.success }, + { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("Q30 bases: 12281(88.3716%)") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads_fail, + process.out.reads_merged, + process.out.versions).match() } + ) + } + } + test("fastp test_fastp_interleaved") { + + config './nextflow.interleaved.config' + when { + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:true ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) ] + ]) + input[1] = [] + input[2] = false + input[3] = false + input[4] = false + """ + } + } + + then { assertAll( { assert process.success }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { file(it[1]).getName() } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_single_end-for_stub_match") - }, - { assert snapshot(process.out.versions).match("versions_single_end_stub") } + { assert path(process.out.html.get(0).get(1)).getText().contains("paired end (151 cycles + 151 cycles)") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 162") }, + { assert process.out.reads_fail == [] }, + { assert process.out.reads_merged == [] }, + { assert snapshot( + process.out.reads, + process.out.json, + process.out.versions).match() } ) } } - test("test_fastp_paired_end") { + test("test_fastp_single_end_trim_fail") { when { - params { - outdir = "$outputDir" + + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:true ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ]) + input[1] = [] + input[2] = false + input[3] = true + input[4] = false + """ } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.html.get(0).get(1)).getText().contains("single end (151 cycles)") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 99") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads_fail, + process.out.reads_merged, + process.out.versions).match() } + ) + } + } + + test("test_fastp_paired_end_trim_fail") { + + config './nextflow.save_failed.config' + when { process { """ - adapter_fasta = [] - save_trimmed_fail = false - save_merged = false + input[0] = Channel.of([ + [ id:'test', single_end:false 
], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)] + ]) + input[1] = [] + input[2] = false + input[3] = true + input[4] = false + """ + } + } + then { + assertAll( + { assert process.success }, + { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 162") }, + { assert snapshot( + process.out.reads, + process.out.reads_fail, + process.out.reads_merged, + process.out.json, + process.out.versions).match() } + ) + } + } + + test("test_fastp_paired_end_merged") { + + when { + process { + """ input[0] = Channel.of([ [ id:'test', single_end:false ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = false + input[4] = true """ } } then { - def html_text = [ "Q20 bases:25.719000 K (93.033098%)", - "The input has little adapter percentage (~0.000000%), probably it's trimmed before."] - def log_text = [ "No adapter detected for read1", - "Q30 bases: 12281(88.3716%)"] - def json_text = ['"passed_filter_reads": 198'] - def read1_lines = ["@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1", - "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT", - "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE - { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) } - } - }, - { read2_lines.each { read2_line -> - { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { json_text.each { json_part -> - { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) } - } - }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { it[1].collect { item -> file(item).getName() } } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_paired_end_match") - }, - { assert snapshot(process.out.versions).match("versions_paired_end") } + { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("total reads: 75") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads_fail, + process.out.reads_merged, + 
process.out.versions).match() }, ) } } - test("test_fastp_paired_end-stub") { - - options '-stub' + test("test_fastp_paired_end_merged_adapterlist") { when { - params { - outdir = "$outputDir" + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ]) + input[1] = Channel.of([ file(params.modules_testdata_base_path + 'delete_me/fastp/adapters.fasta', checkIfExists: true) ]) + input[2] = false + input[3] = false + input[4] = true + """ } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.html.get(0).get(1)).getText().contains("
") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("total bases: 13683") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads_fail, + process.out.reads_merged, + process.out.versions).match() } + ) + } + } + + test("test_fastp_single_end_qc_only") { + + when { process { """ - adapter_fasta = [] - save_trimmed_fail = false - save_merged = false + input[0] = Channel.of([ + [ id:'test', single_end:true ], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ]) + input[1] = [] + input[2] = true + input[3] = false + input[4] = false + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.html.get(0).get(1)).getText().contains("single end (151 cycles)") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("reads passed filter: 99") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads, + process.out.reads_fail, + process.out.reads_fail, + process.out.reads_merged, + process.out.reads_merged, + process.out.versions).match() } + ) + } + } + test("test_fastp_paired_end_qc_only") { + + when { + process { + """ input[0] = Channel.of([ [ id:'test', single_end:false ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = true + input[3] = false + input[4] = false """ } } @@ -232,114 +301,99 @@ nextflow_process { then { assertAll( { assert process.success }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { it[1].collect { item -> file(item).getName() } } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_paired_end-for_stub_match") - }, - { assert snapshot(process.out.versions).match("versions_paired_end-stub") } + { assert path(process.out.html.get(0).get(1)).getText().contains("The input has little adapter percentage (~0.000000%), probably it's trimmed before.") }, + { assert path(process.out.log.get(0).get(1)).getText().contains("Q30 bases: 12281(88.3716%)") }, + { assert snapshot( + process.out.json, + process.out.reads, + process.out.reads, + process.out.reads_fail, + process.out.reads_fail, + process.out.reads_merged, + process.out.reads_merged, + process.out.versions).match() } ) } } - test("fastp test_fastp_interleaved") { + test("test_fastp_single_end - stub") { + + options "-stub" - config './nextflow.interleaved.config' when { - params { - outdir = "$outputDir" + + process { + """ + input[0] = Channel.of([ + [ id:'test', single_end:true ], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ]) + input[1] = [] + input[2] = false + input[3] = false + input[4] = false + """ } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_fastp_paired_end - stub") { + + options "-stub" + + when { + process { """ adapter_fasta = [] + 
save_trimmed_pass = true save_trimmed_fail = false save_merged = false input[0] = Channel.of([ - [ id:'test', single_end:true ], // meta map - [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) ] + [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = false + input[4] = false """ } } then { - def html_text = [ "Q20 bases:25.719000 K (93.033098%)", - "paired end (151 cycles + 151 cycles)"] - def log_text = [ "Q20 bases: 12922(92.9841%)", - "reads passed filter: 162"] - def read_lines = [ "@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1", - "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT", - "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE - { assert path(process.out.reads.get(0).get(1)).linesGzip.contains(read_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { assert snapshot(process.out.json).match("fastp test_fastp_interleaved_json") }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { file(it[1]).getName() } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_interleaved-_match") - }, - { assert snapshot(process.out.versions).match("versions_interleaved") } + { assert snapshot(process.out).match() } ) } } - test("fastp test_fastp_interleaved-stub") { + test("fastp - stub test_fastp_interleaved") { - options '-stub' + options "-stub" config './nextflow.interleaved.config' when { - params { - outdir = "$outputDir" - } process { """ - adapter_fasta = [] - save_trimmed_fail = false - save_merged = false - input[0] = Channel.of([ [ id:'test', single_end:true ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = false + input[4] = false """ } } @@ -347,277 +401,112 @@ nextflow_process { then { assertAll( { assert process.success }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { file(it[1]).getName() } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_interleaved-for_stub_match") - }, - { assert 
snapshot(process.out.versions).match("versions_interleaved-stub") } + { assert snapshot(process.out).match() } ) } } - test("test_fastp_single_end_trim_fail") { + test("test_fastp_single_end_trim_fail - stub") { + + options "-stub" when { - params { - outdir = "$outputDir" - } + process { """ - adapter_fasta = [] - save_trimmed_fail = true - save_merged = false - input[0] = Channel.of([ [ id:'test', single_end:true ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = true + input[4] = false """ } } then { - def html_text = [ "Q20 bases:12.922000 K (92.984097%)", - "single end (151 cycles)"] - def log_text = [ "Q20 bases: 12922(92.9841%)", - "reads passed filter: 99" ] - def read_lines = [ "@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1", - "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT", - "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE - { assert path(process.out.reads.get(0).get(1)).linesGzip.contains(read_line) } - } - }, - { failed_read_lines.each { failed_read_line -> - { assert path(process.out.reads_fail.get(0).get(1)).linesGzip.contains(failed_read_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { assert snapshot(process.out.json).match("test_fastp_single_end_trim_fail_json") }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { assert snapshot(process.out.versions).match("versions_single_end_trim_fail") } + { assert snapshot(process.out).match() } ) } } - test("test_fastp_paired_end_trim_fail") { + test("test_fastp_paired_end_trim_fail - stub") { + + options "-stub" config './nextflow.save_failed.config' when { - params { - outdir = "$outputDir" - } process { """ - adapter_fasta = [] - save_trimmed_fail = true - save_merged = false - input[0] = Channel.of([ [ id:'test', single_end:false ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = true + input[4] = false """ } } then { - def html_text = [ "Q20 bases:25.719000 K (93.033098%)", - "The input has little adapter percentage (~0.000000%), probably it's trimmed before."] - def log_text = [ "No adapter detected for read1", - "Q30 bases: 12281(88.3716%)"] - def json_text = ['"passed_filter_reads": 162'] - def read1_lines = ["@ERR5069949.2151832 NS500628:121:HK3MMAFX2:2:21208:10793:15304/1", - "TCATAAACCAAAGCACTCACAGTGTCAACAATTTCAGCAGGACAACGCCGACAAGTTCCGAGGAACATGTCTGGACCTATAGTTTTCATAAGTCTACACACTGAATTGAAATATTCTGGTTCTAGTGTGCCCTTAGTTAGCAATGTGCGT", - "AAAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEE - { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) } - } - }, - { read2_lines.each { read2_line -> - { assert 
path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) } - } - }, - { failed_read2_lines.each { failed_read2_line -> - { assert path(process.out.reads_fail.get(0).get(1).get(2)).linesGzip.contains(failed_read2_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { json_text.each { json_part -> - { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) } - } - }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { assert snapshot(process.out.versions).match("versions_paired_end_trim_fail") } + { assert snapshot(process.out).match() } ) } } - test("test_fastp_paired_end_merged") { + test("test_fastp_paired_end_merged - stub") { + + options "-stub" when { - params { - outdir = "$outputDir" - } process { """ - adapter_fasta = [] - save_trimmed_fail = false - save_merged = true input[0] = Channel.of([ [ id:'test', single_end:false ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = false + input[3] = false + input[4] = true """ } } then { - def html_text = [ "
"] - def log_text = [ "Merged and filtered:", - "total reads: 75", - "total bases: 13683"] - def json_text = ['"merged_and_filtered": {', '"total_reads": 75', '"total_bases": 13683'] - def read1_lines = [ "@ERR5069949.1066259 NS500628:121:HK3MMAFX2:1:11312:18369:8333/1", - "CCTTATGACAGCAAGAACTGTGTATGATGATGGTGCTAGGAGAGTGTGGACACTTATGAATGTCTTGACACTCGTTTATAAAGTTTATTATGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTC", - "AAAAAEAEEAEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEAEEEEEEEEEEEEEEEEE/EAEEEEEE/6EEEEEEEEEEAEEAEEE/EE/AEEAEEEEEAEEEA/EEAAEAE - { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) } - } - }, - { read2_lines.each { read2_line -> - { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) } - } - }, - { read_merged_lines.each { read_merged_line -> - { assert path(process.out.reads_merged.get(0).get(1)).linesGzip.contains(read_merged_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { json_text.each { json_part -> - { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) } - } - }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { it[1].collect { item -> file(item).getName() } } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_paired_end_merged_match") - }, - { assert snapshot(process.out.versions).match("versions_paired_end_merged") } + { assert snapshot(process.out).match() } ) } } - test("test_fastp_paired_end_merged-stub") { + test("test_fastp_paired_end_merged_adapterlist - stub") { - options '-stub' + options "-stub" when { - params { - outdir = "$outputDir" - } process { """ - adapter_fasta = [] - save_trimmed_fail = false - save_merged = true - input[0] = Channel.of([ [ id:'test', single_end:false ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = Channel.of([ file(params.modules_testdata_base_path + 'delete_me/fastp/adapters.fasta', checkIfExists: true) ]) + input[2] = false + input[3] = false + input[4] = true """ } } @@ -625,101 +514,63 @@ nextflow_process { then { assertAll( { assert process.success }, - { - assert snapshot( - ( - [process.out.reads[0][0].toString()] + // meta - process.out.reads.collect { it[1].collect { item -> file(item).getName() } } + - process.out.json.collect { file(it[1]).getName() } + - process.out.html.collect { file(it[1]).getName() } + - process.out.log.collect { file(it[1]).getName() } + - process.out.reads_fail.collect { file(it[1]).getName() } + - process.out.reads_merged.collect { file(it[1]).getName() } - ).sort() - ).match("test_fastp_paired_end_merged-for_stub_match") - }, - { assert snapshot(process.out.versions).match("versions_paired_end_merged_stub") } + { assert snapshot(process.out).match() } ) } } - 
test("test_fastp_paired_end_merged_adapterlist") { + test("test_fastp_single_end_qc_only - stub") { + + options "-stub" when { - params { - outdir = "$outputDir" - } process { """ - adapter_fasta = Channel.of([ file(params.modules_testdata_base_path + 'delete_me/fastp/adapters.fasta', checkIfExists: true) ]) - save_trimmed_fail = false - save_merged = true + input[0] = Channel.of([ + [ id:'test', single_end:true ], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ] + ]) + input[1] = [] + input[2] = true + input[3] = false + input[4] = false + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test_fastp_paired_end_qc_only - stub") { + + options "-stub" + + when { + process { + """ input[0] = Channel.of([ [ id:'test', single_end:false ], // meta map [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] ]) - input[1] = adapter_fasta - input[2] = save_trimmed_fail - input[3] = save_merged + input[1] = [] + input[2] = true + input[3] = false + input[4] = false """ } } then { - def html_text = [ "
"] - def log_text = [ "Merged and filtered:", - "total reads: 75", - "total bases: 13683"] - def json_text = ['"merged_and_filtered": {', '"total_reads": 75', '"total_bases": 13683',"--adapter_fasta"] - def read1_lines = ["@ERR5069949.1066259 NS500628:121:HK3MMAFX2:1:11312:18369:8333/1", - "CCTTATGACAGCAAGAACTGTGTATGATGATGGTGCTAGGAGAGTGTGGACACTTATGAATGTCTTGACACTCGTTTATAAAGTTTATTATGGTAATGCTTTAGATCAAGCCATTTCCATGTGGGCTCTTATAATCTCTGTTACTTC", - "AAAAAEAEEAEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEAEEEEEEEEEEEEEEEEE/EAEEEEEE/6EEEEEEEEEEAEEAEEE/EE/AEEAEEEEEAEEEA/EEAAEAE - { assert path(process.out.reads.get(0).get(1).get(0)).linesGzip.contains(read1_line) } - } - }, - { read2_lines.each { read2_line -> - { assert path(process.out.reads.get(0).get(1).get(1)).linesGzip.contains(read2_line) } - } - }, - { read_merged_lines.each { read_merged_line -> - { assert path(process.out.reads_merged.get(0).get(1)).linesGzip.contains(read_merged_line) } - } - }, - { html_text.each { html_part -> - { assert path(process.out.html.get(0).get(1)).getText().contains(html_part) } - } - }, - { json_text.each { json_part -> - { assert path(process.out.json.get(0).get(1)).getText().contains(json_part) } - } - }, - { log_text.each { log_part -> - { assert path(process.out.log.get(0).get(1)).getText().contains(log_part) } - } - }, - { assert snapshot(process.out.versions).match("versions_paired_end_merged_adapterlist") } + { assert snapshot(process.out).match() } ) } } -} +} \ No newline at end of file diff --git a/modules/nf-core/fastp/tests/main.nf.test.snap b/modules/nf-core/fastp/tests/main.nf.test.snap index 3e87628..54be7e4 100644 --- a/modules/nf-core/fastp/tests/main.nf.test.snap +++ b/modules/nf-core/fastp/tests/main.nf.test.snap @@ -1,55 +1,178 @@ { - "fastp test_fastp_interleaved_json": { + "test_fastp_single_end_qc_only - stub": { "content": [ - [ - [ - { - "id": "test", - "single_end": true - }, - "test.fastp.json:md5,b24e0624df5cc0b11cd5ba21b726fb22" + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + + ], + "reads_fail": [ + + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] - ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-03-18T16:19:15.063001" + "timestamp": "2024-07-05T14:31:10.841098" }, - "test_fastp_paired_end_merged-for_stub_match": { + "test_fastp_paired_end": { "content": [ [ [ - "test_1.fastp.fastq.gz", - "test_2.fastp.fastq.gz" - ], - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "test.merged.fastq.gz", - "{id=test, single_end=false}" + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,1e0f8e27e71728e2b63fc64086be95cd" + ] + ], + [ + [ 
+ { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,67b2bbae47f073e05a97a9c2edce23c7", + "test_2.fastp.fastq.gz:md5,25cbdca08e2083dbd4f0502de6b62f39" + ] + ] + ], + [ + + ], + [ + + ], + [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-01-17T18:10:13.467574" + "timestamp": "2024-07-05T13:43:28.665779" }, - "versions_interleaved": { + "test_fastp_paired_end_merged_adapterlist": { "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,5914ca3f21ce162123a824e33e8564f6" + ] + ], + [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,54b726a55e992a869fd3fa778afe1672", + "test_2.fastp.fastq.gz:md5,29d3b33b869f7b63417b8ff07bb128ba" + ] + ] + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.merged.fastq.gz:md5,c873bb1ab3fa859dcc47306465e749d5" + ] + ], [ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:56:24.615634793" + "timestamp": "2024-07-05T13:44:18.210375" }, - "test_fastp_single_end_json": { + "test_fastp_single_end_qc_only": { "content": [ [ [ @@ -57,274 +180,1152 @@ "id": "test", "single_end": true }, - "test.fastp.json:md5,c852d7a6dba5819e4ac8d9673bedcacc" + "test.fastp.json:md5,5cc5f01e449309e0e689ed6f51a2294a" ] - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-03-18T16:18:43.526412" - }, - "versions_paired_end": { - "content": [ + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], [ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:55:42.333545689" + "timestamp": "2024-07-05T13:44:27.380974" }, - "test_fastp_paired_end_match": { + "test_fastp_paired_end_trim_fail": { "content": [ [ [ - "test_1.fastp.fastq.gz", - "test_2.fastp.fastq.gz" - ], - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "{id=test, single_end=false}" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-02-01T12:03:06.431833729" - }, - "test_fastp_interleaved-_match": { - "content": [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,6ff32a64c5188b9a9192be1398c262c7", + "test_2.fastp.fastq.gz:md5,db0cb7c9977e94ac2b4b446ebd017a8a" + ] + ] + ], [ - "test.fastp.fastq.gz", - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "{id=test, single_end=true}" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-03-18T16:19:15.111894" - }, - "test_fastp_paired_end_merged_match": { - "content": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test.paired.fail.fastq.gz:md5,409b687c734cedd7a1fec14d316e1366", + "test_1.fail.fastq.gz:md5,4f273cf3159c13f79e8ffae12f5661f6", + "test_2.fail.fastq.gz:md5,f97b9edefb5649aab661fbc9e71fc995" + ] + ] + ], + [ + + ], [ [ - "test_1.fastp.fastq.gz", - "test_2.fastp.fastq.gz" - ], - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "test.merged.fastq.gz", - "{id=test, single_end=false}" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-02-01T12:08:44.496251446" - }, - "versions_single_end_stub": { - "content": [ + { + "id": "test", + "single_end": false + }, + 
"test.fastp.json:md5,4c3268ddb50ea5b33125984776aa3519" + ] + ], [ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:55:27.354051299" + "timestamp": "2024-07-05T13:43:58.749589" }, - "versions_interleaved-stub": { + "fastp - stub test_fastp_interleaved": { "content": [ - [ - "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "reads_fail": [ + + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:56:46.535528418" + "timestamp": "2024-07-05T13:50:00.270029" }, - "versions_single_end_trim_fail": { + "test_fastp_single_end - stub": { "content": [ - [ - "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "reads_fail": [ + + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:59:03.724591407" + "timestamp": "2024-07-05T13:49:42.502789" }, - "test_fastp_paired_end-for_stub_match": { + "test_fastp_paired_end_merged_adapterlist - stub": 
{ "content": [ - [ - [ - "test_1.fastp.fastq.gz", - "test_2.fastp.fastq.gz" + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] ], - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "{id=test, single_end=false}" - ] + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + [ + { + "id": "test", + "single_end": false + }, + "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "reads_fail": [ + + ], + "reads_merged": [ + [ + { + "id": "test", + "single_end": false + }, + "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-01-17T18:07:15.398827" + "timestamp": "2024-07-05T13:54:53.458252" }, - "versions_paired_end-stub": { + "test_fastp_paired_end_merged - stub": { "content": [ - [ - "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + [ + { + "id": "test", + "single_end": false + }, + "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "reads_fail": [ + + ], + 
"reads_merged": [ + [ + { + "id": "test", + "single_end": false + }, + "test.merged.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:56:06.50017282" + "timestamp": "2024-07-05T13:50:27.689379" }, - "versions_single_end": { + "test_fastp_paired_end_merged": { "content": [ [ - "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" - ] - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-02-01T11:55:07.67921647" - }, - "versions_paired_end_merged_stub": { - "content": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,b712fd68ed0322f4bec49ff2a5237fcc" + ] + ], + [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,54b726a55e992a869fd3fa778afe1672", + "test_2.fastp.fastq.gz:md5,29d3b33b869f7b63417b8ff07bb128ba" + ] + ] + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.merged.fastq.gz:md5,c873bb1ab3fa859dcc47306465e749d5" + ] + ], [ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:59:47.350653154" + "timestamp": "2024-07-05T13:44:08.68476" }, - "test_fastp_interleaved-for_stub_match": { + "test_fastp_paired_end - stub": { "content": [ - [ - "test.fastp.fastq.gz", - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "{id=test, single_end=true}" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "reads_fail": [ + + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-01-17T18:08:06.127974" + "timestamp": "2024-07-05T13:49:51.679221" }, - "versions_paired_end_trim_fail": { + "test_fastp_single_end": { "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,c852d7a6dba5819e4ac8d9673bedcacc" + ] + ], + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,67b2bbae47f073e05a97a9c2edce23c7" + ] + ], + [ + + ], + [ + + ], [ 
"versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:59:18.140484878" + "timestamp": "2024-07-05T13:43:18.834322" }, - "test_fastp_single_end-for_stub_match": { + "test_fastp_single_end_trim_fail - stub": { "content": [ - [ - "test.fastp.fastq.gz", - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "{id=test, single_end=true}" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "reads_fail": [ + [ + { + "id": "test", + "single_end": true + }, + "test.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-01-17T18:06:00.244202" + "timestamp": "2024-07-05T14:05:36.898142" }, - "test_fastp_single_end-_match": { + "test_fastp_paired_end_trim_fail - stub": { "content": [ - [ - "test.fastp.fastq.gz", - "test.fastp.html", - "test.fastp.json", - "test.fastp.log", - "{id=test, single_end=true}" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test.paired.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_1.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + 
"id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fastp.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "reads_fail": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test.paired.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_1.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2.fail.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-03-18T16:18:43.580336" + "timestamp": "2024-07-05T14:05:49.212847" }, - "versions_paired_end_merged_adapterlist": { + "fastp test_fastp_interleaved": { "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,217d62dc13a23e92513a1bd8e1bcea39" + ] + ], + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,b24e0624df5cc0b11cd5ba21b726fb22" + ] + ], [ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T12:05:37.845370554" + "timestamp": "2024-07-05T13:43:38.910832" }, - "versions_paired_end_merged": { + "test_fastp_single_end_trim_fail": { "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.json:md5,9a7ee180f000e8d00c7fb67f06293eb5" + ] + ], + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fastp.fastq.gz:md5,67b2bbae47f073e05a97a9c2edce23c7" + ] + ], + [ + [ + { + "id": "test", + "single_end": true + }, + "test.fail.fastq.gz:md5,3e4aaadb66a5b8fc9b881bf39c227abd" + ] + ], + [ + + ], [ "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" }, - "timestamp": "2024-02-01T11:59:32.860543858" + "timestamp": "2024-07-05T13:43:48.22378" }, - "test_fastp_single_end_trim_fail_json": { + "test_fastp_paired_end_qc_only": { "content": [ [ [ { "id": "test", - "single_end": true + "single_end": false }, - "test.fastp.json:md5,9a7ee180f000e8d00c7fb67f06293eb5" + "test.fastp.json:md5,623064a45912dac6f2b64e3f2e9901df" ] + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + + ], + [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" ] ], "meta": { "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nextflow": "24.04.2" + }, + "timestamp": "2024-07-05T13:44:36.334938" + }, + "test_fastp_paired_end_qc_only - stub": { + "content": [ + { + "0": [ + + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "4": [ + + ], + "5": [ + + ], + "6": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "json": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fastp.json:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": 
false + }, + "test.fastp.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + + ], + "reads_fail": [ + + ], + "reads_merged": [ + + ], + "versions": [ + "versions.yml:md5,48ffc994212fb1fc9f83a74fa69c9f02" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" }, - "timestamp": "2024-01-17T18:08:41.942317" + "timestamp": "2024-07-05T14:31:27.096468" } } \ No newline at end of file diff --git a/modules/nf-core/fastqc/environment.yml b/modules/nf-core/fastqc/environment.yml index 1787b38..691d4c7 100644 --- a/modules/nf-core/fastqc/environment.yml +++ b/modules/nf-core/fastqc/environment.yml @@ -1,7 +1,5 @@ -name: fastqc channels: - conda-forge - bioconda - - defaults dependencies: - bioconda::fastqc=0.12.1 diff --git a/modules/nf-core/fastqc/main.nf b/modules/nf-core/fastqc/main.nf index 9e19a74..d8989f4 100644 --- a/modules/nf-core/fastqc/main.nf +++ b/modules/nf-core/fastqc/main.nf @@ -25,6 +25,14 @@ process FASTQC { def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[ reads, "${prefix}.${reads.extension}" ]] : reads.withIndex().collect { entry, index -> [ entry, "${prefix}_${index + 1}.${entry.extension}" ] } def rename_to = old_new_pairs*.join(' ').join(' ') def renamed_files = old_new_pairs.collect{ old_name, new_name -> new_name }.join(' ') + + // The total amount of allocated RAM by FastQC is equal to the number of threads defined (--threads) time the amount of RAM defined (--memory) + // https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222 + // Dividing the task.memory by task.cpu allows to stick to requested amount of RAM in the label + def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB') / task.cpus + // FastQC memory value allowed range (100 - 10000) + def fastqc_memory = memory_in_mb > 10000 ? 10000 : (memory_in_mb < 100 ? 100 : memory_in_mb) + """ printf "%s %s\\n" $rename_to | while read old_name new_name; do [ -f "\${new_name}" ] || ln -s \$old_name \$new_name @@ -33,6 +41,7 @@ process FASTQC { fastqc \\ $args \\ --threads $task.cpus \\ + --memory $fastqc_memory \\ $renamed_files cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml index ee5507e..4827da7 100644 --- a/modules/nf-core/fastqc/meta.yml +++ b/modules/nf-core/fastqc/meta.yml @@ -16,35 +16,44 @@ tools: homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ licence: ["GPL-2.0-only"] + identifier: biotools:fastqc input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.html": + type: file + description: FastQC report + pattern: "*_{fastqc.html}" - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.zip": + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@grst" diff --git a/modules/nf-core/fastqc/tests/main.nf.test b/modules/nf-core/fastqc/tests/main.nf.test index 70edae4..e9d79a0 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test +++ b/modules/nf-core/fastqc/tests/main.nf.test @@ -23,17 +23,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. - // looks like this:
<div id="header_filename">Mon 2 Oct 2023</br>test.gz</div>
- // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_single") } + { assert process.success }, + // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. + // looks like this:
<div id="header_filename">Mon 2 Oct 2023</br>test.gz</div>
+ // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -54,16 +51,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_paired") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -83,13 +78,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_interleaved") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -109,13 +102,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_bam") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -138,22 +129,20 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, - { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, - { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, - { assert 
path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_multiple") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, + { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, + { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -173,21 +162,18 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_custom_prefix") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } test("sarscov2 single-end [fastq] - stub") { - options "-stub" - + options "-stub" when { process { """ @@ -201,12 +187,123 @@ nextflow_process { then { assertAll ( - { assert process.success }, - { assert snapshot(process.out.html.collect { file(it[1]).getName() } + - process.out.zip.collect { file(it[1]).getName() } + - process.out.versions ).match("fastqc_stub") } + { assert process.success }, + { assert snapshot(process.out).match() } ) } } + test("sarscov2 paired-end [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 interleaved [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert 
snapshot(process.out).match() } + ) + } + } + + test("sarscov2 paired-end [bam] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 multiple [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 custom_prefix - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [ id:'mysample', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } } diff --git a/modules/nf-core/fastqc/tests/main.nf.test.snap b/modules/nf-core/fastqc/tests/main.nf.test.snap index 86f7c31..d5db309 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test.snap +++ b/modules/nf-core/fastqc/tests/main.nf.test.snap @@ -1,88 +1,392 @@ { - "fastqc_versions_interleaved": { + "sarscov2 custom_prefix": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:07.293713" + "timestamp": "2024-07-22T11:02:16.374038" }, - "fastqc_stub": { + "sarscov2 single-end [fastq] - stub": { "content": [ - [ - "test.html", - "test.zip", - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:24.993809" + }, + "sarscov2 custom_prefix - stub": { + "content": [ + { + "0": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { 
+ "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:31:01.425198" + "timestamp": "2024-07-22T11:03:10.93942" }, - "fastqc_versions_multiple": { + "sarscov2 interleaved [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:55.797907" + "timestamp": "2024-07-22T11:01:42.355718" }, - "fastqc_versions_bam": { + "sarscov2 paired-end [bam]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:26.795862" + "timestamp": "2024-07-22T11:01:53.276274" }, - "fastqc_versions_single": { + "sarscov2 multiple [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:27.043675" + "timestamp": "2024-07-22T11:02:05.527626" }, - "fastqc_versions_paired": { + "sarscov2 paired-end [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:31.188871" + }, + "sarscov2 paired-end [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:34.273566" + }, + "sarscov2 multiple [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:47.584191" + "timestamp": "2024-07-22T11:03:02.304411" }, - "fastqc_versions_custom_prefix": { + "sarscov2 single-end [fastq]": { "content": [ [ 
"versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:19.095607" + }, + "sarscov2 interleaved [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:44.640184" + }, + "sarscov2 paired-end [bam] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:41:14.576531" + "timestamp": "2024-07-22T11:02:53.550742" } } \ No newline at end of file diff --git a/modules/nf-core/kraken2/kraken2/environment.yml b/modules/nf-core/kraken2/kraken2/environment.yml index 63be419..ba776d3 100644 --- a/modules/nf-core/kraken2/kraken2/environment.yml +++ b/modules/nf-core/kraken2/kraken2/environment.yml @@ -1,8 +1,7 @@ -name: kraken2_kraken2 channels: - conda-forge - bioconda - - defaults dependencies: - - bioconda::kraken2=2.1.2 - - conda-forge::pigz=2.6 + - "bioconda::kraken2=2.1.3" + - "coreutils=9.4" + - "pigz=2.8" diff --git a/modules/nf-core/kraken2/kraken2/main.nf b/modules/nf-core/kraken2/kraken2/main.nf index 92cd9c3..364a6fe 100644 --- a/modules/nf-core/kraken2/kraken2/main.nf +++ b/modules/nf-core/kraken2/kraken2/main.nf @@ -4,8 +4,8 @@ process KRAKEN2_KRAKEN2 { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' : - 'biocontainers/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' }" + 'https://depot.galaxyproject.org/singularity/mulled-v2-8706a1dd73c6cc426e12dd4dd33a5e917b3989ae:c8cbdc8ff4101e6745f8ede6eb5261ef98bdaff4-0' : + 'biocontainers/mulled-v2-8706a1dd73c6cc426e12dd4dd33a5e917b3989ae:c8cbdc8ff4101e6745f8ede6eb5261ef98bdaff4-0' }" input: tuple val(meta), path(reads) diff --git a/modules/nf-core/kraken2/kraken2/meta.yml b/modules/nf-core/kraken2/kraken2/meta.yml index 7909ffe..8693764 100644 --- a/modules/nf-core/kraken2/kraken2/meta.yml +++ b/modules/nf-core/kraken2/kraken2/meta.yml @@ -13,63 +13,84 @@ tools: documentation: https://github.com/DerrickWood/kraken2/wiki/Manual doi: 10.1186/s13059-019-1891-0 licence: ["MIT"] + identifier: biotools:kraken2 input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. - - db: - type: directory - description: Kraken2 database - - save_output_fastqs: - type: string - description: | - If true, optional commands are added to save classified and unclassified reads - as fastq files - - save_reads_assignment: - type: string - description: | - If true, an optional command is added to save a file reporting the taxonomic - classification of each input read + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. + - - db: + type: directory + description: Kraken2 database + - - save_output_fastqs: + type: string + description: | + If true, optional commands are added to save classified and unclassified reads + as fastq files + - - save_reads_assignment: + type: string + description: | + If true, an optional command is added to save a file reporting the taxonomic + classification of each input read output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - classified_reads_fastq: - type: file - description: | - Reads classified as belonging to any of the taxa - on the Kraken2 database. - pattern: "*{fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.classified{.,_}*": + type: file + description: | + Reads classified as belonging to any of the taxa + on the Kraken2 database. + pattern: "*{fastq.gz}" - unclassified_reads_fastq: - type: file - description: | - Reads not classified to any of the taxa - on the Kraken2 database. - pattern: "*{fastq.gz}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.unclassified{.,_}*": + type: file + description: | + Reads not classified to any of the taxa + on the Kraken2 database. + pattern: "*{fastq.gz}" - classified_reads_assignment: - type: file - description: | - Kraken2 output file indicating the taxonomic assignment of - each input read + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*classifiedreads.txt": + type: file + description: | + Kraken2 output file indicating the taxonomic assignment of + each input read - report: - type: file - description: | - Kraken2 report containing stats about classified - and not classifed reads. - pattern: "*.{report.txt}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*report.txt": + type: file + description: | + Kraken2 report containing stats about classified + and not classifed reads. + pattern: "*.{report.txt}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@joseespinosa" - "@drpatelh" diff --git a/modules/nf-core/kraken2/kraken2/tests/main.nf.test b/modules/nf-core/kraken2/kraken2/tests/main.nf.test index 4c51302..c0843df 100644 --- a/modules/nf-core/kraken2/kraken2/tests/main.nf.test +++ b/modules/nf-core/kraken2/kraken2/tests/main.nf.test @@ -16,7 +16,7 @@ nextflow_process { input[0] = Channel.of([ [], file( - params.test_data['sarscov2']['genome']['kraken2_tar_gz'], + params.modules_testdata_base_path + "genomics/sarscov2/genome/db/kraken2.tar.gz", checkIfExists: true ) ]) @@ -32,7 +32,7 @@ nextflow_process { input[0] = [ [ id:'test', single_end:true ], // meta map [ file( - params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], + params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true )] ] @@ -69,16 +69,16 @@ nextflow_process { [ id:'test', single_end:false ], // meta map [ file( - params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], + params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true ), file( - params.test_data['sarscov2']['illumina']['test_2_fastq_gz'], + params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true ) + ] ] - ] input[1] = UNTAR.out.untar.map{ it[1] } input[2] = true input[3] = false @@ -117,7 +117,7 @@ nextflow_process { input[0] = [ [ id:'test', single_end:true ], // meta map [ file( - params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], + params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true )] ] diff --git a/modules/nf-core/kraken2/kraken2/tests/main.nf.test.snap b/modules/nf-core/kraken2/kraken2/tests/main.nf.test.snap index c1bdd0c..b432f87 100644 --- a/modules/nf-core/kraken2/kraken2/tests/main.nf.test.snap +++ b/modules/nf-core/kraken2/kraken2/tests/main.nf.test.snap @@ -11,10 +11,14 @@ ] ], [ - "versions.yml:md5,bcb3e2520685846df02bb27cc6b1794b" + "versions.yml:md5,79adf2ca1cfc625cb77e391b27142c43" ] ], - "timestamp": "2023-10-25T09:01:29.775797" + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-04T18:47:03.745692" }, "sarscov2 illumina paired end [fastq]": { "content": [ @@ -28,10 +32,14 @@ ] ], [ - "versions.yml:md5,bcb3e2520685846df02bb27cc6b1794b" + "versions.yml:md5,79adf2ca1cfc625cb77e391b27142c43" ] ], - "timestamp": "2023-10-25T09:01:37.025389" + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-04T18:47:13.75649" }, "sarscov2 illumina single end [fastq] + save_reads_assignment": { "content": [ @@ -54,9 +62,13 @@ ] ], [ - "versions.yml:md5,bcb3e2520685846df02bb27cc6b1794b" + 
"versions.yml:md5,79adf2ca1cfc625cb77e391b27142c43" ] ], - "timestamp": "2023-10-25T09:01:45.775262" + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-04-04T18:47:22.459465" } } \ No newline at end of file diff --git a/modules/nf-core/multiqc/environment.yml b/modules/nf-core/multiqc/environment.yml index ca39fb6..6f5b867 100644 --- a/modules/nf-core/multiqc/environment.yml +++ b/modules/nf-core/multiqc/environment.yml @@ -1,7 +1,5 @@ -name: multiqc channels: - conda-forge - bioconda - - defaults dependencies: - - bioconda::multiqc=1.21 + - bioconda::multiqc=1.25.1 diff --git a/modules/nf-core/multiqc/main.nf b/modules/nf-core/multiqc/main.nf index 47ac352..cc0643e 100644 --- a/modules/nf-core/multiqc/main.nf +++ b/modules/nf-core/multiqc/main.nf @@ -3,14 +3,16 @@ process MULTIQC { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0' : - 'biocontainers/multiqc:1.21--pyhdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' : + 'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }" input: path multiqc_files, stageAs: "?/*" path(multiqc_config) path(extra_multiqc_config) path(multiqc_logo) + path(replace_names) + path(sample_names) output: path "*multiqc_report.html", emit: report @@ -23,16 +25,22 @@ process MULTIQC { script: def args = task.ext.args ?: '' + def prefix = task.ext.prefix ? "--filename ${task.ext.prefix}.html" : '' def config = multiqc_config ? "--config $multiqc_config" : '' def extra_config = extra_multiqc_config ? "--config $extra_multiqc_config" : '' - def logo = multiqc_logo ? /--cl-config 'custom_logo: "${multiqc_logo}"'/ : '' + def logo = multiqc_logo ? "--cl-config 'custom_logo: \"${multiqc_logo}\"'" : '' + def replace = replace_names ? "--replace-names ${replace_names}" : '' + def samples = sample_names ? "--sample-names ${sample_names}" : '' """ multiqc \\ --force \\ $args \\ $config \\ + $prefix \\ $extra_config \\ $logo \\ + $replace \\ + $samples \\ . cat <<-END_VERSIONS > versions.yml @@ -44,7 +52,7 @@ process MULTIQC { stub: """ mkdir multiqc_data - touch multiqc_plots + mkdir multiqc_plots touch multiqc_report.html cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/multiqc/meta.yml b/modules/nf-core/multiqc/meta.yml index 45a9bc3..b16c187 100644 --- a/modules/nf-core/multiqc/meta.yml +++ b/modules/nf-core/multiqc/meta.yml @@ -1,5 +1,6 @@ name: multiqc -description: Aggregate results from bioinformatics analyses across many samples into a single report +description: Aggregate results from bioinformatics analyses across many samples into + a single report keywords: - QC - bioinformatics tools @@ -12,40 +13,59 @@ tools: homepage: https://multiqc.info/ documentation: https://multiqc.info/docs/ licence: ["GPL-3.0-or-later"] + identifier: biotools:multiqc input: - - multiqc_files: - type: file - description: | - List of reports / files recognised by MultiQC, for example the html and zip output of FastQC - - multiqc_config: - type: file - description: Optional config yml for MultiQC - pattern: "*.{yml,yaml}" - - extra_multiqc_config: - type: file - description: Second optional config yml for MultiQC. Will override common sections in multiqc_config. 
- pattern: "*.{yml,yaml}" - - multiqc_logo: - type: file - description: Optional logo file for MultiQC - pattern: "*.{png}" + - - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC + - - multiqc_config: + type: file + description: Optional config yml for MultiQC + pattern: "*.{yml,yaml}" + - - extra_multiqc_config: + type: file + description: Second optional config yml for MultiQC. Will override common sections + in multiqc_config. + pattern: "*.{yml,yaml}" + - - multiqc_logo: + type: file + description: Optional logo file for MultiQC + pattern: "*.{png}" + - - replace_names: + type: file + description: | + Optional two-column sample renaming file. First column a set of + patterns, second column a set of corresponding replacements. Passed via + MultiQC's `--replace-names` option. + pattern: "*.{tsv}" + - - sample_names: + type: file + description: | + Optional TSV file with headers, passed to the MultiQC --sample_names + argument. + pattern: "*.{tsv}" output: - report: - type: file - description: MultiQC report file - pattern: "multiqc_report.html" + - "*multiqc_report.html": + type: file + description: MultiQC report file + pattern: "multiqc_report.html" - data: - type: directory - description: MultiQC data dir - pattern: "multiqc_data" + - "*_data": + type: directory + description: MultiQC data dir + pattern: "multiqc_data" - plots: - type: file - description: Plots created by MultiQC - pattern: "*_data" + - "*_plots": + type: file + description: Plots created by MultiQC + pattern: "*_data" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@abhi18av" - "@bunop" diff --git a/modules/nf-core/multiqc/tests/main.nf.test b/modules/nf-core/multiqc/tests/main.nf.test index f1c4242..33316a7 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test +++ b/modules/nf-core/multiqc/tests/main.nf.test @@ -8,6 +8,8 @@ nextflow_process { tag "modules_nfcore" tag "multiqc" + config "./nextflow.config" + test("sarscov2 single-end [fastqc]") { when { @@ -17,6 +19,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -41,6 +45,8 @@ nextflow_process { input[1] = Channel.of(file("https://github.com/nf-core/tools/raw/dev/nf_core/pipeline-template/assets/multiqc_config.yml", checkIfExists: true)) input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -66,6 +72,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } diff --git a/modules/nf-core/multiqc/tests/main.nf.test.snap b/modules/nf-core/multiqc/tests/main.nf.test.snap index bfebd80..2fcbb5f 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test.snap +++ b/modules/nf-core/multiqc/tests/main.nf.test.snap @@ -2,14 +2,14 @@ "multiqc_versions_single": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:48:55.657331" + "timestamp": "2024-10-02T17:51:46.317523" }, "multiqc_stub": { "content": [ @@ -17,25 +17,25 @@ "multiqc_report.html", "multiqc_data", "multiqc_plots", - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], 
"meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:49:49.071937" + "timestamp": "2024-10-02T17:52:20.680978" }, "multiqc_versions_config": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:49:25.457567" + "timestamp": "2024-10-02T17:52:09.185842" } } \ No newline at end of file diff --git a/modules/nf-core/multiqc/tests/nextflow.config b/modules/nf-core/multiqc/tests/nextflow.config new file mode 100644 index 0000000..c537a6a --- /dev/null +++ b/modules/nf-core/multiqc/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: 'MULTIQC' { + ext.prefix = null + } +} diff --git a/nextflow.config b/nextflow.config index 17eda24..6b50e60 100644 --- a/nextflow.config +++ b/nextflow.config @@ -11,38 +11,54 @@ params { // Input options input = null + // References genome = 'GRCh38' igenomes_base = 's3://ngi-igenomes/igenomes/' igenomes_ignore = false + saveReference = true - fasta = null + fasta_blastn = null + fasta_bbduk = null // Workflow parameters - enable_filter = false - filter_trimmed = false - skip_blastn = false - filter_with_kraken2 = (params.enable_filter && params.skip_blastn) ? true : false - save_intermediates = false + preprocessing = (params.filter_trimmed) ? true : false + enable_filter = false + filter_trimmed = false + output_removed_reads = false + classification_kraken2 = false + classification_bbduk = false + validation_blastn = false + filter_with_classification = (params.enable_filter && !params.validation_blastn) ? true : false + classification_kraken2_post_filtering = false + save_intermediates = false // fastp parameter reads_minlength = 0 fastp_save_trimmed_fail = false fastp_qualified_quality = 0 - fastp_cut_mean_quality = 15 + fastp_cut_mean_quality = 1 + fastp_eval_duplication = false save_clipped_reads = false + // bbduk parameter + bbduk_kmers = 27 + // Kraken2preparation parameter - kraken2db = 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz' + kraken2db = 'https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz' // Kraken2 parameter - save_output_fastqs = false - kraken2confidence = 0.05 + save_output_fastqs = false + save_output_fastqs_filtered = false + save_output_fastqs_removed = false + kraken2confidence = 0.00 + kraken2confidence_filtered = 0.00 + kraken2confidence_removed = 0.00 // Taxon to check for/to filter in kraken2 format, it has to be present in the supplied kraken2 database - tax2filter = 'Homo' + tax2filter = 'Homo sapiens' // Parameters for the classification if a read is assigned to the taxa2filter or not - cutoff_tax2filter = 2 - cutoff_tax2keep = 0.5 + cutoff_tax2filter = 0 + cutoff_tax2keep = 0.0 cutoff_unclassified = 0.0 // BLASTN parameters @@ -50,6 +66,10 @@ params { blast_identity = 40.0 blast_evalue = 0.01 + // Generate downstream samplesheets + generate_downstream_samplesheets = false + generate_pipeline_samplesheets = 'taxprofiler,mag' + // MultiQC options multiqc_config = null multiqc_title = null @@ -58,173 +78,159 @@ params { multiqc_methods_description = null // Boilerplate options - outdir = 'results' - publish_dir_mode = 'copy' - email = null - email_on_fail = null - plaintext_email = false - monochrome_logs = false - hook_url = null - help = false - version = false + outdir = null + 
publish_dir_mode = 'copy' + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + hook_url = null + help = false + help_full = false + show_hidden = false + version = false + pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' + // Config options config_profile_name = null config_profile_description = null + custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" config_profile_contact = null config_profile_url = null - // Max resource options - // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = false - validationSchemaIgnoreParams = 'genomes,igenomes_base' - validationShowHiddenParams = false - validate_params = true - + validate_params = true } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load nf-core/detaxizer custom profiles from different institutions. -// Warning: Uncomment only if a pipeline-specific institutional config already exists on nf-core/configs! -// try { -// includeConfig "${params.custom_config_base}/pipeline/detaxizer.config" -// } catch (Exception e) { -// System.err.println("WARNING: Could not load nf-core/config/detaxizer profiles: ${params.custom_config_base}/pipeline/detaxizer.config") -// } profiles { debug { - dumpHashes = true - process.beforeScript = 'echo $HOSTNAME' - cleanup = false + dumpHashes = true + process.beforeScript = 'echo $HOSTNAME' + cleanup = false nextflow.enable.configProcessNamesValidation = true } conda { - conda.enabled = true - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - channels = ['conda-forge', 'bioconda', 'defaults'] - apptainer.enabled = false + conda.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + conda.channels = ['conda-forge', 'bioconda'] + apptainer.enabled = false } mamba { - conda.enabled = true - conda.useMamba = true - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + conda.enabled = true + conda.useMamba = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } docker { - docker.enabled = true - conda.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false - docker.runOptions = '-u $(id -u):$(id -g)' + docker.enabled = true + conda.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false + docker.runOptions = '-u $(id -u):$(id -g)' } arm { - docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64' + docker.runOptions = '-u $(id -u):$(id -g) 
--platform=linux/amd64' } singularity { - singularity.enabled = true - singularity.autoMounts = true - conda.enabled = false - docker.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + singularity.enabled = true + singularity.autoMounts = true + conda.enabled = false + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } podman { - podman.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - shifter.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + podman.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } shifter { - shifter.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - charliecloud.enabled = false - apptainer.enabled = false + shifter.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + apptainer.enabled = false } charliecloud { - charliecloud.enabled = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - apptainer.enabled = false + charliecloud.enabled = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + apptainer.enabled = false } apptainer { - apptainer.enabled = true - apptainer.autoMounts = true - conda.enabled = false - docker.enabled = false - singularity.enabled = false - podman.enabled = false - shifter.enabled = false - charliecloud.enabled = false + apptainer.enabled = true + apptainer.autoMounts = true + conda.enabled = false + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + wave { + apptainer.ociAutoPull = true + singularity.ociAutoPull = true + wave.enabled = true + wave.freeze = true + wave.strategy = 'conda,container' } gitpod { - executor.name = 'local' - executor.cpus = 4 - executor.memory = 8.GB + executor.name = 'local' + executor.cpus = 4 + executor.memory = 8.GB } test { includeConfig 'conf/test.config' } - test_skip_blastn { includeConfig 'conf/test_skip_blastn.config' } + test_blastn { includeConfig 'conf/test_blastn.config' } test_filter_preprocessed { includeConfig 'conf/test_filter_preprocessed.config' } test_full { includeConfig 'conf/test_full.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled -// Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} +// Load nf-core/detaxizer custom profiles from different institutions. 
+includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/detaxizer.config" : "/dev/null" + +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled +// Set to your registry if you have a mirror of containers +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' // Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' -} else { - params.genomes = [:] -} +includeConfig !params.igenomes_ignore ? 'conf/igenomes.config' : 'conf/igenomes_ignored.config' + // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. // See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable. @@ -236,8 +242,15 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = """\ +bash + +set -e # Exit if a tool returns a non-zero status/exit code +set -u # Treat unset variables and parameters as an error +set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute +set -C # No clobber - prevent output redirection from overwriting files. +""" // Disable process selector warnings by default. Use debug profile to enable warnings. nextflow.enable.configProcessNamesValidation = false @@ -264,45 +277,48 @@ manifest { name = 'nf-core/detaxizer' author = """Jannik Seidel""" homePage = 'https://github.com/nf-core/detaxizer' - description = """A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxa to identify (and remove) are Homo and Homo sapiens. Removal is optional.""" + description = """A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxon to identify (and remove) is Homo sapiens. 
Removal is optional.""" mainScript = 'main.nf' - nextflowVersion = '!>=23.04.0' - version = '1.0.0' + nextflowVersion = '!>=24.04.2' + version = '1.1.0' doi = '10.5281/zenodo.10877147' } -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' +// Nextflow plugins +plugins { + id 'nf-schema@2.1.1' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} + +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input samplesheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" + beforeText = """ +-\033[2m----------------------------------------------------\033[0m- + \033[0;32m,--.\033[0;30m/\033[0;32m,-.\033[0m +\033[0;34m ___ __ __ __ ___ \033[0;32m/,-._.--~\'\033[0m +\033[0;34m |\\ | |__ __ / ` / \\ |__) |__ \033[0;33m} {\033[0m +\033[0;34m | \\| | \\__, \\__/ | \\ |___ \033[0;32m\\`-._,-`-,\033[0m + \033[0;32m`._,._,\'\033[0m +\033[0;35m ${manifest.name} ${manifest.version}\033[0m +-\033[2m----------------------------------------------------\033[0m- +""" + afterText = """${manifest.doi ? "* The pipeline\n" : ""}${manifest.doi.tokenize(",").collect { " https://doi.org/${it.trim().replace('https://doi.org/','')}"}.join("\n")}${manifest.doi ? "\n" : ""} +* The nf-core framework + https://doi.org/10.1038/s41587-020-0439-x -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } +* Software dependencies + https://github.com/${manifest.name}/blob/master/CITATIONS.md +""" + } + summary { + beforeText = validation.help.beforeText + afterText = validation.help.afterText } } + +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index 4e4e961..0af8e7f 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/detaxizer/master/nextflow_schema.json", "title": "nf-core/detaxizer pipeline parameters", - "description": "A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxa to identify (and remove) are Homo and Homo sapiens. Removal is optional.", + "description": "A pipeline to identify (and remove) certain sequences from raw genomic data. Default taxon to identify (and remove) is Homo sapiens. 
Removal is optional.", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -27,8 +27,7 @@ "type": "string", "format": "directory-path", "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.", - "fa_icon": "fas fa-folder-open", - "default": "results" + "fa_icon": "fas fa-folder-open" }, "email": { "type": "string", @@ -50,26 +49,44 @@ "description": "Parameters which enable/disable certain steps used in the workflow.", "default": "", "properties": { + "preprocessing": { + "type": "boolean", + "description": "If preprocessing with fastp should be turned on." + }, + "classification_bbduk": { + "type": "boolean", + "description": "Signifies that bbduk is used in the classification process. Can be combined with the 'classification_kraken2' parameter to run both." + }, + "classification_kraken2": { + "type": "boolean", + "description": "Signifies that kraken2 is used in the classification process. Can be combined with the 'classification_bbduk' parameter to run both. For kraken2 alone no parameter is needed." + }, + "validation_blastn": { + "type": "boolean", + "description": "If a validation of the classified reads via blastn should be carried out." + }, + "classification_kraken2_post_filtering": { + "type": "boolean", + "description": "If the filtered reads should be classified with kraken2." + }, + "filter_with_classification": { + "type": "boolean", + "description": "When a validation via blastn is wanted but the filtering should use the IDs from the classification process." + }, "enable_filter": { "type": "boolean", "description": "If the filtering step should be carried out.", "help_text": "If set to `True` the filter is used. Otherwise only assessing is performed." }, + "output_removed_reads": { + "type": "boolean", + "description": "If the removed reads should also be written to the output folder." + }, "filter_trimmed": { "type": "boolean", "description": "If the pre-processed reads should be used by the filter.", "help_text": "If set to `True` the the pre-proccesed reads are used for filtering. Else the raw reads are used." }, - "filter_with_kraken2": { - "type": "boolean", - "help_text": "If this is set to True the kraken2 output is used for filtering.", - "description": "If the output of kraken2 should be used for filtering." - }, - "skip_blastn": { - "type": "boolean", - "help_text": "Defines if blastn should be used or not. By default blastn is used in the workflow. If also the filter is enabled the reads are filtered using the output from blastn.", - "description": "If blastn should be skipped." - }, "save_intermediates": { "type": "boolean", "description": "Save intermediates to the results folder.", @@ -78,6 +95,23 @@ }, "fa_icon": "fas fa-angle-double-right" }, + "bbduk": { + "title": "bbduk", + "type": "object", + "description": "Parameter to customize bbduk execution", + "default": "", + "properties": { + "fasta_bbduk": { + "type": "string", + "description": "Location of the fasta which contains the contaminant sequences." 
+ }, + "bbduk_kmers": { + "type": "integer", + "default": 27, + "description": "Length of k-mers for classification carried out by bbduk" + } + } + }, "kraken2": { "title": "kraken2", "type": "object", @@ -86,7 +120,7 @@ "properties": { "kraken2db": { "type": "string", - "default": "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20231009.tar.gz", + "default": "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20240904.tar.gz", "help_text": "For input how to use this parameter to fine-tune the step see the kraken2 section in the [usage documentation](https://nf-co.re/detaxizer/docs/usage#kraken2)", "description": "The database which is used in the classification step." }, @@ -95,20 +129,38 @@ "description": "Save unclassified reads and classified reads (those assigned to any taxon, not specifically assessed or filtered) to separate files.", "hidden": true }, + "save_output_fastqs_filtered": { + "type": "boolean", + "description": "Save unclassified reads and classified reads (those assigned to any taxon, not specifically assessed or filtered) to separate files. For the filtered reads." + }, + "save_output_fastqs_removed": { + "type": "boolean", + "description": "Save unclassified reads and classified reads (those assigned to any taxon, not specifically assessed or filtered) to separate files. For the removed reads." + }, "kraken2confidence": { "type": "number", - "default": 0.05, + "default": 0.0, "description": "Confidence in the classification of a read as a certain taxon.", "help_text": "Refer to https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#confidence-scoring for details." }, + "kraken2confidence_filtered": { + "type": "number", + "default": 0.0, + "description": "Confidence in the classification of a read as a certain taxon. For the filtered reads." + }, + "kraken2confidence_removed": { + "type": "number", + "default": 0.0, + "description": "Confidence in the classification of a read as a certain taxon. For the removed reads." + }, "cutoff_tax2filter": { "type": "integer", - "default": 2, + "default": 0, "description": "If a read has less k-mers assigned to the taxon/taxa to be assessed/to be filtered the read is ignored by the pipeline." }, "cutoff_tax2keep": { "type": "number", - "default": 0.5, + "default": 0.0, "minimum": 0, "maximum": 1, "description": "Ratio per read of assigned to tax2filter k-mers to k-mers assigned to any other taxon (except unclassified).", @@ -116,7 +168,7 @@ }, "cutoff_unclassified": { "type": "number", - "default": 0, + "default": 0.0, "description": "Ratio per read of assigned to tax2filter k-mers to unclassified k-mers.", "minimum": 0, "maximum": 1, @@ -124,7 +176,7 @@ }, "tax2filter": { "type": "string", - "default": "Homo", + "default": "Homo sapiens", "description": "The taxon or taxonomic group to be assessed or filtered by the pipeline.", "help_text": "If a whole taxonomic group should be assessed/filtered use the highest taxonomic name in the hierarchy. E.g. if you want to assess for/filter out the whole taxonomic subtree from Mammalia onward provide this parameter with the string 'Mammalia'." } @@ -137,9 +189,13 @@ "description": "Parameters to fine-tune the output of blastn.", "default": "", "properties": { + "fasta_blastn": { + "type": "string", + "description": "Location of the fasta from which the blastn database will be constructed." 
+ }, "blast_coverage": { "type": "number", - "default": 40, + "default": 40.0, "description": "Coverage is the percentage of the query sequence which can be found in the alignments of the sequence match. It can be used to fine-tune the validation step." }, "blast_evalue": { @@ -149,7 +205,7 @@ }, "blast_identity": { "type": "number", - "default": 40, + "default": 40.0, "description": "Identity is the percentage of the exact matches in the query and the sequence found in the database. The parameter can be used to fine-tune the validation step." } }, @@ -177,9 +233,13 @@ }, "fastp_cut_mean_quality": { "type": "integer", - "default": 15, + "default": 1, "description": "fastp option to define the mean quality for trimming" }, + "fastp_eval_duplication": { + "type": "boolean", + "description": "fastp option if duplicates should be filtered or not before classification" + }, "save_clipped_reads": { "type": "boolean", "description": "fastp option to define if the clipped reads should be saved" @@ -200,15 +260,6 @@ "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details.", "default": "GRCh38" }, - "fasta": { - "type": "string", - "format": "file-path", - "exists": true, - "mimetype": "text/plain", - "description": "Path to FASTA genome file.", - "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.", - "fa_icon": "far fa-file-code" - }, "igenomes_ignore": { "type": "boolean", "description": "Do not load the iGenomes reference config.", @@ -222,10 +273,29 @@ }, "igenomes_base": { "type": "string", - "description": "Directory / URL base for iGenomes references.", - "default": "s3://ngi-igenomes/igenomes/", "format": "directory-path", - "fa_icon": "fas fa-cloud-download-alt" + "description": "The base path to the igenomes reference files", + "fa_icon": "fas fa-ban", + "hidden": true, + "default": "s3://ngi-igenomes/igenomes/" + } + } + }, + "generate_samplesheet_options": { + "title": "Downstream pipeline samplesheet generation options", + "type": "object", + "fa_icon": "fas fa-university", + "description": "Options for generating input samplesheets for complementary downstream pipelines.", + "properties": { + "generate_downstream_samplesheets": { + "type": "boolean", + "description": "Turn on generation of samplesheets for downstream pipelines." + }, + "generate_pipeline_samplesheets": { + "type": "string", + "default": "taxprofiler,mag", + "description": "Specify a comma separated string in quotes to specify which pipeline to generate a samplesheet for.", + "pattern": "^(taxprofiler|mag)(?:,(taxprofiler|mag)){0,1}" } } }, @@ -277,41 +347,6 @@ } } }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. 
These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - }, "generic_options": { "title": "Generic options", "type": "object", @@ -319,12 +354,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -400,57 +429,46 @@ "fa_icon": "fas fa-check-square", "hidden": true }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true, - "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true, - "help_text": "By default, when an unrecognised parameter is found, it returns a warinig." - }, - "validationLenientMode": { - "type": "boolean", + "pipelines_testdata_base_path": { + "type": "string", "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient more.", - "hidden": true, - "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)." 
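
Note that the four `validation*` parameters removed here (and from `nextflow.config` above) are not simply dropped: with the move from nf-validation to nf-schema their behaviour is configured at the plugin level rather than through pipeline params — `validationSchemaIgnoreParams` becomes `validation.defaultIgnoreParams`, and `validationShowHiddenParams` is superseded by the new `--show_hidden` flag wired up via `help.showHiddenParameter`. A rough sketch of the plugin-scope equivalents; the `failUnrecognisedParams` and `lenientMode` option names are assumptions based on the nf-schema documentation, not part of this patch:

    // nextflow.config (sketch) — nf-schema settings replacing the removed params
    plugins {
        id 'nf-schema@2.1.1' // version as pinned elsewhere in this patch
    }
    validation {
        defaultIgnoreParams    = ["genomes"] // was params.validationSchemaIgnoreParams
        failUnrecognisedParams = false       // assumed name; was params.validationFailUnrecognisedParams
        lenientMode            = false       // assumed name; was params.validationLenientMode
    }

The replacement schema entry that follows (`pipelines_testdata_base_path`) is unrelated to validation; it only reuses this slot in the generic options group.
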
+ "description": "Base URL or local path to location of pipeline test dataset files", + "default": "https://raw.githubusercontent.com/nf-core/test-datasets/", + "hidden": true } } } }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" + }, + { + "$ref": "#/$defs/general_workflow_parameters" }, { - "$ref": "#/definitions/general_workflow_parameters" + "$ref": "#/$defs/bbduk" }, { - "$ref": "#/definitions/kraken2" + "$ref": "#/$defs/kraken2" }, { - "$ref": "#/definitions/blastn" + "$ref": "#/$defs/blastn" }, { - "$ref": "#/definitions/fastp_options" + "$ref": "#/$defs/fastp_options" }, { - "$ref": "#/definitions/reference_genome_options" + "$ref": "#/$defs/reference_genome_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/generate_samplesheet_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/nf-test.config b/nf-test.config new file mode 100644 index 0000000..80c6412 --- /dev/null +++ b/nf-test.config @@ -0,0 +1,20 @@ +config { + // location for all nf-tests + testsDir "tests/" + + // nf-test directory including temporary files for each test + workDir ".nf-test" + + // location of library folder that is added automatically to the classpath + libDir "lib/" + + // location of an optional nextflow.config file specific for executing tests + configFile "nextflow.config" + + // run all test with the defined docker profile from the main nextflow.config + profile "" + + plugins { + load "nft-csv@0.1.0" + } +} diff --git a/pyproject.toml b/pyproject.toml deleted file mode 100644 index 5611062..0000000 --- a/pyproject.toml +++ /dev/null @@ -1,15 +0,0 @@ -# Config file for Python. Mostly used to configure linting of bin/*.py with Ruff. -# Should be kept the same as nf-core/tools to avoid fighting with template synchronisation. -[tool.ruff] -line-length = 120 -target-version = "py38" -cache-dir = "~/.cache/ruff" - -[tool.ruff.lint] -select = ["I", "E1", "E4", "E7", "E9", "F", "UP", "N"] - -[tool.ruff.lint.isort] -known-first-party = ["nf_core"] - -[tool.ruff.lint.per-file-ignores] -"__init__.py" = ["E402", "F401"] diff --git a/subworkflows/local/generate_downstream_samplesheets/main.nf b/subworkflows/local/generate_downstream_samplesheets/main.nf new file mode 100644 index 0000000..db3ae1a --- /dev/null +++ b/subworkflows/local/generate_downstream_samplesheets/main.nf @@ -0,0 +1,121 @@ +// +// Subworkflow with functionality specific to the nf-core/createtaxdb pipeline +// + +workflow SAMPLESHEET_TAXPROFILER { + take: + ch_reads + + main: + format = 'csv' // most common format in nf-core + + // Make your samplesheet channel construct here depending on your downstream + ch_list_for_samplesheet = ch_reads + .map { + meta, reads -> + def out_path = file(params.outdir).toString() + '/filter/filtered/' + def sample = meta.id + def run_accession = meta.id - "_longReads" + def instrument_platform = !meta.long_reads ? "ILLUMINA" : "OXFORD_NANOPORE" + def fastq_1 = meta.single_end ? out_path + reads.getName() : out_path + reads[0].getName() + def fastq_2 = !meta.single_end ? 
out_path + reads[1].getName() : "" + def fasta = "" + [ sample: sample, run_accession:run_accession, instrument_platform:instrument_platform, fastq_1:fastq_1, fastq_2:fastq_2, fasta:fasta ] + } + + channelToSamplesheet(ch_list_for_samplesheet,"${params.outdir}/downstream_samplesheets/taxprofiler", format) + +} + +workflow SAMPLESHEET_MAG { + // + // MAG doesn't take PE-data & SE-data in a single samplesheet + // + take: + ch_reads + + main: + format = 'csv' // most common format in nf-core + + // Combine the short and long reads belonging to the same sample + ch_reads + .map{meta, reads -> + tuple( groupKey(meta.id - "_longReads", 2), meta, reads) + } + .groupTuple(remainder: true) + .map{key, meta, reads -> + new_meta = [ + id: key, + run: key, + single_end: meta[0].single_end, + long_reads: meta[0]?.long_reads ?: meta[1]?.long_reads ?: false + ] + // Making sure the long reads are the final element of the array. + read_files = reads.flatten().sort(false){ a, b -> a.getName().tokenize('.')[0] <=> b.getName().tokenize('.')[0] } + [new_meta, read_files] + } + .tap{ ch_reads_grouped } + + // Make your samplesheet channel construct here depending on your downstream + ch_list_for_samplesheet = ch_reads_grouped + .map { + meta, reads -> + def out_path = file(params.outdir).toString() + '/filter/filtered/' + def sample = meta.id + def run = meta.run + def group = "" // only used for co-abundance in binning + def short_reads_1 = meta.long_reads == (reads.size() > 2) ? out_path + reads[0].getName() : "" // If long reads, but no short reads, then short_reads_1 is empty + def short_reads_2 = meta.long_reads == (reads.size() > 2) && reads[1] ? out_path + reads[1].getName() : "" + def long_reads = meta.long_reads ? out_path + reads.last().getName() : "" // If long reads, take final element + [sample: sample, run: run, group: group, short_reads_1: short_reads_1, short_reads_2: short_reads_2, long_reads: long_reads] + } + .tap{ ch_list_for_samplesheet_all } + .filter{ it.short_reads_1!="" } // MAG doesn't support standalone long reads + .branch{ + se: it.short_reads_2 =="" + pe: true + } + + // Throw a warning that only long reads are not supported yet by MAG + ch_list_for_samplesheet_all + .filter{ it.long_reads !="" && it.short_reads_1=="" } + .collect{ log.warn("Standalone long reads are not yet supported by the nf-core/mag pipeline and ARE REMOVED from the samplesheet 'mag-{se,pe}.csv' \n sample: ${it.sample}" )} + + channelToSamplesheet(ch_list_for_samplesheet.pe,"${params.outdir}/downstream_samplesheets/mag-pe", format) + channelToSamplesheet(ch_list_for_samplesheet.se, "${params.outdir}/downstream_samplesheets/mag-se", format) + +} + +workflow GENERATE_DOWNSTREAM_SAMPLESHEETS { + take: + ch_reads + + main: + def downstreampipeline_names = params.generate_pipeline_samplesheets.split(",") + + if ( downstreampipeline_names.contains('taxprofiler')) { + SAMPLESHEET_TAXPROFILER(ch_reads) + } + if ( downstreampipeline_names.contains('mag')) { + SAMPLESHEET_MAG(ch_reads) + } + +} + + +// Constructs the header string and then the strings of each row, and +def channelToSamplesheet(ch_list_for_samplesheet, path, format) { + format_sep = ["csv":",", "tsv":"\t", "txt":"\t"][format] + + ch_header = ch_list_for_samplesheet + + ch_header + .first() + .map{ it.keySet().join(format_sep) } + .concat( ch_list_for_samplesheet.map{ it.values().join(format_sep) }) + .collectFile( + name:"${path}.${format}", + newLine: true, + sort: false + ) +} diff --git 
a/subworkflows/local/generate_downstream_samplesheets/tests/main.nf.test b/subworkflows/local/generate_downstream_samplesheets/tests/main.nf.test new file mode 100644 index 0000000..67fe2b5 --- /dev/null +++ b/subworkflows/local/generate_downstream_samplesheets/tests/main.nf.test @@ -0,0 +1,65 @@ +nextflow_workflow { + name "Test Subworkflow GENERATE_DOWNSTREAM_SAMPLESHEETS" + script "../main.nf" + workflow "GENERATE_DOWNSTREAM_SAMPLESHEETS" + + tag "subworkflows" + tag "subworkflows_local" + tag "subworkflows/generate_downstream_samplesheets" + + test("reads - taxprofiler,mag") { + + when { + params { + modules_testdata_base_path = "https://raw.githubusercontent.com/nf-core/test-datasets/detaxizer/test_data/" + outdir = "." + generate_pipeline_samplesheets = 'taxprofiler,mag' + } + workflow { + """ + input[0] = Channel.of( + [ + [id:'test_paired-end_plus_long-reads_longReads', single_end:true, long_reads:true, amount_of_files:1], + file(params.modules_testdata_base_path + 'subset350.fq.gz',checkIfExists: true) + ], + [ + [id:'test_paired-end_plus_long-reads', single_end:false, long_reads:false, amount_of_files:2], + [ + file(params.modules_testdata_base_path + 'test_minigut_sample2_hg38host_R1.fastq.gz',checkIfExists: true), + file(params.modules_testdata_base_path + 'test_minigut_sample2_hg38host_R2.fastq.gz',checkIfExists: true) + ] + ], + [ + [id:'test_single-end_long_longReads', single_end:true, long_reads:true, amount_of_files:1], + file(params.modules_testdata_base_path + 'subset350.fq.gz',checkIfExists: true) + ], + [ + [id:'test_single-end_short', single_end:true, long_reads:false, amount_of_files:1], + file(params.modules_testdata_base_path + 'test_minigut_sample2_hg38host_R1.fastq.gz',checkIfExists: true) + ], + [ + [id:'test_paired-end', single_end:false, long_reads:false, amount_of_files:2], + [ + file(params.modules_testdata_base_path + 'test_minigut_sample2_hg38host_R1.fastq.gz',checkIfExists: true), + file(params.modules_testdata_base_path + 'test_minigut_sample2_hg38host_R2.fastq.gz',checkIfExists: true) + ] + ] + ) + """ + } + } + + then { + assertAll( + { assert workflow.success}, + { assert snapshot( + [ + "${params.outdir}/downstream_samplesheets/taxprofiler.csv", + "${params.outdir}/downstream_samplesheets/mag-pe.csv", + "${params.outdir}/downstream_samplesheets/mag-se.csv" + ]).match() + }, + ) + } + } +} diff --git a/subworkflows/local/generate_downstream_samplesheets/tests/main.nf.test.snap b/subworkflows/local/generate_downstream_samplesheets/tests/main.nf.test.snap new file mode 100644 index 0000000..090471c --- /dev/null +++ b/subworkflows/local/generate_downstream_samplesheets/tests/main.nf.test.snap @@ -0,0 +1,16 @@ +{ + "reads - taxprofiler,mag": { + "content": [ + [ + "./downstream_samplesheets/taxprofiler.csv", + "./downstream_samplesheets/mag-pe.csv", + "./downstream_samplesheets/mag-se.csv" + ] + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.4" + }, + "timestamp": "2024-10-15T09:05:21.506341802" + } +} \ No newline at end of file diff --git a/subworkflows/local/utils_nfcore_detaxizer_pipeline/main.nf b/subworkflows/local/utils_nfcore_detaxizer_pipeline/main.nf index fce7771..bd1e031 100644 --- a/subworkflows/local/utils_nfcore_detaxizer_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_detaxizer_pipeline/main.nf @@ -8,29 +8,25 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 
'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { take: version // boolean: Display version and exit - help // boolean: Display help text validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args @@ -54,16 +50,10 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input samplesheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, + UTILS_NFSCHEMA_PLUGIN ( + workflow, validate_params, - "nextflow_schema.json" + null ) // @@ -72,6 +62,7 @@ workflow PIPELINE_INITIALISATION { UTILS_NFCORE_PIPELINE ( nextflow_cli_args ) + // // Custom validation for pipeline parameters // @@ -80,8 +71,9 @@ workflow PIPELINE_INITIALISATION { // // Create channel from input file provided through params.input // + Channel - .fromSamplesheet("input") + .fromList(samplesheetToList(params.input, "${projectDir}/assets/schema_input.json")) .set { ch_samplesheet } emit: @@ -90,9 +82,9 @@ workflow PIPELINE_INITIALISATION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { @@ -107,7 +99,6 @@ workflow PIPELINE_COMPLETION { multiqc_report // string: Path to MultiQC report main: - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") // @@ -115,27 +106,42 @@ workflow PIPELINE_COMPLETION { // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs, 
multiqc_report.toList()) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + multiqc_report.toList() + ) } completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } } + + workflow.onError { + log.error "Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting" + } } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Check and validate pipeline parameters // def validateInputParameters() { genomeExistsError() + + if (params.generate_downstream_samplesheets && !params.generate_pipeline_samplesheets) { + error('[nf-core/detaxizer] If supplying `--generate_downstream_samplesheets`, you must also specify which pipeline to generate for with `--generate_pipeline_samplesheets! Check input.') + } } // @@ -145,7 +151,7 @@ def validateInputSamplesheet(input) { def (metas, fastqs) = input[1..2] // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end - def endedness_ok = metas.collect{ it.single_end }.unique().size == 1 + def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1 if (!endedness_ok) { error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") } @@ -177,7 +183,6 @@ def genomeExistsError() { error(error_string) } } - // // Generate methods description for MultiQC // @@ -186,10 +191,11 @@ def toolCitationText() { def citation_text = [ "Tools used in the workflow included:", "FastQC (Andrews 2010),", - "fastp (Chen et al. 2018)", - "Kraken2 (Wood et al. 2019),", - !params["skip_blastn"] ? "BLAST (Altschul et al. 1990)," : "", - !params["skip_blastn"] | params["enable_filter"] ? "seqkit (Shen et al. 2016)," : "", + params["preprocessing"] ? "fastp (Chen et al. 2018),": "", + params["classification_kraken2"] | !params["classification_bbduk"] & !params["classification_kraken2"] ? "Kraken2 (Wood et al. 2019)," : "", + params["classification_bbduk"] ? "BBMap (Bushnell B. 2022)," : "", + params["validation_blastn"] ? "BLAST (Altschul et al. 1990)," : "", + params["validation_blastn"] | params["enable_filter"] | params["classification_bbduk"] ? "seqkit (Shen et al. 2016)," : "", "MultiQC (Ewels et al. 2016)", "." ].join(' ').trim() @@ -200,11 +206,12 @@ def toolCitationText() { def toolBibliographyText() { def reference_text = [ - "
<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.</li>",
-        "<li>Chen, S., Zhou, Y., Chen, Y. & Gu, J. (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890. doi: 10.1093/bioinformatics/bty560</li>",
-        "<li>Wood, D. E., Lu, J. & Langmead, B. (2019) Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257. doi: 10.1186/s13059-019-1891-0</li>",
-        !params["skip_blastn"] ? "<li>Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) Basic local alignment search tool. Journal of Molecular Biology 215, 403–410. doi: 10.1016/s0022-2836(05)80360-2.</li>" : "",
-        !params["skip_blastn"] | params["enable_filter"] ? "<li>Shen, W., Le, S., Li, Y., & Hu, F. (2016). SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. In Q. Zou (Ed.), PLOS ONE (Vol. 11, Issue 10, p. e0163962). Public Library of Science (PLoS). doi: 10.1371/journal.pone.0163962</li>" : "",
+        "<li>Andrews, S. (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</li>",
+        params["preprocessing"] ? "<li>Chen, S., Zhou, Y., Chen, Y. & Gu, J. (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890. doi: 10.1093/bioinformatics/bty560</li>" : "",
+        params["classification_kraken2"] | !params["classification_bbduk"] & !params["classification_kraken2"] ? "<li>Wood, D. E., Lu, J. & Langmead, B. (2019) Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257. doi: 10.1186/s13059-019-1891-0</li>" : "",
+        params["classification_bbduk"] ? "<li>Bushnell, B. (2022) BBMap, URL: http://sourceforge.net/projects/bbmap/</li>" : "",
+        params["validation_blastn"] ? "<li>Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) Basic local alignment search tool. Journal of Molecular Biology 215, 403–410. doi: 10.1016/s0022-2836(05)80360-2.</li>" : "",
+        params["validation_blastn"] | params["enable_filter"] | params["classification_bbduk"] ? "<li>Shen, W., Le, S., Li, Y., & Hu, F. (2016). SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. In Q. Zou (Ed.), PLOS ONE (Vol. 11, Issue 10, p. e0163962). Public Library of Science (PLoS). doi: 10.1371/journal.pone.0163962</li>" : "",
         "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>"
     ].join(' ').trim()
@@ -218,8 +225,18 @@ def methodsDescriptionText(mqc_methods_yaml) {
     meta["manifest_map"] = workflow.manifest.toMap()
     // Pipeline DOI
-    meta["doi_text"] = meta.manifest_map.doi ? "(doi: ${meta.manifest_map.doi})" : ""
-    meta["nodoi_text"] = meta.manifest_map.doi ? "": "<li>If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used.</li>"
+    if (meta.manifest_map.doi) {
+        // Using a loop to handle multiple DOIs
+        // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers
+        // Removing ` ` since the manifest.doi is a string and not a proper list
+        def temp_doi_ref = ""
+        def manifest_doi = meta.manifest_map.doi.tokenize(",")
+        manifest_doi.each { doi_ref ->
+            temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), "
+        }
+        meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2)
+    } else meta["doi_text"] = ""
+    meta["nodoi_text"] = meta.manifest_map.doi ? "" : "<li>If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used.</li>"
     // Tool references
     meta["tool_citations"] = ""
@@ -236,3 +253,4 @@ def methodsDescriptionText(mqc_methods_yaml) {
     return description_html.toString()
 }
+
diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf
index ac31f28..0fcbf7b 100644
--- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf
+++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf
@@ -2,18 +2,13 @@
 // Subworkflow with functionality that may be useful for any Nextflow pipeline
 //
-import org.yaml.snakeyaml.Yaml
-import groovy.json.JsonOutput
-import nextflow.extension.FilesEx
-
 /*
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    SUBWORKFLOW DEFINITION
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
 workflow UTILS_NEXTFLOW_PIPELINE {
-
     take:
     print_version // boolean: print version
     dump_parameters // boolean: dump parameters
@@ -26,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE {
     // Print workflow version and exit on --version
     //
     if (print_version) {
-        log.info "${workflow.manifest.name} ${getWorkflowVersion()}"
+        log.info("${workflow.manifest.name} ${getWorkflowVersion()}")
         System.exit(0)
     }
@@ -49,16 +44,16 @@ workflow UTILS_NEXTFLOW_PIPELINE {
 }
 /*
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    FUNCTIONS
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
 //
 // Generate version string
 //
 def getWorkflowVersion() {
-    String version_string = ""
+    def version_string = "" as String
     if (workflow.manifest.version) {
         def prefix_v = workflow.manifest.version[0] != 'v' ? 'v' : ''
         version_string += "${prefix_v}${workflow.manifest.version}"
@@ -76,13 +71,13 @@ def getWorkflowVersion() {
 // Dump pipeline parameters to a JSON file
 //
 def dumpParametersToJSON(outdir) {
-    def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
-    def filename = "params_${timestamp}.json"
-    def temp_pf = new File(workflow.launchDir.toString(), ".${filename}")
-    def jsonStr = JsonOutput.toJson(params)
-    temp_pf.text = JsonOutput.prettyPrint(jsonStr)
+    def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')
+    def filename = "params_${timestamp}.json"
+    def temp_pf = new File(workflow.launchDir.toString(), ".${filename}")
+    def jsonStr = groovy.json.JsonOutput.toJson(params)
+    temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr)
-    FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json")
+    nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json")
     temp_pf.delete()
 }
@@ -90,37 +85,40 @@ def dumpParametersToJSON(outdir) {
 // When running with -profile conda, warn if channels have not been set-up appropriately
 //
 def checkCondaChannels() {
-    Yaml parser = new Yaml()
+    def parser = new org.yaml.snakeyaml.Yaml()
     def channels = []
     try {
         def config = parser.load("conda config --show channels".execute().text)
         channels = config.channels
-    } catch(NullPointerException | IOException e) {
-        log.warn "Could not verify conda channel configuration."
- return + } + catch (NullPointerException e) { + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present // This channel list is ordered by required channel priority. - def required_channels_in_order = ['conda-forge', 'bioconda', 'defaults'] + def required_channels_in_order = ['conda-forge', 'bioconda'] def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - def n = required_channels_in_order.size() - for (int i = 0; i < n - 1; i++) { - channel_priority_violation |= !(channels.indexOf(required_channels_in_order[i]) < channels.indexOf(required_channels_in_order[i+1])) - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. 
+ Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config index d0a926b..a09572e 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config @@ -3,7 +3,7 @@ manifest { author = """nf-core""" homePage = 'https://127.0.0.1' description = """Dummy pipeline""" - nextflowVersion = '!>=23.04.0' + nextflowVersion = '!>=23.04.0' version = '9.9.9' doi = 'https://doi.org/10.5281/zenodo.5070524' } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index a8b55d6..5cb7baf 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -2,17 +2,13 @@ // Subworkflow with utility functions specific to the nf-core pipeline template // -import org.yaml.snakeyaml.Yaml -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -25,23 +21,20 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Warn if a -profile or Nextflow config has not been provided to run the pipeline // def checkConfigProvided() { - valid_config = true + def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. 
`-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -52,12 +45,14 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } } @@ -65,20 +60,22 @@ def checkProfileProvided(nextflow_cli_args) { // Citation string for pipeline // def workflowCitation() { - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - " ${workflow.manifest.doi}\n\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + def temp_doi_ref = "" + def manifest_doi = workflow.manifest.doi.tokenize(",") + // Handling multiple DOIs + // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers + // Removing ` ` since the manifest.doi is a string and not a proper list + manifest_doi.each { doi_ref -> + temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" + } + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } // // Generate workflow version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -96,8 +93,8 @@ def getWorkflowVersion() { // Get software versions for pipeline // def processVersionsFromYAML(yaml_file) { - Yaml yaml = new Yaml() - versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def yaml = new org.yaml.snakeyaml.Yaml() + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -107,8 +104,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -116,11 +113,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { processVersionsFromYAML(it) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -128,25 +121,31 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - for (group in summary_params.keySet()) { - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "

    $group

    \n" - summary_section += "
    \n" - for (param in group_params.keySet()) { - summary_section += "
    $param
    ${group_params.get(param) ?: 'N/A'}
    \n" + summary_params + .keySet() + .each { group -> + def group_params = summary_params.get(group) + // This gets the parameters of that particular group + if (group_params) { + summary_section += "

    ${group}

    \n" + summary_section += "
    \n" + group_params + .keySet() + .sort() + .each { param -> + summary_section += "
    ${param}
    ${group_params.get(param) ?: 'N/A'}
    \n" + } + summary_section += "
    \n" } - summary_section += "
    \n" } - } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } @@ -155,7 +154,7 @@ def paramsSummaryMultiqc(summary_params) { // nf-core logo // def nfCoreLogo(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map String.format( """\n ${dashedLine(monochrome_logs)} @@ -174,7 +173,7 @@ def nfCoreLogo(monochrome_logs=true) { // Return dashed line // def dashedLine(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map return "-${colors.dim}----------------------------------------------------${colors.reset}-" } @@ -182,7 +181,7 @@ def dashedLine(monochrome_logs=true) { // ANSII colours used for terminal logging // def logColours(monochrome_logs=true) { - Map colorcodes = [:] + def colorcodes = [:] as Map // Reset / Meta colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" @@ -194,54 +193,54 @@ def logColours(monochrome_logs=true) { colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? 
'' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" return colorcodes } @@ -256,14 +255,15 @@ def attachMultiqcReport(multiqc_report) { mqc_report = multiqc_report.getVal() if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") } mqc_report = mqc_report[0] } } - } catch (all) { + } + catch (Exception all) { if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + log.warn("[${workflow.manifest.name}] Could not attach MultiQC report to summary email") } } return mqc_report @@ -275,26 +275,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -332,39 +341,41 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Render the sendmail template def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? 
params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()] def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() // Send the HTML e-mail - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (email_address) { try { - if (plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + if (plaintext_email) { + throw new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') + } // Try to send HTML e-mail using sendmail def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html") sendmail_tf.withWriter { w -> w << sendmail_html } - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" - } catch (all) { + ['sendmail', '-t'].execute() << sendmail_html + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-") + } + catch (Exception all) { // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address] mail_cmd.execute() << email_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-") } } // Write summary e-mail HTML to a file def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html") output_hf.withWriter { w -> w << email_html } - FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html"); + nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html") output_hf.delete() // Write summary e-mail TXT to a file def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt"); + nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt") output_tf.delete() } @@ -372,15 +383,17 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Print pipeline summary on completion // def completionSummary(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed
successfully${colors.reset}-") + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -389,21 +402,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -428,13 +450,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 0000000..4994303 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. 
+ // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 0000000..f7d9f02 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. + - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. 
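(Editor's note, not part of this diff: as a minimal usage sketch, a pipeline-level initialisation workflow might wire this subworkflow up as below. The include path and the PIPELINE_INITIALISATION name are illustrative assumptions; the three inputs follow the take: block of the new main.nf above.)

    include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin'

    workflow PIPELINE_INITIALISATION {
        take:
        validate_params // boolean: fail the run on invalid parameters

        main:
        UTILS_NFSCHEMA_PLUGIN (
            workflow,        // implicit workflow metadata object read by nf-schema
            validate_params, // run validateParameters() when true
            ""               // empty: fall back to validation.parametersSchema or "${projectDir}/nextflow_schema.json"
        )
    }
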
+output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 0000000..842dc43 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 0000000..0907ac5 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c..331e0d2 100644 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". 
pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b0..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. 
"nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - 
post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cff..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/workflows/detaxizer.nf b/workflows/detaxizer.nf index 8c76ee6..af77419 100644 --- a/workflows/detaxizer.nf +++ b/workflows/detaxizer.nf @@ -3,96 +3,96 @@ IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ - - -include { FASTQC } from '../modules/nf-core/fastqc/main' -include { MULTIQC } from '../modules/nf-core/multiqc/main' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_detaxizer_pipeline' -include { getGenomeAttribute } from '../subworkflows/local/utils_nfcore_detaxizer_pipeline' - -include { FASTP } from '../modules/nf-core/fastp/main' -include { KRAKEN2_KRAKEN2 } from '../modules/nf-core/kraken2/kraken2/main' -include { BLAST_BLASTN } from '../modules/nf-core/blast/blastn/main' -include { BLAST_MAKEBLASTDB } from '../modules/nf-core/blast/makeblastdb/main' - -include { RENAME_FASTQ_HEADERS_PRE } from '../modules/local/rename_fastq_headers_pre' -include { KRAKEN2PREPARATION } from '../modules/local/kraken2preparation' -include { PARSE_KRAKEN2REPORT } from '../modules/local/parse_kraken2report' -include { ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN } from '../modules/local/isolate_ids_from_kraken2_to_blastn' -include { PREPARE_FASTA4BLASTN } from '../modules/local/prepare_fasta4blastn' -include { FILTER_BLASTN_IDENTCOV } from '../modules/local/filter_blastn_identcov' -include { FILTER } from '../modules/local/filter' -include { RENAME_FASTQ_HEADERS_AFTER } from 
'../modules/local/rename_fastq_headers_after' -include { SUMMARY_KRAKEN2 } from '../modules/local/summary_kraken2' -include { SUMMARY_BLASTN } from '../modules/local/summary_blastn' -include { SUMMARIZER } from '../modules/local/summarizer' +include { FASTQC } from '../modules/nf-core/fastqc/main' +include { MULTIQC } from '../modules/nf-core/multiqc/main' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_detaxizer_pipeline' +include { getGenomeAttribute } from '../subworkflows/local/utils_nfcore_detaxizer_pipeline' +include { GENERATE_DOWNSTREAM_SAMPLESHEETS } from '../subworkflows/local/generate_downstream_samplesheets/main.nf' + +include { FASTP } from '../modules/nf-core/fastp/main' +include { KRAKEN2_KRAKEN2 as KRAKEN2_KRAKEN2 } from '../modules/nf-core/kraken2/kraken2/main' +include { KRAKEN2_KRAKEN2 as KRAKEN2_POST_CLASSIFICATION_FILTERED } from '../modules/nf-core/kraken2/kraken2/main' +include { KRAKEN2_KRAKEN2 as KRAKEN2_POST_CLASSIFICATION_REMOVED } from '../modules/nf-core/kraken2/kraken2/main' +include { BBMAP_BBDUK } from '../modules/nf-core/bbmap/bbduk/main' +include { BLAST_BLASTN } from '../modules/nf-core/blast/blastn/main' +include { BLAST_MAKEBLASTDB } from '../modules/nf-core/blast/makeblastdb/main' + +include { RENAME_FASTQ_HEADERS_PRE } from '../modules/local/rename_fastq_headers_pre' +include { KRAKEN2PREPARATION } from '../modules/local/kraken2preparation' +include { PARSE_KRAKEN2REPORT } from '../modules/local/parse_kraken2report' +include { ISOLATE_KRAKEN2_IDS } from '../modules/local/isolate_kraken2_ids' +include { ISOLATE_BBDUK_IDS } from '../modules/local/isolate_bbduk_ids' +include { MERGE_IDS } from '../modules/local/merge_ids' +include { PREPARE_FASTA4BLASTN } from '../modules/local/prepare_fasta4blastn' +include { FILTER_BLASTN_IDENTCOV } from '../modules/local/filter_blastn_identcov' +include { FILTER } from '../modules/local/filter' +include { RENAME_FASTQ_HEADERS_AFTER } from '../modules/local/rename_fastq_headers_after' +include { SUMMARY_CLASSIFICATION } from '../modules/local/summary_classification' +include { SUMMARY_BLASTN } from '../modules/local/summary_blastn' +include { SUMMARIZER } from '../modules/local/summarizer' /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RUN MAIN WORKFLOW ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -// speficy the fasta channel if it is not provided via --fasta -def fasta = Channel.empty() +// specify the ch_fasta_blastn channel if it is not provided via --fasta_blastn +def ch_fasta_blastn = Channel.empty() + +if ( !params.fasta_blastn && params.validation_blastn ) { + ch_fasta_blastn = Channel.fromPath(getGenomeAttribute('fasta')) +} else if ( params.validation_blastn ){ + // If params.fasta_blastn is there, use it for the creation of the blastn database + ch_fasta_blastn = Channel.fromPath(params.fasta_blastn) +} + +// specify the ch_fasta_bbduk channel if it is not provided via --fasta_bbduk -if (!params.fasta && !params.skip_blastn) { - fasta = Channel.fromPath(getGenomeAttribute('fasta')) -} else if (!params.skip_blastn){ - // If params.fasta is there, use it for the creation of the blastn database - fasta = Channel.fromPath(params.fasta) +def ch_fasta_bbduk = 
Channel.empty() + +if ( !params.fasta_bbduk && params.classification_bbduk ) { + ch_fasta_bbduk = Channel.fromPath(getGenomeAttribute('fasta')) +} else if ( params.classification_bbduk ){ + // If params.fasta_bbduk is there, use it as the reference for bbduk + ch_fasta_bbduk = Channel.fromPath(params.fasta_bbduk) } -workflow DETAXIZER { +workflow NFCORE_DETAXIZER { take: ch_samplesheet // channel: samplesheet read in from --input - main: ch_versions = Channel.empty() ch_multiqc_files = Channel.empty() - ch_samplesheet.branch { + ch_short = ch_samplesheet.branch { shortReads: it[1] - }.set { - ch_short - } - - ch_short.shortReads.map{ + }.shortReads.map{ meta, short_reads_fastq_1, short_reads_fastq_2, long_reads_fastq_1 -> if (short_reads_fastq_2){ return [meta + [ single_end: false, long_reads: false , amount_of_files: 2 ], [ short_reads_fastq_1, short_reads_fastq_2 ] ] } else { return [meta + [ id: "${meta.id}_R1", single_end: true, long_reads: false, amount_of_files: 1 ], short_reads_fastq_1 ] } - }.set{ - ch_short } ch_samplesheet.branch { + ch_long = ch_samplesheet.branch { longReads: it[3] - }.set { - ch_long - } - - ch_long.longReads.map { + }.longReads.map { meta, short_reads_fastq_1, short_reads_fastq_2, long_reads_fastq_1 -> return [meta + [ id: "${meta.id}_longReads", single_end: true, long_reads: true, amount_of_files: 1 ], long_reads_fastq_1 ] - }.set { - ch_long } ch_short_long = ch_short.mix(ch_long) - // // MODULE: Rename Fastq headers // RENAME_FASTQ_HEADERS_PRE(ch_short_long) - // // MODULE: Run FastQC // @@ -105,95 +105,164 @@ workflow DETAXIZER { // // MODULE: Run fastp // + if (params.preprocessing) { + FASTP ( RENAME_FASTQ_HEADERS_PRE.out.fastq, [], + [], params.fastp_save_trimmed_fail, [] ) + + ch_fastq_for_classification = FASTP.out.reads ch_versions = ch_versions.mix(FASTP.out.versions.first()) + } else { + ch_fastq_for_classification = RENAME_FASTQ_HEADERS_PRE.out.fastq + } + ////////////////////////////////////////////////// + // Classification + ////////////////////////////////////////////////// - // - // MODULE: Prepare Kraken2 Database - // - ch_kraken2_db = Channel.fromPath(params.kraken2db).map { - item -> [['id': "kraken2_db"], item] - } - KRAKEN2PREPARATION ( - ch_kraken2_db - ) - ch_versions = ch_versions.mix(KRAKEN2PREPARATION.out.versions) + if ( params.classification_kraken2_post_filtering || (!params.classification_kraken2 && !params.classification_bbduk) || (params.classification_kraken2) ){ + // + // MODULE: Prepare Kraken2 Database + // + ch_kraken2_db = Channel.fromPath(params.kraken2db).map { + item -> [['id': "kraken2_db"], item] + } - // - // MODULE: Run Kraken2 - // + KRAKEN2PREPARATION ( + ch_kraken2_db + ) + ch_versions = ch_versions.mix(KRAKEN2PREPARATION.out.versions.first()) + } - KRAKEN2_KRAKEN2 ( - FASTP.out.reads, - KRAKEN2PREPARATION.out.db.first(), - params.save_output_fastqs, - true - ) - ch_versions = ch_versions.mix(KRAKEN2_KRAKEN2.out.versions.first()) - // - // MODULE: Parse the taxonomy from the kraken2 report and return all subclasses of the tax2filter - // - PARSE_KRAKEN2REPORT( - KRAKEN2_KRAKEN2.out.report.take(1) - ) - ch_versions = ch_versions.mix(PARSE_KRAKEN2REPORT.out.versions) + if ((!params.classification_bbduk && !params.classification_kraken2) || (params.classification_kraken2)) { - // - // MODULE: Isolate the hits for a certain taxa and subclasses - // - ch_parsed_kraken2_report = PARSE_KRAKEN2REPORT.out.to_filter.map {meta, path -> path} + // + // MODULE: Run Kraken2 + // + KRAKEN2_KRAKEN2 ( +
ch_fastq_for_classification, + KRAKEN2PREPARATION.out.db.first(), + params.save_output_fastqs, + true + ) + ch_versions = ch_versions.mix(KRAKEN2_KRAKEN2.out.versions.first()) - KRAKEN2_KRAKEN2.out.classified_reads_assignment.combine(ch_parsed_kraken2_report).set{ ch_combined } + // + // MODULE: Parse the taxonomy from the kraken2 report and return all subclasses of the tax2filter + // + PARSE_KRAKEN2REPORT( + KRAKEN2_KRAKEN2.out.report.take(1) + ) + ch_versions = ch_versions.mix(PARSE_KRAKEN2REPORT.out.versions) - ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN ( - ch_combined - ) + // + // MODULE: Isolate the hits for a certain taxa and subclasses + // + ch_parsed_kraken2_report = PARSE_KRAKEN2REPORT.out.to_filter.map {meta, path -> path} - ch_versions = ch_versions.mix(ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN.out.versions.first()) + ch_combined = KRAKEN2_KRAKEN2.out.classified_reads_assignment.combine(ch_parsed_kraken2_report) + + ISOLATE_KRAKEN2_IDS ( + ch_combined + ) + + ch_versions = ch_versions.mix(ISOLATE_KRAKEN2_IDS.out.versions.first()) - // - // MODULE: Summarize the kraken2 results and the isolated kraken2 hits - // - ch_prepare_summary_kraken2 = KRAKEN2_KRAKEN2.out.classified_reads_assignment.join(ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN.out.classified).map { - meta, path1, path2 -> - return [ meta, [ path1, path2 ] ] } - ch_combined_kraken2 = ch_prepare_summary_kraken2.map { - meta, path -> - return [ meta +[ id: meta.id.replaceAll("(_R1|_R2)", "") ] , path] - } - .map { - meta, path -> - path = path.flatten() - return [meta, path] + if (params.classification_bbduk) { + + // + // MODULE: Run bbduk + // + BBMAP_BBDUK ( + ch_fastq_for_classification, + ch_fasta_bbduk.first() + ) + ch_versions = ch_versions.mix(BBMAP_BBDUK.out.versions.first()) + + // + // MODULE: Run ISOLATE_BBDUK_IDS + // + ISOLATE_BBDUK_IDS( + BBMAP_BBDUK.out.contaminated_reads + ) + ch_versions = ch_versions.mix(ISOLATE_BBDUK_IDS.out.versions.first()) + + + } + + // Prepare MERGE_IDS Channel (with or without merging of IDs) + + if (params.classification_bbduk && params.classification_kraken2){ + + // + // MODULE: Merge IDs + // + MERGE_IDS( + ISOLATE_KRAKEN2_IDS.out.classified_ids.join( + ISOLATE_BBDUK_IDS.out.classified_ids, by: [0] + ).map{ + meta, path1, path2 -> + [meta,[path1,path2]] } + ) + + + } else if (params.classification_bbduk && !params.classification_kraken2){ + + // + // MODULE: Merge IDs + // + MERGE_IDS( + ISOLATE_BBDUK_IDS.out.classified_ids + ) + + + } else if (params.classification_kraken2 || (!params.classification_kraken2 && !params.classification_bbduk)){ - ch_kraken2_summary = SUMMARY_KRAKEN2( - ch_combined_kraken2 + // + // MODULE: Merge IDs + // + MERGE_IDS( + ISOLATE_KRAKEN2_IDS.out.classified_ids ) - ch_versions = ch_versions.mix(ch_kraken2_summary.versions.first()) + } + + ch_versions = ch_versions.mix(MERGE_IDS.out.versions.first()) + + // + // MODULE: Summarize the classification results + // + + SUMMARY_CLASSIFICATION( + MERGE_IDS.out.classified_ids + ) // Drop meta of kraken2_summary as it is not needed for the combination step of summarizer - ch_kraken2_summary = ch_kraken2_summary.summary.map { + ch_classification_summary = SUMMARY_CLASSIFICATION.out.summary.map { meta, path -> [path] - } + } + ch_versions = ch_versions.mix(SUMMARY_CLASSIFICATION.out.versions.first()) + + ////////////////////////////////////////////////// + // Validation + ////////////////////////////////////////////////// - if (!params.skip_blastn) { + if (params.validation_blastn) { // // MODULE: Extract the hits to fasta format 
// - ch_combined = FASTP.out.reads + ch_combined = ch_fastq_for_classification .join( - ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN.out.classified_ids, by: [0] + MERGE_IDS.out.classified_ids, by: [0] ) @@ -206,7 +275,7 @@ workflow DETAXIZER { // // MODULE: Run BLASTN // - ch_reference_fasta = fasta + ch_reference_fasta = ch_fasta_blastn ch_reference_fasta_with_meta = ch_reference_fasta.map { item -> [['id': "id-fasta-for-makeblastdb"], item] @@ -289,7 +358,6 @@ workflow DETAXIZER { } return [ meta, blastn[0], blastn[1], filteredblastn[0], filteredblastn[1] ] } - ch_blastn_summary = SUMMARY_BLASTN ( ch_blastn_combined ) @@ -306,19 +374,23 @@ workflow DETAXIZER { // if ( ( - ( params.skip_blastn && params.enable_filter ) || params.filter_with_kraken2 + ( !params.validation_blastn && params.enable_filter ) || params.filter_with_classification ) && !params.filter_trimmed ) { - ch_kraken2filter = RENAME_FASTQ_HEADERS_PRE.out.fastq - .join(ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN.out.classified_ids, by:[0]) + + ch_classification = RENAME_FASTQ_HEADERS_PRE.out.fastq + .join(MERGE_IDS.out.classified_ids, by:[0]) + FILTER( - ch_kraken2filter + ch_classification ) + ch_versions = ch_versions.mix(FILTER.out.versions.first()) } else if ( params.enable_filter && !params.filter_trimmed ) { + ch_blastn2filter = FILTER_BLASTN_IDENTCOV.out.classified_ids.map { meta, path -> return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] @@ -327,31 +399,40 @@ workflow DETAXIZER { meta, path -> tuple(groupKey(meta, meta.amount_of_files), path) } .groupTuple(by:[0]) + ch_combined_short_long_id = RENAME_FASTQ_HEADERS_PRE.out.fastq.map { meta, path -> return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] } + ch_blastnfilter = ch_combined_short_long_id.join( ch_blastn2filter, by:[0] ) + FILTER( ch_blastnfilter ) + ch_versions = ch_versions.mix(FILTER.out.versions.first()) + } else if ( ( - ( params.skip_blastn && params.enable_filter ) || params.filter_with_kraken2 + ( !params.validation_blastn && params.enable_filter ) || params.filter_with_classification ) && params.filter_trimmed ){ - ch_kraken2filter = FASTP.out.reads - .join(ISOLATE_IDS_FROM_KRAKEN2_TO_BLASTN.out.classified_ids, by:[0]) + + ch_classification = ch_fastq_for_classification + .join(MERGE_IDS.out.classified_ids, by:[0]) + FILTER( - ch_kraken2filter + ch_classification ) + ch_versions = ch_versions.mix(FILTER.out.versions.first()) } else if ( params.enable_filter && params.filter_trimmed ){ + ch_blastn2filter = FILTER_BLASTN_IDENTCOV.out.classified_ids.map { meta, path -> return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] @@ -361,16 +442,19 @@ workflow DETAXIZER { } .groupTuple(by:[0]) - ch_combined_short_long_id = FASTP.out.reads.map { + ch_combined_short_long_id = ch_fastq_for_classification.map { meta, path -> return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] } + ch_blastnfilter = ch_combined_short_long_id.join( ch_blastn2filter, by:[0] ) + FILTER( ch_blastnfilter ) + ch_versions = ch_versions.mix(FILTER.out.versions.first()) } @@ -378,6 +462,7 @@ workflow DETAXIZER { // MODULE: Rename headers after filtering // if ( params.enable_filter ) { + ch_headers = RENAME_FASTQ_HEADERS_PRE.out.headers.map { meta, path -> return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] @@ -388,61 +473,151 @@ workflow DETAXIZER { return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] } + ch_removed2rename = Channel.empty() + + if ( params.output_removed_reads ){ + ch_removed2rename = 
FILTER.out.removed.map { + meta, path -> + return [ meta + [ id: meta.id.replaceAll("(_R1|_R2)", "") ], path ] + } + } + ch_rename_filtered = ch_filtered2rename.join(ch_headers, by:[0]) - RENAME_FASTQ_HEADERS_AFTER( - ch_rename_filtered + ch_removed2rename = ch_removed2rename.ifEmpty(['empty', []]) + + if ( params.output_removed_reads ){ + + RENAME_FASTQ_HEADERS_AFTER( + ch_rename_filtered, + ch_removed2rename + ) + + ch_versions = ch_versions.mix(RENAME_FASTQ_HEADERS_AFTER.out.versions.first()) + + } else { + + RENAME_FASTQ_HEADERS_AFTER( + ch_rename_filtered, + ch_removed2rename.first() + ) + + ch_versions = ch_versions.mix(RENAME_FASTQ_HEADERS_AFTER.out.versions.first()) + + } + + if ( params.classification_kraken2_post_filtering ) { + + KRAKEN2_POST_CLASSIFICATION_FILTERED ( + RENAME_FASTQ_HEADERS_AFTER.out.fastq, + KRAKEN2PREPARATION.out.db.first(), + params.save_output_fastqs_filtered, + true + ) + + ch_versions = ch_versions.mix(KRAKEN2_POST_CLASSIFICATION_FILTERED.out.versions.first()) + + if (params.output_removed_reads) { + + KRAKEN2_POST_CLASSIFICATION_REMOVED ( + RENAME_FASTQ_HEADERS_AFTER.out.fastq_removed, + KRAKEN2PREPARATION.out.db.first(), + params.save_output_fastqs_removed, + true + ) + + ch_versions = ch_versions.mix(KRAKEN2_POST_CLASSIFICATION_REMOVED.out.versions.first()) + + } + + } } + // // MODULE: Summarize the classification process // - if (!params.skip_blastn){ - ch_summary = ch_kraken2_summary.mix(ch_blastn_summary).collect().map { - item -> [['id': "summary_of_kraken2_and_blastn"], item] + if (params.validation_blastn){ + + ch_summary = ch_classification_summary.mix(ch_blastn_summary).collect().map { + item -> [['id': "summary_of_classification_and_blastn"], item] } + } else { - ch_summary = ch_kraken2_summary.collect().map { - item -> [['id': "summary_of_kraken2"], item] + + ch_summary = ch_classification_summary.collect().map { + item -> [['id': "summary_of_classification"], item] } + } ch_summary = SUMMARIZER ( ch_summary ) + ch_versions = ch_versions.mix(ch_summary.versions) + if ( params.generate_downstream_samplesheets ) { + + GENERATE_DOWNSTREAM_SAMPLESHEETS ( RENAME_FASTQ_HEADERS_AFTER.out.fastq ) + + } + + // // Collate and save software versions // softwareVersionsToYAML(ch_versions) - .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_pipeline_software_mqc_versions.yml', sort: true, newLine: true) - .set { ch_collated_versions } + .collectFile( + storeDir: "${params.outdir}/pipeline_info", + name: 'nf_core_' + 'pipeline_software_' + 'mqc_' + 'versions.yml', + sort: true, + newLine: true + ).set { ch_collated_versions } // // MODULE: MultiQC // - ch_multiqc_config = Channel.fromPath("$projectDir/assets/multiqc_config.yml", checkIfExists: true) - ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() - ch_multiqc_logo = params.multiqc_logo ? Channel.fromPath(params.multiqc_logo, checkIfExists: true) : Channel.empty() - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") - ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) - ch_multiqc_custom_methods_description = params.multiqc_methods_description ? 
file(params.multiqc_methods_description, checkIfExists: true) : file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) - ch_methods_description = Channel.value(methodsDescriptionText(ch_multiqc_custom_methods_description)) - ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) - ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) - ch_multiqc_files = ch_multiqc_files.mix(ch_methods_description.collectFile(name: 'methods_description_mqc.yaml', sort: false)) + ch_multiqc_config = Channel.fromPath( + "$projectDir/assets/multiqc_config.yml", checkIfExists: true) + ch_multiqc_custom_config = params.multiqc_config ? + Channel.fromPath(params.multiqc_config, checkIfExists: true) : + Channel.empty() + ch_multiqc_logo = params.multiqc_logo ? + Channel.fromPath(params.multiqc_logo, checkIfExists: true) : + Channel.empty() + + summary_params = paramsSummaryMap( + workflow, parameters_schema: "nextflow_schema.json") + ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) + ch_multiqc_files = ch_multiqc_files.mix( + ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) + ch_multiqc_custom_methods_description = params.multiqc_methods_description ? + file(params.multiqc_methods_description, checkIfExists: true) : + file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) + ch_methods_description = Channel.value( + methodsDescriptionText(ch_multiqc_custom_methods_description)) + + ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) + ch_multiqc_files = ch_multiqc_files.mix( + ch_methods_description.collectFile( + name: 'methods_description_mqc.yaml', + sort: true + ) + ) MULTIQC ( ch_multiqc_files.collect(), ch_multiqc_config.toList(), ch_multiqc_custom_config.toList(), - ch_multiqc_logo.toList() + ch_multiqc_logo.toList(), + [], + [] ) - emit: - multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html + emit: + multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html versions = ch_versions // channel: [ path(versions.yml) ] + } /*
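(Editor's note, not part of this diff: as a usage sketch of the renamed parameters introduced above, a run that enables both classification tools and the blastn validation stage might be launched as follows; the samplesheet, output directory, database, and FASTA paths are placeholders.)

    nextflow run nf-core/detaxizer \
        -profile docker \
        --input samplesheet.csv \
        --outdir results \
        --kraken2db /path/to/kraken2_db \
        --classification_kraken2 \
        --classification_bbduk \
        --fasta_bbduk contaminants.fa \
        --validation_blastn \
        --fasta_blastn contaminants.fa \
        --enable_filter
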