Accelerate SLURM job state queries with --only-job-state option to squeue #6570

@tcutts

New feature

Make SLURM requests more efficient by [optionally] using the --only-job-state option to squeue

Use case

The currently supported SLURM releases (24 and 25) can return just the job state information much more quickly. Nextflow should use this capability when it is available, to reduce the RPC burden on SLURM itself and to accelerate scheduling.

Suggested implementation

All that needs doing is changing the squeue command line in the executor plugin to:

squeue --noheader --only-job-state -o "%i %t" -t all

(i.e. one extra option). The output format is almost identical; I think the only difference is whether the % syntax for job arrays is emitted (i.e. the limit on how many jobs in the job array can run at once, something I don't think Nextflow makes use of).
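To illustrate why the % difference should be harmless, here is a minimal sketch in Python (not Nextflow's actual Groovy code) of parsing the "%i %t" output in a way that tolerates either form. The function name and the sample job IDs are hypothetical, and the sketch assumes the array throttle appears as a `%N` suffix inside the job ID field.

```python
import re


def parse_job_states(squeue_output: str) -> dict[str, str]:
    """Map job ID -> state code from `squeue --noheader -o "%i %t"` output.

    Assumption: an array throttle, when present, shows up as `%N` inside
    the job ID (e.g. `102_[3-10%2]`). It is stripped so that both output
    variants produce the same keys.
    """
    states: dict[str, str] = {}
    for line in squeue_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            # Skip blank or unexpected lines rather than failing.
            continue
        job_id, state = parts
        job_id = re.sub(r"%\d+", "", job_id)
        states[job_id] = state
    return states
```

With this normalisation, output with or without the throttle suffix yields the same job ID keys, so the caller does not need to know which squeue variant produced it.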

If the SLURM administrator also enables the enable_job_state_cache configuration in SLURM, Nextflow job state queries will no longer use the RPC mechanism that causes such performance difficulties for naive workflows.

Obviously there will need to be a bit more code than that: either to make the precise command configurable, or to test whether the installed SLURM version supports the option and use it if so.
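As a rough sketch of the second approach, the following Python (again, not the executor plugin's real code) adds the option only when a version string looks new enough, with a switch to turn it off. All names here are hypothetical. It assumes the version text resembles what `squeue --version` prints (e.g. "slurm 24.05.1"), and the default minimum of 24.0 is an assumption taken from the releases mentioned above, not a verified threshold.

```python
import re

# Command line currently suggested in this issue, minus the new option.
BASE_ARGS = ["squeue", "--noheader", "-o", "%i %t", "-t", "all"]


def parse_slurm_version(version_text: str):
    """Return (major, minor) from text like 'slurm 24.05.1', or None."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def build_squeue_command(version_text: str,
                         min_version=(24, 0),  # assumed threshold
                         enabled: bool = True) -> list[str]:
    """Build the squeue argument list, adding --only-job-state if usable."""
    args = list(BASE_ARGS)
    version = parse_slurm_version(version_text)
    if enabled and version is not None and version >= min_version:
        # Insert right after --noheader, matching the command shown above.
        args.insert(2, "--only-job-state")
    return args
```

Falling back to the unchanged command when the version cannot be parsed keeps older or unusual installations working exactly as they do today.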
