This tool augments the output of squeue with additional information about the state of pending jobs and explains clearly why jobs are waiting.
- Display pending jobs nicely with additional details
- Identify and recommend fixes to common problems
  - QOS limits (group, per-job, CPU, node, etc.)
  - Jobs intersect with reservations
  - Job ran but exited quickly
Run `sq -h` to list the available command-line options.
conda create -n sq python=3.8
conda activate sq
pip install -r requirements.txt
pyinstaller --onefile sq.py

The output binary is then located at: dist/sq
The most common problem is that the tool encounters output from a Slurm command (e.g., squeue, sinfo) that it can't parse.
A couple of approaches to debugging are:
- You can clone this repo (presumably into the Savio filesystem) and then run `python -m pdb sq.py` manually (including inserting `pdb` commands and modifying code in `sq.py`).
- You can add `--freeze $DIRNAME`, and `sq` will create a new directory at `$DIRNAME` containing all the Slurm command outputs. You can then use the saved files to debug in the future with `--load $DIRNAME`.
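
To illustrate the idea behind the freeze/load workflow, here is a minimal Python sketch of snapshotting Slurm command output to a directory and reading it back later. It is not the code in `sq.py`: the `COMMANDS` table, the function names `freeze`, `load`, and `run_command`, and the one-file-per-command `.txt` layout are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical set of commands to snapshot; the real tool may run others
# and may pass different arguments.
COMMANDS = {
    "squeue": ["squeue"],
    "sinfo": ["sinfo"],
}


def run_command(argv):
    """Run a command and return its stdout as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def freeze(dirname, runner=run_command):
    """Save the raw output of each command to <dirname>/<name>.txt."""
    out_dir = Path(dirname)
    # Fail if the directory already exists, so old snapshots are not overwritten.
    out_dir.mkdir(parents=True, exist_ok=False)
    for name, argv in COMMANDS.items():
        (out_dir / f"{name}.txt").write_text(runner(argv))


def load(dirname):
    """Read previously saved outputs into a dict keyed by command name."""
    return {p.stem: p.read_text() for p in sorted(Path(dirname).glob("*.txt"))}
```

With a snapshot like this, parsing code can be replayed against `load(dirname)` on a machine without Slurm access, which makes an unparseable line reproducible. The `runner` parameter exists only so that the sketch can be exercised with fake output.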