#### Using snakemake with SLURM
A `Snakemake` workflow consists of many connected steps called 'rules.' The same rule can be executed in parallel for each input sample to the workflow. Separately running instances of each rule are called 'jobs.' You can run each job _sequentially_, but it's much faster to run independent jobs _in parallel_. That's where a workload manager like [`Slurm`](https://slurm.schedmd.com/documentation.html) comes in handy. `Snakemake` can communicate with `Slurm` to allocate computational resources for running multiple jobs simultaneously.
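As a toy illustration (the sample names, file paths, and shell command below are made up for this example, not part of a real lab pipeline), the `count_lines` rule would spawn one independent job per sample, and those jobs could run in parallel:

```snakemake
# Hypothetical two-rule workflow: `count_lines` runs as one
# independent job per sample, so the jobs can execute in parallel.
SAMPLES = ["sampleA", "sampleB", "sampleC"]

rule all:
    input:
        expand("results/{sample}.counts.txt", sample=SAMPLES)

rule count_lines:
    input:
        "data/{sample}.fastq"
    output:
        "results/{sample}.counts.txt"
    shell:
        "wc -l {input} > {output}"
```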
As of `Snakemake` version `8.*.*`, you'll need to use [profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles) to configure how `Snakemake` interacts with `Slurm`.
##### Setup
Ensure that you have `Snakemake` version `8.*.*` or later installed. You can check by running `snakemake --version`. You'll also need to install the [snakemake-executor-plugin-slurm](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html) plugin from conda.
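For example (assuming you manage environments with conda; the plugin is distributed on the bioconda channel):

```bash
# Confirm that the installed Snakemake is version 8 or later
snakemake --version

# Install the Slurm executor plugin into the active environment
conda install -c bioconda snakemake-executor-plugin-slurm
```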
"[Profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles)" are YAML files that specify how `Snakemake` should ask `Slurm` for resources.
Make a directory in your project called `profile` and create an empty file called `config.yaml`:
```bash
mkdir profile
touch profile/config.yaml
```
Open `config.yaml` and add the following information:
```yaml
executor: slurm
latency-wait: 60
jobs: 100
slurm-init-seconds-before-status-checks: 20
default-resources:
  slurm_account: <account_name>
  runtime: 10800
  cpus_per_task: 1
  mem_mb: 4000
```
This is the most basic version of a 'profile.' You can leave the first section untouched. However, you'll need to add the name of your `slurm_account` under `default-resources:` (e.g. `bloom_j`). As its name implies, `default-resources:` tells `Slurm` what resources should be allocated to a job if no other information is specified.

You'll occasionally have a rule that requires more resources. For example, alignment can be sped up significantly with multiple CPUs. Profiles can tell `Slurm` that jobs spawned from certain rules require more resources:
```yaml
set-resources:
  <rule_that_needs_more_cpus>:
    cpus_per_task: 8
```
The `set-resources:` section tells `Slurm` that jobs from your `<rule_that_needs_more_cpus>` should get 8 CPUs, not the single CPU given by default.
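A rule can also override several defaults at once; for instance (the rule name and values below are illustrative):

```yaml
set-resources:
  <rule_that_needs_more_memory>:
    mem_mb: 32000   # request 32 GB instead of the 4 GB default
    runtime: 1440   # runtime in minutes
```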
:::tip
Further details on configuring `Slurm` with profiles can be found [here](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html).
:::
##### Rule configuration
If you have a rule that requires more resources, you should also define that in the rule itself with the `resources` and `threads` keywords:
```snakemake
rule rule_that_needs_more_cpus:
    input: ...
    output: ...
    threads: 8
    resources:
        mem_mb=16000,
        cpus_per_task=8
    shell: ...
```
:::warning
Currently, if you submit the `Snakemake` run itself as a script, you must specify both `threads` and `cpus_per_task`, or else the CPU request will not be properly propagated to `Slurm`. There is ongoing discussion of this [issue](https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/141), so it may be resolved at some point in the future.
:::
##### Submission script
After configuring a profile and updating your rules, you'll need to make a `bash` script that runs your `Snakemake` pipeline:
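A minimal sketch is below (the script name `run_snakemake.bash`, the job name, and the exact `#SBATCH` values are assumptions; adjust them for your cluster and project layout):

```bash
#!/bin/bash
#SBATCH --job-name=run_snakemake
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G

# Run the pipeline, pointing Snakemake at the profile directory
# created above and at the main Snakefile
snakemake --workflow-profile profile --snakefile Snakefile
```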
The lines preceded by `#` tell `Slurm` that you'll need one CPU and 1GB of memory to start running the `Snakemake` pipeline. Once the `snakemake` command has been executed on this CPU, `Snakemake` will use the profile specified by `--workflow-profile` to begin submitting pipeline jobs to `Slurm`.
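You can then submit the script to `Slurm` (using the assumed file name from the sketch above):

```bash
sbatch run_snakemake.bash
```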
:::warning
If you run `Snakemake` by submitting a `bash` script to `Slurm`, you'll get the following warning message:
```txt
You are running snakemake in a SLURM job context. This is not recommended, as it may lead to unexpected behavior. Please run Snakemake directly on the login node.
```
This shouldn't be a big issue, but please post an issue on this repository if you run into unexpected behavior.
:::