
Running the analysis

Stephany Orjuela edited this page Jun 13, 2019 · 34 revisions

Print the list of commands without executing anything

It might be useful to have a look at what will happen if you start the ARMOR workflow with your current setup. To do this, run snakemake --use-conda -npr, or snakemake -npr if you do not want to use conda. The -n parameter triggers a dry run, i.e. nothing is executed and Snakemake only displays what would be done. The -p parameter prints the shell commands that the pipeline would execute; this is useful for checking that file paths are correct. The -r parameter prints the reason why each rule would be executed (e.g. missing output file, newer input timestamp, ...).

Run with --use-conda

If all the paths and individual configurations are defined in the config.yaml file (see configuration) and conda is available (see managing software), the workflow can be run from the command line with

snakemake --use-conda

Snakemake will create a conda environment from the envs/environment.yaml file and it will activate the environment before executing all rules with the conda directive. If you are an experienced conda user, you can specify a different environment for each rule within the Snakefile, and the correct environment will be activated.
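As a sketch of the per-rule approach (the rule name, file paths and environment file below are hypothetical, not part of ARMOR), a rule in the Snakefile can declare its own environment with the conda directive:

```
rule fastqc:
    input:
        "FASTQ/{sample}.fastq.gz"
    output:
        "FastQC/{sample}_fastqc.zip"
    conda:
        "envs/fastqc.yaml"
    shell:
        "fastqc -o FastQC {input}"
```

When invoked with --use-conda, Snakemake creates the environment from the file named in the conda directive and activates it only for that rule.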

If you want to use multiple cores, use

snakemake --use-conda --cores 12

This sets the total number of CPU cores used to run the entire workflow. By default, it uses only 1 CPU core.

This is different from setting ncores: in the config.yaml. For example, with --cores 12 and ncores: 2, I would be running 6 jobs in parallel for the multi-threading rules (FastQC, STAR, Salmon and DRIMSeq), but each job would be using 2 cores (thanks @matrs for the clarification). This means that ncores has to be smaller than (or equal to) the value passed to --cores.
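The interplay can be sketched with a bit of shell arithmetic (the numbers are just the example values above, not defaults):

```shell
# Example values: --cores 12 on the command line, ncores: 2 in config.yaml
TOTAL_CORES=12
NCORES=2
# Upper bound on multi-threaded jobs Snakemake can schedule at once
PARALLEL_JOBS=$((TOTAL_CORES / NCORES))
echo "${PARALLEL_JOBS} jobs in parallel, ${NCORES} cores each"
```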

You can look at an example here for more details.

If you want to run a specific rule, just do

snakemake --use-conda <ruleName>

Run from a different directory

First make sure the paths to all your input files are specified correctly in the config.yaml (see here). Relative paths will be interpreted relative to the Snakefile directory! To run the workflow outside of the folder containing the Snakefile (and all the scripts), specify the path to the Snakefile and the path to the folder containing it:

snakemake --use-conda -s <path-to-Snakefile> -d <workdir>

where <workdir> is the directory containing the Snakefile.

Run by specifying the config.yaml

If you want to use a config.yaml file that is not located in the Snakefile directory, you can specify it with the --configfile parameter. Run the workflow from the Snakefile directory with

snakemake --use-conda --configfile <path-to-config.yaml>

Or see above for how to run the workflow from an arbitrary directory.
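Putting the two options together, a run from an arbitrary directory with a custom configuration file could be assembled like this (all paths are hypothetical placeholders):

```shell
# Hypothetical paths; adjust them to your setup
SNAKEFILE=/path/to/ARMOR/Snakefile
WORKDIR=/path/to/ARMOR
CONFIG=/path/to/my_config.yaml
# Assemble and display the full invocation
CMD="snakemake --use-conda -s ${SNAKEFILE} -d ${WORKDIR} --configfile ${CONFIG}"
echo "${CMD}"
```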

Run by manually creating a conda environment

After setting up your conda environment and system R installation (read first here), activate the environment and from within the environment run the pipeline with snakemake. For multiple cores use snakemake --cores 12.

Run without conda

In case you have all the necessary software in your path (see here) and you don't want to use conda, simply run the workflow without the --use-conda parameter:

snakemake

Summary: If you do not want to use conda to manage your software (i.e. run modes 2 and 3 of Managing software), simply omit the --use-conda parameter from the example commands.

Force re-execution upon configuration parameter changes

If invoked as described above, snakemake will execute a rule if the output is out-of-date with respect to the input, as determined by the time stamps of the corresponding files. In order to force re-execution in cases where the parameters (defined in the config.yaml file) have changed, call snakemake as:

snakemake --use-conda -R `snakemake --list-params-changes`

--list-params-changes will list the files that use any of the updated parameters, and -R will force their regeneration. See here for more details.

Get a nice summary of your workflow run

Use

snakemake -D > summary.txt

to generate a detailed summary of your workflow's output files after the run is finished (or to see the status of the output files at any time), without re-running the workflow. As explained in the snakemake manual, the -D (or --detailed-summary) flag prints a summary of all files created by the workflow, with the following columns: filename, modification time, rule version, input file(s), shell command, status, plan. One useful aspect of this is that you can easily retrieve the shell command that was used to generate each output file.

You can also use

snakemake --report report.html

to generate a nice visual report of run times and statistics of your workflow run.
