# LEGEND L200 dataflow

- *Note: Still work in progress.*
-
Implementation of an automatic data processing flow for L200
data, based on
[Snakemake](https://snakemake.readthedocs.io/).
@@ -10,23 +8,12 @@ data, based on
## Configuration

Data processing resources are configured via a single site-dependent (and
- possibly user-dependent) configuration file, named "config.json" in the
+ possibly user-dependent) configuration file, named `config.json` in the
following. You may choose an arbitrary name, though.

Use the included [templates/config.json](templates/config.json) as a template
- and adjust the data base paths as necessary.
-
- When running Snakemake, the path to the config file *must* be provided via
- `--configfile=path/to/configfile.json`. For example, run
-
- ```shell
- snakemake -j`nproc` --configfile=config.json file_to_generate
- ```
-
- ## Snakefile
-
- Snakemake is controlled using the Snakefile which specifies the rules to generate each file.
- The path to the Snakefile *must* be provided via `--snakefile path/to/Snakefile`.
+ and adjust the data base paths as necessary. Note that, when running Snakemake,
+ the default path to the config file is `./config.json`.


## Key-Lists
@@ -47,11 +34,9 @@ which will generate the list of available file keys for all l200 files, resp.
a specific period, or a specific period and run, etc.

For example:
-
```shell
- snakemake -j4 --configfile=config.json all-l200-myper.keylist
+ $ snakemake all-l200-myper.keylist
```
-
will generate a key-list with all files regarding period `myper`.

@@ -68,9 +53,8 @@ For file lists based on auto-generated key-lists like
automatically, if it doesn't exist.

Example:
-
```shell
- snakemake -j4 --configfile=config.json all-mydet-mymeas-tier2.filelist
+ $ snakemake all-mydet-mymeas-tier2.filelist
```

File-lists may of course also be derived from custom keylists, generated
@@ -92,11 +76,9 @@ and produce all possible output for the given data tier, based on available
tier0 files which match the target.

Example:
-
```shell
- snakemake -j `nproc` --configfile=config.json all-mydet-mymeas-tier2.gen
+ $ snakemake all-mydet-mymeas-tier2.gen
```
-
Targets like `my-dataset-raw.gen` (derived from a key-list
`my-dataset.keylist`) are of course allowed as well.

@@ -107,18 +89,14 @@ Snakemake supports monitoring by connecting to a
[panoptes](https://github.com/panoptes-organization/panoptes) server.

Run (e.g.)
-
```shell
- panoptes --port 5000
-
+ $ panoptes --port 5000
```
-
in the background to run a panoptes server instance, which comes with a
GUI that can be accessed with a web browser on the specified port.

Then use the Snakemake option `--wms-monitor` to instruct Snakemake to push
progress information to the panoptes server:
-
```shell
snakemake --wms-monitor http://127.0.0.1:5000 [...]
```
@@ -131,24 +109,9 @@ instead supports Singularity containers via
for greater control.

To use this, the path to `venv` and the name of the environment must be set
- in "config.json".
+ in `config.json`.

This is only relevant when running Snakemake *outside* of the software
container, e.g. when using a batch system (see below). If Snakemake
and the whole workflow is run inside of a container instance, no
- container-related settings in "config.json" are required.
-
-
- ## Running on a batch system
-
- A template configuration to run the dataflow on an SGE batch system is
- included in [templates/snakemake-config](templates/snakemake-config).
- Copy the configuration into `"$HOME/.config/snakemake"` and adjust as
- necessary (especially batch-queue selection, number of jobs, etc.).
-
- You should then be able to run data production on the batch system via
- (e.g.):
-
- ```shell
- snakemake --profile cluster-sge --jobs 20 --configfile=config.json all-l200-myper-dsp.gen
- ```
+ container-related settings in `config.json` are required.
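
The shortened examples in this change rely on the config file being found at its default location. As a minimal sketch, assuming the workflow picks up `./config.json` by default (as the Configuration section now states) and that `--configfile` still applies to any other location, the choice between the two command forms could look like the following. `snakemake_cmd` is a hypothetical helper, not part of the repository. It only prints the command line and does not run Snakemake.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the snakemake
# command line for a target. --configfile is added only when the config
# file is NOT at the assumed default location ./config.json.
snakemake_cmd() {
    local target="$1"
    local config="${2:-config.json}"   # assumption: default config path
    if [ "$config" = "config.json" ] || [ "$config" = "./config.json" ]; then
        echo "snakemake $target"
    else
        echo "snakemake --configfile=$config $target"
    fi
}

# Default location: no extra option needed.
snakemake_cmd all-l200-myper.keylist
# Custom location (illustrative path): pass it explicitly.
snakemake_cmd all-l200-myper.keylist site/my-config.json
```

Printing the command instead of executing it keeps the sketch safe to try without a working Snakemake installation.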