|
3 | 3 |
|
4 | 4 | 
|
5 | 5 |
|
6 |
| -The official repo of the new modular BioExcel2 version of HADDOCK. |
7 | 6 |
|
8 | 7 | **ATTENTION: This repository is under heavy development and may change abruptly.**
|
9 | 8 |
|
10 |
| -*** |
11 |
| -## Stages |
| 9 | +# Changelog |
| 10 | + |
| 11 | +* 0.0.alpha1 (06-11-2019) |
| 12 | + * First version of the skeleton code, |
| 13 | + * Simple protein-protein with ambiguous restraints |
| 14 | + * Scoring of an ensemble of models (clustering included) |
| 15 | + |
| 16 | +# TODO |
| 17 | + |
| 18 | +**WIP** |
| 19 | + |
| 20 | +# Installation |
12 | 21 |
|
13 |
| -1. Initialization |
14 |
| -2. Pre-processing |
15 |
| -3. Topology generation |
16 |
| -4. Docking |
17 |
| - 1. Rigid-body |
18 |
| - 2. Semi-Flexible |
19 |
| - 3. Water-refinement |
| 22 | +* Requirements |
| 23 | + * [CNS 1.31 UU](https://surfdrive.surf.nl/files/index.php/apps/files/?dir=/Shared/HADDOCK/CNS&fileid=5041663829) |
| 24 | + * Python 3.7.x |
20 | 25 |
|
21 |
| -### 1. Initialization |
22 |
| -### 2. Pre-processing |
23 |
| -### 3. Topology generation |
24 |
| -### 4. Docking |
| 26 | +```bash |
| 27 | +$ git clone https://github.com/haddocking/haddock3.git |
| 28 | +$ cd haddock3 |
| 29 | +$ export PYTHONPATH=${PYTHONPATH}:$(pwd) |
| 30 | +$ pip install -r requirements.txt |
| 31 | +$ cd haddock/src |
| 32 | +$ make |
| 33 | +$ chmod +x haddock/src/* |
| 34 | +$ cd ../../ |
25 | 35 |
|
26 |
| -## Workflows |
| 36 | +# Edit "cns_exe" and "haddock3" in the ini script |
| 37 | +$ vim haddock/etc/haddock3.ini |
| 38 | +``` |
| 39 | + |
| 40 | +# Execution |
| 41 | + |
| 42 | +```bash |
| 43 | +$ cd examples/protein-protein |
| 44 | +$ python ../../haddock/setup_haddock.py run.toml |
| 45 | +$ cd run1 |
| 46 | +$ python ../../../haddock/run_haddock.py |
| 47 | +``` |
| 48 | + |
| 49 | +# Scoring |
| 50 | + |
| 51 | +```bash |
| 52 | +$ cd examples/protein-protein |
| 53 | +$ python ../../haddock/workflows/scoring/setup_scoring.py scoring.toml |
| 54 | +$ cd run-scoring-example |
| 55 | +$ python3 ../../../haddock/workflows/scoring/run_scoring.py |
| 56 | +``` |
27 | 57 |
|
28 |
| -* Scoring |
29 |
| -* Refinement |
30 | 58 |
|
31 | 59 | ***
|
32 |
| -# Dev information |
| 60 | +# Development |
| 61 | + |
| 62 | +The default recipes are refactored versions of the "legacy" |
| 63 | +protocols (`generate.inp`, `refine.inp`, `refine_h2o.inp` and `scoring.inp`) now called: |
| 64 | + |
| 65 | + * `generate-topology.cns` |
| 66 | + * `it0.cns` |
| 67 | + * `it1.cns` |
| 68 | + * `itw.cns` |
| 69 | + * `scoring.cns` |
| 70 | + |
| 71 | +These recipes are independent of each other and compatible with the "modular" way in which they are executed. |
| 72 | +For example: |
| 73 | + |
| 74 | +By using `@RUN:protocols/scale_inter_final.cns`, the `scale_inter_final.cns` script is executed alongside the main |
| 75 | +`CNS` instance, having access to all variable definitions. To emulate this we developed a Recipe module |
| 76 | +(`haddock.modules.worker.recipe`) that reads the main `CNS` code, identifies its dependency tree and recursively appends |
| 77 | + each of the scripts at the appropriate position. |
| 78 | + |
| 79 | + This step is done by `setup_haddock.py`, which also adjusts the user-specified parameters, saving a single template |
| 80 | + to, for example, `run1/topology/template/` resulting in a longer, but self-contained, `.inp`. With this design |
| 81 | + there is no need to copy over `protocols/` and `toppar/` to the simulation folder, reducing I/O and taking less space. |
| 82 | + |
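The recursive inlining described above can be sketched as follows. This is a minimal illustration, not the actual `haddock.modules.worker.recipe` code: the function name `inline_recipe`, the regular expression, and the assumption that every include is a bare `@RUN:<path>.cns` line (no argument list) are all hypothetical.

```python
import re
from pathlib import Path

# Matches a bare CNS include such as "@RUN:protocols/scale_inter_final.cns".
# Assumption: includes with an argument list, e.g. "@RUN:run.cns(...)",
# are out of scope for this sketch and are left untouched.
RUN_INCLUDE = re.compile(r"^\s*@RUN:(?P<path>\S+\.cns)\s*$")


def inline_recipe(script, root, _stack=()):
    """Return the text of `script` with every bare @RUN: include expanded in place.

    `root` is the directory that the RUN: prefix is resolved against
    (hypothetical; the real module may resolve paths differently).
    """
    script = Path(script)
    if script in _stack:
        # Guard against a script that (indirectly) includes itself.
        raise ValueError(f"circular include: {script}")
    expanded = []
    for line in script.read_text().splitlines():
        match = RUN_INCLUDE.match(line)
        if match:
            child = Path(root) / match.group("path")
            # Recurse so that the dependencies of a dependency are inlined too.
            expanded.append(inline_recipe(child, root, _stack + (script,)))
        else:
            expanded.append(line)
    return "\n".join(expanded)
```

Because each include is replaced at the position where it appeared, the resulting template keeps the original execution order while no longer depending on `protocols/` being present in the run folder.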
| 83 | +Having the execution separated in two steps allows the user to manually check the simulation or to prepare a large |
| 84 | +batch of files. Manually editing the templates, as one would edit `run.cns`, is still possible but should no longer be |
| 85 | +necessary. Each recipe has a companion `.json` that defines its default parameters (`it0.cns`/`it0.json`); |
| 86 | +non-default parameters are passed by the user via the `run.toml` file, an upgrade of `run.param` (or `new.html`). |
| 87 | + |
| 88 | + TOML ([Tom's Obvious, Minimal Language](https://github.com/toml-lang/toml)) was chosen since it is human readable |
| 89 | + (allowing for comments) and an efficient low-impact way of passing parameters. An example `run.toml`: |
| 90 | + |
| 91 | +```toml |
| 92 | +title = "HADDOCK3 Setup file" |
| 93 | +#===========================================================# |
| 94 | +[molecules] |
| 95 | +mol1 = '1AY7_r_u.pdb' |
| 96 | +mol1.segid = "A" |
| 97 | +mol2 = '1AY7_l_u.pdb' |
| 98 | +mol2.segid = "B" |
| 99 | + |
| 100 | +[restraints] |
| 101 | +ambig = 'ambig.tbl' |
| 102 | + |
| 103 | +[identifier] |
| 104 | +run = 1 |
| 105 | + |
| 106 | +[execution_parameters] |
| 107 | +scheme = 'parallel' |
| 108 | +nproc = 2 |
| 109 | + |
| 110 | +# Stage specific parameters |
| 111 | +[stage] |
| 112 | +[stage.topology] |
| 113 | +recipe='default' |
| 114 | + |
| 115 | +[stage.rigid_body] |
| 116 | +recipe='default' |
| 117 | +sampling = 200 |
| 118 | +params.auto_his = true |
| 119 | +params.noecv = false |
| 120 | + |
| 121 | +[stage.semi_flexible] |
| 122 | +recipe='default' |
| 123 | +sampling = 20 |
| 124 | + |
| 125 | +[stage.water_refinement] |
| 126 | +recipe='default' |
| 127 | +sampling = 20 |
| 128 | +#===========================================================# |
| 129 | +``` |
| 130 | + |
| 131 | +In this example the simulation scheme (reminiscent of the `Queue`) will be parallel. `it0` will run with |
| 132 | +`sampling = 200` and its default parameters, except for `auto_his` and `noecv`, which were set manually. `it1` and |
| 133 | +`itw` will use their defaults with `sampling = 20`. |
| 134 | + |
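How a stage's defaults and the user's `params.*` entries might be combined can be sketched like this. It is an illustration under stated assumptions: `resolve_stage_params` is a hypothetical helper, the default value shown for `auto_his` is made up for the example, and the stage dictionary is simply what a TOML parser would be expected to return for the `[stage.rigid_body]` table.

```python
import json


def resolve_stage_params(defaults_json, stage_config):
    """Merge a recipe's default parameters with the user's overrides.

    defaults_json: text of the companion .json file, e.g. it0.json
    stage_config:  dict for one stage, as parsed from run.toml
    """
    defaults = json.loads(defaults_json)["params"]
    overrides = stage_config.get("params", {})
    # Assumption: a parameter that the recipe does not define is a user
    # error, so it is rejected instead of being silently passed along.
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown parameter(s): {', '.join(unknown)}")
    # User values take precedence over the defaults.
    return {**defaults, **overrides}


# Hypothetical content of it0.json (only "noecv": true appears in the README).
it0_defaults = '{"params": {"noecv": true, "auto_his": false}}'

# What [stage.rigid_body] from the run.toml above would look like once parsed.
rigid_body = {
    "recipe": "default",
    "sampling": 200,
    "params": {"auto_his": True, "noecv": False},
}

print(resolve_stage_params(it0_defaults, rigid_body))
```

Stages that set no `params.*` entries, such as `semi_flexible` in the example, would simply get the defaults back unchanged.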
| 135 | +Running this setup file will create the following folder structure: |
| 136 | + |
| 137 | +``` |
| 138 | +run1/ |
| 139 | +
|
| 140 | +|_ data/ |
| 141 | + |_ ambig.tbl |
| 142 | + |_ mol1_1.pdb |
| 143 | + |_ mol2_1.pdb |
| 144 | + |_ run.toml |
| 145 | + |
| 146 | +|_ topology/ |
| 147 | + |_template/generate-topology.cns |
| 148 | + |
| 149 | +|_ rigid_body/ |
| 150 | + |_template/it0.cns |
| 151 | + |
| 152 | +|_ semi_flexible/ |
| 153 | + |_template/it1.cns |
| 154 | + |
| 155 | +|_ water_refinement/ |
| 156 | + |_template/itw.cns |
| 157 | +``` |
| 158 | + |
| 159 | +Here we define `one model = one task`, so during execution each `.inp`, `.out` and resulting files will be created |
| 160 | +inside its own folder: |
| 161 | + |
| 162 | +``` |
| 163 | +run1/topology/ |
| 164 | + |_ generate_0000001.inp |
| 165 | + |_ generate_0000001.out |
| 166 | + |_ generate_0000002.inp |
| 167 | + |_ generate_0000002.out |
| 168 | + |_ mol1_1.pdb |
| 169 | + |_ mol1_1.psf |
| 170 | + |_ mol2_1.pdb |
| 171 | + |_ mol2_1.psf |
| 172 | + |_ template/ |
| 173 | + |_ generate-topology.cns |
| 174 | +
|
| 175 | +``` |
| 176 | + |
| 177 | + |
| 178 | +**WIP** |
33 | 179 |
|
34 | 180 | ## CNS Refactoring guidelines
|
35 | 181 |
|
36 |
| -### Variables |
37 |
| -The stable implementation of HADDOCK relies heavily on global variables. |
| 182 | +### From global to local variables |
38 | 183 |
|
39 |
| -Example: `noecv` |
40 | 184 |
|
41 |
| -In `run.cns` the `noecv` variable is defined and evaluated to its global variable: |
| 185 | +Take the `noecv` parameter, which is responsible for the random removal of restraints. In `run.cns`, `noecv` is |
| 186 | +defined and evaluated as follows, becoming a global variable: |
42 | 187 |
|
43 | 188 | ```
|
44 |
| -define ( |
45 |
| -... |
46 |
| -{===>} noecv=true; |
47 |
| -... |
48 |
| -) |
| 189 | +define (...{===>} noecv=true;...) |
49 | 190 | ...
|
50 | 191 | evaluate (&data.noecv=&noecv)
|
51 |
| -... |
52 | 192 | ```
|
53 | 193 |
|
54 |
| -We can then see in `refine.inp` how this variable is used. First by reading it and make it usable as `$Data.noecv` anywhere in the code. |
| 194 | +This parameter is then used in `protocols/refine.inp`, first by reading it and then making it available to the rest of |
| 195 | +the code as |
| 196 | +`$Data.noecv`: |
55 | 197 |
|
56 | 198 | ```
|
57 |
| -@RUN:run.cns( |
58 |
| -... |
59 |
| -Data =$Data; |
60 |
| -... |
61 |
| -) |
62 |
| -... |
63 |
| -if ($Data.noecv eq true) then |
| 199 | +@RUN:run.cns(...Data=$Data;...) |
64 | 200 | ...
|
| 201 | +if ($Data.noecv eq true) then ... |
65 | 202 | ```
|
66 | 203 |
|
67 |
| -To achieve this behavious in a modular sense, part of the CNS code must be refactored to account for the new input |
68 |
| -method. In the new |
69 |
| -implementation variables will be defined on the fly, reading from a `json` parameter file instead of `run.cns` and |
70 |
| -written on the header of the input file. |
| 204 | +To get the same behaviour in the new implementation, the `CNS` code **must** be refactored. Variables are now defined |
| 205 | +"*on-the-fly*" by combining a `.json` with default values and a `.toml` with user-defined parameters, |
| 206 | +instead of directly editing `run1/run.cns`: |
71 | 207 |
|
72 |
| -Example: |
73 |
| -``` |
74 |
| -# Json file |
| 208 | +```json |
75 | 209 | {
|
76 | 210 | "params": {
|
77 | 211 | "noecv": true
|
78 | 212 | }
|
79 | 213 | }
|
80 |
| -
|
81 |
| -# CNS input file header |
82 |
| -evaluate ($noecv=true) |
83 | 214 | ```
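As a rough sketch of how such a parameter dictionary could be turned into `evaluate` statements for the header of a CNS input file: the helper names `cns_value` and `render_header` are hypothetical, and quoting strings with double quotes is an assumption about CNS syntax, not something taken from the HADDOCK3 code.

```python
def cns_value(value):
    """Format a Python value as a CNS literal (sketch, assumptions noted)."""
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: any other value is written as a double-quoted string.
    return f'"{value}"'


def render_header(params):
    """Return one `evaluate` statement per parameter, sorted by name."""
    return "\n".join(
        f"evaluate (${name}={cns_value(value)})"
        for name, value in sorted(params.items())
    )


print(render_header({"noecv": True}))
```

With the `params` block above, this would produce a header line defining `$noecv` as a plain variable, which is the form new protocols are expected to read.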
|
84 | 215 |
|
85 |
| -However `refine.inp` and its dependecies use `$Data.noecv` instead of `$noecv`, so this must be manually accounted |
86 |
| -for simply by adding the following to the top of the refactored CNS recipe. |
| 216 | +The issue arises because "legacy" `CNS` protocols use `$Data.noecv` instead of `$noecv`. To bypass this, the |
| 217 | +main `CNS` protocol of a given recipe **needs** to be manually edited: |
87 | 218 | ```
|
88 | 219 | evaluate ($Data.noecv=$noecv)
|
89 | 220 | ```
|
90 | 221 |
|
91 |
| -Future edits should not use the `$Data.` format and stick to the literal variable instead. |
92 |
| -*** |
| 222 | +New `CNS` protocols should rely on the literal name of the variable, `$noecv` instead of `$Data.noecv`. |