Commit 498abef

documentation and prot-prot example
1 parent 66c7a0b commit 498abef

11 files changed

+1136085
-57
lines changed

README.md

+180-50
@@ -3,90 +3,220 @@
33

44
![HADDOCK3](docs/media/HADDOCK3-logo.png)
55

6-
The official repo of the new modular BioExcel2 version of HADDOCK.
76

87
**ATTENTION: This repository is under heavy development and may change abruptly.**
98

10-
***
11-
## Stages
9+
# Changelog
10+
11+
* 0.0.alpha1 (06-11-2019)
12+
* First version of the skeleton code,
13+
* Simple protein-protein with ambiguous restraints
14+
* Scoring of an ensemble of models (clustering included)
15+
16+
# TODO
17+
18+
**WIP**
19+
20+
# Installation
1221

13-
1. Initialization
14-
2. Pre-processing
15-
3. Topology generation
16-
4. Docking
17-
1. Rigid-body
18-
2. Semi-Flexible
19-
3. Water-refinement
22+
* Requirements
23+
* [CNS 1.31 UU](https://surfdrive.surf.nl/files/index.php/apps/files/?dir=/Shared/HADDOCK/CNS&fileid=5041663829)
24+
* Python 3.7.x
2025

21-
### 1. Initialization
22-
### 2. Pre-processing
23-
### 3. Topology generation
24-
### 4. Docking
26+
```bash
27+
$ git clone https://github.com/haddocking/haddock3.git
28+
$ cd haddock3
29+
$ export PYTHONPATH=${PYTHONPATH}:$(pwd)
30+
$ pip install -r requirements.txt
31+
$ cd haddock/src
32+
$ make
33+
$ chmod +x ./*
34+
$ cd ../../
2535

26-
## Workflows
36+
# Edit "cns_exe" and "haddock3" in the ini script
37+
$ vim haddock/etc/haddock3.ini
38+
```
39+
40+
# Execution
41+
42+
```bash
43+
$ cd examples/protein-protein
44+
$ python ../../haddock/setup_haddock.py run.toml
45+
$ cd run1
46+
$ python ../../../haddock/run_haddock.py
47+
```
48+
49+
# Scoring
50+
51+
```bash
52+
$ cd examples/protein-protein
53+
$ python ../../haddock/workflows/scoring/setup_scoring.py scoring.toml
54+
$ cd run-scoring-example
55+
$ python3 ../../../haddock/workflows/scoring/run_scoring.py
56+
```
2757

28-
* Scoring
29-
* Refinement
3058

3159
***
32-
# Dev information
60+
# Development
61+
62+
The default recipes are refactored versions of the "legacy"
63+
protocols (`generate.inp`, `refine.inp`, `refine_h2o.inp` and `scoring.inp`) now called:
64+
65+
* `generate-topology.cns`
66+
* `it0.cns`
67+
* `it1.cns`
68+
* `itw.cns`
69+
* `scoring.cns`
70+
71+
These recipes are independent of each other and compatible with the "modular" manner in which they are executed.
72+
For example:
73+
74+
By using `@RUN:protocols/scale_inter_final.cns`, the `scale_inter_final.cns` script is executed alongside the main
75+
`CNS` instance, having access to all variable definitions. To emulate this we developed a Recipe module
76+
(`haddock.modules.worker.recipe`) that reads the main `CNS` code, identifies its dependency tree and recursively appends
77+
each of the scripts to the appropriate position.
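
The recursive expansion described above can be sketched as follows. This is a minimal illustration, not the actual `haddock.modules.worker.recipe` code: the function name `inline_recipe`, the in-memory `sources` mapping (standing in for files on disk), the `protocols/helper.cns` script and the placeholder script contents are all assumptions, and the pattern only handles bare `@RUN:` lines without an argument list.

```python
import re

# Matches a whole line of the form "@RUN:protocols/some_script.cns".
# Includes that pass arguments, e.g. "@RUN:run.cns(...)", are not handled here.
INCLUDE = re.compile(r"^\s*@RUN:(\S+)\s*$")


def inline_recipe(name, sources, _stack=()):
    """Return the text of `name` with every bare @RUN: include expanded in place.

    `sources` is a hypothetical mapping of script path -> script text that
    stands in for reading the scripts from disk.
    """
    if name in _stack:
        raise ValueError("circular include: " + " -> ".join(_stack + (name,)))
    expanded = []
    for line in sources[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Recurse so includes inside the included script are expanded too.
            expanded.append(inline_recipe(match.group(1), sources, _stack + (name,)))
        else:
            expanded.append(line)
    return "\n".join(expanded)


# Placeholder script contents, for illustration only.
sources = {
    "it0.cns": "evaluate ($noecv=true)\n@RUN:protocols/scale_inter_final.cns\nstop",
    "protocols/scale_inter_final.cns": "! scaling\n@RUN:protocols/helper.cns",
    "protocols/helper.cns": "! helper",
}
template = inline_recipe("it0.cns", sources)
print(template)
```

Each included script lands at the position of its `@RUN:` line, so the resulting template no longer depends on the `protocols/` folder. The `_stack` argument guards against scripts that include each other in a loop.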
78+
79+
This step is done by `setup_haddock.py`, which also adjusts the user-specified parameters, saving a single template
80+
to, for example, `run1/topology/template/` resulting in a longer, but self-contained, `.inp`. With this design
81+
there is no need to copy over `protocols/` and `toppar/` to the simulation folder, reducing I/O and taking less space.
82+
83+
Having the execution separated in two steps allows the user to manually check the simulation or to prepare a large
84+
batch of files. Manually editing the templates, as one would edit `run.cns`, is still possible but should no longer be
85+
necessary. Each recipe has a companion `.json` that defines its default parameters (`it0.cns`/`it0.json`);
86+
non-default parameters are passed by the user via the `run.toml` file, an upgrade of `run.param` (or `new.html`).
87+
88+
TOML ([Tom's Obvious, Minimal Language](https://github.com/toml-lang/toml)) was chosen since it is human readable
89+
(allowing for comments) and an efficient, low-impact way of passing parameters. An example `run.toml`:
90+
91+
```toml
92+
title = "HADDOCK3 Setup file"
93+
#===========================================================#
94+
[molecules]
95+
mol1 = '1AY7_r_u.pdb'
96+
mol1.segid = "A"
97+
mol2 = '1AY7_l_u.pdb'
98+
mol2.segid = "B"
99+
100+
[restraints]
101+
ambig = 'ambig.tbl'
102+
103+
[identifier]
104+
run = 1
105+
106+
[execution_parameters]
107+
scheme = 'parallel'
108+
nproc = 2
109+
110+
# Stage specific parameters
111+
[stage]
112+
[stage.topology]
113+
recipe='default'
114+
115+
[stage.rigid_body]
116+
recipe='default'
117+
sampling = 200
118+
params.auto_his = true
119+
params.noecv = false
120+
121+
[stage.semi_flexible]
122+
recipe='default'
123+
sampling = 20
124+
125+
[stage.water_refinement]
126+
recipe='default'
127+
sampling = 20
128+
#===========================================================#
129+
```
130+
131+
In this example the simulation scheme (reminiscent of the `Queue`) is parallel. `it0` uses its default
132+
parameters, except for `auto_his` and `noecv`, which are set manually, and runs with `sampling=200`. `it1` and
133+
`itw` use their defaults with `sampling=20`.
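
How the recipe defaults and the user's stage parameters combine can be sketched as below. This is an assumption-laden illustration rather than the project's code: the contents of `IT0_DEFAULTS` are hypothetical (the real `it0.json` may differ), `merge_params` is an invented name, and `user_stage` is written out by hand as the dictionary a TOML parser would be expected to return for the `[stage.rigid_body]` table.

```python
import copy
import json

# Hypothetical contents of it0.json; the real defaults may differ.
IT0_DEFAULTS = json.loads('{"params": {"auto_his": false, "noecv": true}}')

# Hand-written equivalent of the parsed [stage.rigid_body] table.
user_stage = {
    "recipe": "default",
    "sampling": 200,
    "params": {"auto_his": True, "noecv": False},
}


def merge_params(defaults, stage):
    """Return the recipe defaults overridden by the user's stage parameters."""
    merged = copy.deepcopy(defaults["params"])
    overrides = stage.get("params", {})
    unknown = set(overrides) - set(merged)
    if unknown:
        # Rejecting unknown names is a design choice of this sketch.
        raise KeyError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    merged.update(overrides)
    return merged


rigid_body = merge_params(IT0_DEFAULTS, user_stage)
# A stage without a "params" entry simply keeps the defaults.
untouched = merge_params(IT0_DEFAULTS, {"recipe": "default", "sampling": 20})
print(rigid_body, untouched)
```

Copying the defaults before updating keeps the loaded `.json` data unchanged, so the same defaults can be reused for several stages.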
134+
135+
Running this setup file will create the following folder structure:
136+
137+
```
138+
run1/
139+
140+
|_ data/
141+
|_ ambig.tbl
142+
|_ mol1_1.pdb
143+
|_ mol2_1.pdb
144+
|_ run.toml
145+
146+
|_ topology/
147+
|_template/generate-topology.cns
148+
149+
|_ rigid_body/
150+
|_template/it0.cns
151+
152+
|_ semi_flexible/
153+
|_template/it1.cns
154+
155+
|_ water_refinement/
156+
|_template/itw.cns
157+
```
158+
159+
Here we define `one model = one task`, so during execution each `.inp`, `.out` and resulting files will be created
160+
inside its own folder:
161+
162+
```
163+
run1/topology/
164+
|_ generate_0000001.inp
165+
|_ generate_0000001.out
166+
|_ generate_0000002.inp
167+
|_ generate_0000002.out
168+
|_ mol1_1.pdb
169+
|_ mol1_1.psf
170+
|_ mol2_1.pdb
171+
|_ mol2_1.psf
172+
|_ template/
173+
|_ generate-topology.cns
174+
175+
```
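
The per-model naming shown in the listing above can be reproduced with a small helper. This is only a sketch: `task_files` is a hypothetical name, and the seven-digit zero padding is inferred from the file names in the listing rather than taken from the code.

```python
def task_files(prefix, n_models, width=7):
    """Return (input, output) file names for each model, numbered from 1."""
    return [
        (f"{prefix}_{i:0{width}d}.inp", f"{prefix}_{i:0{width}d}.out")
        for i in range(1, n_models + 1)
    ]


files = task_files("generate", 2)
for inp, out in files:
    print(inp, out)
```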
176+
177+
178+
**WIP**
33179

34180
## CNS Refactoring guidelines
35181

36-
### Variables
37-
The stable implementation of HADDOCK relies heavily on global variables.
182+
### From global to local variables
38183

39-
Example: `noecv`
40184

41-
In `run.cns` the `noecv` variable is defined and evaluated to its global variable:
185+
Take the `noecv` parameter, which is responsible for random removal of restraints. In `run.cns`, `noecv` is
186+
defined and evaluated as follows, becoming a global variable:
42187

43188
```
44-
define (
45-
...
46-
{===>} noecv=true;
47-
...
48-
)
189+
define (...{===>} noecv=true;...)
49190
...
50191
evaluate (&data.noecv=&noecv)
51-
...
52192
```
53193

54-
We can then see in `refine.inp` how this variable is used. First by reading it and make it usable as `$Data.noecv` anywhere in the code.
194+
This parameter is then called in `protocols/refine.inp`, first by reading it and then making it usable in the rest of
195+
the code as
196+
`$Data.noecv`.
55197

56198
```
57-
@RUN:run.cns(
58-
...
59-
Data =$Data;
60-
...
61-
)
62-
...
63-
if ($Data.noecv eq true) then
199+
@RUN:run.cns(...Data=$Data;...)
64200
...
201+
if ($Data.noecv eq true) then ...
65202
```
66203

67-
To achieve this behavious in a modular sense, part of the CNS code must be refactored to account for the new input
68-
method. In the new
69-
implementation variables will be defined on the fly, reading from a `json` parameter file instead of `run.cns` and
70-
written on the header of the input file.
204+
To get the same behaviour in the new implementation, the `CNS` code **must** be refactored. Variables are now defined
205+
"*on-the-fly*" by combining a `.json` with default values and a `.toml` with user custom parameters,
206+
instead of directly editing `run1/run.cns`:
71207

72-
Example:
73-
```
74-
# Json file
208+
```json
75209
{
76210
"params": {
77211
"noecv": true
78212
}
79213
}
80-
81-
# CNS input file header
82-
evaluate ($noecv=true)
83214
```
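
Turning such a parameter file into `CNS` variable definitions could look like the sketch below, which writes one `evaluate` statement per parameter for the header of the input file. The function names `cns_value` and `render_header` are hypothetical, and the handling of non-boolean values (in particular quoting strings) is an assumption.

```python
import json


def cns_value(value):
    """Format a Python value as a CNS literal (covers only simple types)."""
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return '"{}"'.format(value)  # quoting strings is an assumption of this sketch


def render_header(params):
    """Return one `evaluate` line per parameter, sorted by name."""
    return "\n".join(
        "evaluate (${}={})".format(name, cns_value(value))
        for name, value in sorted(params.items())
    )


defaults = json.loads('{"params": {"noecv": true}}')
header = render_header(defaults["params"])
print(header)
```

Sorting the names keeps the generated header stable between runs, which makes templates easier to compare.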
84215

85-
However `refine.inp` and its dependecies use `$Data.noecv` instead of `$noecv`, so this must be manually accounted
86-
for simply by adding the following to the top of the refactored CNS recipe.
216+
The issue arises because "legacy" `CNS` protocols use `$Data.noecv` instead of `$noecv`. To work around this, the
217+
main `CNS` protocol of a given recipe **needs** to be manually edited:
87218
```
88219
evaluate ($Data.noecv=$noecv)
89220
```
90221

91-
Future edits should not use the `$Data.` format and stick to the literal variable instead.
92-
***
222+
New `CNS` protocols should rely on the literal name of the variable, `$noecv` instead of `$Data.noecv`.
