Commit 9c6b857

First complete draft of the documentation, added clustering result
1 parent c2bbe26 commit 9c6b857

2 files changed, +239 -11 lines changed

docs/README.md

+231-11
@@ -78,13 +78,13 @@ We found reasonable those values for number of swarms and glowworms per swarm in
7878

7979
Below is a description of the remaining parameters accepted by `lightdock_setup`:
8080

81-
- **seed_points** *STARTING_POINTS_SEED*: An integer can be specified as the seed used in the random number generator of the initial random poses of the ligand.
82-
- **ft** *ftdock_file*: LightDock can read the output of the venerable [FTDock](http://www.sbg.bio.ic.ac.uk/docking/ftdock.html) software in order to use the FTDock rigid-body predictions as the starting poses of the LightDock simulation. In order to do so, `lightdock_setup` classifies the different FTDock predictions according to its translation into the corresponding swarm over the surface of the receptor.
83-
- **noxt**: If this option is enabled, LightDock ignores OXT atoms. This is useful for several scoring functions which don't understand this special type of atom.
84-
- **anm**: If this option is enabled, the ANM mode is activated and backbone flexibility is modeled using ANM (via ProDy).
85-
- **seed_anm** *ANM_SEED*: An integer can be specified as the seed used in the random number generator of ANM normal modes extents.
86-
- **anm_rec** *ANM_REC*: The number of non-trivial normal modes calculated for the receptor in the ANM mode.
87-
- **anm_lig** *ANM_LIG*: The number of non-trivial normal modes calculated for the ligand in the ANM mode.
81+
- **--seed_points** *STARTING_POINTS_SEED*: An integer can be specified as the seed used in the random number generator of the initial random poses of the ligand.
82+
- **--ft** *ftdock_file*: LightDock can read the output of the venerable [FTDock](http://www.sbg.bio.ic.ac.uk/docking/ftdock.html) software in order to use the FTDock rigid-body predictions as the starting poses of the LightDock simulation. In order to do so, `lightdock_setup` classifies the different FTDock predictions according to their translation into the corresponding swarm over the surface of the receptor.
83+
- **--noxt**: If this option is enabled, LightDock ignores OXT atoms. This is useful for several scoring functions which don't understand this special type of atom.
84+
- **--anm**: If this option is enabled, the ANM mode is activated and backbone flexibility is modeled using ANM (via ProDy).
85+
- **--seed_anm** *ANM_SEED*: An integer can be specified as the seed used in the random number generator of ANM normal modes extents.
86+
- **--anm_rec** *ANM_REC*: The number of non-trivial normal modes calculated for the receptor in the ANM mode.
87+
- **--anm_lig** *ANM_LIG*: The number of non-trivial normal modes calculated for the ligand in the ANM mode.
8888

8989
### 2.1. Results of the setup
9090

@@ -111,19 +111,239 @@ After the execution of `lightdock_setup` script, several files and directories w
111111

112112
## 3. Run a simulation
113113

114-
TBC
114+
### 3.1. Parameters
115+
116+
In order to run a LightDock simulation, the `lightdock` script has to be executed. If the script is executed without arguments, a list of accepted options is displayed:
117+
118+
```bash
119+
usage: lightdock [-h] [-f configuration_file] [-s SCORING_FUNCTION]
120+
[-sg GSO_SEED] [-t TRANSLATION_STEP] [-r ROTATION_STEP] [-V]
121+
[-c CORES] [--profile] [-mpi] [-ns NMODES_STEP] [-min]
122+
setup_file steps
123+
lightdock: error: too few arguments
124+
```
125+
126+
The simplest way to execute a LightDock simulation is:
127+
128+
```bash
129+
lightdock setup.json 10
130+
```
131+
132+
The first parameter is the configuration file generated in the setup step; the second is the number of steps of the simulation.
133+
134+
The remaining arguments accepted by `lightdock` are:
135+
136+
- **-f** *configuration_file*: This is a special file containing the different parameters of the GSO algorithm. It does not normally need to be changed, but advanced users might want to adjust some of the values. Here is an example of the content of this file:
137+
138+
```
139+
##
140+
#
141+
# GlowWorm configuration file - algorithm parameters
142+
#
143+
##
144+
145+
[GSO]
146+
147+
# Rho
148+
rho = 0.4
149+
150+
# Gamma
151+
gamma = 0.6
152+
153+
# Initial Luciferin
154+
initialLuciferin = 5.0
155+
156+
# Initial glowworm vision range (in A)
157+
initialVisionRange = 15.0
158+
159+
# Max vision range (in A)
160+
maximumVisionRange = 40.0
161+
162+
# Beta
163+
beta = 0.16
164+
165+
# Max number of neighbors
166+
maximumNeighbors = 5
167+
```
168+
169+
These are the parameters used in the LightDock publication, many of them inherited from the original GSO publication. Please refer to [Kaipa, Krishnanand N. and Ghose, Debasish](https://www.springer.com/gp/book/9783319515946) for more details.
170+
171+
- **-s** *SCORING_FUNCTION*: Probably one of the most important parameters of the simulation. This flag changes the default scoring function (DFIRE). It accepts either the name of a scoring function or a file containing the names and weights of multiple scoring functions. See section 3.2 for a complete list of accepted scoring functions and how to combine them.
172+
- **-c** *CORES*: By default, LightDock makes use of the total number of available CPU cores on the hardware to run the simulation, but a different number of CPU cores can be specified via this option.
173+
- **-mpi**: If this flag is activated, LightDock will make use of the mpi4py library in order to spread the computation across different nodes.
174+
- **--profile**: This is an experimental flag intended for profiling the computation time and memory used by LightDock.
175+
- **-sg** *GSO_SEED*: The integer used as a seed for the random number generator in charge of running the simulation. Different seeds will produce different simulation outputs.
176+
- **-t** *TRANSLATION_STEP*: When the translation part of the optimization vector is interpolated, this parameter controls the interpolation point. It is set to 0.5 by default.
177+
- **-r** *ROTATION_STEP*: When the rotation part of the optimization vector is interpolated (using [quaternion SLERP](https://en.wikipedia.org/wiki/Slerp#Quaternion_Slerp)), this parameter controls the interpolation point. It is set to 0.5 by default.
178+
- **-ns** *NMODES_STEP*: When the ANM normal modes extent part of the optimization vector is interpolated, this parameter controls the interpolation point. It is set to 0.5 by default.
179+
- **-min**: If this option is enabled, a local minimization of the best glowworm in terms of scoring is performed for each step, at each swarm. The algorithm used is the Powell ([fmin_powell](https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.optimize.fmin_powell.html)) implementation from the scipy.optimize library.
180+
- **-V**: Displays the LightDock version.
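
The `-f` configuration file shown in the list above uses an INI-like layout: a `[GSO]` section with `key = value` pairs and `#` comments. As a minimal sketch, and not LightDock's own loader, such a file could be inspected with Python's standard `configparser` module. The helper name `read_gso_parameters` is hypothetical.

```python
import configparser

# Sample content copied from the configuration file shown in this section.
SAMPLE = """
[GSO]
# Rho
rho = 0.4
# Gamma
gamma = 0.6
# Initial Luciferin
initialLuciferin = 5.0
# Initial glowworm vision range (in A)
initialVisionRange = 15.0
# Max vision range (in A)
maximumVisionRange = 40.0
# Beta
beta = 0.16
# Max number of neighbors
maximumNeighbors = 5
"""


def read_gso_parameters(text):
    """Hypothetical helper: return the [GSO] section as a dict of numbers."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive (configparser lowercases them by default).
    parser.optionxform = str
    parser.read_string(text)
    section = parser["GSO"]
    params = {name: section.getfloat(name) for name in section}
    # maximumNeighbors is a count, so store it as an integer.
    params["maximumNeighbors"] = section.getint("maximumNeighbors")
    return params


params = read_gso_parameters(SAMPLE)
print(params["rho"], params["maximumNeighbors"])
```

Reading the file this way only helps to check or compare values; the file itself is still passed to `lightdock` through the `-f` flag.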
181+
182+
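
The `-t`, `-r` and `-ns` options each set an interpolation point between two values. As an illustrative sketch only (the helper names `lerp` and `slerp` are hypothetical and this is not LightDock's code), linear interpolation of a translation and quaternion SLERP of a rotation could look like this, assuming unit quaternions stored as `[w, x, y, z]`:

```python
import math


def lerp(a, b, t):
    """Linear interpolation between two vectors at point t in [0, 1]."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(x * y for x, y in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shorter arc.
    if dot < 0.0:
        q1 = [-y for y in q1]
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation
        # avoids dividing by a sine that is close to zero.
        q = lerp(q0, q1, t)
        norm = math.sqrt(sum(x * x for x in q))
        return [x / norm for x in q]
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * x + s1 * y for x, y in zip(q0, q1)]


identity = [1.0, 0.0, 0.0, 0.0]
# Rotation of 90 degrees about the z axis.
quarter_turn = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]

# With the default step of 0.5 the result lies halfway between both poses:
# a rotation of 45 degrees about z and the midpoint of the two translations.
midpoint = slerp(identity, quarter_turn, 0.5)
translation = lerp([0.0, 0.0, 0.0], [10.0, 4.0, -2.0], 0.5)
print(midpoint, translation)
```

A step of 0.5 therefore moves a pose halfway towards its target, while smaller values give shorter moves.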
183+
### 3.2. Available scoring functions
184+
185+
The complete list of scoring functions implemented in LightDock is:
186+
187+
- `cpydock`: Implementation in C of the [pyDock](https://www.ncbi.nlm.nih.gov/pubmed/17444519) scoring function
188+
- `dfire`: Implementation of the [DFIRE](https://www.ncbi.nlm.nih.gov/pubmed/15162489) scoring function in Cython.
189+
- `fastdfire`: Implementation of the DFIRE scoring function using the Python C-API, faster than `dfire`.
190+
- `dfire2`: Implementation of the [DFIRE2](https://www.ncbi.nlm.nih.gov/pubmed/18469178) scoring function using the Python C-API, although a Cython version is also included for demonstration purposes.
191+
- `dna`: Implementation of the pyDockDNA scoring function (no desolvation) and custom Van der Waals weight for protein-DNA docking. Implemented using the Python C-API.
192+
- `mj3h`: Pairwise contact energies for 20 types of residues, [Mj3h](https://www.ncbi.nlm.nih.gov/pubmed/10336383).
193+
- `pisa`: A statistical potential from the [Improving ranking of models for protein complexes with side chain modeling and atomic potentials](https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.24214) publication.
194+
- `sd`: An electrostatics and Van der Waals based scoring function as described in the [SwarmDock publication](https://www.ncbi.nlm.nih.gov/pubmed/21152290), but using AMBER94 force-field charges and parameters.
195+
- `sipper`: Intermolecular pairwise propensities of exposed residues, [SIPPER](https://www.ncbi.nlm.nih.gov/pubmed/21214199).
196+
- `tobi`: [TOBI](https://bmcstructbiol.biomedcentral.com/articles/10.1186/1472-6807-10-40) potentials scoring function.
197+
- `vdw`: A truncated Van der Waals (Lennard-Jones potential) as described in the original [pyDock](https://www.ncbi.nlm.nih.gov/pubmed/17444519) publication.
198+
199+
#### 3.2.1. Multiple scoring functions
200+
201+
Several scoring functions can be used simultaneously by LightDock during the minimization. Each glowworm in the simulation holds a model for each scoring function, so physical memory can limit the number of scoring functions used simultaneously.
202+
203+
A file containing the name of each scoring function and its weight can be defined as in this example:
204+
205+
```bash
206+
cat scoring.conf
207+
sipper 0.5
208+
dfire 0.8
209+
```
210+
211+
In this example, the score of each pose is the linear combination of both functions:
212+
213+
```Scoring = 0.5*SIPPER + 0.8*DFIRE```
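
The weighted sum can be sketched in a few lines of Python. This only illustrates the arithmetic and is not LightDock's implementation: the helper names and the per-function scores are hypothetical, and the file is assumed to hold one `name weight` pair per line, as in the example.

```python
def parse_weights(text):
    """Parse lines of the form 'name weight' into a dict."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, weight = line.split()
        weights[name] = float(weight)
    return weights


def combined_score(weights, scores):
    """Linear combination: sum of weight * score over the listed functions."""
    return sum(weight * scores[name] for name, weight in weights.items())


weights = parse_weights("sipper 0.5\ndfire 0.8\n")
# Made-up per-function scores for a single pose, for illustration only.
pose_scores = {"sipper": 2.0, "dfire": 10.0}
total = combined_score(weights, pose_scores)  # 0.5*2.0 + 0.8*10.0
print(total)
```

Because the combination is a plain weighted sum, the weights also set the relative scale of the functions, which matters when their raw scores differ by orders of magnitude.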
214+
215+
216+
### 3.3. Tips and tricks
217+
218+
- All the available scoring functions can be found at the path `$LIGHTDOCK_HOME/lightdock/scoring`. Each scoring function has its own directory.
219+
115220

116221

117222
## 4. Generate models
118223

119-
TBC
224+
Once the simulation has completed, the predicted models can be generated as PDB structure files. In order to do so, execute the `lgd_generate_conformations.py` command:
225+
226+
```bash
227+
lgd_generate_conformations.py
228+
usage: conformer_conformations [-h]
229+
receptor_structure ligand_structure
230+
lightdock_output glowworms
231+
conformer_conformations: error: too few arguments
232+
```
233+
234+
For example, to generate the 10 models predicted at step 5 by a swarm of 10 glowworms in the 2UUY example:
235+
236+
```bash
237+
cd $LIGHTDOCK_HOME/examples/2UUY
238+
cd swarm_0
239+
lgd_generate_conformations.py ../2UUY_rec.pdb ../2UUY_lig.pdb gso_5.out 10
240+
```
120241

242+
**IMPORTANT:** note that the structures used by this command are the originals used in the `lightdock_setup` command.
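
The example above handles a single swarm. The sketch below shows how the same command line could be assembled for every swarm of a simulation. It is a hypothetical helper, not part of LightDock, and it assumes that swarm directories are named `swarm_<N>` and sit next to the receptor and ligand files, as in the 2UUY example.

```python
from pathlib import Path


def generation_commands(simulation_dir, receptor, ligand, step, glowworms):
    """Hypothetical helper: build one command per swarm_<N> directory."""
    root = Path(simulation_dir)
    prefix = "swarm_"
    swarms = [
        p for p in root.glob(prefix + "*")
        if p.is_dir() and p.name[len(prefix):].isdigit()
    ]
    commands = []
    # Sort numerically so that swarm_2 comes before swarm_10.
    for swarm in sorted(swarms, key=lambda p: int(p.name[len(prefix):])):
        argv = [
            "lgd_generate_conformations.py",
            "../" + receptor,
            "../" + ligand,
            "gso_%d.out" % step,
            str(glowworms),
        ]
        commands.append((swarm, argv))
    return commands


for swarm_dir, argv in generation_commands(".", "2UUY_rec.pdb", "2UUY_lig.pdb", 5, 10):
    # Only print the command; subprocess.run(argv, cwd=swarm_dir, check=True)
    # would execute it inside the swarm directory.
    print(swarm_dir, " ".join(argv))
```

Building the argument lists separately from running them makes it easy to review the commands before launching them.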
121243

122244
## 5. Clustering
123245

124-
TBC
246+
Two methods for clustering the predicted models are implemented: *BSAS* and *hierarchical*. At the moment, *hierarchical* is deprecated and *BSAS* is the preferred method.
125247

248+
For each swarm, you can execute the `lgd_cluster_bsas.py` command. For example:
249+
250+
```bash
251+
cd swarm_0
252+
lgd_cluster_bsas.py gso_5.out
253+
```
254+
255+
The output would be:
256+
257+
```
258+
Reading CA from lightdock_3.pdb
259+
Reading CA from lightdock_6.pdb
260+
Reading CA from lightdock_0.pdb
261+
Reading CA from lightdock_5.pdb
262+
Reading CA from lightdock_9.pdb
263+
Reading CA from lightdock_7.pdb
264+
Reading CA from lightdock_2.pdb
265+
Reading CA from lightdock_8.pdb
266+
Reading CA from lightdock_1.pdb
267+
Reading CA from lightdock_4.pdb
268+
Glowworm 6 with pdb lightdock_6.pdb
269+
RMSD between 3 and 6 is 7.562
270+
New cluster 1
271+
Glowworm 0 with pdb lightdock_0.pdb
272+
RMSD between 3 and 0 is 7.757
273+
RMSD between 6 and 0 is 9.089
274+
New cluster 2
275+
Glowworm 5 with pdb lightdock_5.pdb
276+
RMSD between 3 and 5 is 6.856
277+
RMSD between 6 and 5 is 8.706
278+
RMSD between 0 and 5 is 4.665
279+
New cluster 3
280+
Glowworm 9 with pdb lightdock_9.pdb
281+
RMSD between 3 and 9 is 3.683
282+
Glowworm 9 goes into cluster 0
283+
Glowworm 7 with pdb lightdock_7.pdb
284+
RMSD between 3 and 7 is 6.830
285+
RMSD between 6 and 7 is 7.673
286+
RMSD between 0 and 7 is 6.709
287+
RMSD between 5 and 7 is 4.561
288+
New cluster 4
289+
Glowworm 2 with pdb lightdock_2.pdb
290+
RMSD between 3 and 2 is 7.346
291+
RMSD between 6 and 2 is 9.084
292+
RMSD between 0 and 2 is 7.646
293+
RMSD between 5 and 2 is 7.772
294+
RMSD between 7 and 2 is 9.414
295+
New cluster 5
296+
Glowworm 8 with pdb lightdock_8.pdb
297+
RMSD between 3 and 8 is 7.980
298+
RMSD between 6 and 8 is 5.623
299+
RMSD between 0 and 8 is 8.147
300+
RMSD between 5 and 8 is 8.182
301+
RMSD between 7 and 8 is 7.451
302+
RMSD between 2 and 8 is 8.337
303+
New cluster 6
304+
Glowworm 1 with pdb lightdock_1.pdb
305+
RMSD between 3 and 1 is 5.530
306+
RMSD between 6 and 1 is 9.025
307+
RMSD between 0 and 1 is 6.481
308+
RMSD between 5 and 1 is 6.954
309+
RMSD between 7 and 1 is 7.928
310+
RMSD between 2 and 1 is 3.114
311+
Glowworm 1 goes into cluster 5
312+
Glowworm 4 with pdb lightdock_4.pdb
313+
RMSD between 3 and 4 is 9.306
314+
RMSD between 6 and 4 is 7.367
315+
RMSD between 0 and 4 is 8.225
316+
RMSD between 5 and 4 is 9.455
317+
RMSD between 7 and 4 is 8.641
318+
RMSD between 2 and 4 is 9.509
319+
RMSD between 8 and 4 is 7.742
320+
New cluster 7
321+
{0: [3, 9], 1: [6], 2: [0], 3: [5], 4: [7], 5: [2, 1], 6: [8], 7: [4]}
322+
```
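
The log above is consistent with a basic sequential clustering scheme: glowworms are visited in order, each one is compared against the first member of every existing cluster, and it joins the first cluster within an RMSD cutoff or otherwise starts a new one. The following sketch reproduces that behaviour from the RMSD values printed in the log. It is an illustration rather than the code of `lgd_cluster_bsas.py`; the function name `bsas` is hypothetical, and the 4.0 Å cutoff is an assumption, chosen because an RMSD of 3.683 joins a cluster in the log while 4.561 does not.

```python
def bsas(items, distance, cutoff):
    """Sequential clustering: join the first cluster whose representative
    (its first member) is within `cutoff`, else open a new cluster."""
    clusters = {}
    for item in items:
        for members in clusters.values():
            if distance(members[0], item) <= cutoff:
                members.append(item)
                break
        else:
            # No representative was close enough.
            clusters[len(clusters)] = [item]
    return clusters


# RMSD values copied from the log, keyed as (representative, glowworm).
RMSD = {
    (3, 6): 7.562,
    (3, 0): 7.757, (6, 0): 9.089,
    (3, 5): 6.856, (6, 5): 8.706, (0, 5): 4.665,
    (3, 9): 3.683,
    (3, 7): 6.830, (6, 7): 7.673, (0, 7): 6.709, (5, 7): 4.561,
    (3, 2): 7.346, (6, 2): 9.084, (0, 2): 7.646, (5, 2): 7.772, (7, 2): 9.414,
    (3, 8): 7.980, (6, 8): 5.623, (0, 8): 8.147, (5, 8): 8.182, (7, 8): 7.451,
    (2, 8): 8.337,
    (3, 1): 5.530, (6, 1): 9.025, (0, 1): 6.481, (5, 1): 6.954, (7, 1): 7.928,
    (2, 1): 3.114,
    (3, 4): 9.306, (6, 4): 7.367, (0, 4): 8.225, (5, 4): 9.455, (7, 4): 8.641,
    (2, 4): 9.509, (8, 4): 7.742,
}

# Glowworms in the order in which the log processes them.
order = [3, 6, 0, 5, 9, 7, 2, 8, 1, 4]
clusters = bsas(order, lambda rep, glowworm: RMSD[(rep, glowworm)], cutoff=4.0)
print(clusters)
```

With these inputs the sketch yields the same grouping as the dictionary printed on the last line of the log. Note that the result of a sequential scheme depends on the order in which the glowworms are visited.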
323+
324+
A new colon-separated file, `cluster.repr`, is created with the clustering information:
325+
326+
```bash
cat cluster.repr
327+
0:2: 9.87810:3:lightdock_3.pdb
328+
1:1: 9.66368:6:lightdock_6.pdb
329+
2:1: 7.52192:0:lightdock_0.pdb
330+
3:1: 7.36888:5:lightdock_5.pdb
331+
4:1: 6.46572:7:lightdock_7.pdb
332+
5:2: 5.66227:2:lightdock_2.pdb
333+
6:1: 5.03967:8:lightdock_8.pdb
334+
7:1:-34.67761:4:lightdock_4.pdb
335+
```
336+
337+
For each line, the information is:
338+
339+
```
340+
cluster_id : population : best_scoring : glowworm_id : representative PDB structure
341+
```
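
Each line can be split on `:` into five fields. A minimal parsing sketch follows. The `ClusterEntry` type and the `parse_cluster_repr` function are hypothetical names, and reading the fourth field as the index of the representative glowworm is an assumption based on the sample, where it always matches the number in the PDB file name.

```python
from typing import NamedTuple


class ClusterEntry(NamedTuple):
    cluster_id: int
    population: int
    best_scoring: float
    glowworm_id: int  # assumed meaning of the fourth field
    pdb_file: str


def parse_cluster_repr(text):
    """Parse the content of a cluster.repr file into ClusterEntry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cluster_id, population, scoring, glowworm_id, pdb_file = line.strip().split(":")
        entries.append(
            ClusterEntry(
                int(cluster_id),
                int(population),
                float(scoring),  # float() tolerates the padding spaces
                int(glowworm_id),
                pdb_file,
            )
        )
    return entries


# Two lines taken from the example file above.
sample = "0:2: 9.87810:3:lightdock_3.pdb\n7:1:-34.67761:4:lightdock_4.pdb\n"
entries = parse_cluster_repr(sample)
best = max(entries, key=lambda entry: entry.best_scoring)
print(best.pdb_file)
```

Parsed this way, the representatives can be sorted or filtered by score or population before inspecting the corresponding PDB files.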
126342

127343
## 6. Custom Scoring Functions
128344

129-
TBC
345+
New scoring functions can be added to the LightDock framework. Each scoring function that `lightdock` accepts through the `-s` flag corresponds to a directory in the `$LIGHTDOCK_HOME/lightdock/scoring` path.
346+
347+
There is a template available to use as a skeleton in the `$LIGHTDOCK_HOME/lightdock/scoring/template` directory.
348+
349+
This section will be completed with more details in the future. In the meantime, you can look at the implementation of the scoring functions already included in the framework.

examples/2UUY/swarm_0/cluster.repr

+8
@@ -0,0 +1,8 @@
1+
0:2: 9.87810:3:lightdock_3.pdb
2+
1:1: 9.66368:6:lightdock_6.pdb
3+
2:1: 7.52192:0:lightdock_0.pdb
4+
3:1: 7.36888:5:lightdock_5.pdb
5+
4:1: 6.46572:7:lightdock_7.pdb
6+
5:2: 5.66227:2:lightdock_2.pdb
7+
6:1: 5.03967:8:lightdock_8.pdb
8+
7:1:-34.67761:4:lightdock_4.pdb
