Instruction

1. common_names

Description:

Merge identical chemical structures to one common name in a SMILES file (also see unique_molecules tool). Useful for identifying unique chemical structures in a SMILES file.

Author/owner: C3/Eli Lilly and Co

Sample 1:

common_names -S ./output -s 10000 -r 10000 -D + -v input1.smi input2.smi

Explanation:

Find all compounds in input1.smi and input2.smi that share a structure but have different names; write them to file output.smi (-S ./output) with a new name consisting of the old names separated by a "+" symbol (-D +). The maximum number of molecules to process is 10000 (-s 10000); progress is reported every 10000 rows (-r 10000).

Shell output:

… output to './output'

File output: (output.smi)

Combined compound list

Help command:

common_names
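
The merging step described in the explanation can be sketched in a few lines of Python. This is a minimal illustration and not the tool's implementation: it assumes the SMILES text itself can stand in for the structure key, and the function name merge_common_names and the sample records are hypothetical.

```python
from collections import OrderedDict


def merge_common_names(records, separator="+"):
    """Group (smiles, name) records by structure key and join their names.

    Assumption: the raw SMILES text is used as the structure key. A real
    comparison would first bring each structure into a canonical form.
    """
    names_by_structure = OrderedDict()
    for smiles, name in records:
        names = names_by_structure.setdefault(smiles, [])
        if name not in names:  # keep each distinct name only once
            names.append(name)
    return [(smiles, separator.join(names))
            for smiles, names in names_by_structure.items()]


# Hypothetical records standing in for the lines of input1.smi and input2.smi.
records = [("CCO", "ethanol"), ("CC(=O)O", "acetic_acid"), ("CCO", "EtOH")]
merged = merge_common_names(records)
print(merged)
```

Structures that occur once keep their original name; only repeated structures receive a joined name.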

2. fetch_smiles_quick

Description:

Fetches records from one file based on identifiers in another file

Author/Owner: C3/Eli Lilly and Co

Sample 1:

fetch_smiles_quick -j -c 1 -C 2 -X notInRecord -Y notInIdentifier record.w structures.smi

Explanation:

Fetches records by matching identifiers between record.w and structures.smi. The matched records (column 1 in record.w, -c 1, and column 2 in structures.smi, -C 2) will be displayed in the shell window. The list of unmatched identifiers will be saved in the notInRecord file (-X). The list of unmatched records will be saved in the notInIdentifier file (-Y). The generated identifier file is a descriptor file without header record (-j).

Shell output: (matched record)

O=C(C)C=CC=C(C)CCC=C(C)C PBCHM1756999 24 3 6 349.4 5 0.2083 10

File output: (notInIdentifier)

Unmatched records

File output: (notInRecord)

Unmatched identifiers

Help command:

fetch_smiles_quick
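
The three outputs described above (matched records, unmatched identifiers, unmatched records) correspond to a simple join on an identifier column. The Python sketch below shows that idea under stated assumptions: whitespace-delimited rows, zero-based column indices, and hypothetical names (fetch_by_identifier and the sample rows). It does not reproduce the tool's output format.

```python
def fetch_by_identifier(identifier_rows, record_rows, id_col=0, rec_col=1):
    """Split two whitespace-delimited tables into matched and unmatched parts.

    Column indices are zero-based here, so the defaults correspond to
    column 1 of the first table and column 2 of the second.
    """
    wanted = {row.split()[id_col] for row in identifier_rows}
    present = {row.split()[rec_col] for row in record_rows}
    matched = [row for row in record_rows if row.split()[rec_col] in wanted]
    unmatched_records = [row for row in record_rows
                         if row.split()[rec_col] not in wanted]
    unmatched_identifiers = sorted(wanted - present)
    return matched, unmatched_identifiers, unmatched_records


# Hypothetical rows: an identifier table and a "smiles id" table.
identifier_rows = ["ID1 24 3", "ID3 10 1"]
record_rows = ["CCO ID1", "CCC ID2"]
matched, not_in_record, not_in_identifier = fetch_by_identifier(
    identifier_rows, record_rows)
print(matched, not_in_record, not_in_identifier)
```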

3. unique_molecules

Description:

Filters out duplicate chemical structures based on unique smiles

Author/owner: C3/Eli Lilly and Co

Sample 1:

unique_molecules -S unique -D duplicate -v -l input.smi

Explanation:

Traverse structures in input.smi and identify duplicate structures; write duplicates in duplicate.smi; write unique in unique.smi; only consider largest fragment of each smiles (-l)

Shell output:

Execution summary

File output: (duplicate.smi)

Duplicate molecule list

File output: (unique.smi)

Unique molecule list

Help command:

unique_molecules

4. unique_rows

Description:

Identifies the unique rows in a file

Author/Owner: C3/Eli Lilly and Co

Sample 1:

unique_rows -c 1 -c 2 input.dat

Explanation:

Check input.dat for unique rows based on the values in columns 1 and 2 (-c 1 -c 2). The unique rows will be displayed in the shell window.

Shell output:

Unique row list

Help command:

unique_rows
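
Row uniqueness on selected columns can be illustrated with a short Python sketch. It assumes whitespace-delimited input and that the first row seen for each key is the one kept; both are assumptions for illustration, and the function name and sample rows are hypothetical.

```python
def unique_rows(lines, key_columns=(1, 2)):
    """Keep the first row seen for each combination of key values.

    key_columns are 1-based, mirroring the column numbers used above.
    """
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        key = tuple(fields[c - 1] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Hypothetical rows: the second row repeats the key ("a", "1").
rows = ["a 1 x", "a 1 y", "a 2 z", "b 1 w"]
unique = unique_rows(rows)
print(unique)
```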

5. iwcut

Description:

Extract columns from a text file

Author/Owner: C3/Eli Lilly and Co

Sample 1:

iwcut -f 5,3 descriptor.txt

Explanation:

Extract column 5 and column 3 from the descriptor.txt file (-f 5,3). The extracted column data will be displayed in the shell window.

Shell output:

Data from column 5 and column 3

Help command:

iwcut

6. fileconv

Description:

Structure file utility to clean up SMILES files and filter on specific criteria. It can also be used to convert between chemical file formats, e.g. from SDF to SMILES.

Author/owner: C3/Eli Lilly and Co

Sample 1:

fileconv -Y dbg -B 100 -S -a input.smi

Explanation:

Debug/print each molecule structure in input.smi; ignore as many as 100 fatal input errors

Shell output:

Molecule information

Sample 2:

fileconv -F 6 -c 4 -C 14 -v -i smi -S selection list.smi

Explanation:

Select the molecules in list.smi that have between 4 and 14 atoms (-c 4 -C 14) and fewer than 6 fragments (-F 6); store the results in selection.smi (-S selection).

Shell output:

Execution summary

File output: (selection.smi)

List of molecules meeting the search criteria

Sample 3:

fileconv -o sdf -i smi -S single single.smi

Explanation:

Convert single.smi file to the sdf format single.sdf

File output: (single.sdf)

Converted sdf file

Help command:

fileconv

7. rxn_signature

Description:

Generates reaction signatures for input reactions.

Author/Owner: C3/Eli Lilly and Co

Sample 1:

rxn_signature -v -r 0,1,2 -C Cfile -F Ffile all.rsmi >all.sig 2>all.log

Explanation:

Extract the reaction signatures of all reactions in all.rsmi. The program prints signatures to stdout, which is redirected here to all.sig; messages on stderr are redirected to all.log. Signatures are generated at radii 0, 1 and 2 from the reaction core, i.e. the changing atoms (-r 0,1,2). The list of changed atoms is written to Cfile (-C). Failed reactions are written to Ffile (-F).

Notes:

Reaction signatures capture the extended core of a reaction around the atoms that change in a reaction. A signature is based on the unique smiles of the reaction core. The smiles includes atoms colored by their environment in the original reaction smiles. In addition, information about the ring bond status in the original reaction smiles is appended to the reaction signature produced.

Help command:

rxn_signature

8. rxn_standardize

Description:

Checks and standardizes input chemical reactions; converts to a reaction smiles file format

Author/Owner: C3/Eli Lilly and Co

Sample 1:

rxn_standardize -s -c -D x -X igbad -v -C 60 -K -E autocreate -e -o -b -f gsub input.rsmi > output.rsmi

Explanation:

Check and standardize reactions in an input reactions smiles (.rsmi) file. Discard chirality on input (-c). Discard reactions containing duplicate atom map numbers (-D x). Ignore bad reactions (-X igbad). Discard any reaction where the largest reactant has more than 60 atoms (-C 60). Kekule fix (-K). Automatically create new elements when encountered (-E autocreate). Move small fragments that show up on products to orphan status (-e). Create reagent fragments that are orphans (-o). Remove duplicate reactants, even if atom maps scrambled (-b). Replace unusual characters in reaction names with _ (-f gsub).

Notes:

Input file can be in RDF or rsmi format. Output is in rsmi format.

Help command:

rxn_standardize

9. tsubstructure

Description:

Perform 2D substructure searches with SMILES/SMARTS against SMILES files

Author/owner: C3/Eli Lilly and Co

Sample 1:

tsubstructure -s 'C(C)(=O)C' -m hits.smi -n nonhits.smi list.smi

Explanation:

Search list.smi for molecules containing the defined SMARTS (-s); write hits to hits.smi (-m) and non-hits to nonhits.smi (-n).

Sample 2:

tsubstructure.sh -f -b -A D -o smi -m hits.smi -s 'C(C)(=O)C' list.smi

Explanation:

Search for molecules containing defined smarts (-s); only find one embedding of the query (-f); for each molecule, break after finding a query which matches (-b); use daylight aromaticity (-A D); write hits in hits.smi

Note:

Use -X to successfully skip structures with unconventional symbols, e.g. X, R, ...

Sample 3:

tsubstructure -s '[ND1H2]-[C@H]1CCN2CCCCC2C1' -A D -o usmi -m match.out list.smi

Explanation:

Search for molecules containing the defined SMARTS (-s), using Daylight aromaticity (-A D); write hits to the file named with -m (match.out).

Sample 4:

tsubstructure -A D -q carboxylic_acids.qry -u -M imp2exp -m match.smi list.smi

Explanation:

Find all matches to specific query file (-q) and place in match.smi (-m); use Daylight aromaticity; convert implicit hydrogen in target molecules to explicit before matching attempt (-M imp2exp); find unique matches only (-u)

Help command:

tsubstructure

10. retrosynthesis

Description:

Defines synthetic routes for input chemical structures by deconstructing input molecules into reactants using a set of known reaction templates. Conceptually, this is the inverse of the chemical reaction synthesis implemented by the tool trxn.

Author/Owner: C3/Eli Lilly and Co

Sample 1:

retrosynthesis -Y all -X kg -X kekule -X ersfrm -a 2 -q f -v -R 1 -I CentroidRxnSmi_1 -P UST:AZUCORS -M ncon -M ring -M unsat -M arom 10Cmpds.smi

Explanation:

Looks for synthesis paths for the molecules in 10Cmpds.smi using the reaction signatures in CentroidRxnSmi_1. Various standardization flags (-Y, -X, -q, -P, -M options). Require at least 2 heavy atoms in fragments (-a), verbose (-v), centroid radius 1 (-R).

Help command:

retrosynthesis

11. trxn

Description:

Performs reactions between reactant molecules to enumerate product structures. Uses a control reaction file, a scaffold SMILES file and zero or more reactant SMILES files. Conceptually, this is the inverse of the process implemented by the tool retrosynthesis.

Author/owner: C3/Eli Lilly and Co

Sample 1:

trxn -v -r 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn -Z -z i -M RMX -m RMX -S 1.2.1_run 20180412_amines.smi 20180412_aldehydes.smi

Explanation:

Perform the reaction in 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn, ignoring sidechains (-Z) and molecules (-z i) that do not react, ignoring sidechains with multiple substructure matches (-M RMX), and ignoring scaffolds that generate multiple substructure hits (-m RMX). Output is saved under the name 1.2.1_run (-S 1.2.1_run); the input files are 20180412_amines.smi and 20180412_aldehydes.smi.

Sample 2:

trxn -v -r 2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn -Z -z i -M RMX -m RMX -S 2.1.2_run 20180412_amines.smi 20180412_carboxylic_acids.smi

Explanation:

Perform reaction in ./2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn

Help command:

trxn

12. iwdemerit

Description:

Computes demerit of a molecule. In this context demerits refers to non-desirable molecular structure characteristics/features.

Author/owner: C3/Eli Lilly and Co

Sample 1:

iwdemerit -A D -A I -S foo -G - -f 99999 -t -W imp2exp -W maxe=1 -E autocreate -q F:PAINS/queries_latest -O hard -W dnv=0 -W slist -i smi pubchem_example.smi

Explanation:

Compute the demerits for the molecules in pubchem_example.smi (-i smi pubchem_example.smi) using the queries_latest query file (-q F:PAINS/queries_latest). The good, non-rejected (-G) structures will be written into foo.demerit (-S foo). Use Daylight aromaticity definitions (-A D) and enable input of aromatic structures (-A I). Molecules are rejected when they have 99999 or more demerits (-f 99999). Append demerit text to molecule names (-t). Make implicit hydrogens explicit (-W imp2exp), set the maximum number of substructure queries to identify to 1 (-W maxe=1), use value 0 in the query file as the demerit score (-W dnv=0) and write a sorted list of demerit values and reasons (-W slist). Skip all the hard-coded substructure queries (-O hard).

Help command:

iwdemerit

13. random_smiles

Description:

Generate random smiles based on input smiles.

Author/owner: C3/Eli Lilly and Co

Sample 1:

random_smiles.sh -n 5 -a -e -A D -v pubchem_example.smi

Explanation:

Generate 5 new random smiles (-n 5) based on the smiles in pubchem_example.smi using Daylight aromaticity (-A D). Append the permutation number to the name (-a) and echo the initial molecule (-e).

Help command:

random_smiles

14. preferred_smiles

Description:

Write either unique smiles (if interpretable) or non aromatic unique form.

Author/owner: C3/Eli Lilly and Co

Sample 1:

preferred_smiles.sh pubchem_example.smi

Explanation:

Write unique smiles for the smiles in the pubchem_example.smi

Help command:

preferred_smiles

15. rotatable_bonds

Description:

Calculate the rotatable bonds in the molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

rotatable_bonds.sh pubchem_example.smi

Explanation:

Calculate the rotatable bonds for molecules in the pubchem_example.smi

Help command:

rotatable_bonds

16. concat_files

Description:

Concatenates descriptor files by joining on identifiers

Author/owner: C3/Eli Lilly and Co

Sample 1:

concat_files t1.1 t1.2

Explanation:

Concatenates the descriptor files t1.1 and t1.2 based on the identifier

Help command:

concat_files

17. msort

Description:

Sorts a molecule file by various criteria

Author/owner: C3/Eli Lilly and Co

Sample 1:

msort -a pubchem_example.smi

Explanation:

Sort the molecule file pubchem_example.smi based on the number of atoms the molecule has (-a)

Help command:

msort

18. tp_first_pass

Description:

Runs molecules through the Lilly medchem rules, skip to next molecule upon crossing a threshold or instant kill rule

Author/owner: C3/Eli Lilly and Co

Sample 1:

tp_first_pass -C 20 -i smi -o smi -a -L bad0 -S ok0 pubchem_example.smi

Explanation:

Filter the input SMILES file (-i smi) pubchem_example.smi with a maximum atom count of 20 (-C 20). Write a molecule to the SMILES file (-o smi) bad0.smi (-L bad0) if its atom count is larger than 20; otherwise write it to ok0.smi (-S ok0).

Help command:

tp_first_pass

19. mol2qry

Description:

Converts a molecule to a query file

Author/owner: C3/Eli Lilly and Co

Sample 1:

mol2qry -M 'C1=CC=CC=C1CCCC' -S out

Explanation:

Convert the molecule 'C1=CC=CC=C1CCCC' (-M 'C1=CC=CC=C1CCCC') to the output query file out.qry (-S out)

Help command:

mol2qry

20. molecular_scaffold

Description:

Chemical structure fragmentation tool that recursively cuts molecules into fragments

Author/owner: C3/Eli Lilly and Co

Sample 1:

molecular_scaffold -g all -t -c pubchem_example.smi

Explanation:

Recursively cut the molecules in the pubchem_example.smi file into fragments with all standardizations (-g all), removing cis-trans bonds (-t) and all chirality (-c) from the input molecules

Help command:

molecular_scaffold

21. molecule_subset

Description:

Extracts a subset of atoms from set of molecules based on a single substructure

Author/owner: C3/Eli Lilly and Co

Sample 1:

molecule_subset -s c1ccccc1 pubchem_example.smi

Explanation:

List a subset of molecules from pubchem_example.smi which contain the substructure c1ccccc1 (-s c1ccccc1)

Help command:

molecule_subset

22. rgroup

Description:

Identifies substituents on substructure-matched molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

rgroup -s c1ccccc1 pubchem_example.smi

Explanation:

Identifies substituents for the molecules matching substructure c1ccccc1 (-s c1ccccc1)

Help command:

rgroup

23. ring_extraction

Description:

Extracts rings from molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

ring_extraction pubchem_example.smi

Explanation:

Extracts rings from molecules in the pubchem_example.smi

Help command:

ring_extraction

24. ring_trimming

Description:

Exhaustively trim rings from ring systems, preserving aromaticity

Author/owner: C3/Eli Lilly and Co

Sample 1:

ring_trimming -u -c -w parent -w rings -w scaffold -m 1 -J 2 -j 1 pubchem_example.smi

Explanation:

Exhaustively trim rings from pubchem_example.smi, using only unique structures from each input molecule (-u), removing all chiral centres (-c), writing the parent (-w parent), writing isolated ring systems (-w rings), writing the scaffold (-w scaffold), removing at most 1 ring from a ring system (-m 1), using isotope 2 where the ring joins are broken (-J 2) and isotope 1 where the scaffold joined the rest of the molecule (-j 1).

Help command:

ring_trimming

25. sp3_filter

Description:

Filter molecules according to sp3 content

Author/owner: C3/Eli Lilly and Co

Sample 1:

sp3_filter -c 2 -x 2 -U out pubchem_example.smi

Explanation:

Filter molecules in pubchem_example.smi with a minimum of 2 carbon sp3 atoms (-c 2) and a minimum of 2 non-carbon sp3 atoms (-x 2). The rejected molecules are written to the out.smi file (-U out)

Help command:

sp3_filter

26. tautomer_generation

Description:

Enumerate tautomeric forms for molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

tautomer_generation pubchem_example.smi

Explanation:

Enumerate tautomeric forms for molecules in the pubchem_example.smi

Help command:

tautomer_generation

27. smiles_mutation

Description:

Generate new smiles from a set of input molecules by random string operations

Author/owner: C3/Eli Lilly and Co

Sample 1:

smiles_mutation -N 50000 -n 20 -p 5 -c 15 -C 40 pubchem_example_short.smi

Explanation:

Generate new smiles from the molecules in pubchem_example_short.smi, running 50000 iterations (-N 50000), refreshing completely from the initial smiles every 20 iterations (-n 20), generating 5 random replicates of each starting molecule (-p 5), with a minimum of 15 atoms (-c 15) and a maximum of 40 atoms (-C 40) in the generated molecules

Help command:

smiles_mutation

28. rxn_substructure_search

Description:

Search substructure over reactions

Author/owner: C3/Eli Lilly and Co

Sample 1:

rxn_substructure_search -q 'C(F)(F)F>>' -m found sample_reactions.rsmi

Explanation:

Search for the substructure 'C(F)(F)F>>' (-q 'C(F)(F)F>>') in the sample_reactions.rsmi file, allowing it to match anywhere in reagents/agents/products (-b), and saving the matched results to the found.rxnsmi file (-m found)

Help command:

rxn_substructure_search

29. activity_consistency

Description:

Group identical molecules (including isomers) with varying activity values

Author/owner: C3/Eli Lilly and Co

Sample 1:

activity_consistency -a -l -e 2 -X pubchem_in.act pubchem.smi

Explanation:

Group identical molecules in pubchem.smi using experimental data in pubchem_in.act (-X pubchem_in.act), using the activity data from the column 2 of pubchem_in.act file (-e 2), reducing to graph form (-a), reducing to largest fragment (-l)

Help command:

activity_consistency

30. descriptor_file_to_01_fingerprints

Description:

Converts an integer descriptor file to fingerprints, either of type fixed 0/1 or non colliding counted

Author/owner: C3/Eli Lilly and Co

Sample 1:

descriptor_file_to_01_fingerprints -F NCFP -S pubchem.smi cleaned_descriptor.txt

Explanation:

Create the fingerprints for the smiles in the pubchem.smi (-S pubchem.smi), using descriptors in the cleaned_descriptor.txt and tag string NCFP (-F NCFP)

Help command:

descriptor_file_to_01_fingerprints

31. descriptors_to_fingerprint

Description:

Converts descriptors to a sparse fingerprint

Author/owner: C3/Eli Lilly and Co

Sample 1:

descriptors_to_fingerprint -S pubchem.smi -D w_natoms:10,40,1 -D w_nelem:1,8,1 pubchem.w

Explanation:

Compute fingerprints for the smiles in pubchem.smi (-S pubchem.smi), using the descriptor for the number of heavy atoms, w_natoms (-D w_natoms:10,40,1), and the descriptor for the number of elements, w_nelem (-D w_nelem:1,8,1), from the tabular descriptor file pubchem.w. The minimum value for the w_natoms descriptor is 10, the maximum value is 40, and the incremental unit between the minimum and maximum is 1. The minimum value for the w_nelem descriptor is 1, the maximum value is 8, and the incremental unit between the minimum and maximum is 1.

Help command:

descriptors_to_fingerprint
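
One plausible reading of the minimum, maximum and increment triple is that each descriptor value is clamped to the range and then assigned to a bucket. The Python sketch below illustrates that reading only; the clamping behaviour, the zero-based bucket index and the function name descriptor_to_bucket are assumptions, not documented behaviour of the tool.

```python
def descriptor_to_bucket(value, minimum, maximum, step):
    """Map a descriptor value to a zero-based bucket index.

    Assumption: values outside [minimum, maximum] are clamped to the
    nearest bound before bucketing.
    """
    clamped = min(max(value, minimum), maximum)
    return int((clamped - minimum) // step)


# w_natoms:10,40,1 read as minimum 10, maximum 40, increment 1.
inside = descriptor_to_bucket(25, 10, 40, 1)
below = descriptor_to_bucket(5, 10, 40, 1)
above = descriptor_to_bucket(99, 10, 40, 1)
print(inside, below, above)
```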

32. gfp_distance_matrix

Description:

Computes the distance matrix for a pool of gfp fingerprints; generates a human-readable ascii matrix

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_distance_matrix pubchem.gfp

Explanation:

Compute the distance matrix for the gfp fingerprint pubchem.gfp

Help command:

gfp_distance_matrix

33. gfp_leader_v2

Description:

Performs clustering with leader (sphere exclusion) algorithm on gfp descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_leader_v2 -t 0.3 pubchem.gfp

Explanation:

Compute clustering with leader algorithm on pubchem.gfp with distance threshold 0.3 (-t 0.3)

Help command:

gfp_leader_v2
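
The leader (sphere exclusion) idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tool's implementation: fingerprints are modelled as sets of "on" bits, distance is taken as 1 minus the Tanimoto coefficient, items are processed in input order, and the names and sample data are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def leader_clusters(fingerprints, threshold):
    """Sphere exclusion: the first unassigned item becomes a leader and
    claims every unassigned item within `threshold` of it."""
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        leader = unassigned[0]
        members = [i for i in unassigned
                   if tanimoto_distance(fingerprints[leader],
                                        fingerprints[i]) <= threshold]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters


# Hypothetical bit sets; threshold 0.3 mirrors the sample command.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {1, 2, 3, 4, 5}, {7, 8, 10}]
clusters = leader_clusters(fps, 0.3)
print(clusters)
```

Each inner list starts with the index of the leader, followed by the items it claimed.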

34. gfp_nearneighbours

Description:

Finds near neighbours in a set of fingerprints

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_nearneighbours -p pubchem.gfp -n 2 -T 0.5 pubchem_10.gfp

Explanation:

Find near neighbours for the fingerprints in pubchem_10.gfp; compare against fingerprints in haystack fingerprint set pubchem.gfp (-p pubchem.gfp), search for 2 neighbours for each descriptor (-n 2) and discard distance greater than 0.5 (-T 0.5)

Help command:

gfp_nearneighbours
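
The combination of a neighbour count and a distance cutoff can be illustrated with a small Python sketch. It assumes fingerprints are sets of "on" bits and that distance is 1 minus the Tanimoto coefficient; the function names and sample data are hypothetical and the output format of the tool is not reproduced.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nearest_neighbours(needle, haystack, n, max_distance):
    """Return up to n (index, distance) pairs, closest first, dropping
    any haystack member further away than max_distance."""
    scored = sorted((tanimoto_distance(needle, fp), idx)
                    for idx, fp in enumerate(haystack))
    return [(idx, dist) for dist, idx in scored if dist <= max_distance][:n]


# Hypothetical data; n=2 and cutoff 0.5 mirror the sample command.
needle = {1, 2, 3, 4}
haystack = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2, 3, 4, 5, 6}, {9}]
neighbours = nearest_neighbours(needle, haystack, n=2, max_distance=0.5)
print(neighbours)
```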

35. gfp_single_linkage

Description:

Performs single linkage clustering on a gfp fingerprint set

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_single_linkage -t 0.25 pubchem.gfp

Explanation:

Compute clustering with single linkage algorithm on pubchem.gfp with distance 0.25 (-t 0.25) as the threshold value for grouping

Help command:

gfp_single_linkage
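
Single linkage grouping with a distance threshold can be sketched as a union-find over all pairs closer than the threshold. The Python below is a minimal illustration, assuming set-of-bits fingerprints and a distance of 1 minus the Tanimoto coefficient; it is quadratic in the number of items and the names are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def single_linkage_clusters(fingerprints, threshold):
    """Join any two items within `threshold`; clusters are the resulting
    connected components."""
    parent = list(range(len(fingerprints)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if tanimoto_distance(fingerprints[i], fingerprints[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(fingerprints)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Items 0 and 2 are further apart than 0.25 but are chained through item 1.
fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {8, 9}]
linked = single_linkage_clusters(fps, 0.25)
print(linked)
```

The example shows the chaining effect typical of single linkage: two items can share a cluster without being within the threshold of each other.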

36. gfp_sparse_to_fixed

Description:

Converts non-colliding fingerprints to fixed counted, or binary forms

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_sparse_to_fixed -F NCSELW pubchem.gfp

Explanation:

Convert non-colliding fingerprints NCSELW (-F NCSELW) in pubchem.gfp to binary form

Help command:

gfp_sparse_to_fixed

37. gfp_distance_filter

Description:

Filters molecules according to how close they are to members of a comparison pool

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_distance_filter -p pubchem_10.gfp -t 0.7 -n 3 -U U1 pubchem.gfp

Explanation:

Compare fingerprints in pubchem.gfp to pubchem_10.gfp. The minimum required distance between molecules is 0.7 (-t 0.7). Any molecule in pubchem.gfp is rejected if it violates the minimum distance requirement at least 3 times (-n 3). The rejected molecules are saved to the U1 file (-U U1).

Help command:

gfp_distance_filter
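
The rejection rule described above can be sketched in Python: count how many pool members lie closer than the minimum distance and reject once the count reaches the limit. The sketch assumes set-of-bits fingerprints, a distance of 1 minus the Tanimoto coefficient and a strict "closer than" comparison; all names and data are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def distance_filter(candidates, pool, min_distance, max_violations):
    """Return (accepted, rejected) candidate indices.

    A candidate is rejected when at least `max_violations` pool members
    are closer to it than `min_distance`.
    """
    accepted, rejected = [], []
    for idx, fp in enumerate(candidates):
        too_close = sum(1 for ref in pool
                        if tanimoto_distance(fp, ref) < min_distance)
        if too_close >= max_violations:
            rejected.append(idx)
        else:
            accepted.append(idx)
    return accepted, rejected


# Hypothetical data; 0.7 and 3 mirror the sample command.
pool = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]
candidates = [{1, 2, 3, 4}, {7, 8, 9}]
accepted, rejected = distance_filter(candidates, pool, 0.7, 3)
print(accepted, rejected)
```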

38. gfp_pairwise_distances

Description:

Calculates pairwise distance between molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_pairwise_distances -p pubchem.gfp -T 0.5 100PairsId

Explanation:

Calculate the pairwise distance between the molecules in pubchem.gfp (-p pubchem.gfp). The distance will not be reported if it is larger than the threshold value 0.5 (-T 0.5). The required pairs for calculation are listed in the 100PairsId file.

Help command:

gfp_pairwise_distances

39. gfp_to_descriptors

Description:

Converts fingerprints to descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_to_descriptors -f -F FPDSC pubchem.gfp

Explanation:

Convert the fixed width fingerprint (-f) pubchem.gfp into descriptor format; use fingerprint type FPDSC (-F FPDSC) in fingerprint file.

Help command:

gfp_to_descriptors

40. gfp_nearneighbours_single_file

Description:

Finds near neighbours within a set of fingerprints

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_nearneighbours_single_file -p -z -T 0.2 -F FPDSC,w=0.2 -F NCSELW,nc,w=0.8 -V a=0.03 -V b=1.7 pubchem.gfp

Explanation:

Finds near neighbours in the pubchem.gfp file. Writes all pair-wise distances in 3 columns (-p); discards any molecule without neighbours (-z) and any distance greater than 0.2 (-T 0.2). Assigns weight 0.2 to fingerprint type FPDSC (-F FPDSC,w=0.2) and weight 0.8 to the non-colliding fingerprint with tag NCSELW (-F NCSELW,nc,w=0.8) for the distance calculation. Uses Tversky asymmetric similarity with parameter a set to 0.03 (-V a=0.03) and b set to 1.7 (-V b=1.7).

Help command:

gfp_nearneighbours_single_file
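
Tversky similarity and a weighted combination of per-fingerprint similarities can be written out as below. The Tversky formula is the standard one; which set each parameter applies to, and how the tool combines weighted components into one distance, are assumptions made for this sketch. Function names and data are hypothetical.

```python
def tversky_similarity(a, b, alpha, beta):
    """Standard Tversky index on two sets of 'on' bits:
    |a & b| / (|a & b| + alpha * |a - b| + beta * |b - a|)."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 1.0


def weighted_distance(components):
    """Combine (weight, similarity) pairs into a single distance.

    Assumption: distance is 1 minus the weight-normalised mean similarity.
    """
    total_weight = sum(weight for weight, _ in components)
    mean_similarity = sum(weight * sim for weight, sim in components) / total_weight
    return 1.0 - mean_similarity


a, b = {1, 2, 3, 4}, {1, 2, 5}
symmetric = tversky_similarity(a, b, 1.0, 1.0)   # equals the Tanimoto coefficient
forward = tversky_similarity(a, b, 0.03, 1.7)
reverse = tversky_similarity(b, a, 0.03, 1.7)
combined = weighted_distance([(0.2, 1.0), (0.8, 0.5)])
print(symmetric, forward, reverse, combined)
```

With unequal parameters the result depends on argument order, which is what makes the measure asymmetric.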

41. gfp_lnearneighbours

Description:

Finds near neighbours of compounds supplied in gfp fingerprint format; can handle LARGE numbers of compounds

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_lnearneighbours -F FPDSC,w=0.3 -F NCSELW,nc,w=0.7 -T 0.4 -n 2 -h -p pubchem_needles.gfp pubchem_haystack.gfp

Explanation:

Finds 2 near neighbours (-n 2) for each fingerprint in the needle file pubchem_needles.gfp against the haystack file pubchem_haystack.gfp (-p). Discards neighbours with zero distance and the same ID as the target (-h) or with a distance larger than 0.4 (-T 0.4). The fingerprint of type FPDSC has weight 0.3 (-F FPDSC,w=0.3), and the non-colliding fingerprint with tag NCSELW has weight 0.7 (-F NCSELW,nc,w=0.7) in the distance calculation.

Help command:

gfp_lnearneighbours

42. gfp_add_descriptors

Description:

Adds descriptors to the gfp file with matching ID

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_add_descriptors -D FPADD pubchem.gfp cleaned_descriptor.txt

Explanation:

Add the descriptor cleaned_descriptor.txt to the gfp file pubchem.gfp; use tag FPADD (-D FPADD)

Help command:

gfp_add_descriptors

43. gfp_profile_activity_by_bits

Description:

Scans a fingerprint file and computes average activity associated with each bit; an activity file needs to be provided

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_profile_activity_by_bits -E activityFile.txt pubchem.gfp

Explanation:

Compute the average activity from file activityFile.txt (-E activityFile.txt) for fingerprints in the pubchem.gfp.

Help command:

gfp_profile_activity_by_bits
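
Averaging activity per bit amounts to accumulating, for every bit, the activities of the molecules that have it set. The Python sketch below assumes fingerprints are sets of "on" bits keyed by molecule name and that activities are a name-to-value mapping; the names and values are hypothetical.

```python
from collections import defaultdict


def average_activity_by_bit(fingerprints, activities):
    """Return {bit: mean activity of the molecules that set this bit}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, bits in fingerprints.items():
        for bit in bits:
            totals[bit] += activities[name]
            counts[bit] += 1
    return {bit: totals[bit] / counts[bit] for bit in totals}


# Hypothetical fingerprints and activities.
fingerprints = {"m1": {1, 2}, "m2": {2, 3}, "m3": {2}}
activities = {"m1": 4.0, "m2": 8.0, "m3": 6.0}
profile = average_activity_by_bit(fingerprints, activities)
print(profile)
```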

44. gfp_spread_v2

Description:

Calculates the spread distance of fingerprints against target fingerprint set. Sort output by spread distance.

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_spread_v2 -A pubchem_10.gfp pubchem.gfp

Explanation:

Compute the spread distance of fingerprints in pubchem.gfp. Bias away from fingerprints in pubchem_10.gfp (-A pubchem_10.gfp)

Help command:

gfp_spread_v2
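
A common way to realise a spread selection is greedy max-min ordering: repeatedly pick the item whose distance to its nearest already-selected item is largest. The Python sketch below illustrates that general technique, assuming set-of-bits fingerprints and a distance of 1 minus the Tanimoto coefficient; it is not taken from the tool's source, and the names and data are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def spread_order(fingerprints, bias_away_from=()):
    """Greedy max-min ordering.

    Returns (index, distance to nearest previously selected item) pairs.
    Fingerprints in `bias_away_from` count as already selected.
    """
    reference = list(bias_away_from)
    remaining = list(range(len(fingerprints)))
    order = []
    if not reference and remaining:
        seed = remaining.pop(0)  # assumption: seed with the first item
        order.append((seed, None))
        reference.append(fingerprints[seed])
    while remaining:
        nearest = {i: min(tanimoto_distance(fingerprints[i], ref)
                          for ref in reference)
                   for i in remaining}
        chosen = max(remaining, key=nearest.get)
        order.append((chosen, nearest[chosen]))
        reference.append(fingerprints[chosen])
        remaining.remove(chosen)
    return order


# Hypothetical data: one previously selected fingerprint to bias away from.
previous = [{1, 2, 3}]
fps = [{1, 2, 3, 4}, {7, 8, 9}, {1, 2, 7}]
order = spread_order(fps, bias_away_from=previous)
print(order)
```

In this ordering the recorded distances never increase, so the output is already sorted by spread distance.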

45. gfp_spread_buckets_v2

Description:

Calculates the spread distance of fingerprints against target fingerprint set while considering a bucketized variable like activity/property

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_spread_buckets_v2 -B NATOMS pubchem_with_natom.gfp

Explanation:

Compute the spread distance of the fingerprints in pubchem_with_natom.gfp while considering the variable NATOMS (-B NATOMS) also included in the fingerprint file.

Help command:

gfp_spread_buckets_v2

46. nplotnn

Description:

Processes the output of several gfp nearneighbour-based tools into human readable form including SMILES format

Author/owner: C3/Eli Lilly and Co

Sample 1:

nplotnn -L def leader_result_raw.txt

Explanation:

Reformat the output of leader clustering to tabular (-L tbl)

Help command:

nplotnn

47. tdt_sort

Description:

Sorts fields of a TDT file or stream according to specific tag, properties, or the degree of each node

Author/owner: C3/Eli Lilly and Co

Sample 1:

tdt_sort -T FPADD,col=4 -r unsorted.gfp

Explanation:

Sort the file unsorted.gfp in reverse order (-r) based on the 4th column of FPADD field (-T FPADD,col=4)

Help command:

tdt_sort

48. tdt_join

Description:

Joins two TDT streams, possibly with different identifiers

Author/owner: C3/Eli Lilly and Co

Sample 1:

tdt_join.sh -d part1.gfp part2.gfp

Explanation:

Join fingerprint file part1.gfp with fingerprint file part2.gfp and eliminate duplicate tags from second file (-d)

Help command:

tdt_join

49. dbf

Description:

Computes topological and/or 3D distances between features

Author/owner: C3/Eli Lilly and Co

Sample 1:

dbf input.sdf

Explanation:

Compute the distances between the features in the input.sdf file

Help command:

dbf

50. dicer

Description:

Structure fragmentation tool which can optionally use a query file

Author/owner: C3/Eli Lilly and Co

Sample 1:

dicer -k 2 -C auto -s 'ClC' -s 'BrC' -s 'FC' -B bscb -m 0 -M 15 -X 5000 -i smi -A I -A D input.smi

Explanation:

Break a maximum of 2 bonds at a time (-k 2), write smiles and complementary smiles with auto label on break points (-C auto), define SMARTS for cut points (-s '...'), allow C-C single bonds to break (-B bscb), keep fragments between 0 (-m 0) and 15 (-M 15) atoms, produce no more than 5000 fragments per molecule (-X 5000); input is provided in smiles format (-i smi); enable input of aromatic structures (-A I) with Daylight aromaticity definitions (-A D).

Help command:

dicer

51. hydrophobic_sections

Description:

Hydrophobic section descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

hydrophobic_sections input.smi

Explanation:

Compute the hydrophobic section descriptors for the smiles in the input.smi file

Help command:

hydrophobic_sections

52. iwdescr

Description:

Calculate 200+ descriptors for structures in an input SMILES file

Author/owner: C3/Eli Lilly and Co

Sample 1:

iwdescr input.smi

Explanation:

Generate a large set of descriptors for compounds in input.smi file. Results written to stdout.

Help command:

iwdescr

53. iwecfp

Description:

Extended Connectivity fingerprints for structures in an input SMILES file

Author/owner: C3/Eli Lilly and Co

Sample 1:

iwecfp -r 2 -R 6 input.smi

Explanation:

Compute EC fingerprints with a shell width from 2 to 6 (-r 2 -R 6). Results written to stdout.

Help command:

iwecfp

54. iwfp

Description:

Compute hashed path-based fingerprints for structures in an input SMILES file

Author/owner: C3/Eli Lilly and Co

Sample 1:

iwfp test.smi

Explanation:

Produces hashed path-based fingerprints of compounds for the smiles in the test.smi file. Results written to stdout.

Help command:

iwfp

55. jwsadb

Description:

Calculate charge surface area related descriptors for structures in input SMILES file.

Author/owner: C3/Eli Lilly and Co

Sample 1:

jwsadb input.smi

Explanation:

Calculate charge surface area related descriptors for the smiles in the input.smi file. Results written to stdout.

Sample 2:

jwsadb -b -F contrib/data/AtomicPhysChemParameter/wildman_crippen.dat test.sdf

Explanation:

Calculate CoMMA related descriptors for the structures in the test.sdf file using the ghose-crippen parameters in wildman_crippen.dat (-F contrib/data/AtomicPhysChemParameter/wildman_crippen.dat). Results written to stdout.

Help command:

jwsadb

56. maccskeys

Description:

Generate MACCS keys for structures in an input SMILES file.

Author/owner: C3/Eli Lilly and Co

Sample 1:

maccskeys -A D input.smi

Explanation:

Generate MACCS keys for the smiles in the input.smi file; use Daylight aromaticity definitions (-A D). Results written to stdout.

Help command:

maccskeys

57. make_these_molecules

Description:

Enumerate structures from a reaction template, labels and starting materials. Uses a rxn file, a reagents ID file and two reagent SMILES files.

Author/owner: C3/Eli Lilly and Co

Sample 1:

make_these_molecules -E autocreate -A D -z i -z f -l -R amideFormation.rxn -M products.lbl -S out.smi amines.smi acids.smi

Explanation:

Generate a compound for each reagent pair in each row of file products.lbl (-M products.lbl), using reaction file amideFormation.rxn (-R amideFormation.rxn) and the reagent structures in the provided smiles files amines.smi and acids.smi (order is important). Use Daylight aromaticity definitions (-A D); take the first of any multiple matches in sidechains (-z f), ignore no-match errors in sidechains (-z i), and reduce to the largest fragment (-l). Results are written to file out.smi (-S).

Sample 2:

make_these_molecules -E autocreate -A D -z i -z f -l -R amideFormation.rxn -R amideFormation.rxn -M products.lbl amines.smi scaffold_bb.smi acids.smi

Explanation:

Generate a compound from three reagent components in each row of file products.lbl (-M products.lbl), using reaction file amideFormation.rxn (-R amideFormation.rxn) both for the reaction between the first and second component and for the reaction between the second and third component, with the reagent structures in the provided smiles files amines.smi, scaffold_bb.smi and acids.smi (order is important). Use Daylight aromaticity definitions (-A D). Take the first of any multiple matches in sidechains (-z f), ignore no-match errors in sidechains (-z i) and reduce to the largest fragment (-l).

Help command:

make_these_molecules

58. molecular_abstraction

Description:

Performs molecular transformations associated with various molecular abstractions

Author/owner: C3/Eli Lilly and Co

Sample 1:

molecular_abstraction -c -a 'allbonds(single).allatoms(N WRITE)' input.smi

Explanation:

Remove all chirality from input molecules (-c); change all bonds to single bonds and all heavy atoms to N (-a). Results written to stdout.

Help command:

molecular_abstraction

remove_and_label

Description:

Converts an MDL query file with specified RGroups to a LillyMol query file and/or a Smarts-like file format

Author/owner: C3/Eli Lilly and Co

Sample 1:

remove_and_label -R nonorganic=0 -p -e 1 -i ignore_bad_m -E anylength -E autocreate -Q outQ1.qry -J outQ2.smt input.mdl

Explanation:

Converts the input MDL query file input.mdl to the LillyMol query file outQ1.qry (-Q) and the smarts-like file outQ2.smt (-J); remove non-organic elements (-R nonorganic=0); preserve ring membership of attachment points (-p); allow one extra connection at each substitution point (-e 1). Bad molecules are ignored (-i ignore_bad_m). Elements can be any length (-E anylength); new elements are created automatically when encountered (-E autocreate).

Help command:

remove_and_label

63. remove_matched_atoms

Strips fragments/matched atoms from molecules in input SMILES file; Can be used in place of a simple de-protection reaction.

Author/owner: C3/Eli Lilly and Co

Sample 1:

remove_matched_atoms -s 'CC1CCCNC1' -S stripped_matches input.smi

Explanation:

Remove matched atoms 'CC1CCCNC1' (-s) from the structures in input.smi file. Write results in file stripped_matches.smi (-S).

Help command:

remove_matched_atoms

64. ring_fingerprint

Compute fingerprints of structures in input SMILES file based on ring properties - primarily aromatic rings.

Author/owner: C3/Eli Lilly and Co

Sample 1:

ring_fingerprint input.smi

Explanation:

Compute the fingerprints for the smiles in input.smi file

Computes 'temperature' descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

temperature -a input.smi

Explanation:

Compute 'temperature' descriptors for compounds in file input.smi; output in descriptor file format (-a). Results written to stdout.

Help command:

temperature

Description:

Naive Bayes predictive model training and prediction

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_naive_bayesian -t 50 -f train -c 2 -F FPIW -C 2/50 -A train.act train.gfp

Explanation:

Build a Naive Bayes model using the FPIW fingerprint tag (-F FPIW) in fingerprint file train.gfp and the activity data in the second column (-c 2) of activity file train.act (-f train -A train.act). The cutoff value separating active from inactive molecules is 50 (-t 50). Run 2 cross-validation splits, each using 50% of the records for training (-C 2/50). Results written to stdout.
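To make the meaning of the cutoff (-t 50) and the split specification (-C 2/50) concrete, the following is a minimal Python sketch. It is not the tool's implementation: the function names, the rule that values at or above the cutoff count as active, and the sample data are all assumptions made for illustration.

```python
import random


def label_actives(activities, cutoff=50.0):
    """Map each identifier to True (active) or False (inactive)."""
    # Assumption: a value at or above the cutoff counts as active.
    return {name: value >= cutoff for name, value in activities.items()}


def random_splits(names, n_splits=2, train_percent=50, seed=0):
    """Yield (train, test) lists for repeated random splits."""
    rng = random.Random(seed)
    n_train = len(names) * train_percent // 100
    for _ in range(n_splits):
        shuffled = list(names)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Hypothetical activity data: identifier -> measured value
activities = {"cpd1": 12.0, "cpd2": 50.0, "cpd3": 87.5, "cpd4": 49.9}
labels = label_actives(activities)
splits = list(random_splits(sorted(activities)))

for train, test in splits:
    print("train:", train, "test:", test)
```

Each split holds out the records that were not drawn for training, so a model built on one half can be scored on the other.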

Help command:

gfp_naive_bayesian

Release 5

91. abraham

Description:

Compute Abraham descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

abraham -l -F $LILLYMOL_HOME/contrib/data/queries/abraham/Abraham -P $LILLYMOL_HOME/contrib/data/queries/abraham/Alpha2H -g all pubchem_example.smi

Explanation:

Compute the Abraham descriptors for SMILES in pubchem_example.smi file using control file Abraham (-F $LILLYMOL_HOME/contrib/data/queries/abraham/Abraham) and alpha2H queries Alpha2H (-P $LILLYMOL_HOME/contrib/data/queries/abraham/Alpha2H). Reduce to the largest fragment (-l) and apply all chemical standardizations (-g all).

Help command:

abraham

92. distance_matrix_spread

Description:

Spread implementation, distances from a distance matrix file

Author/owner: C3/Eli Lilly and Co

Sample 1:

distance_matrix_spread -n 20 in_matrix.dat

Explanation:

Calculate distance spread for 20 items (-n) in the in_matrix.dat file.
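As a rough illustration of what a spread selection does, the Python sketch below picks items in max-min order from a small in-memory distance matrix. It assumes that "spread" means the usual max-min diversity selection (each new item is the one furthest from everything already selected); the function name, the choice of starting item, and the matrix values are hypothetical and do not reflect the tool's file format or internals.

```python
def spread_order(dist, n_select, first=0):
    """Return item indices in max-min order from a square distance matrix."""
    n = len(dist)
    n_select = min(n_select, n)
    selected = [first]
    # nearest[i] is the distance from item i to its closest selected item
    nearest = list(dist[first])
    while len(selected) < n_select:
        candidates = [i for i in range(n) if i not in selected]
        # Pick the candidate that is furthest from the current selection
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], dist[nxt][i])
    return selected


# Hypothetical symmetric distance matrix for four items
dist = [
    [0.0, 0.1, 0.9, 0.5],
    [0.1, 0.0, 0.8, 0.4],
    [0.9, 0.8, 0.0, 0.6],
    [0.5, 0.4, 0.6, 0.0],
]
order = spread_order(dist, 3)
print(order)
```

Item 1 is selected last (or not at all) because it sits very close to the starting item 0.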

Help command:

distance_matrix_spread

93. distance_matrix_from_distances

Description:

Produce a distance matrix file using distance data from file

Author/owner: C3/Eli Lilly and Co

Sample 1:

distance_matrix_from_distances -t sqh -S output.dat input.txt

Explanation:

Generate distance matrix file output.dat (-S output.dat) in the format of square data with a header record (-t sqh) using distance data from input.txt
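The idea of turning a list of pairwise distances into a square matrix with a header record can be sketched in Python as follows. The input layout (id1, id2, distance triples), the "ID" header label, and the two-decimal formatting are assumptions for illustration only; they are not the file formats read or written by the tool.

```python
def square_matrix(triples):
    """Build a symmetric matrix from (id1, id2, distance) triples."""
    ids = sorted({name for a, b, _ in triples for name in (a, b)})
    index = {name: i for i, name in enumerate(ids)}
    matrix = [[0.0] * len(ids) for _ in ids]
    for a, b, d in triples:
        # Distances are symmetric, so fill both halves
        matrix[index[a]][index[b]] = d
        matrix[index[b]][index[a]] = d
    return ids, matrix


def format_square(ids, matrix):
    """Render the matrix as text: one header record, then one row per item."""
    lines = [" ".join(["ID"] + ids)]
    for name, row in zip(ids, matrix):
        lines.append(" ".join([name] + [f"{d:.2f}" for d in row]))
    return "\n".join(lines)


# Hypothetical pairwise distances
triples = [("a", "b", 0.2), ("a", "c", 0.7), ("b", "c", 0.5)]
ids, matrix = square_matrix(triples)
text = format_square(ids, matrix)
print(text)
```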

Help command:

distance_matrix_from_distances

94. iwecfp_intermolecular

Description:

Compute protein-ligand interaction fingerprints. Three main types of fingerprints have been implemented: ecfp, path and atom pairs. For ecfp fingerprints, each bit represents a circular substructure from the ligand paired with a circular substructure from the protein (the central atoms of the two substructures must be within a given distance threshold). It is possible to change the shell radius (on both ligand and protein), the distance cut-off between ligand and protein atoms, the maximum number of atoms in the protein substructure, the atom types, and the atom relationship (atom pairs and atom paths), among other options.

Author/owner: C3/Eli Lilly and Co

Sample 1:

iwecfp_intermolecular -J NCECB3 -T 3 -R 3 -k in/protein.sdf in/ligand.sdf

Explanation:

Compute the extended connectivity fingerprints for the protein in protein.sdf and the ligand in ligand.sdf. Use NCECB3 as the tag for the generated fingerprint (-J NCECB3). Make intermolecular connections when the distance is less than 3 (-T 3) and use a maximum shell radius of 3 (-R 3). All files used in the example were prepared using the Schrodinger Protein Preparation Wizard tool.
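The distance threshold can be pictured with a short Python sketch that lists ligand-protein atom pairs closer than the cutoff. This only illustrates the "within a given distance" test; the function name and coordinates are hypothetical, and the real tool goes on to build substructure-based fingerprint bits from such pairs.

```python
import math


def contacts(ligand, protein, cutoff=3.0):
    """Return (ligand_index, protein_index) pairs closer than the cutoff."""
    pairs = []
    for i, lig_atom in enumerate(ligand):
        for j, prot_atom in enumerate(protein):
            # Strictly less than the cutoff, matching "less than 3" above
            if math.dist(lig_atom, prot_atom) < cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical 3D coordinates
ligand = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
protein = [(0.0, 0.0, 2.5), (5.0, 0.0, 3.0), (10.0, 10.0, 10.0)]
pairs = contacts(ligand, protein)
print(pairs)
```

The second ligand atom lies exactly 3.0 from a protein atom, so it is not counted under a strict "less than" test.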

Help command:

iwecfp_intermolecular

95. tshadow

Description:

Compute shadow area for molecule

Author/owner: C3/Eli Lilly and Co

Sample 1:

tshadow -L -G in/in.sdf

Explanation:

Compute the shadow area for the molecules in the in.sdf file. Rotate each molecule to the longest distance between atoms and compute the radius of gyration.
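For readers unfamiliar with the two geometric quantities mentioned here, the Python sketch below computes the longest interatomic distance and the radius of gyration for a set of coordinates. It assumes the standard definition of the radius of gyration with equal weights unless weights are supplied; whether the tool weights atoms by mass is not stated above, and the function names and coordinates are hypothetical.

```python
import itertools
import math


def longest_distance(coords):
    """Return the largest distance between any two atoms."""
    return max(math.dist(a, b) for a, b in itertools.combinations(coords, 2))


def radius_of_gyration(coords, weights=None):
    """Root of the weighted mean squared distance from the weighted centre."""
    if weights is None:
        # Assumption: equal weight for every atom
        weights = [1.0] * len(coords)
    total = sum(weights)
    centre = [
        sum(w * c[k] for w, c in zip(weights, coords)) / total
        for k in range(3)
    ]
    squared = sum(
        w * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for w, c in zip(weights, coords)
    )
    return math.sqrt(squared / total)


# Hypothetical coordinates: four atoms on a unit circle in the xy plane
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
rg = radius_of_gyration(coords)
longest = longest_distance(coords)
print(rg, longest)
```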

Help command:

tshadow