Instruction
Description:
Merge identical chemical structures to one common name in a SMILES file (also see unique_molecules tool). Useful for identifying unique chemical structures in a SMILES file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
common_names -S ./output -s 10000 -r 10000 -D + -v input1.smi input2.smi
Explanation:
Find all compounds in input1.smi and input2.smi with common structure and different name; write them to file output.smi with new name consisting of old names separated by a "+" symbol; maximum number of molecules to process is 10000 (-s 10000); report progress every 10000 rows (-r 10000)
Shell output:
… output to './output'
File output: (output.smi)
Combined compound list
Help command:
common_names
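The merging step described above can be sketched in Python. This is an illustration only, not the tool's implementation: the function name merge_common_names is hypothetical, and the sketch assumes the SMILES strings are already canonical, so that equal strings mean equal structures.

```python
def merge_common_names(records, separator="+"):
    """Group (smiles, name) records by structure and join their names.

    Assumption: the SMILES strings are already canonical, so identical
    strings denote identical structures.
    """
    grouped = {}
    for smiles, name in records:
        grouped.setdefault(smiles, []).append(name)
    # One output record per structure, names joined in input order
    return [(smiles, separator.join(names)) for smiles, names in grouped.items()]


records = [("CCO", "mol1"), ("c1ccccc1", "mol2"), ("CCO", "mol3")]
print(merge_common_names(records))
```

Here the two CCO records collapse into one record named mol1+mol3, mirroring the -D + option in the sample.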
Description:
Fetches records from one file based on identifiers in another file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
fetch_smiles_quick -j -c 1 -C 2 -X notInRecord -Y notInIdentifier record.w structures.smi
Explanation:
Fetches records by matching identifiers common to record.w and structures.smi. The matched records (column 1 in record.w and column 2 in structures.smi) will be displayed in the shell window. The list of unmatched identifiers will be saved in the notInRecord file. The list of unmatched records will be saved in the notInIdentifier file. The identifier file is a descriptor file without header record (-j).
Shell output: (matched record)
O=C(C)C=CC=C(C)CCC=C(C)C PBCHM1756999 24 3 6 349.4 5 0.2083 10
File output: (notInIdentifier)
Unmatched records
File output: (notInRecord)
Unmatched identifiers
Help command:
fetch_smiles_quick
Description:
Filters out duplicate chemical structures based on unique smiles
Author/owner: C3/Eli Lilly and Co
Sample 1:
unique_molecules -S unique -D duplicate -v -l input.smi
Explanation:
Traverse structures in input.smi and identify duplicate structures; write duplicates in duplicate.smi; write unique in unique.smi; only consider largest fragment of each smiles (-l)
Shell output:
Execution summary
File output: (duplicate.smi)
Duplicate molecule list
File output: (unique.smi)
Unique molecule list
Help command:
unique_molecules
Description:
Identifies the unique rows in a file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
unique_rows -c 1 -c 2 input.dat
Explanation:
Check input.dat for unique rows based on the values in columns 1 and 2. The unique rows will be displayed in the shell window.
Shell output:
Unique row list
Help command:
unique_rows
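As a rough illustration of selecting unique rows by a column key, here is a minimal Python sketch. The function name unique_rows_by_columns is hypothetical. The sketch assumes whitespace-separated columns, 1-based column numbers, and that the first row seen for each key is kept; the actual tool may differ on these points.

```python
def unique_rows_by_columns(lines, columns):
    """Return the first line seen for each distinct combination of key columns.

    Assumptions: whitespace-separated fields, 1-based column numbers.
    """
    seen = set()
    result = []
    for line in lines:
        fields = line.split()
        key = tuple(fields[c - 1] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result


rows = ["a 1 x", "a 1 y", "a 2 z", "b 1 w"]
print(unique_rows_by_columns(rows, [1, 2]))
```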
Description:
Extract columns from a text file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
iwcut -f 5,3 descriptor.txt
Explanation:
Extract column 5 and column 3 from the descriptor.txt file. The extracted column data will be displayed in the shell window.
Shell output:
Data from column 5 and column 3
Help command:
iwcut
Description:
Structure file utility to clean up SMILES files and filter on specific criteria. It can also be used to convert between chemical file formats, for example from SDF to SMILES
Author/owner: C3/Eli Lilly and Co
Sample 1:
fileconv -Y dbg -B 100 -S -a input.smi
Explanation:
Debug/print each molecule structure in input.smi; ignore as many as 100 fatal input errors
Shell output:
Molecule information
Sample 2:
fileconv -F 6 -c 4 -C 14 -v -i smi -S selection list.smi
Explanation:
Select the molecules from list.smi that have between 4 and 14 atoms and fewer than 6 fragments; store the results in file selection.smi
Shell output:
Execution summary
File output: (selection.smi)
List of molecules meeting the search criteria
Sample 3:
fileconv -o sdf -i smi -S single single.smi
Explanation:
Convert single.smi file to the sdf format single.sdf
File output: (single.sdf)
Converted sdf file
Help command:
fileconv
Description:
Generates reaction signatures for input reactions.
Author/Owner: C3/Eli Lilly and Co
Sample 1:
rxn_signature -v -r 0,1,2 -C Cfile -F Ffile all.rsmi >all.sig 2>all.log
Explanation:
Extract the reaction signatures of all reactions in all.rsmi. The program prints signatures to stdout, which is redirected here to all.sig; messages on stderr are redirected to all.log. Signatures are generated at radii 0, 1 and 2 from the reaction core, i.e. the changing atoms (-r 0,1,2). The list of changed atoms is written to Cfile (-C Cfile). Failed reactions are written to Ffile (-F Ffile).
Notes:
Reaction signatures capture the extended core of a reaction around the atoms that change in a reaction. A signature is based on the unique smiles of the reaction core. The smiles includes atoms colored by their environment in the original reaction smiles. In addition, information about the ring bond status in the original reaction smiles is appended to the reaction signature produced.
Help command:
rxn_signature
Description:
Checks and standardizes input chemical reactions; converts to a reaction smiles file format
Author/Owner: C3/Eli Lilly and Co
Sample 1:
rxn_standardize -s -c -D x -X igbad -v -C 60 -K -E autocreate -e -o -b -f gsub input.rsmi > output.rsmi
Explanation:
Check and standardize reactions in an input reactions smiles (.rsmi) file. Discard chirality on input (-c). Discard reactions containing duplicate atom map numbers (-D x). Ignore bad reactions (-X igbad). Discard any reaction where the largest reactant has more than 60 atoms (-C 60). Kekule fix (-K). Automatically create new elements when encountered (-E autocreate). Move small fragments that show up on products to orphan status (-e). Create reagent fragments that are orphans (-o). Remove duplicate reactants, even if atom maps scrambled (-b). Replace unusual characters in reaction names with _ (-f gsub).
Notes:
Input file can be in RDF or rsmi format. Output is in rsmi format
Help command:
rxn_standardize
Description:
Perform 2D substructure searches with SMILES/SMARTS against SMILES files
Author/owner: C3/Eli Lilly and Co
Sample 1:
tsubstructure -s 'C(C)(=O)C' -m hits.smi -n nonhits.smi list.smi
Explanation:
Search for molecules in list.smi containing the defined smarts (-s); write hits to hits.smi (-m) and nonhits to nonhits.smi (-n)
Sample 2:
tsubstructure.sh -f -b -A D -o smi -m hits.smi -s 'C(C)(=O)C' list.smi
Explanation:
Search for molecules containing defined smarts (-s); only find one embedding of the query (-f); for each molecule, break after finding a query which matches (-b); use daylight aromaticity (-A D); write hits in hits.smi
Note:
Use -X to successfully skip structures with unconventional symbols, e.g. X, R, ...
Sample 3:
tsubstructure -s '[ND1H2]-[C@H]1CCN2CCCCC2C1' -A D -o usmi -m match.out list.smi
Explanation:
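Search for molecules containing the defined smarts (-s); use Daylight aromaticity (-A D); write hits as unique smiles (-o usmi) to the file given by -m match.out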
Sample 4:
tsubstructure -A D -q carboxylic_acids.qry -u -M imp2exp -m match.smi list.smi
Explanation:
Find all matches to specific query file (-q) and place in match.smi (-m); use Daylight aromaticity; convert implicit hydrogen in target molecules to explicit before matching attempt (-M imp2exp); find unique matches only (-u)
Help command:
tsubstructure
Description:
Defines synthetic routes for input chemical structures by deconstructing input molecules into reactants using a set of known reaction templates. Conceptually, this is the inverse of the chemical reaction synthesis implemented by the tool trxn.
Author/Owner: C3/Eli Lilly and Co
Sample 1:
retrosynthesis -Y all -X kg -X kekule -X ersfrm -a 2 -q f -v -R 1 -I CentroidRxnSmi_1 -P UST:AZUCORS -M ncon -M ring -M unsat -M arom 10Cmpds.smi
Explanation:
Looks for synthesis paths for the molecules in 10Cmpds.smi using the reaction signatures in CentroidRxnSmi_1. Various standardization flags (-Y, -X, -q, -P, -M options). Require at least 2 heavy atoms in fragments (-a), verbose (-v), centroid radius 1 (-R).
Help command:
retrosynthesis
Description:
Performs reactions between reactant molecules to enumerate product structures. Uses a control reaction file, a scaffold SMILES file and zero or more reactant SMILES files. Conceptually, this is the inverse of the retrosynthesis process implemented by the tool retrosynthesis.
Author/owner: C3/Eli Lilly and Co
Sample 1:
trxn -v -r 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn -Z -z i -M RMX -m RMX -S 1.2.1_run 20180412_amines.smi 20180412_aldehydes.smi
Explanation:
Perform the reaction in 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn, ignoring sidechains (-Z) and molecules (-z i) that do not react, ignoring sidechains with multiple substructure matches (-M RMX), and ignoring scaffolds that generate multiple substructure hits (-m RMX). Output is written to a file with the stem 1.2.1_run (-S 1.2.1_run); the input files are 20180412_amines.smi and 20180412_aldehydes.smi.
Sample 2:
trxn -v -r 2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn -Z -z i -M RMX -m RMX -S 2.1.2_run 20180412_amines.smi 20180412_carboxylic_acids.smi
Explanation:
Perform reaction in ./2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn
Help command:
trxn
Description:
Computes demerit of a molecule. In this context demerits refers to non-desirable molecular structure characteristics/features.
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwdemerit -A D -A I -S foo -G - -f 99999 -t -W imp2exp -W maxe=1 -E autocreate -q F:PAINS/queries_latest -O hard -W dnv=0 -W slist -i smi pubchem_example.smi
Explanation:
Compute the demerits for the molecules in pubchem_example.smi (-i smi pubchem_example.smi) using the queries_latest query file (-q F:PAINS/queries_latest). The good, non-rejected (-G) structures will be written into foo.demerit (-S foo). Use Daylight aromaticity definitions (-A D) and enable input of aromatic structures (-A I). Molecules are rejected when they have 99999 or more demerits (-f 99999). Append demerit text to molecule names (-t). Make implicit hydrogens explicit (-W imp2exp); the maximum number of substructure queries to identify is 1 (-W maxe=1); use value 0 in the query file as the demerit score (-W dnv=0); write a sorted list of demerit values and reasons (-W slist). Skip all the hard coded substructure queries (-O hard).
Help command:
iwdemerit
Description:
Generate random smiles based on input smiles.
Author/owner: C3/Eli Lilly and Co
Sample 1:
random_smiles.sh -n 5 -a -e -A D -v pubchem_example.smi
Explanation:
Generate 5 new random smiles (-n 5) based on the smiles in pubchem_example.smi using Daylight aromaticity (-A D). Append permutation number to name (-a) and echo initial molecule (-e)
Help command:
random_smiles
Description:
Write either unique smiles (if interpretable) or non aromatic unique form.
Author/owner: C3/Eli Lilly and Co
Sample 1:
preferred_smiles.sh pubchem_example.smi
Explanation:
Write unique smiles for the smiles in the pubchem_example.smi
Help command:
preferred_smiles
Description:
Calculate the rotatable bonds in the molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
rotatable_bonds.sh pubchem_example.smi
Explanation:
Calculate the rotatable bonds for molecules in the pubchem_example.smi
Help command:
rotatable_bonds
Description:
Concatenates descriptor files by joining on identifiers
Author/owner: C3/Eli Lilly and Co
Sample 1:
concat_files t1.1 t1.2
Explanation:
Concatenates the descriptor files t1.1 and t1.2 based on the identifier
Help command:
concat_files
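Joining two descriptor tables on their identifier can be sketched as follows. The function name join_on_identifier is hypothetical. The sketch assumes the identifier is the first field of each row and that rows without a partner in the second table are dropped; how the real tool treats missing identifiers is not stated here.

```python
def join_on_identifier(rows_a, rows_b):
    """Append the fields of rows_b to the rows of rows_a that share an identifier.

    Assumptions: the identifier is the first field of every row; rows of
    rows_a with no match in rows_b are dropped.
    """
    lookup = {row[0]: row[1:] for row in rows_b}
    return [row + lookup[row[0]] for row in rows_a if row[0] in lookup]


table_1 = [["id1", "1.0"], ["id2", "2.0"], ["id3", "3.0"]]
table_2 = [["id2", "7"], ["id1", "9"]]
print(join_on_identifier(table_1, table_2))
```

The output keeps the row order of the first table, regardless of the order in the second.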
Description:
Sorts a molecule file by various criteria
Author/owner: C3/Eli Lilly and Co
Sample 1:
msort -a pubchem_example.smi
Explanation:
Sort the molecule file pubchem_example.smi based on the number of atoms the molecule has (-a)
Help command:
msort
Description:
Runs molecules through the Lilly medchem rules, skipping to the next molecule upon crossing a threshold or hitting an instant kill rule
Author/owner: C3/Eli Lilly and Co
Sample 1:
tp_first_pass -C 20 -i smi -o smi -a -L bad0 -S ok0 pubchem_example.smi
Explanation:
Filter the input SMILES file (-i smi) pubchem_example.smi with maximum atom count 20 (-C 20). Write molecules to the SMILES file (-o smi) bad0.smi (-L bad0) if the molecule atom count is larger than 20; otherwise write the molecules to ok0.smi (-S ok0)
Help command:
tp_first_pass
Description:
Converts a molecule to a query file
Author/owner: C3/Eli Lilly and Co
Sample 1:
mol2qry -M 'C1=CC=CC=C1CCCC' -S out
Explanation:
Convert the molecule 'C1=CC=CC=C1CCCC' (-M 'C1=CC=CC=C1CCCC') to the output query file out.qry (-S out)
Help command:
mol2qry
Description:
Chemical structure fragmentation tool that recursively cuts molecules into fragments
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecular_scaffold -g all -t -c pubchem_example.smi
Explanation:
Recursively cuts molecules in the pubchem_example.smi file into fragments with all standardisations (-g all), removing cis-trans bonds from input (-t) and all chirality from input molecules (-c)
Help command:
molecular_scaffold
Description:
Extracts a subset of atoms from set of molecules based on a single substructure
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecule_subset -s c1ccccc1 pubchem_example.smi
Explanation:
List a subset of molecules from pubchem_example.smi which contain the substructure c1ccccc1 (-s c1ccccc1)
Help command:
molecule_subset
Description:
Identifies substituents from substructure-matched molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
rgroup -s c1ccccc1 pubchem_example.smi
Explanation:
Identifies substituents for the molecules matching substructure c1ccccc1 (-s c1ccccc1)
Help command:
rgroup
Description:
Extracts rings from molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_extraction pubchem_example.smi
Explanation:
Extracts rings from molecules in the pubchem_example.smi
Help command:
ring_extraction
Description:
Exhaustively trim rings from ring systems, preserving aromaticity
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_trimming -u -c -w parent -w rings -w scaffold -m 1 -J 2 -j 1 pubchem_example.smi
Explanation:
Exhaustively trim rings from pubchem_example.smi, using only unique structures from each input molecule (-u), removing all chiral centres (-c), writing the parent (-w parent), writing isolated ring systems (-w rings), and writing the scaffold (-w scaffold); the maximum number of rings to remove from a ring system is 1 (-m 1); isotope 2 marks where the ring joins are broken (-J 2) and isotope 1 marks where the scaffold joined the rest of the molecule (-j 1).
Help command:
ring_trimming
Description:
Filter molecules according to sp3 content
Author/owner: C3/Eli Lilly and Co
Sample 1:
sp3_filter -c 2 -x 2 -U out pubchem_example.smi
Explanation:
Filter molecules in pubchem_example.smi with a minimum of 2 carbon sp3 atoms (-c 2) and a minimum of 2 non-carbon sp3 atoms (-x 2). The rejected molecules are written to the out.smi file (-U out)
Help command:
sp3_filter
Description:
Enumerate tautomeric forms for molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
tautomer_generation pubchem_example.smi
Explanation:
Enumerate tautomeric forms for molecules in the pubchem_example.smi
Help command:
tautomer_generation
Description:
Generate new smiles from a set of input molecules by random string operations
Author/owner: C3/Eli Lilly and Co
Sample 1:
smiles_mutation -N 50000 -n 20 -p 5 -c 15 -C 40 pubchem_example_short.smi
Explanation:
Generate new smiles from the molecules in pubchem_example_short.smi, running 50000 iterations (-N 50000), doing a complete refresh from the initial smiles every 20 iterations (-n 20), and generating 5 random replicates of each starting molecule (-p 5); the minimum number of atoms in generated molecules is 15 (-c 15) and the maximum is 40 (-C 40)
Help command:
smiles_mutation
Description:
Search substructure over reactions
Author/owner: C3/Eli Lilly and Co
Sample 1:
rxn_substructure_search -q 'C(F)(F)F>>' -m found sample_reactions.rsmi
Explanation:
Search for the substructure 'C(F)(F)F>>' (-q 'C(F)(F)F>>') in the sample_reactions.rsmi file, allowing a match anywhere in reagents/agents/products (-b), and saving the matched results to the found.rxnsmi file (-m found)
Help command:
rxn_substructure_search
Description:
Group identical molecules (including isomers) with varying activity values
Author/owner: C3/Eli Lilly and Co
Sample 1:
activity_consistency -a -l -e 2 -X pubchem_in.act pubchem.smi
Explanation:
Group identical molecules in pubchem.smi using experimental data in pubchem_in.act (-X pubchem_in.act), using the activity data from the column 2 of pubchem_in.act file (-e 2), reducing to graph form (-a), reducing to largest fragment (-l)
Help command:
activity_consistency
Description:
Converts an integer descriptor file to fingerprints, either of type fixed 0/1 or non colliding counted
Author/owner: C3/Eli Lilly and Co
Sample 1:
descriptor_file_to_01_fingerprints -F NCFP -S pubchem.smi cleaned_descriptor.txt
Explanation:
Create the fingerprints for the smiles in the pubchem.smi (-S pubchem.smi), using descriptors in the cleaned_descriptor.txt and tag string NCFP (-F NCFP)
Help command:
descriptor_file_to_01_fingerprints
Description:
Converts descriptors to a sparse fingerprint
Author/owner: C3/Eli Lilly and Co
Sample 1:
descriptors_to_fingerprint -S pubchem.smi -D w_natoms:10,40,1 -D w_nelem:1,8,1 pubchem.w
Explanation:
Compute fingerprints for the smiles in pubchem.smi (-S pubchem.smi), using the descriptor for the number of heavy atoms, w_natoms (-D w_natoms:10,40,1), and the descriptor for the number of elements, w_nelem (-D w_nelem:1,8,1), from the tabular descriptor file pubchem.w. The minimum value for the w_natoms descriptor is 10, the maximum value is 40, and the incremental unit between the minimum and maximum is 1. The minimum value for the w_nelem descriptor is 1, the maximum value is 8, and the incremental unit between the minimum and maximum is 1.
Help command:
descriptors_to_fingerprint
Description:
Computes the distance matrix for a pool of gfp fingerprints; generates a human-readable ascii matrix
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_distance_matrix pubchem.gfp
Explanation:
Compute the distance matrix for the gfp fingerprint pubchem.gfp
Help command:
gfp_distance_matrix
Description:
Performs clustering with leader (sphere exclusion) algorithm on gfp descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_leader_v2 -t 0.3 pubchem.gfp
Explanation:
Compute clustering with leader algorithm on pubchem.gfp with distance threshold 0.3 (-t 0.3)
Help command:
gfp_leader_v2
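The leader (sphere exclusion) idea can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the tool's code: fingerprints are modelled as sets of bit numbers, the distance is assumed to be Tanimoto distance, the first unassigned item is taken as the next leader, and the threshold is treated as inclusive. The names tanimoto_distance and leader_clusters are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def leader_clusters(fingerprints, threshold):
    """Sphere exclusion: pick a leader, claim everything within threshold, repeat.

    fingerprints: dict mapping name -> set of bits, in input order.
    """
    unassigned = list(fingerprints)
    clusters = []
    while unassigned:
        leader = unassigned[0]
        members = [
            name for name in unassigned
            if tanimoto_distance(fingerprints[leader], fingerprints[name]) <= threshold
        ]
        clusters.append(members)  # the leader is always the first member
        unassigned = [name for name in unassigned if name not in members]
    return clusters


fps = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4, 5}, "c": {7, 8, 9}, "d": {7, 8, 9, 10}}
print(leader_clusters(fps, 0.3))
```

With a threshold of 0.3, as in the sample, the toy data splits into two clusters led by a and c.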
Description:
Finds near neighbours in a set of fingerprints
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_nearneighbours -p pubchem.gfp -n 2 -T 0.5 pubchem_10.gfp
Explanation:
Find near neighbours for the fingerprints in pubchem_10.gfp; compare against fingerprints in haystack fingerprint set pubchem.gfp (-p pubchem.gfp), search for 2 neighbours for each descriptor (-n 2) and discard distance greater than 0.5 (-T 0.5)
Help command:
gfp_nearneighbours
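A needle-versus-haystack neighbour search with a neighbour count and a distance cutoff can be sketched as below. Fingerprints are modelled as sets of bit numbers and Tanimoto distance is assumed; the function names are hypothetical and the cutoff is treated as inclusive.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def near_neighbours(needle, haystack, max_neighbours, max_distance):
    """Return up to max_neighbours (name, distance) pairs, closest first.

    haystack: dict mapping name -> set of bits.
    Neighbours further away than max_distance are discarded.
    """
    hits = [(tanimoto_distance(needle, fp), name) for name, fp in haystack.items()]
    hits = sorted(hit for hit in hits if hit[0] <= max_distance)
    return [(name, distance) for distance, name in hits[:max_neighbours]]


haystack = {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3}, "c": {1, 2, 9, 10}, "d": {1, 2, 3, 4}}
print(near_neighbours({1, 2, 3, 4}, haystack, max_neighbours=2, max_distance=0.5))
```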
Description:
Finds the single linkage clusters in the gfp fingerprint set
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_single_linkage -t 0.25 pubchem.gfp
Explanation:
Compute clustering with single linkage algorithm on pubchem.gfp with distance 0.25 (-t 0.25) as the threshold value for grouping
Help command:
gfp_single_linkage
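Single linkage grouping differs from leader clustering in that membership chains: two items land in the same cluster if a path of sufficiently close pairs connects them. A minimal union-find sketch is given below, assuming set-based fingerprints, Tanimoto distance and an inclusive threshold. All names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def single_linkage_clusters(fingerprints, threshold):
    """Merge any two items whose distance is within threshold (union-find)."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if tanimoto_distance(fingerprints[a], fingerprints[b]) <= threshold:
            parent[find(b)] = find(a)

    clusters = {}
    for name in fingerprints:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


fps = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4, 5}, "c": {1, 2, 3, 4, 5, 6}, "z": {20, 21}}
print(single_linkage_clusters(fps, 0.25))
```

In the toy data, a and c are further apart than 0.25 but still share a cluster because each is close to b.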
Description:
Converts non-colliding fingerprints to fixed counted, or binary forms
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_sparse_to_fixed -F NCSELW pubchem.gfp
Explanation:
Convert non-colliding fingerprints NCSELW (-F NCSELW) in pubchem.gfp to binary form
Help command:
gfp_sparse_to_fixed
Description:
Filters molecules according to how close they are to members of a comparison pool
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_distance_filter -p pubchem_10.gfp -t 0.7 -n 3 -U U1 pubchem.gfp
Explanation:
Compare fingerprints in pubchem.gfp to pubchem_10.gfp. The minimum required distance between molecules is 0.7 (-t 0.7). Any molecule in pubchem.gfp is rejected if it violates the minimum distance requirement at least 3 times (-n 3). The rejected molecules are saved in the U1 file (-U U1)
Help command:
gfp_distance_filter
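The rejection rule described above (too close to the comparison pool too many times) can be sketched as follows. Set-based fingerprints and Tanimoto distance are assumptions, "violates" is read as "strictly closer than the minimum distance", and the function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def passes_distance_filter(fingerprint, pool, min_distance, max_violations):
    """Return False once the fingerprint is too close to max_violations pool members."""
    violations = sum(
        1 for member in pool if tanimoto_distance(fingerprint, member) < min_distance
    )
    return violations < max_violations


pool = [{1, 2, 3, 4, 5}, {1, 2, 3}, {1, 2, 3, 4, 6}, {9}]
print(passes_distance_filter({1, 2, 3, 4}, pool, min_distance=0.7, max_violations=3))
print(passes_distance_filter({9, 10}, pool, min_distance=0.7, max_violations=3))
```

The first call reports a rejection (three pool members are closer than 0.7); the second candidate is too close to only one pool member and passes.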
Description:
Calculates pairwise distance between molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_pairwise_distances -p pubchem.gfp -T 0.5 100PairsId
Explanation:
Calculate the pairwise distance between the molecules in pubchem.gfp (-p pubchem.gfp). The distance will not be reported if it is larger than the threshold value 0.5 (-T 0.5). The required pairs for calculation are listed in the 100PairsId file.
Help command:
gfp_pairwise_distances
Description:
Converts fingerprints to descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_to_descriptors -f -F FPDSC pubchem.gfp
Explanation:
Convert the fixed width fingerprint (-f) pubchem.gfp into descriptor format; use fingerprint type FPDSC (-F FPDSC) in fingerprint file.
Help command:
gfp_to_descriptors
Description:
Finds near neighbours within a set of fingerprints
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_nearneighbours_single_file -p -z -T 0.2 pubchem.gfp -F FPDSC,w=0.2 -F NCSELW,nc,w=0.8 -V a=0.03 -V b=1.7
Explanation:
Finds near neighbours in the pubchem.gfp file. Writes all pair-wise distances in 3 columns (-p); discards any molecule without neighbours (-z) and any distance greater than 0.2 (-T 0.2). Assigns weight 0.2 to fingerprint type FPDSC (-F FPDSC,w=0.2) and weight 0.8 to the non-colliding fingerprint with tag NCSELW (-F NCSELW,nc,w=0.8) for the distance calculation. Uses Tversky asymmetric similarity with parameter a set to 0.03 (-V a=0.03) and b set to 1.7 (-V b=1.7)
Help command:
gfp_nearneighbours_single_file
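For readers unfamiliar with the Tversky parameters, the standard Tversky index on sets is |X∩Y| / (|X∩Y| + a·|X−Y| + b·|Y−X|); with a = b = 1 it reduces to Tanimoto similarity. The sketch below shows that formula and one plausible way of combining per-fingerprint similarities with weights. The weighted combination and all function names are assumptions for illustration, not the tool's documented behaviour.

```python
def tversky_similarity(x, y, a, b):
    """Tversky index for two sets of bit numbers; asymmetric when a != b."""
    common = len(x & y)
    denominator = common + a * len(x - y) + b * len(y - x)
    if denominator == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return common / denominator


def weighted_distance(components):
    """Combine (weight, similarity) pairs into one distance.

    Assumption: weights sum to 1 and the distance is 1 minus the
    weighted sum of the similarities.
    """
    return 1.0 - sum(weight * similarity for weight, similarity in components)


x, y = {1, 2, 3, 4}, {3, 4, 5}
print(tversky_similarity(x, y, a=0.03, b=1.7))
print(tversky_similarity(y, x, a=0.03, b=1.7))  # differs: the measure is asymmetric
print(weighted_distance([(0.2, 1.0), (0.8, 0.5)]))
```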
Description:
Finds near neighbours of compounds supplied in gfp fingerprint format; can handle LARGE numbers of compounds
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_lnearneighbours -F FPDSC,w=0.3 -F NCSELW,nc,w=0.7 -T 0.4 -n 2 -h -p pubchem_needles.gfp pubchem_haystack.gfp
Explanation:
Finds 2 near neighbours (-n 2) for each fingerprint in the needle file pubchem_needles.gfp against haystack file pubchem_haystack.gfp (-p). Discards neighbours with zero distance and the same ID as the target (-h) or with a distance larger than 0.4 (-T). Fingerprint of type FPDSC weighs 0.3 (-F FPDSC,w=0.3), and non-colliding fingerprint with tag NCSELW weighs 0.7 in the distance calculation(-F NCSELW,nc,w=0.7)
Help command:
gfp_lnearneighbours
Description:
Adds descriptors to the gfp file with matching ID
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_add_descriptors -D FPADD pubchem.gfp cleaned_descriptor.txt
Explanation:
Add the descriptor cleaned_descriptor.txt to the gfp file pubchem.gfp; use tag FPADD (-D FPADD)
Help command:
gfp_add_descriptors
Description:
Scans a fingerprint file and computes average activity associated with each bit; an activity file needs to be provided
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_profile_activity_by_bits -E activityFile.txt pubchem.gfp
Explanation:
Compute the average activity from file activityFile.txt (-E activityFile.txt) for fingerprints in the pubchem.gfp.
Help command:
gfp_profile_activity_by_bits
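Averaging activity per fingerprint bit can be sketched as below: for every bit, collect the activities of the molecules that set it and take the mean. Fingerprints are modelled as sets of bit numbers and the function name is hypothetical.

```python
def average_activity_by_bit(fingerprints, activity):
    """Return {bit: mean activity of the molecules that have this bit set}.

    fingerprints: dict mapping name -> set of bits
    activity: dict mapping name -> numeric activity
    """
    totals = {}
    for name, bits in fingerprints.items():
        for bit in bits:
            running_sum, count = totals.get(bit, (0.0, 0))
            totals[bit] = (running_sum + activity[name], count + 1)
    return {bit: running_sum / count for bit, (running_sum, count) in totals.items()}


fps = {"mol1": {1, 2}, "mol2": {2, 3}}
act = {"mol1": 1.0, "mol2": 3.0}
print(average_activity_by_bit(fps, act))
```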
Description:
Calculates the spread distance of fingerprints against target fingerprint set. Sort output by spread distance.
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_spread_v2 -A pubchem_10.gfp pubchem.gfp
Explanation:
Compute the spread distance of fingerprints in pubchem.gfp. Bias away from fingerprints in pubchem_10.gfp (-A pubchem_10.gfp)
Help command:
gfp_spread_v2
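Spread selection is commonly implemented as a max-min procedure: repeatedly pick the candidate whose distance to the nearest already-selected item is largest. The sketch below follows that general idea and models the "bias away from" set as the initial selection. It assumes set-based fingerprints, Tanimoto distance and a non-empty starting set; the function names are hypothetical and the real tool may order or score items differently.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def spread_order(candidates, already_selected):
    """Order candidates by max-min distance to everything selected so far.

    candidates: dict mapping name -> set of bits
    already_selected: non-empty list of fingerprints to bias away from
    Returns a list of (name, distance to nearest selected item when picked).
    """
    nearest = {
        name: min(tanimoto_distance(fp, chosen) for chosen in already_selected)
        for name, fp in candidates.items()
    }
    order = []
    while nearest:
        name = max(nearest, key=nearest.get)
        order.append((name, nearest.pop(name)))
        # The newly picked item may now be the nearest selected item for others
        for other in nearest:
            distance = tanimoto_distance(candidates[other], candidates[name])
            nearest[other] = min(nearest[other], distance)
    return order


candidates = {"b": {1, 2, 3, 4, 5}, "c": {7, 8, 9}, "d": {1, 2, 7, 8}}
print(spread_order(candidates, [{1, 2, 3, 4}]))
```

In the toy data the most dissimilar candidate c is picked first and the near-duplicate b last.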
Description:
Calculates the spread distance of fingerprints against target fingerprint set while considering a bucketized variable like activity/property
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_spread_buckets_v2 -B NATOMS pubchem_with_natom.gfp
Explanation:
Compute the spread distance of the fingerprints in pubchem_with_natom.gfp while considering the variable NATOMS (-B NATOMS) also included in the fingerprint file.
Help command:
gfp_spread_buckets_v2
Description:
Processes the output of several gfp nearneighbour-based tools into human readable form including SMILES format
Author/owner: C3/Eli Lilly and Co
Sample 1:
nplotnn -L def leader_result_raw.txt
Explanation:
Reformat the output of leader clustering to tabular (-L tbl)
Help command:
nplotnn
Description:
Sorts fields of a TDT file or stream according to specific tag, properties, or the degree of each node
Author/owner: C3/Eli Lilly and Co
Sample 1:
tdt_sort -T FPADD,col=4 -r unsorted.gfp
Explanation:
Sort the file unsorted.gfp in reverse order (-r) based on the 4th column of FPADD field (-T FPADD,col=4)
Help command:
tdt_sort
Description:
Joins two TDT streams, possibly with different identifiers
Author/owner: C3/Eli Lilly and Co
Sample 1:
tdt_join.sh -d part1.gfp part2.gfp
Explanation:
Join fingerprint file part1.gfp with fingerprint file part2.gfp and eliminate duplicate tags from second file (-d)
Help command:
tdt_join
Description:
Computes topological and/or 3D distances between features
Author/owner: C3/Eli Lilly and Co
Sample 1:
dbf input.sdf
Explanation:
Compute the distances between the features in the input.sdf file
Help command:
dbf
Description:
Structure fragmentation tool which can optionally use a query file
Author/owner: C3/Eli Lilly and Co
Sample 1:
dicer -k 2 -C auto -s 'ClC' -s 'BrC' -s 'FC' -B bscb -m 0 -M 15 -X 5000 -i smi -A I -A D input.smi
Explanation:
Break a maximum of 2 bonds at a time (-k 2), write smiles and complementary smiles with auto label on break points (-C auto), define SMARTS for cut points (-s '...'), allow C-C single bonds to break (-B bscb), keep fragments between 0 (-m 0) and 15 (-M 15) atoms, produce no more than 5000 fragments per molecule (-X 5000); input is provided in smiles format (-i smi); enable input of aromatic structures (-A I) with Daylight aromaticity definitions (-A D).
Help command:
dicer
Description:
Hydrophobic section descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
hydrophobic_sections input.smi
Explanation:
Compute the hydrophobic section descriptors for the smiles in the input.smi file
Help command:
hydrophobic_sections
Description:
Calculates 200+ descriptors for structures in an input SMILES file
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwdescr input.smi
Explanation:
Generate a large set of descriptors for compounds in input.smi file. Results written to stdout.
Help command:
iwdescr
Description:
Computes Extended Connectivity fingerprints for structures in an input SMILES file
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwecfp -r 2 -R 6 input.smi
Explanation:
Compute EC fingerprints with shell radius from 2 (-r 2) to 6 (-R 6). Results written to stdout.
Help command:
iwecfp
Description:
Computes hashed path-based fingerprints for structures in an input SMILES file
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwfp test.smi
Explanation:
Produces hashed path-based fingerprints of compounds for the smiles in the test.smi file. Results written to stdout.
Help command:
iwfp
Description:
Calculates charge surface area related descriptors for structures in an input SMILES file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
jwsadb input.smi
Explanation:
Calculate charge surface area related descriptors for the smiles in the input.smi file. Results written to stdout.
Sample 2:
jwsadb -b -F contrib/data/AtomicPhysChemParameter/wildman_crippen.dat test.sdf
Explanation:
Calculate CoMMA related descriptors for the structures in the test.sdf file using the ghose-crippen parameters in wildman_crippen.dat (-F contrib/data/AtomicPhysChemParameter/wildman_crippen.dat). Results written to stdout.
Help command:
jwsadb
Description:
Generates MACCS keys for structures in an input SMILES file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
maccskeys -A D input.smi
Explanation:
Generate MACCS keys for the smiles in the input.smi file; use Daylight aromaticity definitions (-A D). Results written to stdout.
Help command:
maccskeys
Description:
Enumerates structures from a reaction template, labels and starting materials. Uses a rxn file, a reagents ID file and two reagent SMILES files.
Author/owner: C3/Eli Lilly and Co
Sample 1:
make_these_molecules -E autocreate -A D -z i -z f -l -R amideFormation.rxn -M products.lbl -S out.smi amines.smi acids.smi
Explanation:
Generate a compound for each reagent pair in each row of file products.lbl (-M products.lbl), using reaction file amideFormation.rxn (-R amideFormation.rxn) and reagent structures in the provided smiles files amines.smi and acids.smi (order is important); use Daylight aromaticity definitions (-A D); take first of any multiple matches in sidechains(-z f), ignore no match errors in sidechains(-z i), reduce to largest fragment (-l). Results are written to file out.smi (-S).
Sample 2:
make_these_molecules -E autocreate -A D -z i -z f -l -R amideFormation.rxn -R amideFormation.rxn -M products.lbl amines.smi scaffold_bb.smi acids.smi
Explanation:
Generate a compound from three reagent components in each row of file products.lbl (-M products.lbl); using reaction file amideFormation.rxn (-R amideFormation.rxn) for reactions both between the first and second component and between the second and third component; and reagent structures in the provided smiles files amines.smi, scaffold_bb.smi and acids.smi (order is important). Use Daylight aromaticity definitions (-A D). Take first of any multiple matches in sidechains(-z f), ignore no match errors in sidechains(-z i) and reduce to largest fragment (-l)
Help command:
make_these_molecules
Description:
Performs molecular transformations associated with various molecular abstractions
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecular_abstraction -c -a 'allbonds(single).allatoms(N WRITE)' input.smi
Explanation:
Remove all chirality from input molecules (-c); change all bonds to single bonds and all heavy atoms to N (-a). Results written to stdout.
Help command:
molecular_abstraction
Description:
Performs molecular transformations on input molecules using the reactions supplied
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecular_transformations -A I -A D -d -z d -m each -R transformationRule.rxn input.smi
Explanation:
Perform the transformations on structures in the input.smi file using the reaction in transformationRule.rxn (-R transformationRule.rxn). Enable input of aromatic structures (-A I) with Daylight aromaticity definitions (-A D); suppress duplicate molecules (-d) and discard molecules not reacting (-z d); enumerate each scaffold hit separately together with the combined set (-m each).
Help command:
molecular_transformations
Description:
Synthesizes molecules from isotopically labelled reactant sets; returns those identical to seeds in a given file
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecules_from_reagents.sh -M NN -R bb1_iso.smi -R bb2_iso.smi seed_structures.smi
Explanation:
Identify structures from file seed_structures.smi that can be synthesized by isotopically labeled reactants bb1_iso.smi and bb2_iso.smi (-R); Write intermediate bb1 and bb2 that could yield seed structures in files with stem NN (-M). Results written to stdout.
Help command:
molecules_from_reagents
Description:
Chemistry aware molecular perturbation tool
Author/owner: C3/Eli Lilly and Co
Sample 1:
random_molecular_permutations -x 10 -p 100 -c 10 -C 40 -R 8 -Y 6 -U all -s 1234567 input.smi
Explanation:
Perform 100 molecule permutations (-p 100) on molecules with atom count between 10 (-c 10) and 40 (-C 40). Start with 10 copies of the molecule (-x 10); never form a ring with 8 atoms or more (-R 8); the maximum number of rings allowed in each molecule is 6 (-Y 6); check globally for duplicate molecules (-U all); use 1234567 as the seed for the random number generator (-s 1234567).
Help command:
random_molecular_permutations
Description:
Converts an MDL query file with specified RGroups to a LillyMol query file and/or a Smarts-like file format
Author/owner: C3/Eli Lilly and Co
Sample 1:
remove_and_label -R nonorganic=0 -p -e 1 -i ignore_bad_m -E anylength -E autocreate -Q outQ1.qry -J outQ2.smt input.mdl
Explanation:
Converts the input MDL query file input.mdl to the LillyMol query file outQ1.qry (-Q) and the smarts-like file outQ2.smt (-J); remove non organic elements (-R nonorganic=0); preserve ring membership of attachment points (-p); allow one extra connection at each substitution point (-e 1). Bad molecules are ignored (-i ignore_bad_m). Elements can be any length (-E anylength); automatically create new elements when encountered (-E autocreate).
Help command:
remove_and_label
Description:
Strips fragments/matched atoms from molecules in an input SMILES file; can be used in place of a simple de-protection reaction.
Author/owner: C3/Eli Lilly and Co
Sample 1:
remove_matched_atoms -s 'CC1CCCNC1' -S stripped_matches input.smi
Explanation:
Remove matched atoms 'CC1CCCNC1' (-s) from the structures in input.smi file. Write results in file stripped_matches.smi (-S).
Help command:
remove_matched_atoms
Compute fingerprints of structures in input SMILES file based on ring properties - primarily aromatic rings.
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_fingerprint input.smi
Explanation:
Compute the fingerprints for the smiles in input.smi file
Help command:
ring_fingerprint
Replaces rings in input structures with rings from a pre-computed ring database. Companion program to ring_extraction.
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_replacement -g all -A D -u -P PUBCHEM.5a.smi test.smi
Explanation:
Replace the 5a ring of structures in test.smi using the labelled rings in PUBCHEM.5a.smi (-P PUBCHEM.5a.smi, created by tool ring_extraction). Use all possible standardizations (-g all) and Daylight aromaticity definitions (-A D). Process unique structures only (-u).
Help command:
ring_replacement
Computes ring substitution fingerprints
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_substitution input.smi
Explanation:
Compute the ring substitution fingerprints for input.smi. Results written to stdout.
Help command:
ring_substitution
Computes Pfizer/Lipinski rule-of-five violations on molecules in input file. Returns molecule id and the values of the properties calculated.
Author/owner: C3/Eli Lilly and Co
Sample 1:
rule_of_five input.smi
Explanation:
Computes rule-of-five properties and violations on the molecules in the input file input.smi. Results written to stdout.
Help command:
rule_of_five
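The rule itself can be illustrated with a minimal Python sketch. It assumes the four properties have already been computed elsewhere and uses the commonly quoted thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The `Properties` class and function name are hypothetical and are not part of LillyMol; the tool's own property definitions and output may differ.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        p.mol_weight > 500,        # molecular weight limit
        p.logp > 5,                # lipophilicity limit
        p.h_bond_donors > 5,       # hydrogen bond donor limit
        p.h_bond_acceptors > 10,   # hydrogen bond acceptor limit
    ]
    return sum(checks)
```

A molecule is usually flagged when it violates more than one of the four limits.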
Produces fingerprints around the atoms that change in a reaction
Author/owner: C3/Eli Lilly and Co
Sample 1:
rxn_fingerprint -i -M ebch -M cbc -M noiso -M naqm -I 1 -a -d -h -r 1 -J NCR -E autocreate -m first input.rsmi
Explanation:
Produce fingerprints for the reaction smiles file input.rsmi (-i). Generate extra bits for the changing atoms (-M ebch); include changes in bonding in the changing atom count (-M cbc); discard reactions containing isotopic atoms (-M noiso); suppress the 'no atoms in query' message (-M naqm). Mark changing atoms with 1 (-I 1); append the changing atom count to the name (-a); expand the shell to include doubly bonded atoms (-d). An atom is considered changing if any new bond appears (-h). Use 1 as the radius from changing atoms to fingerprint (-r 1); use NCR as the tag for each radius (-J NCR); automatically create new elements when encountered (-E autocreate); if multiple reagents are present, process the first one (-m first).
Help command:
rxn_fingerprint
Computes degree of substitution around a substructure match
Author/owner: C3/Eli Lilly and Co
Sample 1:
substitutions -s 'CC1CCCNC1' -S table.smi input.smi
Explanation:
Compute the degree of substitution for smiles in input.smi around the substructure 'CC1CCCNC1' (-s 'CC1CCCNC1'); results written in file table.smi (-S)
Help command:
substitutions
Computes 'temperature' descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
temperature -a input.smi
Explanation:
Computes 'temperature' descriptors for compounds in file input.smi; output in descriptor file format (-a). Results written to stdout.
Help command:
temperature
Perform 2D substructures searches using supplied queries against structure files
Author/owner: C3/Eli Lilly and Co
Sample 1:
tnass -q F:unique_queries_tnass -x 255 -a -A D -i smi test.smi
Explanation:
Profile molecules in smiles file test.smi using queries indicated in query file unique_queries_tnass (-q F:); input is in smiles format (-i smi); use Daylight aromaticity definition (-A D); output hits in array format (-a); limit to 255 hit occurrences (-x 255).
Help command:
tnass
Description:
Compute the average of selected columns in a file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
average -c 3 -y space -j descriptor.txt
Explanation:
Compute the average of the data in column 3 (-c 3) in the descriptor.txt file (-j); use space as delimiter (-y space)
Help command:
average
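What the sample computes can be sketched in Python as follows. This is an illustration only, assuming a whitespace-delimited descriptor file whose first line is a header and 1-based column numbers; the function name is hypothetical and the tool reports more than a single mean.

```python
def column_average(lines, column, has_header=True):
    """Mean of one 1-based column in whitespace-delimited text lines."""
    rows = lines[1:] if has_header else lines
    values = [float(row.split()[column - 1]) for row in rows if row.strip()]
    if not values:
        raise ValueError("no data rows to average")
    return sum(values) / len(values)
```

For a file on disk, the lines could be obtained with `open("descriptor.txt").read().splitlines()`.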
Description:
Use logical expressions to filter a descriptor file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
dfilefilter -e 'logp<4&&logd074<2' -B reject.txt descriptor.txt
Explanation:
Filter the descriptor file descriptor.txt, keeping records with a value less than 4 in column logp and a value less than 2 in column logd074 (-e 'logp<4&&logd074<2'). The rejected rows are saved in the reject.txt file (-B reject.txt).
Help command:
dfilefilter
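The filtering logic of the sample can be sketched in Python. This sketch does not parse the tool's expression syntax; it assumes the condition is supplied as a Python callable, that the first line holds the column names, and that the rejected rows are returned without a header. All names here are hypothetical.

```python
def filter_descriptor_rows(lines, predicate):
    """Split data rows into kept and rejected lists using predicate(record).

    record maps each header name to the string value in that row.
    The header line is carried over to the kept output only.
    """
    header = lines[0].split()
    kept, rejected = [lines[0]], []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        (kept if predicate(record) else rejected).append(line)
    return kept, rejected


def passes(record):
    """Python equivalent of the sample expression 'logp<4&&logd074<2'."""
    return float(record["logp"]) < 4 and float(record["logd074"]) < 2
```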
Description:
Calculate distribution of selected columns of the input data set.
Author/owner: C3/Eli Lilly and Co
Sample 1:
distribution -H logd074 -s 1 -d mrv_logd074 descriptor.txt
Explanation:
Calculate the distribution for the column mrv_logd074 (-d mrv_logd074) in the descriptor.txt file. The header for the generated distribution column is logd074 (-H logd074); skip the first row of the input file (-s 1)
Help command:
distribution
Description:
Fetch records from one file based on the identifiers in another file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
fetch_sdf_quick -c 2 in.smi in.sdf
Explanation:
Fetch the sdf records from in.sdf for the records in file in.smi using identifier in the second column (-c 2)
Help command:
fetch_sdf_quick
Description:
Splits a file into chunks based on regular expressions.
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwsplit -n 10 -stem split pubchem_example.smi
Explanation:
Split pubchem_example.smi into small files containing 10 rows each (-n 10), using split as the stem for the generated files (-stem split).
Help command:
iwsplit
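The row-count splitting used in the sample amounts to simple chunking, sketched here in Python. The function name is hypothetical, and the sketch makes no assumption about how the tool names its output files.

```python
def split_into_chunks(records, chunk_size):
    """Return consecutive slices of at most chunk_size records each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
```

The last chunk holds whatever records remain, so it may be shorter than the others.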
Description:
Normalize the columns in a file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
normalise -i 1 -j descriptor.txt
Explanation:
Normalize the columns in the descriptor file (-j) descriptor.txt; ignore the first column (-i 1)
Help command:
normalise
Description:
Filter out data columns without enough variance.
Author/owner: C3/Eli Lilly and Co
Sample 1:
notenoughvariance -j -n 1000 -V keep=8 Pubchem1000.dat
Explanation:
Filter out columns of the descriptor type file (-j) Pubchem1000.dat. Store 1000 different values for calculation (-n 1000); keep the 8 most desirable columns based on most variance (-V keep=8)
Help command:
notenoughvariance
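The idea of keeping the most variable columns can be sketched in Python. This is only an illustration: it assumes population variance as the ranking measure and columns already parsed into lists of numbers, neither of which is confirmed for the tool. The function name is hypothetical.

```python
import statistics


def keep_most_variable(columns, keep):
    """Return the names of the `keep` highest-variance columns.

    columns maps a column name to its list of numeric values.
    The original column order is preserved in the result.
    """
    ranked = sorted(columns, key=lambda name: statistics.pvariance(columns[name]),
                    reverse=True)
    kept = set(ranked[:keep])
    return [name for name in columns if name in kept]
```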
Description:
Randomly select records from the provided file; accommodates single-line (default) and multi-line records.
Author/owner: C3/Eli Lilly and Co
Sample 1:
random_records -n 25 pubchem_example.smi
Explanation:
Randomly select 25 records (-n 25) from the pubchem_example.smi file. Results written to stdout.
Help command:
random_records
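For single-line records, the selection is equivalent to sampling without replacement, as in this Python sketch. The function name and the optional `seed` parameter are hypothetical additions for reproducibility, and the sketch returns records in sampled order, which may differ from the tool's output order.

```python
import random


def pick_random_records(records, n, seed=None):
    """Sample up to n distinct records without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))
```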
Description:
Rearrange the columns in one descriptor file to look like another descriptor file
Author/owner: C3/Eli Lilly and Co
Sample 1:
rearrange_columns t11.i1 t11.i2
Explanation:
Rearrange and merge the columns from t11.i2 with t11.i1
Help command:
rearrange_columns
Description:
Count number of lines, columns in a file
Author/owner: C3/Eli Lilly and Co
Sample 1:
tcount -n 9 -d tab descriptor.txt
Explanation:
Counts the columns of the first 10 lines of file descriptor.txt (-n 9); reports number of columns per line; uses tab (-d tab) as the delimiter. Results written to stdout.
Help command:
tcount
Description:
Identifies missing descriptors from a descriptor file
Author/owner: C3/Eli Lilly and Co
Sample 1:
whatsmissing -r -n 0 descriptor.txt
Explanation:
Identify the missing descriptors in the descriptor.txt file, enforcing the column count check (-r) and displaying the records that have no missing columns (-n 0). Results written to stdout.
Help command:
whatsmissing
Description:
Computes the distance matrix for a pool of gfp fingerprints; generates a binary file
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_distance_matrix_iwdm -S output pubchem_100.gfp
Explanation:
Generate the distance matrix for the pubchem_100.gfp file and save data to output file (-S output)
Help command:
gfp_distance_matrix_iwdm
Description:
Leader clustering implementation requiring MPR IW MK MK2 tdt-like/gfp fingerprints
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_leader_standard -t 0.35 pubchem_example.gfp
Explanation:
Perform leader clustering on the input fingerprint file pubchem_example.gfp using maximum radius distance 0.35 (-t). Results written to stdout.
Help command:
gfp_leader_standard
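The leader algorithm can be sketched in Python under a few stated assumptions: fingerprints are modelled as sets of "on" bit numbers, distance is 1 minus the Tanimoto similarity, leaders are taken in input order, and an item joins a cluster when its distance to the leader is at or below the threshold. The function names are hypothetical and the tool's own distance measure and tie handling may differ.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def leader_cluster(fingerprints, threshold):
    """Return clusters as lists of indices; the first index is the leader."""
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        leader = unassigned[0]  # first item not yet in any cluster
        members = [i for i in unassigned
                   if tanimoto_distance(fingerprints[leader], fingerprints[i]) <= threshold]
        clusters.append(members)
        assigned = set(members)
        unassigned = [i for i in unassigned if i not in assigned]
    return clusters
```

Because the leader is at distance 0 from itself, it always appears first in its own cluster.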
Description:
Find near neighbours of a set of needle compounds in a set of haystack compounds
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_lnearneighbours_standard -p pubchem_seeds.gfp -T 0.5 pubchem_example.gfp
Explanation:
Find near neighbours of the needle compounds in pubchem_seeds.gfp file (-p pubchem_seeds.gfp) in haystack pubchem_example.gfp with maximum distance 0.5 (-T 0.5); compounds represented in tdt-like, gfp format. Results written to stdout.
Help command:
gfp_lnearneighbours_standard
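The needle and haystack search can be sketched in Python as a brute-force scan. The sketch assumes fingerprints are sets of "on" bit numbers keyed by identifier, uses 1 minus Tanimoto similarity as the distance, and keeps neighbours at or below the maximum distance; these choices and the function names are illustrative rather than a description of the tool's internals.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def near_neighbours(needles, haystack, max_distance):
    """Map each needle id to a list of (haystack id, distance), closest first."""
    result = {}
    for needle_id, needle_fp in needles.items():
        hits = []
        for hay_id, hay_fp in haystack.items():
            d = tanimoto_distance(needle_fp, hay_fp)
            if d <= max_distance:
                hits.append((hay_id, d))
        hits.sort(key=lambda hit: hit[1])
        result[needle_id] = hits
    return result
```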
Description:
Perform quick maximum diversity subset selection using spread method
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_spread_standard -n 10 pubchem_example.gfp
Explanation:
Pick subset of the 10 (-n 10) most diverse compounds from fingerprint file pubchem_example.gfp. Results written to stdout.
Help command:
gfp_spread_standard
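A spread (max-min) selection can be sketched in Python as follows. The sketch assumes fingerprints are sets of "on" bit numbers, distance is 1 minus Tanimoto similarity, and the first input item is the starting point; how the tool chooses its first item is not stated here, so that choice is an assumption, as are the function names.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def spread_select(fingerprints, n):
    """Return indices of up to n items chosen by max-min diversity."""
    if not fingerprints or n <= 0:
        return []
    selected = [0]  # assumed starting point: the first item
    # min_dist[i] = distance from item i to its nearest already-selected item
    min_dist = [tanimoto_distance(fingerprints[0], fp) for fp in fingerprints]
    while len(selected) < min(n, len(fingerprints)):
        chosen = set(selected)
        candidates = [i for i in range(len(fingerprints)) if i not in chosen]
        pick = max(candidates, key=lambda i: min_dist[i])
        selected.append(pick)
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fingerprints[pick], fp)
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Each step picks the item that is farthest from everything selected so far, which is why the result order runs from most to least distinct.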
Description:
Converts multi component tdt-like fingerprints to descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_to_descriptors_multiple -x 0.5 -F FPMK pubchem_example.gfp
Explanation:
Convert the fingerprint with tag FPMK (-F FPMK) to descriptors; discard bits that hit less than 50% of the time (-x 0.5). Results written to stdout.
Help command:
gfp_to_descriptors_multiple
Description:
Computes B-squared and other statistics for input file
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwstats -j -e 2 -p 3 descriptor.txt
Explanation:
Compute B-squared and other statistics for predicted data in column 3 (-p 3) of the input file using column 2 data (-e 2) as the experimental values. Treat input file as a descriptor file (-j)
Help command:
iwstats
Description:
Performs Leader, Taylor-Butina, or Jarvis-Patrick clustering starting from a tdt-formatted near-neighbour list
Author/owner: C3/Eli Lilly and Co
Sample 1:
nn_leader_and_jp -h -T -t 0.3 descriptor.txt
Explanation:
Perform Taylor-Butina clustering (-T) on the near-neighbour list file descriptor.txt, suppressing self neighbours (-h) and using 0.3 as the distance threshold (-t 0.3). Results written to stdout.
Help command:
nn_leader_and_jp
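The Taylor-Butina step can be sketched in Python from a precomputed neighbour list. This variant recounts neighbours among the still-unassigned items on every round and breaks ties by identifier order; both are assumptions made for the sketch, and the function name is hypothetical.

```python
def taylor_butina(neighbours):
    """Cluster from a neighbour list.

    neighbours maps each id to the set of ids within the distance threshold.
    Returns a list of (centre, sorted other members) tuples.
    """
    unassigned = set(neighbours)
    clusters = []
    while unassigned:
        # centre = unassigned item with the most unassigned neighbours
        centre = max(sorted(unassigned),
                     key=lambda i: len(neighbours[i] & unassigned))
        members = (neighbours[centre] & unassigned) | {centre}
        clusters.append((centre, sorted(members - {centre})))
        unassigned -= members
    return clusters
```

Items with no neighbours inside the threshold end up as singleton clusters.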
Description:
Naive Bayes predictive model training and prediction
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_naive_bayesian -t 50 -f train -c 2 -F FPIW -C 2/50 -A train.act train.gfp
Explanation:
Build a Naive Bayes model on compounds using the FPIW tag (-F FPIW) of the fingerprints file train.gfp and activity data in the second column (-c 2) of activity file train.act (-f train -A train.act). The cutoff value between active and inactive molecules is 50 (-t 50). Run 2 cross-validations, each using 50% of the records for training (-C 2/50). Results written to stdout.
Help command:
gfp_naive_bayesian
Release 5
Description:
Compute Abraham descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
abraham -l -F $LILLYMOL_HOME/contrib/data/queries/abraham/Abraham -P $LILLYMOL_HOME/contrib/data/queries/abraham/Alpha2H -g all pubchem_example.smi
Explanation:
Compute the Abraham descriptors for SMILES in the pubchem_example.smi file using control file Abraham (-F $LILLYMOL_HOME/contrib/data/queries/abraham/Abraham) and alpha2H queries Alpha2H (-P $LILLYMOL_HOME/contrib/data/queries/abraham/Alpha2H). Reduce to the largest fragment (-l) and apply all chemical standardizations (-g all).
Help command:
abraham
Description:
Spread implementation, distances from a distance matrix file
Author/owner: C3/Eli Lilly and Co
Sample 1:
distance_matrix_spread -n 20 in_matrix.dat
Explanation:
Calculate distance spread for 20 items (-n) in the in_matrix.dat file.
Help command:
distance_matrix_spread
Description:
Produce a distance matrix file using distance data from file
Author/owner: C3/Eli Lilly and Co
Sample 1:
distance_matrix_from_distances -t sqh -S output.dat input.txt
Explanation:
Generate distance matrix file output.dat (-S output.dat) in the format of square data with a header record (-t sqh) using distance data from input.txt
Help command:
distance_matrix_from_distances
Description:
Compute protein-ligand interaction fingerprints. Three main types of fingerprints have been implemented: ecfp, path and atom pairs. For ecfp fingerprints, each bit represents a circular substructure from the ligand and a circular substructure from the protein (both central atoms from the ligand and the protein must be within a given distance threshold). Configurable options include the shell radius (on both ligand and protein), the distance cut-off between ligand and protein atoms, a limit on the number of atoms in the protein substructure, the atom types, and the atom relationship (atom pairs and atom paths).
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwecfp_intermolecular -J NCECB3 -T 3 -R 3 -k in/protein.sdf in/ligand.sdf
Explanation:
Compute the extended connectivity fingerprints for the protein in protein.sdf and the ligand in ligand.sdf. Use NCECB3 as the tag for the generated fingerprint (-J NCECB3). Make intermolecular connections when the distance is less than 3 (-T 3); the maximum shell radius is 3 (-R 3). All files used in the example were prepared using the Schrodinger Protein Preparation Wizard tool.
Help command:
iwecfp_intermolecular
Description:
Compute shadow area for molecule
Author/owner: C3/Eli Lilly and Co
Sample 1:
tshadow -L -G in/in.sdf
Explanation:
Compute the shadow area for the molecules in the in.sdf file. Rotate to longest distance between atoms and compute radius of gyration
Help command:
tshadow
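The two geometric quantities named in the explanation can be illustrated in Python. This sketch assumes atoms are given as (x, y, z) tuples and treats all atoms as equally weighted; whether the tool weights by atomic mass is not stated here, and the function names are hypothetical. It does not compute the shadow area itself.

```python
import math


def radius_of_gyration(coords):
    """Root mean square distance of atoms from their (unweighted) centroid."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    total = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                for x, y, z in coords)
    return math.sqrt(total / n)


def longest_distance(coords):
    """Largest distance between any pair of atoms (0.0 for fewer than two)."""
    return max((math.dist(a, b)
                for i, a in enumerate(coords)
                for b in coords[i + 1:]), default=0.0)
```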