| 1 | +Advanced Features and API Reference |
| 2 | +=================================== |
| 3 | + |
| 4 | +This guide covers advanced features of the Rxn-INSIGHT package and |
| 5 | +provides a concise API reference for the core classes. |
| 6 | + |
| 7 | +Reaction Class |
| 8 | +-------------- |
| 9 | + |
| 10 | +The ``Reaction`` class is the central component for analyzing chemical |
| 11 | +reactions. |
| 12 | + |
| 13 | +Key Attributes |
| 14 | +~~~~~~~~~~~~~~ |
| 15 | + |
| 16 | +- ``reaction``: SMILES representation of the reaction |
| 17 | +- ``reactants``: SMILES string of reactants |
| 18 | +- ``products``: SMILES string of products |
| 19 | +- ``mapped_reaction``: Reaction with atom mappings |
| 20 | +- ``reaction_class``: Classification of the reaction |
| 21 | +- ``name``: Name of the reaction |
| 22 | +- ``scaffold``: Molecular scaffold of the product |
| 23 | +- ``byproducts``: Tuple of byproducts from the reaction |
| 24 | +- ``template``: Extracted reaction template |
| 25 | + |
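The ``reaction``, ``reactants`` and ``products`` attributes follow the usual reaction SMILES layout, in which the reactant and product sides are separated by ``>>`` and individual molecules by ``.``. The sketch below illustrates that layout with the standard library only. The helper ``split_reaction_smiles`` is hypothetical and not part of Rxn-INSIGHT, and it assumes that no agents are written between the arrows and that every ``.`` separates distinct molecules.

```python
def split_reaction_smiles(reaction: str) -> tuple[list[str], list[str]]:
    """Split 'A.B>>C' into reactant and product SMILES lists (illustrative helper)."""
    reactant_side, sep, product_side = reaction.partition(">>")
    if not sep:
        raise ValueError("expected a reaction SMILES containing '>>'")
    # Naive split: each '.'-separated fragment is treated as its own molecule
    reactants = [s for s in reactant_side.split(".") if s]
    products = [s for s in product_side.split(".") if s]
    return reactants, products


reactants, products = split_reaction_smiles(
    "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"
)
print(reactants)
print(products)
```
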
| 26 | +Important Methods |
| 27 | +~~~~~~~~~~~~~~~~~ |
| 28 | + |
| 29 | +``get_reaction_info()`` |
| 30 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 31 | + |
| 32 | +Returns a comprehensive dictionary with reaction details: |
|  | + |
|  | +- Reaction class and name |
|  | +- Functional groups in reactants and products |
|  | +- Ring systems |
|  | +- Byproducts |
|  | +- Scaffold information |
|  | +- Atom mapping information |
| 35 | + |
| 36 | +``find_neighbors(df, fp='MACCS', concatenate=True, max_return=100, threshold=0.3, broaden=False, full_search=False)`` |
| 37 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 38 | + |
| 39 | +Finds similar reactions in a database: |
|  | + |
|  | +- ``df``: Pandas DataFrame containing reaction data |
|  | +- ``fp``: Fingerprint type (``'MACCS'`` or ``'Morgan'``) |
|  | +- ``concatenate``: Whether to concatenate reactant and product fingerprints |
|  | +- ``max_return``: Maximum number of results to return |
|  | +- ``threshold``: Similarity threshold (0 to 1) |
|  | +- ``broaden``: Use broader search criteria |
|  | +- ``full_search``: Perform a full database search (slower) |
| 46 | + |
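One way to picture how ``threshold`` and ``max_return`` interact is as a filter followed by a cut-off. The sketch below works on precomputed similarity scores. The function ``top_neighbors``, the inclusive comparison and the descending sort order are assumptions made for illustration, not taken from the Rxn-INSIGHT source.

```python
def top_neighbors(scores, threshold=0.3, max_return=100):
    """Keep entries scoring at least `threshold`, best first, capped at `max_return`."""
    kept = [(name, s) for name, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_return]


# Hypothetical similarity scores between a query reaction and database entries
scores = {"rxn_a": 0.82, "rxn_b": 0.25, "rxn_c": 0.47, "rxn_d": 0.91}
neighbors = top_neighbors(scores, threshold=0.3, max_return=2)
print(neighbors)
```

Raising ``threshold`` narrows the candidate pool before the cap applies, so a strict threshold can return fewer than ``max_return`` results.
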
| 47 | +``suggest_conditions(df)`` |
| 48 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 49 | + |
| 50 | +Suggests optimal conditions based on similar reactions: |
|  | + |
|  | +- ``df``: Pandas DataFrame containing reaction data |
|  | +- Returns: A dictionary with suggested solvent, catalyst, and reagent |
| 53 | + |
| 54 | +``get_class()`` |
| 55 | +^^^^^^^^^^^^^^^ |
| 56 | + |
| 57 | +Determines and returns the reaction class. |
| 58 | + |
| 59 | +``get_name()`` |
| 60 | +^^^^^^^^^^^^^^ |
| 61 | + |
| 62 | +Determines and returns the reaction name. |
| 63 | + |
| 64 | +``get_byproducts()`` |
| 65 | +^^^^^^^^^^^^^^^^^^^^ |
| 66 | + |
| 67 | +Calculates and returns likely byproducts. |
| 68 | + |
| 69 | +``get_scaffold()`` |
| 70 | +^^^^^^^^^^^^^^^^^^ |
| 71 | + |
| 72 | +Extracts and returns the molecular scaffold. |
| 73 | + |
| 74 | +``get_rings_in_reactants()`` |
| 75 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 76 | + |
| 77 | +Identifies ring structures in reactants. |
| 78 | + |
| 79 | +``get_rings_in_products()`` |
| 80 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 81 | + |
| 82 | +Identifies ring structures in products. |
| 83 | + |
| 84 | +Molecule Class |
| 85 | +-------------- |
| 86 | + |
| 87 | +The ``Molecule`` class handles operations related to individual |
| 88 | +molecules. |
| 89 | + |
| 90 | +.. _key-attributes-1: |
| 91 | + |
| 92 | +Key Attributes |
| 93 | +~~~~~~~~~~~~~~ |
| 94 | + |
| 95 | +- ``mol``: RDKit molecule object |
| 96 | +- ``smiles``: SMILES representation |
| 97 | +- ``inchi``: InChI identifier |
| 98 | +- ``inchikey``: InChIKey identifier |
| 99 | +- ``scaffold``: Murcko scaffold of the molecule |
| 100 | +- ``maccs_fp``: MACCS fingerprint |
| 101 | +- ``morgan_fp``: Morgan fingerprint |
| 102 | + |
| 103 | +.. _important-methods-1: |
| 104 | + |
| 105 | +Important Methods |
| 106 | +~~~~~~~~~~~~~~~~~ |
| 107 | + |
| 108 | +``get_functional_groups(df=None)`` |
| 109 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 110 | + |
| 111 | +Identifies functional groups in the molecule. |
| 112 | + |
| 113 | +``get_rings()`` |
| 114 | +^^^^^^^^^^^^^^^ |
| 115 | + |
| 116 | +Extracts ring structures from the molecule. |
| 117 | + |
| 118 | +``search_reactions(df)`` |
| 119 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 120 | + |
| 121 | +Finds reactions in the database where this molecule is a product. |
| 122 | + |
| 123 | +``search_reactions_by_scaffold(df, threshold=0.5, max_return=100, fp='MACCS')`` |
| 124 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 125 | + |
| 126 | +Finds reactions with similar product scaffolds. |
| 127 | + |
| 128 | +Database Class |
| 129 | +-------------- |
| 130 | + |
| 131 | +The ``Database`` class manages collections of reactions. |
| 132 | + |
| 133 | +Key Methods |
| 134 | +~~~~~~~~~~~ |
| 135 | + |
| 136 | +``create_database_from_df(df, reaction_column, solvent_column='SOLVENT', reagent_column='REAGENT', catalyst_column='CATALYST', yield_column='YIELD', ref_column='REF')`` |
| 137 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 138 | + |
| 139 | +Creates a reaction database from a DataFrame: |
|  | + |
|  | +- ``df``: Input DataFrame with reaction data |
|  | +- ``reaction_column``: Column containing reaction SMILES |
|  | +- Remaining parameters: Names of the columns holding solvent, reagent, catalyst, yield, and reference data |
| 142 | + |
| 143 | +``create_database_from_csv(fname, reaction_column, ...)`` |
| 144 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 145 | + |
| 146 | +Creates a database from a CSV file. |
| 147 | + |
| 148 | +``save_to_parquet(fname)`` |
| 149 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 150 | + |
| 151 | +Saves the database to a parquet file. |
| 152 | + |
| 153 | +``get_class_distribution()`` |
| 154 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 155 | + |
| 156 | +Returns the distribution of reaction classes. |
| 157 | + |
| 158 | +``get_name_distribution()`` |
| 159 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 160 | + |
| 161 | +Returns the distribution of reaction names. |
| 162 | + |
| 163 | +Utility Functions |
| 164 | +----------------- |
| 165 | + |
| 166 | +The ``utils`` module contains various helper functions: |
| 167 | + |
| 168 | +Reaction Handling |
| 169 | +~~~~~~~~~~~~~~~~~ |
| 170 | + |
| 171 | +- ``get_atom_mapping(rxn, rxn_mapper=None)``: Maps atoms in a reaction |
| 172 | +- ``get_reaction_template(reaction, radius_reactants=2, radius_products=2)``: |
| 173 | + Extracts a reaction template |
| 174 | +- ``sanitize_mapped_reaction(rxn)``: Cleans up a mapped reaction |
| 175 | +- ``remove_atom_mapping(rxn, smarts=False)``: Removes atom mapping |
| 176 | + |
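Atom mapping attaches a numeric label to each atom (``:1``, ``:2``, ...) so that atoms can be tracked from reactants to products. The sketch below shows what removing those labels means at the text level. It is a purely textual illustration using a hypothetical helper ``strip_map_numbers``, not the way ``remove_atom_mapping`` is implemented. A cheminformatics toolkit would normally also re-canonicalize the result and drop brackets that are no longer needed.

```python
import re

# Matches the ':<digits>' map label just before a closing bracket, e.g. ':1' in '[CH3:1]'
_MAP_LABEL = re.compile(r":\d+(?=\])")


def strip_map_numbers(smiles: str) -> str:
    """Remove atom-map labels textually; bracket atoms are left as written."""
    return _MAP_LABEL.sub("", smiles)


mapped = "[CH3:1][OH:2]>>[CH3:1][Cl:3]"
unmapped = strip_map_numbers(mapped)
print(unmapped)
```
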
| 177 | +Fingerprinting and Similarity |
| 178 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 179 | + |
| 180 | +- ``get_fp(rxn, fp='MACCS', concatenate=True)``: Gets a fingerprint for |
| 181 | + a reaction |
| 182 | +- ``get_similarity(v1, v2, metric='jaccard')``: Calculates similarity |
| 183 | + between fingerprints |
| 184 | +- ``maccs_fp(mol)``: Gets MACCS fingerprint for a molecule |
| 185 | +- ``morgan_fp(mol)``: Gets Morgan fingerprint for a molecule |
| 186 | + |
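For binary fingerprints, the Jaccard (Tanimoto) similarity is the number of bits set in both vectors divided by the number of bits set in either. The sketch below spells this out on plain Python lists. The function ``jaccard_similarity`` is an illustrative stand-in, and whether ``get_similarity`` handles edge cases such as all-zero vectors the same way is not stated in this guide.

```python
def jaccard_similarity(v1, v2):
    """Jaccard (Tanimoto) similarity of two equal-length 0/1 vectors."""
    if len(v1) != len(v2):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    # Convention chosen here: two all-zero vectors count as identical
    return both / either if either else 1.0


fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]
similarity = jaccard_similarity(fp_a, fp_b)
print(f"{similarity:.2f}")
```
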
| 187 | +Scaffold Analysis |
| 188 | +~~~~~~~~~~~~~~~~~ |
| 189 | + |
| 190 | +- ``get_scaffold(mol)``: Gets the Murcko scaffold of a molecule |
| 191 | +- ``get_ring_systems(mol, include_spiro=False)``: Identifies ring |
| 192 | + systems |
| 193 | + |
| 194 | +Ranking Functions |
| 195 | +~~~~~~~~~~~~~~~~~ |
| 196 | + |
| 197 | +- ``get_solvent_ranking(df)``: Ranks solvents by frequency |
| 198 | +- ``get_catalyst_ranking(df)``: Ranks catalysts by frequency |
| 199 | +- ``get_reagent_ranking(df)``: Ranks reagents by frequency |
| 200 | + |
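Ranking by frequency amounts to counting how often each value occurs and sorting by that count. The sketch below does this with the standard library. The helper ``rank_by_frequency`` and the choice to skip missing entries are assumptions for illustration, and the ranking functions above take a DataFrame rather than a list.

```python
from collections import Counter


def rank_by_frequency(values):
    """Return (value, count) pairs, most frequent first, skipping empty entries."""
    counts = Counter(v for v in values if v)
    return counts.most_common()


# Hypothetical solvent column from a set of similar reactions
solvents = ["THF", "DMF", "THF", None, "toluene", "THF", "DMF", ""]
ranking = rank_by_frequency(solvents)
print(ranking)
```
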
| 201 | +Advanced Usage Examples |
| 202 | +----------------------- |
| 203 | + |
| 204 | +Custom Reaction Classification |
| 205 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 206 | + |
| 207 | +.. code:: python |
| 208 | +
|
| 209 | + from rxn_insight.reaction import Reaction |
| 211 | +
|
| 212 | + # Create a reaction |
| 213 | + reaction_smiles = "CC(=O)OC1=CC=CC=C1>>OC1=CC=CC=C1.CC(=O)O" |
| 214 | +
|
| 215 | + # Access the classifier directly for advanced analysis |
| 216 | + rxn = Reaction(reaction_smiles) |
| 217 | + classifier = rxn.classifier |
| 218 | +
|
| 219 | + # Directly check classification properties |
| 220 | + print(f"Is functional group interconversion: {classifier.is_fgi()}") |
| 221 | + print(f"Is deprotection: {classifier.is_deprotection()}") |
| 222 | + print(f"Is protection: {classifier.is_protection()}") |
| 223 | + print(f"Is oxidation: {classifier.is_oxidation()}") |
| 224 | + print(f"Is reduction: {classifier.is_reduction()}") |
| 225 | + print(f"Is C-C coupling: {classifier.is_cc_coupling()}") |
| 226 | +
|
| 227 | +Working with Atom Mappings |
| 228 | +~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 229 | + |
| 230 | +.. code:: python |
| 231 | +
|
| 232 | + from rxn_insight.reaction import Reaction |
| 233 | + from rxnmapper import RXNMapper |
| 234 | +
|
| 235 | + # Initialize RXNMapper |
| 236 | + rxn_mapper = RXNMapper() |
| 237 | +
|
| 238 | + # Map a reaction |
| 239 | + rxn_smiles = "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1" |
| 240 | + mapped_rxn = rxn_mapper.get_attention_guided_atom_maps([rxn_smiles])[0]["mapped_rxn"] |
| 241 | +
|
| 242 | + # Create a Reaction with the mapping |
| 243 | + rxn = Reaction(mapped_rxn, keep_mapping=True) |
| 244 | +
|
| 245 | + # Get the reaction center |
| 246 | + reaction_center = rxn.get_reaction_center() |
| 247 | + print(f"Reaction center: {reaction_center}") |
| 248 | +
|
| 249 | +Custom Similarity Metrics |
| 250 | +~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 251 | + |
| 252 | +.. code:: python |
| 253 | +
|
| 255 | + from rxn_insight.utils import get_fp, get_similarity |
| 257 | +
|
| 258 | + # Define two reactions |
| 259 | + rxn1 = "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1" |
| 260 | + rxn2 = "OB(O)c1ccc(C)cc1.Brc1ccccc1>>c1ccc(-c2ccc(C)cc2)cc1" |
| 261 | +
|
| 262 | + # Get fingerprints |
| 263 | + fp1 = get_fp(rxn1, fp="Morgan", concatenate=True) |
| 264 | + fp2 = get_fp(rxn2, fp="Morgan", concatenate=True) |
| 265 | +
|
| 266 | + # Calculate similarity using different metrics |
| 267 | + similarity_metrics = ["jaccard", "dice", "cosine", "euclidean", "manhattan"] |
| 268 | +
|
| 269 | + for metric in similarity_metrics: |
| 270 | +     similarity = get_similarity(fp1, fp2, metric=metric) |
| 271 | +     print(f"{metric} similarity: {similarity:.4f}") |
| 272 | +
|
| 273 | +Working with Reaction Templates |
| 274 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 275 | + |
| 276 | +.. code:: python |
| 277 | +
|
| 278 | + from rxn_insight.reaction import Reaction |
| 279 | + from rxn_insight.utils import get_reaction_template |
| 280 | + from rdkit import Chem |
| 281 | + from rdkit.Chem import AllChem |
| 282 | +
|
| 283 | + # Create a reaction |
| 284 | + rxn_smiles = "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1" |
| 285 | + rxn = Reaction(rxn_smiles) |
| 286 | +
|
| 287 | + # Extract template with different radii parameters |
| 288 | + template1 = get_reaction_template(rxn.mapped_reaction, radius_reactants=1, radius_products=1) |
| 289 | + template2 = get_reaction_template(rxn.mapped_reaction, radius_reactants=2, radius_products=1) |
| 290 | +
|
| 291 | + print(f"Template (radius 1,1): {template1}") |
| 292 | + print(f"Template (radius 2,1): {template2}") |
| 293 | +
|
| 294 | + # Use template to predict products for new reactants |
| 295 | + rxn_template = AllChem.ReactionFromSmarts(template1) |
| 296 | + new_reactants = ["OB(O)c1ccc(F)cc1", "Brc1ccc(Cl)cc1"] |
| 297 | + reactant_mols = [Chem.MolFromSmiles(r) for r in new_reactants] |
| 298 | +
|
| 299 | + # Run the reaction |
| 300 | + products = rxn_template.RunReactants(reactant_mols) |
| 301 | + if products: |
| 302 | +     predicted_product = Chem.MolToSmiles(products[0][0]) |
| 303 | +     print(f"Predicted product: {predicted_product}") |
| 304 | +
|
| 305 | +These examples demonstrate some of the advanced features available in |
| 306 | +Rxn-INSIGHT. Refer to the source code for more detailed documentation of |
| 307 | +each function and class. |