Commit 01b7eb7

Add comprehensive tutorials
1 parent 86a2fdc commit 01b7eb7

8 files changed

+1394
-6
lines changed

docs/source/advanced-features.rst

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
Advanced Features and API Reference
===================================

This guide covers advanced features of the Rxn-INSIGHT package and
provides a concise API reference for the core classes.

Reaction Class
--------------

The ``Reaction`` class is the central component for analyzing chemical
reactions.

Key Attributes
~~~~~~~~~~~~~~

- ``reaction``: SMILES representation of the reaction
- ``reactants``: SMILES string of reactants
- ``products``: SMILES string of products
- ``mapped_reaction``: Reaction with atom mappings
- ``reaction_class``: Classification of the reaction
- ``name``: Name of the reaction
- ``scaffold``: Molecular scaffold of the product
- ``byproducts``: Tuple of byproducts from the reaction
- ``template``: Extracted reaction template
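
As a rough illustration of how the ``reaction``, ``reactants``, and
``products`` attributes relate, the sketch below splits a reaction SMILES
of the form ``reactants>agents>products`` using only the standard library.
The helper ``split_reaction_smiles`` is hypothetical and not part of
Rxn-INSIGHT; the package itself may parse reactions differently (for
example through RDKit).

```python
def split_reaction_smiles(rxn_smiles: str) -> dict:
    """Split 'reactants>agents>products' into its three parts.

    Hypothetical helper for illustration; not part of Rxn-INSIGHT.
    """
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected two '>' separators, got: {rxn_smiles!r}")
    reactants, agents, products = parts
    return {
        "reactants": reactants,
        "agents": agents,
        "products": products,
        # Individual molecules are separated by '.' in SMILES.
        "reactant_list": reactants.split(".") if reactants else [],
        "product_list": products.split(".") if products else [],
    }


parts = split_reaction_smiles("OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1")
print(parts["reactants"])     # OB(O)c1ccccc1.Brc1ccccc1
print(parts["product_list"])  # ['c1ccc(-c2ccccc2)cc1']
```

The empty middle field in ``>>`` is the agents slot, which is why the
split yields three parts even when no agents are given.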

Important Methods
~~~~~~~~~~~~~~~~~

``get_reaction_info()``
^^^^^^^^^^^^^^^^^^^^^^^

Returns a comprehensive dictionary with reaction details:

- Reaction class and name
- Functional groups in reactants and products
- Ring systems
- Byproducts
- Scaffold information
- Atom mapping information

``find_neighbors(df, fp='MACCS', concatenate=True, max_return=100, threshold=0.3, broaden=False, full_search=False)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finds similar reactions in a database:

- ``df``: Pandas DataFrame containing reaction data
- ``fp``: Fingerprint type (``'MACCS'`` or ``'Morgan'``)
- ``concatenate``: Whether to concatenate reactant and product
  fingerprints
- ``max_return``: Maximum number of results to return
- ``threshold``: Similarity threshold (0-1)
- ``broaden``: Use broader search criteria
- ``full_search``: Perform a full database search (slower)
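
The interplay of ``threshold`` and ``max_return`` can be pictured as a
filter followed by a truncation. The sketch below assumes that
``threshold`` is a lower bound on similarity and that results are ordered
from most to least similar; both are assumptions about the behavior, and
``select_neighbors`` is a hypothetical helper rather than the library's
implementation.

```python
def select_neighbors(scores, threshold=0.3, max_return=100):
    """Keep entries with similarity >= threshold, best first, at most max_return.

    `scores` maps a reaction identifier to a similarity in [0, 1].
    Hypothetical helper; assumed behavior, not Rxn-INSIGHT's actual code.
    """
    kept = [(rxn_id, s) for rxn_id, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_return]


# Made-up similarity scores for four database reactions.
scores = {"rxn_a": 0.82, "rxn_b": 0.25, "rxn_c": 0.47, "rxn_d": 0.91}
print(select_neighbors(scores, threshold=0.3, max_return=2))
# [('rxn_d', 0.91), ('rxn_a', 0.82)]
```

Under these assumptions, raising ``threshold`` shrinks the candidate pool,
while ``max_return`` only caps how many of the remaining hits are reported.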

``suggest_conditions(df)``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Suggests reaction conditions based on similar reactions in the database:

- ``df``: Pandas DataFrame containing reaction data
- Returns: A dictionary with suggested solvent, catalyst, and reagent

``get_class()``
^^^^^^^^^^^^^^^

Determines and returns the reaction class.

``get_name()``
^^^^^^^^^^^^^^

Determines and returns the reaction name.

``get_byproducts()``
^^^^^^^^^^^^^^^^^^^^

Calculates and returns likely byproducts.

``get_scaffold()``
^^^^^^^^^^^^^^^^^^

Extracts and returns the molecular scaffold.

``get_rings_in_reactants()``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Identifies ring structures in reactants.

``get_rings_in_products()``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Identifies ring structures in products.

Molecule Class
--------------

The ``Molecule`` class handles operations related to individual
molecules.

.. _key-attributes-1:

Key Attributes
~~~~~~~~~~~~~~

- ``mol``: RDKit molecule object
- ``smiles``: SMILES representation
- ``inchi``: InChI identifier
- ``inchikey``: InChIKey identifier
- ``scaffold``: Murcko scaffold of the molecule
- ``maccs_fp``: MACCS fingerprint
- ``morgan_fp``: Morgan fingerprint

.. _important-methods-1:

Important Methods
~~~~~~~~~~~~~~~~~

``get_functional_groups(df=None)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Identifies functional groups in the molecule.

``get_rings()``
^^^^^^^^^^^^^^^

Extracts ring structures from the molecule.

``search_reactions(df)``
^^^^^^^^^^^^^^^^^^^^^^^^

Finds reactions in the database where this molecule is a product.

``search_reactions_by_scaffold(df, threshold=0.5, max_return=100, fp='MACCS')``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Finds reactions with similar product scaffolds.
128+
Database Class
129+
--------------
130+
131+
The ``Database`` class manages collections of reactions.
132+
133+
Key Methods
134+
~~~~~~~~~~~
135+
136+
``create_database_from_df(df, reaction_column, solvent_column='SOLVENT', reagent_column='REAGENT', catalyst_column='CATALYST', yield_column='YIELD', ref_column='REF')``
137+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
138+
139+
Creates a reaction database from a DataFrame: - ``df``: Input DataFrame
140+
with reaction data - ``reaction_column``: Column containing reaction
141+
SMILES - Other parameters: Specify column names for conditions
142+
143+
``create_database_from_csv(fname, reaction_column, ...)``
144+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
145+
146+
Creates a database from a CSV file.
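
For orientation, the sketch below builds a small CSV in memory whose
header uses the default condition column names from the
``create_database_from_df`` signature (``SOLVENT``, ``REAGENT``,
``CATALYST``, ``YIELD``, ``REF``). The ``REACTION`` column name, the
example rows, and the reference strings are made up for illustration. The
code only checks the layout with the standard ``csv`` module and does not
call Rxn-INSIGHT.

```python
import csv
import io

# Default condition column names taken from the signature above.
CONDITION_COLUMNS = ["SOLVENT", "REAGENT", "CATALYST", "YIELD", "REF"]
REACTION_COLUMN = "REACTION"  # hypothetical name; would be passed as `reaction_column`
ALL_COLUMNS = [REACTION_COLUMN] + CONDITION_COLUMNS

# Made-up example rows; the values are placeholders, not curated data.
rows = [
    {
        "REACTION": "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1",
        "SOLVENT": "C1COCCO1",
        "REAGENT": "",
        "CATALYST": "",
        "YIELD": "85",
        "REF": "example-ref-1",
    },
    {
        "REACTION": "CC(=O)OC1=CC=CC=C1>>OC1=CC=CC=C1.CC(=O)O",
        "SOLVENT": "O",
        "REAGENT": "",
        "CATALYST": "",
        "YIELD": "90",
        "REF": "example-ref-2",
    },
]

# Write the rows to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=ALL_COLUMNS)
writer.writeheader()
writer.writerows(rows)

# Read the file back and confirm every expected column is present.
buffer.seek(0)
reader = csv.DictReader(buffer)
missing = [c for c in ALL_COLUMNS if c not in reader.fieldnames]
records = list(reader)
print(missing)       # []
print(len(records))  # 2
```

A file with this layout could presumably be passed as ``fname`` with
``reaction_column='REACTION'``; columns named differently would need the
corresponding ``*_column`` arguments.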

``save_to_parquet(fname)``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Saves the database to a parquet file.

``get_class_distribution()``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Returns the distribution of reaction classes.

``get_name_distribution()``
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Returns the distribution of reaction names.

Utility Functions
-----------------

The ``utils`` module contains helper functions, grouped below by purpose.

Reaction Handling
~~~~~~~~~~~~~~~~~

- ``get_atom_mapping(rxn, rxn_mapper=None)``: Maps atoms in a reaction
- ``get_reaction_template(reaction, radius_reactants=2, radius_products=2)``:
  Extracts a reaction template
- ``sanitize_mapped_reaction(rxn)``: Cleans up a mapped reaction
- ``remove_atom_mapping(rxn, smarts=False)``: Removes atom mapping
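
To make the effect of removing atom mapping concrete, the sketch below
strips map numbers such as ``:1`` from bracket atoms with a regular
expression. ``strip_atom_maps`` is a hypothetical stand-in, not the
library's ``remove_atom_mapping``; unlike a toolkit-based approach it
leaves the brackets in place and does not canonicalize the SMILES.

```python
import re

# Matches ':<digits>' only when it directly precedes the closing ']' of an atom.
_ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(smiles: str) -> str:
    """Remove atom-map numbers from a mapped SMILES or reaction SMILES.

    Hypothetical helper for illustration; not part of Rxn-INSIGHT.
    """
    return _ATOM_MAP.sub("", smiles)


mapped = "[CH3:1][OH:2]>>[CH2:1]=[O:2]"
print(strip_atom_maps(mapped))  # [CH3][OH]>>[CH2]=[O]
```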

Fingerprinting and Similarity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``get_fp(rxn, fp='MACCS', concatenate=True)``: Gets a fingerprint for
  a reaction
- ``get_similarity(v1, v2, metric='jaccard')``: Calculates similarity
  between fingerprints
- ``maccs_fp(mol)``: Gets MACCS fingerprint for a molecule
- ``morgan_fp(mol)``: Gets Morgan fingerprint for a molecule
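
For readers unfamiliar with the default ``'jaccard'`` metric, the sketch
below computes the Jaccard (Tanimoto) similarity of two bit vectors in
plain Python: the number of bits set in both divided by the number of bits
set in either. The toy fingerprints are made up, and the way reactant and
product bits are joined here is only an assumption about what
``concatenate=True`` means; this reference does not state how
``get_fp`` or ``get_similarity`` are implemented.

```python
def jaccard_similarity(v1, v2):
    """Jaccard (Tanimoto) similarity of two equal-length bit vectors."""
    if len(v1) != len(v2):
        raise ValueError("Fingerprints must have the same length")
    both = sum(1 for a, b in zip(v1, v2) if a and b)
    either = sum(1 for a, b in zip(v1, v2) if a or b)
    # Two all-zero vectors share no features; return 0.0 by convention here.
    return both / either if either else 0.0


# Toy 4-bit fingerprints (made up, not real MACCS or Morgan bits).
reactant_fp_1, product_fp_1 = [1, 1, 0, 0], [1, 0, 1, 0]
reactant_fp_2, product_fp_2 = [1, 0, 0, 0], [1, 0, 1, 1]

# Assumed meaning of concatenation: reactant and product bits side by side.
fp1 = reactant_fp_1 + product_fp_1
fp2 = reactant_fp_2 + product_fp_2

print(jaccard_similarity(fp1, fp2))  # 0.6
```

Here three bits are set in both vectors and five in at least one, giving
3/5.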

Scaffold Analysis
~~~~~~~~~~~~~~~~~

- ``get_scaffold(mol)``: Gets the Murcko scaffold of a molecule
- ``get_ring_systems(mol, include_spiro=False)``: Identifies ring
  systems

Ranking Functions
~~~~~~~~~~~~~~~~~

- ``get_solvent_ranking(df)``: Ranks solvents by frequency
- ``get_catalyst_ranking(df)``: Ranks catalysts by frequency
- ``get_reagent_ranking(df)``: Ranks reagents by frequency
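
Ranking by frequency amounts to counting how often each value occurs and
sorting by that count. The sketch below shows the idea with
``collections.Counter`` on a plain list; ``rank_by_frequency`` is a
hypothetical helper, and the actual ranking functions take a DataFrame and
may handle missing or multi-component entries differently.

```python
from collections import Counter


def rank_by_frequency(values):
    """Return (value, count) pairs, most frequent first; skips empty entries."""
    counts = Counter(v for v in values if v)
    return counts.most_common()


# Made-up solvent annotations, including missing entries.
solvents = ["THF", "DMF", "THF", "", "toluene", "THF", "DMF", None]
print(rank_by_frequency(solvents))
# [('THF', 3), ('DMF', 2), ('toluene', 1)]
```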

Advanced Usage Examples
-----------------------

Custom Reaction Classification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

   from rxn_insight.reaction import Reaction
   from rxn_insight.classification import ReactionClassifier

   # Reaction SMILES: phenyl acetate to phenol and acetic acid
   reaction_smiles = "CC(=O)OC1=CC=CC=C1>>OC1=CC=CC=C1.CC(=O)O"

   # Access the classifier directly for advanced analysis
   rxn = Reaction(reaction_smiles)
   classifier = rxn.classifier

   # Directly check classification properties
   print(f"Is functional group interconversion: {classifier.is_fgi()}")
   print(f"Is deprotection: {classifier.is_deprotection()}")
   print(f"Is protection: {classifier.is_protection()}")
   print(f"Is oxidation: {classifier.is_oxidation()}")
   print(f"Is reduction: {classifier.is_reduction()}")
   print(f"Is C-C coupling: {classifier.is_cc_coupling()}")
227+
Working with Atom Mappings
228+
~~~~~~~~~~~~~~~~~~~~~~~~~~
229+
230+
.. code:: python
231+
232+
from rxn_insight.reaction import Reaction
233+
from rxnmapper import RXNMapper
234+
235+
# Initialize RXNMapper
236+
rxn_mapper = RXNMapper()
237+
238+
# Map a reaction
239+
rxn_smiles = "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"
240+
mapped_rxn = rxn_mapper.get_attention_guided_atom_maps([rxn_smiles])[0]["mapped_rxn"]
241+
242+
# Create a Reaction with the mapping
243+
rxn = Reaction(mapped_rxn, keep_mapping=True)
244+
245+
# Get the reaction center
246+
reaction_center = rxn.get_reaction_center()
247+
print(f"Reaction center: {reaction_center}")

Custom Similarity Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

   from rxn_insight.reaction import Reaction
   from rxn_insight.utils import get_fp, get_similarity
   import numpy as np

   # Define two reactions
   rxn1 = "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"
   rxn2 = "OB(O)c1ccc(C)cc1.Brc1ccccc1>>c1ccc(-c2ccc(C)cc2)cc1"

   # Get fingerprints
   fp1 = get_fp(rxn1, fp="Morgan", concatenate=True)
   fp2 = get_fp(rxn2, fp="Morgan", concatenate=True)

   # Calculate similarity using different metrics
   similarity_metrics = ["jaccard", "dice", "cosine", "euclidean", "manhattan"]

   for metric in similarity_metrics:
       similarity = get_similarity(fp1, fp2, metric=metric)
       print(f"{metric} similarity: {similarity:.4f}")

Working with Reaction Templates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

   from rxn_insight.reaction import Reaction
   from rxn_insight.utils import get_reaction_template
   from rdkit import Chem
   from rdkit.Chem import AllChem

   # Create a reaction
   rxn_smiles = "OB(O)c1ccccc1.Brc1ccccc1>>c1ccc(-c2ccccc2)cc1"
   rxn = Reaction(rxn_smiles)

   # Extract template with different radii parameters
   template1 = get_reaction_template(rxn.mapped_reaction, radius_reactants=1, radius_products=1)
   template2 = get_reaction_template(rxn.mapped_reaction, radius_reactants=2, radius_products=1)

   print(f"Template (radius 1,1): {template1}")
   print(f"Template (radius 2,1): {template2}")

   # Use template to predict products for new reactants
   rxn_template = AllChem.ReactionFromSmarts(template1)
   new_reactants = ["OB(O)c1ccc(F)cc1", "Brc1ccc(Cl)cc1"]
   reactant_mols = [Chem.MolFromSmiles(r) for r in new_reactants]

   # Run the reaction; each outcome is a tuple of product molecules
   products = rxn_template.RunReactants(reactant_mols)
   if products:
       # Report the first product of the first outcome
       predicted_product = Chem.MolToSmiles(products[0][0])
       print(f"Predicted product: {predicted_product}")

These examples demonstrate some of the advanced features available in
Rxn-INSIGHT. Refer to the source code for more detailed documentation of
each function and class.
