PROVESID is a member of the PROVES family of packages and provides Pythonic access to online services for chemical identifiers and data. The goal is to offer a clean, simple, intuitive, documented, up-to-date, and extendable interface to the most important online databases. We offer interfaces to PubChem, the NCI Chemical Identifier Resolver, CAS Common Chemistry, IUPAC OPSIN, ChEBI, and ClassyFire. We highly recommend that new users jump head-first into the examples folder and get started by playing with the code. We also continue to document both existing and new functionality here.
The package can be installed from PyPI by running
pip install provesid
To install the latest development version (for developers and enthusiasts), clone or download this repository, go to the root folder, and install it in editable mode by running
pip install -e .
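To confirm which version ended up in your environment (for example, after switching between the PyPI release and an editable install), you can query the installed package metadata. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper, not part of PROVESID:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "provesid") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only reads packaging metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"provesid {v}" if v else "provesid is not installed")
```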
PubChem
from provesid.pubchem import PubChemAPI
pc = PubChemAPI() # Now with unlimited caching!
cids_aspirin = pc.get_cids_by_name('aspirin')
res_basic = pc.get_basic_compound_info(cids_aspirin[0])
which returns
{
"CID": 2244,
"MolecularFormula": "C9H8O4",
"MolecularWeight": "180.16",
"SMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
"InChI": "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
"InChIKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
"IUPACName": "2-acetyloxybenzoic acid",
"success": true,
"cid": 2244,
"error": null
}
PubChem View for data
from provesid import PubChemView, get_property_table
logp_table = get_property_table(cids_aspirin[0], "LogP")
logp_table
which returns a table of the reported logP values for aspirin, including the reference for each data point.
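In the aspirin record returned by `get_basic_compound_info`, `MolecularWeight` is a string (`"180.16"`), and the result carries `success` and `error` fields. A minimal sketch of a hypothetical helper that checks the flag before converting the value to a number; the field names are taken from the example output, and we assume a failed lookup sets `success` to a falsy value:

```python
from typing import Any, Dict


def molecular_weight(info: Dict[str, Any]) -> float:
    """Return the molecular weight from a get_basic_compound_info-style result.

    Hypothetical helper. Assumes the layout of the example output: a "success"
    flag, an "error" message, and "MolecularWeight" stored as a string.
    """
    if not info.get("success"):
        raise ValueError(f"PubChem lookup failed: {info.get('error')}")
    return float(info["MolecularWeight"])


# The aspirin result from the example, abridged and written as a Python literal.
res_basic = {
    "CID": 2244,
    "MolecularFormula": "C9H8O4",
    "MolecularWeight": "180.16",
    "SMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "InChIKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "success": True,
    "cid": 2244,
    "error": None,
}

print(molecular_weight(res_basic))  # 180.16
```

Raising on failure keeps a bad lookup from silently propagating through later calculations as a missing or string-typed value.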
Chemical Identifier Resolver
from provesid import NCIChemicalIdentifierResolver
resolver = NCIChemicalIdentifierResolver()
compound = 'aspirin'  # the identifier to resolve, e.g. a chemical name
smiles = resolver.resolve(compound, 'smiles')
OPSIN
from provesid import OPSIN
opsin = OPSIN()
methane_result = opsin.get_id("methane")
which returns:
{'status': 'SUCCESS',
'message': '',
'inchi': 'InChI=1/CH4/h1H4',
'stdinchi': 'InChI=1S/CH4/h1H4',
'stdinchikey': 'VNWKTOKETHGBQD-UHFFFAOYSA-N',
'smiles': 'C'}
CAS Common Chemistry
# One-time API key setup
from provesid import set_cas_api_key
set_cas_api_key("your-cas-api-key") # Configure once
# Then use anywhere without specifying API key
from provesid import CASCommonChem
ccc = CASCommonChem() # Automatically uses stored API key
water_info = ccc.cas_to_detail("7732-18-5")
print("Water (7732-18-5):")
print(f" Name: {water_info.get('name')}")
print(f" Molecular Formula: {water_info.get('molecularFormula')}")
print(f" Molecular Mass: {water_info.get('molecularMass')}")
print(f" SMILES: {water_info.get('smile')}")
print(f" InChI: {water_info.get('inchi')}")
print(f" Status: {water_info.get('status')}")
which returns
Water (7732-18-5):
Name: Water
Molecular Formula: H<sub>2</sub>O
Molecular Mass: 18.02
SMILES: O
InChI: InChI=1S/H2O/h1H2
Status: Success
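Two small helpers can be handy around `cas_to_detail`. First, as the output above shows, the molecular formula comes back with HTML subscript tags (`H<sub>2</sub>O`). Second, a CAS Registry Number ends with a check digit, so a mistyped number can be rejected locally before spending an API request. A minimal sketch using only the standard library; `is_valid_cas` and `plain_formula` are hypothetical helpers, not part of PROVESID:

```python
import re


def is_valid_cas(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number (hypothetical helper).

    The check digit equals the sum of the other digits, each multiplied by
    its position counted from the right (starting at 1), modulo 10.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    total = sum(int(d) * pos for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def plain_formula(formula: str) -> str:
    """Strip HTML tags such as <sub>...</sub> from a formula string (hypothetical helper)."""
    return re.sub(r"<[^>]+>", "", formula)


print(is_valid_cas("7732-18-5"))        # True
print(is_valid_cas("7732-18-4"))        # False
print(plain_formula("H<sub>2</sub>O"))  # H2O
```

For water, the digits 773218 weighted from the right give 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, and 105 mod 10 = 5 matches the final digit.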
ClassyFire
See the tutorial notebook.
If you're interested in contributing to PROVESID or need to understand the release workflow, please see our Development Guide, which covers:
- 🛠️ Development setup and environment configuration
- 🚀 Step-by-step release workflow and version management
- 🧪 Testing procedures and guidelines
- 📚 Documentation building and contribution guidelines
- 🔍 Code quality standards and tools
- 🤝 Contribution workflow and pull request guidelines
Several other packages (in Python and other languages) and code samples provide similar functionality. They inspired us, and we have tried to improve upon them based on our own experience working with chemical identifiers and data.
- PubChemPy and docs
- CIRpy and docs
- IUPAC cookbook for a tutorial on using various web APIs.
- more?
We plan to provide Python interfaces to more online services, including:
- ZeroPM: although there is no web API, the data is available on GitHub. I have written an interface for it, but it is not shared here because it would make this codebase too large, and I aim to keep the package lean. We will find a way to share it.
- More? Please open an issue and let us know what else you would like to have included.