forked from RaoulWolf/pcapi
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
133 lines (99 loc) · 5.04 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r}
#| include = FALSE
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# pcapi <img src="man/figures/logo.svg" align="right" height="139" />
<!-- badges: start -->
[](https://github.com/RaoulWolf/pcapi/actions)
[](https://app.codecov.io/gh/RaoulWolf/pcapi?branch=master)
[](https://lifecycle.r-lib.org/articles/stages.html#experimental)
<!-- badges: end -->
> A [ZeroPM](https://zeropm.eu/) R package
The goal of the pcapi package is to provide a minimal and lightweight
interface to the [PubChem](https://pubchem.ncbi.nlm.nih.gov/) API services.
The dependencies of pcapi are kept at a bare minimum:
[curl](https://cran.r-project.org/web/packages/curl/index.html) for handling
the API requests and
[jsonlite](https://cran.r-project.org/web/packages/jsonlite/index.html) to
parse data from JSON format.
Currently, the following formats can be used to retrieve PubChem
CIDs: InChI, InChIKey, names (including synonyms), SDF, and SMILES.
### But what about [webchem](https://docs.ropensci.org/webchem/)?
First of all, [webchem](https://docs.ropensci.org/webchem/) is a fantastic
package - please use it! webchem offers support for many different API
services, not only PubChem's API service. However, there are issues related to
the way webchem sets up the queries against PubChem API services, which can
result in errors. This has been observed, for example, when SMILES or InChI
strings contain backslashes, or an MOL/SDF is used as input. With pcapi, these
problems are (hopefully) solved. Additionally, pcapi will continuously
implement some of the lesser known functionalities, such as the retrival of
transformation products.
## Installation
You can install the development version of pcapi from
[GitHub](https://github.com/) with:
```{r install}
#| eval = FALSE
# install.packages("remotes")
remotes::install_github("ZeroPM-H2020/pcapi")
```
## Functions
PubChem's API services are vast and flexible. Not all functionality will be
available immediately, and some functionalities may not be worth it to
implement. That being said, the goal of pcapi is to provide a satisfying
user-experience, covering the most relevant functionalities of the API. The
following table gives an overview of the available functions, as well as their
expected inputs and outputs.
| Function | Expected input | Output |
|:-------------------------- |:------------------------------------------------ |:------------------------------------- |
| `post_to_cid()` | InChI, InChIKey, MOL, name, SDF, SMILES, synonym | integer vector of PubChem CID(s) |
| `post_to_property()` | PubChem CID | data frame of properties |
| `post_to_transformation()` | PubChem CID | data frame of transformation products |
| `post_to_use()` | PubChem CID | data frame of use categories |
| `post_to_standardize()` | InChI, SDF, SMILES | data frame of standardized compounds |
## Examples
The following examples show the functionality of two API functions of
pcapi: `post_to_cid()` and `post_to_property()`.
This is an example which shows you how to get the PubChem CID for the herbicide
[atrazine](https://en.wikipedia.org/wiki/Atrazine):
```{r example1}
library(pcapi)
atrazine_cid <- post_to_cid("atrazine", format = "name")
atrazine_cid
```
Using this PubChem CID, we can now get all available descriptors and
physico-chemical properties for atrazine:
```{r example2}
atrazine_properties <- post_to_property(atrazine_cid, property = "all")
str(atrazine_properties)
```
To get an overview of all transformation products, the PubChem CID can be used
as well:
```{r example3}
atrazine_transformations <- post_to_transformation(atrazine_cid)
str(atrazine_transformations)
```
The use categories can also be retrieved using the PubChem CID:
```{r example4}
atrazine_uses <- post_to_use(atrazine_cid)
str(atrazine_uses)
```
## Acknowledgement
This R package was developed by the EnviData initiative at the
[Norwegian Geotechnical Institute (NGI)](https://www.ngi.no/eng) as part of the
project
[ZeroPM: Zero pollution of Persistent, Mobile substances](https://zeropm.eu/).
This project has received funding from the European Union's Horizon 2020
research and innovation programme under grant agreement No 101036756.
---
If you find this package useful and can afford it, please consider making a
donation to a humanitarian non-profit organization, such as
[Sea-Watch](https://sea-watch.org/en/). Thank you.