|
1 |
| -# Some Notes on the Design of `parsnip` |
| 1 | +# Some Notes on the Design of parsnip |
2 | 2 |
|
3 |
| -`parsnip` aims to provide a unified interface to the myriad R modeling functions, which have very heterogeneous interfaces and return values. It defines a set of modules, which are specific tasks, such as |
| 3 | +The parsnip package aims to provide a unified interface to the myriad R modeling functions, which have very heterogeneous interfaces and return values. It defines a set of modules, which are specific tasks, such as |
4 | 4 |
|
5 | 5 | * fitting the model
|
6 | 6 | * obtaining numeric predictions for regression models
|
7 | 7 | * computing different types of predictions for classification and censored regression models
|
8 | 8 |
|
9 | 9 | and so on. The list of modules is likely to grow over time to include tasks such as computing variable importance scores.
|
10 | 10 |
|
11 |
| -`caret` was written for the same purpose. The approach there was to encapsulate the modules as functions (see [this directory](https://github.com/topepo/caret/tree/master/models/files) for examples). The issues with having these modules as functions are: |
| 11 | +The caret package was written for the same purpose. The approach there was to encapsulate the modules as functions (see [this directory](https://github.com/topepo/caret/tree/master/models/files) for examples). The issues with having these modules as functions are: |
12 | 12 |
|
13 | 13 | * A lot of code duplication.
|
14 | 14 | * More difficult to maintain.
|
15 | 15 | * Any functions in open code had to be a dependency of some sort. This led to an early version having about 200 package dependencies, which was problematic.
|
16 | 16 |
|
17 |
| -To get around the last point, `caret` _compiles_ these modules into a large list and saves it in the package as an RData file. This prevents `R CMD check` from noticing that code and triggering warnings about dependencies. |
| 17 | +To get around the last point, caret _compiles_ these modules into a large list and saves it in the package as an RData file. This prevents `R CMD check` from noticing that code and triggering warnings about dependencies. |
18 | 18 |
|
19 | 19 | ## Model Fitting Modules
|
20 | 20 |
|
21 |
| -`parsnip` approaches the problem differently and relies more on using `call` objects for the modules. In the simple cases, the fit module is a list that contains information about the module including the package and function name for the call as well as any default options. For example, for logistic regression using `glm`, the module may look like: |
| 21 | +parsnip approaches the problem differently and relies more on using `call` objects for the modules. In the simple cases, the fit module is a list that contains information about the module including the package and function name for the call as well as any default options. For example, for logistic regression using `glm`, the module may look like: |
22 | 22 |
|
23 | 23 | ```r
|
24 | 24 | list(
|
@@ -77,9 +77,9 @@ The same is true for quosures.
|
77 | 77 |
|
78 | 78 | Making predictions is done in a manner similar to fitting models; a call is created in the same way. However, there are additional complexities.
|
79 | 79 |
|
80 |
| -First, the data or model fit object may require some preprocessing to make the predict function work. This does _not_ include executing a formula method on the data but may include coercing the new data into an appropriate format. It can also be used to check for specific fit object requirements. For example, an additional option is required for the `ranger` package to compute class probabilities. The `pre` element of a prediction module can be used to check that the relevant option is set correctly. |
| 80 | +First, the data or model fit object may require some preprocessing to make the predict function work. This does _not_ include executing a formula method on the data but may include coercing the new data into an appropriate format. It can also be used to check for specific fit object requirements. For example, an additional option is required for the ranger package to compute class probabilities. The `pre` element of a prediction module can be used to check that the relevant option is set correctly. |
81 | 81 |
|
82 |
| -Second, there is a high likelihood that the results of executing the prediction code will require post-processing to put the results into a usable format. `ranger`, for example, returns an object of a specific class that contains the predicted values for the new data. The `post` element of the prediction module would extract this value and put it into a more consistent format. |
| 82 | +Second, there is a high likelihood that the results of executing the prediction code will require post-processing to put the results into a usable format. ranger, for example, returns an object of a specific class that contains the predicted values for the new data. The `post` element of the prediction module would extract this value and put it into a more consistent format. |
83 | 83 |
|
84 | 84 | The postprocessor can also be used to coerce the results into a [_tidy format_](https://tidymodels.github.io/model-implementation-principles/model-predictions.html#return-values).
|
85 | 85 |
|
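The prediction flow described in this hunk (preprocess, build and evaluate a call, postprocess) can be illustrated with a minimal sketch. This is not parsnip's actual implementation: the `func` element, the `run_prediction()` helper, and the `.pred` column name are hypothetical and used only for illustration. Only the `pre` and `post` element names come from the text above, and base R's `lm()` stands in for a real modeling engine.

```r
# Hypothetical prediction module: only `pre` and `post` are named in the notes;
# `func` and everything else here are assumptions made for this sketch.
pred_module <- list(
  # Coerce the new data into the format the predict function expects
  pre = function(new_data, fit) as.data.frame(new_data),
  # Package and function name used to build the call
  func = c(pkg = "stats", fun = "predict"),
  # Put the raw predictions into a consistent one-column data frame
  post = function(results, fit) data.frame(.pred = unname(results))
)

# Hypothetical helper: run `pre`, build the call, evaluate it, then run `post`
run_prediction <- function(module, fit, new_data) {
  new_data <- module$pre(new_data, fit)
  # Build the unevaluated call stats::predict(fit, newdata = new_data)
  fun <- call("::", as.name(module$func[["pkg"]]), as.name(module$func[["fun"]]))
  pred_call <- as.call(list(fun, fit, newdata = new_data))
  raw <- eval(pred_call)
  module$post(raw, fit)
}

fit <- lm(mpg ~ wt, data = mtcars)
preds <- run_prediction(pred_module, fit, mtcars[1:3, "wt", drop = FALSE])
```

Here `preds` is a data frame with one row per row of new data. A `pre` function for an engine such as ranger could also inspect the fit object and stop with an error if a required option was not set when the model was fit.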