Skip to content

Idea: move check_args() up into new_model_spec() #1098

@EmilHvitfeldt

Description

@EmilHvitfeldt
Member

Adding a call to check_args() on this line https://github.com/tidymodels/parsnip/blob/main/R/misc.R#L350 produces the following results. The only questionable use-case is the last one, which people might use.

Still looking at other places where this would fail

library(parsnip)
library(discrim)

discrim_linear(penalty = -2)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

discrim_linear(penalty = tune()) %>% 
  set_args(penalty = 1)
#> Linear Discriminant Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = 1
#> 
#> Computational engine: MASS

discrim_linear(penalty = tune())
#> Linear Discriminant Model Specification (classification)
#> 
#> Main Arguments:
#>   penalty = tune()
#> 
#> Computational engine: MASS

discrim_linear(penalty = 1) %>% 
  set_args(penalty = -2)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

discrim_linear(penalty = -2) %>% 
  set_args(penalty = 1)
#> Error in `new_model_spec()`:
#> ! The amount of regularization, `penalty`, should be `>= 0`.

Activity

EmilHvitfeldt

EmilHvitfeldt commented on Apr 3, 2024

@EmilHvitfeldt
MemberAuthor

right now the only blocker we have found for this, is dependent on how we handle #1099

simonpcouch

simonpcouch commented on Apr 3, 2024

@simonpcouch
Contributor

Very much agreed on the linked issue that we can do a better job checking arguments in check_args().

I think it's important to note here that the pattern in check_args() methods is to eval_tidy() each argument:

parsnip/R/boost_tree.R

Lines 167 to 169 in dc9a66e

check_args.boost_tree <- function(object) {
args <- lapply(object$args, rlang::eval_tidy)

If we're able to eval_tidy() each argument to model specification functions as soon as they're inputted, then there's no reason we should be capturing them as quosures in the first place. The reason we capture them as quosures is so that we can evaluate them in an environment that we construct, the eval_env, at fit() time:

parsnip/R/fit.R

Lines 156 to 158 in dc9a66e

# Create an environment with the evaluated argument objects. This will be
# used when a model call is made later.
eval_env <- rlang::env()

This environment supports defining arguments in relation to the data to be fit()ted to, like:

library(parsnip)

boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% fit(mpg ~ ., mtcars)
#> parsnip model object
#> 
#> ##### xgb.Booster
#> raw: 21.6 Kb 
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
#>     colsample_bytree = 1, colsample_bynode = 0.5, min_child_weight = 1, 
#>     subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, 
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "0.5", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10 
#> niter: 15
#> nfeatures : 10 
#> evaluation_log:
#>      iter training_rmse
#>     <num>         <num>
#>         1    14.9313149
#>         2    10.9841618
#> ---                    
#>        14     0.6109443
#>        15     0.5196238

...where .x() is a data descriptor. As of now, check_args() knows that it's unable to evaluate such an expression:

boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% check_args()
#> Error in `descr_env$.x()`:
#> ! Descriptor context not set

Created on 2024-04-03 with reprex v2.1.0

Calling check_args() will not be able to give the same output when called inside of new_model_spec() because it doesn't yet have access to the data--the fact that the behavior might differ between these two contexts, to me, signals that we ought not to check_args() until we have access to the data at fit().

There are definitely improvements to be made to argument checking, as in the loophole that you pointed out in the linked issue. I think that's a good place to direct focus for now. :)

simonpcouch

simonpcouch commented on Sep 9, 2024

@simonpcouch
Contributor

Given the implications for data descriptors, I'm going to go ahead and close. Still very much on board for #1095. :)

github-actions

github-actions commented on Sep 25, 2024

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

locked and limited conversation to collaborators on Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    featurea feature request or enhancement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @EmilHvitfeldt@simonpcouch

        Issue actions

          Idea: move `check_args()` up into `new_model_spec()` · Issue #1098 · tidymodels/parsnip