The problem
Summary: `group_initial_split()` often fails when one group has a low frequency, even when `prop` is adjusted to reflect that group's frequency.
I'm using `group_initial_split()` with a small number of groups (4). One group has a low frequency (10%), so my intuition was that setting `prop = 0.9` would keep this group within the training sample. However, most of the time (roughly 70-80% of runs) I get an error message such as:
#> Error in `group_mc_cv()`:
#> ! Some assessment sets contained zero rows
#> ℹ Consider using a non-grouped resampling method
Why does this happen even though I adjusted `prop`? It fails even when I set `prop` to exactly the complement of the small group's frequency (1 - freq(small_group)). Am I misunderstanding the `prop` argument?
Thanks!
Reproducible example
library(rsample)
dat <- data.frame(group = sample(LETTERS[1:4], prob = c(0.3, 0.3, 0.3, 0.1), replace = TRUE, size=1000),
x = rnorm(1000))
table(dat$group)
#>
#> A B C D
#> 340 270 298 92
set.seed(123)
dat_split <- group_initial_split(dat, group, prop=0.9)
#> Error in `group_mc_cv()`:
#> ! Some assessment sets contained zero rows
#> ℹ Consider using a non-grouped resampling method
# This fails about 80% of the time:
set.seed(1234)
mean(sapply(1:100, \(x) inherits(try(group_initial_split(dat, group, prop=0.9), silent = TRUE), "try-error")))
#> [1] 0.79
Created on 2024-09-08 with reprex v2.1.1