
group_initial_split() with a small group fails even after adjusting the prop parameter? #534

@MatthieuStigler

The problem

Summary: Why does group_initial_split() often fail when one group has a low frequency, even when prop is adjusted to reflect that group's frequency?

I'm using group_initial_split() with a small number (4) of groups. One group has a low frequency (10%), so my intuition was that setting prop=0.9 would lead to this group being selected within the training sample. However, very often (about 80% of the time in the example below) I get error messages such as:

#> Error in `group_mc_cv()`:
#> ! Some assessment sets contained zero rows
#> ℹ Consider using a non-grouped resampling method

Why does this happen even though I adjusted prop? It fails even when I set prop to exactly the complement of the small group's frequency (1 - freq(small_group)). Am I misunderstanding the prop argument?

Thanks!

Reproducible example

library(rsample)
dat <- data.frame(group = sample(LETTERS[1:4], prob = c(0.3, 0.3, 0.3, 0.1), replace = TRUE, size=1000),
                  x = rnorm(1000))
table(dat$group)
#> 
#>   A   B   C   D 
#> 340 270 298  92


set.seed(123)
dat_split <- group_initial_split(dat, group, prop=0.9)
#> Error in `group_mc_cv()`:
#> ! Some assessment sets contained zero rows
#> ℹ Consider using a non-grouped resampling method

# This fails about 80% of the time:
set.seed(1234)
mean(sapply(1:100, \(x) inherits(try(group_initial_split(dat, group, prop=0.9), silent = TRUE), "try-error")))
#> [1] 0.79

Created on 2024-09-08 with reprex v2.1.1
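
To make my intuition about prop concrete, here is a small base-R sketch (not rsample's actual algorithm, which I have not checked). It takes the group counts printed by table() above and lists which combinations of whole groups could form the training set, ranked by how close their share of rows comes to the requested prop. The names counts, candidates, train_share and ranked are only illustrative.

```r
# Group counts copied from the table() output in the reprex above.
counts <- c(A = 340, B = 270, C = 298, D = 92)
groups <- names(counts)

# Enumerate every way to put 1 to 3 whole groups in the training set,
# which always leaves at least one group for the assessment set.
candidates <- unlist(
  lapply(1:(length(groups) - 1), function(k) combn(groups, k, simplify = FALSE)),
  recursive = FALSE
)

# Share of all rows that each candidate training set would contain.
train_share <- vapply(
  candidates,
  function(g) sum(counts[g]) / sum(counts),
  numeric(1)
)
names(train_share) <- vapply(candidates, paste, character(1), collapse = "+")

# Rank the candidates by their distance from the requested prop.
prop <- 0.9
ranked <- train_share[order(abs(train_share - prop))]
print(head(ranked, 3))
```

With these counts, the closest achievable training share is 0.908 (groups A+B+C), slightly above the requested 0.9, and no combination of whole groups gives exactly 0.9. Whether group_initial_split() is supposed to accept such a split is exactly what I am unsure about.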
