split_table: add support for multi-category stratification and other stratification options #198
Labels
diff:2|intermediate
A modest understanding of the languages involved and platform is required.
lang:python
Python 3
scope:1|several-projects
More than one repository (likely a dependency) is impacted.
time:2|medium
May take some time to complete (even with familiarity).
weight:2|moderate
It will happen soon.
Improvement Description
Stratify randomized splitting based on one or more sample metadata categories, or other options. Some example uses:
Current Behavior
Stratification can only be done based on a single value, and in the pipelines (e.g., classify_samples) this is the target value.
Proposed Behavior
The stratify option should accept a list of column names instead of a bool, and possibly also additional options for more complicated stratification types (temporal, group k-fold).
Comments
See sklearn stratification docs for some other examples....
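As a rough illustration of the proposed behavior, here is a minimal sketch using pandas and scikit-learn directly. It is not the plugin's implementation: the helper stratified_split, the metadata table, and its column names (body_site, sex, subject) are hypothetical. It assumes that multi-column stratification can be done by collapsing the selected columns into one composite label and passing that to train_test_split, and it uses GroupKFold as one example of a grouped splitter.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold, train_test_split

# Hypothetical sample metadata; the column names are illustrative only.
metadata = pd.DataFrame(
    {
        "body_site": (["gut"] * 4 + ["skin"] * 4) * 2,
        "sex": ["F", "F", "M", "M"] * 4,
        "subject": [f"s{i // 2}" for i in range(16)],
    },
    index=[f"sample{i}" for i in range(16)],
)


def stratified_split(md, columns, test_size=0.5, random_state=0):
    """Split md so every combination of values in `columns` appears
    in the same proportion in both partitions (hypothetical helper)."""
    # Collapse the chosen columns into one composite label per sample.
    strata = md[columns].astype(str).apply("|".join, axis=1)
    return train_test_split(
        md, test_size=test_size, stratify=strata, random_state=random_state
    )


# Stratify on two metadata categories at once.
train, test = stratified_split(metadata, ["body_site", "sex"])

# Group k-fold: all samples from one subject stay in the same fold.
group_folds = list(
    GroupKFold(n_splits=4).split(metadata, groups=metadata["subject"])
)
```

One caveat with the composite-label approach: every combination of values needs at least two samples, otherwise scikit-learn raises an error, so rare combinations would have to be filtered or merged first.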
References
forum x-ref