split_table: add support for multi-category stratification and other stratification options #198

Open
nbokulich opened this issue Jan 31, 2021 · 0 comments
Labels
diff:2|intermediate: A modest understanding of the languages involved and platform is required.
lang:python: Python 3
scope:1|several-projects: More than one repository (likely a dependency) is impacted.
time:2|medium: May take some time to complete (even with familiarity).
weight:2|moderate: It will happen soon.

Comments

@nbokulich
Member

Improvement Description
Allow randomized splitting to be stratified by one or more sample metadata categories, or by other criteria. Some example uses:

  1. stratification by group (e.g., for cross-group or cross-batch prediction)
  2. stratification of dependent samples (e.g., different sample types collected from the same subject)
  3. temporal stratification
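
To make use case 2 concrete, here is a minimal standard-library sketch of a group-aware split, in which every sample from the same subject lands in the same partition. The function name `group_split`, its parameters, and the sample IDs are hypothetical and are not part of split_table; an actual implementation would more likely build on something like sklearn's `GroupShuffleSplit`.

```python
import random
from collections import defaultdict


def group_split(sample_groups, test_fraction=0.2, seed=0):
    """Split sample IDs so that no group spans both partitions.

    sample_groups: dict mapping sample ID -> group label (e.g., subject ID).
    Returns (train_ids, test_ids) as sorted lists.
    """
    # Collect the samples that belong to each group.
    members = defaultdict(list)
    for sample_id, group in sample_groups.items():
        members[group].append(sample_id)

    # Shuffle whole groups, not individual samples.
    groups = sorted(members)
    random.Random(seed).shuffle(groups)

    # Hold out at least one group, but always leave one for training.
    n_test = max(1, round(len(groups) * test_fraction))
    n_test = min(n_test, len(groups) - 1)
    test_groups = set(groups[:n_test])

    train, test = [], []
    for group, ids in members.items():
        (test if group in test_groups else train).extend(ids)
    return sorted(train), sorted(test)


# Hypothetical metadata: gut and oral samples from five subjects.
subjects = {f"S{i}.{site}": f"subject-{i}"
            for i in range(5) for site in ("gut", "oral")}
train_ids, test_ids = group_split(subjects, test_fraction=0.4, seed=42)
```

Because whole subjects are held out, the test partition only contains samples from subjects the model has never seen, which is the point of splitting dependent samples this way.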

Current Behavior
Stratification can only be based on a single metadata column, and in the pipelines (e.g., classify_samples) this is always the target value.

Proposed Behavior
The stratify option should accept a list of column names instead of a bool. It could also accept additional options for more complicated stratification types (e.g., temporal, group k-fold).
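
As a rough illustration of a list-valued stratify option, the sketch below treats each distinct combination of values across the listed columns as one stratum and draws the same fraction from each. Everything here is an assumption made for illustration: the function `stratified_split`, its signature, and the column names `body_site` and `batch` are hypothetical, and the metadata is a plain dict rather than the real metadata object.

```python
import random
from collections import defaultdict


def stratified_split(metadata, stratify, test_fraction=0.2, seed=0):
    """Randomly split sample IDs, stratifying on one or more columns.

    metadata: dict mapping sample ID -> dict of column name -> value.
    stratify: list of column names; an empty list means no stratification.
    Returns (train_ids, test_ids) as sorted lists.
    """
    # Each distinct combination of values across the columns is one stratum.
    strata = defaultdict(list)
    for sample_id, row in metadata.items():
        strata[tuple(row[col] for col in stratify)].append(sample_id)

    rng = random.Random(seed)
    train, test = [], []
    # Sorting by repr keeps the order deterministic for any value types.
    for key in sorted(strata, key=repr):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        # Draw the same fraction from every stratum.
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return sorted(train), sorted(test)


# Hypothetical metadata: 2 body sites x 2 batches x 10 samples each.
metadata = {
    f"{site}-{batch}-{i}": {"body_site": site, "batch": batch}
    for site in ("gut", "oral") for batch in ("A", "B") for i in range(10)
}
train_ids, test_ids = stratified_split(metadata, ["body_site", "batch"])
```

One caveat with this approach: the number of strata grows multiplicatively with each added column, so small strata can end up contributing no test samples at all.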

Comments
See the sklearn stratification docs for other examples.
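
For the temporal case, one possible reading (an assumption, since the issue does not define it) is to train on earlier samples and hold out the latest ones. The sketch below uses only the standard library; `temporal_split` and the sample IDs are hypothetical, and sklearn's `TimeSeriesSplit` covers the cross-validation variant of the same idea.

```python
def temporal_split(sample_times, test_fraction=0.2):
    """Hold out the latest samples and train on everything earlier.

    sample_times: dict mapping sample ID -> sortable time value.
    Returns (train_ids, test_ids), both ordered by time.
    """
    # Order by time, breaking ties by sample ID for determinism.
    ordered = sorted(sample_times, key=lambda s: (sample_times[s], s))
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Hypothetical metadata: one sample per day for ten days.
days = {f"sample-{d}": d for d in range(10)}
train_ids, test_ids = temporal_split(days, test_fraction=0.3)
```

Note that this simple version can place samples sharing the same timepoint on both sides of the cut; a stricter variant would cut only between distinct time values.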

References
forum x-ref

@nbokulich added the diff:2|intermediate, lang:python, scope:1|several-projects, time:2|medium, and weight:2|moderate labels on Jan 31, 2021