Better Dataset type enforcement and error messages for aggregate_over_dataset esp regarding new_fingerprint kwarg #214

Open
scottfleming opened this issue Apr 27, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@scottfleming
Contributor

Is your feature request related to a problem? Please describe.
Anything that relies on aggregate_over_dataset assumes that what is passed in as dataset is a Hugging Face Dataset object rather than a DatasetDict object. This is potentially problematic: if a user loads the dataset via, e.g., dataset = datasets.load_dataset(path_to_dataset) rather than dataset = datasets.Dataset.from_parquet(path_to_dataset), then dataset will be a DatasetDict. DatasetDict does have a map function, but it does not accept a new_fingerprint kwarg, so the call throws an opaque error.

Describe the solution you'd like
Add type hints to aggregate_over_dataset. Throw a less opaque error if someone passes in a DatasetDict object rather than a Dataset object.
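
A rough sketch of what such a check might look like, for discussion only. `require_single_dataset` is a hypothetical helper name, not something that exists in the project, and I'm assuming it would be called at the top of aggregate_over_dataset before the map call. The fallback classes are stand-ins so the snippet runs even without the `datasets` package installed.

```python
try:
    from datasets import Dataset, DatasetDict
except ImportError:
    # Stand-ins (assumption) so the sketch runs without `datasets` installed.
    class Dataset:
        pass

    class DatasetDict(dict):
        pass


def require_single_dataset(dataset) -> "Dataset":
    """Hypothetical guard: return `dataset` if it is a Dataset, else raise TypeError."""
    if isinstance(dataset, DatasetDict):
        # A DatasetDict maps split names to Dataset objects; list them in the message.
        splits = ", ".join(repr(name) for name in dataset) or "none"
        raise TypeError(
            "Expected a datasets.Dataset but got a DatasetDict "
            f"(splits: {splits}). Select one split first, e.g. dataset['train'], "
            "or load the file with datasets.Dataset.from_parquet(...)."
        )
    if not isinstance(dataset, Dataset):
        raise TypeError(
            f"Expected a datasets.Dataset, got {type(dataset).__name__}."
        )
    return dataset
```

With something like this in place, a caller who used datasets.load_dataset(path_to_dataset) would see a message naming the available splits instead of the new_fingerprint error. Passing split="train" to load_dataset is another way to get a Dataset back directly.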

@scottfleming scottfleming added the documentation Improvements or additions to documentation label Apr 27, 2024