Better Dataset type enforcement and error messages for aggregate_over_dataset, esp. regarding the new_fingerprint kwarg
#214
Labels: documentation
Is your feature request related to a problem? Please describe.
Anything that relies on `aggregate_over_dataset` assumes that what is passed in as `dataset` is a Hugging Face `Dataset` object rather than a `DatasetDict` object. This is potentially problematic: if a user loads the dataset via, e.g., `dataset = datasets.load_dataset(path_to_dataset)` rather than `dataset = datasets.Dataset.from_parquet(path_to_dataset)`, then `dataset` will be a `DatasetDict`. The `map` function for `DatasetDict` exists, but it does not have a `new_fingerprint` kwarg, so the call throws an opaque error.

Describe the solution you'd like
Add type hints to `aggregate_over_dataset`. Throw an error that isn't so opaque if someone tries to pass in a `DatasetDict` object rather than a `Dataset` object.
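
A minimal sketch of what such a check could look like. The helper name `ensure_single_dataset` and the wording of the messages are hypothetical and not part of the existing code; the fallback classes are only there so the snippet runs without `datasets` installed.

```python
try:
    from datasets import Dataset, DatasetDict
except ImportError:
    # Stand-in classes (assumption) so the sketch runs without the library.
    class Dataset:
        pass

    class DatasetDict(dict):
        pass


def ensure_single_dataset(dataset: Dataset) -> Dataset:
    """Return `dataset` unchanged, or raise a descriptive TypeError.

    Hypothetical helper: intended to be called at the top of
    aggregate_over_dataset, before anything calls dataset.map(...).
    """
    if isinstance(dataset, DatasetDict):
        # A DatasetDict maps split names to Dataset objects.
        splits = list(dataset.keys())
        raise TypeError(
            "aggregate_over_dataset expects a datasets.Dataset but received a "
            f"DatasetDict with splits {splits}. Select one split first, e.g. "
            "dataset['train'], or load the data with "
            "datasets.Dataset.from_parquet(...)."
        )
    if not isinstance(dataset, Dataset):
        raise TypeError(
            "aggregate_over_dataset expects a datasets.Dataset, got "
            f"{type(dataset).__name__}."
        )
    return dataset
```

Calling this first would let the failure name the actual problem (wrong container type) instead of surfacing as an unexpected-keyword error about `new_fingerprint` from `DatasetDict.map`.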