Describe the bug
A clear and concise description of what the bug is.
The function tokenize_and_apply_input_masking (see here) is applied as a preprocessing step that, among other things, ensures that max_seq_length is enforced in the provided dataset. It is called from _process_dataset_configs here, which calls HF's dataset map function. The dataset map function is called here, where it passes things like max_length as a kwarg via fn_kwargs.
As I've been debugging, it appears that this line here in tokenize_and_apply_input_masking is possibly an issue: it takes the kwargs and pulls out fn_kwargs. However, HF's dataset.map function passes the contents of fn_kwargs as the function's kwargs (see documentation); it doesn't pass a dictionary containing fn_kwargs.
Should this
Instead be this
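As a rough illustration of the mismatch described above (a minimal sketch, not the repository's actual code): map_like is a hypothetical stand-in that forwards fn_kwargs the way the HF documentation describes for dataset.map, and truncate_nested / truncate_direct are hypothetical simplifications of the two ways a mapped function could read max_length. The real tokenize_and_apply_input_masking does more than truncate, and the exact lookup it uses is an assumption here.

```python
def map_like(examples, function, fn_kwargs=None):
    """Hypothetical stand-in for dataset.map: unpacks fn_kwargs into keyword arguments."""
    fn_kwargs = fn_kwargs or {}
    return [function(example, **fn_kwargs) for example in examples]


def truncate_nested(example, **kwargs):
    # Assumed pattern from the report: look for a nested "fn_kwargs" entry.
    # Because fn_kwargs is unpacked by the caller, that key never exists,
    # so max_length ends up as None and no truncation happens.
    nested = kwargs.get("fn_kwargs", {})
    max_length = nested.get("max_length")
    return {"input_ids": example["input_ids"][:max_length]}


def truncate_direct(example, **kwargs):
    # Read the value straight from the keyword arguments instead.
    max_length = kwargs.get("max_length")
    return {"input_ids": example["input_ids"][:max_length]}


rows = [{"input_ids": list(range(10))}]
nested_out = map_like(rows, truncate_nested, fn_kwargs={"max_length": 4})
direct_out = map_like(rows, truncate_direct, fn_kwargs={"max_length": 4})

print(len(nested_out[0]["input_ids"]))  # 10: the limit was silently ignored
print(len(direct_out[0]["input_ids"]))  # 4: the limit was applied
```

If the real function behaves like truncate_nested, the sequence-length limit would be dropped without any error, which would match the behavior suspected in this report.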
Platform
Please provide details about the environment you are using, including the following:
Sample Code
Please include a minimal sample of the code that will (if possible) reproduce the bug in isolation
Expected behavior
A clear and concise description of what you expected to happen.
Observed behavior
What you see happening (error messages, stack traces, etc...)
Additional context
Add any other context about the problem here.