Because it is the first word with an umlaut that appears in the WikiText-2 dataset, which we use to test tokenizers. At the time it was causing the BPE tokenizers to fail, so I added it to the list, like all the other failing words.
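For anyone wondering why a word with an umlaut makes a useful test case, here is a minimal sketch. It is not the llama.cpp test code: the word `Äpfel` and the helpers `naive_byte_tokens` and `detokenize` are hypothetical, chosen only for illustration. It shows that a non-ASCII character takes more than one byte in UTF-8, which is the kind of input a round-trip check (tokenize, detokenize, compare) is meant to exercise.

```python
# Hypothetical example word (not taken from the thread): "Äpfel",
# written with an escape so the umlaut is the precomposed U+00C4.
word = "\u00c4pfel"

utf8_bytes = word.encode("utf-8")

# 5 characters, but 6 bytes: the umlaut is encoded as two bytes in UTF-8.
print(len(word), len(utf8_bytes))


def naive_byte_tokens(text: str) -> list[int]:
    """Toy stand-in for a byte-level tokenizer: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def detokenize(tokens: list[int]) -> str:
    """Inverse of naive_byte_tokens: join the bytes and decode them as UTF-8."""
    return bytes(tokens).decode("utf-8")


# Round-trip check: the text must come back unchanged.
tokens = naive_byte_tokens(word)
assert detokenize(tokens) == word
```

A real BPE tokenizer merges bytes into larger tokens, but the same round-trip comparison applies: if the decoded text differs from the input, the word goes on the list of failing cases.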
llama.cpp/convert_hf_to_gguf_update.py
Line 279 in 84ec8a5
And if you want another option:
#11600