Currently in llama.cpp, convert.py assumes there is a tokenizer.model file in the model path. This seems to work for any model that uses a SentencePiece tokenizer, but for nothing else.
Hugging Face's tokenizers library is neat and offers more options than SentencePiece. It would be really great if ggml supported any tokenizer from Hugging Face. I believe this means the converter would need to read merges.txt and vocab.json.
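For illustration, here is a rough sketch of how a converter might tell the two cases apart and read the BPE files. It assumes vocab.json maps token strings to ids and that merges.txt holds one space-separated merge pair per line with an optional "#version" header, which is how many Hugging Face BPE tokenizers are saved. The helper names detect_tokenizer_files and load_bpe_files are hypothetical and are not part of convert.py.

```python
import json
from pathlib import Path


def detect_tokenizer_files(model_dir):
    """Hypothetical helper: report which tokenizer format a model directory provides."""
    path = Path(model_dir)
    if (path / "tokenizer.model").is_file():
        # SentencePiece model: the only case convert.py currently handles.
        return "sentencepiece"
    if (path / "vocab.json").is_file() and (path / "merges.txt").is_file():
        # BPE vocabulary plus merge rules, as saved by many Hugging Face tokenizers.
        return "hf-bpe"
    return None


def load_bpe_files(model_dir):
    """Hypothetical helper: read vocab.json and merges.txt from a model directory."""
    path = Path(model_dir)
    with open(path / "vocab.json", encoding="utf-8") as f:
        vocab = json.load(f)  # assumed layout: token string -> token id
    merges = []
    with open(path / "merges.txt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            # Skip blank lines and the optional "#version" header line.
            if not line or line.startswith("#version"):
                continue
            parts = line.split(" ")
            if len(parts) != 2:
                raise ValueError(f"unexpected merge line: {line!r}")
            merges.append((parts[0], parts[1]))
    return vocab, merges
```

The detection checks tokenizer.model first so that existing SentencePiece models keep behaving as they do today; a real converter would still need to map these vocab entries and merges into whatever ggml expects.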