The perplexity tool returns abnormal values #70

Open
ppp-max opened this issue Nov 11, 2024 · 4 comments

@ppp-max

ppp-max commented Nov 11, 2024

Hello, sorry to bother you.

I tested the PPL of llama.cpp with T-MAC and the values are abnormal: 110682 and 53515, which are far too large. We know the normal value should be small. I then tested the latest llama.cpp (https://github.com/ggerganov/llama.cpp), and its PPL is normal (about 6~9).

https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md

Have you tested the PPL, or does the PPL data need additional processing?

Thank you for your assistance!

@QingtaoLi1
Contributor

QingtaoLi1 commented Nov 11, 2024

@ppp-max Which models are you testing? And did you check with llama-cli whether the output tokens are normal?

Recently, we found that some EfficientQAT Llama-2-7b models have vocab_size=32001, while meta/Llama-2-7b has vocab_size=32000; this mismatch makes the perplexity abnormally high. After hacking it to force the size to 32000 (removing the last token), we got correct PPL numbers. You can see our PR to llama.cpp for the numbers.


@ppp-max
Author

ppp-max commented Nov 11, 2024

The models I used are llama-2-7b-chat.Q4_0.gguf and llama-2-7b-chat.Q2_K.gguf, downloaded from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF.
I also got different PPL when testing the same GGUF with different llama.cpp versions (https://github.com/ggerganov/llama.cpp and https://github.com/kaleid-liner/llama.cpp).

And how do I hack it to force vocab_size to 32000? Thanks.

@QingtaoLi1
Contributor

QingtaoLi1 commented Nov 12, 2024

@ppp-max I think it's better to use the non-chat version of the models to test PPL. In our tests, the chat version gives slightly higher PPL numbers, but still below 10. We've tested a Q4_0 model (downloaded from meta/Llama-2-7b and quantized using llama-quantize), on which the original llama.cpp, kaleid-liner llama.cpp, and T-MAC got almost the same PPL (5.961764, 5.962298, 5.962719).

For the vocab_size problem, have you checked the llama-cli output tokens? If the output is random tokens instead of human sentences, you should probably first check other parts, e.g. the configuration, build, and command options. If the generated tokens are normal, you can check model.vocab.n_vocab or model.hparams.n_vocab, or the weight tensor shapes after loading the model, to see whether the problem is indeed vocab_size.
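For reference, a minimal sketch of such a check, assuming you add a temporary debug print inside llama.cpp right after the hparams and vocab are loaded (the exact member names and logging macro depend on your llama.cpp revision, so verify them against your source tree):

```cpp
// Temporary debug print (assumed member names; check them in your revision):
// compares n_vocab from the GGUF metadata with the number of tokens actually loaded.
LLAMA_LOG_INFO("%s: hparams.n_vocab = %u, vocab.id_to_token.size() = %zu\n",
               __func__, model.hparams.n_vocab, model.vocab.id_to_token.size());
```

You can also print the shapes of the token embedding and output weight tensors after loading; if one dimension is 32001 instead of 32000, the vocab_size mismatch is confirmed.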

@QingtaoLi1
Contributor

@ppp-max I notice that your issue #61 mentions that you used Llama-2-7b-EfficientQAT-w2g128-GPTQ and Llama-2-7b-EfficientQAT-w4g128-GPTQ. Those are where I found the vocab size problem.

My hack is quite tricky and temporary, so to be honest I don't want to put it here. But you can use it as a temporary solution like I did. I forcibly set model.hparams.n_vocab and model.vocab.n_vocab to 32000 after loading the model hparams and vocab, and resize model.vocab.id_to_token to 32000. Then, when reading the tensor info in ggml.c, change the tensor shape: if (info->ne[j] == 32001) { info->ne[j] = 32000; }
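For reference, a rough sketch of what that hack could look like (the member names follow the description above and the exact code locations depend on your llama.cpp revision, so treat this as an illustration, not a ready-made patch):

```cpp
// In llama.cpp, after loading the model hparams and vocab
// (assumed member names; verify them in your revision):
if (model.hparams.n_vocab == 32001) {
    model.hparams.n_vocab = 32000;
    model.vocab.n_vocab   = 32000;
    model.vocab.id_to_token.resize(32000);  // drop the extra trailing token
}

// In ggml.c, when reading the GGUF tensor info, shrink the mismatched dimension:
for (int j = 0; j < GGML_MAX_DIMS; ++j) {
    if (info->ne[j] == 32001) {
        info->ne[j] = 32000;
    }
}
```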

Hope this helps you.
