Why is there no difference in E2E performance between T-MAC and llama.cpp on an ARM machine? #61

ppp-max · 2024-10-11T10:06:18Z

I used an ARM machine to test end-to-end performance, but the results do not match those reported in the paper: the measured numbers for llama.cpp and T-MAC are nearly the same. I've posted the measured data below.

The CPU frequency of this machine is 2.5 GHz, and its bandwidth is 680 G/s per core.

kaleid-liner · 2024-10-12T06:00:17Z

Is 680 G/s the memory bandwidth? That seems implausible. You also didn't post the llama.cpp data. It would help if you provided the model architecture, whether it is 4-bit or 2-bit, and the device name.
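One way to sanity-check a reported bandwidth figure is to compare it against the theoretical DRAM peak, which is roughly transfer rate × bytes per transfer × channel count, and to confirm whether it was quoted in bits or bytes. This is a minimal sketch assuming a conventional 64-bit (8-byte) DDR channel; the DDR4-3200, 8-channel configuration in the usage lines is a hypothetical example, not the reporter's machine.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a bandwidth quoted in Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8.0


def theoretical_dram_peak_gb_s(mega_transfers_per_s: float,
                               channels: int,
                               bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes each channel moves `bytes_per_transfer` bytes per transfer
    (8 bytes for a conventional 64-bit DDR channel).
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Hypothetical server: DDR4-3200 with 8 channels.
peak = theoretical_dram_peak_gb_s(3200, channels=8)
print(f"theoretical socket peak: {peak:.1f} GB/s")
print(f"340 Gbit/s = {gbit_to_gbyte(340):.1f} GB/s")
```

If a quoted per-core number is well above the whole-socket peak computed this way, it is probably not DRAM bandwidth in bytes per second.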

ppp-max · 2024-10-14T01:46:41Z

Sorry, I pasted the wrong data. Here is the llama.cpp data, measured with the bitnet_b1_58-3B model and 4 threads.

I then tested Llama-2-7b-EfficientQAT-w2g128-GPTQ and Llama-2-7b-EfficientQAT-w4g128-GPTQ and got the same result: there is no difference in E2E performance between T-MAC and llama.cpp.
I also recomputed the bandwidth of this machine, which is 340 G/s. Sorry about that.
Looking forward to your reply. Thanks.
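Instead of deriving bandwidth from specifications, it can also be measured. Below is a rough, single-threaded sketch in the style of the STREAM "copy" kernel, written in pure Python: it copies one large buffer into another and counts both the read and the write. It ignores write-allocate traffic and is not a substitute for a real benchmark such as STREAM, so treat the result as an approximate, likely conservative indication. The function name and the 256 MiB default are illustrative assumptions; the buffer only needs to be much larger than the last-level cache.

```python
import time


def copy_bandwidth_gb_s(size_bytes: int = 256 * 1024 * 1024, repeats: int = 5) -> float:
    """Rough single-thread copy bandwidth in GB/s (STREAM 'copy' style).

    The buffer should be much larger than the last-level cache so that the
    copy is served from DRAM rather than from cache.
    """
    # Non-zero source and an initialised destination, so every page is
    # already backed by real memory before timing starts.
    src = bytearray(b"\xa5") * size_bytes
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # one bulk read and one bulk write of size_bytes
        best = min(best, time.perf_counter() - t0)
    # Count read + write traffic, as STREAM does; write-allocate is ignored.
    return 2 * size_bytes / best / 1e9
```

Running it pinned to one core (for example via `taskset -c 0` on Linux) gives a single-core figure that can be compared with the socket-level peak.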

QingtaoLi1 · 2024-11-19T10:12:38Z

@ppp-max Your speed is quite low, while the memory bandwidth is strangely high. Could you double-check whether 340 is in Gbit/s or GB/s? The speed you report is close to what we see on our Raspberry Pi, whose memory bandwidth is only about 48 GB/s. Also, do you see an obvious speed gap between T-MAC and llama.cpp with a single thread? If so, we would consider the 4-thread case to be memory-bound, as shown by the roofline model on our main page.
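The memory-bound argument can be made concrete with a back-of-the-envelope roofline estimate: during decoding, each generated token streams roughly all quantized weights from memory once, so tokens/s cannot exceed bandwidth divided by weight bytes, regardless of how fast the kernel is. Once both T-MAC and llama.cpp hit that ceiling, their end-to-end speeds converge. This sketch assumes 6.74e9 parameters for Llama-2-7B with every weight quantized (embeddings and KV-cache traffic ignored) and 32 bits of metadata per group (for example an fp16 scale plus an fp16 zero point); these are simplifying assumptions, not the exact EfficientQAT/GPTQ storage layout.

```python
def weight_bytes(n_params: float, bits: int, group_size: int,
                 overhead_bits_per_group: int = 32) -> float:
    """Approximate storage of group-quantized weights, in bytes.

    overhead_bits_per_group is an assumption (e.g. fp16 scale + fp16 zero).
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8


def memory_bound_tokens_per_s(bandwidth_gb_s: float, model_bytes: float) -> float:
    """Upper bound on decode speed if every token streams all weights once."""
    return bandwidth_gb_s * 1e9 / model_bytes


def roofline_tokens_per_s(compute_bound_tps: float, memory_bound_tps: float) -> float:
    """Attainable speed is capped by whichever resource saturates first."""
    return min(compute_bound_tps, memory_bound_tps)


N_PARAMS = 6.74e9  # assumed Llama-2-7B parameter count, all weights quantized
for bits in (2, 4):
    size = weight_bytes(N_PARAMS, bits, group_size=128)
    for label, bw in (("340 GB/s", 340.0), ("340 Gbit/s", 340.0 / 8), ("48 GB/s", 48.0)):
        print(f"w{bits}g128 ({size / 1e9:.2f} GB) @ {label}: "
              f"<= {memory_bound_tokens_per_s(bw, size):.1f} tokens/s")
```

If a single-thread run shows T-MAC with a clearly higher compute-bound speed, but adding threads raises both kernels to the same memory-bound value, `roofline_tokens_per_s` returns the same number for both, which is the situation described above.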

QingtaoLi1 mentioned this issue on Nov 19, 2024 in #70: "The perplexity tool returns abnormal values"
