I used an ARM machine to test the end-to-end output, but the performance does not match the results reported in the paper. The measured data for llama.cpp and T-MAC are nearly the same. I've posted the measured data below.
The frequency of this machine is 2.5 GHz, and the bandwidth of this machine is 680 G/s per core.
Is 680 G/s the memory bandwidth? That seems invalid. You also didn't post the data for llama.cpp. It would be more helpful if you could provide the model architecture, whether it is 4-bit or 2-bit, and the device name.
Sorry, the data was pasted wrong. Here's llama.cpp's data, which used the bitnet_b1_58-3B model with 4 threads.
I then tested Llama-2-7b-EfficientQAT-w2g128-GPTQ and Llama-2-7b-EfficientQAT-w4g128-GPTQ, which show the same results (there is no difference in E2E performance between T-MAC and llama.cpp).
I also recomputed the bandwidth of this machine, which is 340 G/s. Sorry about that.
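For what it's worth, a quick single-core copy test (not from this thread, just a rough sanity check with assumed buffer sizes) usually makes it clear whether a machine's usable bandwidth is in the tens or hundreds of GB/s:

```python
# Rough single-core memory-bandwidth sanity check (illustrative sketch only).
import time
import numpy as np

N = 256 * 1024 * 1024            # 256M float32 elements, ~1 GiB
src = np.ones(N, dtype=np.float32)
dst = np.empty_like(src)

t0 = time.perf_counter()
np.copyto(dst, src)              # streams the buffer: ~1 GiB read + ~1 GiB write
t1 = time.perf_counter()

bytes_moved = 2 * src.nbytes     # count both the read and the write
print(f"approx. copy bandwidth: {bytes_moved / (t1 - t0) / 1e9:.1f} GB/s")
```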
Looking forward to your reply. Thanks.
@ppp-max Your speed is quite low while the memory bandwidth is strangely high. May I double-check whether 340 is in G bits or G bytes? The speed you report is close to our Raspberry Pi, while its memory bandwidth is only about 48 GB/s. Also, do you see an obvious speed gap between T-MAC and llama.cpp when using a single thread? If so, we tend to conclude that the 4-thread case hits the memory bound, as shown in the roofline model on our main page.
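To illustrate what "memory bound" means here, a back-of-the-envelope roofline estimate (all numbers below are assumptions for illustration, not measurements from this thread) ties the decode speed to how fast the weights can be streamed from memory; once both frameworks hit that ceiling, their end-to-end speeds converge:

```python
# Illustrative roofline-style estimate: during decoding, every token must
# stream the full weight tensor once, so in the memory-bound regime
#   tokens/s <= memory_bandwidth / bytes_of_weights
def memory_bound_tokens_per_s(params_billion, bits_per_weight, bandwidth_gbs):
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / weight_bytes

# Hypothetical example: a 7B model with 2-bit weights and 48 GB/s usable bandwidth
print(memory_bound_tokens_per_s(7, 2, 48))   # ~27 tokens/s ceiling
# The same model at 4-bit weights halves the ceiling
print(memory_bound_tokens_per_s(7, 4, 48))   # ~14 tokens/s ceiling
```

Under this kind of bound, adding threads stops helping once the bandwidth is saturated, which is why a single-thread comparison is more telling for the compute-side speedup.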