Why does llama.cpp inference with the Vulkan backend on an Android GPU have very bad performance? #9464

FranzKafkaYu started this conversation in General

Replies: 2 comments, 1 reply
-
With the same model, same prompt, and same output: inference on pure CPU takes 1500 ms~1700 ms, while inference with Vulkan (GPU) takes 24000 ms~25000 ms.
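A comparison like this can be reproduced by loading the same model twice and varying only `n_gpu_layers`. The following is a minimal sketch, not the poster's actual code: it assumes a llama.cpp build with the Vulkan backend compiled in and the C API as of roughly 2024 (`llama_load_model_from_file`, the four-argument `llama_batch_get_one`); names have changed across versions, and the prompt, model path, and the value 99 are illustrative placeholders.

```cpp
// Sketch: time prompt processing on pure CPU vs. with GPU offload.
// Assumes the llama.cpp C API circa 2024; adjust names for your version.
#include "llama.h"
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

// Time one llama_decode of a fixed prompt (prompt processing only).
static double time_decode_ms(const char * model_path, int n_gpu_layers) {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers; // 0 = pure CPU, 99 = offload all layers

    llama_model * model = llama_load_model_from_file(model_path, mparams);
    if (!model) return -1.0;

    llama_context * ctx = llama_new_context_with_model(model, llama_context_default_params());

    // Same prompt for both runs so the two timings do identical work.
    const char * prompt = "Hello, how are you?";
    std::vector<llama_token> tokens(128);
    int n = llama_tokenize(model, prompt, (int) strlen(prompt),
                           tokens.data(), (int) tokens.size(),
                           /*add_special=*/true, /*parse_special=*/false);

    auto t0 = std::chrono::steady_clock::now();
    llama_decode(ctx, llama_batch_get_one(tokens.data(), n, 0, 0));
    auto t1 = std::chrono::steady_clock::now();

    llama_free(ctx);
    llama_free_model(model);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main(int argc, char ** argv) {
    const char * path = argc > 1 ? argv[1] : "model.gguf"; // hypothetical path
    llama_backend_init();
    printf("CPU    (ngl=0)  : %.0f ms\n", time_decode_ms(path, 0));
    printf("Vulkan (ngl=99) : %.0f ms\n", time_decode_ms(path, 99));
    llama_backend_free();
    return 0;
}
```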
            -
Has anyone managed to find a solution?
-
Related issue
I have tried many times: when I enable the Vulkan backend with GPU acceleration, the performance is very bad. So far I have tried a Qualcomm Adreno GPU and an ARM Mali GPU. The Adreno GPU fails while loading the model, and the Mali GPU can load the model and run inference, but the performance is very bad.
I have tried other projects, such as MLC-LLM and MediaPipe; they work with the GPU and their performance is decent. Why can't llama.cpp compete with these projects? Can someone explain this to me?
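As a first diagnostic for the Adreno load failure, it can help to confirm that the process can see a Vulkan driver at all, independently of llama.cpp. This is my own illustrative sketch, not code from llama.cpp, MLC-LLM, or MediaPipe; it uses only core Vulkan 1.x calls:

```cpp
// Diagnostic sketch: enumerate Vulkan physical devices to check that the
// Adreno/Mali driver is visible to the process at all.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app{};
    app.sType      = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo ci{};
    ci.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    ci.pApplicationInfo = &app;

    VkInstance inst;
    if (vkCreateInstance(&ci, nullptr, &inst) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed: no usable Vulkan loader/driver\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(inst, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(inst, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        printf("device: %s (Vulkan %u.%u)\n", props.deviceName,
               VK_VERSION_MAJOR(props.apiVersion),
               VK_VERSION_MINOR(props.apiVersion));
    }

    vkDestroyInstance(inst, nullptr);
    return 0;
}
```

If no device is listed here, the failure is in the Android Vulkan loader or driver rather than in llama.cpp itself; if the GPU is listed but inference is still slow, the gap versus MLC-LLM is more plausibly in how well the backend's generic Vulkan shaders map onto Adreno/Mali hardware.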