Why does llama.cpp inference with the Vulkan backend on an Android GPU have very bad performance? #9464

FranzKafkaYu started this conversation in General

Replies: 2 comments, 1 reply
-
With the same model, same prompt, and same output: inference on pure CPU takes 1500 ms~1700 ms, while inference with Vulkan (GPU) takes 24000 ms~25000 ms.
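A comparison like this can be reproduced by loading the same model twice and varying only `n_gpu_layers`. The following is a minimal sketch, not the poster's actual code: it assumes a llama.cpp build with the Vulkan backend compiled in and the C API as of roughly 2024 (`llama_load_model_from_file`, the four-argument `llama_batch_get_one`); names have changed across versions, and the prompt, model path, and the value 99 are illustrative placeholders.

```cpp
// Sketch: time prompt processing on pure CPU vs. with GPU offload.
// Assumes the llama.cpp C API circa 2024; adjust names for your version.
#include "llama.h"
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

// Time one llama_decode of a fixed prompt (prompt processing only).
static double time_decode_ms(const char * model_path, int n_gpu_layers) {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers; // 0 = pure CPU, 99 = offload all layers

    llama_model * model = llama_load_model_from_file(model_path, mparams);
    if (!model) return -1.0;

    llama_context * ctx = llama_new_context_with_model(model, llama_context_default_params());

    // Same prompt for both runs so the two timings do identical work.
    const char * prompt = "Hello, how are you?";
    std::vector<llama_token> tokens(128);
    int n = llama_tokenize(model, prompt, (int) strlen(prompt),
                           tokens.data(), (int) tokens.size(),
                           /*add_special=*/true, /*parse_special=*/false);

    auto t0 = std::chrono::steady_clock::now();
    llama_decode(ctx, llama_batch_get_one(tokens.data(), n, 0, 0));
    auto t1 = std::chrono::steady_clock::now();

    llama_free(ctx);
    llama_free_model(model);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main(int argc, char ** argv) {
    const char * path = argc > 1 ? argv[1] : "model.gguf"; // hypothetical path
    llama_backend_init();
    printf("CPU    (ngl=0)  : %.0f ms\n", time_decode_ms(path, 0));
    printf("Vulkan (ngl=99) : %.0f ms\n", time_decode_ms(path, 99));
    llama_backend_free();
    return 0;
}
```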
            -
Has anyone managed to find a solution?
-
Related issue
I have tried many times: when I enable the Vulkan backend with GPU acceleration, the performance is very bad. So far I have tried a Qualcomm Adreno GPU and an ARM Mali GPU. The Adreno GPU fails while loading the model, and the Mali GPU can load the model and run inference, but the performance is very bad.
I have tried other projects, such as MLC-LLM and MediaPipe; they work with the GPU and their performance is decent. Why can't llama.cpp compete with these projects? Can someone explain this to me?
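As a first diagnostic for the Adreno load failure, it can help to confirm that the process can see a Vulkan driver at all, independently of llama.cpp. This is my own illustrative sketch, not code from llama.cpp, MLC-LLM, or MediaPipe; it uses only core Vulkan 1.x calls:

```cpp
// Diagnostic sketch: enumerate Vulkan physical devices to check that the
// Adreno/Mali driver is visible to the process at all.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app{};
    app.sType      = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo ci{};
    ci.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    ci.pApplicationInfo = &app;

    VkInstance inst;
    if (vkCreateInstance(&ci, nullptr, &inst) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed: no usable Vulkan loader/driver\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDevices(inst, &count, nullptr);
    std::vector<VkPhysicalDevice> devices(count);
    vkEnumeratePhysicalDevices(inst, &count, devices.data());

    for (VkPhysicalDevice dev : devices) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        printf("device: %s (Vulkan %u.%u)\n", props.deviceName,
               VK_VERSION_MAJOR(props.apiVersion),
               VK_VERSION_MINOR(props.apiVersion));
    }

    vkDestroyInstance(inst, nullptr);
    return 0;
}
```

If no device is listed here, the failure is in the Android Vulkan loader or driver rather than in llama.cpp itself; if the GPU is listed but inference is still slow, the gap versus MLC-LLM is more plausibly in how well the backend's generic Vulkan shaders map onto Adreno/Mali hardware.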