Misc. bug: performance regression in llama-server (ggml-vulkan) #17033

@kpouget

Description

Name and Version

master branch

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server --model /path/to/llama3.2 --host 0.0.0.0 --port 11434 --n-gpu-layers 99

Problem description & steps to reproduce

I think #16736 introduced a regression in the performance of llama-server when backed by ggml-vulkan (but no regression with ggml-metal, which seems strange).

Note that llama-bench isn't affected: pp512 = 223 t/s and tg128 = 28 t/s didn't degrade.

I looked further at the performance of the commits between these two builds (these tests were run on another system, so the absolute values differ):

b6924..b6927

[benchmark charts for the commits in this range]
    - b6924
    - cd5e3b575 # b6925
    - 2f966b8ed # b6926
    - b6927

and they highlight that the performance drop occurred with cd5e3b5, when #16736 was merged.
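The per-commit measurement described above can be sketched as a small build-and-bench loop. This is a hypothetical reproduction script, not the reporter's exact procedure: the model path is a placeholder, and it assumes a CMake build with Vulkan enabled via `GGML_VULKAN=ON`.

```shell
#!/bin/sh
# Benchmark each suspect revision between builds b6924 and b6927.
# Commit hashes are taken from this report; the model path is a placeholder.
for rev in b6924 cd5e3b575 2f966b8ed b6927; do
    git checkout "$rev"
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release -j
    # Same workload as the llama-bench numbers quoted above: pp512 / tg128.
    ./build/bin/llama-bench -m /path/to/llama3.2.gguf -p 512 -n 128
done
```

Comparing the pp512/tg128 rows across the four runs should isolate the commit where throughput drops.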

First Bad Commit

cd5e3b5
