Name and Version
master branch
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --model /path/to/llama3.2 --host 0.0.0.0 --port 11434 --n-gpu-layers 99
Problem description & steps to reproduce
I think #16736 introduced a performance regression in llama-server when backed by ggml-vulkan (there is no regression with ggml-metal, which seems strange):
- on 2025-11-02 (b6923): TTFT 3777ms, ITL 41ms
- on 2025-11-03 (b6933): TTFT 7459ms, ITL 62ms
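For reference, here is a minimal sketch of how TTFT and ITL can be measured against llama-server's streaming endpoint. The port matches the command line above, but the prompt is a placeholder and the exact measurement method behind the numbers in this report is an assumption:

```shell
# Rough TTFT/ITL measurement: timestamp each streamed SSE chunk.
# Assumes the server was started with the command line above (port 11434).
# On macOS, `date +%s.%N` needs GNU date (coreutils' gdate); BSD date
# does not support %N.
start=$(date +%s.%N)
curl -sN http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Write a short story."}],"stream":true}' \
| while IFS= read -r line; do
    # TTFT = first chunk timestamp minus $start;
    # ITL = average gap between consecutive chunk timestamps.
    printf '%s %s\n' "$(date +%s.%N)" "$line"
  done
```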
Note that llama-bench isn't affected: pp512=223t/s and tg128=28t/s didn't degrade.
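For completeness, an invocation along these lines would produce the pp512/tg128 numbers quoted above; the model path is a placeholder and the exact flags used are an assumption:

```shell
# pp512 (prompt processing) and tg128 (text generation) are llama-bench's
# default tests; -ngl 99 mirrors --n-gpu-layers 99 from the server command.
llama-bench -m /path/to/llama3.2.gguf -ngl 99
```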
I looked further at the performance of the commits between these two days (tests were run on another system, so the absolute values differ):
b6924..b6927
- b6924
- cd5e3b575 # b6925
- 2f966b8ed # b6926
- b6927
and they highlight that the performance drop occurred with cd5e3b5, when #16736 was merged.
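For anyone wanting to reproduce the narrowing-down, a sketch of stepping through the suspect range might look like this. The tags and hashes are from the list above; the Vulkan build flag is an assumption based on the backend in question:

```shell
# Build each revision in the suspect range and re-run the TTFT measurement.
for rev in b6924 cd5e3b575 2f966b8ed b6927; do
  git checkout "$rev"
  cmake -B "build-$rev" -DGGML_VULKAN=ON
  cmake --build "build-$rev" --target llama-server -j
  # Start build-$rev/bin/llama-server with the command line above,
  # then measure TTFT/ITL as in the earlier sketch.
done
```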