Name and Version
master branch
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --model /path/to/llama3.2 --host 0.0.0.0 --port 11434 --n-gpu-layers 99
Problem description & steps to reproduce
I think #16736 introduced a performance regression in llama-server when backed by ggml-vulkan (there is no regression with ggml-metal, which seems strange):
- on 2025-11-02 (b6923): TTFT 3777ms, ITL 41ms
- on 2025-11-03 (b6933): TTFT 7459ms, ITL 62ms
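For reference, here is a minimal sketch of how TTFT and ITL can be measured against llama-server's streaming endpoint. The port matches the command line above, but the prompt is a placeholder and the exact measurement method behind the numbers in this report is an assumption:

```shell
# Rough TTFT/ITL measurement: timestamp each streamed SSE chunk.
# Assumes the server was started with the command line above (port 11434).
# On macOS, `date +%s.%N` needs GNU date (coreutils' gdate); BSD date
# does not support %N.
start=$(date +%s.%N)
curl -sN http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Write a short story."}],"stream":true}' \
| while IFS= read -r line; do
    # TTFT = first chunk timestamp minus $start;
    # ITL = average gap between consecutive chunk timestamps.
    printf '%s %s\n' "$(date +%s.%N)" "$line"
  done
```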
Note that llama-bench isn't affected: pp512=223t/s and tg128=28t/s didn't degrade.
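For completeness, an invocation along these lines would produce the pp512/tg128 numbers quoted above; the model path is a placeholder and the exact flags used are an assumption:

```shell
# pp512 (prompt processing) and tg128 (text generation) are llama-bench's
# default tests; -ngl 99 mirrors --n-gpu-layers 99 from the server command.
llama-bench -m /path/to/llama3.2.gguf -ngl 99
```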
I looked further at the performance of the commits between these two days (tests were run on another system, so the absolute values differ):
b6924..b6927
- b6924
- cd5e3b575 # b6925
- 2f966b8ed # b6926
- b6927
and they highlight that the performance drop occurred with cd5e3b5, when #16736 was merged.
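For anyone wanting to reproduce the narrowing-down, a sketch of stepping through the suspect range might look like this. The tags and hashes are from the list above; the Vulkan build flag is an assumption based on the backend in question:

```shell
# Build each revision in the suspect range and re-run the TTFT measurement.
for rev in b6924 cd5e3b575 2f966b8ed b6927; do
  git checkout "$rev"
  cmake -B "build-$rev" -DGGML_VULKAN=ON
  cmake --build "build-$rev" --target llama-server -j
  # Start build-$rev/bin/llama-server with the command line above,
  # then measure TTFT/ITL as in the earlier sketch.
done
```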