Releases · ggerganov/llama.cpp
b4601
server : update help metrics processing/deferred (#11512)

This commit updates the help text for the metrics `requests_processing` and `requests_deferred` to be more grammatically correct.

Currently the returned metrics look like this:

```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:

```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

This is also consistent with the description of the metrics in the server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).
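As a hedged usage sketch (not part of the commit), these metrics can be scraped once the server is started with the `--metrics` flag; the model path, host, and port below are assumptions:

```console
# Enable the Prometheus-compatible /metrics endpoint (model path is a placeholder).
llama-server -m model.gguf --metrics

# From another shell, scrape the metrics (default host/port assumed).
curl http://localhost:8080/metrics
```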
b4600
`ci`: ccache for all github workflows (#11516)
b4599
Tool call support (generic + native for Llama, Functionary, Hermes, M…
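As an illustrative sketch (not from the release notes), a tool-call request against the server's OpenAI-compatible endpoint might look like the following; the tool schema and model path are placeholders, and the `--jinja` flag is assumed to enable chat-template processing:

```console
# Start the server with Jinja chat-template support (model path is a placeholder).
llama-server -m model.gguf --jinja

# Request a tool call via the OpenAI-compatible chat endpoint; the tool schema is illustrative.
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```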
b4598
HIP: require at least HIP 5.5
b4595
sync: minja (#11499)
b4594
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#1…
b4589
server : add /apply-template endpoint for additional use cases of Min…
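A minimal sketch of calling the new endpoint, assuming a server on the default host/port; the message content is a placeholder:

```console
# Apply the model's chat template to a message list without running inference.
curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "Hello"}]
}'
# The response contains the formatted prompt; its exact shape depends on the model's template.
```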
b4588
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>
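As a hedged usage sketch (not part of the release notes), running an IQ-quantized model on the Vulkan backend might look like this; the model filename is a placeholder:

```console
# Build with the Vulkan backend enabled.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Offload all layers to the GPU with an IQ3_S-quantized model (placeholder path).
./build/bin/llama-cli -m model-IQ3_S.gguf -ngl 99 -p "Hello"
```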
b4586
vulkan: Catch pipeline creation failure and print an error message (#…
b4585
Parse https://ollama.com/library/ syntax (#11480)

People search for Ollama models using the web UI; this change allows copying the URL from the browser and passing it directly to llama-run.

Signed-off-by: Eric Curtin <[email protected]>
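A hedged sketch of the new syntax, assuming llama-run is on the PATH; the model name is a placeholder copied from an Ollama library page:

```console
# Copy a model page URL from the browser and pass it straight to llama-run.
llama-run https://ollama.com/library/smollm
```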