Releases: ggerganov/llama.cpp

b4601

31 Jan 05:34
a2df278
server : update help metrics processing/deferred (#11512)

This commit updates the help text for the metrics `requests_processing`
and `requests_deferred` to be more grammatically correct.

Currently the returned metrics look like this:
```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:
```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```
This is also consistent with the description of the metrics in the
server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).
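As an aside, the Prometheus text exposition format shown above is easy to consume programmatically. A minimal Python sketch (not part of llama.cpp) that extracts the gauge values from such output:

```python
def parse_prometheus_gauges(text: str) -> dict[str, float]:
    """Parse metric name/value pairs from Prometheus text format,
    skipping the # HELP and # TYPE comment lines."""
    gauges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each sample line is "<metric_name> <value>"
        name, _, value = line.partition(" ")
        gauges[name] = float(value)
    return gauges

sample = """\
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
"""

print(parse_prometheus_gauges(sample))
# {'llamacpp:requests_processing': 0.0, 'llamacpp:requests_deferred': 0.0}
```

The same parser works on the output of the server's `/metrics` endpoint; only the help-text wording changes with this commit, not the sample lines themselves.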

b4600

30 Jan 22:47
553f1e4
`ci`: ccache for all github workflows (#11516)

b4599

30 Jan 20:00
8b576b6
Tool call support (generic + native for Llama, Functionary, Hermes, M…

b4598

30 Jan 17:15
HIP: require at least HIP 5.5

b4595

30 Jan 11:34
3d804de
sync: minja (#11499)

b4594

30 Jan 11:21
ffd0821
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#1…

b4589

29 Jan 20:15
eb7cf15
server : add /apply-template endpoint for additional use cases of Min…

b4588

29 Jan 18:33
66ee4f2
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>

b4586

29 Jan 16:13
2711d02
vulkan: Catch pipeline creation failure and print an error message (#…

b4585

29 Jan 12:05
f0d4b29
Parse https://ollama.com/library/ syntax (#11480)

People search for Ollama models using the web UI; this change lets
a user copy a model's URL straight from the browser and pass it to
llama-run, which now accepts that syntax.

Signed-off-by: Eric Curtin <[email protected]>
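The URL handling can be illustrated with a short sketch. This is hypothetical Python, not the actual C++ implementation in llama-run; it assumes the model reference is simply the path component after `/library/`:

```python
from urllib.parse import urlparse

def ollama_model_from_url(url: str) -> str:
    """Extract the model reference from an https://ollama.com/library/...
    URL. Hypothetical sketch; the real parsing lives in llama-run's C++."""
    parsed = urlparse(url)
    if parsed.netloc != "ollama.com":
        raise ValueError(f"not an ollama.com URL: {url}")
    prefix = "/library/"
    if not parsed.path.startswith(prefix):
        raise ValueError(f"expected a /library/ path: {url}")
    return parsed.path[len(prefix):]

print(ollama_model_from_url("https://ollama.com/library/smollm"))
# smollm
```

With logic along these lines, a browser URL such as `https://ollama.com/library/smollm` resolves to the same model name a user would otherwise type by hand.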