Releases: ggerganov/llama.cpp

b4601

31 Jan 05:34
a2df278
server : update help metrics processing/deferred (#11512)

This commit updates the help text for the metrics `requests_processing`
and `requests_deferred` to be more grammatically correct.

Currently the returned metrics look like this:
```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:
```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```
This is also consistent with the description of the metrics in the
server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).
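As an aside, the Prometheus text exposition format shown above is easy to consume programmatically. A minimal Python sketch (not part of llama.cpp) that extracts the gauge values from such output:

```python
def parse_prometheus_gauges(text: str) -> dict[str, float]:
    """Parse metric name/value pairs from Prometheus text format,
    skipping the # HELP and # TYPE comment lines."""
    gauges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each sample line is "<metric_name> <value>"
        name, _, value = line.partition(" ")
        gauges[name] = float(value)
    return gauges

sample = """\
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
"""

print(parse_prometheus_gauges(sample))
# {'llamacpp:requests_processing': 0.0, 'llamacpp:requests_deferred': 0.0}
```

The same parser works on the output of the server's `/metrics` endpoint; only the help-text wording changes with this commit, not the sample lines themselves.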

b4600

30 Jan 22:47
553f1e4
`ci`: ccache for all github workflows (#11516)

b4599

30 Jan 20:00
8b576b6
Tool call support (generic + native for Llama, Functionary, Hermes, M…

b4598

30 Jan 17:15
HIP: require at least HIP 5.5

b4595

30 Jan 11:34
3d804de
sync: minja (#11499)

b4594

30 Jan 11:21
ffd0821
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#1…

b4589

29 Jan 20:15
eb7cf15
server : add /apply-template endpoint for additional use cases of Min…

b4588

29 Jan 18:33
66ee4f2
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>

b4586

29 Jan 16:13
2711d02
vulkan: Catch pipeline creation failure and print an error message (#…

b4585

29 Jan 12:05
f0d4b29
Parse https://ollama.com/library/ syntax (#11480)

People search for Ollama models using the web UI; this change lets
a user copy a model's URL straight from the browser and pass it to
llama-run, which now accepts that syntax.

Signed-off-by: Eric Curtin <[email protected]>
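The URL handling can be illustrated with a short sketch. This is hypothetical Python, not the actual C++ implementation in llama-run; it assumes the model reference is simply the path component after `/library/`:

```python
from urllib.parse import urlparse

def ollama_model_from_url(url: str) -> str:
    """Extract the model reference from an https://ollama.com/library/...
    URL. Hypothetical sketch; the real parsing lives in llama-run's C++."""
    parsed = urlparse(url)
    if parsed.netloc != "ollama.com":
        raise ValueError(f"not an ollama.com URL: {url}")
    prefix = "/library/"
    if not parsed.path.startswith(prefix):
        raise ValueError(f"expected a /library/ path: {url}")
    return parsed.path[len(prefix):]

print(ollama_model_from_url("https://ollama.com/library/smollm"))
# smollm
```

With logic along these lines, a browser URL such as `https://ollama.com/library/smollm` resolves to the same model name a user would otherwise type by hand.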