Releases: ggerganov/llama.cpp

b4628

04 Feb 01:00
cde3833
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to …

b4623

03 Feb 12:57
21c84b5
CUDA: fix Volta FlashAttention logic (#11615)

b4621

02 Feb 23:22
6eecde3
HIP: fix flash_attn_stream_k_fixup warning (#11604)

b4620

02 Feb 22:24
396856b
CUDA/HIP: add support for selectable warp size to mmv (#11519)

b4619

02 Feb 21:55
4d0598e
HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing cc architectu…

b4618

02 Feb 20:57
90f9b88
nit: more informative crash when grammar sampler fails (#11593)

b4617

02 Feb 19:12
864a0b6
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa

b4616

02 Feb 15:56
84ec8a5
Name colors (#11573)

Named colors are more descriptive. Use #defines so we can use compile-time
concatenation.

Signed-off-by: Eric Curtin
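
The "compile-time concatenations" mentioned in this commit message rely on the C rule that adjacent string literals are merged by the compiler. The following is a minimal sketch of that idea, not the code from the commit: the macro names (COLOR_RED, COLOR_BOLD, COLOR_RESET, ERROR_PREFIX) and the helper functions are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical names for illustration; not the identifiers used in the commit. */
#define COLOR_RED   "\033[31m"  /* ANSI escape: red foreground */
#define COLOR_BOLD  "\033[1m"   /* ANSI escape: bold */
#define COLOR_RESET "\033[0m"   /* ANSI escape: reset attributes */

/* Adjacent string literals are merged by the compiler, so ERROR_PREFIX
   expands to one constant string with no runtime formatting. */
#define ERROR_PREFIX COLOR_BOLD COLOR_RED "error: " COLOR_RESET

/* Returns the prefix as a single string literal. */
static const char *error_prefix(void) {
    return ERROR_PREFIX;
}

/* The prefix is also merged into the format string at compile time. */
static void print_error(const char *msg) {
    fprintf(stderr, ERROR_PREFIX "%s\n", msg);
}
```

With `const char *` variables instead of macros, the same prefix would have to be assembled at run time (for example with `snprintf`), which is presumably the trade-off the commit message alludes to.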

b4615

02 Feb 10:26
bfcce4d
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…

b4614

02 Feb 10:11
6980448
Fix exotic ci env that lacks ostringstream::str (#11581)