Releases: ggerganov/llama.cpp
b4628
b4623
CUDA: fix Volta FlashAttention logic (#11615)
b4621
HIP: fix flash_attn_stream_k_fixup warning (#11604)
b4620
CUDA/HIP: add support for selectable warp size to mmv (#11519)
b4619
HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing cc architectu…
b4618
nit: more informative crash when grammar sampler fails (#11593)
b4617
CUDA: use mma PTX instructions for FlashAttention (#11583)
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa
b4616
Name colors (#11573)
It's more descriptive, use #define's so we can use compile-time concatenations.
Signed-off-by: Eric Curtin
b4615
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…
b4614
Fix exotic ci env that lacks ostringstream::str (#11581)