Releases: CodeLinaro/llama.cpp
b3799
ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG …
b3798
Update CUDA graph on scale change plus clear nodes/params (#9550)
* Avoid using saved CUDA graph if scale changes and reset nodes/params on update (fixes https://github.com/ggerganov/llama.cpp/issues/9451)
* clear before resize
b3796
quantize : improve type name parsing (#9570)
* quantize : do not ignore invalid types in arg parsing
* quantize : ignore case of type and ftype arguments
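The b3796 change above makes type-name parsing case-insensitive while rejecting unknown names instead of silently ignoring them. A minimal sketch of that pattern is below; the helper names and the type table are illustrative stand-ins, not llama.cpp's actual tables or ids.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Lowercase a copy of the input so lookups ignore case.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return (char) std::tolower(c); });
    return s;
}

// Case-insensitive lookup of a quantization type name.
// Returns the type id, or -1 for an invalid name (instead of ignoring it).
// The table is a small hypothetical subset with made-up ids.
static int parse_type_name(const std::string & name) {
    static const std::map<std::string, int> types = {
        { "q4_0", 2 }, { "q8_0", 8 }, { "f16", 1 },
    };
    auto it = types.find(to_lower(name));
    return it == types.end() ? -1 : it->second;
}
```

With this shape, `parse_type_name("Q4_0")` and `parse_type_name("q4_0")` resolve identically, and an unrecognized name produces an explicit error value the caller must handle.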
b3795
ggml : fix builds (#0) ggml-ci
b3790
CUDA: fix sum.cu compilation for CUDA < 11.7 (#9562)
b3787
server : clean-up completed tasks from waiting list (#9531) ggml-ci
b3785
ggml : fix n_threads_cur initialization with one thread (#9538)
* ggml : fix n_threads_cur initialization with one thread
* Update ggml/src/ggml.c
Co-authored-by: Max Krasnyansky <[email protected]>
b3772
ggml : move common CPU backend impl to new header (#9509)
b3749
feat: remove a sampler from a chain (#9445)
* feat: remove a sampler from a chain
* fix: return removed sampler
* fix: safer casting
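The b3749 entry above adds removal of a sampler from a chain, with the removed sampler returned to the caller. A minimal sketch of that ownership-transfer pattern, assuming hypothetical `Sampler` and `SamplerChain` types (not llama.cpp's actual `llama_sampler` API):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a sampler in a chain.
struct Sampler { int id; };

// Illustrative chain that owns its samplers; removing one detaches it
// from the chain and hands ownership back to the caller.
struct SamplerChain {
    std::vector<std::unique_ptr<Sampler>> samplers;

    // Detach and return the i-th sampler; the caller now owns it.
    // Returns nullptr if the index is out of range.
    std::unique_ptr<Sampler> remove(size_t i) {
        if (i >= samplers.size()) {
            return nullptr;
        }
        std::unique_ptr<Sampler> out = std::move(samplers[i]);
        samplers.erase(samplers.begin() + (ptrdiff_t) i);
        return out;
    }
};
```

Returning the removed sampler (rather than destroying it) lets the caller reuse or free it explicitly, which matches the "return removed sampler" follow-up fix in the commit body.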
b3733
llama : skip token bounds check when evaluating embeddings (#9437)