Releases: CodeLinaro/llama.cpp
b4255
vulkan: optimize and reenable split_k (#10637) Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.
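The split-k idea referenced here: when M×N alone produces too few workgroups to keep the GPU busy, the K dimension is also partitioned across workgroups, each writing a partial result that a reduce pass then sums. Below is a minimal CPU-side sketch of that decomposition under assumed names; it is illustrative only and not the actual Vulkan shader code.

```cpp
// Illustrative split-k matrix multiply (hypothetical names, not the real shader):
// each "split" covers a slice of K and accumulates a partial C; a final reduce
// pass sums the partials, analogous in spirit to mul_mat_split_k_reduce.
#include <algorithm>
#include <vector>

static void matmul_split_k(const float * A, const float * B, float * C,
                           int M, int N, int K, int splits) {
    std::vector<float> partial((size_t) splits * M * N, 0.0f);
    const int chunk = (K + splits - 1) / splits;

    // "Dispatch": each split handles K indices [k0, k1) into its own buffer.
    for (int s = 0; s < splits; ++s) {
        const int k0 = s * chunk;
        const int k1 = std::min(K, k0 + chunk);
        float * Cs = partial.data() + (size_t) s * M * N;
        for (int i = 0; i < M; ++i)
            for (int k = k0; k < k1; ++k)
                for (int j = 0; j < N; ++j)
                    Cs[i * N + j] += A[i * K + k] * B[k * N + j];
    }

    // "split_k_reduce": sum the partial results into the final C.
    for (int i = 0; i < M * N; ++i) {
        float acc = 0.0f;
        for (int s = 0; s < splits; ++s)
            acc += partial[(size_t) s * M * N + i];
        C[i] = acc;
    }
}
```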
b4242
llama : add enum for built-in chat templates (#10623)
* llama : add enum for supported chat templates
* use "built-in" instead of "supported"
* arg: print list of built-in templates
* fix test
* update server README
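For context on the b4242 entry above, a sketch of what mapping built-in template names to an enum can look like; the enum values and helper name here are hypothetical and the actual llama.cpp definitions may differ.

```cpp
// Hypothetical sketch: resolve a chat template name to an enum value.
#include <map>
#include <string>

enum chat_template_example {
    TMPL_EXAMPLE_UNKNOWN,
    TMPL_EXAMPLE_CHATML,
    TMPL_EXAMPLE_LLAMA2,
    TMPL_EXAMPLE_ZEPHYR,
};

static chat_template_example chat_template_from_name(const std::string & name) {
    static const std::map<std::string, chat_template_example> lut = {
        { "chatml", TMPL_EXAMPLE_CHATML },
        { "llama2", TMPL_EXAMPLE_LLAMA2 },
        { "zephyr", TMPL_EXAMPLE_ZEPHYR },
    };
    const auto it = lut.find(name);
    return it == lut.end() ? TMPL_EXAMPLE_UNKNOWN : it->second;
}
```

An enum lookup like this is also what makes it cheap to print the list of built-in template names for the `--list` style output mentioned in the changelog.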
b4226
ggml : move AMX to the CPU backend (#10570)
Co-authored-by: Georgi Gerganov <[email protected]>
b4224
imatrix : support combine-only (#10492)
* imatrix combine-only idea
* ensured that the behavior is consistent with the log
b4215
ggml : remove redundant copyright notice + update authors
b4202
common : fix duplicated file name with hf_repo and hf_file (#10550)
b4191
ci : fix cuda releases (#10532)
b4174
vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484) vulkan-shaders-gen was not parsing the --no-clean argument correctly: the previous code only handled arguments that take a value, and --no-clean takes none, so it was skipped. This commit adds correct parsing of arguments that don't have values.
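A minimal sketch of the parsing pitfall described above, under assumed names (this is not the actual vulkan-shaders-gen source): flags that take no value have to be recognized separately from "--key value" pairs, otherwise they get consumed as if they had a value or dropped entirely.

```cpp
// Hypothetical argument parser: value-less flags such as --no-clean are stored
// with an empty value instead of being treated as "--key value" pairs.
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

static std::map<std::string, std::string> parse_args(const std::vector<std::string> & args) {
    static const std::set<std::string> value_less = { "--no-clean" };
    std::map<std::string, std::string> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & arg = args[i];
        if (value_less.count(arg)) {
            out[arg] = "";            // flag is present, takes no value
        } else if (i + 1 < args.size()) {
            out[arg] = args[++i];     // ordinary --key value pair
        }
    }
    return out;
}
```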
b4173
Introduce llama-run (#10291) It is like simple-chat but uses smart pointers to avoid manual memory cleanup, leaving fewer memory leaks in the code. It avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
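The smart-pointer pattern the entry refers to can be sketched as std::unique_ptr with custom deleters around the C handles, so cleanup happens automatically on every exit path. This is only a sketch; it assumes llama_free_model and llama_free as the cleanup functions, so check llama.h in your checkout for the exact names llama-run actually wraps.

```cpp
// Sketch: RAII wrappers for llama.cpp C handles via custom deleters.
// Assumes llama_free_model / llama_free exist with these signatures.
#include <memory>
#include "llama.h"

struct llama_model_deleter   { void operator()(llama_model * m)   const { llama_free_model(m); } };
struct llama_context_deleter { void operator()(llama_context * c) const { llama_free(c); } };

using llama_model_ptr   = std::unique_ptr<llama_model,   llama_model_deleter>;
using llama_context_ptr = std::unique_ptr<llama_context, llama_context_deleter>;

// Usage sketch: the handles are released in reverse order when they go out of
// scope, even on early returns, so no manual cleanup paths are needed.
// llama_model_ptr   model(llama_load_model_from_file(path, mparams));
// llama_context_ptr ctx(llama_new_context_with_model(model.get(), cparams));
```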
b4170
server : enable cache_prompt by default (#10501) ggml-ci