v0.9.2

XprobeBot released this 08 Mar 06:09

29f4c10

What's new in 0.9.2 (2024-03-08)

These are the changes in inference v0.9.2.

New features

FEAT: Add a command / SDK interface to query which models are able to… by @hainaweiben in #1076
FEAT: add a docker-compose-distributed example with multiple workers by @bufferoverflow in #1064
FEAT: Support download and merge multiple parts of gguf files by @notsyncing in #1075
FEAT: Support LoRA for LLM and image models by @ChengjieLi28 in #1080

Enhancements

ENH: Support n_gpu_layers parameter for llama-cpp-python by @ChengjieLi28 in #1070
ENH: Add a dropdown to the web UI to support adjusting GPU offload layers for llama.cpp loader by @notsyncing in #1073
ENH: [UI] Show replica on running model page by @ChengjieLi28 in #1093
ENH: Add "[DONE]" to the end of stream generation for better OpenAI SDK compatibility by @ZhangTianrong in #1062
ENH: [UI] Support setting CPU when selecting n_gpu by @ChengjieLi28 in #1096
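
For context on the "[DONE]" change in #1062: OpenAI-style streaming responses are sent as server-sent-event lines of the form "data: <json>", and clients stop reading when they see the literal line "data: [DONE]". The sketch below shows how a client might consume such a stream. It is only an illustration under stated assumptions: the helper name iter_stream_chunks and the sample payloads are hypothetical, and this is not the code added in the PR.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE-style lines until the "[DONE]" sentinel.

    Hypothetical helper for illustration; not part of any SDK.
    """
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The sentinel is plain text, not JSON, so check it before decoding.
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Made-up sample stream; real chunks carry more fields.
sample = [
    'data: {"choices": [{"text": "Hel"}]}',
    "",
    'data: {"choices": [{"text": "lo"}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"text": "ignored"}]}',
]

text = "".join(chunk["choices"][0]["text"] for chunk in iter_stream_chunks(sample))
print(text)
```

Without the sentinel, a client written this way would have to rely on the connection closing to know the generation had finished, which is why the marker helps compatibility.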

Documentation

DOC: Extra parameters for launching models by @aresnow1 in #1077
DOC: contribution doc by @Ago327 in #1092
DOC: doc for lora by @ChengjieLi28 in #1103

Others

Update llm_family.json to correct the context length of glaive coder by @mikeshi80 in #1083

New Contributors

@mikeshi80 made their first contribution in #1083
@bufferoverflow made their first contribution in #1064
@Ago327 made their first contribution in #1092

Full Changelog: v0.9.1...v0.9.2
