v0.9.2

@XprobeBot XprobeBot released this 08 Mar 06:09
29f4c10

What's new in 0.9.2 (2024-03-08)

These are the changes in inference v0.9.2.

New features

Enhancements

  • ENH: Support the n_gpu_layers parameter for llama-cpp-python by @ChengjieLi28 in #1070
  • ENH: Add a dropdown to the web UI to support adjusting GPU offload layers for llama.cpp loader by @notsyncing in #1073
  • ENH: [UI] Show replica on running model page by @ChengjieLi28 in #1093
  • ENH: Add "[DONE]" to the end of stream generation for better OpenAI SDK compatibility by @ZhangTianrong in #1062
  • ENH: [UI] Support setting CPU when selecting n_gpu by @ChengjieLi28 in #1096
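
As an illustration of the "[DONE]" change in #1062, here is a minimal sketch of how a client might consume a server-sent-event stream that ends with a `data: [DONE]` line, which is the terminator OpenAI-style clients look for. The helper name `iter_stream_chunks` and the sample payload shape are assumptions made for this example; they are not taken from the inference codebase, and real responses may carry different fields.

```python
import json
from typing import Iterable, Iterator

# Sentinel that marks the end of an OpenAI-style event stream.
DONE_SENTINEL = "[DONE]"


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] sentinel.

    Hypothetical helper for illustration only.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == DONE_SENTINEL:
            return  # the server signalled that generation is finished
        yield json.loads(payload)


# Hand-written sample lines; the chunk shape is an assumption.
sample = [
    'data: {"choices": [{"text": "Hello"}]}',
    "",
    'data: {"choices": [{"text": " world"}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"text": "ignored"}]}',
]

text = "".join(chunk["choices"][0]["text"] for chunk in iter_stream_chunks(sample))
print(text)
```

Without the sentinel, a client written this way would have to rely on the connection closing to know the stream ended, so an explicit terminator line gives it a clear stopping point.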

Documentation

Others

  • Update llm_family.json to correct the context length of glaive coder by @mikeshi80 in #1083

New Contributors

Full Changelog: v0.9.1...v0.9.2