Releases: xorbitsai/inference
v0.10.1
What's new in 0.10.1 (2024-04-12)
These are the changes in inference v0.10.1.
New features
- FEAT: add support for qwen1.5 32B chat model by @mikeshi80 in #1249
- FEAT: Support Qwen MoE model for huggingface and modelscope by @xiaodouzi666 in #1263
- FEAT: Enable streaming in tool calls for Qwen when using vllm by @zhanghx0905 in #1215
Enhancements
- ENH: allow `create_embedding` to receive extra args by @amumu96 in #1224
- ENH: support more GPTQ and AWQ formats for some models by @xiaodouzi666 in #1243
- ENH: support multiple GPUs for qwen-vl and yi-vl by @qinxuye in #1236
- ENH: support multiple GPUs for llama.cpp by @amumu96 in #1229
- ENH: UI: paper material for cards by @Minamiyama in #1261
- REF: Refactor launch model for Web UI by @yiboyasss in #1254
- REF: Remove ctransformers support by @mujin2 in #1267
Bug fixes
- BUG: Fix docker cpu build by @ChengjieLi28 in #1213
- BUG: Fix xinference failing to start in docker due to `cv2` by @ChengjieLi28 in #1217
- BUG: Cannot start xinference in docker by @ChengjieLi28 in #1219
- BUG: Fix `opencv` issue in docker container by @ChengjieLi28 in #1227
- BUG: Fix the launch bug of OmniLMM 12B by @hainaweiben in #1241
- BUG: fix spelling error in styles by @Minamiyama in #1247
- BUG: Fix issue with supervisor not clearing information after worker exit by @hainaweiben in #1231
- BUG: custom models on the web ui by @yiboyasss in #1259
- BUG: fix system prompts for chatglm3 and internlm2 pytorch by @qinxuye in #1271
- BUG: Fix authority and jump issue by @yiboyasss in #1276
- BUG: fix custom vision model by @qinxuye in #1280
Tests
- TST: Fix tests due to `llama-cpp-python` v0.2.58 by @ChengjieLi28 in #1242
Documentation
- DOC: auto gen vllm doc & add chatglm3-{32k, 128k} support for vllm by @qinxuye in #1234
- DOC: update models doc by @qinxuye in #1246
- DOC: update readme by @qinxuye in #1268
New Contributors
- @amumu96 made their first contribution in #1224
- @xiaodouzi666 made their first contribution in #1243
- @yiboyasss made their first contribution in #1254
Full Changelog: v0.10.0...v0.10.1
v0.10.0
What's new in 0.10.0 (2024-03-29)
These are the changes in inference v0.10.0.
New features
- FEAT: Launch UI for audio models by @hainaweiben in #1102
- FEAT: Supports `OmniLMM` chat model by @hainaweiben in #1171
- FEAT: Added vllm support for deepseek models by @ivanzfb in #1200
- FEAT: force to specify worker ip and gpu idx when launching models by @ChengjieLi28 in #1195
- FEAT: OAuth system supports api-key by @Ago327 in #1168
- FEAT: Support deepseek vl by @codingl2k1 in #1175
- FEAT: support some builtin new models by @mujin2 in #1204
Enhancements
- BLD: add autoawq in setup by @utopia2077 in #1190
Bug fixes
- BUG: Fix an incorrect model interface address that caused a 307 redirect to HTTP, blocking the request and preventing the model list from displaying by @wertycn in #1182
- BUG: fix doc failure introduced by #1171 and update readme by @qinxuye in #1203
- BUG: Increase validator types for the `input` parameter of embeddings to match the OpenAI API by @Minamiyama in #1201
Documentation
- DOC: internal design by @1572161937 in #1178
- DOC: update readme and models doc by @qinxuye in #1176
- DOC: Doc for oauth system with api-key by @ChengjieLi28 in #1210
New Contributors
- @utopia2077 made their first contribution in #1190
- @ivanzfb made their first contribution in #1200
Full Changelog: v0.9.4...v0.10.0
v0.9.4
What's new in 0.9.4 (2024-03-21)
These are the changes in inference v0.9.4.
New features
- FEAT: Support CodeShell model by @hainaweiben in #1166
- FEAT: Supports `sglang` backend by @ChengjieLi28 in #1161
Enhancements
- ENH: vLLM latest models support by @1572161937 in #1155
Bug fixes
- BUG: remove `best_of` from benchmark by @qinxuye in #1150
- BUG: fix `_eval_qwen_chat_arguments` parsing problem by @channingxiao18 in #1098
- BUG: Fix OpenAI compatibility issue during chat by @mujin2 in #1159
Documentation
- DOC: Update doc by @codingl2k1 in #1156
New Contributors
- @channingxiao18 made their first contribution in #1098
- @1572161937 made their first contribution in #1155
Full Changelog: v0.9.3...v0.9.4
v0.9.3
What's new in 0.9.3 (2024-03-15)
These are the changes in inference v0.9.3.
New features
- FEAT: Add Yi-9B by @mujin2 in #1117
- FEAT: Provided image generation functionality by @hainaweiben in #1047
Enhancements
- ENH: update cmd help info by @luweizheng in #1106
- ENH: Remove quantization limits for Apple METAL device when running models via `llama-cpp-python` by @ChengjieLi28 in #1134
- ENH: Make GET /v1/models compatible with OpenAI API by @notsyncing in #1127
- ENH: support vllm>=0.3.1 by @qinxuye in #1145
Bug fixes
- BUG: fix the useless f-string by @mikeshi80 in #1130
- BUG: Fix model list loading failures caused by a large number of invalid requests on the model list page by @wertycn in #1111
- BUG: Fix cache status for embedding, rerank and image models on the web UI by @ChengjieLi28 in #1135
- BUG: Fix missing information for `xinference registrations` and `xinference list` commands by @ChengjieLi28 in #1140
- BUG: Fix being unable to continue chatting after canceling a streaming chat via `ctrl+c` by @ChengjieLi28 in #1144
Tests
- TST: Remove testing LLM model creating embedding by @ChengjieLi28 in #1121
New Contributors
- @luweizheng made their first contribution in #1106
- @mujin2 made their first contribution in #1117
- @wertycn made their first contribution in #1111
Full Changelog: v0.9.2...v0.9.3
v0.9.2
What's new in 0.9.2 (2024-03-08)
These are the changes in inference v0.9.2.
New features
- FEAT: Add a command / SDK interface to query which models are able to… by @hainaweiben in #1076
- FEAT: add a docker-compose-distributed example with multiple workers by @bufferoverflow in #1064
- FEAT: Support download and merge multiple parts of gguf files by @notsyncing in #1075
- FEAT: Supports LoRA for LLM and image models by @ChengjieLi28 in #1080
Enhancements
- ENH: Supports `n_gpu_layers` parameter for `llama-cpp-python` by @ChengjieLi28 in #1070
- ENH: Add a dropdown to the web UI to support adjusting GPU offload layers for the llama.cpp loader by @notsyncing in #1073
- ENH: [UI] Show `replica` on running model page by @ChengjieLi28 in #1093
- ENH: Add "[DONE]" to the end of stream generation for better openai SDK compatibility by @ZhangTianrong in #1062
- ENH: [UI] Support setting `CPU` when selecting `n_gpu` by @ChengjieLi28 in #1096
Documentation
- DOC: Extra parameters for launching models by @aresnow1 in #1077
- DOC: contribution doc by @Ago327 in #1092
- DOC: doc for lora by @ChengjieLi28 in #1103
Others
- Update llm_family.json to correct the context length of glaive coder by @mikeshi80 in #1083
New Contributors
- @mikeshi80 made their first contribution in #1083
- @bufferoverflow made their first contribution in #1064
- @Ago327 made their first contribution in #1092
Full Changelog: v0.9.1...v0.9.2
v0.9.1
What's new in 0.9.1 (2024-03-01)
These are the changes in inference v0.9.1.
New features
- FEAT: Docker for cpu only by @ChengjieLi28 in #1068
Enhancements
- ENH: Support downloading gemma from modelscope by @aresnow1 in #1035
- ENH: [UI] Setting `quantization` when registering LLM by @ChengjieLi28 in #1040
- ENH: Restful client supports multiple system prompts for chat by @ChengjieLi28 in #1056
- ENH: supports disabling worker reporting status by @ChengjieLi28 in #1057
- ENH: Extra params for `xinference launch` command line by @ChengjieLi28 in #1048
Bug fixes
- BUG: Fix some models that cannot download from `modelscope` by @ChengjieLi28 in #1066
- BUG: Fix early truncation due to `max_token` defaulting to `16` instead of `1024` by @ZhangTianrong in #1061
Documentation
- DOC: Update readme by @qinxuye in #1045
- DOC: Fix readme by @qinxuye in #1054
- DOC: Fix wechat links by @qinxuye in #1055
New Contributors
- @ZhangTianrong made their first contribution in #1061
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's new in 0.9.0 (2024-02-22)
These are the changes in inference v0.9.0.
New features
- FEAT: Refactor device related code and add initial Intel GPU support by @notsyncing in #968
- FEAT: Support gemma series model by @aresnow1 in #1024
Enhancements
- ENH: [UI] Supports `replica` when launching LLM models by @ChengjieLi28 in #1011
- ENH: [UI] Show cluster resource information by @ChengjieLi28 in #1015
Bug fixes
- BUG: fix chat completion error when indexing body.messages by @fffonion in #1008
- BUG: Fix cache sd 1.5 error by @codingl2k1 in #1013
- BUG: fix typo in modelscope llama-2-13b-chat-GGUF by @qinxuye in #1026
- BUG: Fix missing qwen 1.5 7b gguf by @codingl2k1 in #1027
Documentation
- DOC: Polish model operation command doc by @onesuper in #1000
- DOC: Fix note on secret_key generation and algorithm selection for OAuth2 by @ChengjieLi28 in #1012
New Contributors
- @fffonion made their first contribution in #1008
- @notsyncing made their first contribution in #968
Full Changelog: v0.8.5...v0.9.0
v0.8.5
What's new in 0.8.5 (2024-02-06)
These are the changes in inference v0.8.5.
New features
- FEAT: Implemented web UI for launching the text2image model by @hainaweiben in #985
- FEAT: Support qwen-1.5 series by @aresnow1 in #994
Enhancements
- ENH: Download stable diffusion model from modelscope by @codingl2k1 in #980
- REF: Supports `pydantic` v2 by @ChengjieLi28 in #983
Bug fixes
- BUG: Fix load yi vl model to multiple cards by @codingl2k1 in #992
- BUG: client compatible with old version of xinference by @ChengjieLi28 in #987
New Contributors
- @hainaweiben made their first contribution in #985
Full Changelog: v0.8.4...v0.8.5
v0.8.4
What's new in 0.8.4 (2024-02-04)
These are the changes in inference v0.8.4.
Enhancements
- ENH: [UI] Fix too long LLM model name by @ChengjieLi28 in #979
- ENH: Add gguf models of llama-2-chat by @aresnow1 in #981
Bug fixes
- BUG: Fix custom model tool calls by @codingl2k1 in #978
- BUG: Fix chat template by @aresnow1 in #977
Documentation
- DOC: Translate model docs by @onesuper in #965
- DOC: Auto gen metrics doc by @codingl2k1 in #967
- DOC: Update README.md by @codingl2k1 in #969
Full Changelog: v0.8.3.1...v0.8.4
v0.8.3.1
What's new in 0.8.3.1 (2024-02-02)
These are the changes in inference v0.8.3.1.
Bug fixes
- BUG: Remove flash-attn dependency by @codingl2k1 in #970
Full Changelog: v0.8.3...v0.8.3.1