Releases: xorbitsai/inference
v0.10.1
What's new in 0.10.1 (2024-04-12)
These are the changes in inference v0.10.1.
New features
- FEAT: add support for qwen1.5 32B chat model by @mikeshi80 in #1249
- FEAT: Support Qwen MoE model for huggingface and modelscope by @xiaodouzi666 in #1263
- FEAT: Enable streaming in tool calls for Qwen when using vllm by @zhanghx0905 in #1215
Enhancements
- ENH: allow `create_embedding` to receive extra args by @amumu96 in #1224
- ENH: support more GPTQ and AWQ formats for some models by @xiaodouzi666 in #1243
- ENH: support multiple GPUs for qwen-vl and yi-vl by @qinxuye in #1236
- ENH: support multiple GPUs for llama.cpp by @amumu96 in #1229
- ENH: UI: paper material for cards by @Minamiyama in #1261
- REF: Refactor launch model for Web UI by @yiboyasss in #1254
- REF: Remove ctransformers support by @mujin2 in #1267
Bug fixes
- BUG: Fix docker cpu build by @ChengjieLi28 in #1213
- BUG: Fix xinference failing to start in docker due to `cv2` by @ChengjieLi28 in #1217
- BUG: Cannot start xinference in docker by @ChengjieLi28 in #1219
- BUG: Fix `opencv` issue in docker container by @ChengjieLi28 in #1227
- BUG: Fix the launch bug of OmniLMM 12B by @hainaweiben in #1241
- BUG: fix spelling error in styles by @Minamiyama in #1247
- BUG: Fix issue with supervisor not clearing information after worker exit by @hainaweiben in #1231
- BUG: custom models on the web ui by @yiboyasss in #1259
- BUG: fix system prompts for chatglm3 and internlm2 pytorch by @qinxuye in #1271
- BUG: Fix authority and jump issue by @yiboyasss in #1276
- BUG: fix custom vision model by @qinxuye in #1280
Tests
- TST: Fix tests due to `llama-cpp-python` v0.2.58 by @ChengjieLi28 in #1242
Documentation
- DOC: auto gen vllm doc & add chatglm3-{32k, 128k} support for vllm by @qinxuye in #1234
- DOC: update models doc by @qinxuye in #1246
- DOC: update readme by @qinxuye in #1268
New Contributors
- @amumu96 made their first contribution in #1224
- @xiaodouzi666 made their first contribution in #1243
- @yiboyasss made their first contribution in #1254
Full Changelog: v0.10.0...v0.10.1
v0.10.0
What's new in 0.10.0 (2024-03-29)
These are the changes in inference v0.10.0.
New features
- FEAT: Launch UI for audio models by @hainaweiben in #1102
- FEAT: Supports `OmniLMM` chat model by @hainaweiben in #1171
- FEAT: Added vllm support for deepseek models by @ivanzfb in #1200
- FEAT: force to specify worker ip and gpu idx when launching models by @ChengjieLi28 in #1195
- FEAT: OAuth system supports api-key by @Ago327 in #1168
- FEAT: Support deepseek vl by @codingl2k1 in #1175
- FEAT: support some builtin new models by @mujin2 in #1204
Enhancements
- BLD: add autoawq in setup by @utopia2077 in #1190
Bug fixes
- BUG: Fix an incorrect model interface address that caused a 307 redirect to HTTP, blocking the request and preventing the model list from displaying by @wertycn in #1182
- BUG: fix doc failure introduced by #1171 and update readme by @qinxuye in #1203
- BUG: Increase validator types for the `input` parameter of embeddings to match the OpenAI API by @Minamiyama in #1201
Documentation
- DOC: internal design by @1572161937 in #1178
- DOC: update readme and models doc by @qinxuye in #1176
- DOC: Doc for oauth system with api-key by @ChengjieLi28 in #1210
New Contributors
- @utopia2077 made their first contribution in #1190
- @ivanzfb made their first contribution in #1200
Full Changelog: v0.9.4...v0.10.0
v0.9.4
What's new in 0.9.4 (2024-03-21)
These are the changes in inference v0.9.4.
New features
- FEAT: Support CodeShell model by @hainaweiben in #1166
- FEAT: Supports `sglang` backend by @ChengjieLi28 in #1161
Enhancements
- ENH: vLLM latest models support by @1572161937 in #1155
Bug fixes
- BUG: remove `best_of` from benchmark by @qinxuye in #1150
- BUG: fix `_eval_qwen_chat_arguments` parsing problem by @channingxiao18 in #1098
- BUG: Fix OpenAI compatibility issue during chat by @mujin2 in #1159
Documentation
- DOC: Update doc by @codingl2k1 in #1156
New Contributors
- @channingxiao18 made their first contribution in #1098
- @1572161937 made their first contribution in #1155
Full Changelog: v0.9.3...v0.9.4
v0.9.3
What's new in 0.9.3 (2024-03-15)
These are the changes in inference v0.9.3.
New features
- FEAT: Add Yi-9B by @mujin2 in #1117
- FEAT: Provided image generation functionality by @hainaweiben in #1047
Enhancements
- ENH: update cmd help info by @luweizheng in #1106
- ENH: Remove quantization limits for Apple METAL device when running models via `llama-cpp-python` by @ChengjieLi28 in #1134
- ENH: Make GET /v1/models compatible with OpenAI API by @notsyncing in #1127
- ENH: support vllm>=0.3.1 by @qinxuye in #1145
Bug fixes
- BUG: fix the useless f-string by @mikeshi80 in #1130
- BUG: Fix model list loading failures caused by a large number of invalid requests on the model list page by @wertycn in #1111
- BUG: Fix cache status for embedding, rerank and image models on the web UI by @ChengjieLi28 in #1135
- BUG: Fix missing information for `xinference registrations` and `xinference list` commands by @ChengjieLi28 in #1140
- BUG: Fix being unable to continue chatting after canceling a streaming chat via `ctrl+c` by @ChengjieLi28 in #1144
Tests
- TST: Remove testing LLM model creating embedding by @ChengjieLi28 in #1121
New Contributors
- @luweizheng made their first contribution in #1106
- @mujin2 made their first contribution in #1117
- @wertycn made their first contribution in #1111
Full Changelog: v0.9.2...v0.9.3
v0.9.2
What's new in 0.9.2 (2024-03-08)
These are the changes in inference v0.9.2.
New features
- FEAT: Add a command / SDK interface to query which models are able to… by @hainaweiben in #1076
- FEAT: add a docker-compose-distributed example with multiple workers by @bufferoverflow in #1064
- FEAT: Support download and merge multiple parts of gguf files by @notsyncing in #1075
- FEAT: Supports LoRA for LLM and image models by @ChengjieLi28 in #1080
Enhancements
- ENH: Supports `n_gpu_layers` parameter for `llama-cpp-python` by @ChengjieLi28 in #1070
- ENH: Add a dropdown to the web UI to support adjusting GPU offload layers for the llama.cpp loader by @notsyncing in #1073
- ENH: [UI] Show `replica` on running model page by @ChengjieLi28 in #1093
- ENH: Add "[DONE]" to the end of stream generation for better openai SDK compatibility by @ZhangTianrong in #1062
- ENH: [UI] Support setting `CPU` when selecting `n_gpu` by @ChengjieLi28 in #1096
Documentation
- DOC: Extra parameters for launching models by @aresnow1 in #1077
- DOC: contribution doc by @Ago327 in #1092
- DOC: doc for lora by @ChengjieLi28 in #1103
Others
- Update llm_family.json to correct the context length of glaive coder by @mikeshi80 in #1083
New Contributors
- @mikeshi80 made their first contribution in #1083
- @bufferoverflow made their first contribution in #1064
- @Ago327 made their first contribution in #1092
Full Changelog: v0.9.1...v0.9.2
v0.9.1
What's new in 0.9.1 (2024-03-01)
These are the changes in inference v0.9.1.
New features
- FEAT: Docker for cpu only by @ChengjieLi28 in #1068
Enhancements
- ENH: Support downloading gemma from modelscope by @aresnow1 in #1035
- ENH: [UI] Setting `quantization` when registering LLM by @ChengjieLi28 in #1040
- ENH: Restful client supports multiple system prompts for chat by @ChengjieLi28 in #1056
- ENH: supports disabling worker reporting status by @ChengjieLi28 in #1057
- ENH: Extra params for `xinference launch` command line by @ChengjieLi28 in #1048
Bug fixes
- BUG: Fix some models that cannot download from `modelscope` by @ChengjieLi28 in #1066
- BUG: Fix early truncation due to `max_token` defaulting to `16` instead of `1024` by @ZhangTianrong in #1061
Documentation
- DOC: Update readme by @qinxuye in #1045
- DOC: Fix readme by @qinxuye in #1054
- DOC: Fix wechat links by @qinxuye in #1055
New Contributors
- @ZhangTianrong made their first contribution in #1061
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's new in 0.9.0 (2024-02-22)
These are the changes in inference v0.9.0.
New features
- FEAT: Refactor device related code and add initial Intel GPU support by @notsyncing in #968
- FEAT: Support gemma series model by @aresnow1 in #1024
Enhancements
- ENH: [UI] Supports `replica` when launching LLM models by @ChengjieLi28 in #1011
- ENH: [UI] Show cluster resource information by @ChengjieLi28 in #1015
Bug fixes
- BUG: fix chat completion error when indexing body.messages by @fffonion in #1008
- BUG: Fix cache sd 1.5 error by @codingl2k1 in #1013
- BUG: fix typo in modelscope llama-2-13b-chat-GGUF by @qinxuye in #1026
- BUG: Fix missing qwen 1.5 7b gguf by @codingl2k1 in #1027
Documentation
- DOC: Polish model operation command doc by @onesuper in #1000
- DOC: Fix note on secret_key generation and algorithm selection for OAuth2 by @ChengjieLi28 in #1012
New Contributors
- @fffonion made their first contribution in #1008
- @notsyncing made their first contribution in #968
Full Changelog: v0.8.5...v0.9.0
v0.8.5
What's new in 0.8.5 (2024-02-06)
These are the changes in inference v0.8.5.
New features
- FEAT: Implemented web UI for launching the text2image model by @hainaweiben in #985
- FEAT: Support qwen-1.5 series by @aresnow1 in #994
Enhancements
- ENH: Download stable diffusion model from modelscope by @codingl2k1 in #980
- REF: Supports `pydantic` v2 by @ChengjieLi28 in #983
Bug fixes
- BUG: Fix load yi vl model to multiple cards by @codingl2k1 in #992
- BUG: client compatible with old version of xinference by @ChengjieLi28 in #987
New Contributors
- @hainaweiben made their first contribution in #985
Full Changelog: v0.8.4...v0.8.5
v0.8.4
What's new in 0.8.4 (2024-02-04)
These are the changes in inference v0.8.4.
Enhancements
- ENH: [UI] Fix too long LLM model name by @ChengjieLi28 in #979
- ENH: Add gguf models of llama-2-chat by @aresnow1 in #981
Bug fixes
- BUG: Fix custom model tool calls by @codingl2k1 in #978
- BUG: Fix chat template by @aresnow1 in #977
Documentation
- DOC: Translate model docs by @onesuper in #965
- DOC: Auto gen metrics doc by @codingl2k1 in #967
- DOC: Update README.md by @codingl2k1 in #969
Full Changelog: v0.8.3.1...v0.8.4
v0.8.3.1
What's new in 0.8.3.1 (2024-02-02)
These are the changes in inference v0.8.3.1.
Bug fixes
- BUG: Remove flash-attn dependency by @codingl2k1 in #970
Full Changelog: v0.8.3...v0.8.3.1