Releases: xorbitsai/inference
v0.5.5
What's new in 0.5.5 (2023-10-26)
These are the changes in inference v0.5.5.
Enhancements
- ENH: display language tags by @Minamiyama in #558
- ENH: filter models by type by @Minamiyama in #559
- ENH: disable create embeddings using LLMs by @UranusSeven in #570
- ENH: benchmark latency by @UranusSeven in #576
- ENH: configurable XINFERENCE_HOME env by @ChengjieLi28 in #566
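The configurable XINFERENCE_HOME variable relocates where Xinference keeps its cached models and related data. A minimal sketch, assuming the variable only needs to be exported before the service starts (the path is illustrative):

```shell
# Illustrative path; use any writable directory.
export XINFERENCE_HOME=/data/xinference
# Then start the service as usual, for example:
#   xinference-local --host 0.0.0.0 --port 9997
echo "$XINFERENCE_HOME"
```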
Bug fixes
- BUG: Fix bge-base-zh and bge-large-zh from ModelScope by @ChengjieLi28 in #571
- BUG: When changing the model revision, xinference still uses the previous model by @ChengjieLi28 in #573
- BUG: incorrect vLLM config by @UranusSeven in #579
- BUG: fix llama-2 stop words by @UranusSeven in #580
Documentation
- DOC: Incompatibility Between NVIDIA Driver and PyTorch Version by @onesuper in #551
- DOC: Examples and resources page by @onesuper in #561
Full Changelog: v0.5.4...v0.5.5
v0.5.4
What's new in 0.5.4 (2023-10-20)
These are the changes in inference v0.5.4.
New features
- FEAT: wizardcoder python by @UranusSeven in #539
- FEAT: Support grammar-based sampling for ggml models by @aresnow1 in #525
- FEAT: speculative decoding by @UranusSeven in #509
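Grammar-based sampling (#525) constrains a ggml model's output so that every generated token sequence matches a formal grammar. llama.cpp-style backends commonly express such grammars in GBNF; a minimal sketch of one, forcing the model to answer only "yes" or "no" (the rule below is illustrative, not taken from the PR):

```
root ::= ("yes" | "no")
```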
Enhancements
- ENH: Download embedding models from ModelScope by @ChengjieLi28 in #532
- ENH: lock transformers version by @UranusSeven in #549
- ENH: Support downloading code-llama family models from ModelScope by @ChengjieLi28 in #557
- ENH: Add gguf format of codellama-instruct by @aresnow1 in #567
Bug fixes
- BUG: Fix stream not compatible with openai by @codingl2k1 in #524
- BUG: set trust_remote_code to true by default by @richzw in #555
- BUG: add quantization to valid file name by @richzw in #562
- BUG: remove "generate" ability from Baichuan-2-chat json config by @Minamiyama in #556
Documentation
- DOC: update pot files by @UranusSeven in #538
- DOC: Add Client API reference by @codingl2k1 in #543
- DOC: Add client doc to the user guide by @codingl2k1 in #547
New Contributors
- @richzw made their first contribution in #555
- @Minamiyama made their first contribution in #556
Full Changelog: v0.5.3...v0.5.4
v0.5.3
What's new in 0.5.3 (2023-10-13)
These are the changes in inference v0.5.3.
New features
- FEAT: Add BAAI/BGE v1.5 family models by @ChengjieLi28 in #522
- FEAT: Support Mistral & Mistral-Instruct by @Bojun-Feng in #510
- FEAT: Add --model-uid to launch sub command by @codingl2k1 in #529
- FEAT: Support stable diffusion by @codingl2k1 in #484
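The new --model-uid option for the launch sub command (#529) lets the caller pin a stable UID instead of receiving an auto-generated one, so client code can keep addressing the model by a fixed name across relaunches. A hedged sketch (model name and UID are illustrative, and the launch command is shown commented out since it requires a running server):

```shell
# Choose a stable UID up front (illustrative values):
MODEL_UID="my-llama"
# xinference launch --model-name "llama-2-chat" --model-uid "$MODEL_UID"
echo "$MODEL_UID"
```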
Enhancements
- REF: Use restful client as default client by @aresnow1 in #470
- REF: refactor client codes for xinference-client by @ChengjieLi28 in #528
Tests
- TST: fix tiny llama by @UranusSeven in #513
Documentation
- DOC: hardware specific installations by @UranusSeven in #517
- DOC: update installation by @UranusSeven in #527
Full Changelog: v0.5.2...v0.5.3
v0.5.2
What's new in 0.5.2 (2023-09-27)
These are the changes in inference v0.5.2.
Enhancements
- ENH: validate model URI on register by @UranusSeven in #476
- ENH: Skip download for embedding models by @aresnow1 in #499
- ENH: set trust_remote_code to true by @UranusSeven in #500
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's new in 0.5.1 (2023-09-26)
These are the changes in inference v0.5.1.
Enhancements
- ENH: Safe iterate stream of ggml model by @codingl2k1 in #449
- ENH: Skip download if model exists by @aresnow1 in #495
Documentation
- DOC: vLLM by @UranusSeven in #491
Full Changelog: v0.5.0...v0.5.1
v0.5.0
What's new in 0.5.0 (2023-09-22)
These are the changes in inference v0.5.0.
New features
- FEAT: incorporate vLLM by @UranusSeven in #445
- FEAT: add register model page for dashboard by @Bojun-Feng in #420
- FEAT: internlm 20b by @UranusSeven in #486
- FEAT: support glaive coder by @UranusSeven in #490
- FEAT: Support download models from modelscope by @aresnow1 in #475
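Downloading models from ModelScope (#475) is useful when Hugging Face is slow or unreachable. Assuming the source is selected through the XINFERENCE_MODEL_SRC environment variable (the variable name follows later Xinference documentation and is an assumption here), a sketch:

```shell
# Prefer ModelScope as the weight source (assumed variable name):
export XINFERENCE_MODEL_SRC=modelscope
# Then launch as usual, for example:
#   xinference launch --model-name "baichuan-2-chat"
echo "$XINFERENCE_MODEL_SRC"
```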
Enhancements
- ENH: shorten OpenBuddy's desc by @UranusSeven in #471
- ENH: enable vLLM on Linux with cuda by @UranusSeven in #472
- ENH: vLLM engine supports more models by @UranusSeven in #477
- ENH: remove subpool on failure by @UranusSeven in #478
- ENH: support trust_remote_code when launching a model by @UranusSeven in #479
- ENH: vLLM auto tensor parallel by @UranusSeven in #480
Bug fixes
- BUG: llama-cpp version mismatch by @Bojun-Feng in #473
- BUG: incorrect endpoint on host 0.0.0.0 by @UranusSeven in #474
- BUG: prompt style not set as expected on web UI by @UranusSeven in #489
Full Changelog: v0.4.4...v0.5.0
v0.4.4
What's new in 0.4.4 (2023-09-19)
These are the changes in inference v0.4.4.
Bug fixes
- BUG: stop auto download from self-hosted storage for locale zh_CN by @UranusSeven in #465
Full Changelog: v0.4.3...v0.4.4
v0.4.3
v0.4.2
What's new in 0.4.2 (2023-09-15)
These are the changes in inference v0.4.2.
New features
- FEAT: concurrent generation by @codingl2k1 in #417
- FEAT: Support gguf by @aresnow1 in #446
- FEAT: Support OpenBuddy by @codingl2k1 in #444
Enhancements
- ENH: client support desc model by @UranusSeven in #442
- ENH: caching from self-hosted storage by @UranusSeven in #419
- ENH: Assign worker sub pool at runtime instead of pre-allocated by @ChengjieLi28 in #437
- ENH: add benchmark script by @UranusSeven in #451
Bug fixes
- BUG: Fix restful client for embedding models by @aresnow1 in #439
- BUG: cmdline double line breaker by @UranusSeven in #441
- BUG: no error raised on unsupported fmt by @UranusSeven in #443
- BUG: Xinference list failed if embedding models are launched by @aresnow1 in #452
Tests
- TST: skip self-hosted storage tests by @UranusSeven in #453
Documentation
- DOC: fix baichuan-2 and make naming consistent by @UranusSeven in #432
- DOC: update hot topics by @UranusSeven in #456
Others
- CI: Fix Windows CI by @codingl2k1 in #440
New Contributors
- @ChengjieLi28 made their first contribution in #437
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's new in 0.4.1 (2023-09-07)
These are the changes in inference v0.4.1.
Bug fixes
- BUG: Searching in UI results in white screen by @Bojun-Feng in #431
- BUG: Include json in MANIFEST.in by @aresnow1 in #435
Full Changelog: v0.4.0...v0.4.1