Releases: meta-llama/llama-stack
v0.1.3
Here are some key changes included in this release.
Build and Test Agents
Streamlined the initial development experience
- Added support for `llama stack run --image-type venv` (see the sketch after this list)
- Enhanced vector store options with new sqlite-vec provider and improved Qdrant integration
- vLLM improvements for tool calling and logprobs
- Better handling of sporadic code_interpreter tool calls
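To try the new venv flow end to end, here is a minimal sketch of verifying a locally running stack from Python. The port, the model served, and the `llama stack run` invocation in the comment are assumptions that depend on your distribution.

```python
# Hedged sketch: smoke-test a stack started in a virtual environment with,
# e.g., `llama stack run ./run.yaml --image-type venv`.
# The base URL below is a placeholder for your local setup.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List the models the running distribution serves.
for model in client.models.list():
    print(model.identifier)
```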
Agent Evals
Better benchmarking and agent performance assessment
- Renamed the eval API /eval-task to /benchmarks (see the sketch after this list)
- Improved documentation and notebooks for RAG and evals
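After the rename, benchmarks replace eval-tasks on both the server routes and the client. A hedged sketch of the new surface follows; the client attribute and field names here are assumptions based on the rename and may differ in your client version.

```python
# Hedged sketch: listing benchmarks (formerly eval-tasks).
# `client.benchmarks` and `benchmark.identifier` are assumed names.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

for benchmark in client.benchmarks.list():
    print(benchmark.identifier)
```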
Deploy and Monitoring of Agents
Improved production readiness
- Added usage metrics collection for chat completions (see the sketch after this list)
- CLI improvements for provider information
- Improved error handling and system reliability
- Better model endpoint handling and accessibility
- Improved signal handling on distro server
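Chat completion responses now carry usage metrics. Below is a hedged sketch of reading them; the shape of the metrics field (name/value pairs) is an assumption based on the MetricResponseMixin change, and the model id is a placeholder.

```python
# Hedged sketch: inspect usage metrics attached to a chat completion.
# The `metrics` field shape is an assumption and may differ by version.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello."}],
)

# Each metric is expected to carry a name and a value (e.g. token counts).
for metric in getattr(response, "metrics", None) or []:
    print(metric.metric, metric.value)
```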
Better Engineering
Infrastructure and code quality improvements
- Faster text-based chat completion tests
- Improved testing for non-streaming agent apis
- Standardized import formatting with ruff linter
- Added conventional commits standard
- Fixed documentation parsing issues
What's Changed
- Getting started notebook update by @jeffxtang in #936
- docs: update index.md for 0.1.2 by @raghotham in #1013
- test: Make text-based chat completion tests run 10x faster by @terrytangyuan in #1016
- chore: Updated requirements.txt by @cheesecake100201 in #1017
- test: Use JSON tool prompt format for remote::vllm provider by @terrytangyuan in #1019
- docs: Render check marks correctly on PyPI by @terrytangyuan in #1024
- docs: update rag.md example code to prevent errors by @MichaelClifford in #1009
- build: update uv lock to sync package versions by @leseb in #1026
- fix: Gaps in doc codegen by @ellistarn in #1035
- fix: Readthedocs cannot parse comments, resulting in docs bugs by @ellistarn in #1033
- fix: a bad newline in ollama docs by @ellistarn in #1036
- fix: Update Qdrant support post-refactor by @jwm4 in #1022
- test: replace blocked image URLs with GitHub-hosted by @leseb in #1025
- fix: Added missing `tool_config` arg in SambaNova `chat_completion()` by @terrytangyuan in #1042
- docs: Updating wording and nits in the README.md by @kelbrown20 in #992
- docs: remove changelog mention from PR template by @leseb in #1049
- docs: reflect actual number of spaces for indent by @booxter in #1052
- fix: agent config validation by @ehhuang in #1053
- feat: add MetricResponseMixin to chat completion response types by @dineshyv in #1050
- feat: make telemetry attributes be dict[str,PrimitiveType] by @dineshyv in #1055
- fix: filter out remote::sample providers when listing by @booxter in #1057
- feat: Support tool calling for non-streaming chat completion in remote vLLM provider by @terrytangyuan in #1034
- perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools by @yanxi0830 in #1041
- chore: update return type to Optional[str] by @leseb in #982
- feat: Support tool calling for streaming chat completion in remote vLLM provider by @terrytangyuan in #1063
- fix: show proper help text by @cdoern in #1065
- feat: add support for running in a venv by @cdoern in #1018
- feat: Adding sqlite-vec as a vectordb by @franciscojavierarceo in #1040
- feat: support listing all for `llama stack list-providers` by @booxter in #1056
- docs: Mention conventional commits format in CONTRIBUTING.md by @bbrowning in #1075
- fix: logprobs support in remote-vllm provider by @bbrowning in #1074
- fix: improve signal handling and update dependencies by @leseb in #1044
- style: update model id in model list title by @reidliu41 in #1072
- fix: make backslash work in GET /models/{model_id:path} by @yanxi0830 in #1068
- chore: Link to Groq docs in the warning message for preview model by @terrytangyuan in #1060
- fix: remove :path in agents by @yanxi0830 in #1077
- build: format codebase imports using ruff linter by @leseb in #1028
- chore: Consistent naming for VectorIO providers by @terrytangyuan in #1023
- test: Enable logprobs top_k tests for remote::vllm by @terrytangyuan in #1080
- docs: Fix url to the llama-stack-spec yaml/html files by @vishnoianil in #1081
- fix: Update VectorIO config classes in registry by @terrytangyuan in #1079
- test: Add qdrant to provider tests by @jwm4 in #1039
- test: add test for Agent.create_turn non-streaming response by @ehhuang in #1078
- fix!: update eval-tasks -> benchmarks by @yanxi0830 in #1032
- fix: openapi for eval-task by @yanxi0830 in #1085
- fix: regex pattern matching to support :path suffix in the routes by @hardikjshah in #1089
- fix: disable sqlite-vec test by @yanxi0830 in #1090
- fix: add the missed help description info by @reidliu41 in #1096
- fix: Update QdrantConfig to QdrantVectorIOConfig by @bbrowning in #1104
- docs: Add region parameter to Bedrock provider by @raghotham in #1103
- build: configure ruff from pyproject.toml by @leseb in #1100
- chore: move all Llama Stack types from llama-models to llama-stack by @ashwinb in #1098
- fix: enable_session_persistence in AgentConfig should be optional by @terrytangyuan in #1012
- fix: improve stack build on venv by @leseb in #980
- fix: remove the empty line by @reidliu41 in #1097
New Contributors
- @MichaelClifford made their first contribution in #1009
- @ellistarn made their first contribution in #1035
- @kelbrown20 made their first contribution in #992
- @franciscojavierarceo made their first contribution in #1040
- @bbrowning made their first contribution in #1075
- @reidliu41 made their first contribution in #1072
- @vishnoianil made their first contribution in #1081
Full Changelog: v0.1.2...v0.1.3
v0.1.2
TL;DR
- Several stabilizations to development flows after the switch to `uv`
- Migrated CI workflows to new OSS repo - llama-stack-ops
- Added automated rebuilds for ReadTheDocs
- Llama Stack server supports HTTPS (see the sketch after this list)
- Added system prompt overrides support
- Several bug fixes and improvements to documentation (check out the Kubernetes deployment guide by @terrytangyuan)
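With the new HTTPS serving option (#1000), pointing the client at a TLS endpoint is just a base-URL change. A minimal sketch, assuming valid certificates and a placeholder hostname:

```python
# Hedged sketch: talk to a TLS-enabled stack server.
# The hostname and port are placeholders; certificate handling depends
# on how your certificates were issued.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="https://my-stack.example.com:8321")

for model in client.models.list():
    print(model.identifier)
```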
What's Changed
- Fix UBI9 image build when installing Python packages via uv by @terrytangyuan in #926
- Fix precommit check after moving to ruff by @terrytangyuan in #927
- LocalInferenceImpl update for LS 0.1 by @jeffxtang in #911
- Properly close PGVector DB connection during shutdown() by @terrytangyuan in #931
- Add issue template config with docs and Discord links by @terrytangyuan in #930
- Fix uv pip install timeout issue for PyTorch by @terrytangyuan in #929
- github: ignore non-hidden python virtual environments by @nathan-weinberg in #939
- fix: broken link in Quick Start doc by @nathan-weinberg in #943
- fix: broken "core concepts" link in docs website by @nathan-weinberg in #940
- Misc fixes by @ashwinb in #944
- fix: formatting for ollama note in Quick Start doc by @nathan-weinberg in #945
- [docs] typescript sdk readme by @yanxi0830 in #946
- Support sys_prompt behavior in inference by @ehhuang in #937
- if client.initialize fails, the example should exit by @cdoern in #954
- Add Podman instructions to Quick Start by @jwm4 in #957
- github: issue templates automatically apply relevant label by @nathan-weinberg in #956
- docs: miscellaneous small fixes by @booxter in #961
- Make a couple properties optional by @ashwinb in #963
- [docs] Make RAG example self-contained by @booxter in #962
- docs, tests: replace datasets.rst with memory_optimizations.rst by @booxter in #968
- Fix broken pgvector provider and memory leaks by @terrytangyuan in #947
- [docs] update the zero_to_hero_guide llama stack version to 0.1.0 by @kami619 in #960
- missing T in import by @cooktheryan in #974
- Fix README.md notebook links by @aakankshaduggal in #976
- docs: clarify host.docker.internal works for recent podman by @booxter in #977
- docs: add addn server guidance for Linux users in Quick Start by @nathan-weinberg in #972
- sys_prompt support in Agent by @ehhuang in #938
- chore: update PR template to reinforce changelog by @leseb in #988
- github: update PR template to use correct syntax to auto-close issues by @booxter in #989
- chore: remove unused argument by @cdoern in #987
- test: replace memory with vector_io fixture by @leseb in #984
- docs: use uv in CONTRIBUTING guide by @leseb in #970
- docs: Add license badge to README.md by @terrytangyuan in #994
- Add Kubernetes deployment guide by @terrytangyuan in #899
- Fix incorrect handling of chat completion endpoint in remote::vLLM by @terrytangyuan in #951
- ci: Add semantic PR title check by @terrytangyuan in #979
- feat: Add a new template for `dell` by @hardikjshah in #978
- docs: Correct typos in Zero to Hero guide by @mlecanu in #997
- fix: Update rag examples to use fresh faiss index every time by @hardikjshah in #998
- doc: getting started notebook by @ehhuang in #996
- test: fix flaky agent test by @ehhuang in #1002
- test: rm unused exception alias in pytest.raises by @leseb in #991
- fix: List providers command prints out non-existing APIs from registry. Fixes #966 by @terrytangyuan in #969
- chore: add missing ToolConfig import in groq.py by @leseb in #983
- test: remove flaky agent test by @ehhuang in #1006
- test: Split inference tests to text and vision by @terrytangyuan in #1008
- feat: Add HTTPS serving option by @ashwinb in #1000
- test: encode image data as base64 by @leseb in #1003
- fix: Ensure a better error stack trace when llama-stack is not built by @cdoern in #950
- refactor(ollama): model availability check by @leseb in #986
New Contributors
- @nathan-weinberg made their first contribution in #939
- @cdoern made their first contribution in #954
- @jwm4 made their first contribution in #957
- @booxter made their first contribution in #961
- @kami619 made their first contribution in #960
- @cooktheryan made their first contribution in #974
- @aakankshaduggal made their first contribution in #976
- @leseb made their first contribution in #988
- @mlecanu made their first contribution in #997
Full Changelog: v0.1.1...v0.1.2
v0.1.1
A bunch of small and big improvements everywhere, including support for Windows, a switch to `uv`, and many provider improvements.
What's Changed
- Update doc templates for running safety on self-hosted templates by @hardikjshah in #874
- Update GH action so it correctly queries for test.pypi, etc. by @ashwinb in #875
- Fix report generation for url endpoints by @hardikjshah in #876
- Fixed typo by @BakungaBronson in #877
- Fixed multiple typos by @BakungaBronson in #878
- Ensure llama stack build --config <> --image-type <> works by @ashwinb in #879
- Update documentation by @ashwinb in #865
- Update discriminator to have the correct `mapping` by @ashwinb in #881
- Fix telemetry init by @dineshyv in #885
- Sambanova - LlamaGuard by @snova-edwardm in #886
- Update index.md by @Ckhanoyan in #888
- Report generation minor fixes by @sixianyi0721 in #884
- adding readme to docs folder for easier discoverability of notebooks … by @heyjustinai in #857
- Agent response format by @hanzlfs in #660
- Add windows support for build execution by @VladOS95-cyber in #889
- Add run win command for stack by @VladOS95-cyber in #890
- Use ruamel.yaml to format the OpenAPI spec by @ashwinb in #892
- Fix Chroma adapter by @ashwinb in #893
- align with CompletionResponseStreamChunk.delta as str (instead of TextDelta) by @mattf in #900
- add NVIDIA_BASE_URL and NVIDIA_API_KEY to control hosted vs local endpoints by @mattf in #897
- Fix validator of "container" image type by @terrytangyuan in #901
- Update OpenAPI generator to add param and field documentation by @ashwinb in #896
- Fix link to selection guide and change "docker" to "container" by @terrytangyuan in #898
- [#432] Groq Provider tool call tweaks by @aidando73 in #811
- Fix running stack built with base conda environment by @dvrogozh in #903
- create a github action for triggering client-sdk tests on new pull-request by @sixianyi0721 in #850
- log probs - mark pytests as xfail for unsupported providers + add support for together by @sixianyi0721 in #883
- SambaNova supports Llama 3.3 by @snova-edwardm in #905
- fix ImageContentItem to take base64 string as image.data by @yanxi0830 in #909
- Fix Agents to support code and rag simultaneously by @hardikjshah in #908
- add test for user message w/ image.data content by @mattf in #906
- openapi gen return type fix for streaming/non-streaming by @yanxi0830 in #910
- feat: enable xpu support for meta-reference stack by @dvrogozh in #558
- Sec fixes as raised by bandit by @hardikjshah in #917
- Run code-gen by @hardikjshah in #916
- fix rag tests by @hardikjshah in #918
- Use `uv pip install` instead of `pip install` by @ashwinb in #921
- add image support to NVIDIA inference provider by @mattf in #907
New Contributors
- @BakungaBronson made their first contribution in #877
- @Ckhanoyan made their first contribution in #888
- @hanzlfs made their first contribution in #660
- @dvrogozh made their first contribution in #903
Full Changelog: v0.1.0...v0.1.1
v0.1.0
We are excited to announce a stable API release of Llama Stack, which enables developers to build RAG applications and agents using tools and safety shields, monitor those agents with telemetry, and evaluate them with scoring functions.
Context
GenAI application developers need more than just an LLM - they need to integrate tools, connect with their data sources, establish guardrails, and ground the LLM responses effectively. Currently, developers must piece together various tools and APIs, complicating the development lifecycle and increasing costs. The result is that developers are spending more time on these integrations rather than focusing on the application logic itself. The bespoke coupling of components also makes it challenging to adopt state-of-the-art solutions in the rapidly evolving GenAI space. This is particularly difficult for open models like Llama, as best practices are not widely established in the open.
Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs from both AI developers and partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.
With Llama Stack, you can easily build a RAG agent that can also search the web, do complex math, and call custom tools. You can use telemetry to inspect those traces and convert telemetry into evaluation datasets. And with Llama Stack's plugin architecture and prepackaged distributions, you can choose to run your agent anywhere: in the cloud with our partners, in your own environment using virtualenv, conda, or Docker, locally with Ollama, or even on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.
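As a concrete illustration of that flow, here is a hedged sketch of a web-search-capable agent using the Python client SDK. The base URL, model id, and the availability of the `builtin::websearch` tool group are assumptions that depend on your distribution, and the AgentConfig import path may vary across client versions.

```python
# Hedged sketch: an agent with web search via the Python client SDK.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

client = LlamaStackClient(base_url="http://localhost:8321")

agent = Agent(
    client,
    AgentConfig(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        instructions="You are a helpful assistant. Search the web when needed.",
        toolgroups=["builtin::websearch"],  # assumes this tool group is configured
        enable_session_persistence=False,
    ),
)
session_id = agent.create_session("demo-session")

# create_turn streams events by default; the EventLogger helper prints them.
response = agent.create_turn(
    messages=[{"role": "user", "content": "What is Llama Stack?"}],
    session_id=session_id,
)
for log in EventLogger().log(response):
    log.print()
```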
Release
After iterating on the APIs for the last 3 months, today we're launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests that verify all provider implementations, so developers can easily and reliably select distributions or providers based on their specific requirements.
There are example standalone apps in llama-stack-apps.
Key Features of this release
- Unified API Layer (a minimal usage sketch follows this list)
  - Inference: Run LLM models
  - RAG: Store and retrieve knowledge for RAG
  - Agents: Build multi-step agentic workflows
  - Tools: Register tools that can be called by the agent
  - Safety: Apply content filtering and safety policies
  - Evaluation: Test model and agent quality
  - Telemetry: Collect and analyze usage data and complex agentic traces
  - Post Training (coming soon): Fine-tune models for specific use cases
- Rich Provider Ecosystem
  - Local Development: Meta's Reference, Ollama
  - Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
  - On-premises: Nvidia NIM, vLLM, TGI, Dell-TGI
  - On-device: iOS and Android support
- Built for Production
  - Pre-packaged distributions for common deployment scenarios
  - Backwards compatibility across model versions
  - Comprehensive evaluation capabilities
  - Full observability and monitoring
- Multiple developer interfaces
  - CLI: Command line interface
  - Python SDK
  - Swift iOS SDK
  - Kotlin Android SDK
- Sample llama stack applications
  - Python
  - iOS
  - Android
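To make the unified layer concrete, here is a minimal inference call through the Python SDK; the base URL and model id are placeholders, and the response field names reflect the 0.1.0 client as we understand it.

```python
# Hedged sketch: one client object fronts all of the APIs listed above.
# URL and model id are placeholders for your local distribution.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is Llama Stack?"},
    ],
)
print(response.completion_message.content)
```

Because every API goes through the same client, swapping providers is a configuration change rather than a code change.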
What's Changed
- [4/n][torchtune integration] support lazy load model during inference by @SLR722 in #620
- remove unused telemetry related code for console by @dineshyv in #659
- Fix Meta reference GPU implementation by @ashwinb in #663
- Fixed imports for inference by @cdgamarose-nv in #661
- fix trace starting in library client by @dineshyv in #655
- Add Llama 70B 3.3 to fireworks by @aidando73 in #654
- Tools API with brave and MCP providers by @dineshyv in #639
- [torchtune integration] post training + eval by @SLR722 in #670
- Fix post training apis broken by torchtune release by @SLR722 in #674
- Add missing venv option in --image-type by @terrytangyuan in #677
- Removed unnecessary CONDA_PREFIX env var in installation guide by @terrytangyuan in #683
- Add 3.3 70B to Ollama inference provider by @aidando73 in #681
- docs: update evals_reference/index.md by @eltociear in #675
- [remove import ][1/n] clean up import & in apis/ by @yanxi0830 in #689
- [bugfix] fix broken vision inference, change serialization for bytes by @yanxi0830 in #693
- Minor Quick Start documentation updates. by @derekslager in #692
- [bugfix] fix meta-reference agents w/ safety multiple model loading pytest by @yanxi0830 in #694
- [bugfix] fix prompt_adapter interleaved_content_convert_to_raw by @yanxi0830 in #696
- Add missing "inline::" prefix for providers in building_distro.md by @terrytangyuan in #702
- Fix failing flake8 E226 check by @terrytangyuan in #701
- Add missing newlines before printing the Dockerfile content by @terrytangyuan in #700
- Add JSON structured outputs to Ollama Provider by @aidando73 in #680
- [#407] Agents: Avoid calling tools that haven't been explicitly enabled by @aidando73 in #637
- Made changes to readme and pinning to llamastack v0.0.61 by @heyjustinai in #624
- [rag evals][1/n] refactor base scoring fn & data schema check by @yanxi0830 in #664
- [Post Training] Fix missing import by @SLR722 in #705
- Import from the right path by @SLR722 in #708
- [#432] Add Groq Provider - chat completions by @aidando73 in #609
- Change post training run.yaml inference config by @SLR722 in #710
- [Post training] make validation steps configurable by @SLR722 in #715
- Fix incorrect entrypoint for broken `llama stack run` by @terrytangyuan in #706
- Fix assert message and call to completion_request_to_prompt in remote:vllm by @terrytangyuan in #709
- Fix Groq invalid self.config reference by @aidando73 in #719
- support llama3.1 8B instruct in post training by @SLR722 in #698
- remove default logger handlers when using libcli with notebook by @dineshyv in #718
- move DataSchemaValidatorMixin into standalone utils by @yanxi0830 in #720
- add 3.3 to together inference provider by @yanxi0830 in #729
- Update CODEOWNERS - add sixianyi0721 as the owner by @sixianyi0721 in #731
- fix links for distro by @yanxi0830 in #733
- add --version to llama stack CLI & /version endpoint by @yanxi0830 in #732
- agents to use tools api by @dineshyv in #673
- Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data by @ashwinb in #735
- Check version incompatibility by @ashwinb in #738
- Add persistence for localfs datasets by @VladOS95-cyber in #557
- Fixed typo in default VLLM_URL in remote-vllm.md by @terrytangyuan in #723
- Consolidating Memory tests under client-sdk by @vladimirivic in #703
- Expose LLAMASTACK_PORT in cli.stack.run by @terrytangyuan in #722
- remove conflicting default for tool prompt format in chat completion by @dineshyv in #742
- rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars by @raghotham in #744
- Add inline vLLM inference provider to regression tests and fix regressions by @frreiss in #662
- [CICD] github workflow to push nightly package to testpypi by @yanxi0830 in #734
- Replaced zrangebylex method in the range method by @cheesecake100201 in #521 …
v0.1.0rc12
What's Changed
- [4/n][torchtune integration] support lazy load model during inference by @SLR722 in #620
- remove unused telemetry related code for console by @dineshyv in #659
- Fix Meta reference GPU implementation by @ashwinb in #663
- Fixed imports for inference by @cdgamarose-nv in #661
- fix trace starting in library client by @dineshyv in #655
- Add Llama 70B 3.3 to fireworks by @aidando73 in #654
- Tools API with brave and MCP providers by @dineshyv in #639
- [torchtune integration] post training + eval by @SLR722 in #670
- Fix post training apis broken by torchtune release by @SLR722 in #674
- Add missing venv option in --image-type by @terrytangyuan in #677
- Removed unnecessary CONDA_PREFIX env var in installation guide by @terrytangyuan in #683
- Add 3.3 70B to Ollama inference provider by @aidando73 in #681
- docs: update evals_reference/index.md by @eltociear in #675
- [remove import ][1/n] clean up import & in apis/ by @yanxi0830 in #689
- [bugfix] fix broken vision inference, change serialization for bytes by @yanxi0830 in #693
- Minor Quick Start documentation updates. by @derekslager in #692
- [bugfix] fix meta-reference agents w/ safety multiple model loading pytest by @yanxi0830 in #694
- [bugfix] fix prompt_adapter interleaved_content_convert_to_raw by @yanxi0830 in #696
- Add missing "inline::" prefix for providers in building_distro.md by @terrytangyuan in #702
- Fix failing flake8 E226 check by @terrytangyuan in #701
- Add missing newlines before printing the Dockerfile content by @terrytangyuan in #700
- Add JSON structured outputs to Ollama Provider by @aidando73 in #680
- [#407] Agents: Avoid calling tools that haven't been explicitly enabled by @aidando73 in #637
- Made changes to readme and pinning to llamastack v0.0.61 by @heyjustinai in #624
- [rag evals][1/n] refactor base scoring fn & data schema check by @yanxi0830 in #664
- [Post Training] Fix missing import by @SLR722 in #705
- Import from the right path by @SLR722 in #708
- [#432] Add Groq Provider - chat completions by @aidando73 in #609
- Change post training run.yaml inference config by @SLR722 in #710
- [Post training] make validation steps configurable by @SLR722 in #715
- Fix incorrect entrypoint for broken `llama stack run` by @terrytangyuan in #706
- Fix assert message and call to completion_request_to_prompt in remote:vllm by @terrytangyuan in #709
- Fix Groq invalid self.config reference by @aidando73 in #719
- support llama3.1 8B instruct in post training by @SLR722 in #698
- remove default logger handlers when using libcli with notebook by @dineshyv in #718
- move DataSchemaValidatorMixin into standalone utils by @yanxi0830 in #720
- add 3.3 to together inference provider by @yanxi0830 in #729
- Update CODEOWNERS - add sixianyi0721 as the owner by @sixianyi0721 in #731
- fix links for distro by @yanxi0830 in #733
- add --version to llama stack CLI & /version endpoint by @yanxi0830 in #732
- agents to use tools api by @dineshyv in #673
- Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data by @ashwinb in #735
- Check version incompatibility by @ashwinb in #738
- Add persistence for localfs datasets by @VladOS95-cyber in #557
- Fixed typo in default VLLM_URL in remote-vllm.md by @terrytangyuan in #723
- Consolidating Memory tests under client-sdk by @vladimirivic in #703
- Expose LLAMASTACK_PORT in cli.stack.run by @terrytangyuan in #722
- remove conflicting default for tool prompt format in chat completion by @dineshyv in #742
- rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars by @raghotham in #744
- Add inline vLLM inference provider to regression tests and fix regressions by @frreiss in #662
- [CICD] github workflow to push nightly package to testpypi by @yanxi0830 in #734
- Replaced zrangebylex method in the range method by @cheesecake100201 in #521
- Improve model download doc by @SLR722 in #748
- Support building UBI9 base container image by @terrytangyuan in #676
- update notebook to use new tool defs by @dineshyv in #745
- Add provider data passing for library client by @dineshyv in #750
- [Fireworks] Update model name for Fireworks by @benjibc in #753
- Consolidating Inference tests under client-sdk tests by @vladimirivic in #751
- Consolidating Safety tests from various places under client-sdk by @vladimirivic in #699
- [CI/CD] more robust re-try for downloading testpypi package by @yanxi0830 in #749
- [#432] Add Groq Provider - tool calls by @aidando73 in #630
- Rename ipython to tool by @ashwinb in #756
- Fix incorrect Python binary path for UBI9 image by @terrytangyuan in #757
- Update Cerebras docs to include header by @henrytwo in #704
- Add init files to post training folders by @SLR722 in #711
- Switch to use importlib instead of deprecated pkg_resources by @terrytangyuan in #678
- [bugfix] fix streaming GeneratorExit exception with LlamaStackAsLibraryClient by @yanxi0830 in #760
- Fix telemetry to work on reinstantiating new lib cli by @dineshyv in #761
- [post training] define llama stack post training dataset format by @SLR722 in #717
- add braintrust to experimental-post-training template by @SLR722 in #763
- added support of PYPI_VERSION in stack build by @jeffxtang in #762
- Fix broken tests in test_registry by @vladimirivic in #707
- Fix fireworks run-with-safety template by @vladimirivic in #766
- Free up memory after post training finishes by @SLR722 in #770
- Fix issue when generating distros by @terrytangyuan in #755
- Convert `SamplingParams.strategy` to a union by @hardikjshah in #767
- [CICD] Github workflow for publishing Docker images by @yanxi0830 in #764
- [bugfix] fix llama guard parsing ContentDelta by @yanxi0830 in #772
- rebase eval test w/ tool_runtime fixtures by @yanxi0830 in #773
- More idiomatic REST API by @dineshyv in #765
- add nvidia distribution by @cdgamarose-nv in #565
- bug fixes on inference tests by @sixianyi0721 in #774
- [bugfix] fix inference sdk test for v1 by @yanxi0830 in #775
- fix routing in library client by @dineshyv in https://github.com/meta-llama/llama-stack/pull...
v0.0.63
A small but important bug-fix release to update the URL datatype for the client SDKs. The issue especially affected multimodal agentic turns.
Full Changelog: v0.0.62...v0.0.63
v0.0.62
What's Changed
A few important updates, some of which are backwards incompatible. You must update your `run.yaml`s when upgrading. As always, look to `templates/<distro>/run.yaml` for reference.
- Make embedding generation go through inference by @dineshyv in #606
- [/scoring] add ability to define aggregation functions for scoring functions & refactors by @yanxi0830 in #597
- Update the "InterleavedTextMedia" type by @ashwinb in #635
- [NEW!] Experimental post-training APIs! #540, #593, etc.
A variety of fixes and enhancements. Some selected ones:
- [#342] RAG - fix PDF format in vector database by @aidando73 in #551
- add completion api support to nvidia inference provider by @mattf in #533
- add model type to APIs by @dineshyv in #588
- Allow using an "inline" version of Chroma using PersistentClient by @ashwinb in #567
- [docs] add playground ui docs by @yanxi0830 in #592
- add colab notebook & update docs by @yanxi0830 in #619
- [tests] add client-sdk pytests & delete client.py by @yanxi0830 in #638
- [bugfix] no shield_call when there's no shields configured by @yanxi0830 in #642
New Contributors
- @SLR722 made their first contribution in #540
- @iamarunbrahma made their first contribution in #636
Full Changelog: v0.0.61...v0.0.62
v0.0.61
What's Changed
- add NVIDIA NIM inference adapter by @mattf in #355
- Tgi fixture by @dineshyv in #519
- fixes tests & move braintrust api_keys to request headers by @yanxi0830 in #535
- allow env NVIDIA_BASE_URL to set NVIDIAConfig.url by @mattf in #531
- move playground ui to llama-stack repo by @yanxi0830 in #536
- fix[documentation]: Update links to point to correct pages by @sablair in #549
- Fix URLs to Llama Stack Read the Docs Webpages by @JeffreyLind3 in #547
- Fix Zero to Hero README.md Formatting by @JeffreyLind3 in #546
- Guide readme fix by @raghotham in #552
- Fix broken Ollama link by @aidando73 in #554
- update client cli docs by @dineshyv in #560
- reduce the accuracy requirements to pass the chat completion structured output test by @mattf in #522
- removed assertion in ollama.py and fixed typo in the readme by @wukaixingxp in #563
- Cerebras Inference Integration by @henrytwo in #265
- unregister API for dataset by @sixianyi0721 in #507
- [llama stack ui] add native eval & inspect distro & playground pages by @yanxi0830 in #541
- Telemetry API redesign by @dineshyv in #525
- Introduce GitHub Actions Workflow for Llama Stack Tests by @ConnorHack in #523
- specify the client version that works for current together server by @jeffxtang in #566
- remove unused telemetry related code by @dineshyv in #570
- Fix up safety client for versioned API by @stevegrubb in #573
- Add eval/scoring/datasetio API providers to distribution templates & UI developer guide by @yanxi0830 in #564
- Add ability to query and export spans to dataset by @dineshyv in #574
- Renames otel config from jaeger to otel by @codefromthecrypt in #569
- add telemetry docs by @dineshyv in #572
- Console span processor improvements by @dineshyv in #577
- doc: quickstart guide errors by @aidando73 in #575
- Add kotlin docs by @Riandy in #568
- Update android_sdk.md by @Riandy in #578
- Bump kotlin docs to 0.0.54.1 by @Riandy in #579
- Make LlamaStackLibraryClient work correctly by @ashwinb in #581
- Update integration type for Cerebras to hosted by @henrytwo in #583
- Use customtool's get_tool_definition to remove duplication by @jeffxtang in #584
- [#391] Add support for json structured output for vLLM by @aidando73 in #528
- Fix Jaeger instructions by @yurishkuro in #580
- fix telemetry import by @yanxi0830 in #585
- update template run.yaml to include openai api key for braintrust by @yanxi0830 in #590
- add tracing to library client by @dineshyv in #591
- Fixes for library client by @ashwinb in #587
- Fix issue 586 by @yanxi0830 in #594
New Contributors
- @sablair made their first contribution in #549
- @JeffreyLind3 made their first contribution in #547
- @aidando73 made their first contribution in #554
- @henrytwo made their first contribution in #265
- @sixianyi0721 made their first contribution in #507
- @ConnorHack made their first contribution in #523
- @yurishkuro made their first contribution in #580
Full Changelog: v0.0.55...v0.0.61
v0.0.55
Llama Stack 0.0.54 Release
What's Changed
- Bugfixes release on top of 0.0.53
- Don't depend on templates.py when print llama stack build messages by @ashwinb in #496
- Restructure docs by @dineshyv in #494
- Since we are pushing for HF repos, we should accept them in inference configs by @ashwinb in #497
- Fix fp8 quantization script. by @liyunlu0618 in #500
- use logging instead of prints by @dineshyv in #499
New Contributors
- @liyunlu0618 made their first contribution in #500
Full Changelog: v0.0.53...v0.0.54