
v1.8.1

Released by @alvarobartt on 04 Sep 15:22 · commit 0adb000

Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model well suited to on-device use cases. Designed for speed and efficiency, the model features a compact size of 308M parameters and a 2K context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. EmbeddingGemma is trained to support over 100 languages and is, at the time of writing, the highest-ranking text-only multilingual embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB). To serve EmbeddingGemma with Text Embeddings Inference v1.8.1, run one of the following:

  • CPU:

        docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
            --model-id google/embeddinggemma-300m --dtype float32

  • CPU with ONNX Runtime:

        docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
            --model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean

  • NVIDIA CUDA:

        docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \
            --model-id google/embeddinggemma-300m --dtype float32
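Once a container is up, embeddings can be requested over TEI's HTTP API via the POST /embed endpoint. A minimal Python sketch, assuming the server is reachable on localhost:8080 as in the `-p 8080:80` mappings above:

```python
import requests

# TEI exposes a POST /embed endpoint that takes one or more input strings
# and returns one embedding vector per input.
response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "What is the capital of France?"},
)
response.raise_for_status()

embedding = response.json()[0]  # first (and only) input's embedding
print(len(embedding))           # embedding dimension (768 for EmbeddingGemma)
```

The same endpoint accepts a list under "inputs" for batched requests, so a RAG pipeline can embed documents and queries with a single client.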

Notable Changes

  • Add support for Gemma3 (text-only) architecture
  • Update Intel dependencies to Synapse 1.21.3 and IPEX 2.8
  • Extend ONNX Runtime support in OrtRuntime (see the sketch after this list)
    • Support position_ids and past_key_values as inputs
    • Handle padding_side and pad_token_id
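TEI's OrtRuntime is implemented in Rust, so the following is only a rough Python illustration of what these changes mean: a runtime inspects which inputs the ONNX graph declares and supplies the optional ones, such as position_ids, only when the model asks for them. The model path and the pad token id of 0 are placeholder assumptions.

```python
import numpy as np
import onnxruntime as ort

# Illustrative sketch, not TEI's actual Rust OrtRuntime.
session = ort.InferenceSession("model.onnx")  # placeholder path
declared = {i.name for i in session.get_inputs()}

# Toy batch: two right-padded sequences, assuming pad_token_id == 0.
input_ids = np.array([[101, 7592, 102, 0], [101, 2088, 102, 0]], dtype=np.int64)
attention_mask = (input_ids != 0).astype(np.int64)

feeds = {"input_ids": input_ids, "attention_mask": attention_mask}
if "position_ids" in declared:
    # Positions 0..seq_len-1 for every row, mirroring the padded layout.
    feeds["position_ids"] = np.tile(
        np.arange(input_ids.shape[1], dtype=np.int64), (input_ids.shape[0], 1)
    )

outputs = session.run(None, feeds)
print(outputs[0].shape)  # token-level hidden states, pooled downstream
```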

What's Changed

Full Changelog: v1.8.0...v1.8.1