
Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model designed for on-device use cases. Built for speed and efficiency, the model has a compact size of 308M parameters and a 2K context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. EmbeddingGemma is trained to support over 100 languages and is the highest-ranking text-only multilingual embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB) at the time of writing.
To serve EmbeddingGemma with Text Embeddings Inference (TEI) v1.8.1, run one of the following:
- CPU:
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
    --model-id google/embeddinggemma-300m --dtype float32
- CPU with ONNX Runtime:
  docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
    --model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean
- NVIDIA CUDA:
  docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \
    --model-id google/embeddinggemma-300m --dtype float32
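Once one of the containers above is running, it can be queried over HTTP. The following is a minimal client sketch, not part of the release itself. It assumes the server is reachable at `http://localhost:8080` (matching the `-p 8080:80` mapping) and uses TEI's `/embed` route with a JSON body of the form `{"inputs": [...]}`. The helper names `build_embed_request`, `embed`, and `cosine_similarity` are hypothetical and exist only for illustration.

```python
import json
import math
import urllib.request

# Assumption: a TEI container started with `-p 8080:80` is listening locally.
BASE_URL = "http://localhost:8080"


def build_embed_request(texts, base_url=BASE_URL):
    """Build a POST request for the /embed route with body {"inputs": [...]}."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url=BASE_URL, timeout=30.0):
    """Send the request and return one embedding (a list of floats) per input text."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With a server running, `embed(["What is RAG?", "Retrieval-augmented generation"])` should return two vectors, and passing them to `cosine_similarity` gives a quick sanity check that related texts score higher than unrelated ones.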
Notable Changes
- Add support for Gemma3 (text-only) architecture
- Intel updates to Synapse 1.21.3 and IPEX 2.8
- Extend ONNX Runtime support in `OrtRuntime`
  - Support `position_ids` and `past_key_values` as inputs
  - Handle `padding_side` and `pad_token_id`
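To make the `padding_side` and `pad_token_id` items concrete, here is a small sketch of what these two settings control when batching token sequences of different lengths. It is an illustration only and is not TEI's implementation (which is written in Rust). The function names `pad_batch` and `position_ids_from_mask` are hypothetical, and deriving position ids from the attention mask is one common convention, assumed here rather than taken from the release.

```python
def pad_batch(sequences, pad_token_id, padding_side="right"):
    """Pad token-id lists to a common length; return (input_ids, attention_mask)."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    if not sequences:
        return [], []
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pads = [pad_token_id] * n_pad   # filler tokens
        ones = [1] * len(seq)           # real tokens are attended to
        zeros = [0] * n_pad             # padding is masked out
        if padding_side == "right":
            input_ids.append(list(seq) + pads)
            attention_mask.append(ones + zeros)
        else:
            input_ids.append(pads + list(seq))
            attention_mask.append(zeros + ones)
    return input_ids, attention_mask


def position_ids_from_mask(mask):
    """Count positions over real tokens only; padded slots get position 0."""
    ids, pos = [], 0
    for m in mask:
        ids.append(pos if m else 0)
        pos += m
    return ids
```

For example, `pad_batch([[5, 6, 7], [8]], pad_token_id=0, padding_side="left")` pads the second sequence at the front, so its single real token stays in the last position of the row.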
What's Changed
- Adjust HPU warmup: use dummy inputs with shape more close to real scenario by @kaixuanliu in #689
- Add `extra_args` to `trufflehog` to exclude unverified results by @alvarobartt in #696
- Update GitHub templates & fix mentions to Text Embeddings Inference by @alvarobartt in #697
- Disable Flash Attention with `USE_FLASH_ATTENTION` by @alvarobartt in #692
- Add support for `position_ids` and `past_key_values` in `OrtBackend` by @alvarobartt in #700
- HPU upgrade to Synapse 1.21.3 by @kaixuanliu in #703
- Upgrade to IPEX 2.8 by @kaixuanliu in #702
- Parse `modules.json` to identify default `Dense` modules by @alvarobartt in #701
- Add `padding_side` and `pad_token_id` in `OrtBackend` by @alvarobartt in #705
- Update `docs/openapi.json` for v1.8.0 by @alvarobartt in #708
- Add Gemma3 architecture (text-only) by @alvarobartt in #711
- Update `version` to 1.8.1 by @alvarobartt in #712
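As an illustration of the `modules.json` item above, the sketch below shows one way to pick out `Dense` modules from a Sentence Transformers style `modules.json`. It is not the code from #701. The sample file content is made up for the example, and the helper name `find_dense_modules` is hypothetical. The sketch assumes each entry carries `idx`, `path`, and `type` fields and that dense layers use the type string `sentence_transformers.models.Dense`.

```python
import json

# Assumption: type string used by Sentence Transformers for dense projection layers.
DENSE_TYPE = "sentence_transformers.models.Dense"

# Illustrative modules.json content, not copied from any specific model repository.
EXAMPLE_MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Dense", "type": "sentence_transformers.models.Dense"},
  {"idx": 3, "name": "3", "path": "3_Dense", "type": "sentence_transformers.models.Dense"},
  {"idx": 4, "name": "4", "path": "4_Normalize", "type": "sentence_transformers.models.Normalize"}
]
"""


def find_dense_modules(modules_json_text):
    """Return the paths of all Dense modules, ordered by their pipeline index."""
    modules = json.loads(modules_json_text)
    dense = [m for m in modules if m.get("type") == DENSE_TYPE]
    return [m["path"] for m in sorted(dense, key=lambda m: m["idx"])]
```

Sorting by `idx` keeps the dense layers in the order they would be applied after pooling, which matters when a model stacks more than one projection.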
Full Changelog: v1.8.0...v1.8.1