
v1.8.1

Released by @alvarobartt on 04 Sep 15:22 · commit 0adb000

Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model well suited to on-device use cases. Designed for speed and efficiency, the model features a compact size of 308M parameters and a 2K context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. EmbeddingGemma is trained to support over 100 languages and is, at the time of writing, the highest-ranking text-only multilingual embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB). To serve EmbeddingGemma with Text Embeddings Inference v1.8.1, run one of the following:

  • CPU:

        docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
            --model-id google/embeddinggemma-300m --dtype float32

  • CPU with ONNX Runtime:

        docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
            --model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean

  • NVIDIA CUDA:

        docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \
            --model-id google/embeddinggemma-300m --dtype float32
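Once a container is up, embeddings can be requested over TEI's HTTP API via the POST /embed endpoint. A minimal Python sketch, assuming the server is reachable on localhost:8080 as in the `-p 8080:80` mappings above:

```python
import requests

# TEI exposes a POST /embed endpoint that takes one or more input strings
# and returns one embedding vector per input.
response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "What is the capital of France?"},
)
response.raise_for_status()

embedding = response.json()[0]  # first (and only) input's embedding
print(len(embedding))           # embedding dimension (768 for EmbeddingGemma)
```

The same endpoint accepts a list under "inputs" for batched requests, so a RAG pipeline can embed documents and queries with a single client.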

Notable Changes

  • Add support for Gemma3 (text-only) architecture
  • Update Intel dependencies to Synapse 1.21.3 and IPEX 2.8
  • Extend ONNX Runtime support in OrtRuntime (see the sketch after this list)
    • Support position_ids and past_key_values as inputs
    • Handle padding_side and pad_token_id
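TEI's OrtRuntime is implemented in Rust, so the following is only a rough Python illustration of what these changes mean: a runtime inspects which inputs the ONNX graph declares and supplies the optional ones, such as position_ids, only when the model asks for them. The model path and the pad token id of 0 are placeholder assumptions.

```python
import numpy as np
import onnxruntime as ort

# Illustrative sketch, not TEI's actual Rust OrtRuntime.
session = ort.InferenceSession("model.onnx")  # placeholder path
declared = {i.name for i in session.get_inputs()}

# Toy batch: two right-padded sequences, assuming pad_token_id == 0.
input_ids = np.array([[101, 7592, 102, 0], [101, 2088, 102, 0]], dtype=np.int64)
attention_mask = (input_ids != 0).astype(np.int64)

feeds = {"input_ids": input_ids, "attention_mask": attention_mask}
if "position_ids" in declared:
    # Positions 0..seq_len-1 for every row, mirroring the padded layout.
    feeds["position_ids"] = np.tile(
        np.arange(input_ids.shape[1], dtype=np.int64), (input_ids.shape[0], 1)
    )

outputs = session.run(None, feeds)
print(outputs[0].shape)  # token-level hidden states, pooled downstream
```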

What's Changed

Full Changelog: v1.8.0...v1.8.1