Replies: 7 comments 1 reply
-
I also want to know how I can upload images to the server using external tools.
-
Thank you, I'm now using this shell script to describe images in my file manager.

```bash
#!/bin/bash
# Check if image file argument is provided
if [ $# -eq 0 ]; then
    echo "Usage: $0 <image-file>"
    exit 1
fi

IMAGE_FILE="$1"
API_URL="http://192.168.1.68:8080/v1/chat/completions"
MODEL="llava"
OUTPUT_FILE="${IMAGE_FILE}.txt"  # Adds .txt to the original filename (e.g., image.jpg.txt)

# Check if file exists
if [ ! -f "$IMAGE_FILE" ]; then
    echo "Error: File '$IMAGE_FILE' not found"
    exit 1
fi

# Create temporary payload file
TMP_PAYLOAD=$(mktemp)

# Generate the JSON payload
cat <<EOF > "$TMP_PAYLOAD"
{
  "model": "$MODEL",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in detail"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,$(base64 -w 0 "$IMAGE_FILE")"}}
      ]
    }
  ]
}
EOF

echo "Generating description for $IMAGE_FILE..."

# Make the API request and save response
curl -s -X POST \
    -H "Content-Type: application/json" \
    -d @"$TMP_PAYLOAD" \
    "$API_URL" | jq -r '.choices[0].message.content' > "$OUTPUT_FILE"

# Clean up
rm "$TMP_PAYLOAD"
echo "Description saved to $OUTPUT_FILE"
```
-
@thoddnn Yes, you can send images to the /embedding endpoint in llama.cpp, but only if you're using a multimodal model like LLaVA and you've specified the `--mmproj` projector file when starting the server. To include an image in the embedding request, you reference it by id in the content and supply the data in `image_data`:

```json
{
  "content": "Image: [img-21].\n Optional Caption",
  "image_data": [
    {
      "id": 21,
      "data": "<BASE64_ENCODED_IMAGE_HERE>"
    }
  ]
}
```
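A minimal way to send that payload from the shell, sketched with an assumed server address of localhost:8080 (the endpoint name and `image_data` shape are taken from the comment above, not verified against every llama.cpp version). Building the JSON with jq avoids shell argument-length limits for large base64 strings:

```bash
# Encode the image, wrap it in the [img-N] payload shape above, and POST it.
base64 -w 0 image.jpg \
  | jq -Rs '{content: "Image: [img-21].\n Optional Caption", image_data: [{id: 21, data: rtrimstr("\n")}]}' \
  > payload.json

curl -s http://localhost:8080/embedding \
    -H "Content-Type: application/json" \
    -d @payload.json
```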
-
I have been unable to get the above to work. What I observe is a response that seems to have ignored the [img-N] reference and treated it as a set of text tokens. I infer this from two things: (a) roughly six embedding vectors are produced, one per token I assume, and (b) the embedding values are identical no matter what id is given in the image_data clause (i.e., the correct "id": 1 or the erroneous "id": 2). Any thoughts? The server command line is (essentially) as follows
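A quick way to reproduce that observation, sketched under the assumption that the server returns one vector per token (response shapes vary across llama.cpp versions, so the final jq path may need adjusting):

```bash
# Count how many embedding vectors come back for an [img-N] prompt; if the count
# tracks the number of text tokens instead of staying fixed, the image reference
# was likely tokenized as plain text.
base64 -w 0 test.jpg \
  | jq -Rs '{content: "Image: [img-1].", image_data: [{id: 1, data: rtrimstr("\n")}]}' \
  | curl -s http://localhost:8080/embedding -H "Content-Type: application/json" -d @- \
  | jq 'if type == "array" then length else (.embedding | length) end'
```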
-
I still can't get it to work. The llama server always returns the same embedding even if the image is different, starting the server with a multimodal embedding model and sending data with
-
PR #15108 looks like it should fix the issue, but I'm still running into trouble when trying it with the binaries from his fork of llama.cpp. I have tried to send the following JSON to http://localhost:8080/embedding but I get the error

@oobabooga Could you provide some guidance on how to generate embeddings from both images and text? 🙏 Thanks a lot
-
@thoddnn you were really close! #15108 changes the type of the prompt itself, but not the outer JSON
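Reading that comment, the multimodal parts presumably go inside the prompt field itself rather than in a separate top-level key. The following is only a guess at the resulting shape based on this comment; the field names follow the chat-completions convention from the script earlier in the thread and are not verified against the PR:

```json
{
  "content": [
    {"type": "text", "text": "Optional caption"},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64_ENCODED_IMAGE_HERE>"}}
  ]
}
```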
-
I'm starting a llama.cpp server using the following command:

```bash
llama-server -m "/path/model.gguf" --mmproj "mmproj.gguf"
```

When I send an HTTP request to http://localhost:8080/embedding with a text payload (a minimal example is sketched below), it works and returns an embedding vector, but I would like to send an image instead of text. However, I don't know how to do this. Is it even possible with the current version?

Thanks 🙏
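For reference, a text-only request of the kind described above might look like this. The exact payload in the original post was not preserved, so `content` as the field name is an assumption based on the /embedding examples elsewhere in this thread:

```bash
# Text-only embedding request; returns an embedding vector if the server is up.
curl -s http://localhost:8080/embedding \
    -H "Content-Type: application/json" \
    -d '{"content": "hello world"}' | jq .
```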