[Bug] Unable to Pass Messages into Prediction in dspy #7844

pretbc · 2025-02-24T14:16:02Z

What happened?

Description:
I am trying to pass messages containing audio data into dspy.Predict, but it seems that the model is analyzing the base64 string of the audio instead of properly processing the audio content.

Code Snippet:

import base64
import os
import pathlib

import dspy

lm = dspy.LM(
    "gemini-2.0-flash-exp", api_key=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
)
dspy.configure(lm=lm)

audio_path = "temp_segment_1894 1.wav"

audio_data = pathlib.Path(audio_path).read_bytes()
audio_data_base64 = base64.b64encode(audio_data).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze audio"},
            {
                "type": "image_url",
                "image_url": "data:audio/wav;base64,{}".format(
                    audio_data_base64
                ),
            },
        ],
    }
]
print(lm(messages=messages))  # This works correctly

classify = dspy.Predict("messages -> sentiment")
# Issue: passing messages here does not work; the output seems to analyze
# the base64 string instead of the actual audio content.
print(classify(messages=messages))
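
The encoding step in the snippet above can be isolated into a small helper. This is only a minimal sketch using the standard library: the names `AUDIO_MIME_TYPES`, `audio_to_data_uri`, and `build_audio_messages` are hypothetical and not part of dspy, and the `image_url` content type is copied from the snippet above rather than taken from any documented audio API.

```python
import base64
import pathlib

# Hypothetical mapping for illustration; extend it for other formats as needed.
AUDIO_MIME_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def audio_to_data_uri(path):
    """Read an audio file and return its contents as a base64 data URI."""
    file_path = pathlib.Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in AUDIO_MIME_TYPES:
        raise ValueError(f"Unsupported audio format: {suffix!r}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return f"data:{AUDIO_MIME_TYPES[suffix]};base64,{encoded}"


def build_audio_messages(path, prompt="Analyze audio"):
    """Build the same message structure used in the snippet above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # "image_url" mirrors the original snippet; it is an assumption
                # that the provider accepts audio data URIs in this field.
                {"type": "image_url", "image_url": audio_to_data_uri(path)},
            ],
        }
    ]
```

With this helper, `lm(messages=build_audio_messages(audio_path))` would reproduce the direct call that the report says works. It does not by itself make dspy.Predict treat the input as audio.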

Expected Behavior:

The model should process the audio data properly and return the sentiment analysis.

Observed Behavior:

dspy.Predict appears to be treating the base64 string as text instead of decoding and analyzing the actual audio.

Questions:

How should messages be passed to dspy.Predict correctly?
Is there a way to specify that messages contain audio data so the model processes it correctly?
Should a custom data structure or preprocessing step be added before passing messages?

Environment:

dspy version: 2.6.6
Model: gemini-2.0-flash-exp
Python version: 3.12

Any guidance on properly passing audio messages into Prediction would be greatly appreciated.

Steps to reproduce

provided as code snippet

DSPy version

2.6.6


okhat · 2025-02-24T15:32:09Z

Hey @pretbc! Audio is not supported. cc @isaacbmiller

pretbc · 2025-02-24T15:54:43Z

Any ETA? Or should I write it myself, based on dspy.Image?

isaacbmiller · 2025-02-24T15:56:02Z

feel free to write it yourself and tag me in the PR!

would be awesome to have an implementation - i just don't have bandwidth personally to do this rn

pretbc · 2025-02-24T21:13:32Z

I have the feature ready, but I cannot push it:

$ git push --set-upstream origin feature/audio_utils
remote: Permission to stanfordnlp/dspy.git denied to pretbc.
fatal: unable to access 'https://github.com/stanfordnlp/dspy.git/': The requested URL returned error: 403

pretbc added the "bug" label Feb 24, 2025

isaacbmiller mentioned this issue Feb 24, 2025: [Feature] DSPy Audio/Video Support Tracking #7847 (Open, 2 tasks)
