[Bug] Unable to Pass Messages into Prediction in dspy #7844

Open
pretbc opened this issue Feb 24, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@pretbc

pretbc commented Feb 24, 2025

What happened?

Description:
I am trying to pass messages containing audio data into dspy.Predict, but it seems that the model is analyzing the base64 string of the audio instead of properly processing the audio content.

Code Snippet:

import base64
import os
import pathlib

import dspy

lm = dspy.LM(
    "gemini-2.0-flash-exp", api_key=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
)
dspy.configure(lm=lm)

audio_path = "temp_segment_1894 1.wav"

# Read the WAV file and base64-encode it for embedding in a data URI.
audio_data = pathlib.Path(audio_path).read_bytes()
audio_data_base64 = base64.b64encode(audio_data).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze audio"},
            {
                "type": "image_url",
                "image_url": "data:audio/wav;base64,{}".format(
                    audio_data_base64
                ),
            },
        ],
    }
]
print(lm(messages=messages))  # This works correctly

classify = dspy.Predict('messages -> sentiment')
# Issue: Cannot pass messages; the output seems to analyze the base64 string instead of the actual audio content.
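
For readers who want to check the payload itself, here is a minimal standard-library sketch of how the `data:audio/wav;base64,...` string in the snippet above is built and how it can be decoded back to the original bytes. The helper names `to_data_uri` and `from_data_uri` are hypothetical (they are not part of dspy), and the sample bytes are a stand-in for a real WAV file.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (hypothetical helper, not a dspy API)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI back into its MIME type and raw bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Stand-in bytes; a real call would use pathlib.Path(audio_path).read_bytes().
sample = b"RIFF\x00\x00\x00\x00WAVEfmt "
uri = to_data_uri(sample, "audio/wav")
mime_type, decoded = from_data_uri(uri)
```

The round trip only confirms that the encoding is lossless; whether a given model endpoint actually interprets the URI as audio depends on how the request is assembled.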

Expected Behavior:

The model should process the audio data properly and return the sentiment analysis.

Observed Behavior:

dspy.Predict appears to be treating the base64 string as text instead of decoding and analyzing the actual audio.

Questions:

How should messages be passed to dspy.Predict correctly?
Is there a way to specify that messages contain audio data so the model processes it correctly?
Should a custom data structure or preprocessing step be added before passing messages?

Environment:

dspy version: 2.6.6
Model: gemini-2.0-flash-exp
Python version: 3.12

Any guidance on properly passing audio messages into Prediction would be greatly appreciated.
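
One plausible reading of the observed behavior, sketched with the standard library only: if the `messages` value is treated as an ordinary text field and rendered into the prompt, the base64 payload ends up inside the prompt text rather than as a separate media part. This is an assumption about the general mechanism, not a description of dspy's internals; `render_as_text_field` and `media_parts` are hypothetical helpers written for illustration.

```python
import base64

# Fake audio bytes, used only so the example is self-contained.
audio_b64 = base64.b64encode(b"\x00\x01fake-audio-bytes").decode("ascii")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze audio"},
            {"type": "image_url", "image_url": f"data:audio/wav;base64,{audio_b64}"},
        ],
    }
]


def render_as_text_field(name: str, value: object) -> str:
    """Hypothetical stand-in: format any value into prompt text via str()."""
    return f"{name}: {value}"


def media_parts(msgs: list[dict]) -> list[dict]:
    """Collect the non-text content parts that a multimodal request keeps separate."""
    return [
        part
        for msg in msgs
        for part in msg["content"]
        if part["type"] != "text"
    ]


prompt_text = render_as_text_field("messages", messages)
parts = media_parts(messages)
```

In the first form the model only ever receives characters, so analyzing the base64 string is the expected outcome; in the second form the media part stays distinct, which matches the direct `lm(messages=messages)` call that works.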

Steps to reproduce

Provided as the code snippet above.

DSPy version

2.6.6

@pretbc added the bug label Feb 24, 2025
@okhat
Collaborator

okhat commented Feb 24, 2025

Hey @pretbc! Audio is not supported. cc @isaacbmiller

@pretbc
Author

pretbc commented Feb 24, 2025

Any ETA? Or should I write it myself, based on dspy.Image?

@isaacbmiller
Collaborator

feel free to write it yourself and tag me in the PR!

would be awesome to have an implementation - i just don't have bandwidth personally to do this rn

@pretbc
Author

pretbc commented Feb 24, 2025

I have the feature implemented, but I cannot push it:

$ git push --set-upstream origin feature/audio_utils
remote: Permission to stanfordnlp/dspy.git denied to pretbc.
fatal: unable to access 'https://github.com/stanfordnlp/dspy.git/': The requested URL returned error: 403
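
The 403 above is what GitHub returns when an account without write access pushes directly to a repository. The usual route for outside contributors is to push the branch to a personal fork and open a pull request from there. A small sketch that derives the fork URL and the corresponding git commands follows; it assumes a fork already exists under the pushing account, and the remote name `fork` and the helper `fork_url` are illustrative choices, not anything stated in the thread.

```python
def fork_url(upstream_url: str, username: str) -> str:
    """Swap the owner in a GitHub HTTPS URL for the given user (hypothetical helper)."""
    prefix = "https://github.com/"
    if not upstream_url.startswith(prefix):
        raise ValueError("expected an https://github.com/ URL")
    owner, _, repo = upstream_url[len(prefix):].rstrip("/").partition("/")
    if not owner or not repo:
        raise ValueError("expected a URL of the form https://github.com/<owner>/<repo>")
    return f"{prefix}{username}/{repo}"


# Values taken from the error message above; the fork itself is assumed to exist.
url = fork_url("https://github.com/stanfordnlp/dspy.git/", "pretbc")

# Commands to run in the local clone: add the fork as a remote, then push there.
commands = [
    f"git remote add fork {url}",
    "git push --set-upstream fork feature/audio_utils",
]
print("\n".join(commands))
```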


3 participants