Description:
I am trying to pass messages containing audio data into dspy.Predict, but it seems that the model is analyzing the base64 string of the audio instead of properly processing the audio content.
Code Snippet:
import base64
import os
import pathlib

import dspy

lm = dspy.LM(
    "gemini-2.0-flash-exp", api_key=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
)
dspy.configure(lm=lm)

# Read the audio file and encode it as base64 so it can be embedded in a data URI.
audio_path = "temp_segment_1894 1.wav"
audio_data = pathlib.Path(audio_path).read_bytes()
audio_data_base64 = base64.b64encode(audio_data).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze audio"},
            {
                "type": "image_url",
                "image_url": "data:audio/wav;base64,{}".format(audio_data_base64),
            },
        ],
    }
]
print(lm(messages=messages))  # This works correctly

classify = dspy.Predict("messages -> sentiment")
# Issue: Cannot pass messages; the output seems to analyze the base64 string instead of the actual audio content.
result = classify(messages=messages)
Expected Behavior:
The model should process the audio data properly and return the sentiment analysis.
Observed Behavior:
dspy.Predict appears to be treating the base64 string as text instead of decoding and analyzing the actual audio.
Questions:
How should messages be passed to dspy.Predict correctly?
Is there a way to specify that messages contain audio data so the model processes it correctly?
Should a custom data structure or preprocessing step be added before passing messages?
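To make the preprocessing question concrete, here is a minimal sketch of such a step, using only the standard library. It wraps the encoding from the snippet above in two helpers. The names `audio_to_data_uri` and `build_audio_messages` and the MIME table are hypothetical and are not part of DSPy. The message shape simply mirrors the one that already works with a direct `lm(messages=...)` call. Whether `dspy.Predict` accepts this structure as an input field is exactly the open question here.

```python
import base64
import pathlib

# Illustrative assumption: a small extension-to-MIME table, not taken from DSPy.
AUDIO_MIME_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg", ".flac": "audio/flac"}


def audio_to_data_uri(path):
    """Read an audio file and return it as a base64 data URI (hypothetical helper)."""
    p = pathlib.Path(path)
    suffix = p.suffix.lower()
    if suffix not in AUDIO_MIME_TYPES:
        raise ValueError(f"unsupported audio extension: {suffix!r}")
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return f"data:{AUDIO_MIME_TYPES[suffix]};base64,{encoded}"


def build_audio_messages(prompt, path):
    """Build a one-turn message list shaped like the snippet above (hypothetical helper)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # Same "image_url" part type the original snippet uses for audio.
                {"type": "image_url", "image_url": audio_to_data_uri(path)},
            ],
        }
    ]
```

With this in place, `lm(messages=build_audio_messages("Analyze audio", audio_path))` would be equivalent to the direct call in the snippet, which keeps the encoding logic in one place while the `dspy.Predict` side is sorted out.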
Environment:
dspy version: [2.6.6]
Model: gemini-2.0-flash-exp
Python version: [3.12]
Any guidance on properly passing audio messages into dspy.Predict would be greatly appreciated.
Steps to reproduce
Provided in the code snippet above.
DSPy version
2.6.6
$ git push --set-upstream origin feature/audio_utils
remote: Permission to stanfordnlp/dspy.git denied to pretbc.
fatal: unable to access 'https://github.com/stanfordnlp/dspy.git/': The requested URL returned error: 403