Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] DSPy Audio/Video Support Tracking #7847

Open
2 tasks
isaacbmiller opened this issue Feb 24, 2025 · 2 comments
Open
2 tasks

[Feature] DSPy Audio/Video Support Tracking #7847

isaacbmiller opened this issue Feb 24, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@isaacbmiller
Copy link
Collaborator

isaacbmiller commented Feb 24, 2025

What feature would you like to see?

We have received a number of requests for Audio and Video input support over the last few months (#2037, #7844, etc.)

I implemented DSPy.Image, and am looking for someone to help out and create similar or better implementations for audio and/or video inputs. It would be shocking to me if some good prompting and few shot suppport for audio would greatly help in some use cases, and also being able to script with audio in the same way that you can with text inputs.

For someone to implement this, there are a few required steps for the implementation I am imagining:

  1. create a class similar to Image (see adapters/image_utils.py)
  2. edit chat_adapter and json_adapter to have a try_expand_audio_tags method that will search and expand messages with multimodal inputs
  3. Write tests similar to tests/signatures/test_adapter_image.py to make sure it can work with a variety of signature types and input methods

I don't know much about the audio input APIs to really know what the speedbumps on this implementation are going to be.

As a first step, I would choose either the OpenAI API or Gemini, and get it working for that provider with whatever hacky code is needed, then expand and abstract after that.

feel free to @ me in the discord username is ibmiller if you need help

Would you like to contribute?

  • Yes, I'd like to help implement this.
  • No, I just want to request it.

Additional Context

No response

@isaacbmiller isaacbmiller added the enhancement New feature or request label Feb 24, 2025
@pretbc
Copy link

pretbc commented Feb 25, 2025

Yes, I'd like to help implement this.

@ramisbahi
Copy link

Yes, I'd like to help implement this: I'd like to help with implementing support for Video input. Additionally, I'm looking into benchmarking video understanding performance to evaluate how well DSPy can process video-based inputs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants