Support audio input
johnd0e committed Nov 26, 2024
1 parent 3cfee3e commit 1c0a0f3
Showing 2 changed files with 21 additions and 4 deletions.
10 changes: 10 additions & 0 deletions readme.MD
@@ -106,6 +106,16 @@ or "models/". Otherwise, these defaults apply:
 
 [model]: https://ai.google.dev/gemini-api/docs/models/gemini
 
+
+## Media
+
+[Vision] and [audio] input supported as per OpenAI [specs].
+Implemented via [`inlineData`](https://ai.google.dev/api/caching#Part).
+
+[vision]: https://platform.openai.com/docs/guides/vision
+[audio]: https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-in
+[specs]: https://platform.openai.com/docs/api-reference/chat/create
+
 ---
 
 ## Possible further development
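
As an illustration of the readme addition above, a client request carrying audio might be shaped like the sketch below. The `input_audio` content part with its `data` and `format` fields follows the OpenAI chat completions format the readme links to; the helper name `buildAudioRequest`, the model name, and the placeholder base64 string are assumptions made for this example and are not part of the commit.

```javascript
// Hypothetical helper (not part of the commit): builds an OpenAI-style
// chat completions request body with one text part and one audio part.
const buildAudioRequest = (base64Audio, format) => ({
  model: "gemini-1.5-flash", // assumed model name, for illustration only
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What is said in this recording?" },
        // "data" is base64-encoded audio; "format" is e.g. "wav" or "mp3".
        { type: "input_audio", input_audio: { data: base64Audio, format } },
      ],
    },
  ],
});

// "UklGRg==" is a placeholder string, not a playable audio file.
const audioRequestBody = buildAudioRequest("UklGRg==", "wav");
console.log(JSON.stringify(audioRequestBody, null, 2));
```

The body would be sent as JSON to the proxy's chat completions endpoint in the same way as a text-only request.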
15 changes: 11 additions & 4 deletions src/worker.mjs
@@ -263,11 +263,10 @@ const transformMsg = async ({ role, content }) => {
     parts.push({ text: content });
     return { role, parts };
   }
-  // OpenAI "model": "gpt-4-vision-preview"
   // user:
-  // An array of content parts with a defined type, each can be of type text or image_url when passing in images.
-  // You can pass multiple images by adding multiple image_url content parts.
-  // Image input is only supported when using the gpt-4-visual-preview model.
+  // An array of content parts with a defined type.
+  // Supported options differ based on the model being used to generate the response.
+  // Can contain text, image, or audio inputs.
   for (const item of content) {
     switch (item.type) {
       case "text":
@@ -276,6 +275,14 @@ const transformMsg = async ({ role, content }) => {
       case "image_url":
         parts.push(await parseImg(item.image_url.url));
         break;
+      case "input_audio":
+        parts.push({
+          inlineData: {
+            mimeType: "audio/" + item.input_audio.format,
+            data: item.input_audio.data,
+          }
+        });
+        break;
       default:
         throw new TypeError(`Unknown "content" item type: "${item.type}"`);
     }
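
To make the effect of the new `input_audio` case concrete, here is a simplified, standalone sketch of the mapping. It is not the actual `transformMsg`: the function name `toGeminiParts` is hypothetical, the `image_url` branch is left out because `parseImg` is not shown in this commit, and the body of the `"text"` case is an assumption since the diff collapses it.

```javascript
// Standalone sketch (assumptions noted above): maps OpenAI-style content
// to a list of Gemini-style parts, covering only text and audio items.
const toGeminiParts = (content) => {
  if (!Array.isArray(content)) {
    // Plain string content becomes a single text part.
    return [{ text: content }];
  }
  return content.map((item) => {
    switch (item.type) {
      case "text":
        // Assumed shape; this branch is collapsed in the diff.
        return { text: item.text };
      case "input_audio":
        // Mirrors the added case: the format becomes the MIME subtype and
        // the base64 payload is passed through unchanged.
        return {
          inlineData: {
            mimeType: "audio/" + item.input_audio.format,
            data: item.input_audio.data,
          },
        };
      default:
        throw new TypeError(`Unknown "content" item type: "${item.type}"`);
    }
  });
};

const geminiParts = toGeminiParts([
  { type: "text", text: "Transcribe this clip." },
  { type: "input_audio", input_audio: { data: "UklGRg==", format: "mp3" } },
]);
console.log(geminiParts[1].inlineData.mimeType); // prints "audio/mp3"
```

Note that the MIME type is built by plain string concatenation, so a `format` of `"mp3"` yields `audio/mp3` and `"wav"` yields `audio/wav`; no validation of the format value happens at this step.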