OpenAI has recently introduced audio multimodality support, both for input and output.
Input audio modality support was introduced in #1560, all the way up to the Spring AI abstractions.
The output audio modality is only supported at the lower level (OpenAiApi). Its usage is demonstrated in this integration test:
spring-ai/models/spring-ai-openai/src/test/java/org/springframework/ai/openai/api/OpenAiApiIT.java
Lines 98 to 118 in bdb66e5
```java
@Test
void outputAudio() {
	ChatCompletionMessage chatCompletionMessage = new ChatCompletionMessage(
			"What is the magic spell to make objects fly?", Role.USER);
	ChatCompletionRequest.AudioParameters audioParameters = new ChatCompletionRequest.AudioParameters(
			ChatCompletionRequest.AudioParameters.Voice.NOVA,
			ChatCompletionRequest.AudioParameters.AudioResponseFormat.MP3);
	ChatCompletionRequest chatCompletionRequest = new ChatCompletionRequest(List.of(chatCompletionMessage),
			OpenAiApi.ChatModel.GPT_4_O_AUDIO_PREVIEW.getValue(), audioParameters);

	ResponseEntity<ChatCompletion> response = openAiApi.chatCompletionEntity(chatCompletionRequest);

	assertThat(response).isNotNull();
	assertThat(response.getBody()).isNotNull();
	assertThat(response.getBody().usage().promptTokenDetails().audioTokens()).isEqualTo(0);
	assertThat(response.getBody().usage().completionTokenDetails().audioTokens()).isGreaterThan(0);
	assertThat(response.getBody().choices().get(0).message().audioOutput().data()).isNotNull();
	assertThat(response.getBody().choices().get(0).message().audioOutput().transcript())
		.containsIgnoringCase("leviosa");
}
```
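
For context, here is a rough sketch of what consuming the audio output looks like at this low level today. It assumes the same `openAiApi` instance and `chatCompletionRequest` as in the test above, and that the `data()` field holds the audio payload as a base64-encoded string (MP3 was requested via `AudioParameters`):

```java
// Sketch only: consuming the low-level audio output directly, outside any Spring AI abstraction.
// Assumes the same openAiApi and chatCompletionRequest as in the integration test above.
ResponseEntity<ChatCompletion> response = openAiApi.chatCompletionEntity(chatCompletionRequest);
ChatCompletionMessage assistantMessage = response.getBody().choices().get(0).message();

// Decode the base64 audio payload and write it out as an MP3 file.
byte[] audioBytes = java.util.Base64.getDecoder().decode(assistantMessage.audioOutput().data());
java.nio.file.Files.write(java.nio.file.Path.of("spell.mp3"), audioBytes);

// The transcript accompanies the audio and can serve as the textual content of the response.
String transcript = assistantMessage.audioOutput().transcript();
```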
It would be nice to start identifying what abstractions are needed in the ChatResponse API to include audio response data; a rough sketch of one possible direction follows below.
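
Purely as a discussion starter, one option would be to attach an audio payload to the AssistantMessage returned inside ChatResponse, mirroring how input media is already modeled on user messages. The record name, its fields, and the accessor shown in the usage comments below are illustrative assumptions, not an existing Spring AI API:

```java
import org.springframework.util.MimeType;

// Illustrative sketch only; names and placement are assumptions for discussion.
public record AudioOutput(
		String id,           // provider-side identifier of the generated audio, if any
		byte[] data,         // decoded audio bytes (e.g. the MP3 returned by OpenAI)
		MimeType mimeType,   // e.g. MimeTypeUtils.parseMimeType("audio/mpeg")
		String transcript) { // text transcript of the spoken answer
}

// Hypothetical consumption from the portable API:
// ChatResponse chatResponse = chatModel.call(new Prompt("What is the magic spell to make objects fly?"));
// AudioOutput audio = chatResponse.getResult().getOutput().getAudioOutput();
// Files.write(Path.of("spell.mp3"), audio.data());
```

An alternative would be to reuse the existing Media abstraction on the output side and carry the transcript in the message text or metadata; either way, the portable API should expose both the audio bytes and the transcript without forcing users down to OpenAiApi.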