Fix Issue #211 : Improved Embedding Performance by Handling Base64 Encoding #295
Closed
This commit includes the fix described in Issue openai#211.
* Addressed the issue where Base64 encoding could not be handled.
* Improved performance by using Base64 encoding by default.
With this PR, code like the following will run:
package com.openai.example;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import com.openai.azure.AzureOpenAIServiceVersion;
import com.openai.azure.credential.AzureApiKeyCredential;
import com.openai.client.OpenAIClient;
import com.openai.client.OpenAIClientAsync;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.client.okhttp.OpenAIOkHttpClientAsync;
import com.openai.models.CreateEmbeddingResponse;
import com.openai.models.EmbeddingCreateParams;
import com.openai.services.blocking.EmbeddingService;
import com.openai.models.EmbeddingModel;
public final class EmbeddingsExample {
private EmbeddingsExample() {}
private static final String AZURE_OPENAI_ENDPOINT = "https://$AOAI.openai.azure.com";
// Placeholder: substitute your Azure OpenAI API key
private static final String AZURE_OPENAI_KEY = "$AOAI_KEY";
private static OpenAIClient client;
public static void main(String[] args) {
client = OpenAIOkHttpClient.builder().baseUrl(AZURE_OPENAI_ENDPOINT)
.credential(AzureApiKeyCredential.create(AZURE_OPENAI_KEY))
.azureServiceVersion(AzureOpenAIServiceVersion.getV2024_02_15_PREVIEW()).build();
EmbeddingsExample example = new EmbeddingsExample();
example.basicSample();
example.multipleDataSample();
example.asyncSample();
}
// Basic Sample
public void basicSample() {
EmbeddingService embeddings = client.embeddings();
String singlePoem = "In the quiet night, stars whisper secrets, dreams take flight.";
// No specified format
EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
.input(singlePoem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString()).build();
embeddings.create(embeddingCreateParams).data().forEach(embedding -> {
// It will show both base64 and float embedding
System.out.println("EMBEDDING (Default non-format toString()) -------------"
+ embedding.toString());
});
System.out.println("------------------------------------------------");
// Specified FloatEmbedding format
EmbeddingCreateParams embeddingCreateParams2 = EmbeddingCreateParams.builder()
.input(singlePoem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString())
.encodingFormat(EmbeddingCreateParams.EncodingFormat.FLOAT).build();
// Use the params built with the FLOAT encoding format
embeddings.create(embeddingCreateParams2).data().forEach(embedding -> {
embedding.embedding().getFloatEmbedding().ifPresent(emb -> {
System.out.println("EMBEDDING FLOAT (Float Embedding) -------------" + emb);
});
embedding.embedding().getBase64Embedding().ifPresent(emb -> {
System.out.println("EMBEDDING BASE64 (Float Embedding) -------------" + emb);
});
System.out.println("------------------------------------------------");
});
// Specified Base64Embedding format
EmbeddingCreateParams embeddingCreateParams3 = EmbeddingCreateParams.builder()
.input(singlePoem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString())
.encodingFormat(EmbeddingCreateParams.EncodingFormat.BASE64).build();
// Use the params built with the BASE64 encoding format
embeddings.create(embeddingCreateParams3).data().forEach(embedding -> {
embedding.embedding().getBase64Embedding().ifPresent(emb -> {
System.out.println("EMBEDDING BASE64 (Base64 Embedding) -------------" + emb);
});
embedding.embedding().getFloatEmbedding().ifPresent(emb -> {
System.out.println("EMBEDDING FLOAT (Base64 Embedding) -------------" + emb);
});
System.out.println("------------------------------------------------");
});
}
// Multiple Data Sample
public void multipleDataSample() {
EmbeddingService embeddings = client.embeddings();
getPoems().forEach(poem -> {
System.out.println("POEM (START) -------------" + poem);
EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
.input(poem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString()).build();
embeddings.create(embeddingCreateParams).data().forEach(embedding -> {
embedding.embedding().getFloatEmbedding().ifPresent(emb -> {
System.out.println("EMBEDDING Float (by Default) -------------" + emb);
});
});
System.out.println("POEM (END) -------------" + poem);
});
}
// Async Sample
public void asyncSample() {
CountDownLatch latch = new CountDownLatch(1);
try {
OpenAIClientAsync asyncClient = OpenAIOkHttpClientAsync.builder()
.baseUrl(AZURE_OPENAI_ENDPOINT)
.credential(AzureApiKeyCredential.create(AZURE_OPENAI_KEY))
.azureServiceVersion(AzureOpenAIServiceVersion.getV2024_02_15_PREVIEW())
.build();
CompletableFuture<CreateEmbeddingResponse> completableFuture = asyncClient.embeddings()
.create(EmbeddingCreateParams.builder()
.input("The quick brown fox jumped over the lazy dog")
.model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL)
.encodingFormat(EmbeddingCreateParams.EncodingFormat.FLOAT)
.user("user-1234").build());
completableFuture.thenAccept(response -> {
response.validate();
response.data().forEach(embedding -> {
System.out.println("EMBEDDING (Async) -------------" + embedding.toString());
});
// Release the latch once, after every embedding has been printed
latch.countDown();
}).exceptionally(ex -> {
System.err.println("Error: " + ex.getMessage());
latch.countDown();
return null;
});
latch.await();
System.out.println("Latch count down completed");
System.exit(0);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
private List<String> getPoems() {
List<String> poems = new ArrayList<>();
poems.add("In the quiet night, stars whisper secrets, dreams take flight.");
poems.add("Beneath the moon's glow, shadows dance, hearts begin to know.");
poems.add("Waves crash on the shore, time stands still, love forevermore.");
poems.add("Autumn leaves fall, painting the ground, nature's final call.");
poems.add("Morning dew glistens, a new day dawns, hope always listens.");
poems.add("Mountains stand tall, silent guardians, witnessing it all.");
poems.add("In a field of green, flowers bloom bright, a serene scene.");
poems.add("Winter's chill bites, fireside warmth, cozy, long nights.");
poems.add("Spring's gentle breeze, life awakens, hearts find ease.");
poems.add("Sunset hues blend, day meets night, a perfect end.");
return poems;
}
}
Due to a conflict, it was faster to create a new pull request than to resolve the conflict in the existing one.
This commit includes the fix described in Issue #211.
Detail
This pull request to `openai-java-core` includes significant changes to the `Embedding` model to support both float and Base64 representations of embedding vectors. The most important changes include the introduction of a new `EmbeddingValue` class, updates to the `Embedding` class to use this new type, and modifications to the associated deserialization and test files.

Changes to `Embedding` model:
* `openai-java-core/src/main/kotlin/com/openai/models/Embedding.kt`: Changed the type of `embedding` from `JsonField<List<Double>>` to `JsonField<EmbeddingValue>`, updated methods to return `EmbeddingValue` instead of `List<Double>`, and modified the builder to handle `EmbeddingValue`.

Introduction of `EmbeddingValue` class:
* `openai-java-core/src/main/kotlin/com/openai/models/EmbeddingValue.kt`: Added a new class `EmbeddingValue` to represent embedding vectors, supporting both float and Base64 representations.

Deserialization updates:
* `openai-java-core/src/main/kotlin/com/openai/models/EmbeddingValueDeserializer.kt`: Added a new deserializer `EmbeddingValueDeserializer` to handle JSON deserialization for `EmbeddingValue`.

Default encoding format change:
* `openai-java-core/src/main/kotlin/com/openai/models/EmbeddingCreateParams.kt`: Set the default `EncodingFormat` value to `BASE64` for performance improvements.

Test updates:
* `openai-java-core/src/test/kotlin/com/openai/models/CreateEmbeddingResponseTest.kt`: Updated tests to use `EmbeddingValue` instead of `List<Double>`.
* `openai-java-core/src/test/kotlin/com/openai/models/EmbeddingTest.kt`: Updated tests to verify the new `EmbeddingValue` structure.
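
For readers unfamiliar with the Base64 representation, the sketch below shows one way a Base64 embedding string could be turned back into floats using only the JDK. It assumes the payload is a packed array of little-endian 32-bit floats, which is how Base64 embeddings are commonly laid out. This is not the code from this PR: the class `Base64EmbeddingSketch` and its `decode` and `encode` methods are hypothetical names used for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of openai-java-core.
public final class Base64EmbeddingSketch {
    private Base64EmbeddingSketch() {}

    // Decodes a Base64 string that is assumed to hold packed little-endian float32 values.
    public static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[bytes.length / Float.BYTES];
        // The float view inherits the little-endian order set above.
        buffer.asFloatBuffer().get(values);
        return values;
    }

    // Encodes floats with the same assumed layout; used here only to build sample input.
    public static String encode(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : values) {
            buffer.putFloat(value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    public static void main(String[] args) {
        float[] sample = {0.25f, -1.5f, 3.0f};
        String wire = encode(sample);
        float[] decoded = decode(wire);
        System.out.println(wire + " -> " + Arrays.toString(decoded));
    }
}
```

Each float occupies 4 bytes, or roughly 5.3 Base64 characters, whereas the JSON decimal text for a float is typically longer, which is one plausible reason a Base64 default reduces payload size and parsing work.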