
Fix Issue #211: Improved Embedding Performance by Handling Base64 Encoding #295


Closed
wants to merge 1 commit into from


yoshioterada

This commit includes the fix described in Issue #211.

  • Addressed the issue where Base64 encoding could not be handled.
  • Improved performance by using Base64 encoding by default.
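The bullet above attributes the speed-up to the Base64 default. One plausible reason is payload size: a float32 vector sent as Base64 costs exactly 4/3 characters per byte, while a JSON array spends roughly 10 or more characters per number. The sketch below compares the two for a synthetic 1536-dimension vector (the default output size of `text-embedding-3-small`). The class name `EmbeddingPayloadSize` is hypothetical, the vector is random, and Java's `Float.toString` formatting only approximates how the service formats JSON numbers, so the JSON figure is an estimate rather than a measurement.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;
import java.util.Random;
import java.util.StringJoiner;

// Hypothetical illustration, not part of openai-java.
public final class EmbeddingPayloadSize {
    private EmbeddingPayloadSize() {}

    // Approximate length of the vector serialized as a JSON array, e.g. [0.1,-0.2,...]
    public static int jsonFloatArrayLength(float[] vector) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float v : vector) {
            joiner.add(Float.toString(v));
        }
        return joiner.toString().length();
    }

    // Length of the vector as Base64 over little-endian IEEE-754 float32 bytes (assumed wire format)
    public static int base64Length(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buffer.putFloat(v);
        }
        return Base64.getEncoder().encodeToString(buffer.array()).length();
    }

    public static void main(String[] args) {
        // Synthetic, seeded vector with small values similar in scale to real embeddings
        float[] vector = new float[1536];
        Random random = new Random(42);
        for (int i = 0; i < vector.length; i++) {
            vector[i] = (float) (random.nextGaussian() * 0.05);
        }
        System.out.println("JSON float array: " + jsonFloatArrayLength(vector) + " chars");
        System.out.println("Base64 float32:   " + base64Length(vector) + " chars");
    }
}
```

The Base64 length of a 1536-dimension vector is always 8,192 characters (6,144 bytes × 4/3), independent of the values, whereas the JSON length grows with the number of printed digits.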

Detail

This pull request to openai-java-core changes the Embedding model so that it supports both float and Base64 representations of embedding vectors. The main changes are a new EmbeddingValue class, an update to the Embedding class to use this new type, and corresponding changes to deserialization and to the tests.
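To make the idea of a value that holds either representation concrete, here is a minimal sketch of such a type. It is a hypothetical stand-in, not the `EmbeddingValue` class from this PR: the names `EmbeddingValueSketch`, `ofFloats`, `ofBase64`, and `asFloats` are invented for illustration, and it assumes the Base64 payload encodes little-endian IEEE-754 float32 values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of an "either floats or Base64" embedding value; not the PR's class.
public final class EmbeddingValueSketch {
    private final List<Float> floats; // set when the response used encoding_format=float
    private final String base64;      // set when the response used encoding_format=base64

    private EmbeddingValueSketch(List<Float> floats, String base64) {
        this.floats = floats;
        this.base64 = base64;
    }

    public static EmbeddingValueSketch ofFloats(List<Float> floats) {
        return new EmbeddingValueSketch(
                Collections.unmodifiableList(new ArrayList<>(Objects.requireNonNull(floats))), null);
    }

    public static EmbeddingValueSketch ofBase64(String base64) {
        return new EmbeddingValueSketch(null, Objects.requireNonNull(base64));
    }

    public Optional<List<Float>> floatEmbedding() {
        return Optional.ofNullable(floats);
    }

    public Optional<String> base64Embedding() {
        return Optional.ofNullable(base64);
    }

    // Returns the vector as floats whichever representation was received.
    // Assumption: Base64 data is little-endian IEEE-754 float32.
    public List<Float> asFloats() {
        if (floats != null) {
            return floats;
        }
        ByteBuffer buffer = ByteBuffer.wrap(Base64.getDecoder().decode(base64))
                .order(ByteOrder.LITTLE_ENDIAN);
        List<Float> decoded = new ArrayList<>(buffer.remaining() / Float.BYTES);
        while (buffer.remaining() >= Float.BYTES) {
            decoded.add(buffer.getFloat());
        }
        return Collections.unmodifiableList(decoded);
    }
}
```

For example, `EmbeddingValueSketch.ofBase64("AACAPwAAIMAAAIA+").asFloats()` yields `[1.0, -2.5, 0.25]`. Keeping the raw Base64 string and decoding only on request avoids parsing thousands of JSON numbers when the caller just stores or forwards the vector.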

Changes to Embedding model:

Introduction of EmbeddingValue class:

Deserialization updates:

Default encoding format change:

Test updates:

yoshioterada requested a review from a team as a code owner on March 11, 2025 06:06
yoshioterada (Author)

With this PR applied, client code looks like the following example.

package com.openai.example;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import com.openai.azure.AzureOpenAIServiceVersion;
import com.openai.azure.credential.AzureApiKeyCredential;
import com.openai.client.OpenAIClient;
import com.openai.client.OpenAIClientAsync;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.client.okhttp.OpenAIOkHttpClientAsync;
import com.openai.models.CreateEmbeddingResponse;
import com.openai.models.EmbeddingCreateParams;
import com.openai.services.blocking.EmbeddingService;
import com.openai.models.EmbeddingModel;

public final class EmbeddingsExample {
    private EmbeddingsExample() {}

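    // Replace $AOAI with your Azure OpenAI resource name.
    private static final String AZURE_OPENAI_ENDPOINT = "https://$AOAI.openai.azure.com";
    // Read the API key from an environment variable (the variable name is a placeholder).
    private static final String AZURE_OPENAI_KEY = System.getenv("AZURE_OPENAI_KEY");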

    private static OpenAIClient client;

    public static void main(String[] args) {
        client = OpenAIOkHttpClient.builder().baseUrl(AZURE_OPENAI_ENDPOINT)
                .credential(AzureApiKeyCredential.create(AZURE_OPENAI_KEY))
                .azureServiceVersion(AzureOpenAIServiceVersion.getV2024_02_15_PREVIEW()).build();
        EmbeddingsExample example = new EmbeddingsExample();
        example.basicSample();
        example.multipleDataSample();
        example.asyncSample();
    }

    // Basic Sample
    public void basicSample() {
        EmbeddingService embeddings = client.embeddings();
        String singlePoem = "In the quiet night, stars whisper secrets, dreams take flight.";

        // No encoding format specified (this PR makes Base64 the default)
        EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
                .input(singlePoem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString()).build();
        embeddings.create(embeddingCreateParams).data().forEach(embedding -> {
            // toString() shows both the Base64 and the float embedding
            System.out.println("EMBEDDING (default format, toString()) -------------"
                    + embedding.toString());
        });
        System.out.println("------------------------------------------------");

        // FLOAT encoding format specified
        EmbeddingCreateParams embeddingCreateParams2 = EmbeddingCreateParams.builder()
                .input(singlePoem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString())
                .encodingFormat(EmbeddingCreateParams.EncodingFormat.FLOAT).build();
        // Use the FLOAT params built above (the original reused the default-format params)
        embeddings.create(embeddingCreateParams2).data().forEach(embedding -> {
            embedding.embedding().getFloatEmbedding().ifPresent(emb -> {
                System.out.println("EMBEDDING FLOAT (float format requested) -------------" + emb);
            });
            embedding.embedding().getBase64Embedding().ifPresent(emb -> {
                System.out.println("EMBEDDING BASE64 (float format requested) -------------" + emb);
            });
            System.out.println("------------------------------------------------");
        });

        // BASE64 encoding format specified
        EmbeddingCreateParams embeddingCreateParams3 = EmbeddingCreateParams.builder()
                .input(singlePoem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString())
                .encodingFormat(EmbeddingCreateParams.EncodingFormat.BASE64).build();
        // Use the BASE64 params built above (the original reused the default-format params)
        embeddings.create(embeddingCreateParams3).data().forEach(embedding -> {
            embedding.embedding().getBase64Embedding().ifPresent(emb -> {
                System.out.println("EMBEDDING BASE64 (Base64 format requested) -------------" + emb);
            });
            embedding.embedding().getFloatEmbedding().ifPresent(emb -> {
                System.out.println("EMBEDDING FLOAT (Base64 format requested) -------------" + emb);
            });
            System.out.println("------------------------------------------------");
        });
    }

    // Multiple Data Sample
    public void multipleDataSample() {
        EmbeddingService embeddings = client.embeddings();

        getPoems().forEach(poem -> {
            System.out.println("POEM (START) -------------" + poem);

            EmbeddingCreateParams embeddingCreateParams = EmbeddingCreateParams.builder()
                    .input(poem).model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL.asString()).build();
            embeddings.create(embeddingCreateParams).data().forEach(embedding -> {
                embedding.embedding().getFloatEmbedding().ifPresent(emb -> {
                    System.out.println("EMBEDDING FLOAT (default format) -------------" + emb);
                });
            });
            System.out.println("POEM (END) -------------" + poem);
        });
    }

    // Async Sample
    public void asyncSample() {
        CountDownLatch latch = new CountDownLatch(1);
        try {
            OpenAIClientAsync asyncClient = OpenAIOkHttpClientAsync.builder()
                    .baseUrl(AZURE_OPENAI_ENDPOINT)
                    .credential(AzureApiKeyCredential.create(AZURE_OPENAI_KEY))
                    .azureServiceVersion(AzureOpenAIServiceVersion.getV2024_02_15_PREVIEW())
                    .build();
            CompletableFuture<CreateEmbeddingResponse> completableFuture = asyncClient.embeddings()
                    .create(EmbeddingCreateParams.builder()
                            .input("The quick brown fox jumped over the lazy dog")
                            .model(EmbeddingModel.TEXT_EMBEDDING_3_SMALL)
                            .encodingFormat(EmbeddingCreateParams.EncodingFormat.FLOAT)
                            .user("user-1234").build());

            completableFuture.thenAccept(response -> {
                response.validate();
                response.data().forEach(embedding -> {
                    System.out.println("EMBEDDING (async) -------------" + embedding.toString());
                });
                // Release the latch once, after every embedding has been printed
                latch.countDown();
            }).exceptionally(ex -> {
                System.err.println("Error: " + ex.getMessage());
                latch.countDown();
                return null;
            });
            latch.await();
            System.out.println("Latch count down completed");
            // Exit explicitly: the async client's worker threads may otherwise keep the JVM alive
            System.exit(0);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private List<String> getPoems() {
        List<String> poems = new ArrayList<>();
        poems.add("In the quiet night, stars whisper secrets, dreams take flight.");
        poems.add("Beneath the moon's glow, shadows dance, hearts begin to know.");
        poems.add("Waves crash on the shore, time stands still, love forevermore.");
        poems.add("Autumn leaves fall, painting the ground, nature's final call.");
        poems.add("Morning dew glistens, a new day dawns, hope always listens.");
        poems.add("Mountains stand tall, silent guardians, witnessing it all.");
        poems.add("In a field of green, flowers bloom bright, a serene scene.");
        poems.add("Winter's chill bites, fireside warmth, cozy, long nights.");
        poems.add("Spring's gentle breeze, life awakens, hearts find ease.");
        poems.add("Sunset hues blend, day meets night, a perfect end.");
        return poems;
    }
}

yoshioterada (Author)

Because of a merge conflict, it was faster to open a new PR than to resolve the conflict here, so I am closing this PR in favor of the new one below:

#303

yoshioterada deleted the fix-issue-211 branch on March 12, 2025 02:07