Perf: Improve vector embeddings creation by 60% #1310
manekinekko added a commit to manekinekko/openai-node that referenced this issue on Feb 8, 2025:
Requesting base64 encoded embeddings returns smaller body sizes, on average ~60% smaller than float32 encoded. In other words, the response body containing float32 embeddings is ~2.3x bigger than one containing base64 encoded embeddings. We always request embedding creation encoded as base64, and then decode the embeddings to float32 based on the user's provided encoding_format parameter. Closes openai#1310
RobertCraigie added a commit that referenced this issue on Mar 28, 2025:
* perf(embedding): always request embedding creation as base64. Requesting base64 encoded embeddings returns smaller body sizes, on average ~60% smaller than float32 encoded. In other words, the response body containing float32 embeddings is ~2.3x bigger than one containing base64 encoded embeddings. We always request embedding creation encoded as base64, and then decode the embeddings to float32 based on the user's provided encoding_format parameter. Closes #1310 Co-authored-by: Robert Craigie <[email protected]>
Merged
Closing as this will be fixed in the next release, 4.91.0: #1429
I appreciate it. Thank you @RobertCraigie
stainless-app bot pushed a commit that referenced this issue on Mar 31, 2025.
Confirm this is a feature request for the Node library and not the underlying OpenAI API.
Describe the feature or improvement you're requesting
The current implementation of the openai-node SDK does not specify a default value for the encoding_format argument. In the Python SDK, however, this defaults to base64.
After running a few benchmarks, requesting base64 encoded embeddings returns smaller body sizes, on average ~60% smaller than float32 encoded. In other words, the response body containing float32 embeddings is ~2.3x bigger than one containing base64 encoded embeddings.
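For intuition on where the ~60% figure comes from, here is a back-of-the-envelope calculation (the ~12 characters per JSON float is an assumption for illustration, not a measured constant):

```typescript
// Rough size comparison for a 1536-dimension embedding
// (the dimension of text-embedding-3-small).
const dims = 1536;

// Assumption: each float in a JSON array costs roughly 12 characters
// (sign, "0.", ~9 significant digits, comma separator).
const jsonBytes = dims * 12;

// Base64 encodes every 3 raw bytes as 4 characters; float32 is 4 bytes/value.
const base64Bytes = Math.ceil((dims * 4) / 3) * 4;

console.log(jsonBytes, base64Bytes, jsonBytes / base64Bytes);
// 18432 8192 2.25 — the same ballpark as the ~2.3x measured above
```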
This performance improvement could translate to:
This is the result of a request that creates embeddings from a 10 KB chunk, run 10 times (the numbers are the sizes of the response bodies in KB):
I think this can easily be patched as follows:
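The actual patch is in #1312; the code below is only a simplified sketch of the idea (the helper name withBase64Default is hypothetical, not part of the SDK): always send encoding_format: 'base64' to the API, but remember what the caller originally asked for so the response can be converted back.

```typescript
type EncodingFormat = 'float' | 'base64';

interface EmbeddingCreateParams {
  model: string;
  input: string | string[];
  encoding_format?: EncodingFormat;
}

// Hypothetical helper: force base64 on the wire, keep the caller's intent.
function withBase64Default(params: EmbeddingCreateParams): {
  params: EmbeddingCreateParams;
  userRequestedBase64: boolean;
} {
  return {
    params: { ...params, encoding_format: 'base64' },
    userRequestedBase64: params.encoding_format === 'base64',
  };
}

const { params, userRequestedBase64 } = withBase64Default({
  model: 'text-embedding-3-small',
  input: 'hello',
});
console.log(params.encoding_format, userRequestedBase64); // base64 false
```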
Something we need to keep in mind is that the default value currently specified by the REST API is float32. This means users expect to get a list of float32 values if they don't provide an encoding argument. This is a requirement; we don't want to break backward compatibility.
Also, we know base64 encoding is faster (fewer bytes going through the network). So no matter what the user asks for (float32 or base64), we can force the encoding to base64 when creating embeddings.
When we get the response back from the API, because of backward compatibility, we return a list of float32 values by default (decoding the base64 encoded embedding string). If the user explicitly asked for base64, we simply pass the response through.
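The decode step described above can be sketched as follows in Node.js (this is an illustration of the technique, not the SDK's actual implementation; the copy into a fresh buffer guarantees the 4-byte alignment Float32Array requires, since pooled Node Buffers may start at an unaligned offset):

```typescript
function base64ToFloat32Array(b64: string): Float32Array {
  const buf = Buffer.from(b64, 'base64');
  // Copy into a fresh ArrayBuffer: pooled Node Buffers may sit at an
  // unaligned byteOffset, which the Float32Array constructor would reject.
  const bytes = new Uint8Array(buf.length);
  bytes.set(buf);
  return new Float32Array(bytes.buffer);
}

// Round-trip check with exactly representable float32 values.
const original = new Float32Array([0.5, -0.25, 1.5]);
const encoded = Buffer.from(original.buffer).toString('base64');
console.log(Array.from(base64ToFloat32Array(encoded))); // [ 0.5, -0.25, 1.5 ]
```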
I've sent a patch here #1312.
Related work openai/openai-python#2060
cc @tonybaloney @johnpapa @DanWahlin
Additional context
In the Python SDK, @tonybaloney has done a great job switching from NumPy to the stdlib array module, which improves base64 decoding of vector embeddings at runtime. Also, @RobertCraigie has already identified this issue: openai/openai-python#1490 (comment)