
Perf: Improve vector embeddings creation by 60% #1310

Closed
1 task done
manekinekko opened this issue Feb 8, 2025 · 2 comments · Fixed by #1312 or #1429
Comments

@manekinekko (Contributor) commented Feb 8, 2025

Confirm this is a feature request for the Node library and not the underlying OpenAI API.

  • This is a feature request for the Node library

Describe the feature or improvement you're requesting

The current implementation of the openai-node SDK does not specify a default value for the encoding_format argument. In the Python SDK, however, this defaults to base64.

After running a few benchmarks, requesting base64-encoded embeddings returns smaller body sizes, on average ~60% smaller than float32-encoded ones. In other words, the response body containing float32 embeddings is ~2.3x larger than the base64-encoded equivalent.
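A back-of-the-envelope calculation makes the ~2.3x figure plausible (my own illustration, not the issue's benchmark): a float32 is 4 bytes, base64 expands every 3 bytes into 4 characters, while a JSON-serialized float at full precision plus its separator takes roughly 13 characters:

```typescript
// Rough size estimate for a 1536-dimension embedding (e.g. text-embedding-3-small).
const dims = 1536;
const rawBytes = dims * 4;                      // float32 payload in bytes
const base64Size = 4 * Math.ceil(rawBytes / 3); // base64: 4 chars per 3 bytes -> 8192
const jsonSize = dims * 13;                     // ~13 chars per serialized float -> 19968
console.log((jsonSize / base64Size).toFixed(1)); // -> "2.4", close to the observed ~2.3x
```

The exact JSON ratio depends on how many digits each float serializes to, but the order of magnitude holds.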

This performance improvement could translate to:

  • ✅ Faster HTTP responses
  • ✅ Less bandwidth used when generating multiple embeddings

This is the result of a request that creates embeddings from a 10 KB chunk, run 10 times (the numbers are the size of the response body in KB):

| Benchmark | Min (ms) | Max (ms) | Mean (ms) | Min (+) | Max (+) | Mean (+) |
| --- | --- | --- | --- | --- | --- | --- |
| float32 vs base64 | 41.742 | 19616.000 | 9848.819 | 40.094 (3.9%) | 8351.000 (57.4%) | 4206.126 (57.3%) |

I think this can easily be patched as follows:

  • we always request embedding creation encoded as base64
  • when we get the response back:
    • if the user asked for float32, we decode the base64 to float32
    • if the user asked for base64, we pass the base64 response through
    • if the user didn't specify an encoding, we decode the base64 to float32 (for backward compatibility)
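The decision table above can be sketched as follows. This is only a sketch: `shouldDecodeToFloat32` and `EncodingFormat` are hypothetical names, not the SDK's API (note the REST API spells the float option `"float"`):

```typescript
// Sketch of the decision table above. Hypothetical helper, not openai-node API.
type EncodingFormat = "float" | "base64" | undefined;

// We always *send* encoding_format: "base64" to the API; this decides
// what to do with the base64 string that comes back.
function shouldDecodeToFloat32(requested: EncodingFormat): boolean {
  // "float"    -> decode base64 to float32
  // "base64"   -> pass the raw base64 string through
  // undefined  -> decode to float32 (backward compatibility)
  return requested !== "base64";
}
```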

Something to keep in mind: the default value specified by the REST API is currently float32, so users expect a list of float32 values when they don't provide an encoding argument. This is a hard requirement; we must not break backward compatibility.

Also, we know base64 encoding is faster (fewer bytes going through the network). So no matter what the user asked for (float32 or base64), we can force the encoding to base64 when creating embeddings.

When we get the response back from the API, for backward compatibility we return a list of float32 values by default (by decoding the base64-encoded embedding string). If the user originally asked for base64, we simply pass the response through.
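The decode step can be sketched like this in Node, assuming the payload is a base64-encoded buffer of little-endian float32 values (`base64ToFloat32` is a hypothetical name, not the SDK's internal helper):

```typescript
// Hypothetical helper: decode a base64-encoded buffer of little-endian
// float32 values into a plain number[].
function base64ToFloat32(b64: string): number[] {
  const bytes = Buffer.from(b64, "base64");
  const out: number[] = new Array(bytes.length / 4);
  for (let i = 0; i < out.length; i++) {
    // readFloatLE avoids Float32Array alignment issues with pooled Buffers
    out[i] = bytes.readFloatLE(i * 4);
  }
  return out;
}
```

Reading via `readFloatLE` rather than wrapping the bytes in a `Float32Array` view sidesteps the fact that small Node Buffers are slices of a shared pool and their `byteOffset` may not be 4-byte aligned.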

I've sent a patch here #1312.

Related work openai/openai-python#2060
cc @tonybaloney @johnpapa @DanWahlin

Additional context

In the Python SDK, @tonybaloney has done a great job switching from NumPy to the stdlib array module, which improves base64 decoding of vector embeddings at runtime.

Also, @RobertCraigie has already identified this issue: openai/openai-python#1490 (comment)

manekinekko added a commit to manekinekko/openai-node that referenced this issue Feb 8, 2025
Requesting base64-encoded embeddings returns smaller body sizes, on average ~60% smaller than float32-encoded ones. In other words, the response body containing float32 embeddings is ~2.3x larger than the base64-encoded equivalent.

We always request embedding creation encoded as base64, and then decode the embeddings to float32 based on the user's provided encoding_format parameter.

Closes openai#1310
RobertCraigie added a commit that referenced this issue Mar 28, 2025
* perf(embedding): always request embedding creation as base64

Requesting base64-encoded embeddings returns smaller body sizes, on average ~60% smaller than float32-encoded ones. In other words, the response body containing float32 embeddings is ~2.3x larger than the base64-encoded equivalent.

We always request embedding creation encoded as base64, and then decode the embeddings to float32 based on the user's provided encoding_format parameter.

Closes #1310

Co-authored-by: Robert Craigie <[email protected]>
@RobertCraigie (Collaborator) commented:
Closing, as this will be fixed in the next release, 4.91.0 (#1429).

@manekinekko (Contributor, Author) commented:
I appreciate it. Thank you @RobertCraigie

stainless-app bot pushed a commit that referenced this issue Mar 31, 2025