Description
Related dev PR: elastic/elasticsearch#138047
The GoogleVertexAI `service_settings` object now supports a new parameter: `max_batch_size` (integer).
This parameter limits the batch size of chunked inference requests sent to GoogleVertexAI.
It can be used together with the `max_chunk_size` parameter to better control request sizes and avoid exceeding GoogleVertexAI's token limits.
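
For context, a sketch of how the parameter could appear in a create-endpoint request, modeled on the existing `googlevertexai` text_embedding example (field values are placeholders, and the `max_batch_size` value of 8 is illustrative, not a recommended default):

```console
PUT _inference/text_embedding/google_vertex_ai_embeddings
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<service_account_json>",
    "model_id": "<model_id>",
    "location": "<location>",
    "project_id": "<project_id>",
    "max_batch_size": 8
  }
}
```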
Required documentation changes
- Add `max_batch_size` to the GoogleVertexAI `service_settings` object.
- Add a NOTE about the formula for estimating a safe batch size (a worked example follows this list):
  Formula: `batch_size ≈ max_chunk_size * 1.3 (tokens per word) * 512 (max chunks per document) / 20000 (GoogleVertexAI token limit)`
- Update the request example to include this parameter (see the sketch in the description above).
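
To illustrate the formula above (the `max_chunk_size` value here is arbitrary, chosen only for the arithmetic): with `max_chunk_size = 250` words, the formula gives `250 * 1.3 * 512 / 20000 ≈ 8.3`, so a safe `batch_size` would be about 8.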
Additional info: https://docs.google.com/document/d/1ObKSCEJlucp1a3j5iz657aYNqvEvhIoGDRhV5MdYK5Q/edit?tab=t.0