Description
Related dev PR: elastic/elasticsearch#138047
The GoogleVertexAI `service_settings` object now supports a new parameter: `max_batch_size` (integer).
This parameter limits the batch size of chunked inference requests sent to GoogleVertexAI.
It can be used together with the `max_chunk_size` parameter to better control request sizes and avoid exceeding GoogleVertexAI's token limits.
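
For context, a sketch of how the parameter could appear in a create-endpoint request, modeled on the existing `googlevertexai` text_embedding example (field values are placeholders, and the `max_batch_size` value of 8 is illustrative, not a recommended default):

```console
PUT _inference/text_embedding/google_vertex_ai_embeddings
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<service_account_json>",
    "model_id": "<model_id>",
    "location": "<location>",
    "project_id": "<project_id>",
    "max_batch_size": 8
  }
}
```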
Required documentation changes
- Add `max_batch_size` to the GoogleVertexAI `service_settings` object.
- Add a NOTE about the formula for estimating a safe batch size (a worked example follows this list):
  Formula: `batch_size ≈ max_chunk_size * 1.3 (tokens per word) * 512 (max chunks per document) / 20000 (GoogleVertexAI token limit)`
- Update the request example to include this parameter (see the sketch in the description above).
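
To illustrate the formula above (the `max_chunk_size` value here is arbitrary, chosen only for the arithmetic): with `max_chunk_size = 250` words, the formula gives `250 * 1.3 * 512 / 20000 ≈ 8.3`, so a safe `batch_size` would be about 8.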
Additional info: https://docs.google.com/document/d/1ObKSCEJlucp1a3j5iz657aYNqvEvhIoGDRhV5MdYK5Q/edit?tab=t.0