Description
When a client sends a request to an AI workload, the desired model name (e.g. gpt-4o, llama, etc.) is included in the request body.
By default, the EPP gets the model name from the request body and then picks the proper endpoint for that model name. However, the model name can also be provided via the X-Gateway-Model-Name header. For example, a user could specify a traffic split, in which case NGINX would need to change the model name depending on the weighted traffic-split decision by setting this header.
Using the NJS module that extracts the model name from the request body, we should set the X-Gateway-Model-Name header appropriately when querying the EPP, so that it returns the proper endpoint for the model that the user requested.
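As an illustration of the extraction step, here is a minimal sketch of an NJS handler. The function names, the `js_var`-declared `$inference_model` variable, and the assumption that the body is OpenAI-style JSON with a top-level `model` field are all illustrative, not the actual module:

```javascript
// Sketch only: parse the JSON request body and pull out the model name
// (e.g. "gpt-4o"). Returns null if the body is not JSON or has no model.
function extractModelName(body) {
    try {
        const parsed = JSON.parse(body);
        return typeof parsed.model === "string" ? parsed.model : null;
    } catch (e) {
        return null;
    }
}

// Illustrative NJS handler (names are assumptions): store the model in a
// js_var-declared variable so the nginx config can forward it to the EPP,
// e.g. with: proxy_set_header X-Gateway-Model-Name $inference_model;
function setModelHeader(r) {
    const model = extractModelName(r.requestText || "");
    if (model) {
        r.variables.inference_model = model;
    }
}
```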
Acceptance Criteria:
- If the HTTPRoute specifies an Exact match condition on the X-Gateway-Model-Name header AND specifies a RequestHeaderModifier filter for this header, NGINX extracts the model name from the client request body.
- If the user is performing a traffic split (see YAML example 2 in the design doc), and the model name extracted from the request matches the condition specified in the HTTPRoute, then NGINX should set the new header value in the request to the EPP, based on the RequestHeaderModifier filter and the weighted decision (using split_clients).
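A rough sketch of how the weighted decision could look in generated nginx config; the variable names, weights, model names, and upstream are assumptions for illustration, not the actual generated config:

```nginx
# Weighted split between two model names, applied only when the model
# extracted from the body matches the HTTPRoute's Exact match condition.
split_clients $request_id $split_model_name {
    90%     "llama";        # weight taken from the HTTPRoute rule
    *       "llama-lora";   # remaining 10%
}

location /epp {
    # Header the EPP uses to pick an endpoint for the chosen model.
    proxy_set_header X-Gateway-Model-Name $split_model_name;
    proxy_pass http://epp-service;
}
```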
Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md