Inference Extension: Golang app to query the Endpoint Picker

The Endpoint Picker (EPP) is a component, deployed by the Inference Platform Owner, that chooses which AI workload endpoint should receive a client request. It communicates using the [ext_proc protocol](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/004-endpoint-picker-protocol) over gRPC.

When NGINX receives a client request destined for an AI workload, it needs to call the EPP to find out which endpoint the request should be sent to. However, NGINX cannot speak gRPC/ext_proc, so we need some middleware to bridge the gap. For our initial iteration, this probably requires two pieces: a Go application that sends the request to the EPP, and an NJS module that initiates a subrequest from NGINX to the Go application. The flow is as follows:

NGINX -> NJS subrequest -> Go -> EPP

The EPP responds with the desired AI endpoint in a header, and NGINX then forwards the client request to that endpoint using `proxy_pass`.

This story covers only building the Go app.

Acceptance Criteria:
- write a Go application that uses the ext_proc protocol to query the EPP with the contents of the client request (body and headers) and receives the AI workload endpoint in a response header
- the Go app should listen for requests from the NJS module so it knows when to query the EPP
- the Go application should be deployed in its own container in the NGINX Pod
- the application should only be deployed if NGF is configuring NGINX to handle AI workloads
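
As a rough illustration of the first two criteria, the sketch below shows one possible shape for the listening side of the Go app. It is not the actual implementation: `EndpointPicker`, `staticPicker`, `newHandler`, and `exercise` are hypothetical names, the ext_proc gRPC client is hidden behind an interface and replaced by a stub, and the header name is an assumption that should be checked against the EPP protocol proposal.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// destinationHeader is the assumed name of the header carrying the picked
// endpoint; verify it against the EPP protocol proposal before relying on it.
const destinationHeader = "X-Gateway-Destination-Endpoint"

// EndpointPicker is a hypothetical abstraction over the ext_proc gRPC client.
type EndpointPicker interface {
	Pick(ctx context.Context, headers http.Header, body []byte) (string, error)
}

// staticPicker is a stand-in for the real EPP client. It always returns the
// same endpoint, or an error when the endpoint is empty.
type staticPicker string

func (s staticPicker) Pick(_ context.Context, _ http.Header, _ []byte) (string, error) {
	if s == "" {
		return "", errors.New("no endpoint available")
	}
	return string(s), nil
}

// newHandler returns the handler the NJS subrequest would call. It passes the
// client request's headers and body to the picker and returns the chosen
// endpoint in a response header.
func newHandler(p EndpointPicker) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "reading request body", http.StatusBadRequest)
			return
		}

		// The 5s timeout is an arbitrary choice for this sketch.
		ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
		defer cancel()

		endpoint, err := p.Pick(ctx, r.Header, body)
		if err != nil {
			http.Error(w, fmt.Sprintf("querying EPP: %v", err), http.StatusBadGateway)
			return
		}

		w.Header().Set(destinationHeader, endpoint)
		w.WriteHeader(http.StatusOK)
	})
}

// exercise sends one in-memory request through the handler and returns the
// status code and the endpoint header. It exists only for this demonstration.
func exercise(h http.Handler, body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get(destinationHeader)
}

func main() {
	status, endpoint := exercise(newHandler(staticPicker("10.0.0.5:8000")), `{"model":"demo"}`)
	fmt.Println(status, endpoint)
}
```

In a real deployment the handler would be served with `http.ListenAndServe` on an address reachable only from NGINX within the Pod, and the stub would be replaced by a client that speaks ext_proc to the EPP.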

Developer Notes:
- we can just enhance the existing `nginx-gateway` binary to include this functionality, since it already includes a few different commands
- the control plane could patch the NGINX deployment with this extra container if it sees an HTTPRoute for that nginx instance that references an InferencePool
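
The control-plane check in the second note could look roughly like the sketch below. The types here are simplified, hypothetical stand-ins rather than the real Gateway API or NGF types, and `needsEndpointPickerContainer` is an illustrative name. A real check would presumably also compare the backend ref's API group, which is left out here to avoid guessing at the exact group string.

```go
package main

import "fmt"

// backendRef and httpRoute are simplified, hypothetical stand-ins for the
// Gateway API types; only the fields needed for this check are modeled.
type backendRef struct {
	Kind string
	Name string
}

type httpRoute struct {
	Name        string
	BackendRefs []backendRef
}

const inferencePoolKind = "InferencePool"

// needsEndpointPickerContainer reports whether any route attached to an NGINX
// instance references an InferencePool backend. Matching on Kind alone is a
// simplification made for this sketch.
func needsEndpointPickerContainer(routes []httpRoute) bool {
	for _, route := range routes {
		for _, ref := range route.BackendRefs {
			if ref.Kind == inferencePoolKind {
				return true
			}
		}
	}
	return false
}

func main() {
	routes := []httpRoute{
		{Name: "web", BackendRefs: []backendRef{{Kind: "Service", Name: "frontend"}}},
		{Name: "llm", BackendRefs: []backendRef{{Kind: inferencePoolKind, Name: "pool-a"}}},
	}
	fmt.Println(needsEndpointPickerContainer(routes))
}
```

The result of such a check would decide whether the extra container is patched into the NGINX deployment, which also satisfies the criterion that the app is only deployed when AI workloads are configured.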

Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md

Inference Extension: Golang app to query the Endpoint Picker #3837
