Description
The Endpoint Picker (EPP) is a component deployed by the Inference Platform Owner that chooses which AI workload endpoint should receive a given client request. It uses the ext_proc protocol over gRPC.
When NGINX receives a client request destined for an AI workload, it needs to call the EPP to learn which endpoint to send the request to. However, NGINX cannot speak gRPC/ext_proc, so we need middleware to bridge the two. For our initial iteration, this probably requires two pieces: a Go application that can send the request to the EPP, and an NJS module that can initiate a subrequest from NGINX to the Go application. The flow is as follows:
NGINX -> NJS subrequest -> Go -> EPP
The EPP responds with the desired AI endpoint in a header, and NGINX then forwards the client request to that endpoint using proxy_pass.
This story is just to build the Go app.
Acceptance Criteria:
- write a Go application that uses the proper protocol to query the EPP, using the contents of the client request (body and headers), to receive the AI workload endpoint via a response header.
- the Go app should be listening for requests (from the NJS module) so it knows when it needs to query the EPP
- the Go application should be deployed in its own container in the NGINX Pod
- the application should only be deployed if NGF is configuring NGINX to handle AI workloads
Developer Notes:
- we can just enhance the existing nginx-gateway binary to include this functionality, since it already includes a few different commands
- the control plane could patch the NGINX deployment with this extra container if it sees an HTTPRoute for that nginx instance that references an InferencePool
Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md