Inference Extension: Golang app to query the Endpoint Picker #3837

@sjberman

The Endpoint Picker (EPP) is a component deployed by the Inference Platform Owner that chooses which AI workload endpoint should receive a client request. It is queried over gRPC using the ext_proc protocol.

When NGINX receives a client request destined for an AI workload, it needs to call the EPP to find out which endpoint to send the request to. However, NGINX cannot speak gRPC/ext_proc, so we need some middleware to help with this. For our initial iteration, this probably requires two pieces: a Go application that can send the request to the EPP, and an NJS module that can initiate a subrequest from NGINX to the Go application. The flow is as follows:

NGINX -> NJS subrequest -> Go -> EPP

The EPP responds with the desired AI endpoint in a header, and NGINX should then forward the client request to that endpoint using proxy_pass.

This story is just to build the Go app.

Acceptance Criteria:

  • write a Go application that uses the proper protocol to query the EPP, using the contents of the client request (body and headers), to receive the AI workload endpoint via a response header.
  • the Go app should be listening for requests (from the NJS module) so it knows when it needs to query the EPP
  • the Go application should be deployed in its own container in the NGINX Pod
  • the application should only be deployed if NGF is configuring NGINX to handle AI workloads

Developer Notes:

  • we can just enhance the existing nginx-gateway binary to include this functionality, since it already includes a few different commands
  • the control plane could patch the NGINX deployment with this extra container if it sees an HTTPRoute for that nginx instance that references an InferencePool
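The control plane check in the second note could be sketched roughly as follows. The types here are simplified, hypothetical stand-ins for the Gateway API objects (not the real `HTTPRoute` structs), and `needsEPPContainer` is an invented helper name. The idea is only that the extra container is added when at least one route attached to the NGINX instance has a backend ref of kind `InferencePool`.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Gateway API types.
type BackendRef struct {
	Kind string
	Name string
}

type RouteRule struct {
	BackendRefs []BackendRef
}

type HTTPRoute struct {
	Name  string
	Rules []RouteRule
}

// inferencePoolKind is the backend kind that marks an AI workload route.
const inferencePoolKind = "InferencePool"

// referencesInferencePool reports whether any rule in the route points
// at an InferencePool backend.
func referencesInferencePool(route HTTPRoute) bool {
	for _, rule := range route.Rules {
		for _, ref := range rule.BackendRefs {
			if ref.Kind == inferencePoolKind {
				return true
			}
		}
	}
	return false
}

// needsEPPContainer reports whether the NGINX deployment serving these
// routes should be patched with the extra Go app container.
func needsEPPContainer(routes []HTTPRoute) bool {
	for _, route := range routes {
		if referencesInferencePool(route) {
			return true
		}
	}
	return false
}

func main() {
	routes := []HTTPRoute{
		{Name: "web", Rules: []RouteRule{{BackendRefs: []BackendRef{{Kind: "Service", Name: "web"}}}}},
		{Name: "llm", Rules: []RouteRule{{BackendRefs: []BackendRef{{Kind: inferencePoolKind, Name: "pool"}}}}},
	}
	fmt.Println(needsEPPContainer(routes))
}
```

A real check would probably also compare the backend ref's API group, since kind alone is not unique across groups.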

Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md

    Labels

    area/inference-extension (Related to the Gateway API Inference Extension), enhancement (New feature or request), refined (Requirements are refined and the issue is ready to be implemented.), size/large (Estimated to be completed within two weeks)
