
Inference Extension: Documentation #3844

@sjberman

Description

As a user, I want to know how to use NGF with the inference extension, so I can route traffic intelligently to my AI workloads in Kubernetes.

Acceptance Criteria:

  • Add a user guide on how to route traffic to AI workloads using NGF
  • Should cover how to install the Gateway API Inference Extension CRDs and deploy NGF with the feature flag enabled
  • Should cover how to deploy an InferencePool and EPP, and how to configure an HTTPRoute to reference the InferencePool (a rough sketch follows this list)
  • Explain how to secure traffic between the NGINX pod and the EPP using cert-manager, mentioning that by default we create self-signed certs (a cert-manager sketch follows this list)
  • Show examples of model name redirects and traffic splitting (a traffic-splitting sketch follows this list)
  • Link to the Gateway API Inference Extension docs where it makes sense (for example, those docs may better describe the InferenceObjective CRD and how a user should handle it)
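
As a rough illustration of the InferencePool-plus-HTTPRoute wiring the guide would cover, here is a minimal sketch. It assumes the v1alpha2 `inference.networking.x-k8s.io` API group and made-up names (`vllm-llama3-8b-instruct`, `inference-gateway`, and the EPP Service name); exact fields, versions, and defaults should come from the Gateway API Inference Extension and NGF docs.

```yaml
# Sketch only: an InferencePool selecting model server pods and pointing at its
# EPP, plus an HTTPRoute that sends traffic to the pool instead of a Service.
# All names and the v1alpha2 API version are assumptions for illustration.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct
spec:
  targetPortNumber: 8000              # port the model server pods listen on
  selector:
    app: vllm-llama3-8b-instruct      # labels on the model server pods
  extensionRef:
    name: vllm-llama3-8b-instruct-epp # Service of the Endpoint Picker (EPP)
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway           # Gateway managed by NGF (hypothetical name)
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
```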
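
For the traffic-splitting example, a sketch of standard weighted backendRefs between two InferencePools (pool and gateway names are hypothetical); model-name-based routing details are probably best deferred to the Inference Extension docs.

```yaml
# Sketch only: split traffic 90/10 between two InferencePools, e.g. to canary
# a new model version. Pool and gateway names are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-split-route
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: llama3-pool
      weight: 90
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: llama3-canary-pool
      weight: 10
```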
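
For the cert-manager section, a sketch of a self-signed Issuer and a Certificate covering the EPP's in-cluster DNS name. The namespace, Secret name, and DNS name are assumptions, and how NGF is configured to present or trust these certs for NGINX-to-EPP traffic is exactly what the guide needs to spell out.

```yaml
# Sketch only: use cert-manager to issue a certificate for the EPP Service.
# Names, namespace, and the wiring into NGF are assumptions for illustration.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: epp-selfsigned-issuer
  namespace: default
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: epp-cert
  namespace: default
spec:
  secretName: epp-tls-secret                 # Secret used for NGINX<->EPP TLS
  dnsNames:
  - vllm-llama3-8b-instruct-epp.default.svc  # EPP Service DNS name (assumed)
  issuerRef:
    name: epp-selfsigned-issuer
    kind: Issuer
```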

Metadata

    Labels

    area/inference-extension (Related to the Gateway API Inference Extension)
    documentation (Improvements or additions to documentation)

    Status

    🆕 New
