feat: initial implementation of Inference Extension #493

Merged
merged 67 commits into from
Apr 2, 2025
Changes from 64 commits
Commits
243a4eb
very minimum poc
mathetake Mar 14, 2025
1c31f27
not working
mathetake Mar 14, 2025
68369b0
merge
mathetake Mar 21, 2025
70eaafc
more sketch
mathetake Mar 21, 2025
6ddeea4
more sketch
mathetake Mar 21, 2025
ffe6e46
more sketch
mathetake Mar 21, 2025
eec4fd3
more sketch
mathetake Mar 21, 2025
b1ab6af
more
mathetake Mar 21, 2025
595b392
more comments
mathetake Mar 21, 2025
dec1816
more comments
mathetake Mar 21, 2025
48aac23
more
mathetake Mar 21, 2025
5f6fe99
more
mathetake Mar 21, 2025
c5a0307
unnecessary changes
mathetake Mar 21, 2025
c016860
unnecessary changes
mathetake Mar 21, 2025
4d1d4ec
tidy
mathetake Mar 22, 2025
55498d5
tidy
mathetake Mar 22, 2025
3bf039d
more unit tests
mathetake Mar 22, 2025
38ec6f9
more
mathetake Mar 24, 2025
4b241e3
more tests
mathetake Mar 24, 2025
edce330
unnecessary changes
mathetake Mar 24, 2025
eaf65c0
more comments
mathetake Mar 24, 2025
6564096
more comments
mathetake Mar 24, 2025
f3f0085
more comments
mathetake Mar 24, 2025
68d87d1
more comments
mathetake Mar 24, 2025
b2b9a77
more comments
mathetake Mar 24, 2025
0f053c0
more tests
mathetake Mar 24, 2025
64d2387
more tests
mathetake Mar 24, 2025
792a101
more
mathetake Mar 24, 2025
aeb24c8
try adding extension server
mathetake Mar 24, 2025
7c43096
more
mathetake Mar 24, 2025
1b0a8ef
more
mathetake Mar 24, 2025
a69db90
more
mathetake Mar 24, 2025
7e05f11
more
mathetake Mar 24, 2025
df4205c
more unit tests
mathetake Mar 24, 2025
717791e
enable
mathetake Mar 24, 2025
e33c5c6
enable
mathetake Mar 24, 2025
fd92ff8
enable
mathetake Mar 24, 2025
744d652
consistent package
mathetake Mar 24, 2025
e236b21
consistent package
mathetake Mar 24, 2025
91c508b
init extproc
mathetake Mar 24, 2025
e12c730
more comments
mathetake Mar 24, 2025
3f83c25
more comments
mathetake Mar 25, 2025
10b80b0
more comments
mathetake Mar 25, 2025
40f4440
limit the scope
mathetake Mar 25, 2025
3d81911
more
mathetake Mar 25, 2025
000979b
more
mathetake Mar 25, 2025
84b8e41
more tests
mathetake Mar 25, 2025
3f8fba3
more tests
mathetake Mar 25, 2025
61c75fd
more tests
mathetake Mar 25, 2025
8da1c5b
adds tests
mathetake Mar 25, 2025
034353b
review: fix typo
mathetake Mar 25, 2025
5917710
refactors
mathetake Mar 25, 2025
544a677
more coverage
mathetake Mar 25, 2025
40e441b
more tests
mathetake Mar 25, 2025
5c992c1
fix test
mathetake Mar 25, 2025
743e1a2
Adds cel validation test
mathetake Mar 25, 2025
3f0e578
removes unnecessary config
mathetake Mar 25, 2025
0e5301c
more bugs
mathetake Mar 25, 2025
43bb81e
more bugs
mathetake Mar 25, 2025
f2b66b6
more tests
mathetake Mar 25, 2025
7250df2
done
mathetake Mar 26, 2025
82df969
more
mathetake Mar 26, 2025
431c442
add more unit test
mathetake Mar 26, 2025
f4a8b5e
merge main
mathetake Mar 26, 2025
fe106d5
more
mathetake Mar 27, 2025
1f96bab
review: fix comment on index
mathetake Apr 2, 2025
5b857fd
Merge remote-tracking branch 'origin/main' into poc
mathetake Apr 2, 2025
10 changes: 8 additions & 2 deletions .golangci.yml
@@ -38,7 +38,6 @@ linters-settings:
# Do not allow non-required aliases.
no-extra-aliases: false
alias:
# gateway-api
- pkg: sigs.k8s.io/gateway-api/apis/v1
alias: gwapiv1
- pkg: sigs.k8s.io/gateway-api/apis/v1alpha2
@@ -51,7 +50,8 @@ linters-settings:
alias: egv1a1
- pkg: github.com/envoyproxy/ai-gateway/api/v1alpha1
alias: aigv1a1
# kubernetes api
- pkg: sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2
alias: gwaiev1a2
- pkg: k8s.io/apimachinery/pkg/apis/meta/v1
alias: metav1
- pkg: k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1
@@ -64,6 +64,12 @@ linters-settings:
alias: apierrors
- pkg: github.com/envoyproxy/ai-gateway/internal/testing
alias: internaltesting
- pkg: github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3
alias: clusterv3
- pkg: github.com/envoyproxy/go-control-plane/envoy/config/route/v3
alias: routev3
- pkg: github.com/envoyproxy/gateway/proto/extension
alias: egextension
gci:
sections:
# Captures all standard packages if they do not match another section.
24 changes: 23 additions & 1 deletion api/v1alpha1/api.go
@@ -218,8 +218,30 @@ type AIGatewayRouteRule struct {
Matches []AIGatewayRouteRuleMatch `json:"matches,omitempty"`
}

// AIGatewayRouteRuleBackendRef is a reference to a AIServiceBackend with a weight.
// AIGatewayRouteRuleBackendRefKind specifies the kind of the backend reference.
type AIGatewayRouteRuleBackendRefKind string

const (
// AIGatewayRouteRuleBackendRefAIServiceBackend is the kind of the AIServiceBackend.
AIGatewayRouteRuleBackendRefAIServiceBackend AIGatewayRouteRuleBackendRefKind = "AIServiceBackend"
// AIGatewayRouteRuleBackendRefInferencePool is the kind of the InferencePool in the Gateway API Inference Extension.
// https://github.com/kubernetes-sigs/gateway-api-inference-extension
AIGatewayRouteRuleBackendRefInferencePool AIGatewayRouteRuleBackendRefKind = "InferencePool"
Comment (Contributor):

Currently InferencePool is a list of pods. If I want to create a pool of AIServiceBackends, would that need another CR such as AIServiceBackendPool?

Comment (Member Author):

> Currently InferencePool is a list of pods,

As I commented in the description as well as in the API's comment, InferencePool.Spec.Selector will be used to select AIServiceBackends, not pods. That is allowed by the InfExt API spec, and I think it works as you intended in your comment?

Comment (Contributor):

Ah, I saw that in your examples now. That’s a pretty neat way to unify the ingress and egress use cases!

Comment (Contributor):

If dynamic load balancing becomes a core feature of the Envoy AI Gateway, then this InferencePool API will be part of the core AI Gateway API.

Comment (Member Author, @mathetake, Mar 27, 2025):

Yeah, that's a good point and my concern as well. I think the question is where we enforce cluster-level load balancing (as opposed to endpoint-level load balancing, which cannot do transformation, auth, etc.). If that is not part of the InfExt API's scope, then we are good to go: we provide cluster-level dynamic load balancing in our API, and InfPool only does the endpoint-level part.

Comment:

Is there a concern with relying on the InfPool API? Our intent is to keep it simple and flexible. We use it for pods in the reference implementation, but, as you have done here, any inference endpoint can work so long as it is being selected on.

Comment (Member Author):

No concern at the moment, but we will see.

Comment (@nayihz, Apr 16, 2025):

> InferencePool.Spec.Selector will be used to select AIServiceBackends, not pods.

IIUC, InferencePool.Spec.Selector is actually used to select Pods, not AIServiceBackends or other CRs.
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/bd9ee36450d68fb4d0d8ac4f9be4db7d1ec4fee3/pkg/epp/datastore/datastore.go#L129

Comment:

It may be that I misunderstood. In this project it did use InferencePool.Spec.Selector to select AIServiceBackends. Sorry for that.

)

// AIGatewayRouteRuleBackendRef is a reference to a backend with a weight.
type AIGatewayRouteRuleBackendRef struct {
// Kind is the kind of the backend, which is either "AIServiceBackend" or "InferencePool" in Gateway API Inference Extension.
//
// When this references InferencePool, the selector of the InferencePool is used to select (multiple) AIServiceBackend(s)
// that can serve the same model sets that the InferencePool binds.
//
// Default is AIServiceBackend.
//
// +kubebuilder:validation:Enum=AIServiceBackend;InferencePool
// +kubebuilder:default=AIServiceBackend
Kind *AIGatewayRouteRuleBackendRefKind `json:"kind,omitempty"`

// Name is the name of the AIServiceBackend.
//
// +kubebuilder:validation:Required
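The `Kind` field above is an optional pointer with a kubebuilder default of `AIServiceBackend`, so code that reads it outside of API-server defaulting may see `nil`. The following is a minimal sketch of how a caller could resolve the effective kind. The type and function names (`BackendRefKind`, `BackendRef`, `effectiveKind`) are local stand-ins invented for illustration, not identifiers from this PR, and the nil fallback is an assumption based on the documented default.

```go
package main

import "fmt"

// BackendRefKind is a local stand-in for AIGatewayRouteRuleBackendRefKind.
type BackendRefKind string

const (
	KindAIServiceBackend BackendRefKind = "AIServiceBackend"
	KindInferencePool    BackendRefKind = "InferencePool"
)

// BackendRef is a trimmed, hypothetical copy of AIGatewayRouteRuleBackendRef.
type BackendRef struct {
	Kind *BackendRefKind
	Name string
}

// effectiveKind returns the kind to act on. It assumes that an unset Kind
// means AIServiceBackend, mirroring the +kubebuilder:default marker.
func effectiveKind(ref BackendRef) BackendRefKind {
	if ref.Kind == nil {
		return KindAIServiceBackend
	}
	return *ref.Kind
}

func main() {
	pool := KindInferencePool
	refs := []BackendRef{
		{Name: "some-backend"}, // Kind omitted.
		{Name: "inference-extension-example-pool", Kind: &pool},
	}
	for _, r := range refs {
		fmt.Printf("%s -> %s\n", r.Name, effectiveKind(r))
	}
}
```

Keeping the fallback in one helper means the rest of the code can switch on a concrete kind without repeating nil checks.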
9 changes: 8 additions & 1 deletion api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 2 additions & 5 deletions cmd/aigw/testdata/translate_basic.out.yaml
@@ -62,13 +62,10 @@ spec:
- headers:
- name: x-ai-eg-selected-backend
value: envoy-ai-gateway-basic-testupstream.default
- backendRefs:
- group: gateway.envoyproxy.io
kind: Backend
name: envoy-ai-gateway-basic-openai
matches:
- matches:
- path:
value: /
name: unreachable
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
2 changes: 1 addition & 1 deletion cmd/aigw/translate.go
@@ -183,7 +183,7 @@ func translateCustomResourceObjects(
WithStatusSubresource(&aigv1a1.AIGatewayRoute{}).
WithStatusSubresource(&aigv1a1.AIServiceBackend{}).
WithStatusSubresource(&aigv1a1.BackendSecurityPolicy{})
_ = controller.ApplyIndexing(ctx, func(_ context.Context, obj client.Object, field string, extractValue client.IndexerFunc) error {
_ = controller.ApplyIndexing(ctx, true, func(_ context.Context, obj client.Object, field string, extractValue client.IndexerFunc) error {
builder = builder.WithIndex(obj, field, extractValue)
return nil
}) // Error should never happen.
16 changes: 12 additions & 4 deletions cmd/controller/main.go
@@ -12,7 +12,7 @@ import (
"net"
"os"

"github.com/envoyproxy/gateway/proto/extension"
egextension "github.com/envoyproxy/gateway/proto/extension"
"go.uber.org/zap/zapcore"
"google.golang.org/grpc"
"google.golang.org/grpc/health/grpc_health_v1"
@@ -31,6 +31,7 @@ func parseAndValidateFlags(args []string) (
enableLeaderElection bool,
logLevel zapcore.Level,
extensionServerPort string,
enableInfExt bool,
err error,
) {
fs := flag.NewFlagSet("AI Gateway Controller", flag.ContinueOnError)
@@ -60,6 +61,11 @@ func parseAndValidateFlags(args []string) (
":1063",
"gRPC port for the extension server",
)
enableInfExtPtr := fs.Bool(
"enableInferenceExtension",
false,
"Enable the Gateway Inference Extension. When enabling this, the CRDs for the InferenceModel and InferencePool must be installed prior to starting the controller.",
)

if err = fs.Parse(args); err != nil {
err = fmt.Errorf("failed to parse flags: %w", err)
@@ -77,7 +83,7 @@ func parseAndValidateFlags(args []string) (
err = fmt.Errorf("invalid log level: %q", *logLevelPtr)
return
}
return *extProcLogLevelPtr, *extProcImagePtr, *enableLeaderElectionPtr, zapLogLevel, *extensionServerPortPtr, nil
return *extProcLogLevelPtr, *extProcImagePtr, *enableLeaderElectionPtr, zapLogLevel, *extensionServerPortPtr, *enableInfExtPtr, nil
}

func main() {
@@ -88,6 +94,7 @@ func main() {
flagEnableLeaderElection,
zapLogLevel,
flagExtensionServerPort,
enableInfExt,
err := parseAndValidateFlags(os.Args[1:])
if err != nil {
setupLog.Error(err, "failed to parse and validate flags")
@@ -110,8 +117,8 @@ func main() {

// Start the extension server running alongside the controller.
s := grpc.NewServer()
extSrv := extensionserver.New(setupLog)
extension.RegisterEnvoyGatewayExtensionServer(s, extSrv)
extSrv := extensionserver.New(ctrl.Log)
egextension.RegisterEnvoyGatewayExtensionServer(s, extSrv)
grpc_health_v1.RegisterHealthServer(s, extSrv)
go func() {
<-ctx.Done()
@@ -128,6 +135,7 @@ func main() {
ExtProcImage: flagExtProcImage,
ExtProcLogLevel: flagExtProcLogLevel,
EnableLeaderElection: flagEnableLeaderElection,
EnableInfExt: enableInfExt,
}); err != nil {
setupLog.Error(err, "failed to start controller")
}
9 changes: 6 additions & 3 deletions cmd/controller/main_test.go
@@ -13,12 +13,13 @@ import (

func Test_parseAndValidateFlags(t *testing.T) {
t.Run("no flags", func(t *testing.T) {
extProcLogLevel, extProcImage, enableLeaderElection, logLevel, extensionServerPort, err := parseAndValidateFlags([]string{})
extProcLogLevel, extProcImage, enableLeaderElection, logLevel, extensionServerPort, enableInfExt, err := parseAndValidateFlags([]string{})
require.Equal(t, "info", extProcLogLevel)
require.Equal(t, "docker.io/envoyproxy/ai-gateway-extproc:latest", extProcImage)
require.True(t, enableLeaderElection)
require.Equal(t, "info", logLevel.String())
require.Equal(t, ":1063", extensionServerPort)
require.False(t, enableInfExt)
require.NoError(t, err)
})
t.Run("all flags", func(t *testing.T) {
@@ -36,13 +37,15 @@ func Test_parseAndValidateFlags(t *testing.T) {
tc.dash + "enableLeaderElection=false",
tc.dash + "logLevel=debug",
tc.dash + "port=:8080",
tc.dash + "enableInferenceExtension=true",
}
extProcLogLevel, extProcImage, enableLeaderElection, logLevel, extensionServerPort, err := parseAndValidateFlags(args)
extProcLogLevel, extProcImage, enableLeaderElection, logLevel, extensionServerPort, enableInfExt, err := parseAndValidateFlags(args)
require.Equal(t, "debug", extProcLogLevel)
require.Equal(t, "example.com/extproc:latest", extProcImage)
require.False(t, enableLeaderElection)
require.Equal(t, "debug", logLevel.String())
require.Equal(t, ":8080", extensionServerPort)
require.True(t, enableInfExt)
require.NoError(t, err)
})
}
@@ -66,7 +69,7 @@ func Test_parseAndValidateFlags(t *testing.T) {
},
} {
t.Run(tc.name, func(t *testing.T) {
_, _, _, _, _, err := parseAndValidateFlags(tc.flags)
_, _, _, _, _, _, err := parseAndValidateFlags(tc.flags)
require.ErrorContains(t, err, tc.expErr)
})
}
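The tests above exercise the new `enableInferenceExtension` flag with both single-dash and double-dash prefixes. As a rough illustration of why both forms parse, here is a standalone sketch using only the standard `flag` package. The helper `parseEnableInfExt` is hypothetical and covers just this one flag; it is not the PR's `parseAndValidateFlags`.

```go
package main

import (
	"flag"
	"fmt"
)

// parseEnableInfExt is a hypothetical, trimmed-down parser that handles only
// the enableInferenceExtension flag.
func parseEnableInfExt(args []string) (bool, error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	enable := fs.Bool(
		"enableInferenceExtension",
		false,
		"Enable the Gateway Inference Extension.",
	)
	if err := fs.Parse(args); err != nil {
		return false, fmt.Errorf("failed to parse flags: %w", err)
	}
	return *enable, nil
}

func main() {
	for _, args := range [][]string{
		{}, // Flag omitted: the default (false) applies.
		{"-enableInferenceExtension=true"},
		{"--enableInferenceExtension=true"},
		{"--enableInferenceExtension"}, // A bare boolean flag means true.
	} {
		enabled, err := parseEnableInfExt(args)
		fmt.Println(args, enabled, err)
	}
}
```

Note that the standard `flag` package requires the `=value` form for boolean flags; a separate `true` argument after the flag name would be treated as a positional argument.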
2 changes: 1 addition & 1 deletion cmd/extproc/mainlib/main.go
@@ -188,7 +188,7 @@ func startMetricsServer(addr string, logger *slog.Logger) (*http.Server, metric.
}

go func() {
logger.Info("Starting metrics server", "address", addr)
logger.Info("starting metrics server", "address", addr)
if err := server.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
logger.Error("Metrics server failed", "error", err)
}
2 changes: 2 additions & 0 deletions examples/inference_extension/README.md
@@ -0,0 +1,2 @@
This example demonstrates how to use the [Inference Extension API](https://gateway-api-inference-extension.sigs.k8s.io/) in the Envoy AI Gateway project.
The feature is available only when `--enableInferenceExtension=true` is passed to the Envoy AI Gateway controller. See the helm values.yaml file for more details.
133 changes: 133 additions & 0 deletions examples/inference_extension/inference_extension.yaml
@@ -0,0 +1,133 @@
# Copyright Envoy AI Gateway Authors
# SPDX-License-Identifier: Apache-2.0
# The full text of the Apache license is available in the LICENSE file at
# the root of the repo.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: inference-extension-example
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-extension-example
  namespace: default
spec:
  gatewayClassName: inference-extension-example
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-extension-example
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: inference-extension-example
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-target-inference-extension
              value: "yes"
      backendRefs:
        - name: inference-extension-example-pool  # The name of the InferencePool that binds to the backend.
          # Explicitly specify the kind of the backend to be InferencePool.
          kind: InferencePool
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: inference-extension-example-pool
spec:
  targetPortNumber: 8080
  selector:
    # Select multiple AIServiceBackend objects to bind to the InferencePool.
    app: my-backend
  extensionRef:
    # Specify the static name "envoy-ai-gateway" to bind the InferencePool to the Envoy AI Gateway.
    # This indicates that the InferencePool will be managed by the Envoy AI Gateway.
    name: envoy-ai-gateway
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: inference-extension-example
spec:
  modelName: mistral:latest
  criticality: Critical
  poolRef:
    # Bind the InferenceModel to the InferencePool.
    name: inference-extension-example-pool
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: inference-extension-example-testupstream
  namespace: default
  labels:
    # Indicate the backend is selected by the InferencePool.
    app: my-backend
spec:
  schema:
    name: OpenAI
  backendRef:
    name: inference-extension-example-testupstream
    kind: Service
    port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: inference-extension-example-testupstream
  namespace: default
spec:
  selector:
    app: inference-extension-example-testupstream
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  # The headless service allows the IP addresses of the pods to be resolved via the Service DNS.
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-extension-example-testupstream
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference-extension-example-testupstream
  template:
    metadata:
      labels:
        app: inference-extension-example-testupstream
    spec:
      containers:
        - name: testupstream
          image: docker.io/envoyproxy/ai-gateway-testupstream:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          env:
            - name: TESTUPSTREAM_ID
              value: test
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 1
            periodSeconds: 1
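
In the manifest above, the InferencePool's `selector` (`app: my-backend`) is matched against the labels of AIServiceBackend objects rather than pods. The sketch below illustrates that equality-based matching with plain maps. It is an assumption-laden illustration, not the controller's implementation: `Backend`, `selectBackends`, and `matches` are hypothetical names, and a real controller would more likely rely on the Kubernetes client and the `k8s.io/apimachinery/pkg/labels` package.

```go
package main

import "fmt"

// Backend is a hypothetical, trimmed view of an AIServiceBackend:
// its name plus its metadata labels.
type Backend struct {
	Name   string
	Labels map[string]string
}

// matches reports whether labels contains every key/value pair in selector.
// An empty selector therefore matches every backend.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectBackends returns the names of the backends matched by selector,
// in input order.
func selectBackends(selector map[string]string, backends []Backend) []string {
	var selected []string
	for _, b := range backends {
		if matches(selector, b.Labels) {
			selected = append(selected, b.Name)
		}
	}
	return selected
}

func main() {
	backends := []Backend{
		{
			Name:   "inference-extension-example-testupstream",
			Labels: map[string]string{"app": "my-backend"},
		},
		{Name: "unrelated", Labels: map[string]string{"app": "other"}},
	}
	selector := map[string]string{"app": "my-backend"}
	fmt.Println(selectBackends(selector, backends))
	// Prints: [inference-extension-example-testupstream]
}
```

Any number of AIServiceBackend objects carrying the same label would be selected together, which is what lets one InferencePool front several backends.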