feat: initial implementation of Inference Extension #493
Merged
Commits (67, all by mathetake):

- `243a4eb` very minimum poc
- `1c31f27` not working
- `68369b0` merge
- `70eaafc` more sketch
- `6ddeea4` more sketch
- `ffe6e46` more sketch
- `eec4fd3` more sketch
- `b1ab6af` more
- `595b392` more comments
- `dec1816` more comments
- `48aac23` more
- `5f6fe99` more
- `c5a0307` unnecessary changes
- `c016860` unnecessary changes
- `4d1d4ec` tidy
- `55498d5` tidy
- `3bf039d` more unit tests
- `38ec6f9` more
- `4b241e3` more tests
- `edce330` unnecessary changes
- `eaf65c0` more comments
- `6564096` more comments
- `f3f0085` more comments
- `68d87d1` more comments
- `b2b9a77` more comments
- `0f053c0` more tests
- `64d2387` more tests
- `792a101` more
- `aeb24c8` try adding extension server
- `7c43096` more
- `1b0a8ef` more
- `a69db90` more
- `7e05f11` more
- `df4205c` more unit tests
- `717791e` enable
- `e33c5c6` enable
- `fd92ff8` enable
- `744d652` consistent package
- `e236b21` consistent package
- `91c508b` init extproc
- `e12c730` more comments
- `3f83c25` more comments
- `10b80b0` more comments
- `40f4440` limit the scope
- `3d81911` more
- `000979b` more
- `84b8e41` more tests
- `3f8fba3` more tests
- `61c75fd` more tests
- `8da1c5b` adds tests
- `034353b` review: fix typo
- `5917710` refactors
- `544a677` more coverage
- `40e441b` more tests
- `5c992c1` fix test
- `743e1a2` Adds cel validation test
- `3f0e578` removes unnecessary config
- `0e5301c` more bugs
- `43bb81e` more bugs
- `f2b66b6` more tests
- `7250df2` done
- `82df969` more
- `431c442` add more unit test
- `f4a8b5e` merge main
- `fe106d5` more
- `1f96bab` review: fix comment on index
- `5b857fd` Merge remote-tracking branch 'origin/main' into poc
@@ -0,0 +1,2 @@
This example demonstrates how to use the [Inference Extension API](https://gateway-api-inference-extension.sigs.k8s.io/) in the Envoy AI Gateway project.
The feature is available only when `--enableInferenceExtension=true` is passed to the Envoy AI Gateway controller. See the Helm chart's values.yaml file for more details.
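As a hedged sketch of how that flag might be enabled through the chart: the `controller.enableInferenceExtension` key below is an assumption for illustration only, so check the chart's actual values.yaml for the real key layout.

```yaml
# Hypothetical Helm values fragment -- the real key layout is defined by the
# Envoy AI Gateway chart; consult its values.yaml before relying on this.
controller:
  # Intended to result in --enableInferenceExtension=true being passed
  # to the Envoy AI Gateway controller.
  enableInferenceExtension: true
```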
@@ -0,0 +1,133 @@
# Copyright Envoy AI Gateway Authors
# SPDX-License-Identifier: Apache-2.0
# The full text of the Apache license is available in the LICENSE file at
# the root of the repo.

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: inference-extension-example
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: inference-extension-example
  namespace: default
spec:
  gatewayClassName: inference-extension-example
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-extension-example
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: inference-extension-example
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-target-inference-extension
              value: "yes"
      backendRefs:
        - name: inference-extension-example-pool # The name of the InferencePool that binds to the backend.
          # Explicitly specify the kind of the backend to be InferencePool.
          kind: InferencePool
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: inference-extension-example-pool
spec:
  targetPortNumber: 8080
  selector:
    # Select multiple AIServiceBackend objects to bind to the InferencePool.
    app: my-backend
  extensionRef:
    # Specify the static name "envoy-ai-gateway" to bind the InferencePool to the Envoy AI Gateway.
    # This indicates that the InferencePool will be managed by the Envoy AI Gateway.
    name: envoy-ai-gateway
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: inference-extension-example
spec:
  modelName: mistral:latest
  criticality: Critical
  poolRef:
    # Bind the InferenceModel to the InferencePool.
    name: inference-extension-example-pool
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: inference-extension-example-testupstream
  namespace: default
  labels:
    # Indicate the backend is selected by the InferencePool.
    app: my-backend
spec:
  schema:
    name: OpenAI
  backendRef:
    name: inference-extension-example-testupstream
    kind: Service
    port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: inference-extension-example-testupstream
  namespace: default
spec:
  selector:
    app: inference-extension-example-testupstream
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  # The headless service allows the IP addresses of the pods to be resolved via the Service DNS.
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-extension-example-testupstream
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference-extension-example-testupstream
  template:
    metadata:
      labels:
        app: inference-extension-example-testupstream
    spec:
      containers:
        - name: testupstream
          image: docker.io/envoyproxy/ai-gateway-testupstream:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          env:
            - name: TESTUPSTREAM_ID
              value: test
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 1
            periodSeconds: 1
Currently InferencePool is a list of pods. If I want to create a pool of AIServiceBackends, would that require another CR, such as AIServiceBackendPool?
As I commented in the description as well as in the API's comments, InferencePool.Spec.Selector will be used to select AIServiceBackends, not Pods. That is allowed in the InfExt API spec, and I think it works the way you intended in your comment?
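The selection mechanism described here can be sketched as plain label matching: every key/value pair in the pool's selector must appear in the candidate object's labels. This is only an illustrative sketch, not the actual controller code; the `matches` helper is hypothetical.

```go
package main

import "fmt"

// matches reports whether an object's labels satisfy a map-style selector:
// every key/value pair in the selector must be present in the labels.
// Illustrative sketch of InferencePool.Spec.Selector picking
// AIServiceBackend objects by label; not the real controller logic.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// The example InferencePool uses the selector app: my-backend, and the
	// example AIServiceBackend carries the label app: my-backend.
	selector := map[string]string{"app": "my-backend"}
	fmt.Println(matches(selector, map[string]string{"app": "my-backend"})) // true
	fmt.Println(matches(selector, map[string]string{"app": "other"}))      // false
}
```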
Ah, I see that in your examples now; that's a pretty neat way to unify the ingress and egress use cases!
If dynamic load balancing becomes a core feature of the Envoy AI Gateway, then this InferencePool API will become part of the core AI Gateway API.
Yeah, that's a good point and my concern as well. I think the question is where we enforce cluster-level load balancing (as opposed to endpoint-level load balancing, which cannot do the transformation, auth, etc.). If that is not part of the InfExt API's scope, then we are good to go: we provide cluster-level dynamic load balancing in our API, and InferencePool only does the endpoint-level work.
Is there a concern with relying on the InfPool API? Our intent is to keep it simple and flexible: we use it for Pods in the reference implementation, but as you have done here, any inference endpoint can work as long as it is being selected on.
No concern at the moment, but we will see.
IIUC, InferencePool.Spec.Selector is indeed used to select Pods, not AIServiceBackends or other CRs: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/bd9ee36450d68fb4d0d8ac4f9be4db7d1ec4fee3/pkg/epp/datastore/datastore.go#L129
It may be that I misunderstood. This project does use InferencePool.Spec.Selector to select AIServiceBackends. Sorry for that.