[Design] Failover Logic between Providers and Models #34

Open
4 tasks
missBerg opened this issue Dec 5, 2024 · 2 comments

@missBerg
Contributor

missBerg commented Dec 5, 2024

The design proposal should include:

  • Motivation
  • Feature Definition
  • Control Plane API
  • Technical Implementation Proposal

This issue was created from a conversation during the Dec 5th community meeting:
https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#bookmark=id.dz9gpy397ymu

@yuzisun
Contributor

yuzisun commented Feb 6, 2025

This can potentially leverage the failover implementation in Envoy Gateway, though we will need to think about how to expose it on AIGatewayRoute:
https://gateway.envoyproxy.io/docs/tasks/traffic/failover/

@mathetake
Member

mathetake commented Mar 21, 2025

So what we essentially need is to "go through the extproc filter chain again to perform the necessary mutation": re-mutate the auth header, re-perform the transformation, etc. That is not possible with either default Envoy or the Inference Extension spec.

One workaround I can think of is to make a request directly from the extproc to the same listener where the extproc is running when the response headers from the initially chosen backend carry a 5xx status. When sending a request directly from the extproc, we can add a special header like #73 to explicitly choose the backend without running the router's matching. Then, if the response to the second request originating from the extproc is 200, we can use ImmediateResponse to send the client a response crafted from the contents of the fallback response.

But this comes with a cost; until we receive the first 200 response, the extproc needs to retain the whole request body (and this is inevitable by the nature of this type of fallback regardless of Envoy or whatever proxy we use).
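To make the retention cost concrete, here is a minimal Go sketch of a size-capped buffer that an extproc could fill while the request body streams in. The type `retainedBody`, its methods, and the limit value are hypothetical names chosen for illustration and are not part of the project or of Envoy. The assumption is that once the limit is exceeded, the request is simply no longer eligible for fallback.

```go
package main

import (
	"errors"
	"fmt"
)

// errBodyTooLarge signals that the request can no longer be replayed
// against a fallback backend (hypothetical error, for illustration).
var errBodyTooLarge = errors.New("request body exceeds retention limit")

// retainedBody accumulates request body chunks up to a fixed limit.
// It is a hypothetical helper, not an existing API.
type retainedBody struct {
	limit int
	buf   []byte
}

func newRetainedBody(limit int) *retainedBody {
	return &retainedBody{limit: limit}
}

// Append copies a chunk into the buffer. If the chunk would push the
// buffer past the limit, it returns an error and leaves the buffer as is.
func (r *retainedBody) Append(chunk []byte) error {
	if len(r.buf)+len(chunk) > r.limit {
		return errBodyTooLarge
	}
	r.buf = append(r.buf, chunk...)
	return nil
}

// Bytes returns the body retained so far.
func (r *retainedBody) Bytes() []byte { return r.buf }

func main() {
	body := newRetainedBody(16)
	fmt.Println(body.Append([]byte(`{"model":`)))     // fits within 16 bytes
	fmt.Println(body.Append([]byte(`"some-model"}`))) // would exceed the limit
	fmt.Println(string(body.Bytes()))
}
```

A cap like this trades fallback coverage for bounded memory per in-flight request. Where to set it is a design decision that this sketch does not try to answer.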

The good thing about this workaround is that all of the logic can be implemented without any changes to EG or Envoy themselves.
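The workaround described above can be sketched roughly as follows. This is only an illustration using Go's standard `net/http`, not the extproc API: the header name `x-example-selected-backend`, the function `tryFallback`, and the type `fallbackResult` are all hypothetical, and an `httptest` server stands in for the Envoy listener. In a real extproc, the returned status and body would be used to build the ImmediateResponse.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// backendHeader is a hypothetical header name; the discussion only
// refers to "a special header like #73".
const backendHeader = "x-example-selected-backend"

// fallbackResult holds what would be used to craft the ImmediateResponse.
type fallbackResult struct {
	Status int
	Body   []byte
}

// tryFallback re-sends the retained request body to the listener when the
// primary backend answered with a 5xx status. It returns (nil, nil) when
// no fallback is needed, and an error when the fallback is not a 200.
func tryFallback(client *http.Client, listenerURL, backend string,
	primaryStatus int, retained []byte) (*fallbackResult, error) {
	if primaryStatus < 500 || primaryStatus > 599 {
		return nil, nil
	}
	req, err := http.NewRequest(http.MethodPost, listenerURL, bytes.NewReader(retained))
	if err != nil {
		return nil, err
	}
	// Pin the backend explicitly so the router's matching is bypassed.
	req.Header.Set(backendHeader, backend)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fallback backend %q returned %d", backend, resp.StatusCode)
	}
	return &fallbackResult{Status: resp.StatusCode, Body: body}, nil
}

// runDemo uses a test server in place of the listener. The server echoes
// which backend was selected and the body it received.
func runDemo() (int, string, error) {
	listener := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "%s handled %s", r.Header.Get(backendHeader), b)
	}))
	defer listener.Close()
	res, err := tryFallback(listener.Client(), listener.URL, "backend-b",
		http.StatusServiceUnavailable, []byte("ping"))
	if err != nil {
		return 0, "", err
	}
	return res.Status, string(res.Body), nil
}

func main() {
	fmt.Println(runDemo())
}
```

The sketch handles a single fallback attempt only. Trying several backends in order, or streaming the fallback response instead of buffering it, would need additional logic.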
