[Design] Failover Logic between Providers and Models #34

missBerg · 2024-12-05T21:47:22Z

The design proposal should include:

Motivation
Feature Definition
Control Plane API
Technical Implementation Proposal

This issue was created from a conversation during the Dec 5th community meeting:
https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#bookmark=id.dz9gpy397ymu

yuzisun · 2025-02-06T15:58:42Z

This can potentially leverage the failover implementation in Envoy Gateway, though we will need to think about how to expose it on AIGatewayRoute:
https://gateway.envoyproxy.io/docs/tasks/traffic/failover/

mathetake · 2025-03-21T16:03:03Z

So what we essentially need is to go through the extproc filter chain again so that the necessary mutations are redone for the fallback backend: re-mutate the auth header, re-run the transformation, and so on. That is not possible with either default Envoy or the Inference Extension spec.

One workaround I can think of: when the response headers from the initially chosen backend carry a 5xx status, the extproc makes a request directly to the same listener the extproc is running on. When sending that request from the extproc, we can add a special header like #73 to choose the backend explicitly, without running the router's matching. Then, if the response to this second request is 200, we can use ImmediateResponse to send the client a response crafted from the contents of the fallback response.
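To make the control flow concrete, here is a minimal sketch of the decision logic only, written in Python for brevity rather than as extproc code. The header name `x-example-selected-backend`, the `Response` type, and the `send_to_listener` callable are all hypothetical stand-ins (the comment only says "a special header like #73"), and trying several fallback backends in order is an assumption; the comment describes a single fallback request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical header name; the thread only refers to "a special header like #73".
BACKEND_OVERRIDE_HEADER = "x-example-selected-backend"


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


# A sender takes (headers, body) and returns a Response. In the workaround this
# would be an HTTP call from the extproc back to the listener it is attached to.
Sender = Callable[[Dict[str, str], bytes], Response]


def fallback_on_5xx(
    primary_status: int,
    request_headers: Dict[str, str],
    retained_body: bytes,
    fallback_backends: List[str],
    send_to_listener: Sender,
) -> Optional[Response]:
    """Return a response to use as the immediate response, or None.

    None means "leave the primary response alone": either the primary did not
    fail with a 5xx, or no fallback backend answered with 200.
    """
    if not 500 <= primary_status <= 599:
        return None
    for backend in fallback_backends:
        headers = dict(request_headers)  # copy so the original stays untouched
        headers[BACKEND_OVERRIDE_HEADER] = backend  # skip the router's matching
        resp = send_to_listener(headers, retained_body)
        if resp.status == 200:
            return resp
    return None
```

Because the fallback request re-enters the listener, the auth header mutation and the transformation for the chosen backend run again on that second pass, which is the point of the workaround.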

But this comes with a cost: until we receive the first 200 response, the extproc needs to retain the whole request body. This is inevitable given the nature of this type of fallback, regardless of whether we use Envoy or any other proxy.
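One way to keep that cost bounded is sketched below. The size cap and the `RetainedBody` class are assumptions for illustration and are not part of the proposal: the buffer keeps streamed body chunks for a possible replay, gives up on fallback for requests larger than the cap, and is released once a 200 arrives.

```python
from typing import List


class RetainedBody:
    """Accumulates streamed request-body chunks up to a byte limit (hypothetical helper)."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._chunks: List[bytes] = []
        self._size = 0
        self.overflowed = False

    def append(self, chunk: bytes) -> None:
        if self.overflowed:
            return
        if self._size + len(chunk) > self.max_bytes:
            # Too large to retain: free the memory and give up on fallback.
            self._chunks.clear()
            self._size = 0
            self.overflowed = True
            return
        self._chunks.append(chunk)
        self._size += len(chunk)

    def can_replay(self) -> bool:
        return not self.overflowed

    def replay(self) -> bytes:
        if self.overflowed:
            raise ValueError("body exceeded the limit; cannot replay")
        return b"".join(self._chunks)

    def release(self) -> None:
        """Drop the retained bytes, e.g. once a 200 response has been received."""
        self._chunks.clear()
        self._size = 0
```

With a cap like this, oversized requests simply lose the fallback path instead of growing the extproc's memory without limit; whether that trade-off is acceptable is a design question the thread leaves open.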

The good thing about this workaround is that all the logic can be implemented without any changes to EG or Envoy themselves.

missBerg assigned yuzisun and wengyao04 Dec 5, 2024

yuzisun mentioned this issue Mar 8, 2025

Support for Backend priorities beyond active/passive levels envoyproxy/gateway#5442

Open

mathetake mentioned this issue Mar 13, 2025

Support k8s gateway API inference extensions #423

Open

3 tasks

mathetake mentioned this issue Mar 21, 2025

feat: initial implementation of Inference Extension #493

Merged

This was referenced Mar 21, 2025

Support fallback to different backend cluster based on response status code from the primary cluster envoyproxy/envoy#38841

Open

Migrate extproc to dynamic modules when available in Envoy #90

Open
