[Design] Failover Logic between Providers and Models #34
Comments
This can potentially leverage the implementation in Envoy Gateway, though we will need to think about how to expose on
So what we essentially need is to "go through the extproc filter chain again to perform the necessary mutations": re-mutate the auth header, redo the transformation, and so on. That is not possible with either default Envoy or the Inference Extension spec. One workaround I can think of is to make a request directly from the extproc to the same listener where the extproc is running when the response header from the initially chosen backend is 5xx. When sending a request directly from the extproc, we can add a special header like #73 to explicitly choose the backend without running the router's matching. Then, if the response to that second request is 200, we can use ImmediateResponse to send the client a response crafted from the contents of the fallback response. But this comes at a cost: until we receive the first 200 response, the extproc needs to retain the whole request body (this is inevitable given the nature of this type of fallback, regardless of whether we use Envoy or any other proxy). The good thing about this workaround is that all the logic can be implemented without any changes to EG or Envoy themselves.
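The control flow of this workaround can be sketched roughly as below. This is only an illustration under assumptions, not the project's implementation: the header name `x-hypothetical-backend-override`, the `Response` type, the `send` callable (standing in for an HTTP request from the extproc back to its own listener), and the choice to return the original 5xx when every fallback fails are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# Hypothetical header name; the comment above only says "a special header like #73".
BACKEND_OVERRIDE_HEADER = "x-hypothetical-backend-override"


@dataclass(frozen=True)
class Response:
    status: int
    body: bytes


# send(headers, body) -> Response. Stands in for a request made by the
# extproc to the listener it is attached to (an assumption of this sketch).
Sender = Callable[[Mapping[str, str], bytes], Response]


def fallback_on_5xx(
    first: Response,
    headers: Mapping[str, str],
    retained_body: bytes,
    fallbacks: Sequence[str],
    send: Sender,
) -> Response:
    """Return the response that should reach the client.

    The request body must be retained until a 200 arrives, because every
    retry re-sends it in full.
    """
    if first.status < 500:
        # The initially chosen backend did not fail with 5xx; nothing to do.
        return first
    for backend in fallbacks:
        retry_headers = dict(headers)
        # Pin the backend explicitly so the router's matching is skipped.
        retry_headers[BACKEND_OVERRIDE_HEADER] = backend
        resp = send(retry_headers, retained_body)
        if resp.status == 200:
            # In the extproc this would be wrapped in an ImmediateResponse.
            return resp
    # Assumption: if every fallback fails, surface the original error.
    return first
```

Because each retry goes back through the listener, the auth header mutation and request transformation run again for the pinned backend, which is the point of the workaround. The retained body is the memory cost mentioned above.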
The design proposal should include:
This issue was created from a conversation during the Dec 5th community meeting:
https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#bookmark=id.dz9gpy397ymu