fix: fallback to other same-family model on transient errors #45
base: main
Conversation
Caution
Changes requested ❌
Reviewed everything up to 2de5bf0 in 1 minute and 49 seconds.
- Reviewed 246 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:99
   - Draft comment: Using a single 'last_error' to record transient errors may lose detail if multiple failures occur. Consider aggregating errors or preserving more context for debugging.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The current implementation logs each error as it occurs via eprintln!, so error history is available in the logs. The last_error is only used to return a final status code, not for debugging, and storing multiple errors would add complexity without clear benefit since errors are already logged. The suggestion seems to misunderstand the purpose of last_error. I could be undervaluing the benefit of having a structured error history available programmatically rather than just in logs, and there might be use cases for aggregated error reporting I haven't considered. Still, the current error handling meets the need: errors are logged for debugging, and the status-code handling is appropriate for an HTTP API. Adding storage for multiple errors would be over-engineering. The comment should be deleted; the current error handling is appropriate, and adding more error tracking would increase complexity without clear benefit.
2. src/pipelines/pipeline.rs:102
   - Draft comment: Repeatedly cloning the payload in each loop iteration (e.g. at line 102) might be inefficient if the payload is large. Consider restructuring to avoid unnecessary clones.
   - Reason this comment was not posted: Comment was on unchanged code.
3. src/pipelines/pipeline.rs:115
   - Draft comment: Consider replacing eprintln with a structured logging library (or using the existing OtelTracer) for consistency and better log management.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%. None
4. src/pipelines/pipeline.rs:155
   - Draft comment: There is an inconsistency in return types: chat_completions returns Result<impl IntoResponse, StatusCode> while completions (and embeddings) return impl IntoResponse directly. Ensure this difference is intentional for error handling purposes.
   - Reason this comment was not posted: Comment was on unchanged code.
Workflow ID: wflow_8jYBYamauvT5rNLz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
if let ChatCompletionResponse::NonStream(completion) = response {
    tracer.log_success(&completion);
    return Ok(Json(completion).into_response());

let matching_models: Vec<_> = model_keys
The filtering of matching models (lines 81-90) is repeated in three functions. Consider extracting this logic into a helper function to reduce code duplication.
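A minimal sketch of what such a helper could look like. The function name, the &[String] key type, and the prefix-based notion of "same family" are illustrative assumptions for this sketch, not the actual code in pipeline.rs:

```rust
/// Hypothetical helper: collect the model keys that belong to the same family
/// as the requested model, so the three handlers can share one implementation.
/// Matching families by common prefix is an assumption made for this sketch.
fn matching_models<'a>(model_keys: &'a [String], requested: &str) -> Vec<&'a str> {
    model_keys
        .iter()
        .map(String::as_str)
        .filter(|key| key.starts_with(requested) || requested.starts_with(key))
        .collect()
}

fn main() {
    let keys = vec![
        "gpt-4o".to_string(),
        "gpt-4o-mini".to_string(),
        "claude-3-haiku".to_string(),
    ];
    // Only same-family candidates remain as fallback targets.
    assert_eq!(matching_models(&keys, "gpt-4o"), vec!["gpt-4o", "gpt-4o-mini"]);
}
```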
Important
Looks good to me! 👍
Reviewed b172efe in 41 seconds.
- Reviewed 56 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:134
   - Draft comment: Using unwrap() on last_error assumes it's always Some. Although matching_models is checked earlier, consider using expect("transient error expected") to clarify this invariant.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%. None
2. src/pipelines/pipeline.rs:203
   - Draft comment: Consider replacing unwrap() with expect() here to document that a transient error is expected at this point in completions.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%. None
3. src/pipelines/pipeline.rs:261
   - Draft comment: Using unwrap() assumes last_error is Some. Use expect() with a clear message to document the assumption in embeddings.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%. None
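All three draft comments above suggest the same pattern. A minimal sketch, with hypothetical names (finish and last_error stand in for the end of the fallback loop and are not the actual pipeline.rs code):

```rust
// Sketch only: `finish` models the tail of a fallback loop such as the ones in
// chat_completions / completions / embeddings.
fn finish<E>(last_error: Option<E>) -> Result<(), E> {
    // expect() documents the invariant that unwrap() leaves implicit: this point
    // is only reached after at least one model attempt failed, so an error was recorded.
    Err(last_error.expect("a transient error was recorded for every failed attempt"))
}
```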
Workflow ID: wflow_y5DpiO4Ap60pEUQx
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
General comments:
- I'm quite concerned about this one, for these reasons:
  - Design-wise: we should generalize it (but okay, let's duct-tape it for now).
  - Performance/behaviour-wise: since LLM calls usually have long response times, you'll end up waiting for the failing model until it times out. Then you either put the hub at risk of timing out (without indicating that the model was at fault), or, if you set a long timeout on our server, it can be flooded with hanging requests (and so can the client app). I'm not sure how the main loop works here and whether there's a reactive way of dealing with it...
- DRY this retry logic?
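One way to address both concerns above, assuming the handlers run on tokio: give every candidate model its own deadline and move the fallback loop into one shared helper. Everything here (names, signature, the retry-on-any-error behaviour) is a sketch rather than the PR's implementation; a real version would also consult the PR's is_transient_error check before moving on.

```rust
use std::{future::Future, time::Duration};
use tokio::time::timeout;

/// Hypothetical shared fallback helper. Each attempt is bounded by its own
/// deadline so one slow upstream model cannot hold the whole request hostage,
/// and the loop is written once instead of being copied into chat_completions,
/// completions, and embeddings.
async fn try_with_fallback<T, E, F, Fut>(
    candidates: &[String],
    per_attempt: Duration,
    mut call: F,
) -> Result<T, Option<E>>
where
    F: FnMut(String) -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut last_error = None;
    for model in candidates {
        match timeout(per_attempt, call(model.clone())).await {
            // First success wins.
            Ok(Ok(response)) => return Ok(response),
            // Upstream answered with an error; remember it and try the next model.
            // (A real version would only continue when is_transient_error says so.)
            Ok(Err(err)) => last_error = Some(err),
            // Per-attempt deadline hit; move on instead of waiting indefinitely.
            Err(_elapsed) => {}
        }
    }
    // Every candidate failed or timed out.
    Err(last_error)
}
```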
Important
Add fallback mechanism to handle transient errors by trying other same-family models in chat_completions, completions, and embeddings.
- In the chat_completions, completions, and embeddings functions, added fallback to other same-family models on transient errors.
- Added an is_transient_error function to identify transient errors like StatusCode::TOO_MANY_REQUESTS, StatusCode::REQUEST_TIMEOUT, etc.
- Affects chat_completions, completions, and embeddings.

This description was created by Ellipsis for b172efe. You can customize this summary. It will automatically update as commits are pushed.
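For reference, a sketch of the kind of predicate the description mentions. Only TOO_MANY_REQUESTS and REQUEST_TIMEOUT are named in the summary, so the rest of the set, and the use of axum's re-exported StatusCode, are assumptions about what "etc." covers:

```rust
use axum::http::StatusCode;

/// Sketch of a transient-error check; the PR's actual is_transient_error may
/// cover a different set of status codes.
fn is_transient_error(status: StatusCode) -> bool {
    status == StatusCode::TOO_MANY_REQUESTS          // 429: rate limited, another model may have headroom
        || status == StatusCode::REQUEST_TIMEOUT     // 408: upstream took too long
        || status == StatusCode::SERVICE_UNAVAILABLE // 503: assumed part of "etc."
        || status == StatusCode::GATEWAY_TIMEOUT     // 504: assumed part of "etc."
}
```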