
Conversation

@nirga (Member) commented Jun 19, 2025

Important

Add fallback mechanism to handle transient errors by trying other same-family models in chat_completions, completions, and embeddings.

  • Error Handling:
    • In the chat_completions, completions, and embeddings functions, added a fallback to other same-family models on transient errors.
    • Introduced an is_transient_error function to identify transient errors such as StatusCode::TOO_MANY_REQUESTS and StatusCode::REQUEST_TIMEOUT, among others.
  • Logging:
    • Enhanced logging to indicate when a transient error occurs and when fallback to another model is attempted.
    • Logs final error if all models fail in chat_completions, completions, and embeddings.

This description was created by Ellipsis for b172efe. You can customize this summary. It will automatically update as commits are pushed.
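
For readers skimming the diff, the mechanism described above boils down to a classification helper plus a retry loop around the per-model call. The sketch below is a rough reconstruction, not the actual code in src/pipelines/pipeline.rs: call_model is a hypothetical stand-in for the pipeline's provider call, and the extra gateway-style status codes are an assumption about what the "etc." covers.

```rust
use axum::http::StatusCode;
use serde_json::Value;

/// Classify a status code as transient (retryable). The PR names
/// TOO_MANY_REQUESTS and REQUEST_TIMEOUT; the gateway-style codes below are
/// an assumption about what "etc." covers.
fn is_transient_error(status: StatusCode) -> bool {
    status == StatusCode::TOO_MANY_REQUESTS
        || status == StatusCode::REQUEST_TIMEOUT
        || status == StatusCode::BAD_GATEWAY
        || status == StatusCode::SERVICE_UNAVAILABLE
        || status == StatusCode::GATEWAY_TIMEOUT
}

/// Hypothetical stand-in for the real upstream provider call.
async fn call_model(_model: &str, _payload: Value) -> Result<Value, StatusCode> {
    Err(StatusCode::TOO_MANY_REQUESTS)
}

/// Try each same-family model in turn, skipping past transient errors and
/// returning the last recorded status if every candidate fails.
async fn chat_completions_with_fallback(
    matching_models: &[String],
    payload: Value,
) -> Result<Value, StatusCode> {
    let mut last_error = None;
    for model in matching_models {
        match call_model(model, payload.clone()).await {
            Ok(response) => return Ok(response),
            Err(status) if is_transient_error(status) => {
                eprintln!("transient error {status} from {model}, falling back to next model");
                last_error = Some(status);
            }
            Err(status) => return Err(status),
        }
    }
    eprintln!("all matching models failed with transient errors");
    Err(last_error.unwrap_or(StatusCode::SERVICE_UNAVAILABLE))
}
```

Non-transient errors still fail fast; only the retryable class triggers the fallback, mirroring the behaviour described in the bullet points above.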

@ellipsis-dev bot (Contributor) left a comment

Caution

Changes requested ❌

Reviewed everything up to 2de5bf0 in 1 minute and 49 seconds.
  • Reviewed 246 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:99
  • Draft comment:
    Using a single 'last_error' to record transient errors may lose detail if multiple failures occur. Consider aggregating errors or preserving more context for debugging.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The current implementation logs each error as it occurs via eprintln!, so the error history is available in the logs; last_error is only used to return a final status code, not for debugging. Storing multiple errors would add complexity without clear benefit, and the suggestion seems to misunderstand the purpose of last_error. I could be undervaluing the benefit of a structured error history available programmatically rather than just in logs, and there might be use cases for aggregated error reporting I haven't considered. Still, the current error handling meets the needs: errors are logged for debugging, and the status-code handling is appropriate for an HTTP API. Adding more error tracking would be overengineering, so the comment should be deleted.
2. src/pipelines/pipeline.rs:102
  • Draft comment:
    Repeatedly cloning the payload in each loop iteration (e.g. at line 102) might be inefficient if the payload is large. Consider restructuring to avoid unnecessary clones.
  • Reason this comment was not posted:
    Comment was on unchanged code.
3. src/pipelines/pipeline.rs:115
  • Draft comment:
    Consider replacing eprintln with a structured logging library (or using the existing OtelTracer) for consistency and better log management.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
4. src/pipelines/pipeline.rs:155
  • Draft comment:
    There is an inconsistency in return types—chat_completions returns Result<impl IntoResponse, StatusCode> while completions (and embeddings) return impl IntoResponse directly. Ensure this difference is intentional for error handling purposes.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_8jYBYamauvT5rNLz

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

if let ChatCompletionResponse::NonStream(completion) = response {
    tracer.log_success(&completion);
    return Ok(Json(completion).into_response());
}

let matching_models: Vec<_> = model_keys
A Contributor commented:

The filtering of matching models (lines 81-90) is repeated in three functions. Consider extracting this logic into a helper function to reduce code duplication.
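
One possible shape for that helper, purely as an illustration: ModelKey, its fields, and the family-matching rule below are assumptions for the sketch, not the crate's real types.

```rust
/// Hypothetical model-key type; the real pipeline's type and fields differ.
#[derive(Clone, Debug, PartialEq)]
struct ModelKey {
    key: String,
    family: Option<String>,
}

/// Extracted once, callable from chat_completions, completions, and embeddings:
/// return the requested model first, followed by other models of the same family.
fn matching_models<'a>(model_keys: &'a [ModelKey], requested: &str) -> Vec<&'a ModelKey> {
    let family = model_keys
        .iter()
        .find(|m| m.key == requested)
        .and_then(|m| m.family.as_deref());

    let mut matches: Vec<&ModelKey> = model_keys
        .iter()
        .filter(|m| m.key == requested || (family.is_some() && m.family.as_deref() == family))
        .collect();

    // Keep the requested model at the front so it is tried before any fallback.
    matches.sort_by_key(|m| m.key != requested);
    matches
}
```

Each of the three handlers would then call matching_models(...) instead of repeating the filter inline.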

cursor[bot]

This comment was marked as outdated.

@ellipsis-dev bot (Contributor) left a comment

Important

Looks good to me! 👍

Reviewed b172efe in 41 seconds.
  • Reviewed 56 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:134
  • Draft comment:
    Using unwrap() on last_error assumes it’s always Some. Although matching_models is checked earlier, consider using expect("transient error expected") to clarify this invariant.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. src/pipelines/pipeline.rs:203
  • Draft comment:
    Consider replacing unwrap() with expect() here to document that a transient error is expected at this point in completions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. src/pipelines/pipeline.rs:261
  • Draft comment:
    Using unwrap() assumes last_error is Some. Use expect() with a clear message to document the assumption in embeddings.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
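
The three draft comments above make the same suggestion: replace unwrap() on last_error with expect() so the invariant is stated where it is relied on. A one-function illustration, not the file's actual code:

```rust
use axum::http::StatusCode;

/// The fallback loop only reaches this point after at least one matching
/// model was tried and failed transiently, so last_error must be Some;
/// expect() documents that invariant instead of a bare unwrap().
fn final_status(last_error: Option<StatusCode>) -> StatusCode {
    last_error.expect("matching_models was non-empty, so a transient error was recorded")
}
```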

Workflow ID: wflow_y5DpiO4Ap60pEUQx

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@galkleinman (Contributor) left a comment

General comments:

  1. I'm quite concerned about this one, for these reasons:
  • Design-wise: we should generalize it (but OK, let's duct-tape it for now).
  • Performance/behaviour-wise: since LLM calls usually take a long time to respond, you'll end up waiting for a model until it times out. Then you either put the hub at risk of timing out (without surfacing that it's the model's fault), or, if you set a long timeout on our server, you can flood it (and the client app) with a lot of hanging requests. I'm not sure how the main loop works here and whether there's a reactive way of dealing with it; see the sketch below for one option.
  2. DRY this retry logic?
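
On the hanging-request concern in point 1, one lightweight mitigation is a per-attempt deadline, so a slow model surfaces as a transient timeout and the fallback loop moves on instead of waiting for the upstream to give up. The sketch below assumes the hub runs on tokio; the helper name, duration, and error mapping are placeholders, not anything from this PR.

```rust
use std::time::Duration;

use axum::http::StatusCode;
use serde_json::Value;
use tokio::time::timeout;

/// Cap a single model attempt. On expiry the attempt is reported as
/// REQUEST_TIMEOUT, which the transient-error check treats as retryable,
/// so the fallback loop moves to the next model instead of hanging.
async fn attempt_with_deadline<F>(per_attempt: Duration, attempt: F) -> Result<Value, StatusCode>
where
    F: std::future::Future<Output = Result<Value, StatusCode>>,
{
    match timeout(per_attempt, attempt).await {
        Ok(result) => result,
        Err(_elapsed) => Err(StatusCode::REQUEST_TIMEOUT),
    }
}
```

Total latency is then bounded at roughly the per-attempt deadline times the number of fallback candidates, which the hub's own request timeout would need to accommodate.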
