
Conversation

@nirga (Member) commented Jun 19, 2025

Important

Add fallback mechanism to handle transient errors by trying other same-family models in chat_completions, completions, and embeddings.

  • Error Handling:
    • In the chat_completions, completions, and embeddings functions, added a fallback to other same-family models on transient errors.
    • Introduced an is_transient_error function to identify transient errors such as StatusCode::TOO_MANY_REQUESTS and StatusCode::REQUEST_TIMEOUT, among others.
  • Logging:
    • Enhanced logging to indicate when a transient error occurs and when fallback to another model is attempted.
    • Logs final error if all models fail in chat_completions, completions, and embeddings.

This description was created by Ellipsis for b172efe. You can customize this summary. It will automatically update as commits are pushed.
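
For readers skimming the diff, the mechanism described above boils down to a classification helper plus a retry loop around the per-model call. The sketch below is a rough reconstruction, not the actual code in src/pipelines/pipeline.rs: call_model is a hypothetical stand-in for the pipeline's provider call, and the extra gateway-style status codes are an assumption about what the "etc." covers.

```rust
use axum::http::StatusCode;
use serde_json::Value;

/// Classify a status code as transient (retryable). The PR names
/// TOO_MANY_REQUESTS and REQUEST_TIMEOUT; the gateway-style codes below are
/// an assumption about what "etc." covers.
fn is_transient_error(status: StatusCode) -> bool {
    status == StatusCode::TOO_MANY_REQUESTS
        || status == StatusCode::REQUEST_TIMEOUT
        || status == StatusCode::BAD_GATEWAY
        || status == StatusCode::SERVICE_UNAVAILABLE
        || status == StatusCode::GATEWAY_TIMEOUT
}

/// Hypothetical stand-in for the real upstream provider call.
async fn call_model(_model: &str, _payload: Value) -> Result<Value, StatusCode> {
    Err(StatusCode::TOO_MANY_REQUESTS)
}

/// Try each same-family model in turn, skipping past transient errors and
/// returning the last recorded status if every candidate fails.
async fn chat_completions_with_fallback(
    matching_models: &[String],
    payload: Value,
) -> Result<Value, StatusCode> {
    let mut last_error = None;
    for model in matching_models {
        match call_model(model, payload.clone()).await {
            Ok(response) => return Ok(response),
            Err(status) if is_transient_error(status) => {
                eprintln!("transient error {status} from {model}, falling back to next model");
                last_error = Some(status);
            }
            Err(status) => return Err(status),
        }
    }
    eprintln!("all matching models failed with transient errors");
    Err(last_error.unwrap_or(StatusCode::SERVICE_UNAVAILABLE))
}
```

Non-transient errors still fail fast; only the retryable class triggers the fallback, mirroring the behaviour described in the bullet points above.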

@ellipsis-dev bot (Contributor) left a comment

Caution

Changes requested ❌

Reviewed everything up to 2de5bf0 in 1 minute and 49 seconds.
  • Reviewed 246 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:99
  • Draft comment:
    Using a single 'last_error' to record transient errors may lose detail if multiple failures occur. Consider aggregating errors or preserving more context for debugging.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The current implementation logs each error as it occurs via eprintln!, so the error history is available in the logs; last_error is only used to return a final status code, not for debugging. Storing multiple errors would add complexity without clear benefit, and the suggestion seems to misunderstand the purpose of last_error. I could be undervaluing the benefit of a structured error history available programmatically rather than just in logs, and there might be use cases for aggregated error reporting I haven't considered. Still, the current error handling meets the needs: errors are logged for debugging, and the status-code handling is appropriate for an HTTP API. Adding more error tracking would be overengineering, so the comment should be deleted.
2. src/pipelines/pipeline.rs:102
  • Draft comment:
    Repeatedly cloning the payload in each loop iteration (e.g. at line 102) might be inefficient if the payload is large. Consider restructuring to avoid unnecessary clones.
  • Reason this comment was not posted:
    Comment was on unchanged code.
3. src/pipelines/pipeline.rs:115
  • Draft comment:
    Consider replacing eprintln with a structured logging library (or using the existing OtelTracer) for consistency and better log management.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
4. src/pipelines/pipeline.rs:155
  • Draft comment:
    There is an inconsistency in return types—chat_completions returns Result<impl IntoResponse, StatusCode> while completions (and embeddings) return impl IntoResponse directly. Ensure this difference is intentional for error handling purposes.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_8jYBYamauvT5rNLz

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

if let ChatCompletionResponse::NonStream(completion) = response {
    tracer.log_success(&completion);
    return Ok(Json(completion).into_response());
}

let matching_models: Vec<_> = model_keys
A Contributor commented:

The filtering of matching models (lines 81-90) is repeated in three functions. Consider extracting this logic into a helper function to reduce code duplication.
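
One possible shape for that helper, purely as an illustration: ModelKey, its fields, and the family-matching rule below are assumptions for the sketch, not the crate's real types.

```rust
/// Hypothetical model-key type; the real pipeline's type and fields differ.
#[derive(Clone, Debug, PartialEq)]
struct ModelKey {
    key: String,
    family: Option<String>,
}

/// Extracted once, callable from chat_completions, completions, and embeddings:
/// return the requested model first, followed by other models of the same family.
fn matching_models<'a>(model_keys: &'a [ModelKey], requested: &str) -> Vec<&'a ModelKey> {
    let family = model_keys
        .iter()
        .find(|m| m.key == requested)
        .and_then(|m| m.family.as_deref());

    let mut matches: Vec<&ModelKey> = model_keys
        .iter()
        .filter(|m| m.key == requested || (family.is_some() && m.family.as_deref() == family))
        .collect();

    // Keep the requested model at the front so it is tried before any fallback.
    matches.sort_by_key(|m| m.key != requested);
    matches
}
```

Each of the three handlers would then call matching_models(...) instead of repeating the filter inline.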

cursor[bot]

This comment was marked as outdated.

@ellipsis-dev bot (Contributor) left a comment

Important

Looks good to me! 👍

Reviewed b172efe in 41 seconds.
  • Reviewed 56 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/pipelines/pipeline.rs:134
  • Draft comment:
    Using unwrap() on last_error assumes it’s always Some. Although matching_models is checked earlier, consider using expect("transient error expected") to clarify this invariant.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. src/pipelines/pipeline.rs:203
  • Draft comment:
    Consider replacing unwrap() with expect() here to document that a transient error is expected at this point in completions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. src/pipelines/pipeline.rs:261
  • Draft comment:
    Using unwrap() assumes last_error is Some. Use expect() with a clear message to document the assumption in embeddings.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
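
The three draft comments above make the same suggestion: replace unwrap() on last_error with expect() so the invariant is stated where it is relied on. A one-function illustration, not the file's actual code:

```rust
use axum::http::StatusCode;

/// The fallback loop only reaches this point after at least one matching
/// model was tried and failed transiently, so last_error must be Some;
/// expect() documents that invariant instead of a bare unwrap().
fn final_status(last_error: Option<StatusCode>) -> StatusCode {
    last_error.expect("matching_models was non-empty, so a transient error was recorded")
}
```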

Workflow ID: wflow_y5DpiO4Ap60pEUQx

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@galkleinman (Contributor) left a comment

General comments:

  1. I'm quite concerned about this one, for these reasons:
  • Design-wise: we should generalize it (but OK, let's duct-tape it for now).
  • Performance/behaviour-wise: since LLM calls usually take a long time to respond, you'll end up waiting for a model until it times out. Then you either put the hub at risk of timing out (without surfacing that it's the model's fault), or, if you set a long timeout on our server, you can flood it (and the client app) with a lot of hanging requests. I'm not sure how the main loop works here and whether there's a reactive way of dealing with it; see the sketch below for one option.
  2. DRY this retry logic?
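
On the hanging-request concern in point 1, one lightweight mitigation is a per-attempt deadline, so a slow model surfaces as a transient timeout and the fallback loop moves on instead of waiting for the upstream to give up. The sketch below assumes the hub runs on tokio; the helper name, duration, and error mapping are placeholders, not anything from this PR.

```rust
use std::time::Duration;

use axum::http::StatusCode;
use serde_json::Value;
use tokio::time::timeout;

/// Cap a single model attempt. On expiry the attempt is reported as
/// REQUEST_TIMEOUT, which the transient-error check treats as retryable,
/// so the fallback loop moves to the next model instead of hanging.
async fn attempt_with_deadline<F>(per_attempt: Duration, attempt: F) -> Result<Value, StatusCode>
where
    F: std::future::Future<Output = Result<Value, StatusCode>>,
{
    match timeout(per_attempt, attempt).await {
        Ok(result) => result,
        Err(_elapsed) => Err(StatusCode::REQUEST_TIMEOUT),
    }
}
```

Total latency is then bounded at roughly the per-attempt deadline times the number of fallback candidates, which the hub's own request timeout would need to accommodate.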
