@galzilber galzilber commented Sep 7, 2025

Important

Refactor reasoning algorithm by consolidating handling across providers, reducing logging, and updating tests.

  • Behavior:
    • Consolidated inline reasoning handling across providers; payloads now use a separate reasoning_effort field and token limits are adjusted for provider-specific fields.
    • Reduced runtime logging related to reasoning across Anthropic, Azure, OpenAI, and Vertex AI.
    • Removed deprecated public helpers related to reasoning to simplify the API surface.
  • Tests:
    • Updated tests to validate reasoning transformation paths, token/effort behavior, and preservation of existing system messages.

This description was created by Ellipsis for 99148f4. You can customize this summary. It will automatically update as commits are pushed.



- Removed redundant methods from ReasoningConfig and integrated their logic directly into the respective provider implementations.
- Simplified reasoning handling for Anthropic and Azure by eliminating unnecessary function calls.
- Updated tests to validate the new inline reasoning logic.
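The precedence rule described above (a request-level token limit suppresses the effort string) can be sketched roughly as follows; the struct and function names are illustrative stand-ins, not the crate's actual API:

```rust
// Illustrative stand-in for the crate's reasoning config.
struct ReasoningConfig {
    max_tokens: Option<u32>,
    effort: Option<String>,
}

// If max_tokens is set, providers use it and suppress the effort
// field; otherwise a non-empty effort string is forwarded as
// `reasoning_effort` in the outgoing payload.
fn reasoning_effort(cfg: &ReasoningConfig) -> Option<String> {
    if cfg.max_tokens.is_some() {
        None
    } else {
        cfg.effort
            .as_ref()
            .filter(|e| !e.trim().is_empty())
            .cloned()
    }
}

fn main() {
    let with_tokens = ReasoningConfig { max_tokens: Some(1024), effort: Some("high".into()) };
    assert_eq!(reasoning_effort(&with_tokens), None);

    let effort_only = ReasoningConfig { max_tokens: None, effort: Some("high".into()) };
    assert_eq!(reasoning_effort(&effort_only), Some("high".to_string()));
}
```

This mirrors the inline logic each provider now applies instead of calling the removed ReasoningConfig helpers.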

coderabbitai bot commented Sep 7, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Warning

Rate limit exceeded

@galzilber has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 24 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 70df879 and 99148f4.

📒 Files selected for processing (4)
  • src/providers/anthropic/provider.rs (1 hunks)
  • src/providers/azure/provider.rs (2 hunks)
  • src/providers/openai/provider.rs (2 hunks)
  • src/providers/vertexai/provider.rs (1 hunks)


@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 35e7d38 in 2 minutes and 6 seconds. Click for details.
  • Reviewed 406 lines of code in 8 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/models/chat.rs:42
  • Draft comment:
    The helper functions for reasoning transformation (e.g. to_openai_effort, to_gemini_thinking_budget, to_thinking_prompt) were removed in favor of inline logic. Consider extracting a common utility to avoid duplication across providers.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. src/providers/azure/provider.rs:27
  • Draft comment:
    Azure now uses inline reasoning effort logic similar to OpenAI. While the logic is solid, the duplicated implementation across providers may benefit from centralizing shared behavior.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is making a suggestion about potential code duplication and architecture, but without seeing the OpenAI implementation, I can't verify if this is actually duplicated code. The comment is more of an observation than a clear request for change. It's also somewhat speculative - "may benefit from centralizing" rather than demonstrating a clear problem. I might be too harsh - code duplication is a real concern and the reviewer might have seen the OpenAI implementation. Also, suggesting architectural improvements is valuable. However, without clear evidence of the duplication or a specific suggestion for how to centralize the code, this comment doesn't provide actionable feedback. It's more of an observation than a required change. The comment should be deleted as it's more of an informative observation than a clear, actionable request for change, and we can't verify the claimed duplication without more context.
3. src/providers/openai/provider.rs:26
  • Draft comment:
    The inline handling of reasoning effort (prioritizing max_tokens over effort) is consistent with other providers. Consider abstracting this common logic to avoid code repetition.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment claims this logic is consistent with other providers, but we don't have access to other provider implementations to verify this. Without seeing the other providers' code, we can't be certain that abstraction would be beneficial. The logic here seems specific to OpenAI's handling of reasoning effort and max_tokens. I might be wrong about the specificity - there could be common patterns across providers that would benefit from abstraction. The comment might be based on knowledge of other provider implementations. However, without concrete evidence of duplication in other providers, suggesting abstraction is speculative. The logic here appears to handle OpenAI-specific concerns (like the reasoning_effort field). Delete the comment because it makes assumptions about code duplication without strong evidence, and the logic appears to be OpenAI-specific.
4. src/providers/vertexai/models.rs:410
  • Draft comment:
    Vertex AI reasoning conversion only considers max_tokens (mapped to thinking_budget) and ignores effort. Confirm that this is the intended behavior and document it if so.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. src/providers/vertexai/provider.rs:87
  • Draft comment:
    The validate_location function filters allowed characters to alphanumeric and hyphen. Consider adding documentation or tests to clarify expected formats (e.g. region names) and to avoid accidental rejection of valid inputs.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. src/providers/vertexai/provider.rs:60
  • Draft comment:
    In get_auth_token, the call to token.token().unwrap_or_default() may silently yield an empty string. Consider handling a missing token more explicitly to avoid masking authentication issues.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. src/providers/vertexai/provider.rs:161
  • Draft comment:
    Test mode logic is duplicated in both chat_completions and embeddings. Consider extracting the test endpoint selection to a shared helper to improve maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. src/providers/vertexai/provider.rs:40
  • Draft comment:
    Usage of is_some_and (line 40) requires Rust 1.70+. Ensure that the project’s minimum required Rust version is updated accordingly.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_HnBymLV5tWhp1jHC

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

// Handle Anthropic reasoning logic inline
let thinking_prompt = if reasoning_config.max_tokens.is_some() {
// If max_tokens is specified, use a generic thinking prompt
Some("Think through this step-by-step with detailed reasoning.".to_string())

Inline reasoning logic now duplicates the prompt strings (e.g. 'Think through this step-by-step with detailed reasoning.'). Consider extracting these magic strings to constants or a shared function for maintainability.

// Add reasoning prompt if reasoning is requested
if let Some(reasoning_config) = &request.reasoning {
if let Some(thinking_prompt) = reasoning_config.to_thinking_prompt() {
// Handle Anthropic reasoning logic inline
Member

why are we adding this? 🤔


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/providers/azure/provider.rs (2)

144-149: Log errors without unwrap() and at error level.

Avoid potential panics when reading error bodies, and log at error level (tracing::error!) instead of info!/eprintln!.

-            info!(
-                "Azure OpenAI API request error: {}",
-                response.text().await.unwrap()
-            );
+            let body = response.text().await.unwrap_or_else(|_| "<failed to read error body>".to_string());
+            tracing::error!("Azure OpenAI API request error: {}", body);
-            eprintln!(
-                "Azure OpenAI API request error: {}",
-                response.text().await.unwrap()
-            );
+            let body = response.text().await.unwrap_or_else(|_| "<failed to read error body>".to_string());
+            tracing::error!("Azure OpenAI API request error: {}", body);
-            eprintln!(
-                "Azure OpenAI Embeddings API request error: {}",
-                response.text().await.unwrap()
-            );
+            let body = response.text().await.unwrap_or_else(|_| "<failed to read error body>".to_string());
+            tracing::error!("Azure OpenAI Embeddings API request error: {}", body);

Also applies to: 186-189, 228-231


129-133: Replace JSON-array streaming with SSE-based parser in Azure provider (src/providers/azure/provider.rs:129–133)
Current use of json_array_stream::<ChatCompletionChunk> is incompatible with Azure’s SSE format, which sends one JSON object per data: event (ending with a final data: [DONE] frame). Implement an SSE parser that:

  • Reads the HTTP response body as a text/byte stream
  • Splits on SSE event boundaries (\n\n) to isolate events
  • Extracts lines starting with data:, buffering multi-line payloads if needed
  • Deserializes each payload as JSON (e.g. with serde_json) and stops at [DONE]
  • Yields ChatCompletionChunk instances as events arrive
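A minimal sketch of those parsing steps, assuming the whole body is already buffered as a string; a real implementation would consume the response byte stream incrementally and deserialize each payload into ChatCompletionChunk with serde_json (the function name here is illustrative):

```rust
// Extract the JSON payloads from an SSE body, stopping at [DONE].
fn parse_sse_payloads(body: &str) -> Vec<String> {
    let mut payloads = Vec::new();
    for event in body.split("\n\n") {
        // An event may carry several `data:` lines; per the SSE spec
        // they are re-joined with '\n' to form one payload.
        let data: Vec<&str> = event
            .lines()
            .filter_map(|line| line.strip_prefix("data:"))
            .map(str::trim_start)
            .collect();
        if data.is_empty() {
            continue; // comment/heartbeat event, nothing to yield
        }
        let payload = data.join("\n");
        if payload == "[DONE]" {
            break; // final frame: stop yielding chunks
        }
        payloads.push(payload);
    }
    payloads
}

fn main() {
    let body = "data: {\"id\":\"a\"}\n\ndata: {\"id\":\"b\"}\n\ndata: [DONE]\n\n";
    assert_eq!(parse_sse_payloads(body), vec!["{\"id\":\"a\"}", "{\"id\":\"b\"}"]);
}
```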
🧹 Nitpick comments (8)
src/providers/vertexai/models.rs (1)

414-416: Avoid lossy u32→i32 cast for thinking_budget.

Casting with as i32 silently wraps for values above i32::MAX rather than failing. Prefer a fallible or saturating conversion.

-                // Handle Gemini thinking budget logic inline
-                r.max_tokens.map(|tokens| tokens as i32)
+                // Handle Gemini thinking budget logic inline (avoid lossy cast)
+                r.max_tokens.and_then(|t| i32::try_from(t).ok())
+                // Alternatively, clamp and log:
+                // r.max_tokens.map(|t| (t.min(i32::MAX as u32)) as i32)
src/providers/vertexai/provider.rs (2)

191-203: Gate verbose request logging to TRACE and avoid dumping prompts by default.

Full-body logs can leak PII and secrets. Keep generation_config/thinking_config details but move the full request body to TRACE or behind a feature/env flag.

-        let serialized_body = serde_json::to_string_pretty(&request_body)
-            .unwrap_or_else(|e| format!("Failed to serialize request: {e}"));
-        tracing::debug!("📤 Full Request Body:\n{}", serialized_body);
-
-        // Specifically log the generation_config part
-        if let Some(gen_config) = &request_body.generation_config {
-            tracing::debug!("⚙️ Generation Config: {:?}", gen_config);
-            if let Some(thinking_config) = &gen_config.thinking_config {
-                tracing::debug!("🧠 ThinkingConfig in request: {:?}", thinking_config);
-            }
-        }
+        if tracing::enabled!(tracing::Level::TRACE) {
+            if let Ok(serialized_body) = serde_json::to_string_pretty(&request_body) {
+                tracing::trace!("📤 Full Request Body:\n{}", serialized_body);
+            }
+            if let Some(gen_config) = &request_body.generation_config {
+                tracing::trace!("⚙️ Generation Config: {:?}", gen_config);
+                if let Some(thinking_config) = &gen_config.thinking_config {
+                    tracing::trace!("🧠 ThinkingConfig in request: {:?}", thinking_config);
+                }
+            }
+        }

1-1: Fix CI: run cargo fmt.

CI failed with “cargo fmt --check”. Please run:

#!/bin/bash
cargo fmt --all
cargo fmt --check
src/providers/openai/provider.rs (1)

26-39: Reasoning effort handling looks right; consider gating by model family.

Setting reasoning_effort only when no per-request reasoning token limit is provided is sensible. Note: Chat Completions for newer OpenAI models (o1/o3/o4/gpt-5) favor max_completion_tokens, and o-series accept reasoning_effort on chat/completions; older models may reject it. To reduce 400s, optionally gate reasoning_effort by model prefix (e.g., o1/o3/o4/gpt-5).

Evidence that newer models require max_completion_tokens and support reasoning_effort on chat/completions: OpenAI .NET SDK changelog (adds max_completion_tokens for o1), and community/issue threads quoting the official API reference. (github.com)

Example gating:

-        let reasoning_effort = base.reasoning.as_ref().and_then(|reasoning| {
+        let model_lc = base.model.to_lowercase();
+        let is_reasoning_family =
+            model_lc.starts_with("o1") || model_lc.starts_with("o3") || model_lc.starts_with("o4") || model_lc.starts_with("gpt-5");
+        let reasoning_effort = is_reasoning_family.then(|| base.reasoning.as_ref()).flatten().and_then(|reasoning| {
             if reasoning.max_tokens.is_some() {
                 // If max_tokens is specified, don't use effort for OpenAI
                 None
             } else {
                 // Only return effort if it's not empty
                 reasoning
                     .effort
                     .as_ref()
                     .filter(|e| !e.trim().is_empty())
                     .cloned()
             }
         });
src/providers/azure/provider.rs (1)

103-110: Avoid panic on missing deployment.

Gracefully 400 on misconfig instead of unwrap() panic.

-        let deployment = model_config.params.get("deployment").unwrap();
+        let deployment = match model_config.params.get("deployment") {
+            Some(d) => d,
+            None => {
+                tracing::error!("Missing 'deployment' in model_config.params");
+                return Err(StatusCode::BAD_REQUEST);
+            }
+        };
src/providers/anthropic/models.rs (2)

108-125: Inline thinking-prompt mapping is clear; consider deduping prompt strings.

Extract prompt strings to const to avoid duplication and reduce typos.

// Place near the top of the file
const PROMPT_DETAILED: &str = "Think through this step-by-step with detailed reasoning.";
const PROMPT_THOUGHTFUL: &str = "Consider this problem thoughtfully.";
const PROMPT_BRIEF: &str = "Think about this briefly.";
-            let thinking_prompt = if reasoning_config.max_tokens.is_some() {
-                // If max_tokens is specified, use a generic thinking prompt
-                Some("Think through this step-by-step with detailed reasoning.".to_string())
+            let thinking_prompt = if reasoning_config.max_tokens.is_some() {
+                Some(PROMPT_DETAILED.to_string())
             } else {
                 match reasoning_config.effort.as_deref() {
                     Some(effort) if !effort.trim().is_empty() => match effort {
-                        "high" => Some(
-                            "Think through this step-by-step with detailed reasoning.".to_string(),
-                        ),
-                        "medium" => Some("Consider this problem thoughtfully.".to_string()),
-                        "low" => Some("Think about this briefly.".to_string()),
+                        "high" => Some(PROMPT_DETAILED.to_string()),
+                        "medium" => Some(PROMPT_THOUGHTFUL.to_string()),
+                        "low" => Some(PROMPT_BRIEF.to_string()),
                         _ => None,
                     },
                     _ => None,
                 }
             };

Also applies to: 126-131


93-105: Multiple system messages are collapsed to the first only.

If multiple system messages or multiple text parts exist, you may want to concatenate them for Anthropic’s single system field.
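The suggested concatenation could look roughly like this; the message struct is a simplified stand-in for the crate's actual type, named here only for illustration:

```rust
// Simplified stand-in for the crate's chat message type.
struct SimpleMessage {
    role: String,
    content: String,
}

// Join every system message into the single `system` string that
// Anthropic's API expects, instead of keeping only the first.
fn merge_system_messages(messages: &[SimpleMessage]) -> Option<String> {
    let parts: Vec<&str> = messages
        .iter()
        .filter(|m| m.role == "system")
        .map(|m| m.content.as_str())
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n\n"))
    }
}

fn main() {
    let msgs = vec![
        SimpleMessage { role: "system".into(), content: "You are helpful.".into() },
        SimpleMessage { role: "user".into(), content: "Hi".into() },
        SimpleMessage { role: "system".into(), content: "Answer briefly.".into() },
    ];
    assert_eq!(
        merge_system_messages(&msgs),
        Some("You are helpful.\n\nAnswer briefly.".to_string())
    );
}
```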

src/providers/bedrock/test.rs (1)

73-89: Typo: antropic_tests module name.

Rename to anthropic_tests for consistency/searchability.

-#[cfg(test)]
-mod antropic_tests {
+#[cfg(test)]
+mod anthropic_tests {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 87c8971 and 35e7d38.

📒 Files selected for processing (8)
  • src/models/chat.rs (0 hunks)
  • src/providers/anthropic/models.rs (1 hunks)
  • src/providers/anthropic/provider.rs (0 hunks)
  • src/providers/azure/provider.rs (1 hunks)
  • src/providers/bedrock/test.rs (1 hunks)
  • src/providers/openai/provider.rs (1 hunks)
  • src/providers/vertexai/models.rs (1 hunks)
  • src/providers/vertexai/provider.rs (1 hunks)
💤 Files with no reviewable changes (2)
  • src/providers/anthropic/provider.rs
  • src/models/chat.rs
🧰 Additional context used
🪛 GitHub Actions: build
src/providers/vertexai/models.rs

[error] 414-414: cargo fmt --check failed due to formatting differences. Run 'cargo fmt' to format the code.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (5)
src/providers/vertexai/provider.rs (1)

149-151: Log message tweak LGTM.

src/providers/openai/provider.rs (1)

41-45: max_tokens → max_completion_tokens migration LGTM.

This aligns with the deprecation of max_tokens on newer models; keeping max_completion_tokens only is appropriate.

src/providers/azure/provider.rs (2)

95-101: Reasoning validation before transformation looks good.


27-41: Confirmed: Azure Chat Completions use top-level reasoning_effort — the provider’s inline extraction of effort into reasoning_effort is correct; only the preview Responses API uses nested reasoning: { effort: … }.

src/providers/bedrock/test.rs (1)

709-804: Nice pivot to transformation-based tests.

Validates the Anthropic prompt injection via From<ChatCompletionRequest> and covers effort vs max_tokens precedence. Looks good.


@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed b486c32 in 28 seconds. Click for details.
  • Reviewed 40 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. src/providers/anthropic/models.rs:103
  • Draft comment:
    Removal of the inline reasoning prompt appending is appropriate for a unified approach. Please ensure that the new centralized reasoning logic is correctly integrated and covered by tests.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that the new centralized reasoning logic is correctly integrated and covered by tests. This violates the rule against asking the author to ensure that the change is tested.
2. src/providers/anthropic/models.rs:106
  • Draft comment:
    Consider adding a comment explaining that reasoning prompt handling is now managed uniformly elsewhere, to help future maintainers understand why this code was removed.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to add a comment for future maintainers, which is a request for additional documentation rather than a code suggestion or a request for a test. It doesn't align with the rules provided, as it is not a specific code suggestion or a request for a test.

Workflow ID: wflow_8c2aw4BEYfrBXwQb



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 75bccb2 in 25 seconds. Click for details.
  • Reviewed 32 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. src/providers/anthropic/models.rs:103
  • Draft comment:
    Remove extraneous whitespace for cleaner code.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. src/providers/vertexai/models.rs:415
  • Draft comment:
    Simplify the closure mapping for thinking_budget; more concise without altering functionality.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_EV7tZcTvkGpldOCH



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed a976b4f in 22 seconds. Click for details.
  • Reviewed 25 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. src/providers/anthropic/models.rs:93
  • Draft comment:
    Removed unnecessary 'mut' from 'system' variable. This is appropriate since the variable is not modified, improving code clarity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. src/providers/anthropic/provider.rs:1
  • Draft comment:
    Cleaned up unused logging import by removing 'tracing::info'. This aligns with the refactoring for reduced runtime logging.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_9BuYeHZxXZ5tzPdX



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed c063fa9 in 20 seconds. Click for details.
  • Reviewed 23 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
1. src/providers/vertexai/models.rs:452
  • Draft comment:
    Good refactor: directly returning Self { ... } instead of using an intermediate variable is more idiomatic and clear. No issues found.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_0L01UwkvP9lbRQ6X



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 70df879 in 49 seconds. Click for details.
  • Reviewed 127 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
1. src/providers/bedrock/test.rs:712
  • Draft comment:
    Updating comment and assertion here correctly reflects that a reasoning config no longer appends a prompt to the system message when none is provided.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, explaining what the update does without suggesting any changes or asking for clarification. It doesn't provide any actionable feedback or raise any concerns about the code.
2. src/providers/bedrock/test.rs:749
  • Draft comment:
    Removed test logic that checked for an appended reasoning prompt (even with max_tokens). This aligns with the refactored behavior; ensure the removal is intentional.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that the removal of test logic is intentional. This falls under the rule of not asking the author to confirm their intention or ensure behavior is intended. Therefore, this comment should be removed.
3. src/providers/bedrock/test.rs:844
  • Draft comment:
    Updated this test to verify that the reasoning prompt is no longer injected into the system message for Anthropic. The new assertion that the system message is None reflects the new design.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. src/providers/bedrock/test.rs:908
  • Draft comment:
    This test now ensures that if an original system message exists, it is preserved exactly without appending a reasoning prompt. Using assert_eq with the full string is correct if the format is stable; otherwise, consider a more flexible check.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_ifQ2rmpZwRVJJA3R



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
src/providers/bedrock/test.rs (3)

711-754: Tighten the test and drop redundant imports

  • The two local use statements for ChatCompletionRequest and ChatCompletionMessage are redundant (already imported at module scope).
  • Add a couple of cheap assertions to ensure the transformation preserved model and stripped system-role messages.
-        use crate::models::chat::ChatCompletionRequest;
-        use crate::models::content::ChatCompletionMessage;
-        use crate::providers::anthropic::AnthropicChatCompletionRequest;
+        use crate::providers::anthropic::AnthropicChatCompletionRequest;
@@
-        assert!(anthropic_request.system.is_none());
+        assert!(anthropic_request.system.is_none());
+        assert_eq!(anthropic_request.model, "test");
+        assert_eq!(anthropic_request.messages.len(), 1);
+        assert!(anthropic_request
+            .messages
+            .iter()
+            .all(|m| m.role != "system"));

Also consider renaming the test to reflect the new behavior (e.g., test_reasoning_config_no_system_augmentation).


760-801: Avoid async for a sync-only test; add one more assertion

This test doesn’t await anything—prefer #[test] to avoid spinning up Tokio. Also, assert no system role remains in messages.

         // Verify reasoning prompt is no longer included in system message
         assert!(
             anthropic_request.system.is_none(),
             "System message should not be present since reasoning logic was removed"
         );
+        assert!(
+            anthropic_request.messages.iter().all(|m| m.role != "system"),
+            "No system entries should remain in the messages vector"
+        );

If you want to switch to a plain test (outside this hunk):

  • Replace #[tokio::test] with #[test]
  • Remove async from the function signature.

855-866: Preserve original system; also assert post-transform message shape

Good check on system preservation. Add brief assertions to ensure the system message is not duplicated into messages and that we kept exactly the single user message.

         let system_message = anthropic_request.system.unwrap();
         assert_eq!(
             system_message, "You are a helpful assistant.",
             "Should only contain original system message: {}",
             system_message
         );
+        assert_eq!(anthropic_request.messages.len(), 1);
+        assert_eq!(anthropic_request.messages[0].role, "user");
+        assert!(
+            anthropic_request.messages.iter().all(|m| m.role != "system"),
+            "System messages must be filtered out of 'messages'"
+        );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 35e7d38 and 70df879.

📒 Files selected for processing (4)
  • src/providers/anthropic/models.rs (1 hunks)
  • src/providers/anthropic/provider.rs (0 hunks)
  • src/providers/bedrock/test.rs (3 hunks)
  • src/providers/vertexai/models.rs (2 hunks)
💤 Files with no reviewable changes (1)
  • src/providers/anthropic/provider.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/providers/anthropic/models.rs
  • src/providers/vertexai/models.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/providers/bedrock/test.rs (1)
src/providers/anthropic/models.rs (3)
  • from (85-156)
  • from (160-196)
  • from (200-221)
⏰ Context from checks skipped due to timeout of 90000ms. (2)
  • GitHub Check: build
  • GitHub Check: Analyze (rust)

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 16204ef in 1 minute and 1 second.
  • Reviewed 14 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/providers/vertexai/provider.rs:148
  • Draft comment:
    Removed logging of the reasoning validation error (_e). While reducing log noise is intentional per the release note, dropping the error details may hinder troubleshooting. Consider logging the error at a lower log level (e.g., debug) if it doesn't expose sensitive information.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 30% vs. threshold = 50%). The change intentionally removes error logging. While the comment's suggestion to log at debug level is reasonable, we don't have enough context to know whether these validation errors need debugging. The code still returns BAD_REQUEST, which may be sufficient feedback, and the comment is speculative about whether this will hinder troubleshooting. I may be underestimating the value of debug logs for validation errors: removing error information entirely could make it harder to fix invalid requests. However, without understanding the validation logic or how critical these errors are, I can't be confident that logging is truly needed. The comment makes a reasonable suggestion but is speculative about the impact, so we should err on the side of deletion since we lack strong evidence that logging is needed here.

Workflow ID: wflow_kNTqUR4gKupTXpkg

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 99148f4 in 35 seconds.
  • Reviewed 42 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. src/providers/anthropic/provider.rs:44
  • Draft comment:
    Consider logging the validation failure at a lower log level (e.g., debug) instead of suppressing the error entirely to aid in debugging configuration issues.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
2. src/providers/azure/provider.rs:97
  • Draft comment:
    Consider retaining a debug-level log of the reasoning validation error to assist in diagnosing misconfigurations without cluttering production error logs.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
3. src/providers/openai/provider.rs:96
  • Draft comment:
    Consider logging the failure of reasoning configuration validation at a lower severity (e.g., debug) to maintain troubleshooting information.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%

Workflow ID: wflow_N2mHS3ESB03RLBE1

