Skip to content

feat: Integrate Chutes API with Kimi K2.5-TEE model#7

Open
echobt wants to merge 1 commit intomainfrom
feature/chutes-api-kimi-integration
Open

feat: Integrate Chutes API with Kimi K2.5-TEE model#7
echobt wants to merge 1 commit intomainfrom
feature/chutes-api-kimi-integration

Conversation

@echobt
Copy link
Contributor

@echobt echobt commented Feb 3, 2026

Summary

This PR integrates the Chutes API with the Kimi K2.5-TEE model for the agent.

Changes

  • Add ChutesClient class for Chutes API (https://api.chutes.ai/v1)
  • Support CHUTES_API_KEY environment variable for authentication
  • Set moonshotai/Kimi-K2.5-TEE as default model
  • Enable thinking mode by default with <think>...</think> parsing
  • Use Kimi K2.5 recommended parameters (temp=1.0, top_p=0.95)
  • Increase context limit to 256K tokens
  • Add openai>=1.0.0 dependency for OpenAI-compatible API client

Testing

python3 -c "from src.llm.client import ChutesClient; print('OK')"

Related

  • Part of umbrella PR for Chutes API complete integration
  • 🔗 Depends on: N/A (independent feature)
  • 🔗 Blocks: docs/comprehensive-mermaid-documentation, feature/remove-openrouter-litellm

Summary by CodeRabbit

  • New Features

    • Added multi-provider LLM support with Chutes API as the default provider and OpenRouter as fallback
    • Enabled AI thinking mode for improved reasoning and response quality
    • Extended token limits and context windows for enhanced processing capacity
  • Improvements

    • Increased maximum iterations for agent execution
    • Added caching and timeout optimizations for better performance

- Add ChutesClient class for Chutes API (https://api.chutes.ai/v1)
- Support CHUTES_API_KEY environment variable for authentication
- Set moonshotai/Kimi-K2.5-TEE as default model
- Enable thinking mode by default with <think>...</think> parsing
- Use Kimi K2.5 recommended parameters (temp=1.0, top_p=0.95)
- Increase context limit to 256K tokens
- Add openai>=1.0.0 dependency for OpenAI-compatible API client
@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

The pull request introduces multi-provider LLM support by adding a Chutes API client with thinking mode capabilities as the default provider, alongside OpenRouter as a fallback. The implementation includes a factory function for provider selection, updated configuration defaults for the Kimi K2.5-TEE model, and extended cost/token tracking across both providers.

Changes

Cohort / File(s) Summary
Agent Initialization
agent.py
Updated agent startup to use provider-driven configuration and get_llm_client() factory function; added logging for provider and thinking mode selection; preserved core agent loop logic.
LLM Provider Implementation
src/llm/client.py
Introduced ChutesClient for Chutes API with thinking mode extraction, error mapping, and cost tracking; extended LiteLLMClient with thinking support and temperature handling; added get_llm_client() factory function for provider selection; enhanced LLMResponse with thinking and cost fields.
Configuration Defaults
src/config/defaults.py
Migrated from OpenRouter-centric to Chutes/Kimi K2.5-TEE defaults; enabled thinking mode; increased model context limit to 256000 and max iterations to 350; added shell timeout, caching parameters, and cost limit configuration.
Dependencies
pyproject.toml, requirements.txt
Added openai>=1.0.0 dependency for Chutes API compatibility.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent (main)
    participant Config as CONFIG
    participant Factory as get_llm_client()
    participant ChutesC as ChutesClient
    participant LiteLLMC as LiteLLMClient
    participant API as Chutes/OpenRouter API

    Agent->>Config: read provider setting
    Config-->>Agent: provider = "chutes" (or fallback)
    Agent->>Factory: get_llm_client(provider, model, cost_limit, enable_thinking)
    alt provider == "chutes"
        Factory->>ChutesC: instantiate with auth, thinking_mode
        ChutesC-->>Factory: client ready
    else provider == "openrouter"
        Factory->>LiteLLMC: instantiate with litellm config
        LiteLLMC-->>Factory: client ready
    end
    Factory-->>Agent: llm_client
    Agent->>ChutesC: chat(messages, temperature, max_tokens)
    ChutesC->>API: request (with thinking mode params)
    API-->>ChutesC: response (thinking + content)
    ChutesC->>ChutesC: extract thinking_content
    ChutesC-->>Agent: LLMResponse(thinking, cost, usage)
    Agent->>Agent: run agent loop with response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 A Chutes client hops into view,
Thinking modes now work, tried and true!
The factory builds what we need,
Multi-providers help us succeed! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: Integrate Chutes API with Kimi K2.5-TEE model' directly and clearly summarizes the main change: integration of Chutes API with a specific model, which aligns with the primary focus across all modified files (agent.py, config defaults, llm/client.py, and dependency additions).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/chutes-api-kimi-integration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@echobt
Copy link
Contributor Author

echobt commented Feb 3, 2026

Part of Umbrella PR: #6 (Epic: Complete Chutes API Integration)

This PR is the first step in the stacked PR sequence:

  1. This PR (feat: Integrate Chutes API with Kimi K2.5-TEE model #7) - Chutes API integration (merge first)
  2. docs: Comprehensive documentation with Mermaid diagrams #8 - Documentation (depends on this)
  3. feat: Remove OpenRouter support, replace litellm with Chutes API #9 - OpenRouter removal (depends on feat: Integrate Chutes API with Kimi K2.5-TEE model #7 and docs: Comprehensive documentation with Mermaid diagrams #8)

Please see #6 for the complete merge strategy.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/llm/client.py`:
- Around line 102-164: The ChutesClient currently only reads CHUTES_API_TOKEN so
users setting CHUTES_API_KEY (as documented) will get an auth error; update the
token retrieval in ChutesClient.__init__ to accept either environment variable
(check CHUTES_API_TOKEN first, then CHUTES_API_KEY or vice versa) and set
self._api_token accordingly, and update the raised LLMError message to reference
both env var names; ensure the later OpenAI client initialization still uses
self._api_token.
🧹 Nitpick comments (1)
pyproject.toml (1)

30-30: Consider consolidating dependency declarations.

openai>=1.0.0 is declared in both requirements.txt and pyproject.toml with matching versions. If both files are intentional (e.g., for tool compatibility or development workflows), maintain alignment as part of standard practice.

Comment on lines +102 to +164
class ChutesClient:
"""LLM Client for Chutes API with Kimi K2.5-TEE.

Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.

Environment variable: CHUTES_API_TOKEN

Kimi K2.5 parameters:
- Thinking mode: temperature=1.0, top_p=0.95
- Instant mode: temperature=0.6, top_p=0.95
- Context window: 256K tokens
"""

def __init__(
self,
model: str = CHUTES_DEFAULT_MODEL,
temperature: Optional[float] = None,
max_tokens: int = 16384,
cost_limit: Optional[float] = None,
enable_thinking: bool = True,
# Legacy params (kept for compatibility)
cache_extended_retention: bool = True,
cache_key: Optional[str] = None,
):
self.model = model
self.max_tokens = max_tokens
self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
self.enable_thinking = enable_thinking

# Set temperature based on thinking mode if not explicitly provided
if temperature is None:
params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
self.temperature = params["temperature"]
else:
self.temperature = temperature

self._total_cost = 0.0
self._total_tokens = 0
self._request_count = 0
self._input_tokens = 0
self._output_tokens = 0
self._cached_tokens = 0

# Get API token
self._api_token = os.environ.get("CHUTES_API_TOKEN")
if not self._api_token:
raise LLMError(
"CHUTES_API_TOKEN environment variable not set. "
"Get your API token at https://chutes.ai",
code="authentication_error"
)

# Import and configure OpenAI client for Chutes API
try:
from openai import OpenAI
self._client = OpenAI(
api_key=self._api_token,
base_url=CHUTES_API_BASE,
)
except ImportError:
raise ImportError("openai not installed. Run: pip install openai")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Support the documented CHUTES_API_KEY env var to prevent auth failures.

The client only checks CHUTES_API_TOKEN. If users follow the documented CHUTES_API_KEY, auth will fail. Accept both.

🔧 Suggested fix
-        self._api_token = os.environ.get("CHUTES_API_TOKEN")
+        self._api_token = (
+            os.environ.get("CHUTES_API_KEY")
+            or os.environ.get("CHUTES_API_TOKEN")
+        )
         if not self._api_token:
             raise LLMError(
-                "CHUTES_API_TOKEN environment variable not set. "
+                "CHUTES_API_KEY (or CHUTES_API_TOKEN) environment variable not set. "
                 "Get your API token at https://chutes.ai",
                 code="authentication_error"
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class ChutesClient:
"""LLM Client for Chutes API with Kimi K2.5-TEE.
Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.
Environment variable: CHUTES_API_TOKEN
Kimi K2.5 parameters:
- Thinking mode: temperature=1.0, top_p=0.95
- Instant mode: temperature=0.6, top_p=0.95
- Context window: 256K tokens
"""
def __init__(
self,
model: str = CHUTES_DEFAULT_MODEL,
temperature: Optional[float] = None,
max_tokens: int = 16384,
cost_limit: Optional[float] = None,
enable_thinking: bool = True,
# Legacy params (kept for compatibility)
cache_extended_retention: bool = True,
cache_key: Optional[str] = None,
):
self.model = model
self.max_tokens = max_tokens
self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
self.enable_thinking = enable_thinking
# Set temperature based on thinking mode if not explicitly provided
if temperature is None:
params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
self.temperature = params["temperature"]
else:
self.temperature = temperature
self._total_cost = 0.0
self._total_tokens = 0
self._request_count = 0
self._input_tokens = 0
self._output_tokens = 0
self._cached_tokens = 0
# Get API token
self._api_token = os.environ.get("CHUTES_API_TOKEN")
if not self._api_token:
raise LLMError(
"CHUTES_API_TOKEN environment variable not set. "
"Get your API token at https://chutes.ai",
code="authentication_error"
)
# Import and configure OpenAI client for Chutes API
try:
from openai import OpenAI
self._client = OpenAI(
api_key=self._api_token,
base_url=CHUTES_API_BASE,
)
except ImportError:
raise ImportError("openai not installed. Run: pip install openai")
class ChutesClient:
"""LLM Client for Chutes API with Kimi K2.5-TEE.
Chutes API is OpenAI-compatible, hosted at https://llm.chutes.ai/v1
Default model: moonshotai/Kimi-K2.5-TEE with thinking mode enabled.
Environment variable: CHUTES_API_TOKEN
Kimi K2.5 parameters:
- Thinking mode: temperature=1.0, top_p=0.95
- Instant mode: temperature=0.6, top_p=0.95
- Context window: 256K tokens
"""
def __init__(
self,
model: str = CHUTES_DEFAULT_MODEL,
temperature: Optional[float] = None,
max_tokens: int = 16384,
cost_limit: Optional[float] = None,
enable_thinking: bool = True,
# Legacy params (kept for compatibility)
cache_extended_retention: bool = True,
cache_key: Optional[str] = None,
):
self.model = model
self.max_tokens = max_tokens
self.cost_limit = cost_limit or float(os.environ.get("LLM_COST_LIMIT", "100.0"))
self.enable_thinking = enable_thinking
# Set temperature based on thinking mode if not explicitly provided
if temperature is None:
params = KIMI_K25_THINKING_PARAMS if enable_thinking else KIMI_K25_INSTANT_PARAMS
self.temperature = params["temperature"]
else:
self.temperature = temperature
self._total_cost = 0.0
self._total_tokens = 0
self._request_count = 0
self._input_tokens = 0
self._output_tokens = 0
self._cached_tokens = 0
# Get API token
self._api_token = (
os.environ.get("CHUTES_API_KEY")
or os.environ.get("CHUTES_API_TOKEN")
)
if not self._api_token:
raise LLMError(
"CHUTES_API_KEY (or CHUTES_API_TOKEN) environment variable not set. "
"Get your API token at https://chutes.ai",
code="authentication_error"
)
# Import and configure OpenAI client for Chutes API
try:
from openai import OpenAI
self._client = OpenAI(
api_key=self._api_token,
base_url=CHUTES_API_BASE,
)
except ImportError:
raise ImportError("openai not installed. Run: pip install openai")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 124-124: Unused method argument: cache_extended_retention

(ARG002)


[warning] 125-125: Unused method argument: cache_key

(ARG002)


[warning] 149-153: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 163-163: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 163-163: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@src/llm/client.py` around lines 102 - 164, The ChutesClient currently only
reads CHUTES_API_TOKEN so users setting CHUTES_API_KEY (as documented) will get
an auth error; update the token retrieval in ChutesClient.__init__ to accept
either environment variable (check CHUTES_API_TOKEN first, then CHUTES_API_KEY
or vice versa) and set self._api_token accordingly, and update the raised
LLMError message to reference both env var names; ensure the later OpenAI client
initialization still uses self._api_token.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant