Skip to content

feat(provider): make outer OpenAI retry count configurable#9099

Open
x1051445024 wants to merge 2 commits into
AstrBotDevs:masterfrom
x1051445024:configurable-provider-retries
Open

feat(provider): make outer OpenAI retry count configurable#9099
x1051445024 wants to merge 2 commits into
AstrBotDevs:masterfrom
x1051445024:configurable-provider-retries

Conversation

@x1051445024

@x1051445024 x1051445024 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add provider_settings.provider_error_retries to configure the OpenAI-compatible provider's outer recovery retry loop.
  • Default the outer provider-level retry loop to 1 attempt to avoid multiplying latency with request_max_retries.
  • Keep transport/status retry behavior controlled by the existing request_max_retries setting.
  • Add unit coverage for coercing the new retry setting.

Motivation

OpenAI-compatible proxy/aggregator providers often perform their own upstream retry and fallback. AstrBot currently has an inner request retry loop plus a fixed outer provider-level retry loop of 10 attempts. When the upstream proxy is slow or unstable, those nested retries can significantly delay fallback to other providers.

This change makes the outer loop configurable while preserving the existing recovery paths for users who need more attempts.

Testing

  • python -m py_compile astrbot/core/provider/sources/openai_source.py astrbot/core/config/default.py tests/test_openai_source.py

pytest was not available in the local environment used for this change.

Summary by Sourcery

Make the OpenAI-compatible provider’s outer recovery retry loop configurable via provider settings and align defaults.

New Features:

  • Add a provider_settings.provider_error_retries option to control provider-level recovery retries for the OpenAI-compatible provider.

Enhancements:

  • Default provider-level recovery retries to 1 attempt and wire this setting into text and streaming chat flows to avoid compounding latency with request_max_retries.
  • Extend default configuration schema to include the new provider_error_retries setting.

Tests:

  • Add unit tests to verify defaulting and coercion behavior for the new provider_error_retries setting.

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Jul 1, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable provider_error_retries setting to replace the hardcoded retry limit of 10 in OpenAI provider chat methods, allowing users to control outer-loop recovery paths. Feedback highlights that a default value of 1 effectively disables these recovery paths (such as key rotation and image fallback) and suggests a higher default. Additionally, the new setting needs to be added to CONFIG_METADATA_3 and localization files for WebUI visibility. Finally, a potential AttributeError should be avoided by guarding against self.provider_settings being None, along with adding corresponding unit tests.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

"default_provider_id": "",
"fallback_chat_models": [],
"request_max_retries": 5,
"provider_error_retries": 1,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

⚠️ Critical Reliability & WebUI Configuration Issues

  1. Default Value of 1 Disables All Recovery Paths:
    Setting the default value of provider_error_retries to 1 means the outer retry loop runs exactly once (i.e., 0 retries). As a result, all provider-level recovery paths (such as context length trimming, image fallback, key rotation, and tool removal) are completely disabled by default. If any of these errors occur on the first attempt, the request will immediately fail without attempting recovery.

    To preserve these robust recovery features out-of-the-box, consider keeping the default value higher (e.g., 3 or 5), while still allowing proxy/aggregator users to configure it down to 1 if they want to avoid nested retries.

  2. Missing from CONFIG_METADATA_3:
    This setting is not added to CONFIG_METADATA_3 in default.py. Since CONFIG_METADATA_3 is used to render the settings in the WebUI, users will not be able to see or configure this setting from the admin panel. Please add it to CONFIG_METADATA_3 under ai_group -> metadata -> ai -> items and update the corresponding i18n locale files (config-metadata.json).

multiplying latency for proxy/aggregator providers that already perform
their own upstream retry and fallback.
"""
raw = self.provider_settings.get("provider_error_retries", 1)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Guard against self.provider_settings being None to prevent an AttributeError when calling .get().

        settings = self.provider_settings or {}
        raw = settings.get("provider_error_retries", 1)

Comment on lines +126 to +127
provider.provider_settings = {}
assert provider._provider_error_retries() == 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Add a test case to verify that _provider_error_retries handles provider_settings being None gracefully.

Suggested change
provider.provider_settings = {}
assert provider._provider_error_retries() == 1
provider.provider_settings = None
assert provider._provider_error_retries() == 1
provider.provider_settings = {}
assert provider._provider_error_retries() == 1

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jul 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant