
Conversation

@vblagoje
Member

@vblagoje vblagoje commented Oct 2, 2025

Why:

Introduces a robust fallback mechanism for chat generators that automatically switches between multiple generators when primary services fail, ensuring continuous service availability during API outages or rate limiting.

What:

  • Added new FallbackChatGenerator component with sequential fallback logic
  • Implemented per-generator timeout handling and comprehensive error handling (429, 401, 400, 408, 500+ errors)
  • Added both sync/async execution support with streaming callback handling
  • Added test suite and three practical usage examples
  • Set up full integrations structure with proper licensing and documentation

How can it be used:

# Import paths for the generators and ChatMessage assume standard Haystack packages
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator
from haystack_integrations.components.generators.fallback_chat import FallbackChatGenerator

primary = OpenAIChatGenerator(model="gpt-4o-mini")
backup = AnthropicChatGenerator(model="claude-3-5-sonnet-20241022")

# Generators are tried in order; the backup runs only if the primary fails or times out
fallback = FallbackChatGenerator(generators=[primary, backup], timeout=10.0)
result = fallback.run([ChatMessage.from_user("Hello!")])
print(result["replies"][0].text)
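Internally, the sequential fallback logic amounts to something like the following sketch. The stand-in generator classes and the `run_with_fallback` helper are illustrative only, not the component's actual code:

```python
from typing import Any


class FailingGen:
    """Stand-in generator whose run() always raises (simulates a 429 or outage)."""

    def run(self, messages: list) -> dict[str, Any]:
        raise RuntimeError("rate limited")


class WorkingGen:
    """Stand-in generator whose run() succeeds."""

    def run(self, messages: list) -> dict[str, Any]:
        return {"replies": [f"echo: {messages[-1]}"]}


def run_with_fallback(generators: list, messages: list) -> dict[str, Any]:
    """Try each generator in order; return the first successful result."""
    last_error: Exception | None = None
    for gen in generators:
        try:
            return gen.run(messages)
        except Exception as e:  # any provider failure triggers fallback to the next one
            last_error = e
    raise RuntimeError("all generators failed") from last_error


result = run_with_fallback([FailingGen(), WorkingGen()], ["Hello!"])
print(result["replies"][0])  # echo: Hello!
```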

How did you test it:

  • Added comprehensive unit tests covering success/failure scenarios and timeout behavior
  • Tested error handling for all specified HTTP status codes and streaming functionality
  • Created integration examples demonstrating real-world usage patterns
  • Validated serialization/deserialization and async/sync compatibility

Notes for the reviewer:

Focus on the timeout logic in _get_effective_timeout() and error handling in _run_generator_with_timeout(). The streaming callback forwarding maintains proper async/sync compatibility.

@github-actions github-actions bot added the type:documentation Improvements or additions to documentation label Oct 2, 2025
@vblagoje
Member Author

vblagoje commented Oct 6, 2025

@sjrl please have a quick look - perhaps the examples and chat_generator.py itself are good candidates for quickly grasping the implementation from both the user's and our perspective. Let me know if you like this direction

@vblagoje vblagoje marked this pull request as ready for review October 7, 2025 09:56
@vblagoje vblagoje requested a review from a team as a code owner October 7, 2025 09:56
@vblagoje vblagoje requested review from davidsbatista and removed request for a team October 7, 2025 09:56
@vblagoje
Member Author

vblagoje commented Oct 7, 2025

Review from anyone else interested in this area is welcome cc @julian-risch @sjrl

Comment on lines +70 to +73
for gen in generators:
    if not hasattr(gen, "run") or not callable(gen.run):
        msg = "All items in 'generators' must expose a callable 'run' method (duck-typed ChatGenerator)"
        raise TypeError(msg)
Contributor

I'm not sure this check is needed. At least in our other components that take components in their init, we don't strictly double-check that they are a Haystack component.

Comment on lines 135 to 140
return gen.run(
    messages=messages,
    generation_kwargs=generation_kwargs,
    tools=tools,
    streaming_callback=streaming_callback,
)
Contributor

@sjrl sjrl Oct 7, 2025

It's possible we should only forward params to the generator's run method if it accepts them. E.g. I don't think all generators support tools. Or, at the very least, we should enforce in the init method what the run signature of each chat generator should be.
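One way to forward only the parameters a generator's run method actually accepts is to inspect its signature. This is a sketch of the idea, not the PR's implementation:

```python
import inspect
from typing import Any


def filter_kwargs_for(func, **kwargs) -> dict[str, Any]:
    """Keep only the kwargs named in func's signature (unless it takes **kwargs)."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return kwargs  # func accepts arbitrary kwargs, forward everything
    return {k: v for k, v in kwargs.items() if k in params}


def run_without_tools(messages, generation_kwargs=None):
    """Hypothetical run() that does not accept a 'tools' parameter."""
    return {"messages": messages, "generation_kwargs": generation_kwargs}


# 'tools' is dropped because run_without_tools does not accept it
forwarded = filter_kwargs_for(
    run_without_tools, messages=["hi"], tools=[object()], generation_kwargs={}
)
print(sorted(forwarded))  # ['generation_kwargs', 'messages']
```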

Comment on lines +120 to +127
def _run_single_sync(
    self,
    gen: Any,
    messages: list[ChatMessage],
    generation_kwargs: dict[str, Any] | None,
    tools: (list[Tool] | Toolset) | None,
    streaming_callback: StreamingCallbackT | None,
) -> dict[str, Any]:
Contributor

@vblagoje I'm pretty confused as to what this function is doing. Why are we running the calls to the generator within a ThreadPoolExecutor with a single worker?
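(For context: one common reason to wrap a blocking call in a single-worker ThreadPoolExecutor is to bound it with future.result(timeout=...), since a synchronous call cannot otherwise be given a deadline. A sketch of that pattern, which may or may not be the PR's rationale:)

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def blocking_call() -> str:
    # Simulates a synchronous provider call that takes too long
    time.sleep(0.3)
    return "done"


def run_with_deadline(func, timeout: float):
    """Run a blocking function, giving up after `timeout` seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(func)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            # The worker thread keeps running; only our wait is abandoned
            return None
    finally:
        pool.shutdown(wait=False)


print(run_with_deadline(blocking_call, 0.05))  # None
```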

Comment on lines 342 to 345
except asyncio.TimeoutError as e:
    logger.warning("Generator %s timed out after %.2fs", gen_name, effective_timeout)
    failed.append(gen_name)
    last_error = e
Contributor

A general comment: I'm not sure I like how we've implemented our own additional timeout management here. I think we should ask users to set timeouts on each individual ChatGenerator, since a timeout param is normally specifiable when constructing one. That is clearer and more understandable than building our own mechanism on top.

Contributor

We can rewrite your example like this

from haystack_integrations.components.generators.fallback_chat import FallbackChatGenerator

primary = OpenAIChatGenerator(model="gpt-4o-mini", timeout=10.0)  # <-- Added timeout here
backup = AnthropicChatGenerator(model="claude-3-5-sonnet-20241022", timeout=10.0)  # <-- Added timeout here

fallback = FallbackChatGenerator(generators=[primary, backup])
result = fallback.run([ChatMessage.from_user("Hello!")])
print(result["replies"][0].text)

Member Author

Yes, ok, we can do that. I'll drop the timeout on the FallbackChatGenerator and the complexities around what takes precedence: the per-generator timeouts, the outer one, etc.

@sjrl
Contributor

sjrl commented Oct 7, 2025

@vblagoje I also wanted to ask, what was your reasoning for making this a core integration?

@vblagoje
Member Author

vblagoje commented Oct 7, 2025

@vblagoje I also wanted to ask, what was your reasoning for making this a core integration?

My reasoning was that this is an optional sidecar rather than a core feature, as we tend to reserve core for truly necessary building blocks. I'm not hard-pressed for an integration - we can make it a core feature as well. Perhaps via experimental?

@vblagoje vblagoje marked this pull request as draft October 7, 2025 11:52
@vblagoje
Member Author

vblagoje commented Oct 7, 2025

Converting to draft until we decide where this PR belongs and remove the additional timeout management that preempts the individual chat generators' own timeouts

@davidsbatista davidsbatista changed the title Add new integration for FallbackChatGenerator feat: add new integration for FallbackChatGenerator Oct 7, 2025
@sjrl
Contributor

sjrl commented Oct 7, 2025

@vblagoje I also wanted to ask, what was your reasoning for making this a core integration?

My reasoning was that this is an optional sidecar rather than a core feature, as we tend to reserve core for truly necessary building blocks. I'm not hard-pressed for an integration - we can make it a core feature as well. Perhaps via experimental?

I think this would be suitable as a core feature, but I'm unsure whether it should go through experimental first. What do you think @julian-risch

@vblagoje
Member Author

Moved to core via deepset-ai/haystack#9859

@vblagoje vblagoje closed this Oct 14, 2025


Development

Successfully merging this pull request may close these issues.

New ChatGenerator fallback component for quota limits/API errors
