How to stream sub-agent responses in real-time to reduce workflow latency #3326

parthiaitech · 2025-10-29T06:45:36Z

Hi team,

I’m currently building a multi-agent workflow using Google ADK. It has one Supervisor Agent and two sub-agents: a Planner Agent and an Executor Agent. The structure looks like this:

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.genai import types

# llm_model, instruction, the two sub-agents, and the callbacks are defined elsewhere.
SupervisorAgent = LlmAgent(
    name="supervisor_agent",
    model=LiteLlm(model=llm_model),
    description="Coordinates and monitors tasks across planner_agent, and executor_agent sub-agents.",
    instruction=instruction,
    sub_agents=[planner_agent, executor_agent],
    before_model_callback=simple_before_model_modifier,
    after_model_callback=simple_after_model_modifier,
    generate_content_config=types.GenerateContentConfig(temperature=0.1),
)

Current flow:

The Supervisor Agent receives a message with retrieved content.
It forwards the content to the Planner Agent, which generates a detailed plan.
Once the plan is complete, it’s passed to the Executor Agent, which executes the steps sequentially.

Since each step’s execution depends on the output of the previous step, full parallelization isn’t ideal. However, the current approach waits for the entire plan before the Executor Agent starts, which increases latency.
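To put rough numbers on that latency, here is a back-of-the-envelope model. It assumes, purely for illustration, that every step takes the same fixed time to plan and the same fixed time to execute, and that a later step can be planned while an earlier one is still executing. The function names and timings are hypothetical, not measurements from ADK.

```python
def batch_latency(n_steps: int, plan_s: float, exec_s: float) -> float:
    """Total time when the whole plan is generated before execution starts."""
    return n_steps * plan_s + n_steps * exec_s


def pipelined_latency(n_steps: int, plan_s: float, exec_s: float) -> float:
    """Total time when step i executes while step i+1 is being planned.

    The first step must be planned before anything can run. After that the
    pipeline advances at the pace of the slower stage, and the final
    execution closes it out.
    """
    if n_steps == 0:
        return 0.0
    return plan_s + (n_steps - 1) * max(plan_s, exec_s) + exec_s


# Hypothetical workload: 5 steps, 2 s to plan each, 3 s to execute each.
print(batch_latency(5, 2.0, 3.0))      # 25.0
print(pipelined_latency(5, 2.0, 3.0))  # 17.0
```

Under these assumptions the first execution starts after 2 s instead of 10 s, and the whole run finishes 8 s sooner.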

Goal:
I’d like to stream the Planner Agent’s response to the Executor Agent in real time, step by step as the plan is being generated, instead of waiting for the entire plan to finish.

Question:
Is there any recommended way or built-in support in ADK for agent-to-agent text streaming (or partial output forwarding) between sub-agents?
If not, is there a best practice or workaround to implement this efficiently (e.g., through async streaming callbacks or SSE integration)?

Any insights, examples, or implementation pointers would be greatly appreciated!

Thank you!

adk-bot (Maintainer) · 2025-10-29T06:47:00Z

Response from ADK Answering Agent (experimental, answer may be inaccurate)

TLDR: While ADK's native streaming is primarily for user-to-agent interaction, you can achieve a similar real-time effect between agents by re-architecting your workflow. Instead of generating the entire plan at once, use a LoopAgent to have the Planner generate and pass one step at a time to the Executor for immediate execution.

Hi @parthiaitech,

That's an excellent question about optimizing agent workflows. Your goal to reduce latency by streaming the Planner's output to the Executor is a common challenge in complex agentic systems.

Currently, ADK's powerful bidirectional streaming capabilities, often initiated with Runner.run_live(), are primarily designed for real-time communication between an end-user and the agent system. [4, 6] This is what enables features like live voice conversations and streaming text responses back to a UI. [1, 9]

For communication between sub-agents running in the same process, the standard mechanism is passing data via a shared session.state. [13] As you've observed, this typically involves one agent completing its turn and writing its full output before the next agent can begin, which leads to the latency you described.

Recommended Approach: Step-by-Step Generation in a Loop

Instead of waiting for the full plan, you can implement a "generative loop" pattern that processes the plan one step at a time. This approach aligns well with ADK's existing orchestration tools.

Here’s how you could structure it:

1. Modify the Planner Agent: Adjust the Planner's instructions to generate only the single next step of the plan, given the overall goal and the history of previously executed steps. It should also have a way to signal when the plan is complete.
2. Use a LoopAgent: The Supervisor can delegate the task to a LoopAgent. This agent would orchestrate the step-by-step execution cycle.
3. Define the Loop Body: Inside the loop, you would typically use a SequentialAgent to manage the turn:
   - Invoke Planner: The Planner runs first, generating the next single step. Its output is saved to the session state (e.g., state['current_step']).
   - Invoke Executor: The Executor reads state['current_step'] and executes only that action. The result of the execution can also be saved to the state to inform the Planner in the next iteration.
4. Set the Loop Condition: The LoopAgent needs a condition to determine when to stop. This could be a special "PLAN_COMPLETE" status returned by the Planner or a maximum number of steps. In ADK, LoopAgent accepts a max_iterations cap, and the loop also ends when a sub-agent emits an event with the escalate action set.
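The control flow of the steps above can be sketched in plain Python, without any ADK imports. Everything here is a hypothetical stand-in: the `state` dict plays the role of session.state, `toy_planner` and `toy_executor` replace the LLM-backed agents, and `run_step_loop` mimics what a LoopAgent wrapping a planner/executor pair would do. It is a sketch of the pattern, not ADK's implementation.

```python
from typing import Callable

PLAN_COMPLETE = "PLAN_COMPLETE"  # hypothetical sentinel the planner returns when done


def run_step_loop(
    plan_next_step: Callable[[dict], str],
    execute_step: Callable[[str, dict], str],
    goal: str,
    max_steps: int = 10,
) -> dict:
    """Alternate planner and executor, one plan step per iteration."""
    state = {"goal": goal, "history": []}  # stand-in for session.state
    for _ in range(max_steps):  # hard cap, like a maximum iteration count
        step = plan_next_step(state)
        if step == PLAN_COMPLETE:  # planner signals that the plan is finished
            break
        state["current_step"] = step
        result = execute_step(step, state)
        # The result is stored so the planner can see it on the next iteration.
        state["history"].append((step, result))
    return state


def toy_planner(state: dict) -> str:
    """Return the next step based on how many steps have already run."""
    steps = ["fetch data", "clean data", "summarize"]
    done = len(state["history"])
    return steps[done] if done < len(steps) else PLAN_COMPLETE


def toy_executor(step: str, state: dict) -> str:
    """Pretend to execute a single step."""
    return f"done: {step}"


final_state = run_step_loop(toy_planner, toy_executor, goal="weekly report")
for step, result in final_state["history"]:
    print(step, "->", result)
```

Keeping both the sentinel and the step cap means a planner that never signals completion still terminates.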

This architecture creates a step-by-step flow between your agents: the Executor starts on the first step as soon as it has been planned, instead of waiting for the full plan, which cuts the time to the first executed action. Planning and execution still take turns rather than run concurrently, so this is not a token-level stream from the Planner's LLM output into the Executor, but for sequential task execution it delivers much of the practical benefit.
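Note that in the loop pattern the two agents take turns. If later steps can be planned before earlier results are known, and you want planning and execution to actually run at the same time, one possible workaround is a producer/consumer queue at the application level. The sketch below uses only Python's asyncio; the `planner` and `executor` coroutines and the sleep calls are hypothetical stand-ins for LLM calls, and wiring this into ADK agents is left open.

```python
import asyncio
from typing import List, Optional


async def planner(goal: str, queue: "asyncio.Queue[Optional[str]]") -> None:
    """Emit plan steps one at a time as they become available."""
    for i in range(1, 4):
        await asyncio.sleep(0.01)  # stands in for generating one step
        await queue.put(f"{goal}: step {i}")
    await queue.put(None)  # sentinel: the plan is complete


async def executor(queue: "asyncio.Queue[Optional[str]]") -> List[str]:
    """Execute steps strictly in order, as soon as each one arrives."""
    results: List[str] = []
    while True:
        step = await queue.get()
        if step is None:
            break
        await asyncio.sleep(0.01)  # stands in for executing the step
        results.append(f"executed {step}")
    return results


async def run_pipeline(goal: str) -> List[str]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    # The planner runs as a background task while the executor consumes.
    planner_task = asyncio.create_task(planner(goal, queue))
    results = await executor(queue)
    await planner_task
    return results


results = asyncio.run(run_pipeline("demo"))
print(results)
```

The queue preserves step order, so execution stays sequential even though planning runs ahead. If the Planner needs each execution result before it can write the next step, this overlap is not available and the turn-taking loop is the better fit.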

I hope this implementation guidance is helpful! Let us know if you have more questions.

[1] Custom Asynchronous Web App with ADK Streaming and FastAPI
[4] Build a streaming agent
[6] Agent Development Kit (ADK) Key Capabilities
[9] Build a streaming agent with Java
[13] Multi-Agent Systems in ADK

parthiaitech Oct 29, 2025
Author

Can you tell me the exact implementation steps with example code?
