
feat: introduce ModelClientStreamingChunkEvent for streaming model output and update handling in agents and console #5208

Open
wants to merge 4 commits into main
Conversation

@ekzhu (Collaborator) commented Jan 26, 2025

Resolves #3983

  • Introduce a model_client_stream parameter in AssistantAgent to enable token-level streaming output.
  • Introduce ModelClientStreamingChunkEvent as a type of AgentEvent that passes the streaming chunks to the application via run_stream and on_messages_stream. These chunks do not appear in the inner messages list of the final Response or TaskResult.
  • Handle this new message type in Console.

Example:

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    agent = AssistantAgent("assistant", OpenAIChatCompletionClient(model="gpt-4o"), model_client_stream=True)
    await Console(agent.run_stream(task="Write a short story with a surprising ending."))

asyncio.run(main())

To see the individual messages in more detail:

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    agent = AssistantAgent("assistant", OpenAIChatCompletionClient(model="gpt-4o"), model_client_stream=True)
    async for message in agent.run_stream(task="Write 3 line poem."):
        print(message)

asyncio.run(main())
Output:

source='user' models_usage=None content='Write 3 line poem.' type='TextMessage'
source='assistant' models_usage=None content='Silent' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' whispers' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' glide' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='  \n' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='Moon' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='lit' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' dreams' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' dance' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' through' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' the' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' night' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='  \n' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='Stars' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' watch' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' from' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' above' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='.' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0) content='Silent whispers glide,  \nMoonlit dreams dance through the night,  \nStars watch from above.' type='TextMessage'
TaskResult(messages=[TextMessage(source='user', models_usage=None, content='Write 3 line poem.', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0), content='Silent whispers glide,  \nMoonlit dreams dance through the night,  \nStars watch from above.', type='TextMessage')], stop_reason=None)

Next step needed after this PR:

  • Update RichConsole to handle the streaming chunks; currently they are skipped (a rough sketch of the handling idea follows below).
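
For reference, a minimal sketch of how a console loop could render the chunk events inline as they arrive. The agent setup and message types are the ones from this PR, but the rendering logic itself is only illustrative, and it assumes ModelClientStreamingChunkEvent is exported from autogen_agentchat.messages and TaskResult from autogen_agentchat.base:

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult
from autogen_agentchat.messages import ModelClientStreamingChunkEvent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    agent = AssistantAgent("assistant", OpenAIChatCompletionClient(model="gpt-4o"), model_client_stream=True)
    async for message in agent.run_stream(task="Write a short story with a surprising ending."):
        if isinstance(message, ModelClientStreamingChunkEvent):
            # Render each chunk inline as it arrives.
            print(message.content, end="", flush=True)
        elif isinstance(message, TaskResult):
            # Finish the streamed line and show why the run stopped.
            print(f"\nStop reason: {message.stop_reason}")
        # The final complete TextMessage is skipped here because its content
        # was already rendered chunk by chunk.

asyncio.run(main())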

@ekzhu ekzhu requested a review from gziz January 26, 2025 18:11
@gagb (Collaborator) commented Jan 27, 2025

Why is the new parameter a property of the agent instead of the model client?

@ekzhu (Collaborator, Author) commented Jan 27, 2025

Why is the new parameter a property of the agent instead of the model client?

Because the model client has two different public methods: create and create_stream. So it becomes the caller's responsibility to choose which one to use, and the caller is the agent in this case.
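
To illustrate, a simplified sketch of that choice (not the actual AssistantAgent code, assuming the ChatCompletionClient interface from autogen_core.models):

from typing import AsyncGenerator, Sequence, Union
from autogen_core.models import ChatCompletionClient, CreateResult, LLMMessage

async def call_model(
    client: ChatCompletionClient,
    messages: Sequence[LLMMessage],
    model_client_stream: bool,
) -> AsyncGenerator[Union[str, CreateResult], None]:
    # The caller (the agent) decides which client method to use.
    if model_client_stream:
        # create_stream yields string chunks and ends with a CreateResult.
        async for chunk in client.create_stream(messages):
            yield chunk
    else:
        # create returns a single CreateResult with the complete output.
        yield await client.create(messages)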

@husseinmozannar (Contributor) commented:

I wonder if you need either 1) a unique message_id for each message and/or 2) a chunk id (from 0 to the number of chunks). The reason is that on the UI side you need to figure out how to collate the messages. Currently, I guess, if a message is of type ModelClientStreamingChunkEvent you keep appending it until you encounter a message of a different type.

But I guess this is error prone, since you might have parallel messages from other agents being emitted, or things potentially arriving out of order, but maybe this is not a concern.
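
A rough sketch of that collation approach (a hypothetical UI-side helper, not part of this PR, assuming chunks arrive in order and that ModelClientStreamingChunkEvent is exported from autogen_agentchat.messages):

from typing import Iterable
from autogen_agentchat.messages import ModelClientStreamingChunkEvent

def collate(messages: Iterable[object]) -> None:
    buffer: list[str] = []
    for message in messages:
        if isinstance(message, ModelClientStreamingChunkEvent):
            # Keep appending chunks from the same streamed message.
            buffer.append(message.content)
        else:
            if buffer:
                # A message of a different type closes the streamed message.
                print("streamed:", "".join(buffer))
                buffer.clear()
            print("complete:", message)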

@ekzhu (Collaborator, Author) commented Jan 27, 2025

I wonder if you need either 1) a unique message_id for each message and/or 2) a chunk id (from 0 to the number of chunks). The reason is that on the UI side you need to figure out how to collate the messages. Currently, I guess, if a message is of type ModelClientStreamingChunkEvent you keep appending it until you encounter a message of a different type.

But I guess this is error prone, since you might have parallel messages from other agents being emitted, or things potentially arriving out of order, but maybe this is not a concern.

I believe we will need to introduce a message id and chunk index at some point to address out-of-order messages, though right now all messages arrive in order because AgentChat is sequential. So, the complete message always follows the previous chunks. Once we introduce parallelism in the future, a whole set of modifications will be needed. So the thinking here is to avoid adding attributes that "may be useful", as they will inevitably be changed.

Successfully merging this pull request may close these issues: Partial ChatMessage from ChatAgent for streaming