
feat: introduce ModelClientStreamingChunkEvent for streaming model output and update handling in agents and console #5208

Open
wants to merge 4 commits into main
Conversation

@ekzhu (Collaborator) commented Jan 26, 2025

Resolves #3983

  • Introduce a model_client_stream parameter in AssistantAgent to enable token-level streaming output.
  • Introduce ModelClientStreamingChunkEvent as a type of AgentEvent that passes the streaming chunks to the application via run_stream and on_messages_stream. These chunks do not appear in the inner messages list of the final Response or TaskResult.
  • Handle this new message type in Console.

Example:

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    agent = AssistantAgent("assistant", OpenAIChatCompletionClient(model="gpt-4o"), model_client_stream=True)
    await Console(agent.run_stream(task="Write a short story with a surprising ending."))

asyncio.run(main())

To see the individual messages in more detail:

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    agent = AssistantAgent("assistant", OpenAIChatCompletionClient(model="gpt-4o"), model_client_stream=True)
    async for message in agent.run_stream(task="Write 3 line poem."):
        print(message)

asyncio.run(main())
Output:

source='user' models_usage=None content='Write 3 line poem.' type='TextMessage'
source='assistant' models_usage=None content='Silent' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' whispers' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' glide' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='  \n' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='Moon' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='lit' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' dreams' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' dance' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' through' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' the' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' night' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=',' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='  \n' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='Stars' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' watch' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' from' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content=' above' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=None content='.' type='ModelClientStreamingChunkEvent'
source='assistant' models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0) content='Silent whispers glide,  \nMoonlit dreams dance through the night,  \nStars watch from above.' type='TextMessage'
TaskResult(messages=[TextMessage(source='user', models_usage=None, content='Write 3 line poem.', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=0, completion_tokens=0), content='Silent whispers glide,  \nMoonlit dreams dance through the night,  \nStars watch from above.', type='TextMessage')], stop_reason=None)

Next step needed after this PR:

  • Update RichConsole to handle the streaming chunks; currently they are skipped (a rough sketch of the handling idea follows below).
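
For reference, a minimal sketch of how a console loop could render the chunk events inline as they arrive. The agent setup and message types are the ones from this PR, but the rendering logic itself is only illustrative, and it assumes ModelClientStreamingChunkEvent is exported from autogen_agentchat.messages and TaskResult from autogen_agentchat.base:

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult
from autogen_agentchat.messages import ModelClientStreamingChunkEvent
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    agent = AssistantAgent("assistant", OpenAIChatCompletionClient(model="gpt-4o"), model_client_stream=True)
    async for message in agent.run_stream(task="Write a short story with a surprising ending."):
        if isinstance(message, ModelClientStreamingChunkEvent):
            # Render each chunk inline as it arrives.
            print(message.content, end="", flush=True)
        elif isinstance(message, TaskResult):
            # Finish the streamed line and show why the run stopped.
            print(f"\nStop reason: {message.stop_reason}")
        # The final complete TextMessage is skipped here because its content
        # was already rendered chunk by chunk.

asyncio.run(main())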

@ekzhu ekzhu requested a review from gziz January 26, 2025 18:11
@gagb (Collaborator) commented Jan 27, 2025

Why is the new parameter a property of the agent instead of the model client?

@ekzhu (Collaborator, Author) commented Jan 27, 2025

Why is the new parameter a property of the agent instead of the model client?

Because the model client has two different public methods: create and create_stream. So it becomes the caller's responsibility to choose which one to use, and the caller is the agent in this case.
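
To illustrate, a simplified sketch of that choice (not the actual AssistantAgent code, assuming the ChatCompletionClient interface from autogen_core.models):

from typing import AsyncGenerator, Sequence, Union
from autogen_core.models import ChatCompletionClient, CreateResult, LLMMessage

async def call_model(
    client: ChatCompletionClient,
    messages: Sequence[LLMMessage],
    model_client_stream: bool,
) -> AsyncGenerator[Union[str, CreateResult], None]:
    # The caller (the agent) decides which client method to use.
    if model_client_stream:
        # create_stream yields string chunks and ends with a CreateResult.
        async for chunk in client.create_stream(messages):
            yield chunk
    else:
        # create returns a single CreateResult with the complete output.
        yield await client.create(messages)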

@husseinmozannar (Contributor) commented:

I wonder if you need either 1) a unique message_id for each message and/or 2) a chunk id (from 0 to the number of chunks). The reason is that on the UI side you need to figure out how to collate the messages. Currently, I guess, if a message is of type ModelClientStreamingChunkEvent you keep appending it until you encounter a message of a different type.

But I guess this is error prone, since you might have parallel messages from other agents being emitted, or things potentially arriving out of order, but maybe this is not a concern.
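
A rough sketch of that collation approach (a hypothetical UI-side helper, not part of this PR, assuming chunks arrive in order and that ModelClientStreamingChunkEvent is exported from autogen_agentchat.messages):

from typing import Iterable
from autogen_agentchat.messages import ModelClientStreamingChunkEvent

def collate(messages: Iterable[object]) -> None:
    buffer: list[str] = []
    for message in messages:
        if isinstance(message, ModelClientStreamingChunkEvent):
            # Keep appending chunks from the same streamed message.
            buffer.append(message.content)
        else:
            if buffer:
                # A message of a different type closes the streamed message.
                print("streamed:", "".join(buffer))
                buffer.clear()
            print("complete:", message)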

@ekzhu (Collaborator, Author) commented Jan 27, 2025

I wonder if you need either 1) a unique message_id for each message and/or 2) a chunk id (from 0 to the number of chunks). The reason is that on the UI side you need to figure out how to collate the messages. Currently, I guess, if a message is of type ModelClientStreamingChunkEvent you keep appending it until you encounter a message of a different type.

But I guess this is error prone, since you might have parallel messages from other agents being emitted, or things potentially arriving out of order, but maybe this is not a concern.

I believe we will need to introduce a message id and chunk index at some point to address out-of-order messages, though right now all messages arrive in order because AgentChat is sequential. So, the complete message always follows the previous chunks. Once we introduce parallelism in the future, a whole set of modifications will be needed. So the thinking here is to avoid adding attributes that "may be useful", as they will inevitably be changed.

Successfully merging this pull request may close these issues: Partial ChatMessage from ChatAgent for streaming