Description
Describe the bug
Benchmarks collect messages from agent.stream().
The current implementation in manipulation_o3de collects only the last message on every iteration,
and the current implementation in tool_calling_agent collects all messages on every iteration.
Both approaches are incorrect, because messages can arrive in batches and are appended to the already stored messages. As a result, manipulation_o3de can skip some messages, and tool_calling_agent appends the same messages more than once.
In Manipulation O3DE this does not affect results, because they are calculated from the final simulation positions, but it can affect the logged messages. In Tool Calling Agent it can affect results.
To Reproduce
- To see this bug, run the following in debug mode:
python src/rai_bench/rai_bench/examples/test_models.py
- Track the values of event and messages in the agent loop.
Expected behavior
The extracted messages should contain every unique message returned by the agent, each exactly once.
Tool calls extracted from these messages should likewise reflect every unique agent tool call.
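
A minimal sketch of the difference between the two current approaches and the expected behavior. It does not use the real agent: `Message`, `fake_stream`, and the `{"messages": [...]}` event shape are hypothetical stand-ins, and it assumes each event carries the cumulative message list and that every message has a stable, unique `id`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Message:
    # Hypothetical stand-in for an agent message; only `id` is used for deduplication.
    id: str
    content: str


Event = Dict[str, List[Message]]


def fake_stream() -> Iterator[Event]:
    """Simulate a stream where each event holds ALL messages stored so far."""
    history: List[Message] = []
    batches = [
        [Message("1", "user request")],
        [Message("2", "tool call A"), Message("3", "tool call B")],  # batch of two
        [Message("4", "final answer")],
    ]
    for batch in batches:
        history.extend(batch)
        yield {"messages": list(history)}


def collect_last_only(events: Iterable[Event]) -> List[Message]:
    # Keeps only the last message per iteration: skips messages in larger batches.
    return [event["messages"][-1] for event in events]


def collect_all(events: Iterable[Event]) -> List[Message]:
    # Keeps every message per iteration: re-appends messages already stored.
    collected: List[Message] = []
    for event in events:
        collected.extend(event["messages"])
    return collected


def collect_unique(events: Iterable[Event]) -> List[Message]:
    # Keeps each message once, in first-seen order, keyed by its id.
    seen_ids = set()
    collected: List[Message] = []
    for event in events:
        for message in event["messages"]:
            if message.id not in seen_ids:
                seen_ids.add(message.id)
                collected.append(message)
    return collected


last_only_ids = [m.id for m in collect_last_only(fake_stream())]
all_ids = [m.id for m in collect_all(fake_stream())]
unique_ids = [m.id for m in collect_unique(fake_stream())]
```

In this simulated stream, the last-only approach misses message "2", the collect-all approach stores message "1" three times, and the id-based approach returns each message once. If real messages lack a stable id, tracking how many messages have already been consumed and slicing the new tail would be an alternative.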
Screenshots
Platform
- OS: Ubuntu 22.04
- ROS 2: Humble
Version
Release number or commit hash.
Additional context