Evaluation API: trajectory mismatch due to different JSON field order #2837
Hi, I’m testing my agent using the Evaluation API and have a question. For example, I expect:
But I get:
The Evaluation API reports that the trajectories don't match. Am I correct that it treats these as different only because of the field order? Shouldn't it matter only that the same set of fields is present, regardless of their order?
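For illustration, a minimal sketch of the situation being described (the tool-call payloads here are hypothetical, not the reporter's actual data): once the JSON is parsed into Python dictionaries, field order should not affect equality.

```python
import json

# Two hypothetical tool calls whose arguments differ only in field order.
expected = json.loads('{"tool_name": "get_weather", "args": {"city": "Paris", "unit": "C"}}')
actual = json.loads('{"tool_name": "get_weather", "args": {"unit": "C", "city": "Paris"}}')

# Parsed into dicts, key order is irrelevant: the two objects compare equal.
print(expected == actual)  # True
```

If the evaluator compared the raw JSON strings instead of the parsed dictionaries, the same two payloads would not match.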
Response from ADK Answering Agent (experimental; the answer may be inaccurate)

TL;DR: You are correct. The Evaluation API currently considers the order of fields in the JSON object, which is why you are seeing a mismatch.

Hi, thanks for reaching out and for your detailed example. Your analysis is correct: the current implementation of the Evaluation API compares tool calls in a way that is sensitive to the order of fields in the JSON object. While the trajectory evaluator is designed to ignore extra keys in the top-level tool call dictionary, the comparison itself is still order-sensitive. This is a great point, and your feedback is valuable. For now, to ensure your trajectories match, you would need to make the field order in your agent's output match the expected order in your test cases. We appreciate you bringing this to our attention.
The order of arguments doesn't really matter. Args are treated as dictionaries for comparison (see adk-python/src/google/adk/evaluation/trajectory_evaluator.py, line 129 at commit 9862b7b).
The metric, however, is very strict about casing: if one of the argument values has different casing, the comparison will fail. Could that be the cause here?
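To make the distinction concrete, a small sketch (using made-up argument values, not from the actual report): dict equality ignores key order but is case-sensitive in the values.

```python
# Key order is irrelevant to dict equality, but value casing is not.
a = {"city": "Paris", "unit": "C"}
b = {"unit": "C", "city": "paris"}
print(a == b)  # False, because "Paris" != "paris"

# A naive workaround on the test-authoring side (an assumption for
# illustration, not ADK behavior): normalize string values before comparing.
def normalize(d):
    return {k: v.lower() if isinstance(v, str) else v for k, v in d.items()}

print(normalize(a) == normalize(b))  # True
```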
In any case, I can't reproduce this on my end; can you provide more info?