Evaluation API: trajectory mismatch due to different JSON field order #2837
Hi, I’m testing my agent using the Evaluation API and have a question. For example, I expect:
But I get:
The Evaluation API reports that the trajectories don't match. Am I correct that it treats these as different only because of the field order? Shouldn't it matter only that the same set of fields is present, regardless of their order?
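For illustration, a minimal sketch of the situation being described (the tool-call payloads here are hypothetical, not the reporter's actual data): once the JSON is parsed into Python dictionaries, field order should not affect equality.

```python
import json

# Two hypothetical tool calls whose arguments differ only in field order.
expected = json.loads('{"tool_name": "get_weather", "args": {"city": "Paris", "unit": "C"}}')
actual = json.loads('{"tool_name": "get_weather", "args": {"unit": "C", "city": "Paris"}}')

# Parsed into dicts, key order is irrelevant: the two objects compare equal.
print(expected == actual)  # True
```

If the evaluator compared the raw JSON strings instead of the parsed dictionaries, the same two payloads would not match.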
Response from ADK Answering Agent (experimental; the answer may be inaccurate)

TL;DR: You are correct. The Evaluation API currently considers the order of fields in the JSON object, which is why you are seeing a mismatch.

Hi, thanks for reaching out and for your detailed example. Your analysis is correct: the current implementation of the Evaluation API compares tool calls in a way that is sensitive to the order of fields in the JSON object. While the trajectory evaluator is designed to ignore extra keys in the top-level tool call dictionary, the comparison itself is still order-sensitive. This is a great point, and your feedback is valuable. For now, to ensure your trajectories match, you would need to make the field order in your agent's output match the expected order in your test cases. We appreciate you bringing this to our attention.
The order of arguments doesn't really matter. Args are treated as dictionaries for comparison (see adk-python/src/google/adk/evaluation/trajectory_evaluator.py, line 129 at commit 9862b7b).
The metric, however, is very strict about casing: if one of the argument values has different casing, the comparison will fail. Could that be the cause here?
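To make the distinction concrete, a small sketch (using made-up argument values, not from the actual report): dict equality ignores key order but is case-sensitive in the values.

```python
# Key order is irrelevant to dict equality, but value casing is not.
a = {"city": "Paris", "unit": "C"}
b = {"unit": "C", "city": "paris"}
print(a == b)  # False, because "Paris" != "paris"

# A naive workaround on the test-authoring side (an assumption for
# illustration, not ADK behavior): normalize string values before comparing.
def normalize(d):
    return {k: v.lower() if isinstance(v, str) else v for k, v in d.items()}

print(normalize(a) == normalize(b))  # True
```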
In any case, I can't reproduce this on my end; can you provide more info?