Putting one concrete artifact behind the earlier note: we built a tiny evaluation-artifact sample on our side and kept it deliberately narrow: one pass artifact, one fail artifact, and one malformed case. We are not asking Assay to inherit ADK evaluator scores, runtime judgments, or trust semantics as truth. The only thing I'd love to understand from your side is this: if an external evidence consumer wants to stay small and honest, which artifact or result surface would you most want them to align to as the smallest stable seam?
Hi all,
I maintain Assay. I spent some time with the ADK docs around evaluation, artifacts, and trajectory output, and I think there may be a small interop seam here if it stays narrow.
My read is that ADK is strongest as a toolkit for building, evaluating, and deploying agents. Assay is strongest when it takes external outputs and compiles them into bounded, reviewable evidence.
So I am not asking for broad integration, and I am not asking to import ADK evaluator semantics or scores into Assay as truth.
The much smaller question is this:
Is there one stable evaluation artifact or trajectory-oriented output that you would consider suitable for external consumers?
The shape I would want to test is intentionally small; I wrote up the tiny sketch on the Assay side here:
Rul1an/assay#1035
The key boundary for me is simple:
We are not asking to import your runtime's trust semantics, evaluator scores, or policy judgments into Assay as truth; we are asking whether there is a smallest stable output surface that can be compiled into bounded external evidence.
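To show what "compiled into bounded external evidence" means in practice without inheriting trust semantics, here is a hedged sketch. The input fields (`eval_id`, `score`) and the output record shape are assumptions for illustration, not ADK or Assay schemas; the key move is that the external result is reproduced verbatim and labeled as-reported, never re-scored or endorsed:

```python
import hashlib
import json

def compile_evidence(external_result: dict, source: str) -> dict:
    """Wrap an external result as evidence *about* a source, not as truth."""
    payload = json.dumps(external_result, sort_keys=True)
    return {
        "source": source,                # where the result came from
        "claimed": external_result,      # reproduced verbatim, not endorsed
        "digest": hashlib.sha256(payload.encode()).hexdigest(),
        "semantics": "as-reported",      # explicitly not inherited truth
    }

# Hypothetical external result; field names are placeholders.
record = compile_evidence({"eval_id": "demo-1", "score": 0.9},
                          source="adk-eval-output")
```

The digest pins exactly which bytes were seen, so a reviewer can check the evidence against the original surface without Assay ever asserting that the score itself is correct.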
If that direction sounds reasonable, the most useful next step would probably just be a pointer to whichever artifact or result surface you consider stable for external consumers. Happy to tighten this into an even smaller sample if that is easier to react to than a discussion post.