Putting one concrete artifact behind the earlier note: we built a tiny evaluation-artifact sample on our side and kept it deliberately narrow: one pass artifact, one fail artifact, and one malformed case. We are not asking Assay to inherit ADK evaluator scores, runtime judgments, or trust semantics as truth. The only thing I'd love to understand from your side is this: if an external evidence consumer wants to stay small and honest, which artifact or result surface would you most want them to align to as the smallest stable seam?
Hi all,
I maintain Assay. I spent some time with the ADK docs around evaluation, artifacts, and trajectory output, and I think there may be a small interop seam here if it stays narrow.
My read is that ADK is strongest as a toolkit for building, evaluating, and deploying agents. Assay is strongest when it takes external outputs and compiles them into bounded, reviewable evidence.
So I am not asking for broad integration, and I am not asking to import ADK evaluator semantics or scores into Assay as truth.
The much smaller question is this:
Is there one stable evaluation artifact or trajectory-oriented output that you would consider suitable for external consumers?
The shape I would want to test is intentionally small; I wrote up the tiny sketch on the Assay side here:
Rul1an/assay#1035
The key boundary for me is simple:
We are not asking to import your runtime's trust semantics, evaluator scores, or policy judgments into Assay as truth; we are asking whether there is a smallest stable output surface that can be compiled into bounded external evidence.
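To show what "compiled into bounded external evidence" means in practice without inheriting trust semantics, here is a hedged sketch. The input fields (`eval_id`, `score`) and the output record shape are assumptions for illustration, not ADK or Assay schemas; the key move is that the external result is reproduced verbatim and labeled as-reported, never re-scored or endorsed:

```python
import hashlib
import json

def compile_evidence(external_result: dict, source: str) -> dict:
    """Wrap an external result as evidence *about* a source, not as truth."""
    payload = json.dumps(external_result, sort_keys=True)
    return {
        "source": source,                # where the result came from
        "claimed": external_result,      # reproduced verbatim, not endorsed
        "digest": hashlib.sha256(payload.encode()).hexdigest(),
        "semantics": "as-reported",      # explicitly not inherited truth
    }

# Hypothetical external result; field names are placeholders.
record = compile_evidence({"eval_id": "demo-1", "score": 0.9},
                          source="adk-eval-output")
```

The digest pins exactly which bytes were seen, so a reviewer can check the evidence against the original surface without Assay ever asserting that the score itself is correct.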
If that direction sounds reasonable, the most useful next step would probably just be a pointer to whichever artifact or result surface you consider stable for external consumers. Happy to tighten this into an even smaller sample if that is easier to react to than a discussion post.