Testing Suite: uncovered cases reports #636

teocns · 2025-01-10T21:03:37Z

teocns
Jan 10, 2025

We will use this discussion to report edge cases that are not yet covered in the testing suite but should be implemented.

dot-agi · 2025-01-13T11:30:27Z

dot-agi
Jan 13, 2025
Maintainer

Microsoft AutoGen >=0.4.0 needs to be tested against the examples on their website.

They have made major changes to the codebase, so we must validate that it works as expected and add tests to the latest testing suite.

0 replies

dot-agi · 2025-01-13T11:31:39Z

dot-agi
Jan 13, 2025
Maintainer

The crewAI codebase contains the agentops integration, which should be tested completely, either in their codebase or in ours by installing the latest version and testing against it.

1 reply

teocns Jan 13, 2025
Author

We can write integration tests on our side. I'm looking at the core manual tests to see what we have, and adding them to #637.

dot-agi · 2025-01-24T08:27:24Z

dot-agi
Jan 24, 2025
Maintainer

The integration tests currently test the LLM provider clients. They should instead test the patched providers in the llms/providers directory and the partner integrations in the partners directory, both within the agentops directory.

0 replies

kinthaiofficial · 2026-04-29T00:58:50Z

kinthaiofficial
Apr 29, 2026

Test coverage for agent systems is genuinely hard because the "correct" behavior is often probabilistic, context-dependent, and hard to specify in a traditional unit test.

A few testing strategies that work at different layers:

Deterministic input/output tests — for agent behaviors that should be deterministic (format compliance, schema validation, tool call structure), traditional assertion-based tests work fine. Use temperature=0 or seed the sampling for reproducibility.
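A minimal sketch of such a deterministic check, assuming a tool call is serialized as a JSON object with a string `name` and an object `arguments`. That shape and the `validate_tool_call` helper are illustrative assumptions, not part of any specific framework; the raw string stands in for an output captured from a temperature=0 run.

```python
import json


def validate_tool_call(raw: str) -> dict:
    """Parse a raw model output and check the assumed tool-call shape."""
    call = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    if not isinstance(call.get("name"), str):
        raise ValueError("tool call needs a string 'name'")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("tool call needs an object 'arguments'")
    return call


def test_tool_call_structure():
    # Stand-in for an output recorded from a temperature=0 run (hypothetical).
    raw = '{"name": "search", "arguments": {"query": "agentops"}}'
    call = validate_tool_call(raw)
    assert call["name"] == "search"
    assert call["arguments"] == {"query": "agentops"}
```

Because the assertion targets structure rather than wording, it stays stable even if the model's phrasing drifts between versions.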

Distribution tests — instead of "does this exact output match", test "does output distribution satisfy this property?" Over N runs: what fraction produce valid JSON? What's the 90th percentile token count? What's the success rate on a defined task set? These properties can be asserted with tolerances.
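As a rough illustration of asserting a property over N runs, the sketch below uses a hypothetical `fake_agent` that emits malformed JSON about 5% of the time. In a real suite that function would be replaced by a sampled model call, and the 0.85 threshold is an arbitrary tolerance chosen for the example.

```python
import json
import random


def fake_agent(rng: random.Random) -> str:
    """Hypothetical stand-in for a sampled agent call; breaks JSON ~5% of the time."""
    if rng.random() < 0.05:
        return '{"answer": '  # truncated, invalid JSON
    return json.dumps({"answer": rng.randint(0, 9)})


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def valid_json_rate(n_runs: int, seed: int = 0) -> float:
    """Fraction of n_runs outputs that parse as JSON."""
    rng = random.Random(seed)  # seeded so the test itself is reproducible
    outputs = [fake_agent(rng) for _ in range(n_runs)]
    return sum(is_valid_json(o) for o in outputs) / n_runs


def test_valid_json_rate():
    # Assert a property of the distribution, with slack, not an exact output.
    assert valid_json_rate(n_runs=200) >= 0.85
```

The threshold sits well below the expected rate so that ordinary sampling noise does not make the test flaky.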

State machine coverage — enumerate all possible agent state transitions (pending → running → delegated → budget-paused → completed/failed) and ensure you have test cases that exercise each edge. Budget exhaustion and delegation failures are common coverage gaps.
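One way to make edge coverage checkable is to keep the transition table and the scenario paths as data and diff them. The table below is a guess at plausible edges between the states named above; the actual edges depend on the agent runtime, and the scenario names are hypothetical.

```python
# Assumed transition table; adjust to the runtime's real state machine.
TRANSITIONS = {
    ("pending", "running"),
    ("running", "delegated"),
    ("running", "budget-paused"),
    ("running", "completed"),
    ("running", "failed"),
    ("delegated", "running"),
    ("delegated", "failed"),
    ("budget-paused", "running"),
    ("budget-paused", "failed"),
}

# Each scenario is the state path one test case is expected to walk.
SCENARIOS = {
    "happy_path": ["pending", "running", "completed"],
    "run_fails": ["pending", "running", "failed"],
    "delegation_ok": ["pending", "running", "delegated", "running", "completed"],
    "delegation_fails": ["pending", "running", "delegated", "failed"],
    "budget_resumed": ["pending", "running", "budget-paused", "running", "completed"],
    "budget_exhausted": ["pending", "running", "budget-paused", "failed"],
}


def edges_of(path):
    """Consecutive (from_state, to_state) pairs along a path."""
    return set(zip(path, path[1:]))


def uncovered_edges(scenarios):
    """Return transitions that no scenario exercises; reject illegal paths."""
    covered = set()
    for name, path in scenarios.items():
        edges = edges_of(path)
        illegal = edges - TRANSITIONS
        if illegal:
            raise ValueError(f"{name} uses undefined transitions: {sorted(illegal)}")
        covered |= edges
    return TRANSITIONS - covered


def test_every_transition_is_exercised():
    assert uncovered_edges(SCENARIOS) == set()
```

Dropping the `budget_exhausted` scenario makes the check report `("budget-paused", "failed")` as uncovered, which is the kind of gap mentioned above.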

Contract tests for tool calls — for each tool the agent can call, verify that: (a) the agent correctly formats the call, (b) the agent handles valid responses correctly, (c) the agent handles error responses gracefully without looping.
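Point (c) can be sketched with a tool double that always errors and a call counter. `FlakyTool` and `run_with_retries` are hypothetical stand-ins for a real tool and agent step, and the response shape (`ok`, `error`) is an assumption made for the example.

```python
class FlakyTool:
    """Hypothetical tool double: always returns an error and counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, **kwargs):
        self.calls += 1
        return {"ok": False, "error": "upstream timeout"}


def run_with_retries(tool, arguments, max_attempts=3):
    """Toy agent step: retry a failing tool a bounded number of times, then give up."""
    response = {}
    for attempt in range(1, max_attempts + 1):
        response = tool(**arguments)
        if response.get("ok"):
            return {"status": "completed", "result": response, "attempts": attempt}
    return {"status": "failed", "error": response.get("error"), "attempts": max_attempts}


def test_error_response_does_not_loop():
    tool = FlakyTool()
    outcome = run_with_retries(tool, {"query": "agentops"}, max_attempts=3)
    assert outcome["status"] == "failed"
    assert outcome["error"] == "upstream timeout"
    # The contract: the tool is never called more often than the retry budget.
    assert tool.calls == 3
```

Counting calls on the double is what catches a retry loop; asserting only on the final status would miss it.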

Economic invariant tests — assert that total child agent budgets ≤ parent budget; assert that cost attribution sums correctly through the delegation chain; assert that the budget-exhausted state is reachable and handled correctly.
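A small sketch of the first two invariants over a delegation tree. The `AgentNode` structure is an illustrative assumption, and amounts are integer cents to avoid floating-point comparison issues.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentNode:
    """Hypothetical node in a delegation tree; amounts are in integer cents."""
    name: str
    budget: int
    own_cost: int = 0
    children: List["AgentNode"] = field(default_factory=list)


def total_cost(node: AgentNode) -> int:
    """Cost attributed to a node: its own spend plus everything it delegated."""
    return node.own_cost + sum(total_cost(c) for c in node.children)


def budget_violations(node: AgentNode) -> List[str]:
    """Collect invariant violations for this node and all descendants."""
    problems = []
    if sum(c.budget for c in node.children) > node.budget:
        problems.append(f"{node.name}: child budgets exceed parent budget")
    if total_cost(node) > node.budget:
        problems.append(f"{node.name}: attributed cost exceeds budget")
    for child in node.children:
        problems.extend(budget_violations(child))
    return problems


def test_budget_invariants():
    root = AgentNode("planner", budget=1000, own_cost=200, children=[
        AgentNode("researcher", budget=500, own_cost=300),
        AgentNode("writer", budget=400, own_cost=150),
    ])
    assert budget_violations(root) == []
    assert total_cost(root) == 650  # 200 + 300 + 150 rolls up the chain
```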

Replay tests — record a successful agent execution trace, then replay it and assert that the cost, steps taken, and output match the baseline within tolerance. Good for regression detection.
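A replay comparison might look like the sketch below, assuming each trace is reduced to a dict with `cost`, `steps`, and `output`. That trace format and the tolerance values are assumptions for illustration, not an existing recording format.

```python
import math


def diff_against_baseline(baseline, replay, cost_rel_tol=0.10, step_tol=1):
    """Return a list of human-readable regressions; empty means within tolerance."""
    problems = []
    if not math.isclose(replay["cost"], baseline["cost"], rel_tol=cost_rel_tol):
        problems.append(f"cost drifted: {baseline['cost']} -> {replay['cost']}")
    if abs(replay["steps"] - baseline["steps"]) > step_tol:
        problems.append(f"steps drifted: {baseline['steps']} -> {replay['steps']}")
    if replay["output"] != baseline["output"]:
        problems.append("output changed")
    return problems


def test_replay_matches_baseline():
    # Hypothetical recorded baseline and a fresh replay of the same task.
    baseline = {"cost": 0.042, "steps": 7, "output": "42"}
    replay = {"cost": 0.045, "steps": 8, "output": "42"}
    assert diff_against_baseline(baseline, replay) == []
```

Returning the list of differences rather than a boolean keeps the failure message useful when a regression does show up.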

We test KinthAI's agent protocol layer with a mix of these approaches: https://blog.kinthai.ai/221-agents-multi-agent-coordination-lessons covers the protocol design, which informs what invariants are worth testing.

What's causing the most testing pain — non-determinism, cost of running tests, or specification of expected behavior?

0 replies
