Merged
14 changes: 1 addition & 13 deletions AGENTS.md
@@ -141,19 +141,7 @@ explicitly first.
avoids pinning consumers to our exact normalisation of the
template-input shape (see decision 3).

-14. **Replay applies no agentcap-side normalisation.** `agentcap
-replay` re-issues a captured request with no flags that mutate
-the body. The request is persisted as parsed JSON (so the
-original byte sequence — whitespace, key ordering — isn't
-recoverable, only the JSON object); streamed SSE response bytes
-are kept verbatim. Cross-server strictness asymmetries (e.g.
-captures from a lenient upstream sent at a strict upstream
-that rejects explicit `null`s) are the consumer's normalisation
-problem, not agentcap's. Multi-turn replay stays out of scope
-because conversation state diverges as soon as the new model
-responds differently.
-
-15. **Inference backend must deliver tool calls in `message.content`,
+14. **Inference backend must deliver tool calls in `message.content`,
not `message.reasoning_content`.** Hermes (and presumably other
agents) parses tool calls from the OpenAI-spec `content` field.
Reasoning-by-default models (Qwen 3.5+, etc.) on llama.cpp put
17 changes: 6 additions & 11 deletions README.md
@@ -6,13 +6,13 @@ An end-to-end harness for running real coding agents at scale across
the agent through a corpus of prompts, captures every chat-completion
request/response from the wire (request bodies as parsed JSON;
streamed responses as raw SSE bytes), and pushes the result to the
-Hub — so consumers can replay, render, or analyse what the agent
+Hub — so consumers can render, analyse, or re-issue what the agent
actually sent and got back, without reconstructing it from a log.

The pipeline:

```
-corpus ──► sandboxed agent run ──► capture ──► export ──► publish ──► inspect / replay
+corpus ──► sandboxed agent run ──► capture ──► export ──► publish ──► inspect
```

Repeat for each `(agent, model)` you want compared — the corpus
@@ -28,7 +28,7 @@ with live preview and Esc walk-back. See
## Quick start

Install the prereqs (one-time) and agentcap itself. `podman` runs
-the per-agent sandbox, `fzf` drives the inspect / replay pickers
+the per-agent sandbox, `fzf` drives the inspect pickers
(hard requirement; `agentcap inspect` errors out without it), and
`trufflehog` runs the pre-push secret scan (`agentcap export`
aborts on any verified hit; pass `--no-scan` to skip).
@@ -45,6 +45,8 @@ curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scr
| sh -s -- -b ~/.local/bin

# Both
+python -m venv .venv
+source .venv/bin/activate
pip install -e .
```

@@ -98,22 +100,15 @@ agentcap inspect <run-id> # one run only
agentcap inspect <request-id> # dump a specific body
```

-Re-issue a single captured request to an OpenAI-compatible target.
-
-```bash
-agentcap replay <request-id> --target http://127.0.0.1:8000
-```
-
## Usage

-The four sub-commands have a dedicated walkthrough each — flags,
+The three sub-commands have a dedicated walkthrough each — flags,
flows, and a recorded demo:

| command | docs page |
|-------------------|--------------------------------------------|
| `agentcap run` | [docs/capture.md](docs/capture.md) — sandboxes, multi-turn, follow-ups, backends |
| `agentcap inspect`| [docs/inspect.md](docs/inspect.md) — workspace / parquet / HF dataset pickers |
-| `agentcap replay` | [docs/replay.md](docs/replay.md) — re-issue any captured request elsewhere |
| `agentcap export` | [docs/export.md](docs/export.md) — push captures + traces as a HF Collection |

See [docs/tested-models-and-agents.md](docs/tested-models-and-agents.md)
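The README's Quick start names three prereqs and notes that `agentcap inspect` errors out without `fzf`. Below is a minimal pre-flight sketch that reports which of them are missing from `PATH`. It assumes a POSIX-style shell such as bash or dash, and `check_prereqs` is a hypothetical helper, not an agentcap sub-command.

```bash
# Hypothetical helper (not an agentcap command): report which of the
# Quick start prereqs are missing from PATH. Returns non-zero if any are.
check_prereqs() {
  local missing=0 cmd
  for cmd in podman fzf trufflehog; do
    # command -v succeeds only if the tool resolves on PATH
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs && pip install -e .` installs only once all three tools resolve. `trufflehog` appears to be needed only by `agentcap export` (which can skip the scan with `--no-scan`), so a capture-only setup could treat that one as optional.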
1 change: 0 additions & 1 deletion docs/capture.md
@@ -113,5 +113,4 @@ For known-good `(backend, model, agent)` tuples see

- `agentcap ls` — list runs in the workspace.
- [docs/inspect.md](inspect.md) — browse captured requests.
-- [docs/replay.md](replay.md) — re-issue a captured request elsewhere.
- [docs/export.md](export.md) — publish to a HF dataset.
48 changes: 0 additions & 48 deletions docs/demo/replay.tape

This file was deleted.

Binary file removed docs/img/replay.gif
Binary file not shown.
7 changes: 2 additions & 5 deletions docs/inspect.md
@@ -97,11 +97,8 @@ filesystem, no full download needed.
## Piping the picked rid

`--rid` makes the picker print the selected request id and exit
-instead of opening it — useful for chaining into `agentcap replay`.
-
-```bash
-agentcap replay $(agentcap inspect --rid) --target http://127.0.0.1:8000
-```
+instead of opening it — handy for capturing a selection into a script
+(e.g. `rid=$(agentcap inspect --rid)`).

## Looking up a specific rid

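The new inspect.md wording suggests capturing the picked id into a script. Below is a minimal sketch that chains the two `agentcap inspect` forms shown in this PR: `--rid` prints the picked id, and `<request-id>` dumps that request's body. It assumes a POSIX-style shell. `inspect_picked` is a hypothetical helper, and treating an empty id as "nothing picked" is an assumption about how the picker behaves when dismissed.

```bash
# Hypothetical helper: pick a request in the inspect picker, then dump its body.
# Relies only on two forms documented in this PR:
#   agentcap inspect --rid         -> prints the picked request id
#   agentcap inspect <request-id>  -> dumps that request's body
inspect_picked() {
  local rid
  # Propagate a non-zero exit from the picker (e.g. fzf not installed).
  rid=$(agentcap inspect --rid) || return
  # Assumption: an empty id means nothing was picked.
  if [ -z "$rid" ]; then
    echo "no request picked" >&2
    return 1
  fi
  agentcap inspect "$rid"
}
```

Holding the id in a variable, rather than piping it straight into the next command, makes it easy to reuse the same id across several steps of a script.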
88 changes: 0 additions & 88 deletions docs/replay.md

This file was deleted.
