Track cached, reasoning, and cost usage in v1 by xeophon · Pull Request #1704 · PrimeIntellect-ai/verifiers

xeophon · 2026-06-16T12:37:42Z

Overview

Add provider-reported usage telemetry to v1 traces and surface it consistently across evaluation, training serialization, persistence, and the rich dashboard.

Details

Extend Usage with cached input tokens, reasoning tokens, and provider-reported cost, with per-call aggregation across a rollout.
Preserve response usage on sampled message nodes so it survives wire and disk serialization.
Map OpenAI Chat Completions, OpenAI Responses, and Anthropic usage details into the shared v1 representation using their SDK models.
Keep provider usage separate from renderer-derived training sequence lengths while carrying cache and reasoning details through the training and legacy bridges.
Show cached input, reasoning tokens, and accumulated cost alongside token counts in the rich dashboard.
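
The extended `Usage` shape and its per-call aggregation could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the field names follow the description above, but the dataclass layout, the `None` handling for cost, and the exact property definitions are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the v1 Usage type described in this PR."""

    prompt_tokens: int = 0          # uncached input tokens
    completion_tokens: int = 0
    cached_input_tokens: int = 0    # cache reads, kept as a disjoint bucket
    reasoning_tokens: int = 0       # subset of completion_tokens
    cost: Optional[float] = None    # provider-reported USD cost, if any

    @property
    def input_tokens(self) -> int:
        # All input seen by the model: uncached plus cached.
        return self.prompt_tokens + self.cached_input_tokens

    @property
    def total_tokens(self) -> int:
        # Reasoning tokens are already counted inside completion_tokens.
        return self.input_tokens + self.completion_tokens

    @classmethod
    def aggregate(cls, usages: Iterable[Usage]) -> Usage:
        """Sum usage over the model calls of one rollout."""
        items = list(usages)
        costs = [u.cost for u in items if u.cost is not None]
        return cls(
            prompt_tokens=sum(u.prompt_tokens for u in items),
            completion_tokens=sum(u.completion_tokens for u in items),
            cached_input_tokens=sum(u.cached_input_tokens for u in items),
            reasoning_tokens=sum(u.reasoning_tokens for u in items),
            # Assumption: cost stays None unless at least one call reported it.
            cost=sum(costs) if costs else None,
        )


calls = [Usage(100, 20, 50, 5, 0.01), Usage(10, 5)]
rollout = Usage.aggregate(calls)
```

Keeping cached input in its own bucket means `total_tokens` can add it without double counting the uncached prompt tokens.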

Note

Medium Risk
Changes how prompt_tokens is interpreted per provider and persists usage on traces, which can shift displayed totals and anything consuming stored usage—though scope is telemetry/display rather than core rollout logic.

Overview
Extends v1 usage telemetry so provider-reported cache reads, reasoning-token subsets, and optional USD cost flow from dialect parsers into traces, training serialization, and the eval rich dashboard.

Usage gains cached_input_tokens, reasoning_tokens, cost, plus aggregate and input_tokens so totals treat cached input as a disjoint bucket. OpenAI Chat, Responses, and Anthropic dialects map SDK usage details into that shape (notably prompt_tokens is uncached input for OpenAI; Anthropic prompt_tokens includes cache-creation tokens). Per-turn usage on MessageNode is serialized on wire/disk; Trace.usage / Branch.usage roll up per model call. The train client round-trips cache/reasoning into synthetic chat completions; the legacy bridge reads the new fields from v0 dicts. The eval dashboard shows cached, reasoning, and cost beside token counts.
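
The per-provider interpretation of `prompt_tokens` can be illustrated with plain dictionaries. This is a sketch under assumptions: the PR maps SDK models rather than dicts, the function names `from_openai_chat` and `from_anthropic` are hypothetical, and leaving Anthropic's reasoning count at 0 is a simplification made here, not something the PR states.

```python
def from_openai_chat(usage: dict) -> dict:
    """Map an OpenAI Chat Completions usage payload into the shared shape.

    OpenAI reports prompt_tokens inclusive of cached tokens, so the cached
    part is subtracted to leave uncached input only.
    """
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens") or 0
    reasoning = (
        (usage.get("completion_tokens_details") or {}).get("reasoning_tokens") or 0
    )
    return {
        "prompt_tokens": (usage.get("prompt_tokens") or 0) - cached,
        "cached_input_tokens": cached,
        "completion_tokens": usage.get("completion_tokens") or 0,
        "reasoning_tokens": reasoning,
    }


def from_anthropic(usage: dict) -> dict:
    """Map an Anthropic usage payload into the shared shape.

    Anthropic reports input_tokens exclusive of cache traffic, so
    cache-creation tokens are added back and cache reads become the
    cached bucket.
    """
    return {
        "prompt_tokens": (usage.get("input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0),
        "cached_input_tokens": usage.get("cache_read_input_tokens") or 0,
        "completion_tokens": usage.get("output_tokens") or 0,
        "reasoning_tokens": 0,  # assumption: no separate count in this sketch
    }


openai_usage = from_openai_chat(
    {
        "prompt_tokens": 1000,
        "completion_tokens": 200,
        "prompt_tokens_details": {"cached_tokens": 600},
        "completion_tokens_details": {"reasoning_tokens": 50},
    }
)
anthropic_usage = from_anthropic(
    {
        "input_tokens": 100,
        "cache_creation_input_tokens": 30,
        "cache_read_input_tokens": 600,
        "output_tokens": 200,
    }
)
```

In both cases `prompt_tokens + cached_input_tokens` gives the full input size, which is what makes totals comparable across providers.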

Reviewed by Cursor Bugbot for commit 81e1fc7.

Note

Track cached, reasoning, and cost usage fields across v1 dialects, traces, and dashboard

Extends the Usage type with cached_input_tokens, reasoning_tokens, cost, and an aggregate classmethod; total_tokens now includes cached input tokens.
Updates OpenAI Chat, Responses, and Anthropic dialect translators to populate the new fields from provider-reported usage details.
Adds a Trace.usage computed property that aggregates Usage across all model call nodes in the trace.
Updates the eval dashboard to display cached tokens, reasoning tokens, and USD cost per row; hides token counts when both prompt and completion are zero.
MessageNode.usage is no longer excluded from serialization, so per-node usage is now retained in trace output.
Behavioral Change: prompt_tokens in OpenAI Chat and Responses dialects now reflects uncached input tokens only; Anthropic prompt_tokens now includes cache-creation tokens.
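
The dashboard behavior summarized above might look something like the following formatter. It is only a sketch: the function name `format_usage_cell`, the separator, the number formatting, and the choice to return an empty string when both counts are zero are all assumptions, not the dashboard's actual rendering code.

```python
from typing import Optional


def format_usage_cell(
    prompt_tokens: int,
    completion_tokens: int,
    cached_input_tokens: int = 0,
    reasoning_tokens: int = 0,
    cost: Optional[float] = None,
) -> str:
    """Render one dashboard cell for a row's usage (hypothetical helper)."""
    # Hide token counts entirely when nothing was reported for the row.
    if prompt_tokens == 0 and completion_tokens == 0:
        return ""
    parts = [f"{prompt_tokens:,} in", f"{completion_tokens:,} out"]
    if cached_input_tokens:
        parts.append(f"{cached_input_tokens:,} cached")
    if reasoning_tokens:
        parts.append(f"{reasoning_tokens:,} reasoning")
    if cost is not None:
        parts.append(f"${cost:.4f}")
    return " | ".join(parts)


cell = format_usage_cell(1200, 300, cached_input_tokens=800, reasoning_tokens=50, cost=0.0123)
```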

Macroscope summarized 81e1fc7.

macroscopeapp · 2026-06-16T12:42:32Z

Approvability

Verdict: Needs human review

This PR adds new token tracking fields and changes the semantic meaning of prompt_tokens (now excludes cached tokens) across multiple dialects and core types. The persistence behavior of usage data also changes from transient to persisted. These are meaningful runtime behavior changes that warrant review.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 980e06833e


chatgpt-codex-connector · 2026-06-16T12:48:54Z

    provider/SDK enum skew (e.g. a value the pinned `openai` rejects)."""

    model_config = ConfigDict(extra="allow")
+    usage: ResponseUsage | None = None


Keep Responses usage parsing permissive

When an openai_responses endpoint returns a usage object with only aggregate counts such as input_tokens, output_tokens, and total_tokens, this typed field makes OpenAIResponse.model_validate(raw) in EvalClient.get_response validate against the SDK ResponseUsage, whose nested token-detail objects are required. The previous extra-allow/dict path accepted those responses and treated missing details as 0, but now the rollout fails before parse_response; keep usage permissive or normalize missing detail fields before validation.
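
One way to follow the second suggestion (normalize missing detail fields before validation) is sketched below. The helper name `normalize_responses_usage` is hypothetical, and the sketch assumes the raw response is a plain dict whose usage object uses the Responses field names `input_tokens_details.cached_tokens` and `output_tokens_details.reasoning_tokens`.

```python
def _with_default(details: object, key: str) -> dict:
    """Return a copy of a details dict with `key` defaulting to 0."""
    merged = dict(details) if isinstance(details, dict) else {}
    merged.setdefault(key, 0)
    return merged


def normalize_responses_usage(raw: dict) -> dict:
    """Fill in missing nested token details on a raw Responses payload.

    Returns a new dict; the input is left unmodified. Payloads without a
    usage object are returned as-is.
    """
    usage = raw.get("usage")
    if not isinstance(usage, dict):
        return raw
    patched = dict(usage)
    patched["input_tokens_details"] = _with_default(
        usage.get("input_tokens_details"), "cached_tokens"
    )
    patched["output_tokens_details"] = _with_default(
        usage.get("output_tokens_details"), "reasoning_tokens"
    )
    return {**raw, "usage": patched}


raw = {"id": "resp_1", "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}}
normalized = normalize_responses_usage(raw)
```

Running this before `model_validate` would let aggregate-only usage objects pass a strict nested schema while still treating missing details as 0, matching the earlier permissive behavior described in the comment.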


mikasenghaas

nice, i like the token breakdown. i'm not sure i like the cost breakdown just yet. afaiu, the current pi inference cost is not actually accurate bc we don't account for cached tokens? at least this made the cost of running benches so far CRAZY HIGH. maybe this is resolved on pi inference now? also, can we test this against some of the common apis, like oai/ant at least, maybe also deepseek/kimi/minimax etc.?

Track cached, reasoning, and cost usage

81e1fc7

xeophon force-pushed the v1/usage-telemetry branch from 980e068 to 81e1fc7 on June 16, 2026 12:41

xeophon changed the base branch from codex/v1-prime-config to feat/nano-as-v1 on June 16, 2026 12:41

chatgpt-codex-connector Bot reviewed Jun 16, 2026

mikasenghaas reviewed Jun 16, 2026

xeophon wants to merge 1 commit into feat/nano-as-v1 from v1/usage-telemetry
