map v1 reasoning effort by dialect #1690

Merged: xeophon merged 8 commits into feat/nano-as-v1 from codex/v1-reasoning-effort-overrides on Jun 16, 2026

Conversation

@xeophon

@xeophon xeophon commented Jun 15, 2026

Member

Overview

Maps the provider-agnostic sampling.reasoning_effort setting into each v1 dialect's native request shape.

Details

  • Responses requests write the configured value to reasoning.effort while preserving adjacent reasoning settings such as summaries.
  • Anthropic Messages requests write the configured value to output_config.effort while preserving other output configuration.
  • Anthropic thinking configuration remains explicit and is preserved from the intercepted request; the dialect does not infer model capabilities or enable a thinking mode.
  • Existing provider-native reasoning and thinking settings remain unchanged when the eval does not set reasoning_effort.
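
The mapping described in the list above can be sketched roughly as follows. This is an illustration only, not the PR's code: the helper names `responses_overrides` and `anthropic_overrides` and the plain-dict `sampling` and `body` arguments are assumptions. Only the target fields (`reasoning.effort`, `output_config.effort`) and the merge-with-existing behavior come from the description.

```python
from typing import Any, Dict


def responses_overrides(sampling: Dict[str, Any], body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map reasoning_effort onto a Responses request."""
    overrides: Dict[str, Any] = {}
    effort = sampling.get("reasoning_effort")
    if effort is not None:
        # Merge with the intercepted body so adjacent settings (e.g. summaries) survive.
        overrides["reasoning"] = {**dict(body.get("reasoning") or {}), "effort": effort}
    return overrides


def anthropic_overrides(sampling: Dict[str, Any], body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map reasoning_effort onto an Anthropic Messages request."""
    overrides: Dict[str, Any] = {}
    effort = sampling.get("reasoning_effort")
    if effort is not None:
        # Merge with any existing output_config; `thinking` is left untouched.
        overrides["output_config"] = {**dict(body.get("output_config") or {}), "effort": effort}
    return overrides


if __name__ == "__main__":
    body = {"model": "example-model", "reasoning": {"summary": "auto"}}
    merged = {**body, **responses_overrides({"reasoning_effort": "high"}, body)}
    print(merged["reasoning"])  # {'summary': 'auto', 'effort': 'high'}
```

When `reasoning_effort` is unset, both helpers return an empty dict, so the intercepted request passes through unchanged.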

Note

Low Risk
Small, opt-in request shaping in the interception layer; merges with existing provider fields and only applies when reasoning_effort is configured.

Overview
Adds provider-neutral sampling.reasoning_effort (CLI/TOML) on SamplingConfig so evals can set reasoning effort in one place.

When set, dialect apply_overrides maps it onto outgoing requests: Responses → reasoning.effort (merged with existing reasoning), Anthropic Messages → output_config.effort (merged with existing output_config). If unset, behavior is unchanged.

GUIDE and README document the knob and per-dialect wire shapes (including chat-completions reasoning_effort). Fixes a missing newline at the end of GUIDE.md.

Reviewed by Cursor Bugbot for commit b0897d5.

Note

Map reasoning_effort to dialect-specific request fields for Anthropic and Responses APIs

  • anthropic.py: Maps sampling.reasoning_effort to output_config.effort in outgoing Anthropic Messages requests, merging with any existing output_config.
  • responses.py: Maps sampling.reasoning_effort to reasoning.effort in outgoing OpenAI /responses requests, merging with any existing reasoning object.

Changes since #1690 opened

  • Added 'effort' field to SamplingConfig and updated dialect-specific mappings [f806269]
  • Replaced reasoning_effort with a provider-neutral effort field in the sampling configuration section [37fa310]
  • Removed [sampling] section from gsm8k configuration [47aebe5]
  • Updated dialect mappers to read reasoning_effort from sampling configuration instead of effort [2b174e0]
  • Renamed the sampling configuration field from effort to reasoning_effort [2b174e0]
  • Updated documentation to reference sampling.reasoning_effort instead of sampling.effort [2b174e0]
  • Updated tests to validate the reasoning_effort field name [2b174e0]
  • Removed test test_sampling_reasoning_effort_is_typed from tests.v1.test_configs [b0897d5]

Macroscope summarized 456685c.

@xeophon xeophon marked this pull request as ready for review June 15, 2026 12:36
@xeophon xeophon changed the title [codex] map v1 reasoning effort by dialect map v1 reasoning effort by dialect Jun 15, 2026
macroscopeapp Bot previously approved these changes Jun 15, 2026
@macroscopeapp

macroscopeapp Bot commented Jun 15, 2026

Approvability

Verdict: Needs human review

An unresolved review comment identifies that the new reasoning_effort parameter is not mapped correctly in the train client and legacy bridge code paths, which could cause the feature to fail or be ignored in those contexts.

@mikasenghaas mikasenghaas left a comment

Member

ya, let's put effort into our SamplingConfig and map from there

@mikasenghaas

Member

unfortunately i don't think we can type this as a literal since it has to work across providers, so prob just str
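
A rough sketch of what a plain `str` field implies, assuming a stdlib dataclass as a stand-in (`SamplingSketch` is a hypothetical name; the real SamplingConfig is a pydantic model): the value is not checked against a fixed set, so each provider decides which effort levels it accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingSketch:
    """Hypothetical stand-in for SamplingConfig, for illustration only."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    # Plain str instead of Literal[...]: the config layer forwards the value
    # as-is and leaves validation to whichever provider receives it.
    reasoning_effort: Optional[str] = None


if __name__ == "__main__":
    cfg = SamplingSketch(reasoning_effort="some-provider-specific-level")
    print(cfg.reasoning_effort)
```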

@macroscopeapp macroscopeapp Bot dismissed their stale review June 15, 2026 17:05

Dismissing prior approval to re-evaluate f806269

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit f806269.

Comment thread verifiers/v1/types.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f806269a53

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread verifiers/v1/types.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47aebe58e2

Comment thread verifiers/v1/types.py Outdated
model_config = ConfigDict(extra="allow")
temperature: float | None = None
top_p: float | None = None
effort: str | None = None

P2: Map effort before generic sampling dumps

Adding effort to the shared SamplingConfig makes it appear in every model_dump(), but only the proxy dialects translate it. The v1 train client passes sampling_args.model_dump(exclude_none=True) directly to renderers.client.generate (verifiers/v1/clients/train.py:193), and the legacy bridge passes the same dump into v0 clients (verifiers/v1/legacy.py:349/:435), whose normalizers look for reasoning_effort, not effort. In runs such as uv run eval <taskset> --client.type train --sampling.effort medium or legacy --id evals, the new documented knob is therefore sent as an unmapped effort key instead of the provider/engine-native shape, so the request can fail or the requested reasoning budget is not applied.
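
The failure mode described above can be illustrated with a small sketch. Everything here is hypothetical: `dump_exclude_none` stands in for `model_dump(exclude_none=True)`, and `to_legacy_args` is one possible rename step, not code from the repository. Per the change list, the PR later renamed the config field itself to `reasoning_effort` (2b174e0).

```python
from typing import Any, Dict


def dump_exclude_none(sampling: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for model_dump(exclude_none=True) on the sampling config."""
    return {key: value for key, value in sampling.items() if value is not None}


def to_legacy_args(sampling: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical fix: rename the neutral key before handing the args to
    consumers whose normalizers look for `reasoning_effort`."""
    args = dump_exclude_none(sampling)
    if "effort" in args:
        args["reasoning_effort"] = args.pop("effort")
    return args


if __name__ == "__main__":
    sampling = {"temperature": 0.7, "top_p": None, "effort": "medium"}
    # The generic dump forwards the raw `effort` key, which downstream code ignores or rejects.
    print(dump_exclude_none(sampling))
    # The mapped version uses the key the downstream normalizers expect.
    print(to_legacy_args(sampling))
```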

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b174e074a

Comment thread verifiers/v1/types.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0897d55da

Comment on lines +263 to +266
overrides["output_config"] = {
    **dict(body.get("output_config") or {}),
    "effort": s["reasoning_effort"],
}

P2: Enable adaptive thinking when applying Anthropic effort

When --sampling.reasoning-effort is used for adaptive-thinking Claude models such as claude-opus-4-7 or claude-sonnet-4-6 and the intercepted request body does not already include thinking, this override only sends output_config.effort. Anthropic's extended-thinking docs require thinking: {type: "adaptive"} to enable effort-controlled thinking on those models, and the existing v0 Anthropic client adds that field in the same situation (verifiers/clients/anthropic_messages_client.py:335-348), so v1 proxy evals can silently run without the requested adaptive thinking instead of honoring the configured reasoning budget.
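
For reference, the suggestion above might look something like the sketch below. This is hypothetical and is not what was merged: the PR description states that the dialect does not infer model capabilities or enable a thinking mode. The function name and the dict-based arguments are assumptions, and the sketch makes no attempt to check whether the target model supports adaptive thinking.

```python
from typing import Any, Dict


def anthropic_overrides_with_thinking(
    sampling: Dict[str, Any], body: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical variant of the Anthropic mapping that also enables
    adaptive thinking when the intercepted request did not set `thinking`."""
    overrides: Dict[str, Any] = {}
    effort = sampling.get("reasoning_effort")
    if effort is None:
        return overrides
    overrides["output_config"] = {**dict(body.get("output_config") or {}), "effort": effort}
    if "thinking" not in body:
        # Only fill in a default; an explicit thinking config is never replaced.
        overrides["thinking"] = {"type": "adaptive"}
    return overrides


if __name__ == "__main__":
    print(anthropic_overrides_with_thinking({"reasoning_effort": "high"}, {"model": "example-model"}))
```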

@xeophon xeophon merged commit 2822e23 into feat/nano-as-v1 Jun 16, 2026
5 checks passed

2 participants