map v1 reasoning effort by dialect #1690
Approvability verdict: Needs human review. An unresolved review comment identifies that the new …
mikasenghaas left a comment
ya, let's put effort into our SamplingConfig and map from there
unfortunately i don't think we can type this as a literal since it has to work across providers, so prob just …
Dismissing prior approval to re-evaluate f806269
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit f806269.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f806269a53
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Reviewed commit: 47aebe58e2
```python
model_config = ConfigDict(extra="allow")
temperature: float | None = None
top_p: float | None = None
effort: str | None = None
```
Map effort before generic sampling dumps
Adding `effort` to the shared `SamplingConfig` makes it appear in every `model_dump()`, but only the proxy dialects translate it. The v1 train client passes `sampling_args.model_dump(exclude_none=True)` directly to `renderers.client.generate` (`verifiers/v1/clients/train.py:193`), and the legacy bridge passes the same dump into v0 clients (`verifiers/v1/legacy.py:349/:435`), whose normalizers look for `reasoning_effort`, not `effort`. In runs such as `uv run eval <taskset> --client.type train --sampling.effort medium` or legacy `--id` evals, the new documented knob is therefore sent as an unmapped `effort` key instead of the provider/engine-native shape, so the request can fail or the requested reasoning budget is not applied.
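A minimal sketch of one way to act on this finding, assuming the generic dump is a plain `dict` and that v0-style normalizers expect the key `reasoning_effort` (as the comment states): translate the neutral key at the boundary, before the dump reaches a legacy client. The helper name `to_legacy_sampling_args` is hypothetical and not part of verifiers.

```python
from typing import Any


def to_legacy_sampling_args(dump: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rename the provider-neutral `effort` key to the
    `reasoning_effort` key that v0-style normalizers look for.

    Assumes `dump` came from model_dump(exclude_none=True), so a missing key
    means "unset" and nothing is added.
    """
    args = dict(dump)  # copy so the caller's dict is not mutated
    effort = args.pop("effort", None)  # always drop the unmapped key
    if effort is not None and "reasoning_effort" not in args:
        # Only fill the native key if the caller did not set it explicitly.
        args["reasoning_effort"] = effort
    return args


sampling_dump = {"temperature": 0.7, "effort": "medium"}
print(to_legacy_sampling_args(sampling_dump))
# {'temperature': 0.7, 'reasoning_effort': 'medium'}
```

If both keys are present, the explicit `reasoning_effort` wins and the neutral `effort` is discarded rather than forwarded as an unknown field.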
💡 Codex Review
Reviewed commit: 2b174e074a
💡 Codex Review
Reviewed commit: b0897d55da
```python
overrides["output_config"] = {
    **dict(body.get("output_config") or {}),
    "effort": s["reasoning_effort"],
}
```
Enable adaptive thinking when applying Anthropic effort
When `--sampling.reasoning-effort` is used for adaptive-thinking Claude models such as `claude-opus-4-7` or `claude-sonnet-4-6` and the intercepted request body does not already include `thinking`, this override only sends `output_config.effort`. Anthropic's extended-thinking docs require `thinking: {type: "adaptive"}` to enable effort-controlled thinking on those models, and the existing v0 Anthropic client adds that field in the same situation (`verifiers/clients/anthropic_messages_client.py:335-348`), so v1 proxy evals can silently run without the requested adaptive thinking instead of honoring the configured reasoning budget.
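A minimal sketch of the behavior this comment asks for, under stated assumptions: the intercepted body is a plain `dict`, the effort value is passed in directly rather than read from `s`, and adaptive-thinking support is decided by a hypothetical prefix allowlist built from the two models the comment names. The function name `apply_anthropic_effort` is also hypothetical, and the `{"type": "adaptive"}` shape is taken from the comment rather than verified here. The `output_config` merge mirrors the diff hunk this comment is attached to.

```python
from __future__ import annotations

from typing import Any

# Hypothetical allowlist: the review names these two models as using
# adaptive thinking. A real implementation may detect support differently.
ADAPTIVE_THINKING_MODELS = ("claude-opus-4-7", "claude-sonnet-4-6")


def apply_anthropic_effort(body: dict[str, Any], effort: str | None) -> dict[str, Any]:
    """Return override fields for an Anthropic Messages request body."""
    overrides: dict[str, Any] = {}
    if effort is None:
        return overrides  # unset: leave the request unchanged
    # Merge into any existing output_config instead of replacing it.
    overrides["output_config"] = {
        **dict(body.get("output_config") or {}),
        "effort": effort,
    }
    model = str(body.get("model", ""))
    # Enable adaptive thinking only when the caller did not configure
    # thinking themselves and the model matches the allowlist prefix.
    if "thinking" not in body and model.startswith(ADAPTIVE_THINKING_MODELS):
        overrides["thinking"] = {"type": "adaptive"}
    return overrides


print(apply_anthropic_effort({"model": "claude-sonnet-4-6", "max_tokens": 1024}, "high"))
# {'output_config': {'effort': 'high'}, 'thinking': {'type': 'adaptive'}}
```

Leaving an existing `thinking` field untouched means an explicit caller setting always takes precedence over the default.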

Overview
Maps the provider-agnostic `sampling.reasoning_effort` setting into each v1 dialect's native request shape.
Details
- Responses: `reasoning.effort`, preserving adjacent reasoning settings such as summaries.
- Anthropic Messages: `output_config.effort`, preserving other output configuration.
- Chat Completions: `reasoning_effort`.
Note
Low Risk
Small, opt-in request shaping in the interception layer; merges with existing provider fields and only applies when `reasoning_effort` is configured.
Overview
Adds provider-neutral `sampling.reasoning_effort` (CLI/TOML) on `SamplingConfig` so evals can set reasoning effort in one place.
When set, dialect `apply_overrides` maps it onto outgoing requests: Responses → `reasoning.effort` (merged with existing `reasoning`), Anthropic Messages → `output_config.effort` (merged with existing `output_config`). If unset, behavior is unchanged.
GUIDE and README document the knob and per-dialect wire shapes (including chat-completions `reasoning_effort`). Fixes a missing newline at the end of GUIDE.md.
Reviewed by Cursor Bugbot for commit b0897d5. Bugbot is set up for automated code reviews on this repo.
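The per-dialect mapping summarized in this note can be sketched as one pure function. This is an illustration, not the PR's `apply_overrides`: the function name and the dialect strings (`"responses"`, `"anthropic_messages"`, `"chat_completions"`) are hypothetical, and the Chat Completions branch assumes the flat `reasoning_effort` field described for that dialect.

```python
from __future__ import annotations

import copy
from typing import Any


def apply_reasoning_effort(dialect: str, body: dict[str, Any], effort: str | None) -> dict[str, Any]:
    """Hypothetical sketch: return a copy of `body` with `effort` mapped
    into the dialect's native request field."""
    out = copy.deepcopy(body)  # never mutate the intercepted request
    if effort is None:
        return out  # unset: behavior unchanged
    if dialect == "responses":
        # Merge with any existing `reasoning` object (e.g. a summary setting).
        out["reasoning"] = {**(out.get("reasoning") or {}), "effort": effort}
    elif dialect == "anthropic_messages":
        # Merge with any existing `output_config`.
        out["output_config"] = {**(out.get("output_config") or {}), "effort": effort}
    elif dialect == "chat_completions":
        # Chat Completions takes a flat top-level field.
        out["reasoning_effort"] = effort
    else:
        raise ValueError(f"unknown dialect: {dialect}")
    return out


request = {"model": "m", "reasoning": {"summary": "auto"}}
print(apply_reasoning_effort("responses", request, "low"))
# {'model': 'm', 'reasoning': {'summary': 'auto', 'effort': 'low'}}
```

Merging rather than overwriting is what keeps adjacent settings such as a reasoning summary intact when only the effort is configured.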
Note
Map `reasoning_effort` to dialect-specific request fields for Anthropic and Responses APIs
- Maps `sampling.reasoning_effort` to `output_config.effort` in outgoing Anthropic Messages requests, merging with any existing `output_config`.
- Maps `sampling.reasoning_effort` to `reasoning.effort` in outgoing OpenAI `/responses` requests, merging with any existing `reasoning` object.
Changes since #1690 opened
- … `SamplingConfig` and updated dialect-specific mappings [f806269]
- … `reasoning_effort` with a provider-neutral `effort` field in the `sampling` configuration section [37fa310]
- … `[sampling]` section from `gsm8k` configuration [47aebe5]
- … `reasoning_effort` from sampling configuration instead of `effort` [2b174e0]
- … `effort` to `reasoning_effort` [2b174e0]
- … `sampling.reasoning_effort` instead of `sampling.effort` [2b174e0]
- … `reasoning_effort` field name [2b174e0]
- … `test_sampling_reasoning_effort_is_typed` from `tests.v1.test_configs` [b0897d5]
Macroscope summarized 456685c.