[FEATURE] Support previous_response_id for stateful multi-turn conversations in OpenAIResponsesModel #1957

@sagimedina1

Description

Problem Statement

The OpenAIResponsesModel currently resends the full conversation history on every turn, even though the Responses API natively
supports server-side conversation state via previous_response_id. The docstring at the top of openai_responses.py acknowledges
this:

"The Responses API can maintain conversation state server-side through 'previous_response_id'... Note: This implementation currently only implements the stateless approach."

For agentic applications with multi-turn conversations (10+ turns with tool calls), this means:

  • Token costs grow with conversation length: per-turn input scales linearly and cumulative input tokens scale quadratically, since turn 10 resends all nine previous turns
  • Latency increases as the input payload grows
  • Context window pressure — long conversations hit limits faster, not because of new content, but because of repeated history

This affects both OpenAI and xAI Responses API users, since both support previous_response_id.
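To make the difference concrete, here is a minimal sketch of the two request shapes as plain dicts (the model name and message structure are illustrative, not the actual Strands internals):

```python
# Hypothetical request payloads illustrating the two shapes.

def stateless_request(history, new_messages):
    # Current behavior: the full conversation rides along on every turn.
    return {"model": "gpt-4.1", "input": history + new_messages}

def stateful_request(previous_response_id, new_messages):
    # Proposed behavior: send only the delta; the server holds the rest.
    return {
        "model": "gpt-4.1",
        "input": new_messages,
        "previous_response_id": previous_response_id,
    }

history = [{"role": "user", "content": f"turn {i}"} for i in range(1, 10)]
new = [{"role": "user", "content": "turn 10"}]
print(len(stateless_request(history, new)["input"]))       # 10 items
print(len(stateful_request("resp_abc123", new)["input"]))  # 1 item
```

The stateless payload carries ten input items on turn 10; the stateful one carries a single item plus an ID.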

Proposed Solution

  1. After each successful response in stream(), capture and store response.id from the completed response event.
  2. On subsequent calls, if a previous_response_id is available, pass it in the request instead of the full message history. Only
    send the new user message(s) and tool results in input.
  3. Fall back to the current stateless approach (full history) if:
    - No previous response ID exists (first turn)
    - The stored response has expired (30-day server retention)
    - The API returns an error indicating the previous response is invalid
  4. Expose a configuration option to enable/disable this behavior (e.g., stateful=True in config), defaulting to disabled for
    backward compatibility.
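The capture/reuse/fallback flow in steps 1-4 could be tracked with a small piece of state along these lines (class and method names are assumptions for illustration, not the actual OpenAIResponsesModel internals):

```python
# Minimal sketch of the proposed state tracking.

class StatefulResponsesState:
    """Tracks the last response ID and decides when to fall back."""

    def __init__(self, stateful: bool = False):
        self.stateful = stateful          # proposed config flag, off by default (step 4)
        self.previous_response_id = None  # set after each successful turn

    def on_response_completed(self, response_id: str) -> None:
        # Step 1: capture response.id from the response.completed event.
        self.previous_response_id = response_id

    def on_invalid_previous_response(self) -> None:
        # Step 3: the stored ID expired or was rejected; the next turn
        # resends the full history.
        self.previous_response_id = None

    def use_previous_response(self) -> bool:
        # Step 2: only use server-side state when enabled and available.
        return self.stateful and self.previous_response_id is not None

state = StatefulResponsesState(stateful=True)
print(state.use_previous_response())   # False: first turn, full history
state.on_response_completed("resp_123")
print(state.use_previous_response())   # True: next turn sends only the delta
state.on_invalid_previous_response()
print(state.use_previous_response())   # False: fallback after expiry/error
```

With `stateful=False` (the proposed default), `use_previous_response()` never returns True, preserving today's behavior.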

The key changes would be in:

  • stream() — capture response.id from response.completed event
  • _format_request() — conditionally pass previous_response_id instead of full history
  • State management — store the last response ID (could be returned as metadata alongside usage stats)
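The `_format_request()` change amounts to one conditional. A sketch, assuming the history and the new messages can be passed separately (the real function in openai_responses.py has a different signature; the names here are placeholders):

```python
# Hypothetical stand-in for the conditional inside _format_request.

def format_request(full_history, new_messages, previous_response_id=None):
    if previous_response_id is not None:
        # Stateful path: only the new user messages / tool results go in
        # input; the server reconstructs the rest from previous_response_id.
        return {
            "input": new_messages,
            "previous_response_id": previous_response_id,
        }
    # Stateless fallback: resend the full history, as today.
    return {"input": full_history + new_messages}

history = [{"role": "user", "content": "earlier turn"}]
new = [{"role": "user", "content": "latest question"}]
print(format_request(history, new))              # full history, no ID
print(format_request(history, new, "resp_abc"))  # delta + previous_response_id
```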

Use Case

We run a property management AI agent on Bedrock AgentCore using Strands. Each session is a multi-turn conversation where the PM
asks for analysis, creates action plans, drafts emails, and executes tasks. A typical session is 10-20 turns with heavy tool
use (each turn may involve 2-5 tool calls).

Today, turn 15 of a conversation resends all 14 previous turns plus their tool call/result pairs. With previous_response_id,
turn 15 would only send the new user message — the server already has the rest.

This would help with:

  • Cost reduction — estimated 40-60% input token savings for typical multi-turn sessions
  • Faster responses — less data to transmit and process per turn
  • Longer conversations — more room in the context window for actual content instead of repeated history
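A back-of-the-envelope check of the savings estimate, under assumed token counts (roughly 500 new input tokens per turn including tool call/result pairs, and roughly 2000 fixed tokens per request for the system prompt and tool schemas, which must be sent either way):

```python
# Rough arithmetic behind the 40-60% estimate; all numbers are assumptions.
TOKENS_PER_TURN = 500
FIXED_PER_REQUEST = 2000
TURNS = 15

# Stateless: turn k resends turns 1..k, so per-turn input grows linearly
# and the cumulative total grows quadratically.
stateless = sum(k * TOKENS_PER_TURN + FIXED_PER_REQUEST for k in range(1, TURNS + 1))

# Stateful: each turn sends only its own new content plus the fixed overhead.
stateful = TURNS * (TOKENS_PER_TURN + FIXED_PER_REQUEST)

savings = 1 - stateful / stateless
print(f"{savings:.0%}")  # 58% with these assumptions
```

The exact figure depends heavily on the per-turn and fixed token sizes, but a mid-conversation savings in the 40-60% range is plausible for sessions like the one described.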

Alternative Solutions

Application-level caching or summarization of the conversation history; this works but can get messy.

Additional Context

No response

Metadata
Labels: enhancement (New feature or request)
