Description
Problem Statement
The OpenAIResponsesModel currently resends the full conversation history on every turn, even though the Responses API natively
supports server-side conversation state via previous_response_id. The docstring at the top of openai_responses.py acknowledges
this:
"The Responses API can maintain conversation state server-side through 'previous_response_id'... Note: This implementation currently only implements the stateless approach."
For agentic applications with multi-turn conversations (10+ turns with tool calls), this means:
- Token costs scale linearly with conversation length — turn 10 resends all previous turns
- Latency increases as the input payload grows
- Context window pressure — long conversations hit limits faster, not because of new content, but because of repeated history
This affects both OpenAI and xAI Responses API users, since both support previous_response_id.
Proposed Solution
- After each successful response in stream(), capture and store response.id from the completed response event.
- On subsequent calls, if a previous_response_id is available, pass it in the request instead of the full message history. Only send the new user message(s) and tool results in input.
- Fall back to the current stateless approach (full history) if:
- No previous response ID exists (first turn)
- The stored response has expired (30-day server retention)
- The API returns an error indicating the previous response is invalid
- Expose a configuration option to enable/disable this behavior (e.g., stateful=True in config), defaulting to disabled for backward compatibility.
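A minimal sketch of the conditional request construction described above. The helper name `format_request`, the `stateful` flag, and the `new` marker on messages are illustrative assumptions, not the actual strands-agents implementation (the real method is `_format_request` on the model class):

```python
# Hedged sketch: conditionally pass previous_response_id instead of
# the full message history. Field names here are assumptions.

def format_request(messages, previous_response_id=None, stateful=False):
    """Build a Responses API request body.

    When `stateful` is enabled and a previous response ID is known,
    send only the messages produced since the last turn and let the
    server reconstruct the rest of the conversation.
    """
    if stateful and previous_response_id:
        # Server already holds earlier turns: send only the new input.
        new_messages = [m for m in messages if m.get("new", False)]
        return {
            "previous_response_id": previous_response_id,
            "input": new_messages,
        }
    # Stateless fallback: resend the full history (current behavior).
    return {"input": messages}
```

The fallback cases listed above (no stored ID, expired ID, API error) all reduce to calling this with `previous_response_id=None`, so the stateless path stays the single source of truth.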
The key changes would be in:
- stream() — capture response.id from response.completed event
- _format_request() — conditionally pass previous_response_id instead of full history
- State management — store the last response ID (could be returned as metadata alongside usage stats)
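The state-capture step can be sketched as a small tracker fed from the streaming event loop. The `StateTracker` class is hypothetical; the event shape mirrors the Responses API's `response.completed` streaming event, which carries the final response object including its ID:

```python
# Hedged sketch: capture response.id from the streaming event that
# signals completion. StateTracker is an illustrative name, not part
# of the strands-agents codebase.

class StateTracker:
    def __init__(self):
        self.last_response_id = None

    def handle_event(self, event: dict):
        # Only the completed event carries the finalized response,
        # so intermediate events leave the stored ID untouched.
        if event.get("type") == "response.completed":
            self.last_response_id = event["response"]["id"]
```

Returning `last_response_id` as metadata alongside usage stats would let callers persist it across turns without the model object holding session state itself.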
Use Case
We run a property management AI agent on Bedrock AgentCore using Strands. Each session is a multi-turn conversation where the PM
asks for analysis, creates action plans, drafts emails, and executes tasks. A typical session is 10-20 turns with heavy tool
use (each turn may involve 2-5 tool calls).
Today, turn 15 of a conversation resends all 14 previous turns plus their tool call/result pairs. With previous_response_id,
turn 15 would only send the new user message — the server already has the rest.
This would help with:
- Cost reduction — estimated 40-60% input token savings for typical multi-turn sessions
- Faster responses — less data to transmit and process per turn
- Longer conversations — more room in the context window for actual content instead of repeated history
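The savings estimate can be sanity-checked with back-of-envelope arithmetic (illustrative numbers, not measurements; actual savings depend on how much of each turn is tool-call payload):

```python
# Hedged sketch: cumulative input tokens over a session, assuming each
# turn contributes roughly the same amount of new input content.

def stateless_input_tokens(turns, tokens_per_turn):
    # Turn k resends all k turns so far: 1 + 2 + ... + turns.
    return tokens_per_turn * turns * (turns + 1) // 2

def stateful_input_tokens(turns, tokens_per_turn):
    # Each turn sends only its own new content.
    return tokens_per_turn * turns

stateless = stateless_input_tokens(15, 500)  # 60,000 tokens
stateful = stateful_input_tokens(15, 500)    # 7,500 tokens
```

With these toy numbers the stateful path sends roughly an eighth of the input tokens over a 15-turn session; real sessions with uneven turn sizes will land somewhere lower, consistent with the 40-60% estimate above.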
Alternative Solutions
Application-level caching or summarization of the history: this can reduce tokens but adds complexity, and each turn still resends a (compressed) payload.
Additional Context
No response