Human-in-the-Loop Implementation #2021
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
    previous_response_id: str | None,
    conversation_id: str | None,
    session: Session | None,
    run_state: RunState[TContext] | None = None,
):
    if streamed_result.trace:
        streamed_result.trace.start(mark_as_current=True)
```
Prime server tracker when resuming streaming runs
When Runner.run_streamed resumes from a RunState with conversation_id or previous_response_id, _start_streaming constructs a _ServerConversationTracker but never seeds it with the prior model_responses that were already sent. Unlike the synchronous path, no call to track_server_items is made, so prepare_input treats every previously generated item as unsent and resubmits them to the server. This duplicates earlier messages and breaks server-side conversation threading when a run is resumed.
Fixed in a56ce0a. Added server conversation tracker priming at lines 1076-1079 to match the non-streaming implementation and prevent message duplication when resuming from RunState.
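The priming pattern described in this thread can be sketched with stand-in classes. The real `_ServerConversationTracker` and model response types live in the SDK; the class names and fields below are illustrative only, but the method names (`track_server_items`, `prepare_input`) follow the discussion above.

```python
from dataclasses import dataclass, field

@dataclass
class FakeResponse:
    # Stand-in for a model response; real responses carry output items.
    item_ids: list

@dataclass
class FakeTracker:
    # Stand-in for the server conversation tracker.
    sent: set = field(default_factory=set)

    def track_server_items(self, response):
        # Record every item in the response as already sent to the server.
        self.sent.update(response.item_ids)

    def prepare_input(self, item_ids):
        # Resubmit only items the server has not seen yet.
        return [i for i in item_ids if i not in self.sent]

tracker = FakeTracker()
# The fix: before resuming, replay prior responses into the tracker so
# previously sent items are not resubmitted to the server.
for response in [FakeResponse(item_ids=["msg-1", "msg-2"])]:
    tracker.track_server_items(response)

print(tracker.prepare_input(["msg-1", "msg-2", "msg-3"]))  # ['msg-3']
```

Without the priming loop, `prepare_input` would return all three items and the server would receive the first two messages a second time.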
Streaming resume ignores existing turn count
The streaming execution path always initializes current_turn = 0 when _start_streaming is called, even if a RunState with an existing _current_turn is supplied. The loop then increments from zero, so any turns completed before the interruption are ignored and the max_turns guard is reset. After each interruption, a resumed streaming run can exceed the user’s turn limit and misreport the current turn number.
This was already fixed in 74c50fd at line 914: current_turn=run_state._current_turn if run_state else 0. The turn counter is properly restored from the RunState.
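The one-line restore mentioned in this reply can be illustrated with a stub; `StubRunState` below is a stand-in for the SDK's `RunState`, and only the turn counter is modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StubRunState:
    # Stand-in for RunState; only the turn counter matters here.
    _current_turn: int = 0

def initial_turn(run_state: Optional[StubRunState]) -> int:
    # Resume from the stored turn rather than always starting at zero,
    # so the max_turns guard survives an interruption.
    return run_state._current_turn if run_state else 0

print(initial_turn(None))                           # 0 for a fresh run
print(initial_turn(StubRunState(_current_turn=3)))  # 3 when resuming
```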
…essage duplication

When resuming a streaming run from RunState, the server conversation tracker was not being primed with previously sent model responses. This caused `prepare_input` to treat all previously generated items as unsent and resubmit them to the server, breaking conversation threading.

**Issue**: Missing `track_server_items` call in the streaming resumption path
**Fix**: Added server conversation tracker priming logic in the `_start_streaming` method (lines 1076-1079) to match the non-streaming path implementation (lines 553-556). The fix iterates through `run_state._model_responses` and calls `track_server_items(response)` to mark them as already sent to the server.
**Impact**: Resolves message duplication when resuming interrupted streaming runs, ensuring proper conversation threading with server-side sessions.

Fixes code review feedback from PR openai#2021

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Thanks for sending this patch! I currently don't have the bandwidth to check this in depth, but one thing I wanted to mention: while implementing the sessions feature in the openai-agents-js project, I found that the runner internals need to take various HITL patterns into consideration. It may not be necessary to make those changes in this Python SDK, but sufficient testing of the sessions scenarios is worth doing.
Happy to contribute! I added a couple of examples using SQLiteSession and OpenAIConversationsSession and made sure they work:

```
OPENAI_API_KEY="your_api_key_here" uv run python examples/memory/memory_session_hitl_example.py
=== Memory Session + HITL Example ===
Session id: :memory:
Enter a message to chat with the agent. Submit an empty line to exit.
The agent will ask for approval before using tools.
You: What cities does the Bay Bridge connect?
Assistant: The Bay Bridge connects San Francisco and Oakland in California.
You: What's the weather in those cities?
Agent HITL Assistant wants to call 'get_weather' with {"location":"San Francisco, CA"}. Approve? (y/n): y
Approved tool call.
Agent HITL Assistant wants to call 'get_weather' with {"location":"Oakland, CA"}. Approve? (y/n): y
Approved tool call.
Assistant: San Francisco is currently foggy with a temperature of 58°F. Oakland is sunny with a temperature of 72°F.
You:
```

```
OPENAI_API_KEY="your_api_key_here" uv run python examples/memory/openai_session_hitl_example.py
=== OpenAI Session + HITL Example ===
Enter a message to chat with the agent. Submit an empty line to exit.
The agent will ask for approval before using tools.
You: What cities does the Bay Bridge connect?
Assistant: The Bay Bridge, officially known as the San Francisco–Oakland Bay Bridge, connects the cities of **San Francisco** and **Oakland** in California.
You: What's the weather in those cities?
Agent HITL Assistant wants to call 'get_weather' with {"location":"San Francisco, CA"}. Approve? (y/n): y
Approved tool call.
Agent HITL Assistant wants to call 'get_weather' with {"location":"Oakland, CA"}. Approve? (y/n): y
Approved tool call.
Assistant: San Francisco is currently foggy and 58°F, while Oakland is sunny and 72°F.
You:
```

I'm hoping that just about covers everything, but let me know if there are other areas I should make sure to address.
Thanks @seratch for adding it to the 0.6.x milestone! I'll make sure to address any feedback.
Just a heads up: I noticed some discrepancies in the serialized RunState when interrupting in Python and resuming in TypeScript, and vice versa. I'm addressing those this weekend and will push the update.
@codex Can you review the code changes in this PR with fresh eyes? Please focus on potential issues for existing apps and edge-case scenarios rather than general feedback.
💡 Codex Review
Here are some automated review suggestions for this pull request.
src/agents/run.py (outdated):

```python
await AgentRunner._save_result_to_session(session, starting_input, [])
```
Avoid duplicating session history when resuming streaming runs
When Runner.run_streamed is resumed from a RunState, _start_streaming still calls _prepare_input_with_session and _save_result_to_session unconditionally (streamed_result.input is recomputed and the original prompt is saved again). In the non‑streaming path these calls are skipped when a RunState is supplied so that the session history isn’t duplicated. Re‑preparing and saving here means a resumed streaming run writes the same user message back into the session and the next turn will see duplicate user prompts, breaking conversations for apps that rely on Session or session_input_callback. Consider gating these calls with run_state is None as the synchronous path does.
👍 Fixed in 466c306
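The gating Codex suggested can be sketched as follows. The session list and helper function here are stand-ins; the real SDK methods in question are `_prepare_input_with_session` and `_save_result_to_session`.

```python
def save_initial_input(session_history: list, starting_input: str,
                       run_state) -> None:
    # Persist the original prompt only on a fresh run; a resumed run
    # already has it in the session history, so saving it again would
    # duplicate the user message on the next turn.
    if run_state is None:
        session_history.append(starting_input)

history: list = []
save_initial_input(history, "hello", run_state=None)      # fresh run: saved
save_initial_input(history, "hello", run_state=object())  # resume: skipped
print(history)  # ['hello']
```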
…gent usage and saving tool outputs to session
…fresh_run_exercises_persistence
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
# Create a RunState from the current result
original_input_for_state = getattr(self, "_original_input", None)
state = RunState(
    context=self.context_wrapper,
    original_input=original_input_for_state
    if original_input_for_state is not None
    else self.input,
    starting_agent=self.last_agent,
    max_turns=self.max_turns,
)

# Populate the state with data from the result
state._generated_items = self.new_items
state._model_responses = self.raw_responses
state._input_guardrail_results = self.input_guardrail_results
state._output_guardrail_results = self.output_guardrail_results
state._last_processed_response = self._last_processed_response
state._current_turn_persisted_item_count = self._current_turn_persisted_item_count
state.set_tool_use_tracker_snapshot(self._tool_use_tracker_snapshot)
```
Preserve turn count when serializing RunResult state
When converting a non‑streaming RunResult into a RunState, the turn counter is never copied into the state. The new to_state() builds a fresh RunState and populates items and responses, but leaves _current_turn at its default zero. The resume path in run.py initializes current_turn from run_state._current_turn, so resuming after an interruption always starts from turn 0, effectively resetting max‑turns enforcement (e.g., max_turns=1 can be exceeded by pausing after turn 1 and resuming) and misreporting how many turns have already run. The serialized state should carry forward the current turn so resumes honor the original limit.
Fixed in d93d046
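A minimal sketch of the round trip, using a stub in place of the real `RunState`, shows why the counter must be copied when converting a result to a state:

```python
from dataclasses import dataclass

@dataclass
class StubState:
    _current_turn: int = 0  # defaults to zero unless explicitly copied

def to_state(completed_turns: int) -> StubState:
    state = StubState()
    # The fix: carry the completed-turn count into the serialized state
    # so a resumed run keeps counting against the original max_turns.
    state._current_turn = completed_turns
    return state

state = to_state(completed_turns=1)
max_turns = 1
# Without the copy, a resume would restart at turn 0 and run again,
# exceeding max_turns=1; with it, the limit is enforced on resume.
print(state._current_turn >= max_turns)  # True
```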
…Result to RunState
…reserved during state conversion
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
(
    len(run_state._generated_items)
    if run_state._generated_items
    else 0
    if run_state._current_turn_persisted_item_count == 0
```
Preserve persisted-item counter when resuming streamed runs
When constructing RunResultStreaming from a RunState, _current_turn_persisted_item_count is always reset to len(run_state._generated_items) whenever any generated items exist (len(run_state._generated_items) if run_state._generated_items else ...). That overrides the persisted count saved in the state, even for Python-to-Python resumes where _current_turn_persisted_item_count was already set to the number of items actually written before the interruption. If a turn was interrupted mid-persistence (e.g., some outputs not yet saved to the session), resuming will incorrectly assume everything was persisted and skip saving those unsaved items, causing missing history in sessions. The fallback to len() should only apply when the stored counter is zero for cross-language states, not unconditionally whenever generated_items is non-empty.
Fixed in 3891cec
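The corrected fallback can be expressed as a small pure function. The names follow the discussion above; this is a sketch of the intended behavior, not the SDK code itself.

```python
def restore_persisted_count(stored_count: int, generated_items: list) -> int:
    # Trust a non-zero stored counter: it records how many items were
    # actually written to the session before the interruption, so any
    # remaining items still get persisted on resume.
    if stored_count:
        return stored_count
    # Cross-language states may serialize the counter as zero; only then
    # fall back to assuming all generated items were already persisted.
    return len(generated_items)

items = ["a", "b", "c"]
print(restore_persisted_count(2, items))  # 2: one item still unsaved
print(restore_persisted_count(0, items))  # 3: cross-language fallback
```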
…rsisted item count during streamed run resumption
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
src/agents/run_state.py (outdated):

```python
call_id = cast(Optional[str], converted.get("call_id") or converted.get("callId"))

converted["type"] = "function_call_result"

if not converted.get("name"):
    converted["name"] = self._lookup_function_name(call_id or "")
```
Preserve tool output types during run state serialization
When serializing a run state, _convert_output_item_to_protocol unconditionally overwrites every tool output’s type with function_call_result. On restore, _deserialize_items dispatches on this type to choose between FunctionCallOutput, ComputerCallOutput, or LocalShellCallOutput, so computer/shell/apply_patch outputs that were originally computer_call_output/local_shell_call_output are rehydrated as function_call_output (or fail validation), losing the tool-specific payload and breaking resumption for those tools. The serializer should only rewrite function-call outputs or preserve non-function output types.
Fixed in c218592
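One way to express the guarded rewrite is shown below. This is a sketch: the actual serializer in `run_state.py` handles more fields, and whether a missing type should be rewritten is an assumption here.

```python
def convert_output_item(item: dict) -> dict:
    converted = dict(item)
    # Only rewrite plain function-call outputs to the protocol's
    # "function_call_result"; computer/shell outputs keep their original
    # type so deserialization rehydrates the right output class.
    if converted.get("type") in (None, "function_call_output"):
        converted["type"] = "function_call_result"
    return converted

print(convert_output_item({"type": "function_call_output"})["type"])
# function_call_result
print(convert_output_item({"type": "computer_call_output"})["type"])
# computer_call_output
```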
@codex review
Codex Review: Didn't find any major issues. What shall we delve into next?
Resolves #636.
See #636 (comment).