examples: add multi-turn tool-call loop for the Responses API#3439
akrishnash wants to merge 2 commits into
Every existing example in examples/responses/ shows a single turn —
the model generates a function_call, and the example stops there.
None show what to do next: execute the tool, feed the result back,
and loop until the model produces a final text answer.
This example fills that gap with a minimal, self-contained agent loop:
- Two local tools: get_weather() and calculate()
- Uses previous_response_id to carry conversation state across turns
instead of manually reconstructing the input list each round
- Guards against unbounded loops with MAX_TURNS
- Prints tool invocations so the flow is easy to follow
The complete pattern is:
send message
→ model returns function_call items
→ execute tools locally
→ pass function_call_output items + previous_response_id
→ repeat until model returns plain text
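The pattern above can be sketched as a small loop. This is a minimal sketch, not the file added by this PR: `run_tool_loop`, `handlers`, and the default `max_turns` are hypothetical names, and `client` is assumed to behave like an `openai.OpenAI()` instance whose `responses.create` accepts `model`, `input`, `tools`, and `previous_response_id`.

```python
import json
from typing import Any, Callable


def run_tool_loop(
    client: Any,
    model: str,
    user_message: str,
    tools: list[dict],
    handlers: dict[str, Callable[..., str]],
    max_turns: int = 8,
) -> str:
    """Drive a Responses API conversation until the model stops calling tools.

    `client` is assumed to look like openai.OpenAI(); `handlers` is a
    hypothetical mapping from tool name to the local function implementing it.
    """
    next_input: Any = user_message
    previous_id = None

    for _ in range(max_turns):
        kwargs = {"model": model, "input": next_input, "tools": tools}
        if previous_id is not None:
            # State lives server-side, so only the new items are sent each turn.
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        previous_id = response.id

        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            # No tool requests left: this is the final text answer.
            return response.output_text

        next_input = []
        for call in calls:
            args = json.loads(call.arguments)  # arguments arrive as a JSON string
            print(f"tool call: {call.name}({args})")
            try:
                result = handlers[call.name](**args)
            except Exception as exc:
                # Report the failure to the model instead of aborting the loop.
                result = f"error: {exc}"
            next_input.append(
                {
                    "type": "function_call_output",
                    "call_id": call.call_id,
                    "output": str(result),
                }
            )

    raise RuntimeError(f"no final answer after {max_turns} turns")
```

Taking `client` as a parameter keeps the loop testable with a stub; in real use it would be called with `OpenAI()` and the tool schemas for `get_weather` and `calculate`.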
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b7ed73c02
    def calculate(expression: str) -> str:
        """Safely evaluate a Python arithmetic expression (no builtins, math module available)."""
        try:
            result = eval(expression, {"__builtins__": {}}, vars(math))  # noqa: S307
Replace unsafe eval in calculate
If this example is reused with arbitrary user prompts, the model controls expression, so this eval can run non-arithmetic Python expressions despite the empty __builtins__ sandbox; for example, dunder introspection or resource-exhausting expressions can execute on the host before the error handler returns. Since the docstring presents this as safe, use an AST/operator whitelist or a small arithmetic parser instead of evaluating model-supplied text.
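To make the concern concrete, here is a small illustration (my own, not taken from the PR) of why an empty `__builtins__` is not a sandbox: attribute access needs no builtins, so an expression can still walk from a literal up to `object` and enumerate loaded classes.

```python
# Same sandbox shape as the flagged line: no builtins, an empty local namespace.
sandbox_globals = {"__builtins__": {}}

# Attribute access does not go through builtins, so dunder introspection works:
# tuple -> object -> every class currently derived from object.
leaked = eval("().__class__.__base__.__subclasses__()", sandbox_globals, {})  # noqa: S307

print(type(leaked).__name__, len(leaked) > 0)
```

Nothing harmful happens here, but the expression is clearly not arithmetic and it still runs.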
Good catch — fixed in 82173e8. Replaced eval() with a whitelisted AST walk in calculate: only arithmetic operators (+ - * / // % **) and a small math subset (sqrt, floor, ceil, abs, pi, e, tau) are permitted; everything else is rejected rather than executed. Attribute access ((1).__class__), out-of-whitelist calls (__import__(...)), and resource bombs (9**9**9, via an exponent cap) now return a structured error. Verified the example's own use case still works (7 * 24 * 3600 → 604800).
Codex review flagged that eval(), even with empty __builtins__, can still run dunder introspection or resource-exhausting expressions on the host, since the model controls the expression string. Walk the AST and permit only arithmetic operators plus a small math subset, and cap exponents. This rejects attribute access, function calls outside the whitelist, and huge powers.
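For readers who want to see the shape of such a fix, the following is a minimal sketch of a whitelisted AST walk, written from the description above rather than copied from commit 82173e8. The helper name `_eval_node`, the error-string format, and the cap value `_MAX_EXPONENT = 1000` are assumptions; the PR does not state the actual cap.

```python
import ast
import math
import operator

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "floor": math.floor, "ceil": math.ceil, "abs": abs}
_CONSTS = {"pi": math.pi, "e": math.e, "tau": math.tau}
_MAX_EXPONENT = 1000  # assumed cap; the real value in the PR is not shown


def _eval_node(node: ast.AST):
    """Evaluate one AST node, raising ValueError for anything off the whitelist."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BIN_OPS[type(node.op)](left, right)
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in _FUNCS
        and not node.keywords
    ):
        return _FUNCS[node.func.id](*[_eval_node(arg) for arg in node.args])
    # Attribute access, subscripts, lambdas, other calls, strings, etc. land here.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


def calculate(expression: str) -> str:
    """Evaluate an arithmetic expression by walking a whitelisted AST."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except Exception as exc:
        # Return the failure as text so the model can see it and recover.
        return f"error: {exc}"
```

With this sketch, `calculate("7 * 24 * 3600")` returns `"604800"`, while `(1).__class__`, `__import__('os')`, and `9**9**9` all come back as error strings. Note that the cap applies to each power separately, so deeply nested powers with small exponents can still produce large integers; a stricter version might also bound the base or the result size.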
This example is quite useful because it shows the full tool-call loop end to end. A small note about where to place per-turn timeout or retry logic would make the pattern even safer to copy into production code.
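One possible placement, sketched under assumptions: wrap each per-turn request in a small retry helper so a transient failure does not discard the turns already completed. `call_with_retry` and its parameters are hypothetical and not part of this PR; the exceptions listed in `retry_on` are placeholders for whatever the client actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the listed exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff with jitter so concurrent retries spread out.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise ValueError("attempts must be at least 1")
```

Inside the loop this would wrap only the request for the current turn, for example `call_with_retry(lambda: client.responses.create(...))`, leaving tool execution outside the retry so side effects are not repeated. The `openai` Python client also accepts `timeout` and `max_retries` when it is constructed, which may be sufficient on its own for simple cases.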
Summary
Every existing example in examples/responses/ stops after the first model turn: they show how a function_call is generated but not what happens next. The complete agent pattern (execute tool locally, pass result back, repeat until final text answer) is not demonstrated anywhere.
This PR adds examples/responses/tool_call_loop.py to fill that gap.
What the example shows
How to run
OPENAI_API_KEY=your-key python examples/responses/tool_call_loop.py
The example asks about weather in two cities and a seconds-in-N-weeks calculation, exercising parallel tool calls across two turns before producing a final text answer.