Skip to content

fix(openai): include responses instructions in captured prompt#1565

Open
D-Joey-G wants to merge 1 commit intolangfuse:mainfrom
D-Joey-G:fix/openai-responses-instructions
Open

fix(openai): include responses instructions in captured prompt#1565
D-Joey-G wants to merge 1 commit intolangfuse:mainfrom
D-Joey-G:fix/openai-responses-instructions

Conversation

@D-Joey-G
Copy link

@D-Joey-G D-Joey-G commented Mar 13, 2026

Corrects behaviour such that arguments passed to the instructions parameter of the OpenAI responses API are treated like 'role': 'system' arguments in the chat.completions API. (see langfuse/langfuse#9775 or langfuse/langfuse#10143 or langfuse/langfuse#8763)

Adds tests to validate behaviour.

Disclaimer: Experimental PR review

Greptile Summary

This PR fixes the OpenAI Responses API integration so that the instructions parameter is captured in the Langfuse prompt alongside input, mirroring how role: system messages are treated in the Chat Completions API. It adds the _extract_responses_prompt helper and updates the existing streaming integration test and adds new unit tests.

Key changes:

  • New _extract_responses_prompt function merges instructions + input into a unified [{role: system}, {role: user}] message list (or a dict fallback) before logging to Langfuse
  • The function correctly normalises OpenAI SDK NotGiven sentinel values to None
  • _get_langfuse_data_from_kwargs now delegates to _extract_responses_prompt instead of reading input directly for the Responses/AsyncResponses objects
  • New parametrised unit test file (test_openai_prompt_extraction.py) covers all documented input combinations
  • The streaming integration test is updated to assert the new merged prompt format

Issues noted:

  • When input is a list that already contains a {"role": "system", ...} entry and instructions is also provided, the captured Langfuse prompt will have two system messages — the new one prepended unconditionally. This could mislead prompt replay or evaluation tools.
  • No non-streaming integration test exercises the instructions + input path; only the streaming case is covered at the integration level.

Confidence Score: 4/5

  • This PR is safe to merge; the fix is correct for all common usage patterns and doesn't alter what is sent to OpenAI.
  • The core logic is sound and well-tested for all documented input shapes. The main concern is the edge case where a caller already includes a system message inside the input list — instructions will be prepended unconditionally, producing duplicate system entries in the captured Langfuse prompt. This doesn't affect the actual API call (only what Langfuse records) and is unlikely in practice, but it's a correctness issue worth addressing. A missing non-streaming integration test is a minor coverage gap. No regressions are expected for existing callers.
  • Pay close attention to langfuse/openai.py lines 271-272 — the unconditional prepend of the system message when input is a list.

Important Files Changed

Filename Overview
langfuse/openai.py Adds _extract_responses_prompt function that merges instructions and input into a unified prompt structure for the Responses API; logic is correct but has an untested edge case where a list input already containing a system message would get a duplicate prepended system entry.
tests/test_openai.py Updates test_response_api_streaming to assert the new merged [system, user] input format; no new non-streaming integration test covers the instructions + input path.
tests/test_openai_prompt_extraction.py New unit test file covering all documented cases for _extract_responses_prompt, including NOT_GIVEN sentinel values and string/list inputs; well-structured and complete for the happy paths.

Sequence Diagram

sequenceDiagram
    participant User
    participant LangfuseWrapper as Langfuse Wrapper (_wrap)
    participant ExtractPrompt as _extract_responses_prompt
    participant Langfuse
    participant OpenAI as OpenAI Responses API

    User->>LangfuseWrapper: responses.create(model, instructions, input, ...)
    LangfuseWrapper->>ExtractPrompt: kwargs (instructions, input)
    alt instructions is None
        ExtractPrompt-->>LangfuseWrapper: input as-is
    else input is str
        ExtractPrompt-->>LangfuseWrapper: [{role:system, content:instructions}, {role:user, content:input}]
    else input is list
        ExtractPrompt-->>LangfuseWrapper: [{role:system, content:instructions}, ...input]
    else input is None
        ExtractPrompt-->>LangfuseWrapper: {instructions: instructions}
    end
    LangfuseWrapper->>Langfuse: start_observation(input=merged_prompt)
    LangfuseWrapper->>OpenAI: original call (unchanged kwargs)
    OpenAI-->>LangfuseWrapper: response
    LangfuseWrapper->>Langfuse: update generation (output, usage, model)
    LangfuseWrapper-->>User: response
Loading

Last reviewed commit: 4c8353f

Greptile also left 1 inline comment on this PR.

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

@CLAassistant
Copy link

CLAassistant commented Mar 13, 2026

CLA assistant check
All committers have signed the CLA.

Comment on lines +271 to +272
if isinstance(input_value, list):
return [{"role": "system", "content": instructions}, *input_value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible duplicate system message in captured prompt

When input_value is a list, instructions is prepended unconditionally as a role: system entry. If the caller already included a {"role": "system", ...} message inside their input list (valid in the Responses API), the captured Langfuse prompt will contain two system-role entries, which can confuse prompt replay or evaluation flows.

Consider checking whether a system message is already present before prepending:

if isinstance(input_value, list):
    already_has_system = any(
        isinstance(m, dict) and m.get("role") == "system"
        for m in input_value
    )
    if already_has_system:
        return input_value
    return [{"role": "system", "content": instructions}, *input_value]

Alternatively, document clearly that instructions will always be surfaced as the first system message in the captured prompt regardless of existing list contents.

Copy link
Author

@D-Joey-G D-Joey-G Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple developer/system messages isn't uncommon, so that concern doesn't seem important.

I believe that it is safe/correct to put the instructions arg first in what we trace, as earlier docs for the Responses API said the parameter inserts a system (or developer) message as the first item in the model's context.

That said, the docs now say:

A system (or developer) message inserted into the model's context.

When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

That is admittedly less certain on that point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants