chore(llmobs): span linking for oai agents sdk #13072

Merged
merged 7 commits into from
Apr 8, 2025

Conversation


@lievan lievan commented Apr 4, 2025

Add span linking between tool & llm spans for the openai agents sdk.

We use the core dispatch api since span linking requires cross-integration communication in the case where someone selects "chat completions" as the llm api to use for the agents sdk.

Signals are dispatched

  • when LLM spans finish (chat completions api) in the oai integration
  • when LLM spans finish (responses api) in the agents sdk integration
  • when tool calls/handoffs finish in the agents sdk integration

ToolCallTracker in ddtrace.llmobs._utils contains the functions that handle these signals to add span links.
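
As a rough illustration of the signal pattern described above, here is a minimal, self-contained sketch of a publish/subscribe registry. It is not ddtrace's core dispatch API: the `on` and `dispatch` helpers, the event name `llm.span.finished`, and the payload shape are all hypothetical and exist only to show how one integration can emit an event that a handler registered elsewhere consumes.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# Hypothetical stand-in for a dispatch registry; NOT ddtrace's actual core API.
_listeners: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)


def on(event: str, handler: Callable[..., None]) -> None:
    """Register a handler to be called whenever `event` is dispatched."""
    _listeners[event].append(handler)


def dispatch(event: str, *args: Any) -> None:
    """Call every handler registered for `event`, in registration order."""
    for handler in _listeners[event]:
        handler(*args)


# A tracker-like consumer subscribes once...
received = []
on("llm.span.finished", lambda span_id, tool_calls: received.append((span_id, tool_calls)))

# ...and any integration can emit the event when its LLM span finishes.
dispatch(
    "llm.span.finished",
    "llm-1",
    [{"id": "call_1", "name": "get_weather", "arguments": '{"city": "Paris"}'}],
)
```

The point of the indirection is that the emitting integration does not need to import or know about the consumer, which is what makes cross-integration linking possible.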

Links created

[LLM output -> tool input] for the case where an LLM span chooses a tool and that tool is later executed via the agents sdk. We do this by mapping the tool name & arguments to its tool id. When the tool call is triggered, we have access to its name and arguments. From there, we can look up its tool id and the LLM span that generated those arguments. We pop the tool name/arg from the lookup dictionary after it's used.

[Tool output -> LLM input] for the case where a tool's output is fed back into a later LLM call, either in the same agent or another agent. We can tell this is the case when the tool_id is present in the LLM's input messages. We then use this tool id to look up the tool span.

So the general lifecycle is:

  1. An LLM chooses a tool. We save the tool id, tool name, and tool arguments and correlate them with the LLM span.
  2. The tool is run.
    • We look at the argument and name of the tool and use them to look up the LLM span that chose this tool. We then delete the name/arg from the lookup dict.
    • We save the span/trace id of the tool and correlate it with the tool_id
  3. The tool output is used as input for an LLM span. We have access to the tool id here, and look up the span/trace id of the tool to link it to the LLM span.
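
The lifecycle above can be sketched as a small in-memory tracker. This is a simplified illustration under stated assumptions, not the actual ToolCallTracker: the class name `ToolLinkTracker`, the `SpanRef` and `SpanLink` types, the method names, and the link `kind` strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpanRef:
    """Hypothetical minimal span identity (span id + trace id)."""
    span_id: str
    trace_id: str


@dataclass(frozen=True)
class SpanLink:
    """Hypothetical link record: `from_span` produced what `to_span` consumed."""
    from_span: SpanRef
    to_span: SpanRef
    kind: str


@dataclass
class ToolLinkTracker:
    # (tool name, raw arguments) -> (tool_id, LLM span that chose the tool)
    pending: Dict[Tuple[str, str], Tuple[str, SpanRef]] = field(default_factory=dict)
    # tool_id -> span of the tool execution
    tool_spans: Dict[str, SpanRef] = field(default_factory=dict)
    links: List[SpanLink] = field(default_factory=list)

    def on_llm_chose_tool(self, llm_span: SpanRef, tool_id: str, name: str, arguments: str) -> None:
        # Step 1: remember which LLM span picked this tool call.
        self.pending[(name, arguments)] = (tool_id, llm_span)

    def on_tool_run(self, tool_span: SpanRef, name: str, arguments: str) -> Optional[str]:
        # Step 2: pop the entry so a later identical call cannot reuse it.
        entry = self.pending.pop((name, arguments), None)
        if entry is None:
            return None
        tool_id, llm_span = entry
        self.links.append(SpanLink(llm_span, tool_span, "llm_output_to_tool_input"))
        self.tool_spans[tool_id] = tool_span
        return tool_id

    def on_llm_input(self, llm_span: SpanRef, tool_ids: List[str]) -> None:
        # Step 3: any known tool_id in the LLM's input messages yields a link.
        for tool_id in tool_ids:
            tool_span = self.tool_spans.get(tool_id)
            if tool_span is not None:
                self.links.append(SpanLink(tool_span, llm_span, "tool_output_to_llm_input"))


tracker = ToolLinkTracker()
llm_a = SpanRef("llm-a", "trace-1")
tool = SpanRef("tool-1", "trace-1")
llm_b = SpanRef("llm-b", "trace-1")

tracker.on_llm_chose_tool(llm_a, "call_1", "get_weather", '{"city": "Paris"}')
matched_id = tracker.on_tool_run(tool, "get_weather", '{"city": "Paris"}')
tracker.on_llm_input(llm_b, ["call_1"])
```

Keying the first lookup on name and arguments rather than tool id reflects the constraint described above: at tool execution time only the name and arguments are available, so the tool id has to be recovered from them.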

A note on handoffs

Hand-offs are implemented as tool calls in the agents SDK, so the span linking logic is largely the same. A few notes:

  • there are no arguments for handoffs, so we use a dummy default lookup key for the [LLM output -> tool input] linking step
  • the tool_id representing a handoff may be continually used as input for an LLM call since the list of messages is kept and added to across agent runs. However, it realistically should only be linked to the first LLM call of the agent being handed off to. Unlike other tool calls, a handoff is only an orchestration step and it doesn't provide extra context actually "used" in downstream LLM generations
  • There are two brittle parts of hand-off linking that rely on some implementation details internal to the agents sdk
    • We are re-constructing the raw tool name used for hand-offs
      handoff_tool_name = "transfer_to_{}".format("_".join(oai_span.to_agent.split(" ")).lower())
    • We are using {} as the placeholder for the hand-off tool call argument. This is what's generated by the LLM when it chooses a handoff.
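
To make the two brittle assumptions concrete, here is a small sketch that wraps the name reconstruction quoted above in a function and pairs it with the `{}` placeholder to form a lookup key. The function name `handoff_tool_name`, the constant `HANDOFF_ARGUMENTS_PLACEHOLDER`, and the agent name "Spanish Agent" are hypothetical; only the formatting expression itself comes from the description.

```python
# Placeholder arguments string assumed for hand-off tool calls (per the note above).
HANDOFF_ARGUMENTS_PLACEHOLDER = "{}"


def handoff_tool_name(to_agent: str) -> str:
    """Rebuild the raw hand-off tool name from the target agent's display name."""
    return "transfer_to_{}".format("_".join(to_agent.split(" ")).lower())


# The (name, arguments) pair used as the lookup key for a hand-off.
lookup_key = (handoff_tool_name("Spanish Agent"), HANDOFF_ARGUMENTS_PLACEHOLDER)
print(lookup_key)  # ('transfer_to_spanish_agent', '{}')
```

Because this mirrors a naming scheme internal to the agents SDK, any change to that scheme would silently break the lookup, which is why inferring the values from the LLM's own tool choice would be more robust.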

We can improve on this by inferring these values when an LLM chooses a handoff tool, but this requires a bit more exploring

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@lievan lievan marked this pull request as ready for review April 4, 2025 14:49
@lievan lievan requested review from a team as code owners April 4, 2025 14:49
@lievan lievan requested review from erikayasuda and quinna-h April 4, 2025 14:49

github-actions bot commented Apr 4, 2025

CODEOWNERS have been resolved as:

tests/contrib/openai_agents/cassettes/test_multiple_agent_handoffs_with_chat_completions.yaml  @DataDog/apm-core-python @DataDog/apm-idm-python
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_integrations/openai.py                                  @DataDog/ml-observability
ddtrace/llmobs/_integrations/openai_agents.py                           @DataDog/ml-observability
ddtrace/llmobs/_integrations/utils.py                                   @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
tests/contrib/openai_agents/conftest.py                                 @DataDog/apm-core-python @DataDog/apm-idm-python
tests/contrib/openai_agents/test_openai_agents_llmobs.py                @DataDog/apm-core-python @DataDog/apm-idm-python
tests/llmobs/_utils.py                                                  @DataDog/ml-observability

@lievan lievan changed the title chore(llmobs): span linking for oai agents sdk chore(llmobs): span linking for oai agents sdk Apr 4, 2025

github-actions bot commented Apr 4, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 228 ± 2 ms.

The average import time from base is: 232 ± 4 ms.

The import time difference between this PR and base is: -3.9 ± 0.1 ms.

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 2.103 ms (0.92%)
ddtrace.bootstrap.sitecustomize 1.434 ms (0.63%)
ddtrace.bootstrap.preload 1.434 ms (0.63%)
ddtrace.internal.products 1.434 ms (0.63%)
ddtrace.internal.remoteconfig.client 0.662 ms (0.29%)
ddtrace 0.669 ms (0.29%)


pr-commenter bot commented Apr 4, 2025

Benchmarks

Benchmark execution time: 2025-04-04 15:29:53

Comparing candidate commit 210c362 in PR branch evan.li/span-linking-agents with baseline commit 534fa86 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 498 metrics, 2 unstable metrics.

@lievan lievan changed the base branch from main to evan.li/oai-agents April 4, 2025 21:58

@sabrenner sabrenner left a comment


general logic lgtm, and the examples you provided look really nice! just a couple of small style nits and one question, will approve after it's answered!

@lievan lievan merged commit 65b08a4 into evan.li/oai-agents Apr 8, 2025
41 of 47 checks passed
@lievan lievan deleted the evan.li/span-linking-agents branch April 8, 2025 03:07