Commit 65b08a4

lievan authored
chore(llmobs): span linking for oai agents sdk (#13072)
Add span linking between tool and LLM spans for the OpenAI Agents SDK.

We use the core dispatch API because span linking requires cross-integration communication when someone selects "chat completions" as the LLM API to use for the Agents SDK.

Signals are dispatched:
- when LLM spans finish (chat completions API) in the OpenAI integration
- when LLM spans finish (responses API) in the Agents SDK integration
- when tool calls/handoffs finish in the Agents SDK integration

`ToolCallTracker` in `ddtrace.llmobs._utils` contains the functions that handle these signals to add span links.

### Links created

**[LLM output -> tool input]** for the case where an LLM span chooses a tool and that tool is later executed via the Agents SDK. We do this by mapping the tool name and arguments to its tool id. When the tool call is triggered, we have access to its name and arguments. From there, we can look up its tool id and the LLM span that generated those arguments. We pop the tool name/arguments from the lookup dictionary after they are used.

**[Tool output -> LLM input]** for the case where a tool's output is fed back into a later LLM call, either in the same agent or another agent. We can tell this because the tool_id is present in the LLM's input messages. We then use this tool id to look up the tool span.

So the general lifecycle is:
1. An LLM chooses a tool. We save the tool id, tool name, and tool arguments and correlate them with the LLM span.
2. The tool is run.
   - We look at the arguments and name of the tool and use them to look up the LLM span that chose this tool. We then delete the name/arguments from the lookup dict.
   - We save the span/trace id of the tool and correlate it with the tool_id.
3. The tool output is used as input for an LLM span. We have access to the tool id here, and look up the span/trace id of the tool to link it to the LLM span.

#### A note on handoffs

Hand-offs are implemented as tool calls in the Agents SDK, so the span linking logic is largely the same.

Notes:
- There are no arguments for handoffs, so we use a dummy default lookup key for the [LLM output -> tool input] linking step.
- The tool_id representing a handoff may be continually used as input for an LLM call, since the list of messages is kept and added to across agent runs. However, it realistically should only be linked to the first LLM call of the agent being handed off to. Unlike other tool calls, a handoff is only an orchestration step and doesn't provide extra context actually "used" in downstream LLM generations.
- There are two brittle parts of hand-off linking that rely on implementation details internal to the Agents SDK:
  - We are re-constructing the raw tool name used for hand-offs: `handoff_tool_name = "transfer_to_{}".format("_".join(oai_span.to_agent.split(" ")).lower())`
  - We are using `{}` as the placeholder for the hand-off tool call argument. This is what's generated by the LLM when it chooses a handoff.

  We can improve on this by inferring these values when an LLM chooses a handoff tool, but this requires a bit more exploring.

## Checklist
- [ ] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [ ] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: lievan <[email protected]>
1 parent e0587dc commit 65b08a4
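
The lifecycle described in the commit message can be sketched with a small in-memory tracker. This is a minimal sketch, not the actual `ToolCallTracker` from `ddtrace.llmobs._utils` (its code is not shown on this page). The names `SketchToolCallTracker` and `FakeSpan`, and the shape of the link dictionaries, are assumptions made for illustration; only the handler names and the payload order follow the dispatch calls in the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a tracer span: ids plus a list of links."""

    trace_id: str
    span_id: str
    links: List[Dict[str, str]] = field(default_factory=list)


class SketchToolCallTracker:
    """Illustrative only: follows the three lifecycle steps from the commit message."""

    def __init__(self) -> None:
        # (tool name, raw arguments) -> (tool id, ids of the LLM span that chose the tool)
        self._choices: Dict[Tuple[str, str], Tuple[str, Dict[str, str]]] = {}
        # tool id -> ids of the tool span that executed the call
        self._tool_spans: Dict[str, Dict[str, str]] = {}

    def on_llm_tool_choice(self, tool_id: str, tool_name: str, tool_args: str, llm_ids: Dict[str, str]) -> None:
        # Step 1: remember which LLM span chose this tool with these arguments.
        self._choices[(tool_name, tool_args)] = (tool_id, llm_ids)

    def on_tool_call(self, tool_name: str, tool_args: str, tool_kind: str, tool_span: FakeSpan) -> None:
        # Step 2: pop the lookup entry so it is used once, then link LLM output -> tool input.
        # tool_kind ("function" or "handoff") is accepted but unused in this sketch.
        entry = self._choices.pop((tool_name, tool_args), None)
        if entry is None:
            return
        tool_id, llm_ids = entry
        tool_span.links.append({**llm_ids, "kind": "llm_output_to_tool_input"})
        self._tool_spans[tool_id] = {"trace_id": tool_span.trace_id, "span_id": tool_span.span_id}

    def on_tool_call_output_used(self, tool_id: str, llm_span: FakeSpan) -> None:
        # Step 3: the tool id appears in a later LLM input, so link tool output -> LLM input.
        tool_ids = self._tool_spans.get(tool_id)
        if tool_ids is not None:
            llm_span.links.append({**tool_ids, "kind": "tool_output_to_llm_input"})


tracker = SketchToolCallTracker()
tool_span = FakeSpan("t1", "s2")
later_llm_span = FakeSpan("t1", "s3")

tracker.on_llm_tool_choice("call_1", "get_weather", '{"city": "Paris"}', {"trace_id": "t1", "span_id": "s1"})
tracker.on_tool_call("get_weather", '{"city": "Paris"}', "function", tool_span)
tracker.on_tool_call_output_used("call_1", later_llm_span)
```

After these three calls, `tool_span.links` points back at the LLM span that chose the tool, and `later_llm_span.links` points at the tool span whose output it consumed.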

10 files changed, +672 -42 lines changed


ddtrace/llmobs/_constants.py

Lines changed: 8 additions & 0 deletions
@@ -69,3 +69,11 @@
 NAME = "_ml_obs.name"
 DECORATOR = "_ml_obs.decorator"
 INTEGRATION = "_ml_obs.integration"
+
+DISPATCH_ON_TOOL_CALL_OUTPUT_USED = "on_tool_call_output_used"
+DISPATCH_ON_LLM_TOOL_CHOICE = "on_llm_tool_choice"
+DISPATCH_ON_TOOL_CALL = "on_tool_call"
+
+# Tool call arguments are used to lookup the associated tool call info.
+# When there are no tool call args, we use this as a place-holder lookup key
+OAI_HANDOFF_TOOL_ARG = "{}"

ddtrace/llmobs/_integrations/openai.py

Lines changed: 26 additions & 4 deletions
@@ -5,8 +5,12 @@
 from typing import Optional
 from typing import Tuple

+from ddtrace.internal import core
 from ddtrace.internal.constants import COMPONENT
+from ddtrace.internal.utils.formats import format_trace_id
 from ddtrace.internal.utils.version import parse_version
+from ddtrace.llmobs._constants import DISPATCH_ON_LLM_TOOL_CHOICE
+from ddtrace.llmobs._constants import DISPATCH_ON_TOOL_CALL_OUTPUT_USED
 from ddtrace.llmobs._constants import INPUT_DOCUMENTS
 from ddtrace.llmobs._constants import INPUT_MESSAGES
 from ddtrace.llmobs._constants import INPUT_TOKENS_METRIC_KEY
@@ -158,6 +162,9 @@ def _llmobs_set_meta_tags_from_chat(span: Span, kwargs: Dict[str, Any], messages
         """Extract prompt/response tags from a chat completion and set them as temporary "_ml_obs.meta.*" tags."""
         input_messages = []
         for m in kwargs.get("messages", []):
+            tool_call_id = m.get("tool_call_id")
+            if tool_call_id:
+                core.dispatch(DISPATCH_ON_TOOL_CALL_OUTPUT_USED, (tool_call_id, span))
             input_messages.append({"content": str(_get_attr(m, "content", "")), "role": str(_get_attr(m, "role", ""))})
         parameters = {k: v for k, v in kwargs.items() if k not in ("model", "messages", "tools", "functions")}
         span._set_ctx_items({INPUT_MESSAGES: input_messages, METADATA: parameters})
@@ -199,13 +206,28 @@ def _llmobs_set_meta_tags_from_chat(span: Span, kwargs: Dict[str, Any], messages
                 continue
             tool_calls = _get_attr(choice_message, "tool_calls", []) or []
             for tool_call in tool_calls:
+                tool_args = getattr(tool_call.function, "arguments", "")
+                tool_name = getattr(tool_call.function, "name", "")
+                tool_id = getattr(tool_call, "id", "")
                 tool_call_info = {
-                    "name": getattr(tool_call.function, "name", ""),
-                    "arguments": json.loads(getattr(tool_call.function, "arguments", "")),
-                    "tool_id": getattr(tool_call, "id", ""),
-                    "type": getattr(tool_call, "type", ""),
+                    "name": tool_name,
+                    "arguments": json.loads(tool_args),
+                    "tool_id": tool_id,
+                    "type": "function",
                 }
                 tool_calls_info.append(tool_call_info)
+                core.dispatch(
+                    DISPATCH_ON_LLM_TOOL_CHOICE,
+                    (
+                        tool_id,
+                        tool_name,
+                        tool_args,
+                        {
+                            "trace_id": format_trace_id(span.trace_id),
+                            "span_id": str(span.span_id),
+                        },
+                    ),
+                )
             if tool_calls_info:
                 output_messages.append({"content": content, "role": role, "tool_calls": tool_calls_info})
                 continue
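
The new input loop reads `tool_call_id` with `m.get(...)`, which presumes each message is a dict, while the adjacent line reads `content` and `role` through `_get_attr`. The sketch below shows one way a lookup could tolerate both dict-style and attribute-style messages. `get_field` and `collect_tool_call_ids` are hypothetical helpers written for this example; they are not the library's `_get_attr`, whose implementation is not shown in this diff.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def get_field(obj: Any, key: str, default: Any = None) -> Any:
    """Hypothetical helper: read `key` from a dict or from an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)


def collect_tool_call_ids(messages: Iterable[Any]) -> List[str]:
    """Return the tool_call_id of every input message that carries one."""
    ids: List[str] = []
    for m in messages:
        tool_call_id = get_field(m, "tool_call_id")
        if tool_call_id:
            ids.append(tool_call_id)
    return ids


messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {"role": "tool", "tool_call_id": "call_1", "content": "sunny"},
    SimpleNamespace(role="tool", tool_call_id="call_2", content="18C"),
]
found = collect_tool_call_ids(messages)
```

In the integration, each id found this way would be the first element of the `DISPATCH_ON_TOOL_CALL_OUTPUT_USED` payload, with the current LLM span as the second.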

ddtrace/llmobs/_integrations/openai_agents.py

Lines changed: 34 additions & 2 deletions
@@ -6,15 +6,20 @@
 from typing import Union
 import weakref

+from ddtrace.internal import core
 from ddtrace.internal.logger import get_logger
 from ddtrace.internal.utils.formats import format_trace_id
+from ddtrace.llmobs._constants import DISPATCH_ON_LLM_TOOL_CHOICE
+from ddtrace.llmobs._constants import DISPATCH_ON_TOOL_CALL
+from ddtrace.llmobs._constants import DISPATCH_ON_TOOL_CALL_OUTPUT_USED
 from ddtrace.llmobs._constants import INPUT_MESSAGES
 from ddtrace.llmobs._constants import INPUT_VALUE
 from ddtrace.llmobs._constants import METADATA
 from ddtrace.llmobs._constants import METRICS
 from ddtrace.llmobs._constants import MODEL_NAME
 from ddtrace.llmobs._constants import MODEL_PROVIDER
 from ddtrace.llmobs._constants import NAME
+from ddtrace.llmobs._constants import OAI_HANDOFF_TOOL_ARG
 from ddtrace.llmobs._constants import OUTPUT_MESSAGES
 from ddtrace.llmobs._constants import OUTPUT_VALUE
 from ddtrace.llmobs._constants import PARENT_ID_KEY
@@ -201,11 +206,30 @@ def _llmobs_set_response_attributes(self, span: Span, oai_span: OaiSpanAdapter)
             span._set_ctx_item(MODEL_PROVIDER, "openai")

         if oai_span.input:
-            messages = oai_span.llmobs_input_messages()
+            messages, tool_call_ids = oai_span.llmobs_input_messages()
+
+            for tool_call_id in tool_call_ids:
+                core.dispatch(DISPATCH_ON_TOOL_CALL_OUTPUT_USED, (tool_call_id, span))
+
             span._set_ctx_item(INPUT_MESSAGES, messages)

         if oai_span.response and oai_span.response.output:
-            messages = oai_span.llmobs_output_messages()
+            messages, tool_call_outputs = oai_span.llmobs_output_messages()
+
+            for tool_id, tool_name, tool_args in tool_call_outputs:
+                core.dispatch(
+                    DISPATCH_ON_LLM_TOOL_CHOICE,
+                    (
+                        tool_id,
+                        tool_name,
+                        tool_args,
+                        {
+                            "trace_id": format_trace_id(span.trace_id),
+                            "span_id": str(span.span_id),
+                        },
+                    ),
+                )
+
             span._set_ctx_item(OUTPUT_MESSAGES, messages)

         metadata = oai_span.llmobs_metadata
@@ -219,12 +243,20 @@ def _llmobs_set_response_attributes(self, span: Span, oai_span: OaiSpanAdapter)
     def _llmobs_set_tool_attributes(self, span: Span, oai_span: OaiSpanAdapter) -> None:
         span._set_ctx_item(INPUT_VALUE, oai_span.input or "")
         span._set_ctx_item(OUTPUT_VALUE, oai_span.output or "")
+        core.dispatch(
+            DISPATCH_ON_TOOL_CALL,
+            (oai_span.name, oai_span.input, "function", span),
+        )

     def _llmobs_set_handoff_attributes(self, span: Span, oai_span: OaiSpanAdapter) -> None:
         handoff_tool_name = "transfer_to_{}".format("_".join(oai_span.to_agent.split(" ")).lower())
         span.name = handoff_tool_name
         span._set_ctx_item("input_value", oai_span.from_agent or "")
         span._set_ctx_item("output_value", oai_span.to_agent or "")
+        core.dispatch(
+            DISPATCH_ON_TOOL_CALL,
+            (handoff_tool_name, OAI_HANDOFF_TOOL_ARG, "handoff", span),
+        )

     def _llmobs_set_agent_attributes(self, span: Span, oai_span: OaiSpanAdapter) -> None:
         if oai_span.llmobs_metadata:
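
The handoff tool name is rebuilt from the target agent's display name, which the commit message flags as brittle. The sketch below isolates that one expression so its behavior on a few inputs is easy to see. The wrapper function `handoff_tool_name` and the agent names are made up for the example; whether the Agents SDK normalizes names the same way in every case is not confirmed by this diff.

```python
def handoff_tool_name(to_agent: str) -> str:
    """Same expression as in the diff: spaces become underscores, then lowercase."""
    return "transfer_to_{}".format("_".join(to_agent.split(" ")).lower())


simple = handoff_tool_name("Billing Agent")
# split(" ") keeps empty strings between consecutive spaces, so a double
# space turns into a double underscore.
double_space = handoff_tool_name("Billing  Agent")
single_word = handoff_tool_name("Triage")
```

Here `simple` is `"transfer_to_billing_agent"`, `double_space` is `"transfer_to_billing__agent"`, and `single_word` is `"transfer_to_triage"`. If the name generated on the LLM side differs from this reconstruction, the lookup key would not match and no link would be created.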

ddtrace/llmobs/_integrations/utils.py

Lines changed: 25 additions & 11 deletions
@@ -13,6 +13,7 @@

 from ddtrace.internal.logger import get_logger
 from ddtrace.llmobs._constants import INPUT_TOKENS_METRIC_KEY
+from ddtrace.llmobs._constants import OAI_HANDOFF_TOOL_ARG
 from ddtrace.llmobs._constants import OUTPUT_TOKENS_METRIC_KEY
 from ddtrace.llmobs._constants import TOTAL_TOKENS_METRIC_KEY
 from ddtrace.llmobs._utils import _get_attr
@@ -486,19 +487,22 @@ def get_error_data(self) -> Optional[Dict[str, Any]]:
             return None
         return self.error.get("data")

-    def llmobs_input_messages(self) -> List[Dict[str, Any]]:
+    def llmobs_input_messages(self) -> Tuple[List[Dict[str, Any]], List[str]]:
         """Returns processed input messages for LLM Obs LLM spans.

         Returns:
-            A list of processed messages
+            - A list of processed messages
+            - A list of tool call IDs for span linking purposes
         """
         messages = self.input
         processed: List[Dict[str, Any]] = []
+        tool_call_ids: List[str] = []
+
         if not messages:
-            return processed
+            return processed, tool_call_ids

         if isinstance(messages, str):
-            return [{"content": messages, "role": "user"}]
+            return [{"content": messages, "role": "user"}], tool_call_ids

         for item in messages:
             processed_item: Dict[str, Union[str, List[Dict[str, str]]]] = {}
@@ -541,6 +545,7 @@ def llmobs_input_messages(self) -> List[Dict[str, Any]]:
                         output = json.loads(output)
                     except json.JSONDecodeError:
                         output = {"output": output}
+                tool_call_ids.append(item["call_id"])
                 processed_item["tool_calls"] = [
                     {
                         "tool_id": item["call_id"],
@@ -550,21 +555,23 @@ def llmobs_input_messages(self) -> List[Dict[str, Any]]:
             if processed_item:
                 processed.append(processed_item)

-        return processed
+        return processed, tool_call_ids

-    def llmobs_output_messages(self) -> List[Dict[str, Any]]:
+    def llmobs_output_messages(self) -> Tuple[List[Dict[str, Any]], List[Tuple[str, str, str]]]:
         """Returns processed output messages for LLM Obs LLM spans.

         Returns:
-            A list of processed messages
+            - A list of processed messages
+            - A list of tool call data (name, id, args) for span linking purposes
         """
         if not self.response or not self.response.output:
-            return []
+            return [], []

         messages: List[Any] = self.response.output
         processed: List[Dict[str, Any]] = []
+        tool_call_outputs: List[Tuple[str, str, str]] = []
         if not messages:
-            return processed
+            return processed, tool_call_outputs

         if not isinstance(messages, list):
             messages = [messages]
@@ -581,6 +588,13 @@ def llmobs_output_messages(self) -> List[Dict[str, Any]]:
                 message.update({"role": getattr(item, "role", "assistant"), "content": text})
             # Handle tool calls
             elif hasattr(item, "call_id") and hasattr(item, "arguments"):
+                tool_call_outputs.append(
+                    (
+                        item.call_id,
+                        getattr(item, "name", ""),
+                        item.arguments if item.arguments else OAI_HANDOFF_TOOL_ARG,
+                    )
+                )
                 message.update(
                     {
                         "tool_calls": [
@@ -599,7 +613,7 @@ def llmobs_output_messages(self) -> List[Dict[str, Any]]:
                 message.update({"content": str(item)})
             processed.append(message)

-        return processed
+        return processed, tool_call_outputs

     def llmobs_trace_input(self) -> Optional[str]:
         """Converts Response span data to an input value for top level trace.
@@ -611,7 +625,7 @@ def llmobs_trace_input(self) -> Optional[str]:
             return None

         try:
-            messages = self.llmobs_input_messages()
+            messages, _ = self.llmobs_input_messages()
             if messages and len(messages) > 0:
                 return messages[0].get("content")
         except (AttributeError, IndexError):
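
Both adapter methods now return a pair, so every caller has to unpack two values. On the output side, each tuple is appended in the order `(call_id, name, arguments)`, although the docstring lists the fields as "(name, id, args)". The sketch below reproduces just that collection step on stand-in objects. `extract_tool_call_outputs`, `HANDOFF_PLACEHOLDER_ARG`, and the sample items are hypothetical; only the tuple order and the `"{}"` fallback follow the diff.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple

HANDOFF_PLACEHOLDER_ARG = "{}"  # same value as OAI_HANDOFF_TOOL_ARG in the diff


def extract_tool_call_outputs(items: List[Any]) -> List[Tuple[str, str, str]]:
    """Collect (call_id, name, arguments) for every item that looks like a tool call."""
    outputs: List[Tuple[str, str, str]] = []
    for item in items:
        if hasattr(item, "call_id") and hasattr(item, "arguments"):
            # Empty arguments (as with handoffs) fall back to the placeholder key.
            args = item.arguments if item.arguments else HANDOFF_PLACEHOLDER_ARG
            outputs.append((item.call_id, getattr(item, "name", ""), args))
    return outputs


items = [
    SimpleNamespace(call_id="call_1", name="get_weather", arguments='{"city": "Paris"}'),
    SimpleNamespace(call_id="call_2", name="transfer_to_billing_agent", arguments=""),
    SimpleNamespace(role="assistant", content="hello"),  # plain message, skipped
]
outputs = extract_tool_call_outputs(items)
```

The second tuple ends up with `"{}"` as its arguments, which is the same key the handoff handler dispatches later, so the two sides can meet in the lookup dictionary.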

ddtrace/llmobs/_llmobs.py

Lines changed: 14 additions & 0 deletions
@@ -36,6 +36,9 @@
 from ddtrace.llmobs._constants import AGENTLESS_BASE_URL
 from ddtrace.llmobs._constants import ANNOTATIONS_CONTEXT_ID
 from ddtrace.llmobs._constants import DECORATOR
+from ddtrace.llmobs._constants import DISPATCH_ON_LLM_TOOL_CHOICE
+from ddtrace.llmobs._constants import DISPATCH_ON_TOOL_CALL
+from ddtrace.llmobs._constants import DISPATCH_ON_TOOL_CALL_OUTPUT_USED
 from ddtrace.llmobs._constants import INPUT_DOCUMENTS
 from ddtrace.llmobs._constants import INPUT_MESSAGES
 from ddtrace.llmobs._constants import INPUT_PROMPT
@@ -61,6 +64,7 @@
 from ddtrace.llmobs._evaluators.runner import EvaluatorRunner
 from ddtrace.llmobs._utils import AnnotationContext
 from ddtrace.llmobs._utils import LinkTracker
+from ddtrace.llmobs._utils import ToolCallTracker
 from ddtrace.llmobs._utils import _get_ml_app
 from ddtrace.llmobs._utils import _get_session_id
 from ddtrace.llmobs._utils import _get_span_name
@@ -122,6 +126,8 @@ def __init__(self, tracer=None):
         self._annotations = []
         self._annotation_context_lock = forksafe.RLock()

+        self._tool_call_tracker = ToolCallTracker()
+
     def _on_span_start(self, span):
         if self.enabled and span.span_type == SpanTypes.LLM:
             self._activate_llmobs_span(span)
@@ -303,6 +309,10 @@ def _stop_service(self) -> None:
         core.reset_listeners("threading.submit", self._current_trace_context)
         core.reset_listeners("threading.execution", self._llmobs_context_provider.activate)

+        core.reset_listeners(DISPATCH_ON_LLM_TOOL_CHOICE, self._tool_call_tracker.on_llm_tool_choice)
+        core.reset_listeners(DISPATCH_ON_TOOL_CALL, self._tool_call_tracker.on_tool_call)
+        core.reset_listeners(DISPATCH_ON_TOOL_CALL_OUTPUT_USED, self._tool_call_tracker.on_tool_call_output_used)
+
         forksafe.unregister(self._child_after_fork)

     @classmethod
@@ -399,6 +409,10 @@ def enable(
         core.on("threading.submit", cls._instance._current_trace_context, "llmobs_ctx")
         core.on("threading.execution", cls._instance._llmobs_context_provider.activate)

+        core.on(DISPATCH_ON_LLM_TOOL_CHOICE, cls._instance._tool_call_tracker.on_llm_tool_choice)
+        core.on(DISPATCH_ON_TOOL_CALL, cls._instance._tool_call_tracker.on_tool_call)
+        core.on(DISPATCH_ON_TOOL_CALL_OUTPUT_USED, cls._instance._tool_call_tracker.on_tool_call_output_used)
+
         atexit.register(cls.disable)
         telemetry_writer.product_activated(TELEMETRY_APM_PRODUCT.LLMOBS, True)
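
The enable/stop pair above registers the tracker's handlers when LLM Observability is enabled and removes them when the service stops. The pattern can be sketched with a tiny event hub. `MiniHub` is a hypothetical stand-in written for this example and is not ddtrace's `core` module; it only imitates the three calls that appear in the diff (`on`, `dispatch` with a tuple of arguments, and `reset_listeners` with an event name and a callback).

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple


class MiniHub:
    """Hypothetical event hub that imitates the on/dispatch/reset_listeners calls."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event_id: str, callback: Callable[..., None]) -> None:
        self._listeners[event_id].append(callback)

    def reset_listeners(self, event_id: str, callback: Callable[..., None]) -> None:
        # Remove only the given callback and leave other listeners in place.
        if callback in self._listeners[event_id]:
            self._listeners[event_id].remove(callback)

    def dispatch(self, event_id: str, args: Tuple[Any, ...]) -> None:
        # Iterate over a copy so a handler may unregister itself safely.
        for callback in list(self._listeners[event_id]):
            callback(*args)


seen: List[Tuple[Any, ...]] = []


def on_tool_call(tool_name: str, tool_args: str, tool_kind: str, span: Any) -> None:
    seen.append((tool_name, tool_args, tool_kind))


hub = MiniHub()
hub.on("on_tool_call", on_tool_call)                       # enable
hub.dispatch("on_tool_call", ("get_weather", "{}", "function", None))
hub.reset_listeners("on_tool_call", on_tool_call)          # stop
hub.dispatch("on_tool_call", ("get_weather", "{}", "function", None))
```

Only the first dispatch reaches the handler. Decoupling the integrations through events like this is what lets the OpenAI integration emit signals without importing anything from the Agents SDK integration.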
