feat: add native OpenTelemetry instrumentation #6304

Open
lucasgomide wants to merge 5 commits into main from luzk/otel-instrumentation

@lucasgomide lucasgomide commented Jun 23, 2026

Open spans directly on the user's thread so that stdlib log records emitted during hot paths like Crew.kickoff, BaseTool.run, and LLM.call carry the active trace context and correlate with the spans they belong to — a gap the previous metrics-only telemetry could not close. Introduces a crewai.telemetry.otel module exposing operation and follows_from, instruments the execution hot paths, and propagates the active context across every parallel-dispatch site. Depends only on opentelemetry-api so provider and exporter choice stays with the host application per the standard OTel library pattern; without an installed SDK the ProxyTracer keeps everything as a NoOp.


Note

Medium Risk
Touches every major execution path (crew/task/agent/LLM/tools/flows) and changes telemetry global-provider behavior; mitigated by NoOp defaults without an SDK and broad new test coverage.

Overview
Adds native OpenTelemetry spans on the host’s global tracer via new crewai.telemetry.otel (operation, follows_from), wrapping crew kickoff, tasks, agents, flows (including resume/method execution), LLM calls, tools, memory, knowledge, A2A delegation, guardrails, and agent reasoning.

Anonymous CrewAI OTLP telemetry no longer calls set_tracer() — it keeps a private TracerProvider and emits only through _tracer(), so importing or running CrewAI does not take over the process-wide OTel provider. CLI/EventListener stop forcing global tracer installation.

Trace continuity fixes: the event bus re-attaches OTel context when dispatching async handlers via run_coroutine_threadsafe; structured tools run sync callables with contextvars.copy_context() in the executor so nested spans and log correlation survive thread hops.
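
The thread-hop part of this can be illustrated with the standard library alone. The following is a minimal sketch, not the PR's code: `current_trace_id` is a hypothetical stand-in for the context variable that holds the active span, and `run_in_worker` is an illustrative helper. It shows why submitting `contextvars.copy_context().run` to an executor lets the worker see the caller's context.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the context variable that carries the active span.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def read_trace_id():
    """Return whatever trace id is visible in the calling context."""
    return current_trace_id.get()


def run_in_worker(propagate):
    """Run read_trace_id on a pool thread, optionally inside a copied context."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        if propagate:
            # Snapshot the caller's context and run the callable inside it.
            ctx = contextvars.copy_context()
            return pool.submit(ctx.run, read_trace_id).result()
        # Plain submit: the worker thread does not see the caller's value
        # on standard CPython builds.
        return pool.submit(read_trace_id).result()


token = current_trace_id.set("trace-abc")
try:
    without_ctx = run_in_worker(propagate=False)
    with_ctx = run_in_worker(propagate=True)
finally:
    current_trace_id.reset(token)
```

Here `with_ctx` carries the caller's value, while `without_ctx` typically falls back to the default. The same mechanism is what keeps nested spans and log correlation attached when a sync callable runs in an executor.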

Flow HITL pauses can opt out of ERROR spans via expected_exceptions; resume opens a dedicated resume flow span (with docs for optional follows_from links). Expanded tests cover span nesting, log correlation, context propagation sites, and no-op behavior without an SDK provider.
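
The `expected_exceptions` behavior can be modeled without OpenTelemetry. The sketch below is a toy model under stated assumptions: `ToySpan`, `toy_operation`, and `PauseForFeedback` are hypothetical names, and the real `operation()` helper may differ in detail (for example in how it treats cancellation). It only shows the idea that an expected control-flow exception propagates without marking the span as an error.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ToySpan:
    """Hypothetical stand-in for a span; not an OpenTelemetry type."""
    name: str
    status: str = "UNSET"
    recorded: list = field(default_factory=list)


FINISHED = []  # collects spans as they end, like an in-memory exporter


@contextmanager
def toy_operation(name, expected_exceptions=()):
    span = ToySpan(name)
    try:
        yield span
    except expected_exceptions:
        # Expected control flow: propagate, but leave the status untouched.
        raise
    except Exception as exc:
        # Anything else is recorded and marks the span as an error.
        span.status = "ERROR"
        span.recorded.append(exc)
        raise
    finally:
        FINISHED.append(span)


class PauseForFeedback(Exception):
    """Hypothetical control-flow exception, in the spirit of a HITL pause."""


def run_and_get_status(expected):
    try:
        with toy_operation("execute flow", expected_exceptions=expected):
            raise PauseForFeedback("waiting for human input")
    except PauseForFeedback:
        pass
    return FINISHED[-1].status
```

With an empty `expected_exceptions` the pause shows up as `ERROR`; listing `PauseForFeedback` leaves the status at `UNSET`, which is the distinction the flow spans rely on.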

Reviewed by Cursor Bugbot for commit 99b87e8. Bugbot is set up for automated code reviews on this repo.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added OpenTelemetry instrumentation throughout the execution pipeline, including crew kickoff, task runs, agent reasoning, flow phases, LLM calls, tool usage, knowledge queries, and memory operations.
    • Improved trace context propagation across async and thread boundaries for consistent end-to-end observability.
  • Bug Fixes

    • Refined telemetry span behavior for expected/control-flow exceptions so they don’t incorrectly mark spans as errors.
  • Tests

    • Added OpenTelemetry test coverage for span creation, exception-to-status handling, follows-from links, log/trace correlation, and multi-thread/context propagation.
    • Added no-op telemetry tests to ensure behavior remains safe when no tracing provider is configured.

@corridor-security corridor-security Bot left a comment

Summary: This PR adds native OpenTelemetry tracing around crew, task, agent, tool, LLM, memory, knowledge, A2A, and flow execution, plus context propagation for trace continuity. No exploitable security vulnerabilities were identified in the added code.

Risk: Low risk. The changes introduce observability instrumentation and do not add public endpoints, authentication changes, authorization logic, filesystem access, SQL construction, or new untrusted network/file inputs.

Comment thread lib/crewai/src/crewai/flow/runtime/__init__.py
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

📝 Walkthrough

Adds native OpenTelemetry instrumentation to CrewAI by introducing operation() and follows_from() primitives in a new crewai.telemetry.otel module, then wrapping every major execution stage—crew, task, agent, flow, LLM providers, tools, memory, knowledge, guardrail, reasoning, and A2A delegation—in named spans, while propagating OTel context across thread/loop boundaries in the event bus and structured tool executor.

Changes

Native OpenTelemetry Instrumentation

| Layer | File(s) | Summary |
| --- | --- | --- |
| OTel core primitives: `operation()` and `follows_from()` | `lib/crewai/src/crewai/telemetry/otel.py`, `lib/crewai/src/crewai/telemetry/__init__.py` | Adds `_tracer()` late-resolving the global provider, `operation()` context manager with explicit exception handling and new `expected_exceptions` parameter to exempt control-flow exceptions from error recording, and `follows_from()` for causal span links. Updates `__all__` to export both helpers alongside `Telemetry`. |
| Cross-thread OTel context propagation | `lib/crewai/src/crewai/events/event_bus.py`, `lib/crewai/src/crewai/tools/structured_tool.py` | Introduces `_ctx_run_coro` to attach/detach OTel context around async coroutines scheduled onto the background loop thread; applies it to all four dispatch sites in `emit()` and `replay()`. Updates `ainvoke` to propagate contextvars into the executor thread via `ctx.run`. |
| Crew and Task execution spans | `lib/crewai/src/crewai/crew.py`, `lib/crewai/src/crewai/task.py` | Wraps `kickoff()`/`akickoff()` in `operation("execute crew", ...)` and `_execute_core()`/`_aexecute_core()` in `operation("execute task", ...)` with crew/task identity attributes. |
| Agent execution spans | `lib/crewai/src/crewai/agent/core.py` | Wraps `execute_task`, `aexecute_task`, `kickoff`, and `kickoff_async` in `operation("execute agent", {agent.role, agent.id})` spans. |
| Flow runtime spans | `lib/crewai/src/crewai/flow/runtime/__init__.py` | Wraps flow resume, kickoff, and per-method execution in "resume flow", "execute flow", and "execute flow method" operations, all with `expected_exceptions=(HumanFeedbackPending,)`. |
| LLM provider call spans | `lib/crewai/src/crewai/llm.py`, `lib/crewai/src/crewai/llms/providers/...` | Adds `operation("call llm", {"crewai.llm.model": ...})` alongside `llm_call_context()` for both sync and async call paths in the base LLM class and all five provider implementations (Anthropic, Azure, Bedrock, Gemini, OpenAI). |
| Tool, Knowledge, Memory, and A2A delegation spans | `lib/crewai/src/crewai/tools/base_tool.py`, `lib/crewai/src/crewai/knowledge/knowledge.py`, `lib/crewai/src/crewai/memory/unified_memory.py`, `lib/crewai/src/crewai/a2a/utils/delegation.py` | Wraps `BaseTool`/`Tool` run methods in "call tool" spans, `Knowledge.query`/`aquery` in "query knowledge" spans, `UnifiedMemory.remember`/`recall` in "remember/recall memory" spans, and `aexecute_a2a_delegation` in "a2a delegate" spans with failure event enrichment. |
| Guardrail and Reasoning spans | `lib/crewai/src/crewai/tasks/llm_guardrail.py`, `lib/crewai/src/crewai/utilities/reasoning_handler.py` | Wraps `LLMGuardrail.__call__` in `operation("guard llm", ...)` and the `AgentReasoning.handle_agent_reasoning` planning call in `operation("agent reason", ...)`. |
| OTel instrumentation integration tests | `lib/crewai/tests/telemetry/test_otel.py` | Validates span creation, attributes, error recording, `expected_exceptions` semantics, `follows_from` link construction, nested hot-path trace coherence with shared `trace_id`, log↔trace ID correlation, and context propagation across all cross-thread dispatch patterns. |
| OTel no-op behavior tests | `lib/crewai/tests/telemetry/test_otel_noop.py` | Verifies proxy tracer provider default, non-recording spans when no SDK is installed, and clean kickoff execution without raising or replacing the global provider. |

Suggested Reviewers

  • lorenzejay
  • greysonlalonde
  • joaomdmoura
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.06% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately describes the primary objective of this comprehensive changeset: adding native OpenTelemetry instrumentation throughout crewAI's execution hot paths. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

with operation("failing op"):
    raise RuntimeError("boom")

finished = span_exporter.get_finished_spans()
with operation("doubly recorded"):
    raise RuntimeError("once")

span = span_exporter.get_finished_spans()[0]
with operation("cancelled op"):
    raise asyncio.CancelledError("cancel")

span = span_exporter.get_finished_spans()[0]
Comment thread lib/crewai/tests/telemetry/test_otel.py Fixed
Comment thread lib/crewai/tests/telemetry/test_otel.py Fixed
@mintlify

mintlify Bot commented Jun 23, 2026

Preview deployment for your docs.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| crewai | 🟢 Ready | View Preview | Jun 23, 2026, 1:24 PM |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
lib/crewai/tests/telemetry/test_otel.py (1)

85-91: 📐 Maintainability & Code Quality | 🔵 Trivial

Isolate the private OpenTelemetry internals behind a versioned helper for clarity.

`trace._TRACER_PROVIDER_SET_ONCE._done` and `trace._TRACER_PROVIDER` are undocumented private symbols, not part of the public API, so relying on them can break across opentelemetry-api upgrades. The guard `assert actual is _SHARED_PROVIDER` will catch a hard break, but only as a confusing test failure rather than a clear signal. Wrap the access in a small helper with a comment documenting the supported opentelemetry-api version (currently ~1.34.0), and verify that these attributes exist at initialization time, so future maintainers understand the fragility and constraints.
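
One possible shape for such a helper is sketched below. It is an illustration only: `reset_global_tracer_provider` is a hypothetical name, the module is passed in as a parameter so the sketch runs without opentelemetry installed, and the assumption that resetting means clearing `_done` and setting `_TRACER_PROVIDER` to `None` should be checked against the pinned opentelemetry-api version.

```python
from types import SimpleNamespace

# Private names the test relies on; taken from the review comment above.
_REQUIRED_PRIVATE_ATTRS = ("_TRACER_PROVIDER_SET_ONCE", "_TRACER_PROVIDER")


def reset_global_tracer_provider(trace_module):
    """Reset the process-global tracer provider slot for test isolation.

    Relies on private opentelemetry-api internals (verified by the PR
    against ~1.34.0 according to the review comment). Fails loudly if
    the expected attributes are missing.
    """
    missing = [n for n in _REQUIRED_PRIVATE_ATTRS if not hasattr(trace_module, n)]
    if missing or not hasattr(trace_module._TRACER_PROVIDER_SET_ONCE, "_done"):
        raise RuntimeError(
            "opentelemetry-api private layout changed; "
            f"missing or incompatible attributes: {missing or ['_done']}"
        )
    # Assumed reset semantics: allow the provider to be set again and
    # clear the currently installed one.
    trace_module._TRACER_PROVIDER_SET_ONCE._done = False
    trace_module._TRACER_PROVIDER = None


# Stand-in module object so the sketch can be exercised without opentelemetry.
fake_trace = SimpleNamespace(
    _TRACER_PROVIDER_SET_ONCE=SimpleNamespace(_done=True),
    _TRACER_PROVIDER=object(),
)
reset_global_tracer_provider(fake_trace)
```

In a real test the argument would be the `opentelemetry.trace` module, and a layout change would surface as a `RuntimeError` naming the missing attribute instead of an unrelated assertion failure.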

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/tests/telemetry/test_otel.py` around lines 85 - 91, The test code
directly accesses undocumented private OpenTelemetry internals
(trace._TRACER_PROVIDER_SET_ONCE._done and trace._TRACER_PROVIDER) which are not
part of the public API and lack version stability guarantees. Create a small
helper function that encapsulates this private attribute access and verify these
attributes exist at initialization time, add a documentation comment specifying
the supported opentelemetry-api version constraint (currently ~1.34.0), and call
this helper from the test instead of directly manipulating the private symbols.
This way future maintainers will understand the version fragility and the
assertion guard will provide clearer context if something breaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/flow/runtime/__init__.py`:
- Around line 2493-2525: The `HumanFeedbackPending` exception escapes the
`operation("execute flow", ...)` context manager before an outer exception
handler catches it, so the span is marked as an error even though the exception
represents expected control flow. Inside the operation context, wrap the code
that calls `self._execute_start_method(start_method)` and
`asyncio.gather(*tasks)` in a try-except block that catches
`HumanFeedbackPending` separately, then re-raise it after the operation context
closes so the outer exception handler can process it without the span recording
it as an error.
- Around line 2852-2867: The issue is that when a sync method returns a
coroutine, the await is happening after the operation span exits, placing part
of the execution outside the tracing context. To fix this, after executing the
sync method in the else block (where asyncio.to_thread is used with ctx.run),
check if the result is a coroutine using asyncio.iscoroutine(result), and if so,
await it while still inside the operation context manager block. This ensures
the complete execution of coroutine-returning sync methods stays within the
"execute flow method" operation span.

In `@lib/crewai/src/crewai/llms/providers/anthropic/completion.py`:
- Around line 301-303: The instrumentation code added to the context manager at
the llm_call_context() call (around lines 301-303) and at line 378-380 is
logically correct but violates Ruff formatting rules, causing CI to fail. Run
the Ruff formatter on the file by executing the command `uv run ruff format
lib/crewai/src/crewai/llms/providers/anthropic/completion.py` to automatically
apply the required formatting changes, then commit and push the formatted output
to unblock CI.

In `@lib/crewai/src/crewai/llms/providers/openai/completion.py`:
- Around line 414-416: The ruff formatter is detecting formatting issues in the
completion.py file that are blocking CI. Run the command to automatically format
the file using ruff, which will resolve the formatting violations detected by
ruff format --check. Specifically, apply ruff formatting to the file containing
the llm_call_context and operation function calls around the "call llm"
operation context, then commit the formatted changes.

In `@lib/crewai/src/crewai/telemetry/otel.py`:
- Line 1: The file lib/crewai/src/crewai/telemetry/otel.py has formatting issues
detected by the Ruff formatter that are blocking CI. Run the command `uv run
ruff format lib/crewai/src/crewai/telemetry/otel.py` to automatically apply the
required formatting standards to the file, or run `uv run ruff format lib/` to
format the entire lib directory. This will resolve the formatting drift and
allow the CI pipeline to pass.

In `@lib/crewai/tests/telemetry/test_otel.py`:
- Around line 567-570: The ruff format check is failing because several
expressions are unnecessarily split across multiple lines when they fit within
the 88 character limit. Collapse the manually wrapped expressions in the
test_otel.py file: the ThreadPoolExecutor pool.submit(...).result() blocks
(found around lines 568-570, 591-593, 612-614, and 635-637) and the
single-argument assert statements (around lines 498-500 and 573-575) should each
be collapsed onto a single line where they fit under 88 characters. Run uv run
ruff format lib/ to automatically fix all formatting issues in the directory.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 120c8a32-83fb-4d7a-882b-bc510eeca1d9

📥 Commits

Reviewing files that changed from the base of the PR and between 2eb4e3a and fdfe593.

📒 Files selected for processing (25)
  • docs/docs.json
  • docs/edge/en/changelog.mdx
  • docs/edge/en/observability/opentelemetry.mdx
  • lib/crewai/src/crewai/a2a/utils/delegation.py
  • lib/crewai/src/crewai/agent/core.py
  • lib/crewai/src/crewai/crew.py
  • lib/crewai/src/crewai/events/event_bus.py
  • lib/crewai/src/crewai/flow/runtime/__init__.py
  • lib/crewai/src/crewai/knowledge/knowledge.py
  • lib/crewai/src/crewai/llm.py
  • lib/crewai/src/crewai/llms/providers/anthropic/completion.py
  • lib/crewai/src/crewai/llms/providers/azure/completion.py
  • lib/crewai/src/crewai/llms/providers/bedrock/completion.py
  • lib/crewai/src/crewai/llms/providers/gemini/completion.py
  • lib/crewai/src/crewai/llms/providers/openai/completion.py
  • lib/crewai/src/crewai/memory/unified_memory.py
  • lib/crewai/src/crewai/task.py
  • lib/crewai/src/crewai/tasks/llm_guardrail.py
  • lib/crewai/src/crewai/telemetry/__init__.py
  • lib/crewai/src/crewai/telemetry/otel.py
  • lib/crewai/src/crewai/tools/base_tool.py
  • lib/crewai/src/crewai/tools/structured_tool.py
  • lib/crewai/src/crewai/utilities/reasoning_handler.py
  • lib/crewai/tests/telemetry/test_otel.py
  • lib/crewai/tests/telemetry/test_otel_noop.py

Comment thread lib/crewai/src/crewai/flow/runtime/__init__.py
Comment thread lib/crewai/src/crewai/flow/runtime/__init__.py
Comment thread lib/crewai/src/crewai/llms/providers/anthropic/completion.py Outdated
Comment thread lib/crewai/src/crewai/llms/providers/openai/completion.py Outdated
Comment thread lib/crewai/src/crewai/telemetry/otel.py
Comment on lines +567 to +570
with ThreadPoolExecutor(max_workers=2) as pool:
    inner = pool.submit(
        contextvars.copy_context().run, _task
    ).result()

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

CI is red on ruff format; collapse these manually wrapped expressions.

The lint job fails because several expressions here are wrapped across lines that ruff format will collapse onto a single line (each fits well under 88 cols). Representative spots: the pool.submit(...).result() blocks at Lines 568-570, 591-593, 612-614, 635-637 and the single-arg assert blocks at Lines 498-500 and 573-575. Run uv run ruff format lib/ to fix all reformatted files.

Source: Pipeline failures

Open spans directly on the user's thread so that stdlib log records
emitted during hot paths like `Crew.kickoff`, `BaseTool.run`, and
`LLM.call` carry the active trace context and correlate with the
spans they belong to — a gap the previous metrics-only telemetry
could not close. Introduces a `crewai.telemetry.otel` module
exposing `operation` and `follows_from`, instruments the execution
hot paths, and propagates the active context across every
parallel-dispatch site. Depends only on `opentelemetry-api` so
provider and exporter choice stays with the host application per the
standard OTel library pattern; without an installed SDK the
`ProxyTracer` keeps everything as a NoOp.

Co-authored-by: Cursor <cursoragent@cursor.com>
@lucasgomide lucasgomide force-pushed the luzk/otel-instrumentation branch from fdfe593 to 7cc74cb Compare June 23, 2026 14:57
Comment thread lib/crewai/src/crewai/tools/base_tool.py
with operation("paused op", expected_exceptions=(_ExpectedPause,)):
    raise _ExpectedPause("pause")

span = span_exporter.get_finished_spans()[0]

Address review feedback on the native OpenTelemetry instrumentation

`test_otel.py`'s `span_exporter` fixture installed an SDK
`TracerProvider` once via module-level globals and never restored the
default `ProxyTracerProvider`, so `test_otel_noop.py`'s
unconfigured-default-state assertions failed whenever the two files ran
on the same worker. Install the SDK provider fresh per test and reset the
global slot back to `ProxyTracerProvider` in `finally`; `_tracer()`
re-resolves on every span so swapping providers between tests is safe.

`Telemetry.set_tracer()` installed crewAI's anonymous SDK
`TracerProvider` into OpenTelemetry's process-global slot, so the first
`Crew` constructed in a test or host application replaced the default
`ProxyTracerProvider` and exfiltrated every host span emitted via
`trace.get_tracer(...)` to crewAI's OTLP endpoint. Keep the provider
local to the `Telemetry` instance and route every anonymous span
through `self.provider.get_tracer("crewai.telemetry")` so the global
slot stays untouched. Mirrors the fix in `crewai_core.telemetry`,
drops the now-dead `set_tracer()` calls in `event_listener.py` and
`crewai_cli.command`, and adds regression coverage that asserts the
provider stays a `ProxyTracerProvider` after constructing a `Crew`.
@linear

linear Bot commented Jun 23, 2026

CON-290

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

# whole call stays inside the "execute flow method" span
# (enables AgentExecutor pattern).
if asyncio.iscoroutine(result):
    result = await result

Flow method span ends early

Low Severity

The new execute flow method span wraps only the method body and auto-awaited coroutine, but _run_human_feedback_step runs after that block ends. If feedback fails or raises HumanFeedbackPending, the method span is already finished with a non-error status, so traces can show a successful method while the flow is paused or failed in the feedback step.
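
The span boundary described here can be reproduced with a toy span that only records when it starts and ends. This is a sketch under stated assumptions, not the flow runtime's code: `toy_span`, `execute_method`, and `feedback_step` are hypothetical names. It shows that a coroutine returned by a sync method stays inside the span when it is awaited within the `with` block, while a step run after the block falls outside it.

```python
import asyncio
from contextlib import contextmanager

events = []  # ordered log of what happened relative to the span


@contextmanager
def toy_span(name):
    """Hypothetical stand-in for a span: logs its start and end."""
    events.append(f"start {name}")
    try:
        yield
    finally:
        events.append(f"end {name}")


async def _inner():
    events.append("method body")
    return 42


def sync_method_returning_coroutine():
    # A sync callable that hands back a coroutine instead of a value.
    return _inner()


async def feedback_step(result):
    events.append("feedback step")
    return result


async def execute_method(method):
    with toy_span("execute flow method"):
        result = await asyncio.to_thread(method)
        # Await inside the block so the coroutine's work is covered by the span.
        if asyncio.iscoroutine(result):
            result = await result
    # Runs after the span has ended, so a failure here is not attributed to it.
    return await feedback_step(result)


value = asyncio.run(execute_method(sync_method_returning_coroutine))
```

The resulting `events` order places "method body" between the span's start and end and "feedback step" after the end, which mirrors the gap the comment points out.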
