-
Notifications
You must be signed in to change notification settings - Fork 534
fix(async-processor): concurrent exports actually serialised #3028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3028 +/- ##
=======================================
+ Coverage 81.1% 81.2% +0.1%
=======================================
Files 126 126
Lines 24954 25045 +91
=======================================
+ Hits 20255 20360 +105
+ Misses 4699 4685 -14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
290d3be
to
06fa8ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR restores true parallel exports by scheduling export()
futures on the async runtime instead of awaiting them directly, and adds tests to verify behavior under different max_concurrent_exports
settings.
- Refactored
export
calls to use a static async function withRwLock
-wrapped exporter andFuturesUnordered
for concurrency. - Updated shutdown and resource-setting to acquire a write lock on the exporter.
- Added tokio-based tests (
TrackingExporter
) to ensure exports are parallel when allowed and serialized when limited.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs | Refactor export logic for concurrent scheduling and lock-protect the exporter; update shutdown and set_resource; add concurrency tests. |
opentelemetry-sdk/Cargo.toml | Include the rt-tokio feature in experimental_trace_batch_span_processor_with_async_runtime . |
opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs
Outdated
Show resolved
Hide resolved
06fa8ae
to
1905b50
Compare
opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs
Outdated
Show resolved
Hide resolved
@@ -188,13 +190,19 @@ struct BatchSpanProcessorInternal<E, R> { | |||
spans: Vec<SpanData>, | |||
export_tasks: FuturesUnordered<BoxFuture<'static, OTelSdkResult>>, | |||
runtime: R, | |||
exporter: E, | |||
exporter: Arc<RwLock<E>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going forward, the SpanExporter
trait should be redesigned to use immutable references (&self) for all methods. This would allow us to remove the RwLock
and use just Arc<E>
for sharing the exporter across concurrent tasks - similar to how LogExporter
is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed — my goal was mainly to minimise the scope of the change, but you're right that using a RwLock is... questionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TODO comment for this so someone can work on it? I'll also file an issue. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for adding tests for concurrent export.
let export_result = self.export().await; | ||
let batch = self.spans.split_off(0); | ||
let exporter = self.exporter.clone(); | ||
let runtime = self.runtime.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these clones should be cheap - Arc
cloning for exporter, and ZST cloning for runtime.
@@ -188,13 +190,19 @@ struct BatchSpanProcessorInternal<E, R> { | |||
spans: Vec<SpanData>, | |||
export_tasks: FuturesUnordered<BoxFuture<'static, OTelSdkResult>>, | |||
runtime: R, | |||
exporter: E, | |||
exporter: Arc<RwLock<E>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a TODO comment for this so someone can work on it? I'll also file an issue. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this and fixing! Can you add a changelog entry too?
(Concurrent Export support is something we need to support everywhere, and is pending some spec-level discussions. But given this functionality was already there and got removed unintentionally, no problem adding it back)
@lalitb @cijothomas Thanks again for all the reviews! 🙏 Is there anything else outstanding that needs to be addressed before this can be merged? |
#2685 unintentionally broke parallel exports by awaiting the
export()
future directly inopentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs
, rather than passing it to the runtime for concurrent polling. As a result,OTEL_BSP_MAX_CONCURRENT_EXPORTS
became ineffective, serialising all exports and increasing the risk of dropped spans under load.This PR restores true parallelism by respecting
max_concurrent_exports
, and adds tests to verify:max_concurrent_exports
> 1max_concurrent_exports
== 1Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes