Commit de0d53a

Merge branch 'main' into u/mtomka/refine-runtime-trait
2 parents 88efa83 + 2997c4b commit de0d53a

31 files changed, +1913 -1445 lines changed

docs/design/logs.md

Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
# OpenTelemetry Rust Logs Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

## Overview

[OpenTelemetry (OTel)
Logs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/README.md)
support differs from Metrics and Traces in that it does not introduce a new
logging API for end users. Instead, OTel recommends leveraging existing logging
libraries such as [log](https://crates.io/crates/log) and
[tracing](https://crates.io/crates/tracing), while providing bridges (appenders)
to route logs through OpenTelemetry.

OTel took this approach because of the long history of existing logging
solutions. In Rust, these are [log](https://crates.io/crates/log) and
[tracing](https://crates.io/crates/tracing), both of which have been embraced
by the community for some time. OTel Rust maintains appenders for these
libraries, allowing users to integrate with OpenTelemetry without changing
their existing logging instrumentation.

The `tracing` appender receives the most performance optimization because
`tracing` is widely adopted and itself has a bridge from the `log` crate.
Notably, OpenTelemetry Rust itself is instrumented using `tracing` for internal
logs. Additionally, when OTel began supporting logging as a signal, the `log`
crate lacked structured logging support, reinforcing the decision to prioritize
`tracing`.

## Benefits of OpenTelemetry Logs

- **Unified configuration** across Traces, Metrics, and Logs.
- **Automatic correlation** with Traces.
- **Consistent Resource attributes** across signals.
- **Multiple destinations support**: Logs can continue flowing to existing
  destinations such as stdout while also being sent to an
  OpenTelemetry-capable backend, typically via an OTLP Exporter or exporters
  that write to operating-system-native facilities like `Windows ETW` or
  `Linux user_events`.
- **Standalone logging support** for applications that use OpenTelemetry as
  their primary logging mechanism.

## Key Design Principles

- High performance - no locks/contention in the hot path with minimal/no heap
  allocation where possible.
- Capped resource (memory) usage - well-defined behavior when overloaded.
- Self-observable - exposes telemetry about itself to aid in troubleshooting.
- Robust error handling, returning `Result` where possible instead of panicking.
- Minimal public API, exposing items only when there is a demonstrated need.

## Architecture Overview

```mermaid
graph TD
    subgraph Application
        A1[Application Code]
    end
    subgraph Logging Libraries
        B1[log crate]
        B2[tracing crate]
    end
    subgraph OpenTelemetry
        C1[OpenTelemetry Appender for log]
        C2[OpenTelemetry Appender for tracing]
        C3[OpenTelemetry Logs API]
        C4[OpenTelemetry Logs SDK]
        C5[OTLP Exporter]
    end
    subgraph Observability Backend
        D1[OTLP-Compatible Backend]
    end
    A1 --> |Emits Logs| B1
    A1 --> |Emits Logs| B2
    B1 --> |Bridged by| C1
    B2 --> |Bridged by| C2
    C1 --> |Sends to| C3
    C2 --> |Sends to| C3
    C3 --> |Processes with| C4
    C4 --> |Exports via| C5
    C5 --> |Sends to| D1
```

## Logs API

The Logs API is part of the
[opentelemetry](https://crates.io/crates/opentelemetry) crate.

The OTel Logs API is not intended for direct end-user usage. Instead, it is
designed for appender/bridge authors to integrate existing logging libraries
with OpenTelemetry. However, nothing prevents end users from calling it
directly.

### API Components

1. **Key-Value Structs**: Used in `LogRecord`. The `Key` struct is shared
   across signals, but the `Value` struct differs from the one used by Metrics
   and Traces, because values in Logs can contain more complex structures than
   those in Traces and Metrics.
2. **Traits**:
   - `LoggerProvider` - provides methods to obtain a `Logger`.
   - `Logger` - provides methods to create a `LogRecord` and emit the created
     `LogRecord`.
   - `LogRecord` - provides methods to populate a `LogRecord`.
3. **No-Op Implementations**: By default, the API performs no operations until
   an SDK is attached.

### Logs Flow

1. Obtain a `LoggerProvider` implementation.
2. Use the `LoggerProvider` to create `Logger` instances, specifying a scope
   name (module/component emitting logs). Optional attributes and version are
   also supported.
3. Use the `Logger` to create an empty `LogRecord` instance.
4. Populate the `LogRecord` with body, timestamp, attributes, etc.
5. Call `Logger.emit(LogRecord)` to process and export the log.

If only the Logs API is used (without an SDK), all the above steps result in no
operations, following OpenTelemetry’s philosophy of separating API from SDK. The
official Logs SDK provides real implementations to process and export logs.
Users or vendors can also provide alternative SDK implementations.

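
The five steps of the Logs Flow can be sketched with a simplified, std-only model. The trait and type names below mirror the concepts in this section but are assumptions made for illustration; they are not the actual definitions in the `opentelemetry` crate. The no-op types show why API-only usage does nothing: every step is an empty method body.

```rust
// Simplified, hypothetical model of the Logs API flow. These traits are NOT
// the real `opentelemetry` crate definitions; they only mirror the concepts.

/// Populated by the caller before being handed to `Logger::emit`.
pub trait LogRecord {
    fn set_body(&mut self, body: &str);
    fn add_attribute(&mut self, key: &'static str, value: &str);
}

/// Creates empty records and emits populated ones.
pub trait Logger {
    type Record: LogRecord;
    fn create_log_record(&self) -> Self::Record;
    fn emit(&self, record: Self::Record);
}

/// Hands out loggers identified by a scope name.
pub trait LoggerProvider {
    type Logger: Logger;
    fn logger(&self, scope_name: &'static str) -> Self::Logger;
}

// No-op implementations: what callers get when no SDK is attached.
pub struct NoopLogRecord;
impl LogRecord for NoopLogRecord {
    fn set_body(&mut self, _body: &str) {}
    fn add_attribute(&mut self, _key: &'static str, _value: &str) {}
}

pub struct NoopLogger;
impl Logger for NoopLogger {
    type Record = NoopLogRecord;
    fn create_log_record(&self) -> NoopLogRecord {
        NoopLogRecord
    }
    fn emit(&self, _record: NoopLogRecord) {}
}

pub struct NoopLoggerProvider;
impl LoggerProvider for NoopLoggerProvider {
    type Logger = NoopLogger;
    fn logger(&self, _scope_name: &'static str) -> NoopLogger {
        NoopLogger
    }
}

/// Steps 1-5 written against the traits only, as an appender author would.
pub fn emit_example<P: LoggerProvider>(provider: &P) {
    let logger = provider.logger("my-component"); // step 2
    let mut record = logger.create_log_record(); // step 3
    record.set_body("disk usage high"); // step 4
    record.add_attribute("disk", "/dev/sda1");
    logger.emit(record); // step 5
}
```

Because `emit_example` is generic over the provider, swapping `NoopLoggerProvider` for an SDK-backed implementation would not require changing the calling code.
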
## Logs SDK

The Logs SDK is part of the
[opentelemetry_sdk](https://crates.io/crates/opentelemetry_sdk) crate.

The OpenTelemetry Logs SDK provides an OTel specification-compliant
implementation of the Logs API, handling log processing and export.

### Core Components

#### `SdkLoggerProvider`

This is the implementation of the `LoggerProvider` and deals with concerns such
as processing and exporting Logs.

- Implements the `LoggerProvider` trait.
- Creates and manages `SdkLogger` instances.
- Holds logging configuration, including `Resource` and processors.
- Does not retain a list of created loggers. Instead, it passes an owned clone
  of itself to each logger created, so that each logger has access to the
  configuration (such as which processors to invoke).
- Uses an `Arc<LoggerProviderInner>` and delegates all configuration to
  `LoggerProviderInner`. This allows cheap cloning of itself and ensures all
  clones point to the same underlying configuration.
- As `SdkLoggerProvider` only holds an `Arc` of its inner, its methods such as
  flush and shutdown can only take `&self`. Mutating state through `&self`
  requires interior mutability, which comes with runtime performance costs.
  Since methods like shutdown usually need to mutate interior state, the
  provider defers to components like the exporter to use interior mutability
  to handle shutdown. (More on this in the exporter section)
- An alternative design was to let `SdkLogger` hold a `Weak` reference to the
  `SdkLoggerProvider`. This would require a `Weak` to `Arc` upgrade on every
  log emission, significantly affecting throughput.
- `LoggerProviderInner` implements `Drop`, triggering `shutdown()` when no
  references remain. However, in practice, loggers are often stored statically
  inside appenders (such as the `tracing` appender), so explicit shutdown by
  the user is required.

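
The ownership layout described in these bullets can be illustrated with a minimal std-only sketch. `Provider`, `ProviderInner`, and `Logger` are hypothetical stand-ins for the SDK types, and the `AtomicBool` flag is just one illustrative form of interior mutability; the real SDK may track shutdown differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for `LoggerProviderInner`: the shared configuration.
struct ProviderInner {
    resource: String,
    // Interior mutability lets `shutdown(&self)` record state without `&mut`.
    is_shutdown: AtomicBool,
}

// Cloning only bumps the `Arc` reference count; all clones share one inner.
#[derive(Clone)]
pub struct Provider {
    inner: Arc<ProviderInner>,
}

pub struct Logger {
    scope: &'static str,
    // Owned clone of the provider: no `Weak` to `Arc` upgrade on each emit.
    provider: Provider,
}

impl Provider {
    pub fn new(resource: &str) -> Self {
        Provider {
            inner: Arc::new(ProviderInner {
                resource: resource.to_string(),
                is_shutdown: AtomicBool::new(false),
            }),
        }
    }

    pub fn logger(&self, scope: &'static str) -> Logger {
        Logger { scope, provider: self.clone() }
    }

    /// Returns `true` only for the first call.
    pub fn shutdown(&self) -> bool {
        !self.inner.is_shutdown.swap(true, Ordering::SeqCst)
    }

    /// Number of live handles (provider clones, including those in loggers).
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

impl Logger {
    /// Formats a line using the shared configuration, or `None` after shutdown.
    pub fn emit(&self, body: &str) -> Option<String> {
        if self.provider.inner.is_shutdown.load(Ordering::SeqCst) {
            return None;
        }
        Some(format!("[{}] {}: {}", self.provider.inner.resource, self.scope, body))
    }
}
```

In this sketch a logger keeps the inner configuration alive for as long as it exists, which is why a statically stored logger prevents a `Drop`-based shutdown from ever running.
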
#### `SdkLogger`

This is an implementation of the `Logger`, and contains functionality to create
and emit logs.

- Implements the `Logger` trait.
- Creates `SdkLogRecord` instances and emits them.
- Calls `OnEmit()` on all registered processors when emitting logs.
- Passes mutable references to each processor (`&mut log_record`), i.e.,
  ownership is not passed to the processor. This lets the logger avoid cloning
  costs. Since a mutable reference is passed, processors can modify the log,
  and the change will be visible to the next processor in the chain.
- Since the processor only gets a reference to the log, it cannot store it
  beyond the `OnEmit()` call. If a processor needs to buffer logs, it must
  explicitly copy them to the heap.
- This design allows for stack-only log processing when exporting to operating
  system native facilities like `Windows ETW` or `Linux user_events`.
- OTLP exporting requires network calls (HTTP/gRPC) and batching of logs for
  efficiency. Log records bound for these exporters are buffered by copying
  them to the heap. (More on this in the BatchLogProcessor section)

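
A minimal sketch of the borrowing behavior described above, using hypothetical `Record` and `Processor` types rather than the SDK's own. For brevity the processors here take `&mut self`; that is a simplification of this sketch, not a statement about the SDK's trait.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Record {
    pub body: String,
    pub attributes: Vec<(&'static str, String)>,
}

// Hypothetical processor trait: it only ever borrows the record.
pub trait Processor {
    fn on_emit(&mut self, record: &mut Record);
}

/// Enriches the record in place; later processors observe the change.
pub struct AddHost;
impl Processor for AddHost {
    fn on_emit(&mut self, record: &mut Record) {
        record.attributes.push(("host", "node-1".to_string()));
    }
}

/// Buffers records. It cannot keep the borrow, so it must clone.
#[derive(Default)]
pub struct Buffering {
    pub buffered: Vec<Record>,
}
impl Processor for Buffering {
    fn on_emit(&mut self, record: &mut Record) {
        self.buffered.push(record.clone());
    }
}

/// The logger owns the record and lends it to each processor in order.
pub fn emit(processors: &mut [&mut dyn Processor], mut record: Record) {
    for processor in processors.iter_mut() {
        processor.on_emit(&mut record);
    }
}
```

Only `Buffering` pays for a copy. A processor that exports synchronously could read the borrowed record and return without allocating.
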
#### `LogRecord`

- Holds log data, including attributes.
- Stores up to 5 attributes in an inline array, avoiding heap allocation in the
  common case.
- Falls back to a heap-allocated `Vec` if more attributes are required.
- Inspired by Go’s `slog` library for efficiency.

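
The inline-then-spill storage can be sketched as follows. This is an illustrative container under assumed names (`Attributes`, `INLINE_CAPACITY`) with a simplified attribute type; it is not the SDK's actual data structure.

```rust
const INLINE_CAPACITY: usize = 5;

// Simplified attribute: a static key and an integer value.
type Attribute = (&'static str, i64);

#[derive(Default)]
pub struct Attributes {
    // The first five attributes live inside the struct itself.
    inline: [Option<Attribute>; INLINE_CAPACITY],
    inline_len: usize,
    // An empty `Vec` does not allocate; it only grows on overflow.
    overflow: Vec<Attribute>,
}

impl Attributes {
    pub fn push(&mut self, attr: Attribute) {
        if self.inline_len < INLINE_CAPACITY {
            self.inline[self.inline_len] = Some(attr);
            self.inline_len += 1;
        } else {
            self.overflow.push(attr);
        }
    }

    pub fn len(&self) -> usize {
        self.inline_len + self.overflow.len()
    }

    /// True once at least one attribute has spilled to the heap.
    pub fn spilled(&self) -> bool {
        !self.overflow.is_empty()
    }

    /// Iterates in insertion order: inline slots first, then overflow.
    pub fn iter(&self) -> impl Iterator<Item = &Attribute> + '_ {
        self.inline[..self.inline_len]
            .iter()
            .flatten()
            .chain(self.overflow.iter())
    }
}
```
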
#### LogRecord Processors

`SdkLoggerProvider` can be configured with any number of LogProcessors. They
are called in the order of registration. Log records are passed to the
`OnEmit` method of each LogProcessor. LogProcessors can be used to process the
log records, enrich them, filter them, and export them to destinations by
leveraging LogRecord Exporters.

The following built-in log processors are provided in the Logs SDK:

##### SimpleLogProcessor

This processor is designed to be used for exporting purposes. Export is handled
by an Exporter (which is a separate component). SimpleLogProcessor is "simple"
in the sense that it does not attempt to do any processing - it just calls the
exporter and passes the log record to it. To comply with the OTel
specification, it synchronizes calls to the `Export()` method, i.e., only one
`Export()` call is in progress at any given time.

SimpleLogProcessor is intended only for test/learning purposes and is often
used along with a `stdout` exporter.

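
One way to serialize export calls is to guard the exporter with a mutex, sketched below with hypothetical `Exporter`, `SimpleProcessor`, and `CountingExporter` types. This shows the technique only and is not a copy of the SDK's implementation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical exporter trait; `Send` lets it sit behind a shared mutex.
pub trait Exporter: Send {
    fn export(&mut self, record: &str);
}

pub struct SimpleProcessor<E: Exporter> {
    exporter: Mutex<E>,
}

impl<E: Exporter> SimpleProcessor<E> {
    pub fn new(exporter: E) -> Self {
        SimpleProcessor { exporter: Mutex::new(exporter) }
    }

    /// Holding the lock for the whole export means at most one `export`
    /// call runs at a time, regardless of how many threads emit.
    pub fn on_emit(&self, record: &str) {
        if let Ok(mut exporter) = self.exporter.lock() {
            exporter.export(record);
        }
    }
}

#[derive(Default)]
pub struct CountingExporter {
    pub exported: usize,
}

impl Exporter for CountingExporter {
    fn export(&mut self, _record: &str) {
        self.exported += 1;
    }
}

/// Emits from several threads and returns how many records were exported.
pub fn emit_from_threads(threads: usize, per_thread: usize) -> usize {
    let processor = Arc::new(SimpleProcessor::new(CountingExporter::default()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let processor = Arc::clone(&processor);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    processor.on_emit("log line");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("emitting thread panicked");
    }
    let exported = processor.exporter.lock().expect("lock poisoned").exported;
    exported
}
```

The lock sits on the emitting thread's path, which is one reason this style of processor suits tests and learning rather than high-throughput use.
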
##### BatchLogProcessor

This is another "exporting" processor. As with SimpleLogProcessor, a different
component named LogExporter handles the actual export logic. BatchLogProcessor
batches the logs it receives into an in-memory buffer. It invokes the exporter
every 1 second or when the batch reaches 512 items, whichever comes first (both
customizable). It uses a background thread to do the export, and communication
between the user thread (where logs are emitted) and the background thread
occurs with `mpsc` channels.

The maximum number of items the buffer holds is 2048 (customizable). Once the
limit is reached, any *new* logs are dropped. It *does not* apply back-pressure
to the user thread and instead drops logs.

As with SimpleLogProcessor, this component also ensures only one export is
active at a given time. Some environments need a modified version of this
processor to achieve higher throughput.

In this design, at most 2048+512 (2560) logs can be in memory at any given
point. In other words, that many logs can be lost if the app crashes before
they are exported.

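
The drop-instead-of-block behavior can be sketched with a bounded `std::sync::mpsc` channel and `try_send`. `BatchSketch` is a hypothetical type, and the sketch is single-threaded so that its behavior is deterministic: `next_batch` stands in for what a background thread would do on each tick, and timers and export are omitted.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

pub struct BatchSketch {
    sender: SyncSender<String>,
    receiver: Receiver<String>,
    max_batch: usize,
    /// Number of records rejected because the queue was full.
    pub dropped: usize,
}

impl BatchSketch {
    /// `max_queue` plays the role of the 2048 limit, `max_batch` of the 512.
    pub fn new(max_queue: usize, max_batch: usize) -> Self {
        let (sender, receiver) = sync_channel(max_queue);
        BatchSketch { sender, receiver, max_batch, dropped: 0 }
    }

    /// Called on the emitting thread. `try_send` never blocks, so a full
    /// queue results in a dropped record rather than back-pressure.
    pub fn on_emit(&mut self, record: String) {
        match self.sender.try_send(record) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped += 1;
            }
        }
    }

    /// Drains at most `max_batch` queued records, oldest first.
    pub fn next_batch(&self) -> Vec<String> {
        self.receiver.try_iter().take(self.max_batch).collect()
    }
}
```

With a queue of 4 and a batch size of 3, six emits leave four records queued and two dropped; the first drain returns three records and frees room for new ones.
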
## LogExporters

LogExporters are responsible for exporting logs to a destination. Some of them
include:

1. **InMemoryExporter** - exports to an in-memory list, primarily for
   unit-testing. This is used extensively in the repo itself, and external
   users are also encouraged to use this.
2. **Stdout exporter** - prints telemetry to stdout. Only for
   debugging/learning purposes. The output format is not defined and is not
   performance optimized. A production-recommended version with a standardized
   output format is planned.
3. **OTLP Exporter** - OTel's official exporter, which uses the OTLP protocol
   that is designed with the OTel data model in mind. Both HTTP and gRPC-based
   exporting are offered.
4. **Exporters to OS Kernel facilities** - These exporters are not maintained
   in the core repo but are listed for completeness. They export telemetry to
   Windows ETW or Linux user_events. They are designed for high-performance
   workloads. Because they export synchronously, they do not require
   buffering/batching. This allows logs to be processed entirely on the stack
   and to scale easily with the number of CPU cores. (The kernel uses per-CPU
   buffers for the events, ensuring no contention)

## `tracing` Log Appender

The tracing appender is part of the
[opentelemetry-appender-tracing](https://crates.io/crates/opentelemetry-appender-tracing)
crate.

The `tracing` appender bridges `tracing` logs to OpenTelemetry. Logs emitted via
`tracing` macros (`info!`, `warn!`, etc.) are forwarded to OpenTelemetry through
this integration.

- `tracing` is designed for high performance, using *layers* or *subscribers* to
  handle emitted logs (events).
- The appender implements a `Layer`, receiving logs from `tracing`.
- Uses the OTel Logs API to create a `LogRecord`, populate it, and emit it via
  `Logger.emit(LogRecord)`.
- If no Logs SDK is present, the process is a no-op.

Note on terminology: Within OpenTelemetry, "tracing" refers to distributed
tracing (i.e., the creation of Spans) and not to in-process structured logging
and execution traces. The "tracing" crate has a notion of creating Spans as
well as Events. The Events from the "tracing" crate are what get converted to
OTel Logs when using this appender. Spans created using the "tracing" crate are
not handled by this crate.

## Performance

// Call out things done specifically for performance

### Perf test - benchmarks

// Share ~~ numbers

### Perf test - stress test

// Share ~~ numbers

## Summary

- OpenTelemetry Logs does not provide a user-facing logging API.
- Instead, it integrates with existing logging libraries (`log`, `tracing`).
- The Logs API defines key traits but performs no operations unless an SDK is
  installed.
- The Logs SDK enables log processing, transformation, and export.
- The Logs SDK is performance optimized to minimize copying and heap allocation,
  wherever feasible.
- The `tracing` appender efficiently routes logs to OpenTelemetry without
  modifying existing logging workflows.

docs/design/metrics.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# OpenTelemetry Rust Metrics Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

TODO:

docs/design/traces.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# OpenTelemetry Rust Traces Design

Status:
[Development](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/document-status.md)

TODO:

examples/logs-basic/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ opentelemetry_sdk = { path = "../../opentelemetry-sdk", features = ["logs"] }
 opentelemetry-stdout = { path = "../../opentelemetry-stdout", features = ["logs"]}
 opentelemetry-appender-tracing = { path = "../../opentelemetry-appender-tracing", default-features = false}
 tracing = { workspace = true, features = ["std"]}
-tracing-subscriber = { workspace = true, features = ["registry", "std"] }
+tracing-subscriber = { workspace = true, features = ["env-filter","registry", "std", "fmt"] }

examples/logs-basic/src/main.rs

Lines changed: 32 additions & 3 deletions
@@ -2,7 +2,7 @@ use opentelemetry_appender_tracing::layer;
 use opentelemetry_sdk::logs::SdkLoggerProvider;
 use opentelemetry_sdk::Resource;
 use tracing::error;
-use tracing_subscriber::prelude::*;
+use tracing_subscriber::{prelude::*, EnvFilter};
 
 fn main() {
     let exporter = opentelemetry_stdout::LogExporter::default();
@@ -14,8 +14,37 @@ fn main() {
         )
         .with_simple_exporter(exporter)
         .build();
-    let layer = layer::OpenTelemetryTracingBridge::new(&provider);
-    tracing_subscriber::registry().with(layer).init();
+
+    // For the OpenTelemetry layer, add a tracing filter that stops events from
+    // OpenTelemetry and its dependent crates (opentelemetry-otlp uses crates
+    // like reqwest/tonic etc.) from being sent back to OTel itself, thus
+    // preventing infinite telemetry generation. The filter levels are set as
+    // follows:
+    // - Allow `info` level and above by default.
+    // - Restrict `opentelemetry`, `hyper`, `tonic`, `h2`, and `reqwest` completely.
+    // Note: This will also drop events from crates like `tonic` etc. even when
+    // they are used outside the OTLP Exporter. For more details, see:
+    // https://github.com/open-telemetry/opentelemetry-rust/issues/761
+    let filter_otel = EnvFilter::new("info")
+        .add_directive("hyper=off".parse().unwrap())
+        .add_directive("opentelemetry=off".parse().unwrap())
+        .add_directive("tonic=off".parse().unwrap())
+        .add_directive("h2=off".parse().unwrap())
+        .add_directive("reqwest=off".parse().unwrap());
+    let otel_layer = layer::OpenTelemetryTracingBridge::new(&provider).with_filter(filter_otel);
+
+    // Create a `tracing_subscriber::fmt` layer to print the logs to stdout. It has a
+    // default filter of `info` level and above, and `debug` and above for logs
+    // from OpenTelemetry crates. The filter levels can be customized as needed.
+    let filter_fmt = EnvFilter::new("info").add_directive("opentelemetry=debug".parse().unwrap());
+    let fmt_layer = tracing_subscriber::fmt::layer()
+        .with_thread_names(true)
+        .with_filter(filter_fmt);
+
+    tracing_subscriber::registry()
+        .with(otel_layer)
+        .with(fmt_layer)
+        .init();
 
     error!(name: "my-event-name", target: "my-system", event_id = 20, user_name = "otel", user_email = "[email protected]", message = "This is an example message");
     let _ = provider.shutdown();

examples/tracing-grpc/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ opentelemetry_sdk = { path = "../../opentelemetry-sdk", features = ["rt-tokio"]
 opentelemetry-stdout = { path = "../../opentelemetry-stdout", features = ["trace"] }
 prost = { workspace = true }
 tokio = { workspace = true, features = ["full"] }
-tonic = { workspace = true }
+tonic = { workspace = true, features = ["server"] }
 
 [build-dependencies]
 tonic-build = { workspace = true }

opentelemetry-appender-tracing/Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@ tracing-opentelemetry = { version = "0.29", optional = true }
 log = { workspace = true }
 opentelemetry-stdout = { path = "../opentelemetry-stdout", features = ["logs"] }
 opentelemetry_sdk = { path = "../opentelemetry-sdk", features = ["logs", "testing"] }
-tracing-subscriber = { workspace = true, features = ["registry", "std", "env-filter"] }
+tracing = { workspace = true, features = ["std"]}
+tracing-subscriber = { workspace = true, features = ["env-filter","registry", "std", "fmt"] }
 tracing-log = "0.2"
 criterion = { workspace = true }
 tokio = { workspace = true, features = ["full"]}

opentelemetry-appender-tracing/benches/logs.rs

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ use opentelemetry_sdk::error::OTelSdkResult;
 use opentelemetry_sdk::logs::{LogBatch, LogExporter};
 use opentelemetry_sdk::logs::{LogProcessor, SdkLogRecord, SdkLoggerProvider};
 use opentelemetry_sdk::Resource;
+#[cfg(not(target_os = "windows"))]
 use pprof::criterion::{Output, PProfProfiler};
 use tracing::error;
 use tracing_subscriber::prelude::*;
