6 changes: 4 additions & 2 deletions develop-docs/sdk/telemetry/logs.mdx
@@ -390,9 +390,11 @@ A new data category for logs has been added to Relay, `log_item`. Both the `log`

### Buffering

Logs should be buffered before being sent. SDKs should keep a buffer of logs on the client (so you can have logs from multiple traces in the buffer) that flushes out based on some kind of condition. We recommend to follow the [telemetry buffer specification outlined](/sdk/telemetry/telemetry-buffer/) in the develop docs, but you should choose the approach that works best for your platform. When starting initial development on the SDK you can choose a simple approach to buffer like flushing logs if the buffer length exceeds 100 items, or if 5 seconds have passed.
SDKs MUST buffer logs before sending them. SDKs should keep a buffer of logs that flushes when specific conditions are met. When starting initial development on the SDK, you can choose a simple approach, like flushing logs if the buffer length exceeds 100 items or if 5 seconds have passed. To prevent data loss, the buffer SHOULD forward logs to the transport in the scenarios outlined in the [telemetry buffer data forwarding scenarios](/sdk/telemetry/telemetry-buffer/#data-forwarding-scenarios).

SDKs must have a hard limit of 1000 log events queued up to avoid causing our customers applications going out of memory. Logs added once the hard limit has been reached are dropped.
SDKs MUST also have a hard limit of 1000 queued log events to avoid causing our customers' applications to run out of memory. Logs added once the hard limit has been reached are dropped. Relay also has a hard limit of 1000 logs per envelope, and SDKs MUST NOT exceed it. SDKs SHOULD therefore batch logs into envelopes of 100 logs or fewer.
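
A minimal JavaScript sketch of these rules follows. It is illustrative, not a reference implementation: `LogBuffer`, the `sendEnvelope` callback, and the injectable `schedule`/`cancel` timer hooks are hypothetical names. It flushes after 100 items or 5 seconds, drops new logs once 1000 are queued, and splits every flush into envelopes of at most 100 logs. As an assumption of this sketch, the size trigger schedules the flush for the next tick instead of flushing inline, so a burst of logs within one tick can reach the hard limit.

```javascript
// Minimal sketch of a client-side log buffer. `sendEnvelope` and the
// schedule/cancel hooks are hypothetical stand-ins for SDK internals.
const FLUSH_AT_ITEMS = 100;        // simple size-based flush trigger
const FLUSH_AFTER_MS = 5000;       // simple time-based flush trigger
const MAX_QUEUED_LOGS = 1000;      // hard limit; logs beyond it are dropped
const MAX_LOGS_PER_ENVELOPE = 100; // stays well below Relay's 1000 per envelope

class LogBuffer {
  constructor(
    sendEnvelope,
    schedule = (fn, ms) => setTimeout(fn, ms),
    cancel = (handle) => clearTimeout(handle),
  ) {
    this.sendEnvelope = sendEnvelope;
    this.schedule = schedule;
    this.cancel = cancel;
    this.logs = [];
    this.timer = null;
    this.timerDelay = null;
    this.dropped = 0;
  }

  add(log) {
    if (this.logs.length >= MAX_QUEUED_LOGS) {
      this.dropped += 1; // hard limit reached: drop the new log
      return;
    }
    this.logs.push(log);
    if (this.logs.length >= FLUSH_AT_ITEMS) {
      this.scheduleFlush(0); // size trigger: flush on the next tick
    } else if (this.timer === null) {
      this.scheduleFlush(FLUSH_AFTER_MS); // time trigger
    }
  }

  scheduleFlush(delayMs) {
    if (this.timer !== null) {
      if (this.timerDelay === 0) return; // an immediate flush is already pending
      this.cancel(this.timer);
    }
    this.timerDelay = delayMs;
    this.timer = this.schedule(() => this.flush(), delayMs);
  }

  // Forwards everything in memory, split into envelopes of at most 100 logs.
  flush() {
    if (this.timer !== null) {
      this.cancel(this.timer);
      this.timer = null;
    }
    const pending = this.logs;
    this.logs = [];
    for (let i = 0; i < pending.length; i += MAX_LOGS_PER_ENVELOPE) {
      this.sendEnvelope(pending.slice(i, i + MAX_LOGS_PER_ENVELOPE));
    }
  }
}
```

Injecting the timer hooks keeps the sketch testable without waiting 5 seconds; a real SDK would use its platform's scheduler and transport.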

We used to recommend following the [BatchProcessor](/sdk/telemetry/telemetry-buffer/batch-processor/), but this page is still under development. We are currently working on a new [telemetry buffer specification](/sdk/telemetry/telemetry-buffer/) that will replace the BatchProcessor.

SDKs SHOULD NOT release logging capabilities to users until a buffering implementation has been added to their SDK.

10 changes: 5 additions & 5 deletions develop-docs/sdk/telemetry/metrics.mdx
@@ -185,7 +185,7 @@ While the metrics functionality for an SDK is in an experimental state, SDKs sho
Sentry.init({
  // stable
  enableMetrics: true,

  // experimental
  _experiments: { enableMetrics: true },
});
@@ -437,12 +437,13 @@ A new data category for metrics has been added to Relay, `trace_metric`. Rate li

### Buffering

Metrics should be buffered before being sent. SDKs should keep a buffer of metrics on the client that flushes out based on some kind of condition. We recommend following the [batch processor specification](/sdk/telemetry/spans/batch-processor/) outlined in the develop docs, but you should choose the approach that works best for your platform.
SDKs MUST buffer metrics before sending them. SDKs should keep a buffer of metrics that flushes when specific conditions are met. When starting initial development on the SDK, you can choose a simple approach, like flushing metrics if the buffer length exceeds 100 items or if 5 seconds have passed. To prevent data loss, the buffer SHOULD forward metrics to the transport in the scenarios outlined in the [telemetry buffer data forwarding scenarios](/sdk/telemetry/telemetry-buffer/#data-forwarding-scenarios). Furthermore:

- The aggregation window should be time- and size-based.
- Flush triggers (e.g. SDK shutdown, size thresholds) should be considered based on the platform and use case.
- SDKs should implement safeguards to prevent excessive memory usage from metric buffering.
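
One way to read these points is sketched below in JavaScript. `MetricsWindow`, its options, and the `onFlush` callback are hypothetical; the 100-item and 5-second defaults come from the simple approach above, while the 1 MiB byte budget is an assumed safeguard value, not a requirement. The window is flushed when it is full by count or by serialized size, or when it is older than `maxAgeMs`. The age is only checked when a metric is added, so an SDK would still need a timer or another flush trigger (such as SDK shutdown, calling `flush()`) to send an idle window.

```javascript
// Minimal sketch of a time- and size-based metrics window. All limits and
// the `onFlush` callback are hypothetical examples, not spec values.
const encoder = new TextEncoder();

class MetricsWindow {
  constructor({ onFlush, maxItems = 100, maxAgeMs = 5000,
                maxBytes = 1024 * 1024, now = () => Date.now() }) {
    this.onFlush = onFlush;
    this.maxItems = maxItems;
    this.maxAgeMs = maxAgeMs;
    this.maxBytes = maxBytes; // safeguard against excessive memory usage
    this.now = now;
    this.items = [];
    this.bytes = 0;
    this.openedAt = 0;
  }

  add(metric) {
    const size = encoder.encode(JSON.stringify(metric)).length;
    if (this.items.length === 0) this.openedAt = this.now(); // window opens
    this.items.push(metric);
    this.bytes += size;
    const full = this.items.length >= this.maxItems || this.bytes >= this.maxBytes;
    const expired = this.now() - this.openedAt >= this.maxAgeMs;
    if (full || expired) this.flush();
  }

  // Also the entry point for platform flush triggers such as SDK shutdown.
  flush() {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.bytes = 0;
    this.onFlush(batch);
  }
}
```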

We're currently working on a concept for buffering different types of telemetry data in the [telemetry buffer specification](/sdk/telemetry/telemetry-buffer/).

### Behaviour with other Sentry Telemetry

#### Tracing
@@ -553,5 +554,4 @@ If `debug` is set to `true` in SDK init, calls to the Sentry metrics API should

- [Experimental JS SDK PR #17883 - Metrics API Implementation](https://github.com/getsentry/sentry-javascript/pull/17883/files)
- [Experimental Python SDK PR #4898 - Metrics API Implementation](https://github.com/getsentry/sentry-python/pull/4898)
- [Batch Processor Specification](/sdk/telemetry/spans/batch-processor/)

- [Batch Processor Specification](/sdk/telemetry/telemetry-buffer/batch-processor/)
108 changes: 108 additions & 0 deletions develop-docs/sdk/telemetry/telemetry-buffer/batch-processor.mdx
@@ -0,0 +1,108 @@
---
title: Batch Processor (deprecated)
redirect_from:
- /sdk/telemetry/spans/batch-processor/
sidebar_order: 10
---

<Alert level="warning">
The BatchProcessor is deprecated. Please use the [Telemetry Buffer](/sdk/telemetry/telemetry-buffer/) instead.
</Alert>

<Alert>
This document uses key words such as "MUST", "SHOULD", and "MAY" as defined in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) to indicate requirement levels.
</Alert>

# BatchProcessor (deprecated)

This page covers the initial specification of the BatchProcessor, which some SDKs use as a reference when implementing logs. It exists only as a reference until we fully specify the [telemetry buffer](/sdk/telemetry/telemetry-buffer/) across all platforms.

## Overview

The BatchProcessor batches spans and logs into one envelope to reduce the number of HTTP requests. When an SDK implements span streaming or logs, it MUST use a BatchProcessor, which is similar to [OpenTelemetry's Batch Processor](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md). The BatchProcessor holds logs and finished spans in memory and batches them into envelopes, using a combination of time- and size-based batching. At the time of writing, the BatchProcessor only handles spans and logs, but an SDK MAY use it for other telemetry data in the future.

## Specification

Whenever the SDK finishes a span or captures a log, it MUST put it into the BatchProcessor. The SDK MUST NOT put unfinished spans into the BatchProcessor.

The BatchProcessor MUST start a timeout of 5 seconds when the SDK adds the first span or log. When the timeout expires, the BatchProcessor MUST forward all spans or logs to the transport, no matter how many items it contains. The SDK MAY choose a different value for the timeout, but it MUST NOT exceed 30 seconds, as longer timeouts can lead to problems with the backend's span buffer, which uses a 60-second interval to determine segments for spans. To avoid running the timeout unnecessarily, the BatchProcessor SHOULD only start a new timeout when it has spans or logs to send.
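
The timeout rules can be sketched as follows. `TimedBatch`, the `forward` callback, and the injectable `schedule` hook are hypothetical names. The sketch clamps a custom timeout to 30 seconds; rejecting an out-of-range value would be another way to honor the limit.

```javascript
// Sketch of the BatchProcessor timeout rules. `forward` and the injectable
// `schedule` hook are hypothetical stand-ins for SDK internals.
const DEFAULT_TIMEOUT_MS = 5000;
const MAX_TIMEOUT_MS = 30000; // upper bound from the spec (backend span buffer)

class TimedBatch {
  constructor(forward, { timeoutMs = DEFAULT_TIMEOUT_MS,
                         schedule = (fn, ms) => setTimeout(fn, ms) } = {}) {
    this.forward = forward;
    // A custom timeout is allowed, but this sketch clamps it to 30 seconds.
    this.timeoutMs = Math.min(timeoutMs, MAX_TIMEOUT_MS);
    this.schedule = schedule;
    this.items = [];
    this.timer = null;
  }

  add(item) {
    this.items.push(item);
    // Only the first item starts a timeout; later items don't reset it.
    if (this.timer === null) {
      this.timer = this.schedule(() => this.onTimeout(), this.timeoutMs);
    }
  }

  onTimeout() {
    this.timer = null; // the next added item starts a fresh timeout
    if (this.items.length === 0) return; // nothing to send: do nothing
    const batch = this.items;
    this.items = [];
    this.forward(batch); // forward no matter how many items there are
  }
}
```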

The BatchProcessor MUST forward all items to the transport when the spans or logs it contains exceed 1 MiB in size. The SDK MAY choose a different maximum batch size, as long as it respects the [envelope max sizes](/sdk/data-model/envelopes/#size-limits). To manage the BatchProcessor's memory footprint, the SDK MUST calculate the size of each span or log by serializing it and counting the bytes of the serialized JSON. As serialization is expensive, the BatchProcessor SHOULD keep track of the serialized spans and logs and pass them to the envelope to avoid serializing them multiple times.
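
A sketch of the size-based rule, assuming a hypothetical `SizedBatch` class and `forward` callback. It serializes each item once, counts UTF-8 bytes with `TextEncoder` (JavaScript string length would undercount non-ASCII characters), and hands the cached JSON strings to `forward` so the envelope doesn't have to serialize them again. It forwards once the batch reaches the limit, which matches the Gherkin scenario below where adding any span to a batch of 1 MiB - 1 byte triggers forwarding.

```javascript
// Sketch of size-based batching with cached serialization.
// `SizedBatch` and `forward` are hypothetical names.
const MAX_BATCH_BYTES = 1024 * 1024; // 1 MiB default from the spec
const encoder = new TextEncoder();

class SizedBatch {
  constructor(forward, maxBytes = MAX_BATCH_BYTES) {
    this.forward = forward;
    this.maxBytes = maxBytes;
    this.serialized = []; // JSON strings, reused when building the envelope
    this.bytes = 0;
  }

  add(item) {
    // Serialize once and keep the JSON instead of the object.
    const json = JSON.stringify(item);
    this.serialized.push(json);
    this.bytes += encoder.encode(json).length; // UTF-8 bytes, not string length
    if (this.bytes >= this.maxBytes) this.flush();
  }

  flush() {
    if (this.serialized.length === 0) return;
    const batch = this.serialized;
    this.serialized = [];
    this.bytes = 0;
    this.forward(batch);
  }
}
```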

When the BatchProcessor forwards all spans or logs to the transport, it MUST reset its timeout and remove all spans and logs. The SDK MUST apply filtering and sampling before adding spans or logs to the BatchProcessor. The SDK MUST apply rate limits to spans and logs after they leave the BatchProcessor to send as much data as possible by dropping data as late as possible.
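
The ordering in this paragraph can be sketched like this: filtering and sampling happen before an item enters the buffer, and rate limits are checked only when the batch leaves it. `createPipeline` and its `beforeSend`, `sample`, `isRateLimited`, and `send` hooks are hypothetical. Because the rate limit is evaluated late, items captured while a limit was active are still sent if the limit has lifted by the time they are forwarded.

```javascript
// Sketch of where filtering, sampling, and rate limiting sit relative to
// the buffer. All hook names are hypothetical.
function createPipeline({ beforeSend, sample, isRateLimited, send }) {
  const buffer = [];
  return {
    // Filtering and sampling run before an item enters the buffer.
    capture(item) {
      const kept = beforeSend(item);
      if (kept === null || !sample(kept)) return false;
      buffer.push(kept);
      return true;
    },
    // Rate limits are checked only after items leave the buffer; if a limit
    // is still active at this point, the batch is dropped.
    forward() {
      const batch = buffer.splice(0, buffer.length);
      if (batch.length === 0 || isRateLimited()) return 0;
      send(batch);
      return batch.length;
    },
    size: () => buffer.length,
  };
}
```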

The BatchProcessor MUST forward all spans and logs in memory to the transport to avoid data loss in the following scenarios:

1. When the user calls `SentrySDK.flush()`, the BatchProcessor MUST forward all data in memory to the transport, and only then SHOULD the transport flush the data.
2. When the user calls `SentrySDK.close()`, the BatchProcessor MUST forward all data in memory to the transport. SDKs SHOULD keep their existing closing behavior.
3. When the application shuts down gracefully, the BatchProcessor SHOULD forward all data in memory to the transport. The transport SHOULD keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call transport `flush`. This is mostly relevant for mobile SDKs already subscribed to these hooks, such as [applicationWillTerminate](https://developer.apple.com/documentation/uikit/uiapplicationdelegate/applicationwillterminate(_:)) on iOS.
4. When the application moves to the background, the BatchProcessor SHOULD forward all data in memory to the transport and stop the timer. The transport SHOULD keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call transport `flush`. This is mostly relevant for mobile SDKs.
5. Mobile SDKs MUST minimize data loss when sudden process terminations occur. Refer to the [Mobile Telemetry Buffer](/sdk/telemetry/telemetry-buffer/mobile-telemetry-buffer) section for more details.
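
A sketch of wiring scenarios 1 to 4 into lifecycle hooks. `createLifecycleHooks`, the `processor` (`forwardAll`, `stopTimer`) and `transport` (`flush`, `close`) interfaces, and the hook names are hypothetical; a platform would call `onWillTerminate` and `onDidEnterBackground` from its own callbacks, such as `applicationWillTerminate` on iOS. Scenario 5 depends on the platform and is not covered here.

```javascript
// Sketch of forwarding scenarios 1-4. `processor` and `transport` are
// hypothetical interfaces; hook names are illustrative only.
function createLifecycleHooks(processor, transport) {
  return {
    // 1. SentrySDK.flush(): forward everything first, only then flush the transport.
    flush(timeoutMs) {
      processor.forwardAll();
      return transport.flush(timeoutMs);
    },
    // 2. SentrySDK.close(): forward everything, then keep the existing close behavior.
    close(timeoutMs) {
      processor.forwardAll();
      return transport.close(timeoutMs);
    },
    // 3. Graceful shutdown: forward; the transport typically stores envelopes to disk.
    onWillTerminate() {
      processor.forwardAll();
    },
    // 4. Moving to the background: forward and stop the timer, without a transport flush.
    onDidEnterBackground() {
      processor.forwardAll();
      processor.stopTimer();
    },
  };
}
```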

The detailed specification is written in the [Gherkin syntax](https://cucumber.io/docs/gherkin/reference/). The specification uses spans as an example, but the same applies to logs or any other future telemetry data.


```Gherkin
Scenario: No spans in BatchProcessor, 1 span added
Given no spans in the BatchProcessor
When the SDK finishes 1 span
Then the SDK puts this span into the BatchProcessor
And starts a timeout of 5 seconds
And doesn't forward the span to the transport

Scenario: Span added before timeout expires
Given span A in the BatchProcessor
And 4.9 seconds pass
When the SDK finishes span B
Then the SDK adds span B to the BatchProcessor
And doesn't reset the timeout
And doesn't forward the spans A and B in the BatchProcessor to the transport

Scenario: Timeout expires and no spans or logs to send
Given no spans in the BatchProcessor
When the timeout expires
Then the BatchProcessor does nothing
And doesn't start a new timeout

Scenario: Spans with size of 1 MiB - 1 byte added, timeout expires
Given spans with size of 1 MiB - 1 byte in the BatchProcessor
When the timeout expires
Then the SDK adds all the spans to one envelope
And forwards them to the transport
And resets the timeout
And clears the BatchProcessor

Scenario: Spans with size of 1 MiB - 1 byte added within 4.9 seconds
Given spans with size of 1 MiB - 1 byte in the BatchProcessor
When the SDK finishes another span and puts it into the BatchProcessor
Then the BatchProcessor puts all spans into one envelope
And forwards the envelope to the transport
And resets the timeout
And clears the BatchProcessor

Scenario: Unfinished spans
Given no span is in the BatchProcessor
When the SDK starts a span but doesn't finish it
Then the BatchProcessor is empty

Scenario: Span filtered out
Given no span is in the BatchProcessor
When the SDK finishes a span
And the span is filtered out
Then the BatchProcessor is empty

Scenario: Span not sampled
Given no span is in the BatchProcessor
When the SDK finishes a span
And the span is not sampled
Then the BatchProcessor is empty

Scenario: 1 span added, application crashes
Given 1 span in the BatchProcessor
When the SDK detects a crash
Then the SDK does nothing with the items in the BatchProcessor
And loses the spans in the BatchProcessor

```

<PageGrid />
99 changes: 12 additions & 87 deletions develop-docs/sdk/telemetry/telemetry-buffer/index.mdx
@@ -1,7 +1,5 @@
---
title: Telemetry Buffer
redirect_from:
- /sdk/telemetry/spans/batch-processor/
sidebar_order: 5
---

@@ -33,96 +31,23 @@ Therefore, we recommend implementing different types of telemetry buffers tailor

# Common Requirements

To be defined. Things like common API, client reports, etc.
This section covers the common requirements relevant for all platforms.

# BatchProcessor V0
## Data Forwarding Scenarios

## Overview
The TelemetryBuffer MUST forward all data in memory to the transport to avoid data loss in the following scenarios:

The BatchProcessor batches spans and logs into one envelope to reduce the number of HTTP requests. When an SDK implements span streaming or logs, it MUST use a BatchProcessor, which is similar to [OpenTelemetry's Batch Processor](https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md). The BatchProcessor holds logs and finished spans in memory and batches them together into envelopes. It uses a combination of time and size-based batching. When writing this, the BatchProcessor only handles spans and logs, but an SDK MAY use it for other telemetry data in the future.
1. When the user calls `SentrySDK.flush()`, the TelemetryBuffer MUST forward all data in memory to the transport, and only then SHOULD the transport flush the data.
2. When the user calls `SentrySDK.close()`, the TelemetryBuffer MUST forward all data in memory to the transport. SDKs SHOULD keep their existing closing behavior.

## Specification
Scenarios mostly relevant for mobile SDKs:

Whenever the SDK finishes a span or captures a log, it MUST put it into the BatchProcessor. The SDK MUST NOT put unfinished spans into the BatchProcessor.
1. When the application shuts down gracefully, the TelemetryBuffer SHOULD forward all data in memory to the transport. The transport SHOULD keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call transport `flush`. This is mostly relevant for mobile SDKs already subscribed to these hooks, such as [applicationWillTerminate](https://developer.apple.com/documentation/uikit/uiapplicationdelegate/applicationwillterminate(_:)) on iOS.
2. When the application moves to the background, the TelemetryBuffer SHOULD forward all data in memory to the transport and stop the timer. The transport SHOULD keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call transport `flush`.
3. Mobile SDKs MUST minimize data loss when sudden process terminations occur. Refer to the [Mobile Telemetry Buffer](/sdk/telemetry/telemetry-buffer/mobile-telemetry-buffer) section for more details.

The BatchProcessor MUST start a timeout of 5 seconds when the SDK adds the first span or log. When the timeout exceeds, the BatchProcessor MUST forward all spans or logs to the transport, no matter how many items it contains. The SDK MAY choose a different value for the timeout, but it MUST NOT exceed 30 seconds, as this can lead to problems with the span buffer on the backend, which uses a time interval of 60 seconds for determining segments for spans. The BatchProcessor SHOULD only start a new timeout, when it has spans or logs to send, to avoid running the timeout unnecessarily.
## FAQ

The BatchProcessor MUST forward all items to the transport after the SDK when containing spans or logs exceeding 1MiB in size. The SDK MAY choose a different value for the max batch size keeping the [envelope max sizes](/sdk/data-model/envelopes/#size-limits) in mind. The SDK MUST calculate the size of a span or a log to manage the BatchProcessor's memory footprint. The SDK MUST serialize the span or log and calculate the size based on the serialized JSON bytes. As serialization is expensive, the BatchProcessor SHOULD keep track of the serialized spans and logs and pass these to the envelope to avoid serializing multiple times.
### Where is the batch processor?

When the BatchProcessor forwards all spans or logs to the transport, it MUST reset its timeout and remove all spans and logs. The SDK MUST apply filtering and sampling before adding spans or logs to the BatchProcessor. The SDK MUST apply rate limits to spans and logs after they leave the BatchProcessor to send as much data as possible by dropping data as late as possible.

The BatchProcessor MUST forward all spans and logs in memory to the transport to avoid data loss in the following scenarios:

1. When the user calls `SentrySDK.flush()`, the BatchProcessor MUST forward all data in memory to the transport, and only then the transport SHOULD flush the data.
2. When the user calls `SentrySDK.close()`, the BatchProcessor MUST forward all data in memory to the transport. SDKs SHOULD keep their existing closing behavior.
3. When the application shuts down gracefully, the BatchProcessor SHOULD forward all data in memory to the transport. The transport SHOULD keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call transport `flush`. This is mostly relevant for mobile SDKs already subscribed to these hooks, such as [applicationWillTerminate](https://developer.apple.com/documentation/uikit/uiapplicationdelegate/applicationwillterminate(_:)) on iOS.
4. When the application moves to the background, the BatchProcessor SHOULD forward all data in memory to the transport and stop the timer. The transport SHOULD keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call transport `flush`. This is mostly relevant for mobile SDKs.
5. Mobile SDKs MUST minimize data loss when sudden process terminations occur. Refer to the [Mobile Telemetry Buffer](/sdk/telemetry/telemetry-buffer/mobile-telemetry-buffer) section for more details.

The detailed specification is written in the [Gherkin syntax](https://cucumber.io/docs/gherkin/reference/). The specification uses spans as an example, but the same applies to logs or any other future telemetry data.


```Gherkin
Scenario: No spans in BatchProcessor 1 span added
Given no spans in the BatchProcessor
When the SDK finishes 1 span
Then the SDK puts this span to the BatchProcessor
And starts a timeout of 5 seconds
And doesn't forward the span to the transport

Scenario: Span added before timeout exceeds
Given span A in the BatchProcessor
Given 4.9 seconds pass
When the SDK finishes span B
Then the SDK adds span B to the BatchProcessor
And doesn't reset the timeout
And doesn't forward the spans A and B in the BatchProcessor to the transport

Scenario: Timeout exceeds and no spans or logs to send
Given no spans in the BatchProcessor
When the timeout exceeds
Then the BatchProcessor does nothing
And doesn't start a new timeout

Scenario: Spans with size of 1 MiB - 1 byte added, timeout exceeds
Given spans with size of 1 MiB - 1 byte in the BatchProcessor
When the timeout exceeds
Then the SDK adds all the spans to one envelope
And forwards them to the transport
And resets the timeout
And clears the BatchProcessor

Scenario: Spans with size of 1 MiB - 1 byte added within 4.9 seconds
Given spans with size of 1 MiB - 1 byte in the BatchProcessor
When the SDK finishes another span and puts it into the BatchProcessor
Then the BatchProcessor puts all spans into one envelope
And forwards the envelope to the transport
And resets the timeout
And clears the BatchProcessor

Scenario: Unfinished spans
Given no span is in the BatchProcessor
When the SDK starts a span but doesn't finish it
Then the BatchProcessor is empty

Scenario: Span filtered out
Given no span is in the BatchProcessor
When the finishes a span
And the span is filtered out
Then the BatchProcessor is empty

Scenario: Span not sampled
Given no span is in the BatchProcessor
When the finishes a span
And the span is not sampled
Then the BatchProcessor is empty

Scenario: 1 span added application crashes
Given 1 span in the SpansAggregator
When the SDK detects a crash
Then the SDK does nothing with the items in the BatchProcessor
And loses the spans in the BatchProcessor

```

<PageGrid />
The batch processor is deprecated and has moved to the [Batch Processor](/sdk/telemetry/telemetry-buffer/batch-processor/) page. The telemetry buffer will include parts of the batch processor's functionality.