Skip to content

Enhancement Proposal: Simplifying Token Calculation for High-Frequency Append-Log Style Operations #974

@lvboudre

Description

@lvboudre

Enhancement Proposal: Simplifying Token Calculation for High-Frequency Append-Log Style Operations

Overview:

In our current project utilizing ScyllaDB, we're implementing a high-frequency, append-log style architecture to handle concurrent append requests. To optimize performance and minimize network traffic, we're batching these requests similar to how Kafka API operates, sending batches to ScyllaDB every 10 milliseconds.

To ensure efficient batching and minimize network overhead, it's crucial to group insert requests that will ultimately end up on the same node within ScyllaDB. This necessitates the computation of tokens for each insert statement, enabling us to determine their placement within the token ring.

Current Challenge:

Presently, the existing API poses challenges in efficiently computing the token of a PreparedStatement without incurring significant performance overhead. The process involves invoking Session::calculate_token, which necessitates serializing a row (resulting in memory allocation), extracting the partition key, and then computing the token. Subsequently, when batching these statements using Session::batch, each row undergoes serialization again, effectively doubling memory allocation and serialization overhead.

Immediate Solution

To streamline this process and enhance performance, we propose making Session::calculate_token_untyped public instead of keeping it pub(crate). By exposing this method publicly, we can pre-serialize every row, thereby reusing the serialization results to compute tokens and seamlessly integrate them into our batching process.

Additionnal Note

In addition to the proposed enhancement of making Session::calculate_token_untyped public, we suggest making the PartitionHasher publicly accessible as well. This would empower users to compute results in advance without having to go through the serialization process of SerializeRow and PreparedStatement.

Considering that many ScyllaDB use cases involve key-value stores where the partition key is often known early on, exposing PartitionHasher would facilitate more efficient pre-computation of tokens, enhancing overall performance and developer experience.

Metadata

Metadata

Assignees

No one assigned

    Labels

    performanceImproves performance of existing features

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions