Enhancement Proposal: Simplifying Token Calculation for High-Frequency Append-Log Style Operations
Overview:
In our current ScyllaDB-based project, we are implementing a high-frequency, append-log style architecture to handle concurrent append requests. To optimize performance and reduce network traffic, we batch these requests, similar to how the Kafka API operates, and send the batches to ScyllaDB every 10 milliseconds.
To keep batching efficient and network overhead low, it's crucial to group insert requests that will ultimately land on the same ScyllaDB node. This requires computing the token of each insert statement so we can determine its placement within the token ring.
Current Challenge:
The existing API makes it hard to compute the token of a PreparedStatement without significant performance overhead. Doing so means invoking Session::calculate_token, which serializes a row (allocating memory), extracts the partition key, and then computes the token. When those same statements are later batched via Session::batch, each row is serialized again, effectively doubling the memory allocation and serialization cost.
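For illustration, here is a rough sketch of that double-serialization flow. Method names follow this issue; the exact signatures and module paths are assumptions and may differ between driver versions:

```rust
use scylla::Session;
use scylla::prepared_statement::PreparedStatement;

// Sketch of today's flow (method names as in this issue; exact
// signatures and paths may vary between driver versions).
fn route_rows(
    session: &Session,
    prepared: &PreparedStatement,
    rows: &[(i64, Vec<u8>)], // (partition key, payload)
) -> Result<(), Box<dyn std::error::Error>> {
    for row in rows {
        // First serialization: calculate_token serializes the row
        // internally just to extract the partition key and hash it.
        let token = session.calculate_token(prepared, row)?;
        // ... assign `row` to the per-node batch owning `token` ...
        let _ = token;
    }
    // Second serialization: when the grouped rows are later sent with
    // Session::batch, every row is serialized again from scratch.
    Ok(())
}
```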
Immediate Solution
To streamline this process and improve performance, we propose making Session::calculate_token_untyped public instead of keeping it pub(crate). With this method exposed, we could serialize each row exactly once and reuse the result both to compute its token and to build the batch, as sketched below.
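A minimal sketch of the proposed flow. This would only compile once calculate_token_untyped becomes public, and the signature shown here (as well as the SerializedValues path) is an assumption:

```rust
use scylla::Session;
use scylla::prepared_statement::PreparedStatement;
use scylla::serialize::row::SerializedValues; // path may differ by version

// Proposed flow: each row is serialized exactly once, up front.
fn group_preserialized(
    session: &Session,
    prepared: &PreparedStatement,
    serialized_rows: &[SerializedValues],
) {
    for values in serialized_rows {
        // The same pre-serialized bytes feed both the token computation
        // and, later, the batch frame - no second serialization pass.
        let token = session.calculate_token_untyped(prepared, values);
        // ... route `values` into the batch for the node owning `token` ...
        let _ = token;
    }
}
```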
Additional Note
In addition to making Session::calculate_token_untyped public, we suggest making PartitionHasher publicly accessible as well. This would let users compute tokens ahead of time without going through the SerializeRow and PreparedStatement serialization machinery at all.
Since many ScyllaDB use cases are key-value workloads where the partition key is known early on, exposing PartitionHasher would enable cheap pre-computation of tokens, improving both performance and developer experience.
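To make the idea concrete, here is a purely hypothetical sketch of direct token pre-computation if PartitionHasher were public. The constructor and the write/finish methods are assumed shapes, not the driver's actual API:

```rust
// Purely illustrative: PartitionHasher is pub(crate) today, and this
// write/finish interface is an assumed shape, not the actual API.
fn token_for_key(partition_key_bytes: &[u8]) -> i64 {
    let mut hasher = PartitionHasher::default(); // assumed constructor
    hasher.write(partition_key_bytes);           // feed the raw key bytes
    hasher.finish()                              // resulting ring token
}
```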