Enhancement Proposal: Simplifying Token Calculation for High-Frequency Append-Log Style Operations
Overview:
In our current ScyllaDB-based project, we are implementing a high-frequency, append-log style architecture to handle concurrent append requests. To optimize performance and reduce network traffic, we batch these requests, similar to how the Kafka API operates, and send the batches to ScyllaDB every 10 milliseconds.
To keep batching efficient and network overhead low, it's crucial to group insert requests that will ultimately land on the same ScyllaDB node. This requires computing the token of each insert statement so we can determine its placement within the token ring.
Current Challenge:
The existing API makes it hard to compute the token of a PreparedStatement without significant performance overhead. Doing so means invoking Session::calculate_token, which serializes a row (allocating memory), extracts the partition key, and then computes the token. When those same statements are later batched via Session::batch, each row is serialized again, effectively doubling the memory allocation and serialization cost.
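For illustration, here is a rough sketch of that double-serialization flow. Method names follow this issue; the exact signatures and module paths are assumptions and may differ between driver versions:

```rust
use scylla::Session;
use scylla::prepared_statement::PreparedStatement;

// Sketch of today's flow (method names as in this issue; exact
// signatures and paths may vary between driver versions).
fn route_rows(
    session: &Session,
    prepared: &PreparedStatement,
    rows: &[(i64, Vec<u8>)], // (partition key, payload)
) -> Result<(), Box<dyn std::error::Error>> {
    for row in rows {
        // First serialization: calculate_token serializes the row
        // internally just to extract the partition key and hash it.
        let token = session.calculate_token(prepared, row)?;
        // ... assign `row` to the per-node batch owning `token` ...
        let _ = token;
    }
    // Second serialization: when the grouped rows are later sent with
    // Session::batch, every row is serialized again from scratch.
    Ok(())
}
```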
Immediate Solution
To streamline this process and improve performance, we propose making Session::calculate_token_untyped public instead of keeping it pub(crate). With this method exposed, we could serialize each row exactly once and reuse the result both to compute its token and to build the batch, as sketched below.
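A minimal sketch of the proposed flow. This would only compile once calculate_token_untyped becomes public, and the signature shown here (as well as the SerializedValues path) is an assumption:

```rust
use scylla::Session;
use scylla::prepared_statement::PreparedStatement;
use scylla::serialize::row::SerializedValues; // path may differ by version

// Proposed flow: each row is serialized exactly once, up front.
fn group_preserialized(
    session: &Session,
    prepared: &PreparedStatement,
    serialized_rows: &[SerializedValues],
) {
    for values in serialized_rows {
        // The same pre-serialized bytes feed both the token computation
        // and, later, the batch frame - no second serialization pass.
        let token = session.calculate_token_untyped(prepared, values);
        // ... route `values` into the batch for the node owning `token` ...
        let _ = token;
    }
}
```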
Additional Note
In addition to making Session::calculate_token_untyped public, we suggest making PartitionHasher publicly accessible as well. This would let users compute tokens ahead of time without going through the SerializeRow and PreparedStatement serialization machinery at all.
Since many ScyllaDB use cases are key-value workloads where the partition key is known early on, exposing PartitionHasher would enable cheap pre-computation of tokens, improving both performance and developer experience.
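To make the idea concrete, here is a purely hypothetical sketch of direct token pre-computation if PartitionHasher were public. The constructor and the write/finish methods are assumed shapes, not the driver's actual API:

```rust
// Purely illustrative: PartitionHasher is pub(crate) today, and this
// write/finish interface is an assumed shape, not the actual API.
fn token_for_key(partition_key_bytes: &[u8]) -> i64 {
    let mut hasher = PartitionHasher::default(); // assumed constructor
    hasher.write(partition_key_bytes);           // feed the raw key bytes
    hasher.finish()                              // resulting ring token
}
```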