Support making atomic write requests idempotent #2197

empiredan · 2025-02-21T07:43:11Z

Motivation

Pegasus does not support duplicating atomic write requests including incr, check_and_set and check_and_mutate since they are not idempotent. In practice, various applications use atomic write interfaces in many scenarios. However, such applications cannot use duplication to synchronize data, and therefore cannot benefit from the high performance that duplication provides.

Design

Due to the urgency of the requirements, the first version of the idempotent implementation for the atomic writes should be as simple as possible, without making fundamental changes to the write path.

Therefore, we decided to implement the idempotence of atomic write requests as follows: for each replica, ensure that only one atomic write request is being processed in the write pipeline at any given time. Once the replica server receives an atomic request, it first caches the request. The request is not pushed into the write pipeline until all requests before it have been applied. The write pipeline consists of the following stages:

1. read the current value from RocksDB, calculate the final value according to specific semantics of requested atomic write and build the idempotent request based on it;
2. append the corresponding mutation to plog;
3. broadcast the prepare requests to the secondary replicas;
4. apply the final result back to RocksDB ultimately.

Primary replicas go through all four stages and then reply to the client, while secondary replicas only go through stages 2 and 4.

Task List

Support idempotence in replica server

Make incr requests idempotent:

Make check_and_set requests idempotent:

feat(make_idempotent): support making check_and_set request idempotent in pegasus_write_service::impl #2230

Infrastructure for idempotence:

Support idempotence in meta server

feat(make_idempotent): introduce new table-level APIs to meta servers to list/control idempotence for atomic writes #2205

Support idempotence in shell


…egasus_server_write` and `replication_app_base` (#2196) #2197

To support idempotence, a new interface `make_idempotent()` is introduced and an existing interface `on_batched_write_requests()` is changed for both the classes `pegasus_server_write` and `replication_app_base`. This is different from what we have done for `pegasus_write_service` and `pegasus_write_service::impl`, both of which provide `make_idempotent()` and `put()` by the following PRs:

- #2185
- #2192

`make_idempotent()` for `replication_app_base` is provided as a virtual function called by primary replicas, implemented internally by `pegasus_server_impl` and `pegasus_server_write`.

`on_batched_write_requests` for `replication_app_base` is the same. It is changed with a new parameter `original_request` added. It is just the original request received from the client. It must be an atomic request (i.e. `incr`, `check_and_set` and `check_and_mutate`) if it is non-null, and used to decide if a write request is atomic and generate the response corresponding to the atomic write request.

…rimary replicas (#2198) #2197

Suppose that a client issues an `incr` request to increase the base value `100` by `1`. If the current configuration requires all atomic write requests to be idempotent, the primary replica will make this request idempotent by the following steps after receiving it:

1. A mutation with `is_blocking = true` will be created to store this request and then added to the mutation queue as a blocking mutation.
2. Once this blocking mutation is ready to get popped, it will be the first element of the entire queue, thereby blocking it (which means no mutation can be dequeued from it).
3. This blocking mutation cannot get popped until all previous write requests have been applied.
4. After it is popped, the current base value `100` is read from the storage engine, and after performing the `incr` operation, a single put request is created to store the final value `101`.
5. Another mutation is then created to store this idempotent single put request, which is subsequently added to the write pipeline, including writing to `plog` and broadcasting to secondary replicas.
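The blocking behaviour in steps 1 to 3 can be pictured with a small queue model. This is only a sketch under stated assumptions: `Mutation`, `MutationQueue`, `pop` and `on_applied` are hypothetical names, the in-flight counter stands in for "previous write requests that have not been applied yet", and the real mutation queue tracks decrees rather than a simple count. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified mutation: only a label and the blocking flag.
struct Mutation {
    std::string name;
    bool is_blocking = false;
};

class MutationQueue {
public:
    void push(Mutation m) { queue_.push_back(std::move(m)); }

    // Returns the next mutation allowed into the write pipeline, if any.
    // A blocking mutation at the head is held back, together with everything
    // behind it, until all previously popped mutations have been applied.
    std::optional<Mutation> pop() {
        if (queue_.empty()) {
            return std::nullopt;
        }
        if (queue_.front().is_blocking && in_flight_ > 0) {
            return std::nullopt;
        }
        Mutation m = std::move(queue_.front());
        queue_.pop_front();
        // The popped mutation (or, for a blocking one, the idempotent put
        // derived from it) is now considered in flight.
        ++in_flight_;
        return m;
    }

    // Called when one in-flight mutation has been applied to storage.
    void on_applied() {
        if (in_flight_ > 0) {
            --in_flight_;
        }
    }

private:
    std::deque<Mutation> queue_;
    std::size_t in_flight_ = 0;
};
```

In this model a blocking mutation at the head stalls every mutation behind it, which is the cost the later row-lock change is meant to reduce.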

… to list/control idempotence for atomic writes (#2205) #2197

We support enabling the idempotence of atomic write operations at the **meta** level.

First, we introduce `atomic_idempotent` as **an attribute** of the `app_info` structure, which represents the basic properties of a table. This attribute is also persisted to remote storage and will be loaded from remote storage when the meta server starts.

As for the **interface**, we implement the following on the meta server:

- Support specifying this attribute when creating a table.
- List API will return the entire `app_info` object with `atomic_idempotent` for each table.
- Support getting/setting this attribute for any table.
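To make the shape of these three operations concrete, here is a minimal in-memory sketch. It is not the meta server API: `AppInfo` mirrors only the one attribute discussed here, and `TableRegistry` with its methods is a hypothetical stand-in that ignores persistence to remote storage and the RPC layer. It assumes C++17.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, trimmed-down table properties; the real app_info has many more fields.
struct AppInfo {
    std::string app_name;
    bool atomic_idempotent = false;
};

// Hypothetical in-memory registry standing in for the meta server's table state.
class TableRegistry {
public:
    // Create a table, optionally specifying the attribute. Returns false if it exists.
    bool create(const std::string &name, bool atomic_idempotent = false) {
        return tables_.emplace(name, AppInfo{name, atomic_idempotent}).second;
    }

    // List every table together with its atomic_idempotent attribute.
    std::vector<AppInfo> list() const {
        std::vector<AppInfo> result;
        for (const auto &entry : tables_) {
            result.push_back(entry.second);
        }
        return result;
    }

    // Get the attribute of one table; empty if the table does not exist.
    std::optional<bool> get_atomic_idempotent(const std::string &name) const {
        auto it = tables_.find(name);
        if (it == tables_.end()) {
            return std::nullopt;
        }
        return it->second.atomic_idempotent;
    }

    // Set the attribute of one table. Returns false if the table does not exist.
    bool set_atomic_idempotent(const std::string &name, bool enabled) {
        auto it = tables_.find(name);
        if (it == tables_.end()) {
            return false;
        }
        it->second.atomic_idempotent = enabled;
        return true;
    }

private:
    std::map<std::string, AppInfo> tables_;
};
```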

…nt atomic writes (#2214) #2197

Previously in #2198, we implemented idempotence for each atomic write by blocking the entire mutation queue until the 2PC pipeline was drained. However, this significantly affects performance since the pipeline is stalled and all write requests get stuck in the mutation queue.

To address this performance issue, we introduce a row-lock mechanism: each hash key and the highest decree currently in the 2PC phase are recorded in a hash table. For each atomic write request, if the maximum decree associated with its hash key has not yet been applied to the storage engine, the request is blocked in the mutation queue. Otherwise, the hash key is considered unlocked and the request can proceed into the 2PC phase at any time.

To avoid the performance overhead of deserialization, we directly use the `partition_hash` (an unsigned 64-bit integer) from the client instead of the hash key. This also makes memory usage more predictable, as the `partition_hash` has a fixed size.

Additionally, to mitigate the performance impact caused by frequent insertions and deletions in the row-lock hash table, we introduce an LRU strategy: keys are only evicted when the hash table exceeds a certain size threshold, and only the least recently used keys with no active usage are removed.

The following concrete example illustrates how atomic write requests are handled after introducing row locks. Suppose a client issues an `incr` request to a primary replica. If the primary replica has been configured to make all atomic write requests idempotent, then:

1. A mutation will be created as a blocking candidate to hold this atomic write request and then appended to the mutation queue.
2. This mutation will be blocked and cannot get popped as long as the hash key contained in it is locked (i.e. the maximum decree associated with the hash key has not been applied to the storage engine).
3. This mutation can get popped only after its hash key becomes unlocked.
4. Once the mutation is popped from the mutation queue, the current base value 100 is read from the storage engine, and a single put request is created to store the final value 101.
5. Another mutation is then created to hold this idempotent single put request.
6. Subsequently the new mutation enters the 2PC phase, where it is appended to `plog` and broadcast to secondary replicas.
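The row-lock table described above can be sketched as a hash map from `partition_hash` to the highest decree in 2PC, paired with an LRU list. This is an illustrative model, not the code from #2214: `RowLockTable` and its method names are hypothetical, and the sketch assumes decrees are applied in increasing order, so a single `last_applied_` value is enough to decide whether a key is still locked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

class RowLockTable {
    struct Entry {
        uint64_t partition_hash;
        int64_t max_decree;
    };

public:
    explicit RowLockTable(std::size_t capacity) : capacity_(capacity) {}

    // Record that the mutation with `decree`, touching `partition_hash`, entered 2PC.
    void on_enter_2pc(uint64_t partition_hash, int64_t decree) {
        auto it = index_.find(partition_hash);
        if (it != index_.end()) {
            if (decree > it->second->max_decree) {
                it->second->max_decree = decree;
            }
            // Move the entry to the front: it is now the most recently used.
            lru_.splice(lru_.begin(), lru_, it->second);
        } else {
            lru_.push_front(Entry{partition_hash, decree});
            index_[partition_hash] = lru_.begin();
        }
        evict();
    }

    // Record that every decree up to and including `decree` has been applied.
    void on_applied(int64_t decree) {
        if (decree > last_applied_) {
            last_applied_ = decree;
        }
        evict();
    }

    // A key is locked while its highest decree in 2PC has not been applied yet.
    bool is_locked(uint64_t partition_hash) const {
        auto it = index_.find(partition_hash);
        return it != index_.end() && it->second->max_decree > last_applied_;
    }

    std::size_t size() const { return index_.size(); }

private:
    // Evict only when over capacity, starting from the least recently used
    // entry, and never evict an entry that is still locked.
    void evict() {
        auto it = lru_.end();
        while (index_.size() > capacity_ && it != lru_.begin()) {
            --it;
            if (it->max_decree <= last_applied_) {
                index_.erase(it->partition_hash);
                it = lru_.erase(it);
            }
        }
    }

    std::size_t capacity_;
    int64_t last_applied_ = 0;
    std::list<Entry> lru_; // front is the most recently used entry
    std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};
```

Since distinct hash keys can share a `partition_hash`, such a table may occasionally hold back a request whose hash key is not actually in flight; that costs some concurrency but does not affect correctness.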

… `.app-info` file for each replica (#2220) #2197

In #2205, we introduced a new attribute `atomic_idempotent` to meta, which is used to enable/disable the idempotence of atomic write operations. This attribute should be broadcast to each replica: whenever a replica is promoted to primary, it can use this attribute to decide whether to make all atomic writes idempotent. This attribute will be persisted into the `.app-info` file to ensure it can be loaded after a restart.

…ls` commands on shell (#2221) #2197

Support `atomic_idempotent` for `create` and `ls` by:

- Add a flag to the `create` command to decide whether the created table is `atomic_idempotent` (false by default).
- Add a column `atomic_idempotent` to the result shown by `ls` command.

…to decide whether to make all atomic writes idempotent (#2222) #2197

Since `atomic_idempotent` has been synced from the meta server to the replica servers in #2220, it can now be used to decide whether a primary replica makes all atomic writes idempotent.

…mpotent` on shell (#2229) #2197

The following commands are supported on shell to operate `atomic_idempotent`, which decides whether all atomic requests written to a table are idempotent:

- `get_atomic_idempotent`
- `enable_atomic_idempotent`
- `disable_atomic_idempotent`

All of these commands operate on a table, thus an argument `app_name` is required.

empiredan added the type/enhancement Indicates new feature requests label Feb 21, 2025

empiredan mentioned this issue Feb 21, 2025

feat(make_idempotent): support making incr request idempotent in pegasus_server_write and replication_app_base #2196

Merged

empiredan self-assigned this Feb 25, 2025

empiredan mentioned this issue Feb 25, 2025

feat(make_idempotent): support making write requests idempotent for primary replicas #2198

Merged

empiredan mentioned this issue Mar 18, 2025

feat(make_idempotent): introduce new table-level APIs to meta servers to list/control idempotence for atomic writes #2205

Merged

empiredan mentioned this issue Mar 19, 2025

perf(make_idempotent): introduce row lock to improve 2PC for idempotent atomic writes #2214

Merged

This was referenced Apr 9, 2025

feat(make_idempotent): sync atomic_idempotent from meta server into .app-info file for each replica #2220

Merged

feat(make_idempotent): support atomic_idempotent for create and ls commands on shell #2221

Merged

empiredan mentioned this issue Apr 14, 2025

feat(make_idempotent): each primary replica uses atomic_idempotent to decide whether to make all atomic writes idempotent #2222

Merged

This was referenced Apr 15, 2025

feat(make_idempotent): support getting/enabling/disabling atomic_idempotent on shell #2229

Merged

feat(make_idempotent): support making check_and_set request idempotent in pegasus_write_service::impl #2230

Draft
