Replication Flow Control – Prioritizing replication traffic in the replica #1838

xbasel · 2025-03-11T17:29:17Z

Overview

This PR introduces Replication Flow Control (repl-flow-control), a dynamic mechanism that prioritizes replication traffic on the replica side. By detecting replication pressure and adjusting read frequency adaptively, it reduces the risk of primary buffer overflows and full syncs.

Problem

In high-load scenarios, a replica might not consume replication data fast enough, leading to backpressure on the primary. When the primary’s buffer overflows, it drops the replica connection, triggering a full sync, a costly operation that impacts system performance.

Without this feature:

Replication reads occur at a fixed rate, irrespective of data pressure.
If the replica falls behind, the primary accumulates replication data, leading to higher memory utilization.
Once the primary buffer overflows, the connection drops, forcing a full sync.
Full syncs cause high memory, CPU, and I/O spikes.

Solution: Replication Flow Control

repl-flow-control enables the replica to dynamically increase its replication read rate if it detects that replication data is accumulating. The mechanism operates as follows:

Detecting replication pressure
Each read from the primary is checked against the maximum number of bytes a single read may return. A read that hits this limit suggests that more data is likely available.

Prioritizing replication reads
If replication pressure is detected, the replica invokes multiple reads per I/O event instead of a single one. This allows the replica to catch up faster, reducing memory consumption and buffer overflows on the primary.

Adaptive flow control
The read limit ramps up gradually if full reads continue, avoiding sudden spikes. If a read does not fill the buffer, the limit is reduced. Increases are rate-limited to once every 100ms, avoiding over-aggressive read bursts. There's a configurable maximum number of reads.
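The three steps above can be sketched as follows. This is a minimal illustration and not the code in this PR: `flow_ctl`, `flow_ctl_on_readable`, `fake_read`, and the step size of one read per adjustment are assumptions made for the example. Only the behavior (full-read detection, multiple reads per event, a 100 ms ramp interval, and a configurable maximum) comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RAMP_INTERVAL_MS 100 /* increases happen at most once per 100 ms */

/* Hypothetical state; the PR keeps similar values on the server struct. */
typedef struct {
    int cur_reads;          /* reads currently allowed per I/O event */
    int max_reads;          /* configurable upper bound */
    long long last_ramp_ms; /* time of the last increase */
} flow_ctl;

/* Callback that reads up to read_limit bytes and returns the byte count. */
typedef size_t (*read_fn)(void *ctx, size_t read_limit);

/* Pressure signal: the read returned as many bytes as it was allowed to. */
static bool is_full_read(size_t nread, size_t read_limit) {
    return nread >= read_limit;
}

/* Raise the limit (rate-limited) after full reads, lower it otherwise.
 * Assumption: both adjustments move by exactly one read. */
static void flow_ctl_update(flow_ctl *fc, bool last_read_full, long long now_ms) {
    if (!last_read_full) {
        if (fc->cur_reads > 1) fc->cur_reads--;
        return;
    }
    if (fc->cur_reads < fc->max_reads &&
        now_ms - fc->last_ramp_ms >= RAMP_INTERVAL_MS) {
        fc->cur_reads++;
        fc->last_ramp_ms = now_ms;
    }
}

/* Handle one readable event: keep reading while reads stay full, up to the
 * current limit. Returns the number of reads performed. */
static int flow_ctl_on_readable(flow_ctl *fc, read_fn do_read, void *ctx,
                                size_t read_limit, long long now_ms) {
    assert(fc->cur_reads >= 1 && fc->cur_reads <= fc->max_reads);
    int reads = 0;
    bool full = true;
    while (full && reads < fc->cur_reads) {
        full = is_full_read(do_read(ctx, read_limit), read_limit);
        reads++;
    }
    flow_ctl_update(fc, full, now_ms);
    return reads;
}

/* Stand-in for the primary's socket: hands out up to read_limit of the
 * pending bytes tracked in *ctx. */
static size_t fake_read(void *ctx, size_t read_limit) {
    size_t *pending = ctx;
    size_t n = *pending < read_limit ? *pending : read_limit;
    *pending -= n;
    return n;
}
```

With `flow_ctl fc = {1, 4, 0}` and a large backlog behind `fake_read`, repeated calls to `flow_ctl_on_readable` spaced at least 100 ms apart raise `cur_reads` one step at a time, and a single short read lowers it again.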

Performance Impact

Test setup:

Bombard the replica with expensive commands, leading to high CPU utilization
Write to the main database to trigger replication traffic

Latency and Throughput Changes

Metric	Before (repl-flow-control Disabled)	After (repl-flow-control Enabled)
Throughput (requests/sec)	941.71	760.98
Avg Latency (ms)	52.865	65.534
p50 Latency (ms)	59.743	68.543
p95 Latency (ms)	79.231	106.687
p99 Latency (ms)	90.303	126.527
Max Latency (ms)	188.031	385.535
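For readers who prefer relative numbers, the table can be restated as percentage changes. The helper below is a hypothetical convenience, not part of the PR. Applied to the figures above, it gives roughly a 19% drop in throughput and a 40% rise in p99 latency.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean the metric went down. */
static double pct_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, `pct_change(941.71, 760.98)` covers the throughput row and `pct_change(90.303, 126.527)` the p99 latency row.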

📌 Observations:

Replication stability improves: no full syncs were observed after enabling flow control.
Higher latency for normal clients due to increased resource allocation for replication.
CPU and memory usage remain stable, with no major overhead.
Replica throughput slightly decreases as replication takes priority.

TODO

Consider limiting the maximum number of reads per event to a ratio of the total number of events returned by the epoll cycle. For example, if the ratio is 20% and EPOLL returns 100 events, the replica can read from the primary up to 20 times per primary I/O event.
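One way the proposed ratio could be computed is sketched below. The function name, the integer-percent parameter, and the floor of one read are assumptions for illustration, since the TODO does not specify them.

```c
#include <assert.h>

/* Cap on reads from the primary for one I/O event, derived from the number
 * of events returned by the current poll cycle.
 * Assumption: at least one read is always allowed, even for small cycles. */
static int max_reads_for_cycle(int num_events, int ratio_percent) {
    assert(num_events >= 0);
    assert(ratio_percent >= 0 && ratio_percent <= 100);
    int cap = num_events * ratio_percent / 100;
    return cap < 1 ? 1 : cap;
}
```

With a ratio of 20 and 100 returned events this yields 20, matching the example in the TODO. A cycle with only 3 events falls back to a single read.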

Implements #1596

codecov · 2025-03-11T17:45:13Z

Codecov Report

Attention: Patch coverage is 80.00000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 70.98%. Comparing base (bcd2f95) to head (e846d9e).

Files with missing lines	Patch %	Lines
src/config.c	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##           unstable    #1838   +/-   ##
=========================================
  Coverage     70.97%   70.98%           
=========================================
  Files           123      123           
  Lines         65665    65686   +21     
=========================================
+ Hits          46608    46628   +20     
- Misses        19057    19058    +1

Files with missing lines	Coverage Δ
src/networking.c	`88.95% <100.00%> (+0.06%)`	⬆️
src/server.c	`87.54% <100.00%> (-0.03%)`	⬇️
src/server.h	`100.00% <ø> (ø)`
src/config.c	`78.14% <0.00%> (-0.25%)`	⬇️

... and 14 files with indirect coverage changes

Adds Replication flow control (repl-flow-control) to adjust replication read frequency based on buffer pressure. Helps replicas keep up with replication data and reduces primary buffer utilization and overflows.

- Dynamic replication read scaling based on buffer pressure.
- Reduces full syncs by increasing replication reads when needed.
- Improves replication responsiveness, reduces data staleness.
- Trade-offs: Slightly higher client latency due to replication prioritization.

Replication was handled like a normal client. Under high load in the replica, replication lag increased, making data stale and causing primary buffer overflows, which triggered full syncs and high CPU/memory/I/O usage.

- Fewer full syncs from buffer overruns.
- Lower replication lag, fresher data on replicas.
- More stable primary buffer usage, less swapping.
- Slightly higher client latency due to replication prioritization.

Signed-off-by: xbasel <[email protected]>

hwware · 2025-03-12T15:06:41Z

valkey.conf

+# If enabled, the replica invokes multiple reads per I/O event when it
+# detects replication pressure.
+#
+# Default: yes


I have reservations about defaulting this to yes, given the lower performance.

hwware · 2025-03-12T15:19:35Z

src/server.h

@@ -1181,6 +1181,7 @@ typedef struct client {
    /* Input buffer and command parsing fields */
    sds querybuf;        /* Buffer we use to accumulate client queries. */
    size_t qb_pos;       /* The position we have read in querybuf. */
+    int qb_full_read;    /* True if the last read returned the maximum allowed bytes */


This looks like a boolean variable (I checked the code, and the value is always 1 or 0). How about renaming it to is_qb_full_read or similar? That would be easier for others to read.

hwware · 2025-03-12T15:47:59Z

src/networking.c

+        return 0;
+    }
+
+    bool is_last_iteration = iteration >= server.repl_cur_reads_per_io_event;


I cannot find where repl_cur_reads_per_io_event is initialized. Is it done somewhere else? If not, that is dangerous, so please initialize it.

madolson · 2025-03-12T17:09:07Z

src/networking.c

+
+    bool is_last_iteration = iteration >= server.repl_cur_reads_per_io_event;
+
+    if (is_last_iteration) {


I suppose I don't really understand why this needs to be adaptive. As long as we got a full read, why can't we repeat up until the max io event size? There is a comment about "The read limit ramps up gradually if full reads continue, avoiding sudden spikes.", but we are just deferring the spikes until later. Most users don't like replication lag, they would rather have the most up to date data if possible.

xbasel · 2025-03-12

> I suppose I don't really understand why this needs to be adaptive. As long as we got a full read, why can't we repeat up until the max io event size? There is a comment about "The read limit ramps up gradually if full reads continue, avoiding sudden spikes.", but we are just deferring the spikes until later. Most users don't like replication lag, they would rather have the most up to date data if possible.

I have no strong opinion here; reading up to the max reads is probably enough (and simpler).

zuiderkwast

This looks to me like we're fixing a stability issue. Why would anyone want to disable it? Let's discuss if we actually need a config for this. I think we maybe don't need it and we can just keep this always enabled.

I think the benchmark numbers don't give a fair picture. Without this feature there is a replication lag problem, and a full sync means extra resources used by both the replica and the primary, and maybe even extra latency for commands sent to the primary.

Also, this affects only the latency of read-from-replicas. I think this is less important than read-from-primary. If the replica is not fast enough, the client can read from the primary or from another replica. If they do that, it's good because it will reduce the load on the overloaded replica.

xbasel marked this pull request as draft March 11, 2025 17:42

xbasel force-pushed the flowcontrol branch from 8bc5f5b to b2783ee Compare March 11, 2025 17:45

xbasel changed the title ~~Replication Flow Control – Prioritizing replica reads to prevent primary buffer overflows and high replication lag~~ Replication Flow Control – Prioritizing replication traffic in the replica side Mar 11, 2025

xbasel force-pushed the flowcontrol branch 5 times, most recently from d52eadb to e3bcd5f Compare March 11, 2025 18:45

xbasel force-pushed the flowcontrol branch from e3bcd5f to e846d9e Compare March 11, 2025 19:18

xbasel mentioned this pull request Mar 11, 2025

[NEW] Introduce Quality-of-Service for the replication stream to reduce full sync as a result of buffer overruns #1596

Open

xbasel changed the title ~~Replication Flow Control – Prioritizing replication traffic in the replica side~~ Replication Flow Control – Prioritizing replication traffic in the replica Mar 11, 2025

xbasel marked this pull request as ready for review March 11, 2025 20:09

hwware reviewed Mar 12, 2025

madolson reviewed Mar 12, 2025

zuiderkwast reviewed Mar 18, 2025
