Replication Flow Control – Prioritizing replication traffic in the replica #1838
base: unstable
Conversation
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff            @@
##           unstable    #1838   +/-  ##
=========================================
  Coverage     70.97%   70.98%
=========================================
  Files           123      123
  Lines         65665    65686     +21
=========================================
+ Hits          46608    46628     +20
- Misses        19057    19058      +1
Force-pushed from d52eadb to e3bcd5f
Adds Replication flow control (repl-flow-control) to adjust replication read frequency based on buffer pressure. Helps replicas keep up with replication data and reduces primary buffer utilization and overflows.

- Dynamic replication read scaling based on buffer pressure.
- Reduces full syncs by increasing replication reads when needed.
- Improves replication responsiveness, reduces data staleness.
- Trade-off: slightly higher client latency due to replication prioritization.

Previously, replication was handled like a normal client. Under high load on the replica, replication lag increased, making data stale and causing primary buffer overflows, which triggered full syncs and high CPU/memory/I/O usage. With this change:

- Fewer full syncs from buffer overruns.
- Lower replication lag, fresher data on replicas.
- More stable primary buffer usage, less swapping.
- Slightly higher client latency due to replication prioritization.

Signed-off-by: xbasel <[email protected]>
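As a usage sketch: the option name repl-flow-control and its default of yes come from this PR; whether it is also settable at runtime via CONFIG SET is an assumption:

    # valkey.conf: let the replica prioritize replication reads under pressure
    repl-flow-control yes

    # Or, assuming the option is runtime-tunable:
    # CONFIG SET repl-flow-control no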
# If enabled, the replica invokes multiple reads per I/O event when it
# detects replication pressure.
#
# Default: yes
I have reservations about the default value being yes, due to the lower performance.
@@ -1181,6 +1181,7 @@ typedef struct client {
    /* Input buffer and command parsing fields */
    sds querybuf;     /* Buffer we use to accumulate client queries. */
    size_t qb_pos;    /* The position we have read in querybuf. */
    int qb_full_read; /* True if the last read returned the maximum allowed bytes */
It looks like a boolean variable (I checked the code; the variable is only ever 1 or 0). How about renaming it to is_qb_full_read or similar? That would be easier for others to read.
    return 0;
}

bool is_last_iteration = iteration >= server.repl_cur_reads_per_io_event;
I cannot find where the variable repl_cur_reads_per_io_event is initialized. Did you do it somewhere else? If not, it is dangerous; please initialize it.
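For illustration, a minimal sketch of the kind of initialization the reviewer is asking for; the function name and placement are assumptions, not code from the PR:

    /* Hypothetical init helper; only repl_cur_reads_per_io_event is from the PR. */
    static struct { int repl_cur_reads_per_io_event; } server;

    static void initReplFlowControlState(void) {
        /* Start conservatively at one read per I/O event; the adaptive
         * logic can ramp this up under pressure. */
        server.repl_cur_reads_per_io_event = 1;
    }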
bool is_last_iteration = iteration >= server.repl_cur_reads_per_io_event;

if (is_last_iteration) {
I suppose I don't really understand why this needs to be adaptive. As long as we get a full read, why can't we repeat up to the max I/O event size? There is a comment saying "The read limit ramps up gradually if full reads continue, avoiding sudden spikes.", but we are just deferring the spikes until later. Most users don't like replication lag; they would rather have the most up-to-date data if possible.
I have no strong opinion here, probably reading up to the max reads is enough (and simpler).
This looks to me like we're fixing a stability issue. Why would anyone want to disable it? Let's discuss whether we actually need a config for this. I think we may not need it, and we can just keep this always enabled.
I think the benchmark numbers don't give a fair picture. Without this feature, there is a problem of replication lag, and a full sync means extra resources used by replica and primary, and maybe even extra latency for commands sent to the primary.
Also, this affects only the latency of read-from-replica traffic. I think that is less important than read-from-primary. If the replica is not fast enough, the client can read from the primary or from another replica. If they do that, it's good, because it reduces the load on the overloaded replica.
Overview
This PR introduces Replication Flow Control (repl-flow-control), a dynamic mechanism that prioritizes replication traffic on the replica side. By detecting replication pressure and adjusting read frequency adaptively, it reduces the risk of primary buffer overflows and full syncs.

Problem
In high-load scenarios, a replica might not consume replication data fast enough, leading to backpressure on the primary. When the primary’s buffer overflows, it drops the replica connection, triggering a full sync, a costly operation that impacts system performance.
Without this feature, replication is handled like any other client: under high load the replica falls behind, data grows stale, and the primary's buffer overflows, dropping the replica connection and forcing a full sync.
Solution: Replication Flow Control

repl-flow-control enables the replica to dynamically increase its replication read rate if it detects that replication data is accumulating. The mechanism operates as follows:

Detecting replication pressure
Each read from the primary is checked against the maximum read byte limit. A read that hits this limit suggests more data is likely available.
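A minimal sketch of this check, assuming illustrative names around the one field the diff adds (qb_full_read); the surrounding read path is a stand-in, not the PR's actual code:

    #include <stddef.h>
    #include <sys/types.h>

    typedef struct client {
        int qb_full_read; /* 1 if the last read returned the maximum allowed bytes */
    } client;

    /* Record whether a read from the primary filled the per-read budget. */
    static void markReadPressure(client *c, ssize_t nread, size_t readlen) {
        /* A read that returns exactly the requested length suggests the
         * socket still holds more replication data. */
        c->qb_full_read = (nread == (ssize_t)readlen);
    }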
Prioritizing replication reads
If replication pressure is detected, the replica invokes multiple reads per I/O event instead of a single one. This allows the replica to catch up faster, reducing memory consumption and buffer overflows on the primary.
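A runnable toy sketch of that read loop; only repl_cur_reads_per_io_event and the is_last_iteration check mirror the diff shown above, while the stub read function and chunk counts are illustrative:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative stand-ins; only repl_cur_reads_per_io_event matches the PR diff. */
    static struct { int repl_cur_reads_per_io_event; } server = {4};
    static int pending_chunks = 10; /* pretend the socket holds 10 full-sized chunks */

    /* Stub for a single read from the primary; true means the read was "full". */
    static bool readFromPrimaryOnce(void) {
        if (pending_chunks > 0) {
            pending_chunks--;
            return true;
        }
        return false;
    }

    int main(void) {
        int iteration = 0;
        for (;;) {
            iteration++;
            bool full_read = readFromPrimaryOnce();
            bool is_last_iteration = iteration >= server.repl_cur_reads_per_io_event;
            /* Stop when the per-event budget is exhausted or the socket drained. */
            if (is_last_iteration || !full_read) break;
        }
        printf("performed %d reads in this I/O event\n", iteration);
        return 0;
    }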
Adaptive flow control
The read limit ramps up gradually if full reads continue, avoiding sudden spikes. If a read does not fill the buffer, the limit is reduced. Increases are rate-limited to once every 100ms, avoiding over-aggressive read bursts. There's a configurable maximum number of reads.
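A sketch of how the ramp could look under the rules stated above (gradual increase, decrease on partial reads, 100 ms rate limit on increases, configurable maximum); apart from repl_cur_reads_per_io_event, all names here are hypothetical:

    #include <stdbool.h>

    typedef long long mstime_t;

    /* Only repl_cur_reads_per_io_event is named in the PR diff; the ceiling
     * and timestamp fields below are assumed for illustration. */
    static struct {
        int repl_cur_reads_per_io_event; /* current per-event read budget */
        int repl_max_reads_per_io_event; /* configured ceiling (assumed name) */
        mstime_t repl_last_ramp_up_ms;   /* last time the budget was raised */
    } server = {1, 16, 0};

    static void adjustReplReadBudget(bool full_read, mstime_t now_ms) {
        if (full_read) {
            /* Ramp up at most once every 100 ms to avoid read bursts. */
            if (now_ms - server.repl_last_ramp_up_ms >= 100 &&
                server.repl_cur_reads_per_io_event < server.repl_max_reads_per_io_event) {
                server.repl_cur_reads_per_io_event++;
                server.repl_last_ramp_up_ms = now_ms;
            }
        } else if (server.repl_cur_reads_per_io_event > 1) {
            /* A partial read means pressure eased; shrink the budget. */
            server.repl_cur_reads_per_io_event--;
        }
    }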
Performance Impact
Test setup:
Latency and Throughput Changes
📌 Observations:
TODO
Implements #1596