Commit 3f6581b

JimB123 and zuiderkwast authored
Update and stabilize defrag tests (#1762)
A number of tests related to defrag have had stability problems. One reason is that the tests run sequentially on the same server, and it takes jemalloc some time to release freed pages. This is especially noticeable immediately after a flushall: over a period of 10 seconds, the "fragmentation bytes" can be observed to decrease by several MB.

Another reason is that there's no standardization between tests. For each test, people have been independently hacking/tweaking the success criteria without addressing underlying issues.

This update revamps all of the defrag tests:

* A fresh server is started for each test. Running each test in isolation improves stability.
* A uniform function `log_frag` is now used for debug logging.
* A uniform function `perform_defrag_test` ensures that each test is written and executed in a uniform fashion. Limits are imposed to ensure that the defrag results are consistent/reproducible. The intent is to eliminate failures due to various tweaks to values in individual tests.
* Latency is tested much more strictly for most tests, reflecting the recent improvements to defrag latency.
* The test `defrag edge case` has been removed. This test attempted to create N pages with EXACTLY equal fragmentation in an attempt to confuse the defrag logic. It's unlikely that this test was performing correctly, and it had questionable value.
* Tests for hash/list/set/zset/stream have been separated and standardized. It was unlikely that the old test was performing properly, as none of the actual data structures were fragmented!

It's noted that pubsub doesn't appear to be defragging correctly. The old test was based on deletion of strings (only), which doesn't actually reflect what happens when a pubsub channel is removed. The test has been reduced to only check that pubsub is not damaged during defrag; it doesn't test for defrag efficacy. This isn't likely a significant issue, as it would be unlikely to create many thousands of pubsub channels and then have associated fragmentation issues. #1774

Resolves: #1746

---------

Signed-off-by: Jim Brunner <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
1 parent 6156590 commit 3f6581b

2 files changed: +514 -642 lines changed

tests/support/test.tcl (+6 -2)
@@ -124,15 +124,19 @@ proc assert_refcount_morethan {key ref} {
 
 # Wait for the specified condition to be true, with the specified number of
 # max retries and delay between retries. Otherwise the 'elsescript' is
-# executed.
-proc wait_for_condition {maxtries delay e _else_ elsescript} {
+# executed. If 'debugscript' is provided, it is executed after failure of
+# the condition (before the retry delay).
+proc wait_for_condition {maxtries delay e _else_ elsescript {_debug_ ""} {debugscript ""}} {
     while {[incr maxtries -1] >= 0} {
         set errcode [catch {uplevel 1 [list expr $e]} result]
         if {$errcode == 0} {
             if {$result} break
         } else {
             return -code $errcode $result
         }
+        if {$_debug_ == "debug"} {
+            uplevel 1 $debugscript
+        }
         after $delay
     }
     if {$maxtries == -1} {
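
The hunk above adds an optional `debug` clause to `wait_for_condition`. As a rough illustration of how a caller might use it, here is a minimal, self-contained sketch. The proc is copied from the diff, except for the body of the final `if`, which the diff cuts off; running `elsescript` in the caller's scope there is an assumption. The `next_value` helper and the `polls` and `debug_log` variables are hypothetical names made up for this example and are not part of the commit.

```tcl
# Copy of wait_for_condition as shown in the diff. The body of the last
# "if" is truncated in the diff; evaluating elsescript there is an assumption.
proc wait_for_condition {maxtries delay e _else_ elsescript {_debug_ ""} {debugscript ""}} {
    while {[incr maxtries -1] >= 0} {
        set errcode [catch {uplevel 1 [list expr $e]} result]
        if {$errcode == 0} {
            if {$result} break
        } else {
            return -code $errcode $result
        }
        # New in this commit: run the debug script after each failed check,
        # before sleeping for the retry delay.
        if {$_debug_ == "debug"} {
            uplevel 1 $debugscript
        }
        after $delay
    }
    if {$maxtries == -1} {
        # Assumed behavior: all retries were used up, so run the else-script.
        uplevel 1 $elsescript
    }
}

# Hypothetical stand-in for a server metric: a counter that grows per check.
set polls 0
set debug_log {}
proc next_value {} {
    global polls
    return [incr polls]
}

# The condition becomes true on the third check, so the debug script
# runs after each of the two failed checks.
wait_for_condition 10 1 {
    [next_value] >= 3
} else {
    error "condition never became true"
} debug {
    lappend debug_log "not ready yet (polls=$polls)"
}

puts "polls=$polls"
puts [join $debug_log "\n"]
```

Because both new parameters default to empty strings, existing callers that pass only the `else` clause are unaffected; the debug script runs only when the sixth argument is the literal word `debug`.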
