feat(eviction): Add eviction based on RSS memory. DRAFT #4772
Conversation
fixes dragonflydb#4011 Signed-off-by: Stepan Bagritsevich <[email protected]>
I have not looked at the code yet, just commenting on the PR description. Thank you for making the effort to describe the background behind the PR.
In the end, the goal is to understand how much RSS is being used by the app (and we have this ability via …). We can also explore the option of testing later versions of mimalloc (i.e. 2.2.2).
@adiholden so based on … `MEMORY DEFRAGMENT` helps. Strange that when I set `mem_defrag_threshold` to 0.2 it didn't help.
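For what it's worth, the manual check above can be scripted roughly like this (assuming the `MEMORY DEFRAGMENT` command mentioned above and the standard `INFO MEMORY` fields; exact field names and timings may differ):

```python
import time
import redis

client = redis.Redis(port=6379)

before = client.info("memory")["used_memory_rss"]
client.execute_command("MEMORY", "DEFRAGMENT")  # the command referenced above
time.sleep(5)                                   # defragmentation runs in the background
after = client.info("memory")["used_memory_rss"]
print(f"RSS before: {before}, after: {after}")
```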
Given that, I would like to suggest changing the problem statement. We wanted to trigger eviction based on RSS to account for use-cases where the default heap uses lots of memory but the data heap shows relatively low levels, which does not wake eviction. This is the use-case we wanted to fix, but the RSS direction seems unreliable. I think we should write a design document stating why eviction based on RSS does not work reliably, mention the case of defragmentation and allocator caching interfering, and suggest another solution: we still use "used memory" from the data heap like today, but we also account for the used memory of the default heap. We should also fix the defragmentation heuristic to be more aggressive in low-memory use-cases, but that's another task. @adiholden - thoughts?
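To illustrate the suggested direction (all names below are illustrative, not actual Dragonfly identifiers), the accounting change boils down to something like this:

```python
def should_trigger_eviction(data_heap_used: int,
                            default_heap_used: int,
                            max_memory: int,
                            eviction_threshold: float = 0.9) -> bool:
    """Sketch of the suggestion above: keep driving eviction from "used memory"
    as today, but also charge the default heap, so a bloated default heap wakes
    eviction even when the data heap alone looks healthy."""
    effective_used = data_heap_used + default_heap_used
    return effective_used > eviction_threshold * max_memory
```

The point is only that the trigger compares the combined usage against the limit instead of relying on RSS.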
fixes #4011
THIS IS A DRAFT.
This PR was created to investigate how RSS memory behaves and whether eviction can be triggered based on it.
Current problem: mimalloc does not release RSS memory back to the OS (as was mentioned here and here). Alternatively, the OS might not be reclaiming this freed memory due to a lack of memory pressure (as noted here). This is a common issue in mimalloc, and I have tried various options, but only one of them helped. Issues that I have used during the investigation: 1, 2, 3, 4, 5
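For reference, mimalloc's purging behaviour can also be tweaked from the outside via its environment-variable options (assuming a mimalloc version that has `mi_option_purge_delay`, i.e. v2.1+); this is only a sketch of how one might reproduce the experiment, not what the PR itself does:

```python
import os
import subprocess

# Illustrative only: start the server with a reduced mimalloc purge delay so that
# freed pages are returned to the OS sooner (the default delay is around 10ms).
env = dict(os.environ)
env["MIMALLOC_PURGE_DELAY"] = "0"   # mimalloc option names map to MIMALLOC_<OPTION> env vars
env["MIMALLOC_SHOW_STATS"] = "1"    # print allocator statistics on exit

subprocess.run(["./dragonfly", "--cache_mode=true", "--maxmemory=1gb"], env=env)
```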
The steps I took:

- Added the test `test_cache_eviction_with_rss_deny_oom_simple_case`. It simply generates data and checks that we are not evicting too many items (since RSS is not decreasing). Thus, the `used_memory` after eviction should be around 70% of `max_memory` (as `rss_deny_oom_ratio` is set to 80%).
- Tried `force_decommit_threshold` … items.
- Set `mi_option_purge_delay` to 0, and it helped. I don't know why it works, as it only reduces the memory purging delay from 10ms to 0ms. However, when it was set to the default value, RSS memory didn't change at all; after setting it to 0, it started decreasing with a small delay. Still, we are evicting a lot due to that delay.

From the output, we can see that we are not evicting too much data: the current used memory stopped decreasing and is slightly below the RSS eviction threshold. I also checked the mimalloc stats, and it frees around 150MiB, which is the expected behavior.
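To make the expected numbers concrete, the check the simple-case test performs amounts to roughly the following (the field names and tolerance are illustrative, not the actual test code):

```python
import redis

# After populating and letting eviction settle, used_memory should hover around
# 70% of maxmemory (with rss_deny_oom_ratio at 0.8), i.e. we did not over-evict.
client = redis.Redis(port=6379)
info = client.info("memory")

max_memory = info["maxmemory"]
used_memory = info["used_memory"]

assert 0.65 * max_memory <= used_memory <= 0.8 * max_memory, (
    f"unexpected eviction level: used={used_memory}, max={max_memory}"
)
```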
I also added `test_cache_eviction_with_rss_deny_oom_two_waves`, which does almost the same but with two waves of data population. The first wave remains the same. Then, we wait until eviction stops and populate around 20% of `max_memory`. The problems are:

1. Since RSS used memory does not decrease after the first wave, we get an OOM error for the second wave from the `ShouldDenyOnOOM` method.
2. I know how to fix it, but for testing purposes, I temporarily removed this logic to check how eviction works for the second wave.
3. As a result, we are not evicting much data for the second wave due to how the eviction algorithm works.
4. I also know how to fix this, but after fixing it, a potential issue could arise where, in some cases, we might evict a large amount of data (though still much less than in our initial naive approach).
So, the current approach is not aggressive, meaning that in all cases, it will either evict a sufficient amount of data or slightly less than needed, leading to an OOM. I think this is better than evicting more than necessary.
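For clarity, the deny-on-OOM behaviour that blocks the second wave boils down to a check along these lines (a simplified model, not the actual `ShouldDenyOnOOM` implementation):

```python
def should_deny_on_oom(rss_used_memory: int,
                       max_memory: int,
                       rss_deny_oom_ratio: float = 0.8) -> bool:
    """If RSS stays high after the first wave (because the allocator has not
    returned freed pages to the OS), this keeps returning True and the second
    wave of writes is rejected with OOM even though the data heap has room."""
    return rss_used_memory > rss_deny_oom_ratio * max_memory
```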
Conclusion:
We need to find a way to accurately calculate real RSS usage. The possible steps are:
- `echo 1 | sudo tee /proc/sys/vm/drop_caches`
- `echo 1 | sudo tee /proc/sys/vm/overcommit_memory`

If these commands work, we can invoke them from the control plane side when we detect a significant difference between `used_memory` and `rss_used_memory`.

Since we will improve our RSS memory calculation, even if it updates with a delay, the current approach should work fine in most cases. In some scenarios, we may still encounter OOM, but I don't think this will be a common case.
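A rough sketch of that control-plane check (the `used_memory_rss` field name, the 1.5x trigger, and the privileges to write to `/proc` are all assumptions here):

```python
import subprocess
import redis

DRIFT_RATIO = 1.5  # illustrative: how far RSS may drift above used_memory before we act

client = redis.Redis(port=6379)
info = client.info("memory")
used = info["used_memory"]
rss = info["used_memory_rss"]

if rss > DRIFT_RATIO * used:
    # Ask the kernel to drop clean page-cache pages, then re-check whether the
    # reported RSS converges back towards used_memory.
    subprocess.run(["sudo", "tee", "/proc/sys/vm/drop_caches"],
                   input=b"1", check=True, stdout=subprocess.DEVNULL)
```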
Note: I also tried changing the `mem_defrag_threshold`, but it didn't help. So, I believe the issue is not related to fragmentation.