WIP: CNF-18922: Dynamic memory enforcement #1369
base: main
Conversation
@MarSik: This pull request references CNF-16642 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: MarSik. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retitle CNF-18922: Dynamic memory enforcement
@MarSik: This pull request references CNF-18922 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retitle WIP: CNF-18922: Dynamic memory enforcement
/retest-required
So this is a quick workaround until kubernetes/enhancements/issues/5460 gets resolved, right?
@jmencak A workaround and demonstrator. Yep.
f56b246 to 6dd716e
/retest-required
@MarSik: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/relabel CNF-18922: Dynamic memory enforcement |
import os.path

ROOT="/sys/fs/cgroup/kubepods.slice"
ROOT_MAX=os.path.join(ROOT, "memory.max")
The baseline, or the entry point from which we perform all the calculations, is /sys/fs/cgroup/kubepods.slice/memory.max.
Does this value already have the system-reserved memory subtracted from it?
msivak@msivak-thinkpadx1carbongen12 ~> oc describe node testnode | grep -i 'memory:'
memory: 130895396Ki # capacity
memory: 129768996Ki # allocatable
sh-5.1# cat /sys/fs/cgroup/kubepods.slice/memory.max
132988309504
msivak@msivak-thinkpadx1carbongen12 ~> expr 132988309504 / 1024
129871396
I would say: Yes, it already does.
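For reference, a minimal verification sketch (not part of the PR) that reads the same file and prints the value in Ki so it can be compared directly against the allocatable number from oc describe node; the path matches the one used in the script above, everything else is illustrative:

import os.path

ROOT = "/sys/fs/cgroup/kubepods.slice"
ROOT_MAX = os.path.join(ROOT, "memory.max")

with open(ROOT_MAX) as f:
    raw = f.read().strip()

# memory.max holds either a byte count or the literal "max" (no limit)
if raw == "max":
    print("kubepods.slice has no memory limit set")
else:
    root_max_bytes = int(raw)
    # on the node above: 132988309504 // 1024 == 129871396 Ki, which is close
    # to the 129768996 Ki allocatable value, so system-reserved memory is
    # already excluded from this limit
    print(f"memory.max: {root_max_bytes} B (~{root_max_bytes // 1024} Ki)")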
-if profilecomponent.IsEnforceReservedMemoryEnabled(profile) {
+// Add dynamic memory enforcement service only if annotation is enabled and workload partitioning is enabled
+clusterHasWorkloadPartitioning := opts.PinningMode != nil && *opts.PinningMode == apiconfigv1.CPUPartitioningAllNodes
+if profilecomponent.IsEnforceReservedMemoryEnabled(profile) && clusterHasWorkloadPartitioning {
I think we should also check that the MemoryManager is configured to Static, which is supposed to be obvious if the topologyManager is configured to single-numa-node or restricted, but we're not checking that in this flow.
Why? This has nothing to do with NUMA or logical memory tracking. The code enforces the runtime values only.
But the whole point was to avoid consuming the reserved memory used for the housekeeping tasks, right? If we don't have the topologyManager configured to single-numa-node or restricted, we won't have the memory manager enabled, which requires configuring reserved memory for the system.
On second thought, in the worst case the script will work but will do nothing useful.
burstable_current = int(open(BURSTABLE_CURRENT).read().strip())
guaranteed_current = guaranteed_max()

burstable_max = root_max - guaranteed_current
I need to double-check that memory reserved for workload partitioning is subtracted here as well.
This series adds a systemd service to the tuned system. The service runs in a CPU-manager-like reconcile loop: it monitors all pod cgroups' memory and sets limits on kubepods-burstable and kubepods-besteffort to protect the system from runaway pods that have no limits of their own.
kubepods-burstable is set to: node allocatable memory - sum(all guaranteed containers' memory limits)
kubepods-besteffort is set to: kubepods-burstable limit - kubepods-burstable current consumption
Note: The boilerplate code was generated by the claude-4-sonnet AI model as interpreted by the Cursor IDE. The Python service is my own.
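To make the two formulas above concrete, here is a minimal sketch of one reconcile iteration, assuming the cgroup v2 layout referenced in the review (guaranteed pod slices directly under kubepods.slice, burstable and best-effort pods under their own child slices). The helper names and the glob pattern are illustrative and do not reproduce the actual Python service in this PR.

import glob
import os.path

ROOT = "/sys/fs/cgroup/kubepods.slice"
ROOT_MAX = os.path.join(ROOT, "memory.max")
BURSTABLE = os.path.join(ROOT, "kubepods-burstable.slice")
BESTEFFORT = os.path.join(ROOT, "kubepods-besteffort.slice")

def read_value(path):
    # cgroup v2 memory files hold either a byte count or the literal "max"
    raw = open(path).read().strip()
    return None if raw == "max" else int(raw)

def guaranteed_max():
    # sum memory.max over guaranteed pods, i.e. the pod slices that live
    # directly under kubepods.slice instead of the burstable/besteffort slices
    total = 0
    for path in glob.glob(os.path.join(ROOT, "kubepods-pod*", "memory.max")):
        limit = read_value(path)
        if limit is not None:
            total += limit
    return total

def reconcile_once():
    root_max = read_value(ROOT_MAX)
    if root_max is None:
        return  # nothing to enforce without a top-level limit

    # kubepods-burstable = node allocatable - sum(guaranteed memory limits)
    burstable_max = root_max - guaranteed_max()
    # kubepods-besteffort = burstable limit - burstable current consumption
    burstable_current = read_value(os.path.join(BURSTABLE, "memory.current")) or 0
    besteffort_max = burstable_max - burstable_current

    with open(os.path.join(BURSTABLE, "memory.max"), "w") as f:
        f.write(str(burstable_max))
    with open(os.path.join(BESTEFFORT, "memory.max"), "w") as f:
        f.write(str(besteffort_max))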