OCPBUGS-70201: ctrcfg: set increase ulimits when upgrading from 4.20 to 4.21#5516
Conversation
|
@haircommander: This pull request references Jira Issue OCPBUGS-70201, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
pkg/controller/container-runtime-config/container_runtime_config_controller.go
Outdated
Show resolved
Hide resolved
c19a3dd to
6d7228b
Compare
|
failures seem the same as #5459 /skip |
|
/test e2e-aws-ovn |
|
|
||
| ctrl := &Controller{ | ||
| templatesDir: templatesDir, | ||
| namespace: namespace, |
There was a problem hiding this comment.
thought (non-blocking): Will this ConfigMap always be in the MCO namespace? If so, could we instead reference ctrlcommon.MCONamespace?
There was a problem hiding this comment.
works for me, updated
| const ( | ||
| componentName = "machine-config-controller" | ||
| componentName = "machine-config-controller" | ||
| componentNamespace = "openshift-machine-config-operator" |
There was a problem hiding this comment.
thought (non-blocking): We have a constant MCONamespace in github.com/openshift/machine-config-operator/pkg/controller/common aka ctrlcommon.MCONamespace that could be used instead.
|
|
||
| // Create the crio-default-ulimits MC for all the available pools | ||
| for _, pool := range mcpPoolsAll { | ||
| if pool.Name != ctrlcommon.MachineConfigPoolMaster && pool.Name != ctrlcommon.MachineConfigPoolWorker { |
There was a problem hiding this comment.
question (non-blocking): Why exclude other MachineConfigPools?
I hope that's not a silly question!
There was a problem hiding this comment.
I basically copied @sohankunkerkar on this one, which seems to have been inspired by @yuqi-zhang https://github.com/openshift/machine-config-operator/pull/4635/files#r1851191355
There was a problem hiding this comment.
Ahhh, that's a great point that I completely forgot about. This makes sense now. Thanks for pointing me to that!
6d7228b to
a666b72
Compare
QiWang19
left a comment
There was a problem hiding this comment.
LGTM. One nit to fix the CI test.
We can get QE verified before merging.
| const ( | ||
| componentName = "machine-config-controller" | ||
| componentName = "machine-config-controller" | ||
| componentNamespace = "openshift-machine-config-operator" |
There was a problem hiding this comment.
We can drop the const definition since ctrlcommon.MCONamespace will be used
9c0db36 to
ec6c8e2
Compare
in cri-o 1.33, a change cri-o/cri-o#8962 was made to the default limits set for CRI-O. Now, the ulimit nofile is set much lower, with space to set it higher. however, some workloads don't expect this change, and fail (see https://issues.redhat.com/browse/OCPBUGS-62095) This was worked around temporarily in openshift#5308, but that workaround was not intendd to be carried in to 4.21. Instead, we should drop-in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters don't get this change, but new clusters started in 4.21 do. This was entirely based on openshift#4715 Signed-off-by: Peter Hunt <[email protected]>
ec6c8e2 to
bcb18a0
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheesesashimi, haircommander The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
in cluster launched by |
|
@haircommander: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@haircommander: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/skip |
930bff1
into
openshift:release-4.20
|
@haircommander: Jira Issue Verification Checks: Jira Issue OCPBUGS-70201 Jira Issue OCPBUGS-70201 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in accepted release 4.20.0-0.nightly-2026-01-16-181948 |
so upgrades go through openshift/machine-config-operator#5516 Signed-off-by: Peter Hunt <[email protected]>
in cri-o 1.33, a change cri-o/cri-o#9401 was made to the short name mode for CRI-O. Now, CRI-O is enforcing short names, which means when an image is unqualified (short name) and has an ambiguous pull path (multiple different names returned), cri-o fails to pull the image. This may break users however, and so we shouldn't upgrade with it. We should drop-in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters don't get this change, but new clusters started in 4.21 do. This was entirely based on openshift#5516, which has its own inspiration history Signed-off-by: Peter Hunt <[email protected]>
in cri-o 1.33, a change cri-o/cri-o#9401 was made to the short name mode for CRI-O. Now, CRI-O is enforcing short names, which means when an image is unqualified (short name) and has an ambiguous pull path (multiple different names returned), cri-o fails to pull the image. This may break users however, and so we shouldn't upgrade with it. We should drop-in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters don't get this change, but new clusters started in 4.21 do. This was entirely based on openshift#5516, which has its own inspiration history Signed-off-by: Peter Hunt <[email protected]>
in cri-o 1.33, a change cri-o/cri-o#9401 was made to the short name mode for CRI-O. Now, CRI-O is enforcing short names, which means when an image is unqualified (short name) and has an ambiguous pull path (multiple different names returned), cri-o fails to pull the image. This may break users however, and so we shouldn't upgrade with it. We should drop-in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters don't get this change, but new clusters started in 4.21 do. This was entirely based on openshift#5516, which has its own inspiration history Signed-off-by: Peter Hunt <[email protected]>
in cri-o 1.33, a change cri-o/cri-o#9401 was made to the short name mode for CRI-O. Now, CRI-O is enforcing short names, which means when an image is unqualified (short name) and has an ambiguous pull path (multiple different names returned), cri-o fails to pull the image. This may break users however, and so we shouldn't upgrade with it. We should drop-in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters don't get this change, but new clusters started in 4.21 do. This was entirely based on openshift#5516, which has its own inspiration history Signed-off-by: Peter Hunt <[email protected]>
in cri-o 1.33, a change cri-o/cri-o#8962 was made to the default limits set for CRI-O. Now, the ulimit nofile is set much lower, with space to set it higher. however, some workloads don't expect this change, and fail (see https://issues.redhat.com/browse/OCPBUGS-62095)
This was worked around temporarily in #5308, but that workaround was not intendd to be carried in to 4.21.
Instead, we should drop-in an ignition file on upgrades from 4.20 to 4.21 to make sure existing clusters don't get this change, but new clusters started in 4.21 do.
This was entirely based on #4715
- What I did
- How to verify it
- Description for the changelog