MCO-1877: MCO-1879: MCO-1882: MCO-1884: Implement boot image skew enforcement MVP by djoshy · Pull Request #5428 · openshift/machine-config-operator

djoshy · 2025-11-19T21:57:12Z

This PR integrates the boot image skew enforcement API introduced in openshift/api#2357. This involves the following changes:

The operator now populates the bootImageSkewEnforcementStatus field in the MachineConfiguration object based on spec.bootImageSkewEnforcement, platform defaults, and the cluster version.
The boot image controller now updates the current boot image value in bootImageSkewEnforcementStatus after a successful boot image update. This requires skew enforcement to be in Automatic mode and all machinesets to be opted in to boot image updates.
The operator sets Upgradeable=False when it detects that the cluster is out of skew, determined by comparing the boot image values in bootImageSkewEnforcementStatus against the MCO's hardcoded skew limits. Before performing this check, the operator verifies that the boot image controller is neither in an error state nor currently performing boot image updates. If the controller is in an error state, the operator sets Upgradeable=False and propagates that error instead of running the skew check. If the controller is mid-update, the operator defers the skew check to a later sync to avoid race conditions.
Some unit tests have been added to sync_test.go and status_test.go to verify the above mechanisms.

Verifying API behavior

This verification depends on the platform. If the platform:

supports boot image updates and has them enabled by default (AWS and GCP at the time of writing), i.e. status.managedBootImagesStatus is set to All when spec.managedBootImages is empty: the skew enforcement status will be set to Automatic, with a boot image version estimated from the cluster version. The boot image controller then performs a sync, updating the boot images if required. Once all resources have been successfully updated, it updates the boot image value stored in the skew enforcement status to the OCP releaseVersion described by the coreos-bootimages configmap. Here's an example:

  spec:
    logLevel: Normal
    managementState: Managed
    operatorLogLevel: Normal
  status:
    bootImageSkewEnforcementStatus:
      automatic:
        ocpVersion: 4.21.0
      mode: Automatic
    conditions:
    - lastTransitionTime: "2025-11-19T22:06:06Z"
      message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
        | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
      reason: BootImageConfigMapAdded
      status: "False"
      type: BootImageUpdateProgressing
    - lastTransitionTime: "2025-11-19T22:06:07Z"
      message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
        0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
      reason: BootImageConfigMapAdded
      status: "False"
      type: BootImageUpdateDegraded
    managedBootImagesStatus:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: All

supports boot image updates but does not enable them by default (vSphere and Azure at the time of writing), i.e. status.managedBootImagesStatus is set to None when spec.managedBootImages is empty: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:

  spec:
    logLevel: Normal
    managementState: Managed
    operatorLogLevel: Normal
  status:
    bootImageSkewEnforcementStatus:
      manual:
        mode: OCPVersion
        ocpVersion: 4.21.0
      mode: Manual
    conditions:
    - lastTransitionTime: "2025-11-19T22:06:06Z"
      message: Reconciled 0 of 0 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
        | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
      reason: BootImageConfigMapAdded
      status: "False"
      type: BootImageUpdateProgressing
    - lastTransitionTime: "2025-11-19T22:06:07Z"
      message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
        0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
      reason: BootImageConfigMapAdded
      status: "False"
      type: BootImageUpdateDegraded
    managedBootImagesStatus:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: None

In this case, the admin can opt in to boot image updates (by setting spec.managedBootImages to All), and the operator should automatically switch the skew enforcement status to Automatic with the appropriate boot image version. The object would then look like this:

  spec:
    logLevel: Normal
    managementState: Managed
    operatorLogLevel: Normal
    managedBootImages:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: All
  status:
    bootImageSkewEnforcementStatus:
      automatic:
        ocpVersion: 4.21.0
      mode: Automatic
    conditions:
    - lastTransitionTime: "2025-11-19T22:06:06Z"
      message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
        | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
      reason: BootImageConfigMapAdded
      status: "False"
      type: BootImageUpdateProgressing
    - lastTransitionTime: "2025-11-19T22:06:07Z"
      message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
        0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
      reason: BootImageConfigMapAdded
      status: "False"
      type: BootImageUpdateDegraded
    managedBootImagesStatus:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: machinesets
        selection:
          mode: All

does not support boot image updates (all other platforms at the time of writing), i.e. status.managedBootImagesStatus is empty and spec.managedBootImages cannot be set by the admin: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:

  spec:
    logLevel: Normal
    managementState: Managed
    operatorLogLevel: Normal
  status:
    bootImageSkewEnforcementStatus:
      manual:
        mode: OCPVersion
        ocpVersion: 4.21.0
      mode: Manual
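The three platform cases above can be summarized as a small decision function. The sketch below is hypothetical: `Mode`, `skewConfig`, `deriveStatus` and its parameters are illustrative names, not MCO identifiers. It assumes the "estimated" boot image version is simply the cluster version passed in (the real estimation logic is not described here), and that an explicit spec.bootImageSkewEnforcement takes precedence over platform defaults, as the Manual and None examples in this description show.

```go
package main

import "fmt"

// Mode mirrors the mode values used in bootImageSkewEnforcement(Status).
type Mode string

const (
	ModeAutomatic Mode = "Automatic"
	ModeManual    Mode = "Manual"
	ModeNone      Mode = "None"
)

// skewConfig is a hypothetical, flattened stand-in for both
// spec.bootImageSkewEnforcement and status.bootImageSkewEnforcementStatus.
type skewConfig struct {
	Mode         Mode
	ManualMode   string // "OCPVersion" or "RHCOSVersion" when Mode is Manual
	OCPVersion   string
	RHCOSVersion string
}

// deriveStatus sketches how the status could be computed.
//   spec: nil when the admin has not set spec.bootImageSkewEnforcement
//   prev: the status written on the previous sync
//   allManaged: true when managedBootImagesStatus resolves to All
//   clusterVersion: used here as the estimated boot image version
func deriveStatus(spec *skewConfig, prev skewConfig, allManaged bool, clusterVersion string) skewConfig {
	if spec != nil && spec.Mode != "" {
		// An explicit Manual or None choice is mirrored into status.
		return *spec
	}
	if allManaged {
		if prev.Mode == ModeAutomatic && prev.OCPVersion != "" {
			// Keep the version the boot image controller last recorded.
			return prev
		}
		return skewConfig{Mode: ModeAutomatic, OCPVersion: clusterVersion}
	}
	// Opt-in or unsupported platforms fall back to Manual with an estimate.
	return skewConfig{Mode: ModeManual, ManualMode: "OCPVersion", OCPVersion: clusterVersion}
}

func main() {
	// AWS/GCP default: boot image updates enabled for all machinesets.
	fmt.Println(deriveStatus(nil, skewConfig{}, true, "4.21.0").Mode) // Automatic
	// vSphere/Azure default, or a platform without boot image updates.
	fmt.Println(deriveStatus(nil, skewConfig{}, false, "4.21.0").Mode) // Manual
	// Admin-supplied Manual spec after a manual boot image update.
	manual := &skewConfig{Mode: ModeManual, ManualMode: "OCPVersion", OCPVersion: "4.21.2"}
	fmt.Println(deriveStatus(manual, skewConfig{}, false, "4.21.0").OCPVersion) // 4.21.2
}
```

Passing the previous status in is one way to avoid overwriting a version the controller has already recorded; opting in later (Manual to Automatic) still yields a fresh Automatic status.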

In this case, the admin is expected to perform boot image updates manually and then record the new version in the spec, like so:

spec:
  bootImageSkewEnforcement:
    mode: Manual
    manual:
      mode: OCPVersion
      ocpVersion: 4.21.2

The operator should then update the status to include this:

spec:
  bootImageSkewEnforcement:
    mode: Manual
    manual:
      mode: OCPVersion
      ocpVersion: 4.21.2
status:
  bootImageSkewEnforcementStatus:
    mode: Manual
    manual:
      mode: OCPVersion
      ocpVersion: 4.21.2

The snippet above shows an admin recording the OCPVersion. In Manual mode, the admin can instead record the RHCOSVersion, like so:

spec:
  bootImageSkewEnforcement:
    mode: Manual
    manual:
      mode: RHCOSVersion
      rhcosVersion: 9.0.20251023-0
status:
  bootImageSkewEnforcementStatus:
    mode: Manual
    manual:
      mode: RHCOSVersion
      rhcosVersion: 9.0.20251023-0

Note that only one of RHCOSVersion or OCPVersion is permitted in Manual mode.
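The one-of rule for Manual mode is enforced by the API's own validation; the Go sketch below only mirrors it to make the constraint concrete. `manualSkew` and `validate` are hypothetical names whose fields follow the YAML keys shown above.

```go
package main

import (
	"errors"
	"fmt"
)

// manualSkew is a hypothetical Go mirror of the manual block in
// spec.bootImageSkewEnforcement; field names follow the YAML keys above.
type manualSkew struct {
	Mode         string // "OCPVersion" or "RHCOSVersion"
	OCPVersion   string
	RHCOSVersion string
}

// validate requires exactly one version, matching the discriminator in Mode.
func (m manualSkew) validate() error {
	switch m.Mode {
	case "OCPVersion":
		if m.OCPVersion == "" || m.RHCOSVersion != "" {
			return errors.New("mode OCPVersion requires ocpVersion and forbids rhcosVersion")
		}
	case "RHCOSVersion":
		if m.RHCOSVersion == "" || m.OCPVersion != "" {
			return errors.New("mode RHCOSVersion requires rhcosVersion and forbids ocpVersion")
		}
	default:
		return fmt.Errorf("unknown manual mode %q", m.Mode)
	}
	return nil
}

func main() {
	fmt.Println(manualSkew{Mode: "OCPVersion", OCPVersion: "4.21.2"}.validate())             // <nil>
	fmt.Println(manualSkew{Mode: "RHCOSVersion", RHCOSVersion: "9.0.20251023-0"}.validate()) // <nil>
	// Setting both versions is rejected.
	fmt.Println(manualSkew{Mode: "OCPVersion", OCPVersion: "4.21.2", RHCOSVersion: "9.0.20251023-0"}.validate() != nil) // true
}
```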

The admin can also disable skew enforcement altogether by setting the mode to None in the spec:

spec:
  bootImageSkewEnforcement:
    mode: None
status:
  bootImageSkewEnforcementStatus:
    mode: None

Verifying upgrade block

Upgrades will be blocked when the cluster is determined to be out of skew. This mechanism works the same way in Manual and Automatic mode, although it is likely easier to verify in Manual mode. The current skew thresholds correspond to the point at which OCP first moved to RHEL 9: RHEL version 9.2 and OCP version 4.13.0. The operator performs semver comparisons of these thresholds against the boot image versions stored in bootImageSkewEnforcementStatus and sets Upgradeable=False if necessary. To verify this, first set the mode to Manual with an out-of-skew boot image version, like so:

  spec:
    bootImageSkewEnforcement:
      manual:
        mode: RHCOSVersion
        rhcosVersion: 9.0.20251023-0
      mode: Manual

Now examine the conditions field of the machine-config ClusterOperator; it should indicate an issue preventing upgrades, like so:

$ oc get co machine-config -o yaml
...
  - lastTransitionTime: "2025-11-20T15:15:12Z"
    message: 'Upgrades have been disabled because the cluster is using RHCOS boot
      image version 9.0.20251023-0(RHEL version: 9.0), which is below the minimum
      required RHEL version 9.2. To enable upgrades, please update your boot images
      following the documentation at [TODO: insert link], or disable boot image skew
      enforcement at [TODO: insert link]'
    reason: ClusterBootImageSkewError
    status: "False"
    type: Upgradeable

Next, set the boot image to one within the skew limits:

  spec:
    bootImageSkewEnforcement:
      manual:
        mode: RHCOSVersion
        rhcosVersion: 9.2.20251023-0
      mode: Manual

The Upgradeable condition should then return to True:

  - lastTransitionTime: "2025-11-20T15:19:25Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable

The same steps can be repeated with an OCPVersion instead. This comparison should only take place in Automatic and Manual mode. However, since Automatic mode is only permitted on the status side, I don't think there is an easy way to test it (other than with the unit tests I've included).

In None mode, this version check should not take place.
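The threshold comparison described above can be sketched with the standard library alone. Assumptions, all labeled: the thresholds are the ones quoted (RHEL 9.2, OCP 4.13.0); the RHEL version is taken from the first two components of the RHCOS version, as in the example message (9.0.20251023-0 maps to RHEL 9.0); and pre-release suffixes are stripped rather than ordered as full semver would. `skewStatus`, `leadingInts`, `below` and `checkSkew` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Thresholds quoted above: the first RHEL 9 based release.
var (
	minRHEL = []int{9, 2}     // RHEL major.minor
	minOCP  = []int{4, 13, 0} // OCP major.minor.patch
)

// skewStatus is a hypothetical flattened view of bootImageSkewEnforcementStatus.
type skewStatus struct {
	Mode         string // Automatic, Manual or None
	ManualMode   string // OCPVersion or RHCOSVersion (Manual only)
	OCPVersion   string
	RHCOSVersion string
}

// leadingInts parses the first n dot-separated numbers of v, dropping any
// "-" or "+" suffix (so pre-release ordering is not modeled here).
func leadingInts(v string, n int) ([]int, error) {
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) < n {
		return nil, fmt.Errorf("version %q has fewer than %d components", v, n)
	}
	out := make([]int, n)
	for i := range out {
		x, err := strconv.Atoi(parts[i])
		if err != nil {
			return nil, fmt.Errorf("version %q: %w", v, err)
		}
		out[i] = x
	}
	return out, nil
}

// below reports whether a sorts before b, comparing each component numerically.
func below(a, b []int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// checkSkew returns a non-nil error when the recorded boot image is out of skew.
func checkSkew(s skewStatus) error {
	if s.Mode == "None" {
		return nil // enforcement disabled: no version check
	}
	if s.Mode == "Manual" && s.ManualMode == "RHCOSVersion" {
		// Assumption: the first two RHCOS components are the RHEL major.minor.
		rhel, err := leadingInts(s.RHCOSVersion, 2)
		if err != nil {
			return err
		}
		if below(rhel, minRHEL) {
			return fmt.Errorf("RHCOS boot image %s (RHEL %d.%d) is below minimum RHEL %d.%d",
				s.RHCOSVersion, rhel[0], rhel[1], minRHEL[0], minRHEL[1])
		}
		return nil
	}
	// Automatic mode and Manual/OCPVersion both record an OCP version.
	ocp, err := leadingInts(s.OCPVersion, 3)
	if err != nil {
		return err
	}
	if below(ocp, minOCP) {
		return fmt.Errorf("OCP boot image version %s is below minimum %d.%d.%d",
			s.OCPVersion, minOCP[0], minOCP[1], minOCP[2])
	}
	return nil
}

func main() {
	fmt.Println(checkSkew(skewStatus{Mode: "Manual", ManualMode: "RHCOSVersion", RHCOSVersion: "9.0.20251023-0"}))
	fmt.Println(checkSkew(skewStatus{Mode: "Manual", ManualMode: "RHCOSVersion", RHCOSVersion: "9.2.20251023-0"})) // <nil>
	fmt.Println(checkSkew(skewStatus{Mode: "Automatic", OCPVersion: "4.9.0"}) != nil)                               // true
}
```

Comparing numerically per component is the reason for a semver-style check: a plain string comparison would rank 4.9.0 above 4.13.0.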

Some caveats to note about Automatic mode:

The admin is not permitted to use Automatic mode in the spec. This is an intentional choice: only the MCO can reliably determine whether a platform is eligible for automatic skew enforcement.
In Automatic mode, API validations will prevent changing the boot image configuration to any setting other than All. To change the boot image configuration, the admin must first switch to Manual skew enforcement mode and then change the boot image configuration of the cluster.
In Automatic mode, if any machinesets are skipped for boot image updates (for example, if a marketplace or unknown boot image was detected in any of them), the boot image controller will not update the boot image value stored in bootImageSkewEnforcementStatus. This is because the cluster cannot be considered up to date on boot images if even one of its machine resources is out of skew.
In Automatic mode, the operator will only populate the OCPVersion. This is because each platform may not have the same RHCOS boot image version in a given release (for example, across marketplace streams), and correctly tracking the RHCOS version per machineset within the boot image controller would require a lot of per-platform piping. I did not deem this worth the effort, but am open to implementing it later if the need arises.
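The second and third caveats can be expressed as two small predicates. This is a hypothetical sketch, not the boot image controller's code: `machineSetResult`, `nextRecordedVersion` and `selectionChangeAllowed` are illustrative names, and `releaseVersion` stands for the OCP releaseVersion from the coreos-bootimages configmap.

```go
package main

import "fmt"

// machineSetResult is a hypothetical per-MachineSet outcome of one sync.
type machineSetResult struct {
	Name       string
	Reconciled bool // boot image now matches the coreos-bootimages configmap
	Skipped    bool // e.g. marketplace or unrecognized boot image
}

// nextRecordedVersion returns the OCP version to store in
// bootImageSkewEnforcementStatus and whether to store it at all.
func nextRecordedVersion(skewMode string, results []machineSetResult, releaseVersion string) (string, bool) {
	if skewMode != "Automatic" {
		return "", false // the controller only records versions in Automatic mode
	}
	for _, r := range results {
		if r.Skipped || !r.Reconciled {
			// A single stale MachineSet means the cluster is not up to date.
			return "", false
		}
	}
	return releaseVersion, true
}

// selectionChangeAllowed mirrors the API rule that, in Automatic mode, the
// managed boot image selection may not move away from All.
func selectionChangeAllowed(skewMode, newSelection string) bool {
	return skewMode != "Automatic" || newSelection == "All"
}

func main() {
	allDone := []machineSetResult{{Name: "worker-a", Reconciled: true}, {Name: "worker-b", Reconciled: true}}
	mixed := append([]machineSetResult{{Name: "marketplace", Skipped: true}}, allDone...)

	fmt.Println(nextRecordedVersion("Automatic", allDone, "4.21.0")) // 4.21.0 true
	fmt.Println(nextRecordedVersion("Automatic", mixed, "4.21.0"))   // not recorded
	fmt.Println(selectionChangeAllowed("Automatic", "None"))         // false
	fmt.Println(selectionChangeAllowed("Manual", "None"))            // true
}
```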

openshift-ci-robot · 2025-11-19T21:57:16Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-11-19T21:57:17Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot · 2025-11-20T15:03:19Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

openshift-ci-robot · 2025-11-20T15:24:48Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

openshift-ci-robot · 2025-11-20T15:29:12Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This PR integrates the boot image skew enforcement API introduced in openshift/api#2357. This involves the following changes:

The operator now populates the bootImageSkewEnforcementStatus field in the MachineConfiguration object based on spec.bootImageSkewEnforcement, platform defaults and cluster version.

The boot image controller will now update the current boot image value in bootImageSkewEnforcementStatus on a successful boot image update. Note that this requires skew enforcement to be in Automatic mode and all machinesets to be opted in to boot image updates.

The operator will set Upgradeable=False if it detects that the cluster is out of skew. This is done by comparing the boot image values in the bootImageSkewEnforcementStatus field against the MCO's hardcoded skew limits.

I've also added a few unit tests to verify the above mechanisms.

Verifying API behavior

This verification will have to be done based on the platform. If the platform:

supports boot image updates and has them enabled by default (AWS and GCP at the time of writing), i.e. status.managedBootImagesStatus is set to All when spec.managedBootImages is empty: the skew enforcement status will be set to Automatic, with a boot image version estimated from the cluster version. The boot image controller will then perform a sync, which updates the boot images if required; after all resources have been successfully updated, it updates the boot image value stored in the skew enforcement status. The value set will be the OCP releaseVersion described by the coreos-bootimages configmap. Here's an example:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
supports boot image updates but does not enable them by default (vSphere and Azure at the time of writing), i.e. status.managedBootImagesStatus is set to None when spec.managedBootImages is empty: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 0 of 0 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: None
In this case, the admin can opt in to boot image updates (by setting the machinesets selection mode in spec.managedBootImages to All), and the operator should automatically switch the skew enforcement status to Automatic, with the appropriate boot image version. The object would then look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
   managedBootImages:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
does not support boot image updates (all other platforms at the time of writing), i.e. status.managedBootImagesStatus is empty and spec.managedBootImages cannot be set by the admin: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
In this case, the admin is expected to manually perform boot image updates and then add a spec field like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The operator should then update the status to include this:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The above snippet applies when the admin chooses to record the OCPVersion. In Manual mode, the admin can instead choose to record the RHCOSVersion, like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
The admin can also choose to disable skew enforcement altogether by setting the mode to None in the spec:
spec:
 bootImageSkewEnforcement:
   mode: None
status:
 bootImageSkewEnforcementStatus:
   mode: None
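The platform cases above boil down to a small decision. The following is a minimal Go sketch of that decision only, assuming a hypothetical `SkewMode` type and `deriveStatusMode` helper (neither is taken from the MCO code or from openshift/api), and leaving out the boot image version that the operator also estimates from the cluster version:

```go
package main

import "fmt"

// SkewMode is a hypothetical stand-in for the skew enforcement mode; it is not
// the real openshift/api type.
type SkewMode string

const (
	ModeAutomatic SkewMode = "Automatic"
	ModeManual    SkewMode = "Manual"
	ModeNone      SkewMode = "None"
)

// deriveStatusMode picks the status mode from the spec mode and from whether
// status.managedBootImagesStatus selects All MachineSets:
//   - an explicit spec mode (Manual or None) is reflected as-is;
//   - otherwise Automatic is used only when boot image updates are on for all
//     MachineSets (the AWS/GCP default, or after the admin opts in);
//   - every other case (opted out, or unsupported platform) falls back to Manual.
func deriveStatusMode(specMode SkewMode, managedBootImagesAll bool) SkewMode {
	switch specMode {
	case ModeManual, ModeNone:
		return specMode
	}
	if managedBootImagesAll {
		return ModeAutomatic
	}
	return ModeManual
}

func main() {
	fmt.Println(deriveStatusMode("", true))       // e.g. AWS/GCP defaults
	fmt.Println(deriveStatusMode("", false))      // e.g. vSphere/Azure defaults or unsupported platforms
	fmt.Println(deriveStatusMode(ModeNone, true)) // admin disabled enforcement in the spec
}
```

Treating an explicit spec mode as taking precedence matches the Automatic-mode caveat that the admin switches to Manual before changing the boot image configuration.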
Verifying upgrade block

Upgrades will be blocked when the cluster is determined to be out of skew. This works the same way in Manual and Automatic mode, although it is likely easier to verify in Manual mode. The current skew thresholds correspond to the point at which OCP first moved to RHEL 9, i.e. RHEL version 9.2 and OCP version 4.13.0. The operator performs semver comparisons of these thresholds against the boot image versions stored in bootImageSkewEnforcementStatus and sets Upgradeable=False if necessary. To verify this, set the mode to Manual with an out-of-skew boot image version, like so:
 spec:
   bootImageSkewEnforcement:
     manual:
       mode: RHCOSVersion
       rhcosVersion: 9.0.20251023-0
     mode: Manual
Now, examine the conditions field of the machine-config ClusterOperator (CO) object. It should indicate an issue preventing upgrades, like so:
 - lastTransitionTime: "2025-11-20T15:15:12Z"
   message: 'Upgrades have been disabled because the cluster is using RHCOS boot
     image version 9.0.20251023-0(RHEL version: 9.0), which is below the minimum
     required RHEL version 9.2. To enable upgrades, please update your boot images
     following the documentation at [TODO: insert link], or disable boot image skew
     enforcement at [TODO: insert link]'
   reason: ClusterBootImageSkewError
   status: "False"
   type: Upgradeable
Next, set the boot image to one within the skew limits:
 spec:
   bootImageSkewEnforcement:
     manual:
       mode: RHCOSVersion
       rhcosVersion: 9.2.20251023-0
     mode: Manual
Then, the Upgradeable condition should be restored to True:
 - lastTransitionTime: "2025-11-20T15:19:25Z"
   reason: AsExpected
   status: "True"
   type: Upgradeable
This set of steps can be repeated with OCPVersion specified as well. The comparison should only take place in Automatic and Manual mode; however, because Automatic mode is only generated on the status side, I don't think there is an easy way to test it (other than the unit tests I've included).

In None mode, this version check should not take place.
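The skew check described in this section can be sketched in Go as follows. This is a hypothetical illustration, not the operator's code: `SkewStatus`, `upgradeable` and the helpers are made-up names, a hand-rolled numeric comparison stands in for whatever semver library the operator uses, and it assumes the RHEL version is the first two components of an RHCOS version string (so 9.0.20251023-0 maps to RHEL 9.0, as in the Upgradeable message shown earlier). Any other preconditions the operator evaluates before the skew check are omitted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Thresholds from the description: RHEL 9.2 and OCP 4.13.0.
var (
	minRHEL = []int{9, 2}
	minOCP  = []int{4, 13, 0}
)

// leadingInts parses the first n dot-separated numeric components of v,
// e.g. "9.0.20251023-0" with n=2 gives [9 0]. Pre-release suffixes such as
// "4.21.0-rc.1" are not handled and produce an error.
func leadingInts(v string, n int) ([]int, error) {
	parts := strings.SplitN(v, ".", n+1)
	if len(parts) < n {
		return nil, fmt.Errorf("version %q has fewer than %d components", v, n)
	}
	out := make([]int, n)
	for i := range out {
		x, err := strconv.Atoi(parts[i])
		if err != nil {
			return nil, fmt.Errorf("version %q: %w", v, err)
		}
		out[i] = x
	}
	return out, nil
}

// atLeast reports whether got >= want, comparing components left to right.
func atLeast(got, want []int) bool {
	for i := range want {
		if got[i] != want[i] {
			return got[i] > want[i]
		}
	}
	return true
}

// SkewStatus is a hypothetical flattening of bootImageSkewEnforcementStatus.
type SkewStatus struct {
	Mode         string // "Automatic", "Manual" or "None"
	OCPVersion   string
	RHCOSVersion string
}

// upgradeable returns false when the recorded boot image is below the limits.
// In None mode no check is performed.
func upgradeable(s SkewStatus) (bool, error) {
	if s.Mode == "None" {
		return true, nil
	}
	if s.RHCOSVersion != "" {
		// Assumption: RHEL major.minor are the first two RHCOS components.
		v, err := leadingInts(s.RHCOSVersion, len(minRHEL))
		if err != nil {
			return false, err
		}
		return atLeast(v, minRHEL), nil
	}
	v, err := leadingInts(s.OCPVersion, len(minOCP))
	if err != nil {
		return false, err
	}
	return atLeast(v, minOCP), nil
}

func main() {
	for _, s := range []SkewStatus{
		{Mode: "Manual", RHCOSVersion: "9.0.20251023-0"},
		{Mode: "Manual", RHCOSVersion: "9.2.20251023-0"},
		{Mode: "Automatic", OCPVersion: "4.21.0"},
		{Mode: "None", RHCOSVersion: "9.0.20251023-0"},
	} {
		ok, err := upgradeable(s)
		fmt.Println(s.Mode, s.OCPVersion+s.RHCOSVersion, ok, err)
	}
}
```

Comparing integers component by component avoids the string-ordering trap where "4.9" would sort after "4.13".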

Some caveats to note about Automatic mode:

The admin is not permitted to use Automatic mode in the spec. This is an intentional choice, because only the MCO can reliably determine whether a platform is eligible for automatic skew enforcement.

In Automatic mode, API validations will prevent changing the boot image configuration to any setting other than All. To change the boot image configuration, the admin is expected to first switch to Manual skew enforcement mode and then change the boot image configuration of the cluster.

In Automatic mode, if any machinesets are skipped for boot image updates (for example, because a marketplace or unknown boot image was detected in one of them), the boot image controller will not update the boot image value stored in bootImageSkewEnforcementStatus. This is because the cluster cannot be considered up to date on boot images if even one of the machine resources is out of skew.

In Automatic mode, the operator will only populate the OCPVersion. This is because the RHCOS version of the boot image in a given release may differ between platforms (for example, across marketplace streams), and correctly tracking the RHCOS version per machineset within the boot image controller would require a lot of per-platform plumbing. I did not consider this worth the effort, but I am open to implementing it later if the need arises.
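The rule for when the boot image controller advances the recorded value in Automatic mode can be sketched as below. `MachineSetResult` and `nextRecordedOCPVersion` are hypothetical names for illustration; the sketch only records an OCP version, in line with the caveat that Automatic mode populates the OCPVersion alone.

```go
package main

import "fmt"

// MachineSetResult is a hypothetical per-MachineSet outcome of a boot image sync.
type MachineSetResult struct {
	Name       string
	Reconciled bool // boot image is at the target (updated, or already current)
	Skipped    bool // e.g. a marketplace or unknown boot image was detected
}

// nextRecordedOCPVersion advances the recorded OCP version to the release
// version from the coreos-bootimages configmap only in Automatic mode, and
// only if every MachineSet was reconciled and none were skipped. Otherwise
// the previously recorded value is kept.
func nextRecordedOCPVersion(mode, current, releaseVersion string, results []MachineSetResult) string {
	if mode != "Automatic" {
		return current
	}
	for _, r := range results {
		if r.Skipped || !r.Reconciled {
			return current
		}
	}
	return releaseVersion
}

func main() {
	allDone := []MachineSetResult{
		{Name: "worker-a", Reconciled: true},
		{Name: "worker-b", Reconciled: true},
	}
	oneSkipped := []MachineSetResult{
		{Name: "worker-a", Reconciled: true},
		{Name: "marketplace", Skipped: true},
	}
	fmt.Println(nextRecordedOCPVersion("Automatic", "4.20.0", "4.21.0", allDone))
	fmt.Println(nextRecordedOCPVersion("Automatic", "4.20.0", "4.21.0", oneSkipped))
}
```

Keeping the old value on any skip errs on the side of reporting the cluster as older than it may be, which is the safe direction for an upgrade gate.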

openshift-ci-robot · 2025-11-20T15:34:01Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This PR integrates the boot image skew enforcement API introduced in openshift/api#2357. This involves the following changes:

The operator now populates the bootImageSkewEnforcementStatus field in the MachineConfiguration object based on spec.bootImageSkewEnforcement, platform defaults and cluster version.

The boot image controller will now update the current boot image value in bootImageSkewEnforcementStatus on a successful boot image update. Note that this requires skew enforcement to be in Automatic mode and all machinesets to be opted in to boot image updates.

The operator will set Upgradeable=False if it detects that the cluster is out of skew. This is done by comparing the boot image values in the bootImageSkewEnforcementStatus field against the MCO's hardcoded skew limits.

Some unit tests have been added to sync_test.go and status_test.go to verify the above mechanisms.

Verifying API behavior

This verification will have to be done based on the platform. If the platform:

supports boot image updates and has them enabled by default (AWS and GCP at the time of writing), i.e. status.managedBootImagesStatus is set to All when spec.managedBootImages is empty: the skew enforcement status will be set to Automatic, with a boot image version estimated from the cluster version. The boot image controller will then perform a sync, which updates the boot images if required; after all resources have been successfully updated, it updates the boot image value stored in the skew enforcement status. The value set will be the OCP releaseVersion described by the coreos-bootimages configmap. Here's an example:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
supports boot image updates but does not enable them by default (vSphere and Azure at the time of writing), i.e. status.managedBootImagesStatus is set to None when spec.managedBootImages is empty: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 0 of 0 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: None
In this case, the admin can opt in to boot image updates (by setting the machinesets selection mode in spec.managedBootImages to All), and the operator should automatically switch the skew enforcement status to Automatic, with the appropriate boot image version. The object would then look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
   managedBootImages:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
does not support boot image updates (all other platforms at the time of writing), i.e. status.managedBootImagesStatus is empty and spec.managedBootImages cannot be set by the admin: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
In this case, the admin is expected to manually perform boot image updates and then add a spec field like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The operator should then update the status to include this:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The above snippet applies when the admin chooses to record the OCPVersion. In Manual mode, the admin can instead choose to record the RHCOSVersion, like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
Note that only one of RHCOSVersion or OCPVersion is permitted in Manual mode.
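The one-of rule for Manual mode is presumably enforced by API validation on the MachineConfiguration object; the Go function below only illustrates the rule and is not the validation code in openshift/api. `ManualSkew` and `validateManual` are hypothetical names, and requiring the field named by `Mode` to be non-empty is an assumption on top of the note above.

```go
package main

import (
	"errors"
	"fmt"
)

// ManualSkew is a hypothetical Go view of the manual block in the spec.
type ManualSkew struct {
	Mode         string // "OCPVersion" or "RHCOSVersion"
	OCPVersion   string
	RHCOSVersion string
}

// validateManual checks that the version field named by Mode is set and the
// other version field is left empty.
func validateManual(m ManualSkew) error {
	switch m.Mode {
	case "OCPVersion":
		if m.OCPVersion == "" {
			return errors.New("ocpVersion is required when mode is OCPVersion")
		}
		if m.RHCOSVersion != "" {
			return errors.New("rhcosVersion must not be set when mode is OCPVersion")
		}
	case "RHCOSVersion":
		if m.RHCOSVersion == "" {
			return errors.New("rhcosVersion is required when mode is RHCOSVersion")
		}
		if m.OCPVersion != "" {
			return errors.New("ocpVersion must not be set when mode is RHCOSVersion")
		}
	default:
		return fmt.Errorf("unknown manual mode %q", m.Mode)
	}
	return nil
}

func main() {
	fmt.Println(validateManual(ManualSkew{Mode: "OCPVersion", OCPVersion: "4.21.2"}))
	fmt.Println(validateManual(ManualSkew{Mode: "RHCOSVersion", OCPVersion: "4.21.2", RHCOSVersion: "9.0.20251023-0"}))
}
```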

The admin can also choose to disable skew enforcement altogether by setting the mode to None in the spec:
spec:
 bootImageSkewEnforcement:
   mode: None
status:
 bootImageSkewEnforcementStatus:
   mode: None
Verifying upgrade block

Upgrades will be blocked when the cluster is determined to be out of skew. This mechanism works the same way in Manual and Automatic mode, although it is likely easier to verify in Manual mode. The current skew thresholds correspond to the point at which OCP first moved to RHEL 9, i.e. RHEL version 9.2 and OCP version 4.13.0. The operator performs semver comparisons of these thresholds against the boot image versions stored in bootImageSkewEnforcementStatus and sets Upgradeable=False if necessary. To verify this, first set the mode to Manual with an out-of-skew boot image version, like so:
 spec:
   bootImageSkewEnforcement:
     manual:
       mode: RHCOSVersion
       rhcosVersion: 9.0.20251023-0
     mode: Manual
Now, examine the conditions field of the machine-config ClusterOperator (CO) object. It should indicate an issue preventing upgrades, like so:
 - lastTransitionTime: "2025-11-20T15:15:12Z"
   message: 'Upgrades have been disabled because the cluster is using RHCOS boot
     image version 9.0.20251023-0(RHEL version: 9.0), which is below the minimum
     required RHEL version 9.2. To enable upgrades, please update your boot images
     following the documentation at [TODO: insert link], or disable boot image skew
     enforcement at [TODO: insert link]'
   reason: ClusterBootImageSkewError
   status: "False"
   type: Upgradeable
Next, set the boot image to one within the skew limits:
 spec:
   bootImageSkewEnforcement:
     manual:
       mode: RHCOSVersion
       rhcosVersion: 9.2.20251023-0
     mode: Manual
Then, the Upgradeable condition should be restored to True:
 - lastTransitionTime: "2025-11-20T15:19:25Z"
   reason: AsExpected
   status: "True"
   type: Upgradeable
This set of steps can be repeated with OCPVersion specified as well. The comparison should only take place in Automatic and Manual mode; however, because Automatic mode is only permitted on the status side, I don't think there is an easy way to test it (other than the unit tests I've included).

In None mode, this version check should not take place.

Some caveats to note about Automatic mode:

The admin is not permitted to use Automatic mode in the spec. This is an intentional choice, because only the MCO can reliably determine whether a platform is eligible for automatic skew enforcement.

In Automatic mode, API validations will prevent changing the boot image configuration to any setting other than All. To change the boot image configuration, the admin is expected to first switch to Manual skew enforcement mode and then change the boot image configuration of the cluster.

In Automatic mode, if any machinesets are skipped for boot image updates (for example, because a marketplace or unknown boot image was detected in one of them), the boot image controller will not update the boot image value stored in bootImageSkewEnforcementStatus. This is because the cluster cannot be considered up to date on boot images if even one of the machine resources is out of skew.

In Automatic mode, the operator will only populate the OCPVersion. This is because the RHCOS version of the boot image in a given release may differ between platforms (for example, across marketplace streams), and correctly tracking the RHCOS version per machineset within the boot image controller would require a lot of per-platform plumbing. I did not consider this worth the effort, but I am open to implementing it later if the need arises.
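The caveat about changing the boot image configuration while in Automatic mode can be illustrated with a small Go check. This is a hypothetical sketch: `validateSelectionChange` and its `currentSkewMode` input are made-up names, and how the real API validation determines which skew enforcement mode is in effect is not modeled here.

```go
package main

import "fmt"

// validateSelectionChange rejects any MachineSet boot image selection other
// than All while skew enforcement is in Automatic mode. currentSkewMode is a
// hypothetical input standing in for the mode currently in effect.
func validateSelectionChange(currentSkewMode, newSelection string) error {
	if currentSkewMode == "Automatic" && newSelection != "All" {
		return fmt.Errorf("selection %q is not allowed while skew enforcement is Automatic; switch skew enforcement to Manual first", newSelection)
	}
	return nil
}

func main() {
	// Rejected: skew enforcement is still Automatic.
	fmt.Println(validateSelectionChange("Automatic", "None"))
	// Accepted: the admin first moved skew enforcement to Manual.
	fmt.Println(validateSelectionChange("Manual", "None"))
}
```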

openshift-ci-robot · 2025-11-20T15:34:48Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

This PR integrates the boot image skew enforcement API introduced in openshift/api#2357. This involves the following changes:

The operator now populates the bootImageSkewEnforcementStatus field in the MachineConfiguration object based on spec.bootImageSkewEnforcement, platform defaults and cluster version.

The boot image controller will now update the current boot image value in bootImageSkewEnforcementStatus on a successful boot image update. Note that this requires the skew enforcement to be set to Automatic mode, and all machinesets to be opt-ed in for boot image updates.

The operator will set Upgradeable=False if the cluster is to be detected to be out of skew. This is done by comparing the boot image values referenced in the bootImageSkewEnforcementStatus field against the MCO's hardcoded skew limits.

Some unit tests have been added to sync_test.go and status_test.go to verify the above mechanisms.

Verifying API behavior

This verification will have to be done based on the platform. If the platform:

supports boot image updates and it is on by default(AWS and GCP at the time of writing), i.e. status.managedBootImagesStatus is set to All if spec.managedBootImages is empty. Then, skew enforcement status will be set to Automatic, with a boot image version estimated from cluster version. Then, the boot image controller will perform a sync which will update the boot image(if required) and after all resources have been successfully updated, it will update the boot image value stored in the skew enforcement status. The value set will be the OCP releaseVersion described by the coreos-bootimages configmap. Here's an example:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
supports boot image updates, but is not on by default(vsphere and Azure at the time of writing) i.e. status.managedBootImagesStatus is set to None if spec.managedBootImages is empty. Then, skew enforcement status will be set to Manual, with a boot image version estimated from cluster version. The object would now look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 0 of 0 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: None
The admin can choose to opt-in for boot image updates in this case(set spec.ManagedBootImages to All), and the operator should automatically switch the skew enforcement status to Automatic, with the appropriate boot image version. This would mean the object would finally look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
   managedBootImages:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
does not support boot image updates(all other platforms at the time of writing) i.e. status.managedBootImagesStatus is empty and spec.managedBootImages cannot be set by the admin. Then, skew enforcement status will be set to Manual, with a boot image version estimated from cluster version. The object would now look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
In this case, the admin is expected to manually perform boot image updates and then add a spec field like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The operator should then update the status to include this:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
status:
 bootImageSkewEnforcementStatus:
     mode: OCPVersion
     ocpVersion: 4.21.2
The above snippet is if an admin had chosen to record the OCPVersion. In manual mode, the admin can also choose to to store the RHCOSVersion, like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
Note that only one of RHCOSVersion or OCPVersion is permitted in Manual mode.

The admin can also choose to disable skew enforcement altogether by setting it None mode in spec.
spec:
 bootImageSkewEnforcement:
   mode: None
status:
 bootImageSkewEnforcementStatus:
   mode: None
Verifying upgrade block

Upgrades will be blocked when the cluster is to determined out of skew. This mechanism works the same way in manual and automatic mode, although it is likely easier to verify in manual mode. The current thresholds for a skew violation is set to when OCP first moved to RHEL9, which corresponds to RHEL version 9.2 and OCP version 4.13.0. The operator will perform semver comparisons of these thresholds against the boot image versions stored in bootImageSkewEnforcementStatus and set Upgradeable=False if necessary. To verify this, first set the mode to Manual with an out of skew boot image version like so:
 spec:
   bootImageSkewEnforcement:
     manual:
  mode: RHCOSVersion
       rhcosVersion: 9.0.20251023-0
     mode: Manual
Now, examine the CO object named machine-config's conditions field, it should show indicate an issue preventing upgrades like so:
 - lastTransitionTime: "2025-11-20T15:15:12Z"
   message: 'Upgrades have been disabled because the cluster is using RHCOS boot
     image version 9.0.20251023-0(RHEL version: 9.0), which is below the minimum
     required RHEL version 9.2. To enable upgrades, please update your boot images
     following the documentation at [TODO: insert link], or disable boot image skew
     enforcement at [TODO: insert link]'
   reason: ClusterBootImageSkewError
   status: "False"
   type: Upgradeable
Next, set the boot image to one within the skew limits:
 spec:
   bootImageSkewEnforcement:
     manual:
  mode: RHCOSVersion
       rhcosVersion: 9.2.20251023-0
     mode: Manual
Then, the Upgradeable condition should be restored back to True
 - lastTransitionTime: "2025-11-20T15:19:25Z"
   reason: AsExpected
   status: "True"
   type: Upgradeable
These set of steps can be repeated with the OCPVersion specified too. This comparison should only take place in Automatic and Manual mode however, as Automatic is only permitted on the status side, I don't think there is an easy way to test that(other than the units I've included).

In None mode, this version check should not take place.

Some caveats to note about Automatic mode:

The admin is not permitted to use Automatic mode within the spec. This is in an intentional choice because only the MCO will always be able to self determine if a platform is eligible for automatic skew enforcement.

In Automatic mode, API validations will prevent changing the boot image configuration to a setting other than All. To change the boot image configuration, the admin is first expected to go to Manual skew enforcement mode and then attempt to change the boot image configuration of the cluster.

In Automatic mode, If any machinesets are skipped for boot image updates(for example a marketplace or an unknown boot image was detected in any of the machinesets), the boot image controller will not update the boot image value stored in bootImageEnforcementStatus. This is because the cluster cannot be considered up to date on boot image if even one of the machine resources are out of skew.

In Automatic mode, the operator will only populate the OCPVersion. This is because each platform may not have the same RHCOS version of the boot image (for example, across marketplace streams) in a given release, and it would involve a lot of per-platform plumbing to correctly track the RHCOS version per machineset within the boot image controller. I did not deem this worth the effort, but am open to implementing it later if the need arises.
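The caveats above about when the controller records a new value can be sketched in Go. This is a minimal illustration, not the controller's code. `MachineSetResult` and `nextRecordedOCPVersion` are hypothetical names. The sketch assumes that a failed update blocks recording in the same way a skipped machineset does.

```go
package main

import (
	"errors"
	"fmt"
)

// MachineSetResult is a hypothetical summary of one machineset after a boot
// image controller sync.
type MachineSetResult struct {
	Name    string
	Skipped bool  // e.g. marketplace or unknown boot image detected
	Err     error // update failed
}

// nextRecordedOCPVersion returns the value to keep in
// bootImageSkewEnforcementStatus.automatic.ocpVersion and whether it changed.
// The stored value only moves to the coreos-bootimages release when the mode is
// Automatic and no machineset was skipped or failed.
func nextRecordedOCPVersion(mode, stored, release string, results []MachineSetResult) (string, bool) {
	if mode != "Automatic" {
		return stored, false
	}
	for _, r := range results {
		if r.Skipped || r.Err != nil {
			// A single out-of-date machineset means the cluster is not current.
			return stored, false
		}
	}
	return release, release != stored
}

func main() {
	results := []MachineSetResult{
		{Name: "worker-a"},
		{Name: "worker-b", Skipped: true},
		{Name: "worker-c", Err: errors.New("patch failed")},
	}
	v, changed := nextRecordedOCPVersion("Automatic", "4.12.84", "4.22.0", results)
	fmt.Println(v, changed) // 4.12.84 false
}
```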

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2025-11-20T15:45:46Z

@djoshy: This pull request references MCO-1877 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This PR integrates the boot image skew enforcement API introduced in openshift/api#2357. This involves the following changes:

The operator now populates the bootImageSkewEnforcementStatus field in the MachineConfiguration object based on spec.bootImageSkewEnforcement, platform defaults and cluster version.

The boot image controller will now update the current boot image value in bootImageSkewEnforcementStatus on a successful boot image update. Note that this requires the skew enforcement to be set to Automatic mode, and all machinesets to be opted in for boot image updates.

The operator will set Upgradeable=False if it detects that the cluster is out of skew. This is done by comparing the boot image values referenced in the bootImageSkewEnforcementStatus field against the MCO's hardcoded skew limits.

Some unit tests have been added to sync_test.go and status_test.go to verify the above mechanisms.

Verifying API behavior

This verification will have to be done based on the platform. If the platform:

supports boot image updates and has them enabled by default (AWS and GCP at the time of writing), i.e. status.managedBootImagesStatus is set to All when spec.managedBootImages is empty: the skew enforcement status will be set to Automatic, with a boot image version estimated from the cluster version. The boot image controller will then perform a sync, updating the boot images if required, and after all resources have been successfully updated, it will update the boot image value stored in the skew enforcement status. The value set will be the OCP releaseVersion described by the coreos-bootimages configmap. Here's an example:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
supports boot image updates, but does not enable them by default (vSphere and Azure at the time of writing), i.e. status.managedBootImagesStatus is set to None when spec.managedBootImages is empty: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 0 of 0 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: None
The admin can choose to opt in to boot image updates in this case (by setting spec.managedBootImages to All), and the operator should then automatically switch the skew enforcement status to Automatic, with the appropriate boot image version. The object would finally look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
   managedBootImages:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
 status:
   bootImageSkewEnforcementStatus:
     automatic:
       ocpVersion: 4.21.0
     mode: Automatic
   conditions:
   - lastTransitionTime: "2025-11-19T22:06:06Z"
     message: Reconciled 3 of 3 MAPI MachineSets | Reconciled 0 of 0 ControlPlaneMachineSets
       | Reconciled 0 of 0 CAPI MachineSets | Reconciled 0 of 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateProgressing
   - lastTransitionTime: "2025-11-19T22:06:07Z"
     message: 0 Degraded MAPI MachineSets | 0 Degraded ControlPlaneMachineSets |
       0 Degraded CAPI MachineSets | 0 CAPI MachineDeployments
     reason: BootImageConfigMapAdded
     status: "False"
     type: BootImageUpdateDegraded
   managedBootImagesStatus:
     machineManagers:
     - apiGroup: machine.openshift.io
       resource: machinesets
       selection:
         mode: All
does not support boot image updates (all other platforms at the time of writing), i.e. status.managedBootImagesStatus is empty and spec.managedBootImages cannot be set by the admin: the skew enforcement status will be set to Manual, with a boot image version estimated from the cluster version. The object would look like this:
 spec:
   logLevel: Normal
   managementState: Managed
   operatorLogLevel: Normal
 status:
   bootImageSkewEnforcementStatus:
     manual:
       mode: OCPVersion
       ocpVersion: 4.21.0
     mode: Manual
In this case, the admin is expected to manually perform boot image updates and then add a spec field like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The operator should then update the status to include this:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: OCPVersion
     ocpVersion: 4.21.2
The above snippet applies if the admin chose to record the OCPVersion. In Manual mode, the admin can instead choose to store the RHCOSVersion, like so:
spec:
 bootImageSkewEnforcement:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
status:
 bootImageSkewEnforcementStatus:
   mode: Manual
   manual:
     mode: RHCOSVersion
     rhcosVersion: 9.0.20251023-0
Note that only one of RHCOSVersion or OCPVersion is permitted in Manual mode.

The admin can also choose to disable skew enforcement altogether by setting it to None mode in the spec:
spec:
 bootImageSkewEnforcement:
   mode: None
status:
 bootImageSkewEnforcementStatus:
   mode: None
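The status defaulting walked through above can be condensed into a small Go sketch. All type and function names are hypothetical flattened views, not the openshift/api structs. The sketch assumes that an explicit spec always wins. `estimatedOCP` stands in for the version the operator estimates from the cluster version. In Automatic mode, the controller later overwrites that estimate after a successful sync, which is not shown here.

```go
package main

import "fmt"

// SkewSpec is a hypothetical flattened view of spec.bootImageSkewEnforcement.
// An empty Mode means the admin left the field unset.
type SkewSpec struct {
	Mode         string // "", "Manual" or "None" (Automatic is not allowed in spec)
	ManualMode   string // "OCPVersion" or "RHCOSVersion"
	OCPVersion   string
	RHCOSVersion string
}

// SkewStatus is a hypothetical flattened view of bootImageSkewEnforcementStatus.
type SkewStatus struct {
	Mode         string // "Automatic", "Manual" or "None"
	ManualMode   string
	OCPVersion   string
	RHCOSVersion string
}

// deriveStatus sketches the defaulting rules:
//   - an explicit spec (Manual or None) is copied into status;
//   - otherwise, if machinesets are opted in with All, use Automatic;
//   - otherwise (None, or no boot image update support), use Manual/OCPVersion.
func deriveStatus(spec SkewSpec, managedAll bool, estimatedOCP string) SkewStatus {
	switch spec.Mode {
	case "None":
		return SkewStatus{Mode: "None"}
	case "Manual":
		return SkewStatus{Mode: "Manual", ManualMode: spec.ManualMode,
			OCPVersion: spec.OCPVersion, RHCOSVersion: spec.RHCOSVersion}
	}
	if managedAll {
		return SkewStatus{Mode: "Automatic", OCPVersion: estimatedOCP}
	}
	return SkewStatus{Mode: "Manual", ManualMode: "OCPVersion", OCPVersion: estimatedOCP}
}

func main() {
	fmt.Printf("%+v\n", deriveStatus(SkewSpec{}, true, "4.21.0"))
	fmt.Printf("%+v\n", deriveStatus(SkewSpec{}, false, "4.21.0"))
	fmt.Printf("%+v\n", deriveStatus(SkewSpec{Mode: "None"}, true, "4.21.0"))
}
```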
Verifying upgrade block

Upgrades will be blocked when the cluster is determined to be out of skew. This mechanism works the same way in Manual and Automatic mode, although it is likely easier to verify in Manual mode. The current skew thresholds correspond to the point at which OCP first moved to RHEL 9: RHEL version 9.2 and OCP version 4.13.0. The operator performs semver comparisons of these thresholds against the boot image versions stored in bootImageSkewEnforcementStatus and sets Upgradeable=False if necessary. To verify this, first set the mode to Manual with an out-of-skew boot image version like so:
 spec:
   bootImageSkewEnforcement:
     manual:
       mode: RHCOSVersion
       rhcosVersion: 9.0.20251023-0
     mode: Manual
Now, examine the conditions field of the machine-config ClusterOperator object; it should indicate an issue preventing upgrades like so:
$ oc get co machine-config -o yaml
...
 - lastTransitionTime: "2025-11-20T15:15:12Z"
   message: 'Upgrades have been disabled because the cluster is using RHCOS boot
     image version 9.0.20251023-0(RHEL version: 9.0), which is below the minimum
     required RHEL version 9.2. To enable upgrades, please update your boot images
     following the documentation at [TODO: insert link], or disable boot image skew
     enforcement at [TODO: insert link]'
   reason: ClusterBootImageSkewError
   status: "False"
   type: Upgradeable
Next, set the boot image to one within the skew limits:
 spec:
   bootImageSkewEnforcement:
     manual:
       mode: RHCOSVersion
       rhcosVersion: 9.2.20251023-0
     mode: Manual
Then, the Upgradeable condition should be restored to True:
 - lastTransitionTime: "2025-11-20T15:19:25Z"
   reason: AsExpected
   status: "True"
   type: Upgradeable
This set of steps can be repeated with OCPVersion specified as well. This comparison takes place only in Automatic and Manual mode; however, as Automatic mode is only permitted on the status side, I don't think there is an easy way to test it (other than the unit tests I've included).

In None mode, this version check should not take place.
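The threshold comparison can be sketched in Go using the limits hardcoded in `pkg/controller/common/constants.go`. The operator performs semver comparisons. The library it uses is not shown here, so this hand-rolled numeric comparison is only a stand-in. The helper names are hypothetical. For RHCOS, only the RHEL part (the first two components) is compared, which matches the "RHEL version: 9.0" wording in the condition message. Pre-release suffixes are simply dropped, which is a simplification relative to full semver ordering.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Limits as hardcoded in pkg/controller/common/constants.go by this PR.
const (
	RHCOSVersionBootImageSkewLimit = "9.2"
	OCPVersionBootImageSkewLimit   = "4.13.0"
)

// numericPrefix parses up to n dot-separated integer components of v after
// dropping any "-..." or "+..." suffix (pre-release ordering is ignored).
func numericPrefix(v string, n int) ([]int, error) {
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) > n {
		parts = parts[:n]
	}
	out := make([]int, 0, len(parts))
	for _, p := range parts {
		x, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out = append(out, x)
	}
	return out, nil
}

// less compares component by component, treating missing components as 0.
func less(a, b []int) bool {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			return x < y
		}
	}
	return false
}

// outOfSkew reports whether a stored boot image version is below the limit.
// kind is "OCPVersion" or "RHCOSVersion"; for RHCOS, 9.0.20251023-0 -> 9.0.
func outOfSkew(kind, version string) (bool, error) {
	var limit string
	var n int
	switch kind {
	case "OCPVersion":
		limit, n = OCPVersionBootImageSkewLimit, 3
	case "RHCOSVersion":
		limit, n = RHCOSVersionBootImageSkewLimit, 2
	default:
		return false, fmt.Errorf("unknown version kind %q", kind)
	}
	v, err := numericPrefix(version, n)
	if err != nil {
		return false, err
	}
	l, err := numericPrefix(limit, n)
	if err != nil {
		return false, err
	}
	return less(v, l), nil
}

func main() {
	for _, c := range [][2]string{
		{"RHCOSVersion", "9.0.20251023-0"},
		{"RHCOSVersion", "9.2.20251023-0"},
		{"OCPVersion", "4.12.84"},
		{"OCPVersion", "4.21.0"},
	} {
		blocked, err := outOfSkew(c[0], c[1])
		fmt.Println(c[0], c[1], "upgradeable:", err == nil && !blocked)
	}
}
```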

Some caveats to note about Automatic mode:

The admin is not permitted to use Automatic mode in the spec. This is an intentional choice, because only the MCO can reliably determine whether a platform is eligible for automatic skew enforcement.

In Automatic mode, API validations will prevent changing the boot image configuration to a setting other than All. To change the boot image configuration, the admin is first expected to go to Manual skew enforcement mode and then attempt to change the boot image configuration of the cluster.

In Automatic mode, if any machinesets are skipped for boot image updates (for example, if a marketplace or an unknown boot image was detected in any of the machinesets), the boot image controller will not update the boot image value stored in bootImageSkewEnforcementStatus. This is because the cluster cannot be considered up to date on boot images if even one of the machine resources is out of skew.

In Automatic mode, the operator will only populate the OCPVersion. This is because each platform may not have the same RHCOS version of the boot image (for example, across marketplace streams) in a given release, and it would involve a lot of per-platform plumbing to correctly track the RHCOS version per machineset within the boot image controller. I did not deem this worth the effort, but am open to implementing it later if the need arises.

djoshy · 2025-11-21T13:40:02Z

/retest-required

djoshy · 2025-11-25T16:12:08Z

/retest-required

pkg/apihelpers/apihelpers.go

djoshy · 2026-01-02T16:10:50Z

Re-rebased to fix all the build issues, should be ready for a pass now 😄

isabella-janssen · 2026-01-02T19:06:02Z

pkg/controller/common/constants.go

+	// Note: Update units in status_test.go when the following are bumped
+	RHCOSVersionBootImageSkewLimit = "9.2"
+	OCPVersionBootImageSkewLimit   = "4.13.0"


How will we remember to bump these?

I envision these being updated when the RHEL major is being bumped, so perhaps it'd be a card within the "new" RHEL migration epic. Although, I could see it being a faster cadence if there's some RHEL bugs that can't be fixed easily. Thoughts, @yuqi-zhang ?

Let's confer with the RHCOS team on the exact cadence and definition. For TP I think it's fine to have it hard coded.

sounds good, thanks!

Could we just make the skew N-1 from the latest supported RHEL/RHCOS for the given stream?

i.e. if the latest for this stream is based on 9.8, we'd support the boot image being set to 9.6, but not 9.4?

It would kind of be nice if this could be dynamically updated (i.e based on set rules similar to what I described above) and then we'd always know, rather than relying on it being hardcoded here.
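The N-1 proposal in this thread can be sketched in Go. It is only a proposal here, not what the PR implements, since the PR hardcodes the limits. `RHELVersion` and `withinNMinusOne` are hypothetical. The sketch follows the 9.8 → 9.6 example and assumes the previous release is two minors back. Cross-major cases, such as a stream moving to RHEL 10, are not addressed and are simply rejected.

```go
package main

import "fmt"

// RHELVersion is a major.minor pair, e.g. {9, 8}.
type RHELVersion struct {
	Major, Minor int
}

// withinNMinusOne accepts a boot image based on the latest RHEL minor for the
// stream or the one before it (assumed to be two minors back, as in 9.8 -> 9.6).
// Boot images on a different major are rejected.
func withinNMinusOne(boot, latest RHELVersion) bool {
	if boot.Major != latest.Major {
		return false
	}
	return boot.Minor >= latest.Minor-2
}

func main() {
	latest := RHELVersion{9, 8}
	for _, b := range []RHELVersion{{9, 8}, {9, 6}, {9, 4}} {
		fmt.Printf("%d.%d allowed: %v\n", b.Major, b.Minor, withinNMinusOne(b, latest))
	}
}
```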

I think it'd be nice to have a smaller skew to restrict our test matrix, but one of our main concerns is that it would be too aggressive for customers with non-automatically managed environments. We'd essentially be going from (you can use any bootimage) to (you have to manually update your bootimage and bootimage reference every 3-4 y-streams). So we thought we would start with a more relaxed skew and tighten based on technical concerns.

Happy to discuss more in detail in a call sometime.

Yeah. Might be worth discussing how often we think is too often (in terms of time) and work backwards from there. i.e. I think having the customer do something once a year as part of maintenance wouldn't be a crazy ask.

Opened https://issues.redhat.com/browse/MCO-2104 to track this. @yuqi-zhang mentioned bringing it to the CoreOS cabal, so I will try to bring it to the next one I can join!

pkg/operator/status.go

This commit adds unit tests for the new Upgradeable guards added in the previous commit.

This commit ensures that the boot image controller state is acceptable before checking the skew. This check is only done in Automatic mode.
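The controller-state guard described in this commit can be sketched in Go, following the PR description. In Automatic mode, a controller error blocks upgrades with that error, and an in-flight update defers the skew check to avoid races. The sketch assumes that "error state" and "mid-update" map to the BootImageUpdateDegraded and BootImageUpdateProgressing conditions shown earlier. All names are hypothetical.

```go
package main

import "fmt"

// Decision is what the operator does before (or instead of) the skew check.
type Decision string

const (
	RunSkewCheck     Decision = "RunSkewCheck"
	DeferSkewCheck   Decision = "Defer"
	BlockOnCtrlError Decision = "BlockOnControllerError"
)

// ControllerState is a hypothetical view of the boot image controller,
// assumed to be read from the BootImageUpdateDegraded and
// BootImageUpdateProgressing conditions on the MachineConfiguration object.
type ControllerState struct {
	Degraded        bool
	DegradedMessage string
	Progressing     bool
}

// preSkewCheck gates the skew check. Only Automatic mode consults the
// controller: an error blocks upgrades (propagating the message), and an
// in-flight update defers the check until a later sync.
func preSkewCheck(mode string, st ControllerState) (Decision, string) {
	if mode != "Automatic" {
		return RunSkewCheck, ""
	}
	if st.Degraded {
		return BlockOnCtrlError, st.DegradedMessage
	}
	if st.Progressing {
		return DeferSkewCheck, ""
	}
	return RunSkewCheck, ""
}

func main() {
	d, msg := preSkewCheck("Automatic", ControllerState{Degraded: true, DegradedMessage: "example error"})
	fmt.Println(d, msg)
}
```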

sergiordlr · 2026-02-02T17:00:40Z

Verified using IPI on AWS, GCP, Azure and vSphere

Automatic skew was configured by the MCO on AWS and GCP, and Manual skew was configured by the MCO on Azure and vSphere.

We verified that the skew process used the right version by scaling down the CVO and manually editing the history in the clusterversion resource. The MCO correctly reports the oldest version in clusterversion.status.history as the skew version, and correctly updates the value to the latest version when the boot image cycle executes successfully.
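The verification above implies that the initial estimate is the oldest version in `clusterversion.status.history`. Here is a Go sketch of that selection. `HistoryEntry` is a hypothetical trimmed view of a history entry, and `oldestVersion` is not the MCO's function. The sketch takes the minimum version rather than assuming any ordering of the entries, which is a design assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// HistoryEntry is a hypothetical, trimmed view of an entry in
// clusterversion.status.history.
type HistoryEntry struct {
	State   string // e.g. "Completed" or "Partial"
	Version string // e.g. "4.12.84"
}

// parse returns major.minor.patch of v, dropping any "-..." suffix.
func parse(v string) ([3]int, error) {
	var out [3]int
	if i := strings.Index(v, "-"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		x, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = x
	}
	return out, nil
}

func older(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// oldestVersion returns the lowest version found in the history entries.
func oldestVersion(history []HistoryEntry) (string, error) {
	if len(history) == 0 {
		return "", errors.New("empty history")
	}
	best := ""
	var bestV [3]int
	for _, h := range history {
		v, err := parse(h.Version)
		if err != nil {
			return "", err
		}
		if best == "" || older(v, bestV) {
			best, bestV = h.Version, v
		}
	}
	return best, nil
}

func main() {
	h := []HistoryEntry{
		{State: "Completed", Version: "4.22.0"},
		{State: "Completed", Version: "4.21.0"},
		{State: "Completed", Version: "4.12.84"},
	}
	v, err := oldestVersion(h)
	fmt.Println(v, err) // 4.12.84 <nil>
}
```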

In #5547 we can see the automation for the tests that were executed to verify this PR (apart from manually hacking the history in clusterversion).

There is a pending test: upgrading a 4.12 cluster to 4.22. We are working on it; nevertheless, it will take some time and should not block this PR. If any problem is found in this test, it can be reported as an issue after the code merges.

/verified by @sergiordlr

openshift-ci-robot · 2026-02-02T17:00:53Z

@sergiordlr: This PR has been marked as verified by @sergiordlr.


isabella-janssen · 2026-02-02T17:23:26Z

/lgtm

djoshy · 2026-02-02T17:24:24Z

/hold

/payload 4.22 nightly blocking

openshift-ci · 2026-02-02T17:24:41Z

@djoshy: trigger 14 job(s) of type blocking for the nightly release of OCP 4.22

periodic-ci-openshift-release-master-ci-4.22-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips
periodic-ci-openshift-release-master-ci-4.22-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-serial-1of2
periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-serial-2of2
periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-1of3
periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3
periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3
periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips-no-nat-instance
periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ipi-ovn-ipv4
periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/05988970-005c-11f1-89a5-3edfa57d9fb2-0

djoshy · 2026-02-02T17:31:21Z

/test all

djoshy · 2026-02-02T17:33:38Z

/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-1of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-2of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-1of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-2of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-vsphere-mco-disruptive-techpreview-1of2 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-vsphere-mco-disruptive-techpreview-2of2

openshift-ci · 2026-02-02T17:33:43Z

@djoshy: trigger 8 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-1of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-2of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-1of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-2of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-vsphere-mco-disruptive-techpreview-1of2
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-vsphere-mco-disruptive-techpreview-2of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/500a0af0-005d-11f1-92e1-26cef23b55b1-0

sergiordlr · 2026-02-03T14:30:12Z

We verified the skew using IPI on AWS, upgrading from 4.12 to 4.22 and enabling techpreview.

The skew version was set to:

  status:
    bootImageSkewEnforcementStatus:
      automatic:
        ocpVersion: 4.12.84
      mode: Automatic

We see that the cluster is not upgradeable

$ oc adm upgrade
...
Upgradeable=False
...
  * Cluster operator machine-config should not be upgraded between minor or major versions: ClusterBootImageSkewError: Upgrades have been disabled because the cluster is using OCP boot image version 4.12.84, which is below the minimum required version 4.13.0. To enable upgrades, please update your boot images following the documentation at [TODO: insert link], or disable boot image skew enforcement at [TODO: insert link]

The problem was that the AMI for 4.20 had recently been updated and was not included in the MCO's list of AMIs, so the controller logged this error:

I0203 13:57:23.632096       1 platform_helpers.go:187] current AMI ami-0e0850e74100f0f31 is unknown, skipping update of MachineSet ci-op-l72znhyb-35499-fvns9-worker-us-east-1f
I0203 13:57:23.632118       1 ms_helpers.go:193] No patching required for MAPI machineset ci-op-l72znhyb-35499-fvns9-worker-us-east-1f

Since the boot image loop could not update the images, the skew version was not updated.

We manually updated the AMI in the machinesets to the last 4.20 AMI known by the MCO. Once we did that, the update cycle executed successfully and the skew version reported the right value:

  status:
    bootImageSkewEnforcementStatus:
      automatic:
        ocpVersion: 4.22.0
      mode: Automatic

After reporting the new version the cluster stopped reporting that it was not upgradeable because of the versions skew (it is still not upgradeable because it is techpreview, but that's expected).

djoshy · 2026-02-03T15:30:36Z

Trying some metal jobs. These have historically failed (unrelated to this work), but it would still be interesting to see the results if the tests actually run:

/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-dualstack-mco-disruptive-techpreview periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv6-mco-disruptive-techpreview periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv4-mco-disruptive-techpreview

openshift-ci · 2026-02-03T15:30:43Z

@djoshy: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-dualstack-mco-disruptive-techpreview
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv6-mco-disruptive-techpreview
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv4-mco-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4a07eb50-0115-11f1-88dd-905896631fcd-0

djoshy · 2026-02-04T13:32:16Z

/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-dualstack-mco-disruptive-techpreview periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv6-mco-disruptive-techpreview periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv4-mco-disruptive-techpreview

openshift-ci · 2026-02-04T13:32:20Z

@djoshy: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-dualstack-mco-disruptive-techpreview
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv6-mco-disruptive-techpreview
periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-metal-ipi-ovn-ipv4-mco-disruptive-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ecb7d4b0-01cd-11f1-89c2-87ce54c957ad-0

yuqi-zhang

/lgtm

I think we've covered most of the main concerns, and we can iterate on some details (e.g. skew limits) as followups since this is still behind TP

openshift-ci · 2026-02-04T15:55:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, isabella-janssen, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [djoshy,isabella-janssen,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

djoshy · 2026-02-04T18:22:34Z

/test all

djoshy · 2026-02-04T20:58:58Z

/unhold

metal runs look good, no new failures

openshift-ci-robot · 2026-02-04T21:50:48Z

/retest-required

Remaining retests: 0 against base HEAD c9188a4 and 2 for PR HEAD 2277bea in total

openshift-ci-robot · 2026-02-05T03:50:34Z

/retest-required

Remaining retests: 0 against base HEAD 067395e and 1 for PR HEAD 2277bea in total

openshift-ci · 2026-02-05T06:29:40Z

@djoshy: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 19, 2025

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 19, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2025

djoshy force-pushed the implement-skew-enforcement branch from dc9203e to 7b578ab November 20, 2025 16:25

djoshy marked this pull request as ready for review November 20, 2025 20:55

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 20, 2025

openshift-ci bot requested review from dkhater-redhat and yuqi-zhang November 20, 2025 20:58

djoshy force-pushed the implement-skew-enforcement branch from 7b578ab to dddd5c7 November 21, 2025 16:06

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 29, 2025

djoshy force-pushed the implement-skew-enforcement branch from dddd5c7 to a9597e7 December 1, 2025 13:31

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2025

isabella-janssen reviewed Dec 1, 2025

pkg/apihelpers/apihelpers.go (outdated, resolved)

djoshy force-pushed the implement-skew-enforcement branch from a9597e7 to be70c0c December 2, 2025 14:29

djoshy force-pushed the implement-skew-enforcement branch from be70c0c to cbe0fbf December 9, 2025 21:46

djoshy force-pushed the implement-skew-enforcement branch 2 times, most recently from ad978bf to a803b27 January 2, 2026 16:08

isabella-janssen reviewed Jan 2, 2026

pkg/operator/status.go (resolved)

djoshy added 2 commits January 30, 2026 14:21

operator: add upgrade block unit tests

a16d81c

This commit adds unit tests for the new Upgradeable guards added in the previous commit.

operator: verify boot image controller state

2277bea

This commit ensures that the boot image controller state is acceptable before checking the skew. This check is only done in Automatic mode.

djoshy force-pushed the implement-skew-enforcement branch from 88ee5d1 to 2277bea January 30, 2026 19:22

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Feb 2, 2026

openshift-ci bot assigned isabella-janssen Feb 2, 2026

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 2, 2026

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 2, 2026

yuqi-zhang approved these changes Feb 4, 2026

openshift-ci bot assigned yuqi-zhang Feb 4, 2026

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2026

openshift-merge-bot bot merged commit aa8d7e3 into openshift:main Feb 5, 2026
15 checks passed
