OCPBUGS-56274: add datacenter consistency check #212

RomanBednar wants to merge 1 commit into openshift:main
@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: RomanBednar
/jira refresh
@RomanBednar: This pull request references Jira Issue OCPBUGS-56274, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.
Branch updated from c4dfdec to 5039e0d.
@RomanBednar: all tests passed!
/assign @gnufied
For review.
When using zonal deployments of vSphere with OpenShift, if a datacenter referenced by a failure domain in the Infrastructure CR (`infrastructure.config.openshift.io/cluster`) is missing from the cloud provider config (the `cloud-provider-config` ConfigMap in `openshift-config`), the CSI driver silently fails to find VMs in that zone, causing the cluster to degrade. The vSphere Problem Detector (VPD) had no check to detect this misconfiguration.

This fix adds a new cluster-level check, `CheckDatacenterConsistency`, that compares each failure domain's required datacenter against the datacenters listed in the parsed `cloud.conf` (`ctx.VMConfig.Config.VirtualCenter[server].Datacenters`). When a datacenter is absent, VPD emits a WARNING that names the missing datacenter and the affected failure domain, and instructs the administrator to update the `cloud-provider-config` ConfigMap in the `openshift-config` namespace.

Cluster Setup
Two failure domains are configured, both on vCenter `232-15-184-10.in-addr.arpa`:

- `us-east-1` → datacenter `nested-devqedatacenter-1`
- `us-west-1` → datacenter `nested-devqedatacenter-2`

Simulating the Bug
The datacenter `nested-devqedatacenter-2` was removed from `cloud-provider-config`.

Unpatched Behaviour (openshift/main)
Relevant log lines:
No warning or error is reported about the missing datacenter `nested-devqedatacenter-2`.

Patched Behaviour (OCPBUGS-56274)
Relevant log lines:
A WARNING is emitted that explicitly names `nested-devqedatacenter-2` as missing and includes remediation instructions.