Skip to content

OCPBUGS-60093: propagate nodeSelector removal to control plane workloads#8321

Open
sdminonne wants to merge 1 commit into
openshift:mainfrom
sdminonne:OCPBUGS-60093
Open

OCPBUGS-60093: propagate nodeSelector removal to control plane workloads#8321
sdminonne wants to merge 1 commit into
openshift:mainfrom
sdminonne:OCPBUGS-60093

Conversation

@sdminonne
Copy link
Copy Markdown
Contributor

@sdminonne sdminonne commented Apr 23, 2026

Summary

  • Fix propagation of nodeSelector removal from HostedCluster to control plane deployments managed by the CPo-v2 framework
  • The ApplyManifest function in support/upsert/apply.go now explicitly detects when nodeSelector (or labels) have been removed from the desired state and clears them on the existing object
  • The defaults.go in support/controlplane-component now merges HCP nodeSelector on top of the template's existing nodeSelector via maps.Copy, preserving entries from component YAML templates (e.g., kubernetes.io/os: linux used by 5 catalog and aws-node-termination-handler deployments) while still allowing HCP-specified entries to be added or removed
  • Fix a pre-existing map-aliasing bug in preserveOriginalMetadata: maps.Clone is now used so that label/annotation value changes with the same key count are no longer silently dropped due to the existing and desired objects sharing the same underlying map
  • Unit tests added for ApplyManifest and setDefaults to cover removal, merge, annotation value-change, and nodeSelector value-change scenarios

Test plan

  • Unit tests for ApplyManifest nodeSelector removal (support/upsert/apply_test.go)
  • Unit tests for ApplyManifest annotation value change (TestApplyManifestAnnotationValueChange)
  • Unit tests for ApplyManifest nodeSelector value change with constant key count (TestApplyManifestNodeSelectorValueChange)
  • Unit tests for setDefaults nodeSelector merge with template preservation (support/controlplane-component/defaults_test.go)
  • e2e validation via control-plane-workloads label filter (e2e test in separate PR)

🤖 Generated with Claude Code

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Pod template nodeSelector assignment was changed so workload pod templates always receive HostedControlPlane.Spec.NodeSelector (including propagating nil to clear existing selectors). Upsert/apply logic now records original label and nodeSelector counts, clones original labels/annotations to avoid aliasing, and forces an update when the new pod-template nodeSelector has fewer entries than the original. Unit tests were added to validate nodeSelector propagation and that ApplyManifest updates Deployments/StatefulSets when nodeSelector keys are removed; tests also cover label-value updates with unchanged key counts.

Sequence Diagram(s)

sequenceDiagram
participant HCP as HostedControlPlane
participant Controller as ControlPlaneComponent
participant Upserter as Upsert/Apply
participant KAPI as Kubernetes API

HCP->>Controller: provide Spec.NodeSelector
Controller->>Controller: setDefaultOptions(assign NodeSelector to podTemplate)
Controller->>Upserter: ApplyManifest(desired object)
Upserter->>KAPI: Get(existing)
KAPI->>Upserter: return existing object (labels, annotations, podTemplate.nodeSelector)
Upserter->>Upserter: record originalLabelCount, originalNodeSelectorCount
Upserter->>Upserter: preserveOriginalMetadata(clone originals, apply removals/overwrites)
Upserter->>Upserter: compare desired vs existing (DeepDerivative)
alt desired.nodeSelector count < originalNodeSelectorCount
    Upserter->>Upserter: treat as changed (force update)
end
Upserter->>KAPI: Update(mutated)
KAPI->>Upserter: return updated
Upserter->>Controller: return OperationResult (Updated) and updated object
Loading

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Most Gomega assertions lack meaningful failure messages required for diagnostic context during test failures. Add explanatory messages to all Gomega assertions, e.g., change g.Expect(result).To(Equal(controllerutil.OperationResultUpdated)) to include context about expected behavior.
Stable And Deterministic Test Names ❓ Inconclusive Test files mentioned in PR summary could not be located in the repository to verify test name stability. Ensure repository contains the test files and verify Ginkgo test declarations use only static, non-dynamic strings.
✅ Passed checks (9 passed)
Check name Status Explanation
Title check ✅ Passed The PR title directly and clearly references the bug fix objective: propagating nodeSelector removal to control plane workloads, which is the primary focus of all changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Microshift Test Compatibility ✅ Passed This PR does not add new Ginkgo e2e tests; it only adds standard Go unit tests and a log message formatting adjustment.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add new Ginkgo e2e tests; it only adds standard Go unit tests with testing.T and Gomega assertions.
Topology-Aware Scheduling Compatibility ✅ Passed PR propagates user-provided nodeSelector values from HostedControlPlane.Spec without introducing topology-specific constraints or hard-coded control-plane labels.
Ote Binary Stdout Contract ✅ Passed The PR changes involve unit test modifications and a single log message formatting adjustment in an e2e test, with no process-level stdout writes that could violate the OTE Binary Stdout Contract.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests were added. Only unit tests in source directories and a log message adjustment in an existing e2e test file.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from devguyio and jparrill April 23, 2026 18:20
@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Apr 23, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
support/upsert/apply_test.go (1)

259-315: Add a StatefulSet variant to match implementation coverage.

The production path includes StatefulSet handling, but this test validates only Deployment. A second subtest for StatefulSet would better protect against regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/upsert/apply_test.go` around lines 259 - 315, Add a parallel subtest
to TestApplyManifestNodeSelectorRemoval that mirrors the Deployment scenario for
StatefulSet: create existingStatefulSet with a Pod template Spec.NodeSelector
set (use appsv1.StatefulSet, same metadata/name/namespace pattern), create
desiredStatefulSet with an empty Pod template Spec, use
fake.NewClientBuilder().WithObjects(existingStatefulSet).Build(), call
provider.ApplyManifest(t.Context(), c, desiredStatefulSet) and assert no error,
OperationResultUpdated, then Get the updated StatefulSet and assert
updated.Spec.Template.Spec.NodeSelector is empty; reuse the same provider
(applyProvider) and testing helpers (NewWithT, t.Context(),
types.NamespacedName) as in the Deployment subtest to keep consistency.
support/controlplane-component/defaults_test.go (1)

314-367: Nice targeted coverage; optional extension for StatefulSet would strengthen confidence.

Current cases validate Deployment well. Since setDefaultOptions is generic, a lightweight StatefulSet case would improve confidence across providers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/controlplane-component/defaults_test.go` around lines 314 - 367, Add
a parity test for StatefulSet nodeSelector behavior mirroring
TestSetDefaultOptionsNodeSelector: create a controlPlaneWorkload parameterized
for *appsv1.StatefulSet (use the statefulset provider type analogous to
deploymentProvider, e.g. statefulSetProvider), instantiate a StatefulSet with
existingNodeSelector, construct an HCP with hcpNodeSelector, call
cpw.setDefaultOptions with the StatefulSet object and the same fake
client/scheme, and assert the StatefulSet.Spec.Template.Spec.NodeSelector is set
or cleared to match expectedNodeSelector; keep test cases identical to the
Deployment ones to ensure generic behavior across workload types.
support/upsert/apply.go (1)

255-264: Consider future-proofing nodeSelector handling across more workload kinds.

Line 255 currently handles only Deployment and StatefulSet. If ApplyManifest is used with other pod-template resources later, nodeSelector removals there would still be skipped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/upsert/apply.go` around lines 255 - 264, getNodeSelectorCount
currently only checks *appsv1.Deployment and *appsv1.StatefulSet so other
pod-template workloads (DaemonSet, ReplicaSet, Job, CronJob, etc.) will be
missed by ApplyManifest; modify getNodeSelectorCount to detect nodeSelector
generically by converting the crclient.Object to an unstructured.Unstructured
(or use unstructured.NestedStringMap) and read "spec.template.spec.nodeSelector"
safely, returning its length when present, and keep the existing type switches
as fallback for typed objects (refer to getNodeSelectorCount, ApplyManifest, and
the Deployment/StatefulSet cases).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@support/controlplane-component/defaults_test.go`:
- Around line 314-367: Add a parity test for StatefulSet nodeSelector behavior
mirroring TestSetDefaultOptionsNodeSelector: create a controlPlaneWorkload
parameterized for *appsv1.StatefulSet (use the statefulset provider type
analogous to deploymentProvider, e.g. statefulSetProvider), instantiate a
StatefulSet with existingNodeSelector, construct an HCP with hcpNodeSelector,
call cpw.setDefaultOptions with the StatefulSet object and the same fake
client/scheme, and assert the StatefulSet.Spec.Template.Spec.NodeSelector is set
or cleared to match expectedNodeSelector; keep test cases identical to the
Deployment ones to ensure generic behavior across workload types.

In `@support/upsert/apply_test.go`:
- Around line 259-315: Add a parallel subtest to
TestApplyManifestNodeSelectorRemoval that mirrors the Deployment scenario for
StatefulSet: create existingStatefulSet with a Pod template Spec.NodeSelector
set (use appsv1.StatefulSet, same metadata/name/namespace pattern), create
desiredStatefulSet with an empty Pod template Spec, use
fake.NewClientBuilder().WithObjects(existingStatefulSet).Build(), call
provider.ApplyManifest(t.Context(), c, desiredStatefulSet) and assert no error,
OperationResultUpdated, then Get the updated StatefulSet and assert
updated.Spec.Template.Spec.NodeSelector is empty; reuse the same provider
(applyProvider) and testing helpers (NewWithT, t.Context(),
types.NamespacedName) as in the Deployment subtest to keep consistency.

In `@support/upsert/apply.go`:
- Around line 255-264: getNodeSelectorCount currently only checks
*appsv1.Deployment and *appsv1.StatefulSet so other pod-template workloads
(DaemonSet, ReplicaSet, Job, CronJob, etc.) will be missed by ApplyManifest;
modify getNodeSelectorCount to detect nodeSelector generically by converting the
crclient.Object to an unstructured.Unstructured (or use
unstructured.NestedStringMap) and read "spec.template.spec.nodeSelector" safely,
returning its length when present, and keep the existing type switches as
fallback for typed objects (refer to getNodeSelectorCount, ApplyManifest, and
the Deployment/StatefulSet cases).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8d79e44d-bdc8-4c2f-8cbf-35693f6a01b4

📥 Commits

Reviewing files that changed from the base of the PR and between 06193dc and c326dea.

📒 Files selected for processing (4)
  • support/controlplane-component/defaults.go
  • support/controlplane-component/defaults_test.go
  • support/upsert/apply.go
  • support/upsert/apply_test.go

@openshift-ci openshift-ci Bot added the area/testing Indicates the PR includes changes for e2e testing label Apr 24, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 59076c7c-3904-41f6-a50f-11ae95f22b57

📥 Commits

Reviewing files that changed from the base of the PR and between c326dea and 77a1d65.

📒 Files selected for processing (3)
  • support/upsert/apply.go
  • support/upsert/apply_test.go
  • test/e2e/nodepool_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/nodepool_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • support/upsert/apply_test.go

Comment thread support/upsert/apply.go
@sdminonne sdminonne changed the title fix(OCPBUGS-60093): propagate nodeSelector removal to control plane workloads OCPBUGS-60093: propagate nodeSelector removal to control plane workloads Apr 24, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 24, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-60093, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.19.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

  • Fix propagation of nodeSelector removal from HostedCluster to control plane deployments managed by the CPo-v2 framework
  • The ApplyManifest function in support/upsert/apply.go now explicitly detects when nodeSelector (or labels) have been removed from the desired state and clears them on the existing object
  • The defaults.go in support/controlplane-component correctly passes a nil nodeSelector through to the deployment pod template when it has been removed from the HostedControlPlane spec
  • Unit tests added for both ApplyManifest and setDefaults to cover the removal scenario

Test plan

  • Unit tests for ApplyManifest nodeSelector removal (support/upsert/apply_test.go)
  • Unit tests for setDefaults nodeSelector removal (support/controlplane-component/defaults_test.go)
  • e2e validation via control-plane-workloads label filter (e2e test in separate PR)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
  • Ensure removing a nodeSelector from control plane config actually clears it from generated workload templates and triggers updates when entries are removed (including partial removals).
  • Tests
  • Added unit tests covering pod template nodeSelector behavior and apply/update flows.
  • Added integration/e2e test log formatting tweak for clearer messages.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sdminonne
Copy link
Copy Markdown
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 24, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-60093, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ropatil010

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested a review from ropatil010 April 24, 2026 08:36
@openshift-ci openshift-ci Bot added the area/platform/aws PR/issue for AWS (AWSPlatform) platform label Apr 24, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.57%. Comparing base (bded456) to head (7a220d5).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8321      +/-   ##
==========================================
+ Coverage   37.53%   37.57%   +0.03%     
==========================================
  Files         751      751              
  Lines       92026    92051      +25     
==========================================
+ Hits        34544    34584      +40     
+ Misses      54841    54827      -14     
+ Partials     2641     2640       -1     
Files with missing lines Coverage Δ
support/controlplane-component/defaults.go 61.06% <100.00%> (+1.19%) ⬆️
support/upsert/apply.go 57.95% <100.00%> (+6.68%) ⬆️

... and 1 file with indirect coverage changes

Flag Coverage Δ
cmd-support 32.86% <100.00%> (+0.10%) ⬆️
cpo-hostedcontrolplane 36.77% <ø> (ø)
cpo-other 37.76% <ø> (ø)
hypershift-operator 47.93% <ø> (ø)
other 27.77% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@mehabhalodiya mehabhalodiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is well-structured.

The main things I'd raise as comments are:
(1) the maps.Clone concern in preserveOriginalMetadata (addressing coderabbit's proposed fix),
(2) an explicit acknowledgment of the kubernetes.io/os: linux removal side effect for components that previously had it as a template default, and
(3) the Gomega assertion message is for test quality,

name: tmp-dir
enableServiceLinks: false
nodeSelector:
kubernetes.io/os: linux
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, components like aws-node-termination-handler had a hardcoded kubernetes.io/os: linux default in their YAML templates that was preserved when no HCP nodeSelector was set. After this change, that default is lost.

The removal of kubernetes.io/os: linux from these fixtures represents a behavior change for clusters that never set a nodeSelector: previously, these pods were constrained to Linux nodes; now they are not.

Is this intentional? If these workloads can only run on Linux, consider either:
(a) not hardcoding the nodeSelector in YAML templates (since it'll be wiped anyway), or
(b) merging HCP nodeSelector on top of template defaults rather than overwriting them entirely.

If Windows nodes are not possible in the management cluster context, this is a no-op risk-wise, but it should be explicitly acknowledged.

Copy link
Copy Markdown
Contributor Author

@sdminonne sdminonne Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intentional 'cause at the moment, AFAIK at least, we have only MGMT cluster on linux. @sjenning @csrwng FYI

Thanks!

provider := &applyProvider{}

result, err := provider.ApplyManifest(t.Context(), c, desiredDeployment)
g.Expect(err).ToNot(HaveOccurred())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improves failure message output and satisfies the repo's test quality guidelines.

g.Expect(err).ToNot(HaveOccurred(), "ApplyManifest should succeed without error")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for @mehabhalodiya but my understanding the test name should be clear enough and gomega failure matcher should report that we wanted the err not occurred. I would keep it as is

@openshift-ci-robot
Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-60093, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ropatil010

Details

In response to this:

Summary

  • Fix propagation of nodeSelector removal from HostedCluster to control plane deployments managed by the CPo-v2 framework
  • The ApplyManifest function in support/upsert/apply.go now explicitly detects when nodeSelector (or labels) have been removed from the desired state and clears them on the existing object
  • The defaults.go in support/controlplane-component correctly passes a nil nodeSelector through to the deployment pod template when it has been removed from the HostedControlPlane spec
  • Unit tests added for both ApplyManifest and setDefaults to cover the removal scenario

Test plan

  • Unit tests for ApplyManifest nodeSelector removal (support/upsert/apply_test.go)
  • Unit tests for setDefaults nodeSelector removal (support/controlplane-component/defaults_test.go)
  • e2e validation via control-plane-workloads label filter (e2e test in separate PR)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

  • Removing a nodeSelector from control-plane config now clears it from generated workload pod templates and forces updates when selector entries are removed (including partial removals).

  • Tests

  • Added unit tests for pod template nodeSelector behavior and apply/update flows to verify clearing and partial-removal cases.

  • Added an e2e test log formatting tweak for clearer messages.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
support/upsert/apply_test.go (1)

265-420: Optional: consolidate duplicated setup with a small table-driven helper.

The Deployment and StatefulSet scenarios repeat the same Arrange/Act/Assert shape; extracting shared setup would make future cases cheaper to add.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/upsert/apply_test.go` around lines 265 - 420, Extract the duplicated
Arrange/Act/Assert into a small table-driven helper used by the tests that call
applyProvider.ApplyManifest: create a helper function (e.g., runNodeSelectorTest
or assertApplyClearsOrUpdatesNodeSelector) that accepts parameters for the
existing object, desired object, expected OperationResult
(controllerutil.OperationResultUpdated), expected final nodeSelector map, and a
testing.T or Gomega instance; move the fake client creation
(fake.NewClientBuilder().WithObjects(...).Build()), provider :=
&applyProvider{}, the call to provider.ApplyManifest, and the Get+assert checks
into that helper and then replace each t.Run body with a single call to the
helper passing the appropriate existingDeployment/existingStatefulSet,
desiredDeployment/desiredStatefulSet, and expected nodeSelector result to
eliminate repeated setup and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@support/upsert/apply_test.go`:
- Around line 265-420: Extract the duplicated Arrange/Act/Assert into a small
table-driven helper used by the tests that call applyProvider.ApplyManifest:
create a helper function (e.g., runNodeSelectorTest or
assertApplyClearsOrUpdatesNodeSelector) that accepts parameters for the existing
object, desired object, expected OperationResult
(controllerutil.OperationResultUpdated), expected final nodeSelector map, and a
testing.T or Gomega instance; move the fake client creation
(fake.NewClientBuilder().WithObjects(...).Build()), provider :=
&applyProvider{}, the call to provider.ApplyManifest, and the Get+assert checks
into that helper and then replace each t.Run body with a single call to the
helper passing the appropriate existingDeployment/existingStatefulSet,
desiredDeployment/desiredStatefulSet, and expected nodeSelector result to
eliminate repeated setup and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 78352707-86af-4102-b5c1-d7bc2ddc8b1f

📥 Commits

Reviewing files that changed from the base of the PR and between c9700ac and 66bba45.

⛔ Files ignored due to path filters (25)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/aws-node-termination-handler/AROSwift/zz_fixture_TestControlPlaneComponents_aws_node_termination_handler_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/aws-node-termination-handler/GCP/zz_fixture_TestControlPlaneComponents_aws_node_termination_handler_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/aws-node-termination-handler/IBMCloud/zz_fixture_TestControlPlaneComponents_aws_node_termination_handler_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/aws-node-termination-handler/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_aws_node_termination_handler_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/aws-node-termination-handler/zz_fixture_TestControlPlaneComponents_aws_node_termination_handler_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/certified-operators-catalog/AROSwift/zz_fixture_TestControlPlaneComponents_certified_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/certified-operators-catalog/GCP/zz_fixture_TestControlPlaneComponents_certified_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/certified-operators-catalog/IBMCloud/zz_fixture_TestControlPlaneComponents_certified_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/certified-operators-catalog/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_certified_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/certified-operators-catalog/zz_fixture_TestControlPlaneComponents_certified_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/community-operators-catalog/AROSwift/zz_fixture_TestControlPlaneComponents_community_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/community-operators-catalog/GCP/zz_fixture_TestControlPlaneComponents_community_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/community-operators-catalog/IBMCloud/zz_fixture_TestControlPlaneComponents_community_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/community-operators-catalog/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_community_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/community-operators-catalog/zz_fixture_TestControlPlaneComponents_community_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-marketplace-catalog/AROSwift/zz_fixture_TestControlPlaneComponents_redhat_marketplace_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-marketplace-catalog/GCP/zz_fixture_TestControlPlaneComponents_redhat_marketplace_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-marketplace-catalog/IBMCloud/zz_fixture_TestControlPlaneComponents_redhat_marketplace_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-marketplace-catalog/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_redhat_marketplace_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-marketplace-catalog/zz_fixture_TestControlPlaneComponents_redhat_marketplace_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-operators-catalog/AROSwift/zz_fixture_TestControlPlaneComponents_redhat_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-operators-catalog/GCP/zz_fixture_TestControlPlaneComponents_redhat_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-operators-catalog/IBMCloud/zz_fixture_TestControlPlaneComponents_redhat_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-operators-catalog/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_redhat_operators_catalog_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/redhat-operators-catalog/zz_fixture_TestControlPlaneComponents_redhat_operators_catalog_deployment.yaml is excluded by !**/testdata/**
📒 Files selected for processing (5)
  • support/controlplane-component/defaults.go
  • support/controlplane-component/defaults_test.go
  • support/upsert/apply.go
  • support/upsert/apply_test.go
  • test/e2e/nodepool_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/nodepool_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • support/controlplane-component/defaults.go

@sdminonne
Copy link
Copy Markdown
Contributor Author

@coderabbitai PTAL

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

@sdminonne: Sure! I'll review the changes now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mehabhalodiya, sdminonne
Once this PR has been reviewed and has the lgtm label, please assign muraee for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cwbotbot
Copy link
Copy Markdown

cwbotbot commented Apr 27, 2026

Test Results

e2e-aws

e2e-aks

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2048787790514098176 | Cost: $4.222300400000001 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 27, 2026
@mehabhalodiya
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 28, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2048993317483450368 | Cost: $4.279583000000001 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@sdminonne
Copy link
Copy Markdown
Contributor Author

/test e2e-azure-self-managed

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2049489177455628288 | Cost: $3.5571524999999995 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@sdminonne
Copy link
Copy Markdown
Contributor Author

/retest

Copy link
Copy Markdown
Contributor

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped some comments. Thanks!

}
// The assignment is unconditional so that removing nodeSelector from the HCP
// also clears it from workload pod templates.
podTemplateSpec.Spec.NodeSelector = hcp.Spec.NodeSelector
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now unconditionally overwrites any nodeSelector set in the component YAML templates. Five components still hardcode nodeSelector: kubernetes.io/os: linux in their source templates

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Changed this to a merge via maps.Copy: HCP nodeSelector is now merged on top of the template's existing nodeSelector instead of replacing it. We now preserve the kubernetes.io/os: linux. Unit test added too

Comment thread support/upsert/apply.go

// getNodeSelectorCount returns the number of nodeSelector entries for
// Deployment and StatefulSet pod templates, or 0 for other types.
func getNodeSelectorCount(obj crclient.Object) int {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: A short comment noting that this intentionally only handles Deployment and StatefulSet (the only workload types in the cpov2 framework) would help future readers understand why DaemonSet, Job, etc. are not covered.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added a comment

Comment thread support/upsert/apply.go
// Without cloning, the comparison in update() between 'existing' (serialized
// after this function runs) and 'obj' would see identical labels/annotations,
// hiding value changes when the key count stays the same.
labels := maps.Clone(original.GetLabels())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maps.Clone fix was applied to both labels and annotations, but only label value changes are tested here. Could you add a parallel test for annotation value changes? Same structure, just swapping labels for annotations.

Also, consider adding a test for nodeSelector value change with constant key count (e.g., {node-role: infra} → {node-role: worker}). That path goes through DeepDerivative rather than getNodeSelectorCount, and an explicit test would guard against regressions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests added and codecov should be happier :)

Comment thread support/upsert/apply.go
}
}

func preserveOriginalMetadata(original, mutated crclient.Object) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note: the maps.Clone fix in preserveOriginalMetadata addresses a pre-existing map-aliasing bug (label/annotation value changes with the same key count were silently dropped). Worth calling that out in the PR description and commit msg.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. In PR description and commit message too.

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
@sdminonne
Copy link
Copy Markdown
Contributor Author

@jparrill addressed all the comments TY!
PTAL

@sdminonne
Copy link
Copy Markdown
Contributor Author

/pipeline required

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@hypershift-jira-solve-ci
Copy link
Copy Markdown

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2051614540776345600 | Cost: $2.9168365499999998 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@mehabhalodiya
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@sdminonne
Copy link
Copy Markdown
Contributor Author

/retest-required

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

@sdminonne: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws 40c21ee link true /test e2e-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

fixture.go:333: Failed to wait for infra resources in guest cluster to be deleted: context deadline exceeded
fixture.go:340: Failed to clean up 9 remaining resources for guest cluster

Summary

The TestKarpenter/Teardown subtest failed because 9 AWS infrastructure resources (7 EBS volumes from Karpenter-provisioned nodes and 1 NLB from the ingress router) were not cleaned up within the 15-minute post-delete validation deadline. All functional Karpenter tests (Main, ValidateHostedCluster, EnsureHostedCluster) passed successfully. This is an infrastructure-level teardown flake caused by slow AWS resource deallocation, unrelated to the PR's nodeSelector propagation changes.

Root Cause

This is a Karpenter test teardown infrastructure flake, not a product bug introduced by this PR.

The failure chain:

  1. Teardown initiated at ~18:26 UTC for hosted cluster karpenter-f6kvp in namespace e2e-clusters-z78kl.
  2. HyperShift operator finalization took 22 minutes (18:26 → 18:49). During this period, the operator was removing its finalizers from the HostedCluster while the infrastructure destroy process waited. The first destroy attempt hit the 15-minute ClusterGracePeriod timeout and reported "hostedcluster wasn't finalized, aborting delete: context deadline exceeded".
  3. Retry succeeded — on the second attempt, the operator had finished finalization and infrastructure destruction completed in ~57 seconds (18:49:17 → 18:50:14 per destroy.log).
  4. Post-delete AWS resource validation failed — the validateAWSGuestResourcesDeletedFunc (fixture.go:303) polls AWS every 20 seconds for up to 15 minutes, checking whether tagged resources (kubernetes.io/cluster/karpenter-f6kvp: owned) have been cleaned up. Nine resources remained after the deadline:
    • 5 EBS volumes from Karpenter nodepools (capacity-reservation-test, version-test, arbitrary-subnet-test, instance-profile-test, on-demand ×2)
    • 2 EBS volumes from CAPA machine nodes
    • 1 NLB from openshift-ingress/router-default

Why this is unrelated to PR #8321:

  • The PR only modifies support/controlplane-component/defaults.go and support/upsert/apply.go — nodeSelector merge logic and ApplyManifest label/nodeSelector removal detection.
  • These files are not part of the Karpenter test, teardown framework, infrastructure cleanup, or AWS resource management code paths.
  • All Karpenter functional tests passed (including the actual provisioning tests that exercise control plane workloads with nodeSelectors).
  • Other tests that also used the PR's code (TestCreateCluster, TestUpgradeControlPlane, TestAutoscaling, TestNodePool) all passed teardown successfully.
  • TestNodePool/HostedCluster0/Teardown also hit the "wasn't finalized" error but ultimately passed (1509s) — same pattern, different timing luck.
Recommendations
  1. Retest / retry the job — This is an infrastructure flake. The failure is in AWS resource cleanup timing, not in the PR's code changes. A /retest should resolve it.
  2. No code changes needed in PR OCPBUGS-60093: propagate nodeSelector removal to control plane workloads #8321 — The nodeSelector propagation changes are functioning correctly (all functional tests passed).
  3. Consider filing a test infrastructure improvement for the Karpenter teardown: the current 15-minute ClusterGracePeriod + 15-minute AWS resource cleanup poll may be insufficient for Karpenter clusters that create many additional nodepools with extra EBS volumes and NLBs. The Karpenter test creates 5+ additional nodepools (capacity-reservation-test, version-test, arbitrary-subnet-test, instance-profile-test, on-demand) which leaves significantly more AWS resources to clean up than standard HyperShift tests.
Evidence
Evidence Detail
Failed test TestKarpenter/Teardown (1542.06s)
Error message Failed to wait for infra resources in guest cluster to be deleted: context deadline exceeded
Remaining resources 9 AWS resources: 7 EBS volumes + 1 NLB tagged with kubernetes.io/cluster/karpenter-f6kvp=owned
Cluster finalization time 22 min 29s (18:26:48 → 18:49:17 UTC) — exceeded 15-min ClusterGracePeriod
Infrastructure destruction 57 seconds (18:49:17 → 18:50:14 UTC) — completed normally on retry
Functional tests All PASSED: TestKarpenter/Main (2101s), TestKarpenter/ValidateHostedCluster (887s), TestKarpenter/EnsureHostedCluster (2.6s)
PR files changed support/controlplane-component/defaults.go, defaults_test.go, support/upsert/apply.go, apply_test.go
Other teardown results TestCreateCluster (473s ✅), TestUpgradeControlPlane (421s ✅), TestAutoscaling (474s ✅), TestNodePool/HC0 (1509s ✅), TestNodePool/HC2 (455s ✅)
Similar failure (same run) TestNodePool/HostedCluster0/Teardown also hit "wasn't finalized" but passed (1509s) — same pattern, different timing
destroy.log Shows full successful destruction: finalizer removed, infra deleted, IAM cleaned — cluster fully destroyed

…orkloads

When a user removes nodeSelector from the HostedControlPlane spec, the
change was not propagated to control plane Deployments and StatefulSets.
Two issues prevented this:

1. setDefaultOptions only assigned nodeSelector conditionally (when
   non-nil), so clearing it from the HCP had no effect on pod templates.

2. ApplyManifest uses DeepDerivative for change detection, which treats
   nil/empty maps as "don't care", silently skipping nodeSelector
   removal even when the desired state had no nodeSelector.

This commit fixes both issues:

- Merge HCP nodeSelector on top of the template's existing nodeSelector
  via maps.Copy in setDefaultOptions. This preserves entries from
  component YAML templates (e.g., kubernetes.io/os: linux used by 5
  catalog and aws-node-termination-handler deployments) while applying
  HCP-specified entries on top. When HCP nodeSelector is nil, only the
  template entries remain, so removal of HCP-added entries is handled
  naturally because cpov2 loads templates fresh each reconciliation.

- Add nodeSelector count tracking in ApplyManifest to force updates
  when nodeSelector entries are removed, matching the existing pattern
  for label removal detection. Only Deployment and StatefulSet are
  handled as they are the only workload kinds in the cpov2 framework.

- Clone metadata maps in preserveOriginalMetadata to prevent aliasing
  between the existing and desired objects. This fixes a pre-existing
  bug where label/annotation value changes with the same key count
  were silently dropped because both objects shared the same underlying
  map.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 11, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 11, 2026

New changes are detected. LGTM label has been removed.

@sdminonne
Copy link
Copy Markdown
Contributor Author

/retest-required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants