Skip to content

ARO-26764: narrow HostedCluster primary watch to actionable metadata updates#8424

Open
tuxerrante wants to merge 11 commits into
openshift:mainfrom
tuxerrante:tuxerrante/hostedcluster-generation-sync
Open

ARO-26764: narrow HostedCluster primary watch to actionable metadata updates#8424
tuxerrante wants to merge 11 commits into
openshift:mainfrom
tuxerrante:tuxerrante/hostedcluster-generation-sync

Conversation

@tuxerrante
Copy link
Copy Markdown

@tuxerrante tuxerrante commented May 5, 2026

What this PR does / why we need it:

The HostedCluster controller currently receives reconcile requests from its primary watch for updates that do not represent new desired state, especially status-only churn. That makes the controller noisier than it needs to be and makes it harder to reason about which reconciliations are actually driven by user intent versus controller-written status.

This change narrows only the primary HostedCluster watch so that it continues to reconcile on meaningful inputs, while avoiding self-induced reconciles from status-only updates.

The key design point is that this cannot be implemented as a pure generation-only filter. For HostedCluster, some metadata-only updates are already real control inputs for reconciliation and must continue to enqueue even when the object generation does not change. Examples include specific annotations consumed by reconciliation, mirrored api.openshift.com/* labels, scope transitions, and deletion start.

To keep the behavior correct while reducing noise, this PR introduces a dedicated primary predicate that allows:

  • generation-changing updates
  • deletion timestamp transitions
  • actionable annotation changes
  • actionable mirrored label changes

and filters out status-only updates from the primary watch.

The change is intentionally narrow:

  • child-resource watches are left unchanged
  • the API / CRD surface is unchanged
  • the existing ObservedGeneration status model remains in place
  • the reconcile loop still re-reads the current HostedCluster, so stale queued requests continue to act on the latest available object state

Alongside the watch change, this PR fixes two same-area behaviors that became important once metadata-only updates were treated more explicitly:

  1. ValidReleaseImage is re-evaluated when SkipReleaseImageValidation is added or removed without a generation bump, so a previously cached True condition cannot remain stale.
  2. mirrored api.openshift.com/* labels are actively removed from HostedControlPlane when they are removed from the HostedCluster, instead of only being copied on add/update.

Overall, the goal is to reduce unnecessary reconciliations without weakening any existing metadata-driven control path.

Which issue(s) this PR fixes:

Fixes ARO-26764

Notes:

This PR is easier to review if you read it as a queueing-boundary change rather than as a reconcile algorithm rewrite.

The new predicate does not try to guarantee one reconcile per generation. Instead, it makes the enqueue boundary stricter and more explicit. The "latest object wins" behavior still comes from the existing reconcile entrypoint, which fetches the current HostedCluster by NamespacedName when the request is processed. That means older queued requests still reconcile against the latest stored object state, not the old event payload.

Because of that, the review focus should be:

  • whether the allowlist of actionable metadata inputs is complete
  • whether status-only updates are now correctly filtered out from the primary watch
  • whether same-generation metadata changes still propagate correctly
  • whether the blast radius is appropriately contained to the shared HostedCluster primary watch and adjacent regression fixes

Expected impact:

  • fewer self-induced HostedCluster reconciliations from status-only updates
  • no intended behavior change for spec updates
  • no intended behavior change for metadata-only inputs that already affect reconciliation
  • HCP-wide effect because this path is shared across Hosted Control Plane distributions, not specific to one platform

Test plan:

  • make verify
  • predicate unit tests cover generation changes, status-only updates, actionable annotations, actionable labels, deletion timestamp changes, and scope transitions
  • falsification tests verify that stale queued requests still reconcile the latest HostedCluster generation and latest actionable metadata
  • regression tests cover same-generation SkipReleaseImageValidation toggles
  • regression tests cover removal of mirrored api.openshift.com/* labels from HostedControlPlane

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • Added an annotation-based option to skip release image validation (records a clear skipped reason).
    • Labels with the OCM prefix are recognized and mirrored to the control plane.
  • Bug Fixes

    • Improved annotation and label synchronization to remove stale entries and only sync actionable prefixes.
    • Reconciliation conditions now correctly reflect generation-aware and skip-aware release-image validation outcomes.
  • Tests

    • Added extensive unit tests for predicates, label/annotation syncing, reconciliation conditions, and release-image validation logic.

tuxerrante and others added 3 commits May 5, 2026 17:13
Reduce self-induced HostedCluster reconciles by filtering the primary watch to
spec changes and explicit metadata triggers that already affect reconciliation.

Preserve mirrored annotation and label behavior with focused falsification tests
so stale queued requests still reconcile the latest HostedCluster state.

Signed-off-by: Alessandro Affinito <aaffinit@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-authored-by: Cursor <cursoragent@cursor.com>
Track when release image validation was skipped so same-generation annotation
changes can invalidate a cached True ValidReleaseImage condition.

Add a focused regression test matrix covering skip annotation add and remove
transitions without a generation bump.

Signed-off-by: Alessandro Affinito <aaffinit@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-authored-by: Cursor <cursoragent@cursor.com>
Update the new HostedCluster predicate code to use the k8sutil-scoped
annotation constants introduced on current main so the branch verifies cleanly
after rebasing.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 5, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 5, 2026

@tuxerrante: This pull request references ARO-26764 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the enhancement to target either version "5.0." or "openshift-5.0.", but it targets "ARO-Installer-4.21" instead.

Details

In response to this:

Summary

  • narrow the shared HostedCluster primary watch so HCP stops reconciling on status-only churn while still enqueueing on explicit metadata-only control inputs
  • keep child-resource watches and API/CRD surfaces unchanged, while preserving actionable annotations, mirrored api.openshift.com/* labels, delete-start handling, and release-image validation semantics
  • add falsification and regression coverage for latest-object-wins behavior, same-generation actionable metadata changes, mirrored label removal, and SkipReleaseImageValidation toggles

Test plan

  • make verify
  • Focused HostedCluster predicate, reconciliation, and release-image validation tests covered through the package test suite exercised by make verify
  • Optional follow-up: manually confirm a draft PR title/body and Jira linkage rendering as expected

Made with Cursor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds prefix-based HostedCluster predicates that trigger on generation, deletionTimestamp, actionable annotation/prefix-family, and actionable label-prefix changes. The controller now uses hostedClusterPrimaryPredicate, mirrors actionable HCP annotations/labels by prefix (removing stale prefixed entries), and refactors release-image validation to be generation- and skip-aware via shouldValidateReleaseImage and hasSkipReleaseImageValidationAnnotation, marking ValidReleaseImage as skipped when annotated.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant API as API Server
    participant Controller as HostedCluster Controller
    participant HC as HostedCluster
    participant HCP as HostedControlPlane
    participant Validator as ReleaseImageValidator

    API->>Controller: HostedCluster update event
    Controller->>Controller: hostedClusterPrimaryPredicate evaluates
    alt Predicate allows update
        Controller->>API: Get HostedCluster
        API-->>Controller: HostedCluster (annotations, labels, generation, deletionTimestamp)
        Controller->>API: Get HostedControlPlane
        API-->>Controller: HostedControlPlane (annotations, labels)
        Controller->>Controller: Compute actionable annotation/label diffs (prefix-aware)
        Controller->>HCP: Remove stale prefixed annotations/labels
        Controller->>HCP: Copy actionable prefixed annotations/labels from HC
        Controller->>Controller: shouldValidateReleaseImage?
        alt Skip annotation present
            Controller->>Controller: Set ValidReleaseImage True (ReleaseImageValidationSkipped)
        else Validate
            Controller->>Validator: validateReleaseImage(releaseImageRef, secrets...)
            alt Validation succeeds
                Validator-->>Controller: success
                Controller->>Controller: Set ValidReleaseImage True (AsExpected)
            else Validation fails
                Validator-->>Controller: error
                Controller->>Controller: Set ValidReleaseImage False (reason)
            end
        end
        Controller->>API: Update HCP and HC status/conditions
        API-->>Controller: persisted
    else Predicate blocks update
        Controller-->>API: no reconcile triggered
    end
Loading

Suggested reviewers

  • xiuwang
  • sdminonne
  • jparrill
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Tests use standard Go testing, not Ginkgo. Several assertions lack diagnostic messages to help diagnose failures (e.g., t.Fatalf without context on what condition failed). Add descriptive failure messages to assertions: t.Fatalf("expected condition %s found, got %v", type, actual) to help diagnose test failures.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary change: narrowing the HostedCluster primary watch to only enqueue on actionable metadata updates rather than all status changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names across four test files are stable, deterministic, and use static descriptive strings without dynamic content, generated IDs, timestamps, or pod/node/namespace names.
Microshift Test Compatibility ✅ Passed PR adds no Ginkgo e2e tests—only standard Go unit tests in controllers directory using testing.T interface, not Ginkgo framework.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds only Go unit tests (testing.T) with no Ginkgo e2e tests. Custom check applies only to Ginkgo e2e tests, so not applicable here.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only controller logic (predicates, label sync, validation). No scheduling constraints, pod affinity, topology spread, node selectors, or PDBs are introduced.
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations found. PR adds only function definitions, constants, and test code with no stdout-writing side effects in process-level code.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed All test files added are standard Go unit tests with in-memory fake clients, not Ginkgo e2e tests. No IPv4 assumptions or external connectivity requirements detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels May 5, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

2399-2407: ⚡ Quick win

Avoid hardcoded actionable-label prefix in HCP label reconciliation.

Line 2400 and Line 2405 duplicate "api.openshift.com" instead of reusing hostedClusterActionableLabelPrefixes. Keeping one source of truth prevents future drift between watch triggering and label mirroring behavior.

♻️ Suggested refactor
-	for key := range hcp.Labels {
-		if strings.HasPrefix(key, "api.openshift.com") {
-			delete(hcp.Labels, key)
-		}
-	}
+	for key := range hcp.Labels {
+		for _, prefix := range hostedClusterActionableLabelPrefixes {
+			if strings.HasPrefix(key, prefix) {
+				delete(hcp.Labels, key)
+				break
+			}
+		}
+	}
 	for key, val := range hcluster.Labels {
-		if strings.HasPrefix(key, "api.openshift.com") {
-			hcp.Labels[key] = val
-		}
+		for _, prefix := range hostedClusterActionableLabelPrefixes {
+			if strings.HasPrefix(key, prefix) {
+				hcp.Labels[key] = val
+				break
+			}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 2399 - 2407, Replace the two hardcoded "api.openshift.com" checks
with the shared constant hostedClusterActionableLabelPrefixes: iterate the slice
hostedClusterActionableLabelPrefixes when pruning hcp.Labels and when copying
labels from hcluster to hcp (the code blocks that currently loop over hcp.Labels
and hcluster.Labels), check each label key against all prefixes in
hostedClusterActionableLabelPrefixes (e.g., using strings.HasPrefix in an inner
loop or a small helper like hasActionablePrefix(key)), and use that single
source of truth so both the deletion and assignment logic reference
hostedClusterActionableLabelPrefixes instead of the literal string.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 2399-2407: Replace the two hardcoded "api.openshift.com" checks
with the shared constant hostedClusterActionableLabelPrefixes: iterate the slice
hostedClusterActionableLabelPrefixes when pruning hcp.Labels and when copying
labels from hcluster to hcp (the code blocks that currently loop over hcp.Labels
and hcluster.Labels), check each label key against all prefixes in
hostedClusterActionableLabelPrefixes (e.g., using strings.HasPrefix in an inner
loop or a small helper like hasActionablePrefix(key)), and use that single
source of truth so both the deletion and assignment logic reference
hostedClusterActionableLabelPrefixes instead of the literal string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 186ff352-7783-4ad9-9175-cd058d919415

📥 Commits

Reviewing files that changed from the base of the PR and between e09cc2d and 553451f.

📒 Files selected for processing (6)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_hcp_labels_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_reconciliation_condition_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_release_image_validation_test.go

@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 83.92857% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.42%. Comparing base (e09cc2d) to head (b1377f9).
⚠️ Report is 196 commits behind head on main.

Files with missing lines Patch % Lines
...trollers/hostedcluster/hostedcluster_controller.go 74.35% 9 Missing and 1 partial ⚠️
...trollers/hostedcluster/hostedcluster_predicates.go 89.04% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8424      +/-   ##
==========================================
+ Coverage   37.39%   40.42%   +3.02%     
==========================================
  Files         751      756       +5     
  Lines       91806    93277    +1471     
==========================================
+ Hits        34333    37708    +3375     
+ Misses      54838    52866    -1972     
- Partials     2635     2703      +68     
Files with missing lines Coverage Δ
...trollers/hostedcluster/hostedcluster_predicates.go 89.04% <89.04%> (ø)
...trollers/hostedcluster/hostedcluster_controller.go 43.12% <74.35%> (-0.12%) ⬇️

... and 73 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.44% <ø> (+1.87%) ⬆️
cpo-hostedcontrolplane 41.76% <ø> (+5.28%) ⬆️
cpo-other 40.31% <ø> (+2.58%) ⬆️
hypershift-operator 50.76% <83.92%> (+2.90%) ⬆️
other 31.58% <ø> (+3.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Keep HostedControlPlane label mirroring aligned with the shared actionable label
prefix list so the watch predicate and reconciliation logic stay in sync.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

1191-1218: 💤 Low value

Optional: gate the validateReleaseImage call on skipReleaseImageValidation for clarity.

When the skip annotation is present, r.validateReleaseImage is still invoked but its result is discarded by the case skipReleaseImageValidation: branch. The function happens to short-circuit internally (line 3708), so this is functionally correct, but the caller reads as if validation always runs. Guarding the call makes intent explicit and avoids relying on the callee's internal guard surviving future edits.

♻️ Suggested refactor
 		if shouldValidateReleaseImage(hcluster, condition) {
 			condition := metav1.Condition{
 				Type:               string(hyperv1.ValidReleaseImage),
 				ObservedGeneration: hcluster.Generation,
 			}
 			skipReleaseImageValidation := hasSkipReleaseImageValidationAnnotation(hcluster)
-			err := r.validateReleaseImage(ctx, hcluster, releaseProvider)
+			var err error
+			if !skipReleaseImageValidation {
+				err = r.validateReleaseImage(ctx, hcluster, releaseProvider)
+			}
 			switch {
 			case skipReleaseImageValidation:
 				condition.Status = metav1.ConditionTrue
 				condition.Message = "Release image validation is skipped by annotation"
 				condition.Reason = releaseImageValidationSkippedReason
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 1191 - 1218, The call to r.validateReleaseImage should be avoided
when the skip annotation is present to make the intent explicit: check
hasSkipReleaseImageValidationAnnotation(hcluster) (skipReleaseImageValidation)
before invoking r.validateReleaseImage and, if true, set the metav1.Condition
(Type string(hyperv1.ValidReleaseImage), Status True, Message "Release image
validation is skipped by annotation", Reason
releaseImageValidationSkippedReason) and call meta.SetStatusCondition without
calling validateReleaseImage; otherwise call r.validateReleaseImage and handle
err as currently done (setting Status/Message/Reason or AsExpectedReason) so the
skip path no longer relies on validateReleaseImage's internal short‑circuit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 1191-1218: The call to r.validateReleaseImage should be avoided
when the skip annotation is present to make the intent explicit: check
hasSkipReleaseImageValidationAnnotation(hcluster) (skipReleaseImageValidation)
before invoking r.validateReleaseImage and, if true, set the metav1.Condition
(Type string(hyperv1.ValidReleaseImage), Status True, Message "Release image
validation is skipped by annotation", Reason
releaseImageValidationSkippedReason) and call meta.SetStatusCondition without
calling validateReleaseImage; otherwise call r.validateReleaseImage and handle
err as currently done (setting Status/Message/Reason or AsExpectedReason) so the
skip path no longer relies on validateReleaseImage's internal short‑circuit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f30f4f99-4b2a-4808-a4c7-d870c58c50b9

📥 Commits

Reviewing files that changed from the base of the PR and between 553451f and 1280cf1.

📒 Files selected for processing (1)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go

@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 6, 2026

@tuxerrante: This pull request references ARO-26764 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the enhancement to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

The HostedCluster controller currently receives reconcile requests from its primary watch for updates that do not represent new desired state, especially status-only churn. That makes the controller noisier than it needs to be and makes it harder to reason about which reconciliations are actually driven by user intent versus controller-written status.

This change narrows only the primary HostedCluster watch so that it continues to reconcile on meaningful inputs, while avoiding self-induced reconciles from status-only updates.

The key design point is that this cannot be implemented as a pure generation-only filter. For HostedCluster, some metadata-only updates are already real control inputs for reconciliation and must continue to enqueue even when the object generation does not change. Examples include specific annotations consumed by reconciliation, mirrored api.openshift.com/* labels, scope transitions, and deletion start.

To keep the behavior correct while reducing noise, this PR introduces a dedicated primary predicate that allows:

  • generation-changing updates
  • deletion timestamp transitions
  • actionable annotation changes
  • actionable mirrored label changes

and filters out status-only updates from the primary watch.

The change is intentionally narrow:

  • child-resource watches are left unchanged
  • the API / CRD surface is unchanged
  • the existing ObservedGeneration status model remains in place
  • the reconcile loop still re-reads the current HostedCluster, so stale queued requests continue to act on the latest available object state

Alongside the watch change, this PR fixes two same-area behaviors that became important once metadata-only updates were treated more explicitly:

  1. ValidReleaseImage is re-evaluated when SkipReleaseImageValidation is added or removed without a generation bump, so a previously cached True condition cannot remain stale.
  2. mirrored api.openshift.com/* labels are actively removed from HostedControlPlane when they are removed from the HostedCluster, instead of only being copied on add/update.

Overall, the goal is to reduce unnecessary reconciliations without weakening any existing metadata-driven control path.

Which issue(s) this PR fixes:

Fixes ARO-26764

Special notes for your reviewer:

This PR is easier to review if you read it as a queueing-boundary change rather than as a reconcile algorithm rewrite.

The new predicate does not try to guarantee one reconcile per generation. Instead, it makes the enqueue boundary stricter and more explicit. The "latest object wins" behavior still comes from the existing reconcile entrypoint, which fetches the current HostedCluster by NamespacedName when the request is processed. That means older queued requests still reconcile against the latest stored object state, not the old event payload.

Because of that, the review focus should be:

  • whether the allowlist of actionable metadata inputs is complete
  • whether status-only updates are now correctly filtered out from the primary watch
  • whether same-generation metadata changes still propagate correctly
  • whether the blast radius is appropriately contained to the shared HostedCluster primary watch and adjacent regression fixes

Expected impact:

  • fewer self-induced HostedCluster reconciliations from status-only updates
  • no intended behavior change for spec updates
  • no intended behavior change for metadata-only inputs that already affect reconciliation
  • HCP-wide effect because this path is shared across Hosted Control Plane distributions, not specific to one platform

Test plan:

  • make verify
  • predicate unit tests cover generation changes, status-only updates, actionable annotations, actionable labels, deletion timestamp changes, and scope transitions
  • falsification tests verify that stale queued requests still reconcile the latest HostedCluster generation and latest actionable metadata
  • regression tests cover same-generation SkipReleaseImageValidation toggles
  • regression tests cover removal of mirrored api.openshift.com/* labels from HostedControlPlane

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Replace the dynamic platform override subtest titles with explicit static names
so the HostedCluster predicate tests satisfy deterministic naming checks
without changing the existing assertion style used in this package.

Signed-off-by: Alessandro Affinito <aaffinit@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Co-authored-by: Cursor <cursoragent@cursor.com>
@tuxerrante tuxerrante marked this pull request as ready for review May 6, 2026 09:18
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 6, 2026
@openshift-ci openshift-ci Bot requested review from Nirshal and muraee May 6, 2026 09:19
Copy link
Copy Markdown
Contributor

@jparrill jparrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped some comments. Thanks

Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go Outdated
tuxerrante and others added 4 commits May 22, 2026 11:31
…neric names

annotationPrefixChanged and prefixedAnnotations are used for both
annotations and labels. Rename to prefixedKeyChanged and prefixedEntries
with generic parameter names to avoid reader confusion when the
functions are called with label maps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on is present

Guard the validateReleaseImage call so it is not invoked when the skip
annotation is set. Previously the call ran but its result was discarded
by the switch branch, relying on an internal short-circuit. Making the
guard explicit avoids unnecessary work and makes intent clearer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the local ptrToTime helper with ptr.To from k8s.io/utils/ptr,
which is already a dependency in this package. Rename the predicate
local variable to pred so it does not shadow the controller-runtime
predicate package import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go`:
- Around line 184-185: The predicate uses prefixedKeyChanged(oldLabels,
newLabels, prefix) with a loose prefix (e.g., "api.openshift.com") which matches
keys like "api.openshift.comfoo"; update the call/sites where prefix is defined
so the prefix includes a delimiter (use "api.openshift.com/" or similar) and
ensure prefixedKeyChanged compares with this delimiter-qualified prefix so only
keys in the api.openshift.com/* label family match; modify the prefix variable
passed to prefixedKeyChanged and any related construction logic in
hostedcluster_predicates.go to use the slash-terminated prefix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3b8d370c-7d3f-4346-8bfe-046cfe70cde6

📥 Commits

Reviewing files that changed from the base of the PR and between 037e346 and 18d2ac0.

📒 Files selected for processing (3)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tuxerrante
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the area/api Indicates the PR includes changes for the API label May 22, 2026
tuxerrante and others added 2 commits May 22, 2026 12:29
…tant

Move the hardcoded "api.openshift.com/" label prefix to the API package
as OCMLabelPrefix so consumers outside the controller (ARO-HCP, OCM)
have a single source of truth for the prefix used to mirror labels from
HostedCluster to HostedControlPlane.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…to API package

Move releaseImageValidationSkippedReason from a controller-private
constant to the API package as ReleaseImageValidationSkippedReason,
alongside the existing AsExpectedReason and InvalidImageReason. This
allows external consumers (ARO-HCP, OCM) to inspect the ValidReleaseImage
condition reason programmatically without hardcoding the string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tuxerrante tuxerrante force-pushed the tuxerrante/hostedcluster-generation-sync branch from e0e1fc8 to b1377f9 Compare May 22, 2026 10:29
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 22, 2026

@tuxerrante: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

hypershift-jira-solve-ci Bot commented May 22, 2026

Now I have all the evidence needed. Let me produce the final report.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Integration test for component hypershift-operator-main snapshot hypershift-operator-20260522-103006-000
and scenario hypershift-operator-main-enterprise-contract has failed

verify task: 254 success(es), 24 warning(s), 2 failure(s)

Summary

Both Konflux Enterprise Contract checks failed because the PR branch is 196 commits behind main and does not include two critical .tekton/ pipeline fixes merged on May 20, 2026. Specifically, the branch is missing (1) the enable-package-registry-proxy parameter addition to the prefetch-dependencies-oci-ta task, and (2) the Tekton task bundle digest updates including the build-image-index version bump from 0.2→0.3. These same checks pass on main (512 successes, 20 warnings, 0 failures) because main includes both fixes. The 2 EC failures correspond to trusted-task policy violations caused by outdated/untrusted task bundle references in the stale PR branch.

Root Cause

The PR branch tuxerrante/hostedcluster-generation-sync diverged from main on 2026-05-05 (merge base e09cc2d) and has not been rebased since. It is 196 commits behind main.

On 2026-05-20, two commits were merged to main that updated .tekton/pipelines/common-operator-build.yaml:

  1. Commit b424129fix(tekton): enable package registry proxy in prefetch-dependencies task

    • Added enable-package-registry-proxy: "true" to the prefetch-dependencies task parameters
    • The commit message explicitly states: "The enterprise contract check requires the prefetch-dependencies-oci-ta task to have enable-package-registry-proxy set to true"
  2. Commit c3cc46cchore: update Konflux Tekton task bundles

    • Updated 38 Tekton task bundle digest references across pipeline files
    • Includes 3 version bumps: build-image-index 0.2→0.3 (with removal of deprecated COMMIT_SHA and IMAGE_EXPIRES_AFTER params), clamav-scan 0.3→0.3.1, rpms-signature-scan 0.2→0.2.1
    • Updated 32 digest-only references

Because the PR branch lacks these updates, its build pipeline uses:

  • task-build-image-index:0.2@sha256:c7b0f7e1... (deprecated, removed from trusted task list) instead of 0.3@sha256:b33bfa8d...
  • task-prefetch-dependencies-oci-ta:0.3@sha256:a579d00f... without the required enable-package-registry-proxy parameter, instead of @sha256:a2efbcdc... with it

The Enterprise Contract verify step checks all task references against the Konflux trusted task list and validates required parameters. The 2 failures are trusted_task.trusted violations for tasks that are no longer in the trusted list or are missing required configuration.

The build pipeline itself succeeds (the hypershift-operator-main-on-pull-request check passes) because Tekton executes whatever tasks are referenced. The EC check is a post-build policy gate that verifies supply chain compliance — and the stale task references violate that policy.

Recommendations
  1. Rebase the PR branch onto main — This is the only fix needed. Run:

    git fetch upstream main
    git rebase upstream/main
    git push --force-with-lease

    This will pull in the 196 missing commits, including both .tekton/ fixes.

  2. No code changes required — The PR's Go source changes (hostedcluster controller predicates, release image validation, annotations) are unrelated to the EC failures. The failures are entirely caused by stale pipeline configuration files.

  3. Going forward — Consider rebasing long-lived feature branches periodically (at least weekly) to pick up infrastructure and CI pipeline updates. The PR was created on May 5 and the last push was May 22 — a 17-day gap during which the Tekton pipeline configuration diverged significantly.

Evidence
Evidence Detail
PR branch merge base e09cc2d (2026-05-05), 196 commits behind main
Missing fix 1 enable-package-registry-proxy: "true" not present in PR's prefetch-dependencies task (commit b424129, 2026-05-20)
Missing fix 2 38 Tekton task bundle digests outdated, including build-image-index 0.2→0.3 version bump (commit c3cc46c, 2026-05-20)
PR task digest (build-image-index) quay.io/konflux-ci/tekton-catalog/task-build-image-index:0.2@sha256:c7b0f7e1f743...
Main task digest (build-image-index) quay.io/konflux-ci/tekton-catalog/task-build-image-index:0.3@sha256:b33bfa8dc27d...
PR task digest (prefetch-deps) ...task-prefetch-dependencies-oci-ta:0.3@sha256:a579d00f... (missing enable-package-registry-proxy)
Main task digest (prefetch-deps) ...task-prefetch-dependencies-oci-ta:0.3@sha256:a2efbcdc... (with enable-package-registry-proxy: "true")
EC result on main (push) 512 successes, 20 warnings, 0 failures → conclusion: neutral
EC result on PR #8424 254 successes, 24 warnings, 2 failures → conclusion: failure
Build pipeline on PR hypershift-operator-main-on-pull-requestpassed (build itself works fine)
EC pipeline run 1 hypershift-operator-main-enterprise-contract-flx8s
EC pipeline run 2 hypershift-operator-enterprise-contract-r9hwm
Both jobs identical result 254 success / 24 warning / 2 failure — same 2 violations in both EC scenarios

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api Indicates the PR includes changes for the API area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants