ARO-26764: narrow HostedCluster primary watch to actionable metadata updates#8424
ARO-26764: narrow HostedCluster primary watch to actionable metadata updates#8424tuxerrante wants to merge 11 commits into
Conversation
Reduce self-induced HostedCluster reconciles by filtering the primary watch to spec changes and explicit metadata triggers that already affect reconciliation. Preserve mirrored annotation and label behavior with focused falsification tests so stale queued requests still reconcile the latest HostedCluster state. Signed-off-by: Alessandro Affinito <aaffinit@redhat.com> Commit-Message-Assisted-by: Claude (via Claude Code) Co-authored-by: Cursor <cursoragent@cursor.com>
Track when release image validation was skipped so same-generation annotation changes can invalidate a cached True ValidReleaseImage condition. Add a focused regression test matrix covering skip annotation add and remove transitions without a generation bump. Signed-off-by: Alessandro Affinito <aaffinit@redhat.com> Commit-Message-Assisted-by: Claude (via Claude Code) Co-authored-by: Cursor <cursoragent@cursor.com>
Update the new HostedCluster predicate code to use the k8sutil-scoped annotation constants introduced on current main so the branch verifies cleanly after rebasing. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@tuxerrante: This pull request references ARO-26764 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the enhancement to target either version "5.0." or "openshift-5.0.", but it targets "ARO-Installer-4.21" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds prefix-based HostedCluster predicates that trigger on generation, deletionTimestamp, actionable annotation/prefix-family, and actionable label-prefix changes. The controller now uses hostedClusterPrimaryPredicate, mirrors actionable HCP annotations/labels by prefix (removing stale prefixed entries), and refactors release-image validation to be generation- and skip-aware via shouldValidateReleaseImage and hasSkipReleaseImageValidationAnnotation, marking ValidReleaseImage as skipped when annotated. Sequence Diagram(s)sequenceDiagram
autonumber
participant API as API Server
participant Controller as HostedCluster Controller
participant HC as HostedCluster
participant HCP as HostedControlPlane
participant Validator as ReleaseImageValidator
API->>Controller: HostedCluster update event
Controller->>Controller: hostedClusterPrimaryPredicate evaluates
alt Predicate allows update
Controller->>API: Get HostedCluster
API-->>Controller: HostedCluster (annotations, labels, generation, deletionTimestamp)
Controller->>API: Get HostedControlPlane
API-->>Controller: HostedControlPlane (annotations, labels)
Controller->>Controller: Compute actionable annotation/label diffs (prefix-aware)
Controller->>HCP: Remove stale prefixed annotations/labels
Controller->>HCP: Copy actionable prefixed annotations/labels from HC
Controller->>Controller: shouldValidateReleaseImage?
alt Skip annotation present
Controller->>Controller: Set ValidReleaseImage True (ReleaseImageValidationSkipped)
else Validate
Controller->>Validator: validateReleaseImage(releaseImageRef, secrets...)
alt Validation succeeds
Validator-->>Controller: success
Controller->>Controller: Set ValidReleaseImage True (AsExpected)
else Validation fails
Validator-->>Controller: error
Controller->>Controller: Set ValidReleaseImage False (reason)
end
end
Controller->>API: Update HCP and HC status/conditions
API-->>Controller: persisted
else Predicate blocks update
Controller-->>API: no reconcile triggered
end
Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)
2399-2407: ⚡ Quick winAvoid hardcoded actionable-label prefix in HCP label reconciliation.
Line 2400 and Line 2405 duplicate
"api.openshift.com"instead of reusinghostedClusterActionableLabelPrefixes. Keeping one source of truth prevents future drift between watch triggering and label mirroring behavior.♻️ Suggested refactor
- for key := range hcp.Labels { - if strings.HasPrefix(key, "api.openshift.com") { - delete(hcp.Labels, key) - } - } + for key := range hcp.Labels { + for _, prefix := range hostedClusterActionableLabelPrefixes { + if strings.HasPrefix(key, prefix) { + delete(hcp.Labels, key) + break + } + } + } for key, val := range hcluster.Labels { - if strings.HasPrefix(key, "api.openshift.com") { - hcp.Labels[key] = val - } + for _, prefix := range hostedClusterActionableLabelPrefixes { + if strings.HasPrefix(key, prefix) { + hcp.Labels[key] = val + break + } + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go` around lines 2399 - 2407, Replace the two hardcoded "api.openshift.com" checks with the shared constant hostedClusterActionableLabelPrefixes: iterate the slice hostedClusterActionableLabelPrefixes when pruning hcp.Labels and when copying labels from hcluster to hcp (the code blocks that currently loop over hcp.Labels and hcluster.Labels), check each label key against all prefixes in hostedClusterActionableLabelPrefixes (e.g., using strings.HasPrefix in an inner loop or a small helper like hasActionablePrefix(key)), and use that single source of truth so both the deletion and assignment logic reference hostedClusterActionableLabelPrefixes instead of the literal string.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 2399-2407: Replace the two hardcoded "api.openshift.com" checks
with the shared constant hostedClusterActionableLabelPrefixes: iterate the slice
hostedClusterActionableLabelPrefixes when pruning hcp.Labels and when copying
labels from hcluster to hcp (the code blocks that currently loop over hcp.Labels
and hcluster.Labels), check each label key against all prefixes in
hostedClusterActionableLabelPrefixes (e.g., using strings.HasPrefix in an inner
loop or a small helper like hasActionablePrefix(key)), and use that single
source of truth so both the deletion and assignment logic reference
hostedClusterActionableLabelPrefixes instead of the literal string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 186ff352-7783-4ad9-9175-cd058d919415
📒 Files selected for processing (6)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/hostedcluster_hcp_labels_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_predicates.gohypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_reconciliation_condition_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_release_image_validation_test.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8424 +/- ##
==========================================
+ Coverage 37.39% 40.42% +3.02%
==========================================
Files 751 756 +5
Lines 91806 93277 +1471
==========================================
+ Hits 34333 37708 +3375
+ Misses 54838 52866 -1972
- Partials 2635 2703 +68
... and 73 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
Keep HostedControlPlane label mirroring aligned with the shared actionable label prefix list so the watch predicate and reconciliation logic stay in sync. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)
1191-1218: 💤 Low valueOptional: gate the
validateReleaseImagecall onskipReleaseImageValidationfor clarity.When the skip annotation is present,
r.validateReleaseImageis still invoked but its result is discarded by thecase skipReleaseImageValidation:branch. The function happens to short-circuit internally (line 3708), so this is functionally correct, but the caller reads as if validation always runs. Guarding the call makes intent explicit and avoids relying on the callee's internal guard surviving future edits.♻️ Suggested refactor
if shouldValidateReleaseImage(hcluster, condition) { condition := metav1.Condition{ Type: string(hyperv1.ValidReleaseImage), ObservedGeneration: hcluster.Generation, } skipReleaseImageValidation := hasSkipReleaseImageValidationAnnotation(hcluster) - err := r.validateReleaseImage(ctx, hcluster, releaseProvider) + var err error + if !skipReleaseImageValidation { + err = r.validateReleaseImage(ctx, hcluster, releaseProvider) + } switch { case skipReleaseImageValidation: condition.Status = metav1.ConditionTrue condition.Message = "Release image validation is skipped by annotation" condition.Reason = releaseImageValidationSkippedReason🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go` around lines 1191 - 1218, The call to r.validateReleaseImage should be avoided when the skip annotation is present to make the intent explicit: check hasSkipReleaseImageValidationAnnotation(hcluster) (skipReleaseImageValidation) before invoking r.validateReleaseImage and, if true, set the metav1.Condition (Type string(hyperv1.ValidReleaseImage), Status True, Message "Release image validation is skipped by annotation", Reason releaseImageValidationSkippedReason) and call meta.SetStatusCondition without calling validateReleaseImage; otherwise call r.validateReleaseImage and handle err as currently done (setting Status/Message/Reason or AsExpectedReason) so the skip path no longer relies on validateReleaseImage's internal short‑circuit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 1191-1218: The call to r.validateReleaseImage should be avoided
when the skip annotation is present to make the intent explicit: check
hasSkipReleaseImageValidationAnnotation(hcluster) (skipReleaseImageValidation)
before invoking r.validateReleaseImage and, if true, set the metav1.Condition
(Type string(hyperv1.ValidReleaseImage), Status True, Message "Release image
validation is skipped by annotation", Reason
releaseImageValidationSkippedReason) and call meta.SetStatusCondition without
calling validateReleaseImage; otherwise call r.validateReleaseImage and handle
err as currently done (setting Status/Message/Reason or AsExpectedReason) so the
skip path no longer relies on validateReleaseImage's internal short‑circuit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f30f4f99-4b2a-4808-a4c7-d870c58c50b9
📒 Files selected for processing (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
|
@tuxerrante: This pull request references ARO-26764 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the enhancement to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Replace the dynamic platform override subtest titles with explicit static names so the HostedCluster predicate tests satisfy deterministic naming checks without changing the existing assertion style used in this package. Signed-off-by: Alessandro Affinito <aaffinit@redhat.com> Commit-Message-Assisted-by: Claude (via Claude Code) Co-authored-by: Cursor <cursoragent@cursor.com>
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks
…neric names annotationPrefixChanged and prefixedAnnotations are used for both annotations and labels. Rename to prefixedKeyChanged and prefixedEntries with generic parameter names to avoid reader confusion when the functions are called with label maps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on is present Guard the validateReleaseImage call so it is not invoked when the skip annotation is set. Previously the call ran but its result was discarded by the switch branch, relying on an internal short-circuit. Making the guard explicit avoids unnecessary work and makes intent clearer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the local ptrToTime helper with ptr.To from k8s.io/utils/ptr, which is already a dependency in this package. Rename the predicate local variable to pred so it does not shadow the controller-runtime predicate package import. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_predicates.go`:
- Around line 184-185: The predicate uses prefixedKeyChanged(oldLabels,
newLabels, prefix) with a loose prefix (e.g., "api.openshift.com") which matches
keys like "api.openshift.comfoo"; update the call/sites where prefix is defined
so the prefix includes a delimiter (use "api.openshift.com/" or similar) and
ensure prefixedKeyChanged compares with this delimiter-qualified prefix so only
keys in the api.openshift.com/* label family match; modify the prefix variable
passed to prefixedKeyChanged and any related construction logic in
hostedcluster_predicates.go to use the slash-terminated prefix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3b8d370c-7d3f-4346-8bfe-046cfe70cde6
📒 Files selected for processing (3)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/hostedcluster_predicates.gohypershift-operator/controllers/hostedcluster/hostedcluster_predicates_test.go
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tuxerrante The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
…tant Move the hardcoded "api.openshift.com/" label prefix to the API package as OCMLabelPrefix so consumers outside the controller (ARO-HCP, OCM) have a single source of truth for the prefix used to mirror labels from HostedCluster to HostedControlPlane. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…to API package Move releaseImageValidationSkippedReason from a controller-private constant to the API package as ReleaseImageValidationSkippedReason, alongside the existing AsExpectedReason and InvalidImageReason. This allows external consumers (ARO-HCP, OCM) to inspect the ValidReleaseImage condition reason programmatically without hardcoding the string. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e0e1fc8 to
b1377f9
Compare
|
@tuxerrante: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Now I have all the evidence needed. Let me produce the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryBoth Konflux Enterprise Contract checks failed because the PR branch is 196 commits behind Root CauseThe PR branch On 2026-05-20, two commits were merged to
Because the PR branch lacks these updates, its build pipeline uses:
The Enterprise Contract The build pipeline itself succeeds (the Recommendations
Evidence
|
What this PR does / why we need it:
The
HostedClustercontroller currently receives reconcile requests from its primary watch for updates that do not represent new desired state, especially status-only churn. That makes the controller noisier than it needs to be and makes it harder to reason about which reconciliations are actually driven by user intent versus controller-written status.This change narrows only the primary
HostedClusterwatch so that it continues to reconcile on meaningful inputs, while avoiding self-induced reconciles from status-only updates.The key design point is that this cannot be implemented as a pure generation-only filter. For
HostedCluster, some metadata-only updates are already real control inputs for reconciliation and must continue to enqueue even when the object generation does not change. Examples include specific annotations consumed by reconciliation, mirroredapi.openshift.com/*labels, scope transitions, and deletion start.To keep the behavior correct while reducing noise, this PR introduces a dedicated primary predicate that allows:
and filters out status-only updates from the primary watch.
The change is intentionally narrow:
ObservedGenerationstatus model remains in placeHostedCluster, so stale queued requests continue to act on the latest available object stateAlongside the watch change, this PR fixes two same-area behaviors that became important once metadata-only updates were treated more explicitly:
ValidReleaseImageis re-evaluated whenSkipReleaseImageValidationis added or removed without a generation bump, so a previously cachedTruecondition cannot remain stale.api.openshift.com/*labels are actively removed fromHostedControlPlanewhen they are removed from theHostedCluster, instead of only being copied on add/update.Overall, the goal is to reduce unnecessary reconciliations without weakening any existing metadata-driven control path.
Which issue(s) this PR fixes:
Fixes ARO-26764
Notes:
This PR is easier to review if you read it as a queueing-boundary change rather than as a reconcile algorithm rewrite.
The new predicate does not try to guarantee one reconcile per generation. Instead, it makes the enqueue boundary stricter and more explicit. The "latest object wins" behavior still comes from the existing reconcile entrypoint, which fetches the current
HostedClusterbyNamespacedNamewhen the request is processed. That means older queued requests still reconcile against the latest stored object state, not the old event payload.Because of that, the review focus should be:
HostedClusterprimary watch and adjacent regression fixesExpected impact:
HostedClusterreconciliations from status-only updatesTest plan:
make verifyHostedClustergeneration and latest actionable metadataSkipReleaseImageValidationtogglesapi.openshift.com/*labels fromHostedControlPlaneChecklist:
Summary by CodeRabbit
New Features
Bug Fixes
Tests