OCPBUGS-77856: fix: use NodePort for HCP router Service on non-cloud platforms #8439
vsolanki12 wants to merge 1 commit
@vsolanki12: This pull request references Jira Issue OCPBUGS-77856, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker.
|
Skipping CI for Draft Pull Request. |
|
📝 Walkthrough
ReconcileRouterService now sets svc.Spec.Type based on HostedControlPlane platform: ServiceTypeNodePort for AgentPlatform, KubevirtPlatform, and NonePlatform; ServiceTypeLoadBalancer for other platforms. The LoadBalancerSourceRanges population is only applied when the service is external, not ARO HCP, and svc.Spec.Type == LoadBalancer. reconcileRouterServiceStatus now handles NodePort services by returning svc.Spec.ClusterIP when the first port has an assigned NodePort (empty host if not assigned). ReconcileServiceStatus (Route publishing) returns the Route hostname and port 443 early for NodePort services with an assigned NodePort. Tests were added/updated for these cases.
Sequence Diagram(s)
sequenceDiagram
participant Reconciler
participant HCP as "HostedControlPlane"
participant K8sAPI as "Kubernetes API"
participant Service as "Service"
Note over Reconciler,HCP: Platform-aware service type selection
Reconciler->>HCP: read hcp.Spec.Platform.Type
alt platform is Agent/Kubevirt/None
Reconciler->>Service: set spec.type = NodePort
else other platforms
Reconciler->>Service: set spec.type = LoadBalancer
end
Reconciler->>K8sAPI: apply Service
sequenceDiagram
participant Reconciler
participant K8sAPI as "Kubernetes API"
participant Service as "Service"
participant RoutePub as "Route publishing"
Note over Reconciler,Service: ReconcileServiceStatus (Route publishing case)
Reconciler->>K8sAPI: get Service
alt Service.spec.type == NodePort and Service.spec.ports[0].nodePort != 0
Reconciler->>RoutePub: read strategy.Route.Hostname
Reconciler->>Reconciler: set port = 443
Reconciler-->>Caller: return host, port (early)
else
Reconciler->>K8sAPI: collect LoadBalancer status/messages
Reconciler-->>Caller: return LB-derived host/port or empty host
end
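The NodePort handling that the walkthrough attributes to reconcileRouterServiceStatus can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the Service, ServicePort, and ServiceType types are simplified stand-ins for the corev1 types, and routerServiceHost and the LoadBalancerHost field are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// ServiceType, ServicePort, and Service are simplified stand-ins for the
// corev1 types (assumption: only the fields needed for the sketch are kept).
type ServiceType string

const (
	ServiceTypeNodePort     ServiceType = "NodePort"
	ServiceTypeLoadBalancer ServiceType = "LoadBalancer"
)

type ServicePort struct {
	NodePort int32
}

type Service struct {
	Type      ServiceType
	ClusterIP string
	Ports     []ServicePort
	// LoadBalancerHost is a hypothetical stand-in for the first load
	// balancer ingress entry reported in the Service status.
	LoadBalancerHost string
}

// routerServiceHost (hypothetical helper) returns the host to publish for
// the router Service. For NodePort services it returns the ClusterIP once
// the first port has a NodePort assigned, and an empty host otherwise.
func routerServiceHost(svc Service) string {
	if svc.Type == ServiceTypeNodePort {
		if len(svc.Ports) > 0 && svc.Ports[0].NodePort != 0 {
			return svc.ClusterIP
		}
		return ""
	}
	return svc.LoadBalancerHost
}

func main() {
	ready := Service{
		Type:      ServiceTypeNodePort,
		ClusterIP: "10.0.0.12",
		Ports:     []ServicePort{{NodePort: 30443}},
	}
	pending := Service{Type: ServiceTypeNodePort, ClusterIP: "10.0.0.12"}

	fmt.Printf("%q\n", routerServiceHost(ready))
	fmt.Printf("%q\n", routerServiceHost(pending))
}
```

The point of the early NodePort branch is that a NodePort service never gets a load balancer status, so waiting on one would leave the host empty indefinitely.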
Force-pushed from 610f561 to ccc8bcb.
|
@vsolanki12: This pull request references Jira Issue OCPBUGS-77856, which is valid. 3 validation(s) were run on this bug
|
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #8439 +/- ##
==========================================
+ Coverage 37.44% 40.43% +2.98%
==========================================
Files 751 755 +4
Lines 91969 93254 +1285
==========================================
+ Hits 34435 37703 +3268
+ Misses 54894 52849 -2045
- Partials 2640 2702 +62
... and 71 files with indirect coverage changes
|
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go`:
- Around line 170-173: The NodePort branch dereferences strategy.Route without
checking for nil, which can panic; update the NodePort handling in service.go so
you first check if strategy.Route is non-nil (and optionally that
strategy.Route.Hostname is non-empty) before reading strategy.Route.Hostname and
assigning host/port—e.g. wrap the host = strategy.Route.Hostname and port = 443
assignments inside an if strategy.Route != nil { ... } block (keeping the
existing svc.Spec.Ports and NodePort checks intact) so no nil dereference
occurs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 34176f5e-a42c-4fab-9813-0c0e14dc16ec
📒 Files selected for processing (4)
- control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
- control-plane-operator/controllers/hostedcontrolplane/ingress/router.go
- control-plane-operator/controllers/hostedcontrolplane/ingress/router_test.go
- control-plane-operator/controllers/hostedcontrolplane/kas/service.go
|
/auto-cc |
Force-pushed from d8ef2fb to 96922b5.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/kas/service.go (1)
169-182: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Add nil guards for strategy.Route on both lines 172 and 180.
Following up on the previous discussion: you're correct that line 180 also dereferences strategy.Route.Hostname without a nil check, which creates the same risk. The pattern in ReconcileKonnectivityServerServiceStatus (line 394) demonstrates that strategy.Route can indeed be nil even when strategy.Type == hyperv1.Route, so relying on an implicit API contract isn't safe.
For consistency and safety, both the new NodePort branch (line 172) and the existing LoadBalancer branch (line 180) should add defensive nil checks, following the established pattern in this file.
🔒 Proposed fix to harden both branches
 case hyperv1.Route:
     if svc.Spec.Type == corev1.ServiceTypeNodePort {
         if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 {
-            host = strategy.Route.Hostname
-            port = 443
+            if strategy.Route != nil && strategy.Route.Hostname != "" {
+                host = strategy.Route.Hostname
+                port = 443
+            }
         }
         return
     }
     if message, err := k8sutil.CollectLBMessageIfNotProvisioned(svc, messageCollector); err != nil || message != "" {
         return host, port, message, err
     }
-    host = strategy.Route.Hostname
+    if strategy.Route != nil && strategy.Route.Hostname != "" {
+        host = strategy.Route.Hostname
+    }
     port = 443
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go` around lines 169 - 182, In the hyperv1.Route case inside service.go, add defensive nil checks for strategy.Route before dereferencing strategy.Route.Hostname in both the NodePort branch and the LoadBalancer branch: in the NodePort branch (where svc.Spec.Type == corev1.ServiceTypeNodePort) check if strategy.Route == nil and return early (matching the file's established pattern) instead of using strategy.Route.Hostname, and do the same before setting host = strategy.Route.Hostname in the LoadBalancer path (after the k8sutil.CollectLBMessageIfNotProvisioned call); reference the case hyperv1.Route block, strategy.Route.Hostname, svc.Spec.Type == corev1.ServiceTypeNodePort, and k8sutil.CollectLBMessageIfNotProvisioned to locate where to insert the nil guards.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go`:
- Around line 169-182: In the hyperv1.Route case inside service.go, add
defensive nil checks for strategy.Route before dereferencing
strategy.Route.Hostname in both the NodePort branch and the LoadBalancer branch:
in the NodePort branch (where svc.Spec.Type == corev1.ServiceTypeNodePort) check
if strategy.Route == nil and return early (matching the file's established
pattern) instead of using strategy.Route.Hostname, and do the same before
setting host = strategy.Route.Hostname in the LoadBalancer path (after the
k8sutil.CollectLBMessageIfNotProvisioned call); reference the case hyperv1.Route
block, strategy.Route.Hostname, svc.Spec.Type == corev1.ServiceTypeNodePort, and
k8sutil.CollectLBMessageIfNotProvisioned to locate where to insert the nil
guards.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c00ce44a-273b-4d01-a53f-6741571f8800
📒 Files selected for processing (6)
- control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
- control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go
- control-plane-operator/controllers/hostedcontrolplane/ingress/router.go
- control-plane-operator/controllers/hostedcontrolplane/ingress/router_test.go
- control-plane-operator/controllers/hostedcontrolplane/kas/service.go
- control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- control-plane-operator/controllers/hostedcontrolplane/ingress/router_test.go
- control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
- control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
- control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go
- control-plane-operator/controllers/hostedcontrolplane/ingress/router.go
|
@vsolanki12, would you mind addressing the reviews from the CodeRabbit bot? |
case hyperv1.Route:
	if svc.Spec.Type == corev1.ServiceTypeNodePort {
		if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 {
			host = strategy.Route.Hostname
Hey, strategy.Route is an optional pointer — if it's ever nil here we'd get a panic. I noticed that ReconcileKonnectivityServerServiceStatus in this same file does check for nil before accessing .Hostname. Might be worth adding the same guard here just to be safe:
if strategy.Route != nil {
	host = strategy.Route.Hostname
	port = 443
}
(line 180 below has the same pre-existing issue, but that's a separate thing)
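To make the nil-guard concern concrete, here is a small sketch of reading the hostname from an optional Route strategy. It assumes simplified stand-in types rather than the real hyperv1 API, and routeHostPort is a hypothetical helper, not a function from service.go.

```go
package main

import "fmt"

// RoutePublishingStrategy and ServicePublishingStrategy are simplified
// stand-ins for the hyperv1 types (assumption: only Hostname and the
// optional Route pointer matter for this sketch).
type RoutePublishingStrategy struct {
	Hostname string
}

type ServicePublishingStrategy struct {
	// Route is optional and may be nil.
	Route *RoutePublishingStrategy
}

// routeHostPort (hypothetical helper) returns the Route hostname and port
// 443, or an empty host and zero port when the Route strategy is missing
// or has no hostname. Checking for nil first avoids a panic.
func routeHostPort(strategy ServicePublishingStrategy) (host string, port int32) {
	if strategy.Route == nil || strategy.Route.Hostname == "" {
		return "", 0
	}
	return strategy.Route.Hostname, 443
}

func main() {
	// A strategy with no Route set must not panic.
	host, port := routeHostPort(ServicePublishingStrategy{})
	fmt.Printf("%q %d\n", host, port)

	host, port = routeHostPort(ServicePublishingStrategy{
		Route: &RoutePublishingStrategy{Hostname: "kas.example.com"},
	})
	fmt.Printf("%q %d\n", host, port)
}
```

Returning an empty host instead of dereferencing lets the caller treat a missing Route the same way as a service that is not ready yet.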
Thank you. Added the nil check on both line 172 and line 180, as suggested.
svc.Spec.Type = corev1.ServiceTypeLoadBalancer
switch hcp.Spec.Platform.Type {
case hyperv1.AgentPlatform, hyperv1.KubevirtPlatform, hyperv1.OpenStackPlatform, hyperv1.NonePlatform:
	svc.Spec.Type = corev1.ServiceTypeNodePort
I might be wrong here, but doesn't OpenStack have LB support through Octavia? If there are existing HyperShift-on-OpenStack deployments where the LB actually works today, switching them to NodePort would break things.
Have you checked with the OpenStack folks whether their management clusters typically have a cloud-controller-manager configured? If they do, we might want to leave OpenStack out of this list.
Also, I see the same four-platform grouping in support/netutil/visibility.go:LabelHCPRoutes(). Not necessarily for this PR, but eventually it could be worth extracting a shared helper so the two lists don't drift apart when someone adds a new platform.
OpenStack does support LB through Octavia, but, OpenStack being OpenStack, you have no guarantee that the service is enabled. This shouldn't be a problem for RH OpenStack, where Octavia is generally enabled.
So perhaps the logic for OpenStack should be updated to dynamically check for an Octavia endpoint?
}
svc.Spec.Selector = hcpRouterLabels()
foundHTTPS := false
Small thing — now that some services will be NodePort, LoadBalancerSourceRanges (line 98) gets set on them too. Kubernetes ignores it for non-LB types so it won't break anything, but it's a bit confusing if someone inspects the service. Maybe worth gating this on svc.Spec.Type == corev1.ServiceTypeLoadBalancer?
Thanks for highlighting this. I have made the change as suggested.
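The gating discussed in this thread could look roughly like the sketch below. It follows the conditions listed in the walkthrough (external, not ARO HCP, and type LoadBalancer), but the ServiceSpec type is a simplified stand-in for corev1.ServiceSpec, and applySourceRanges and its parameters are hypothetical names used only for illustration.

```go
package main

import "fmt"

// ServiceType and ServiceSpec are simplified stand-ins for the corev1
// types (assumption: only the two fields used here are modeled).
type ServiceType string

const (
	ServiceTypeNodePort     ServiceType = "NodePort"
	ServiceTypeLoadBalancer ServiceType = "LoadBalancer"
)

type ServiceSpec struct {
	Type                     ServiceType
	LoadBalancerSourceRanges []string
}

// applySourceRanges (hypothetical helper) sets LoadBalancerSourceRanges only
// when the service is external, is not on ARO HCP, and is of type
// LoadBalancer. NodePort services are left untouched so the field does not
// show up where it has no effect.
func applySourceRanges(spec *ServiceSpec, external, isAroHCP bool, allowedCIDRs []string) {
	if external && !isAroHCP && spec.Type == ServiceTypeLoadBalancer {
		spec.LoadBalancerSourceRanges = allowedCIDRs
	}
}

func main() {
	cidrs := []string{"192.0.2.0/24"}

	lb := ServiceSpec{Type: ServiceTypeLoadBalancer}
	applySourceRanges(&lb, true, false, cidrs)
	fmt.Println(lb.LoadBalancerSourceRanges)

	np := ServiceSpec{Type: ServiceTypeNodePort}
	applySourceRanges(&np, true, false, cidrs)
	fmt.Println(len(np.LoadBalancerSourceRanges))
}
```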
|
Dropped some comments. Thanks! |
Force-pushed from 96922b5 to fb5f71e.
|
/retest |
|
/test e2e-azure-v2-self-managed |
case hyperv1.AgentPlatform, hyperv1.KubevirtPlatform, hyperv1.NonePlatform:
	svc.Spec.Type = corev1.ServiceTypeNodePort
default:
	svc.Spec.Type = corev1.ServiceTypeLoadBalancer
nit: this platform list will need updating if a new non-cloud platform is added. Not blocking, just noting it's not centralized with LabelHCPRoutes() in support/netutil/visibility.go.
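One way to centralize the list, as suggested in the review, would be a single predicate that both call sites share. The sketch below is only an illustration: isNonCloudPlatform is a hypothetical name, PlatformType is a stand-in for hyperv1.PlatformType with illustrative string values, and the set of platforms mirrors the three in this revision of the PR (whether OpenStack belongs in it is the open question from the earlier thread).

```go
package main

import "fmt"

// PlatformType is a stand-in for hyperv1.PlatformType; the constant values
// below are illustrative assumptions, not copied from the API package.
type PlatformType string

const (
	AgentPlatform    PlatformType = "Agent"
	KubevirtPlatform PlatformType = "KubeVirt"
	NonePlatform     PlatformType = "None"
	AWSPlatform      PlatformType = "AWS"
)

// isNonCloudPlatform (hypothetical shared helper) reports whether the
// platform is assumed to lack a cloud load balancer controller. Keeping the
// list in one place means a new platform only has to be added once.
func isNonCloudPlatform(p PlatformType) bool {
	switch p {
	case AgentPlatform, KubevirtPlatform, NonePlatform:
		return true
	default:
		return false
	}
}

func main() {
	for _, p := range []PlatformType{AgentPlatform, KubevirtPlatform, NonePlatform, AWSPlatform} {
		fmt.Printf("%s nonCloud=%t\n", p, isNonCloudPlatform(p))
	}
}
```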
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jparrill, vsolanki12
|
/test e2e-azure-v2-self-managed |
	svc.Labels[k] = v
}
svc.Spec.Type = corev1.ServiceTypeLoadBalancer
switch hcp.Spec.Platform.Type {
My main concern with this change is that we will change the Service type for an existing HostedCluster. That could be unexpected on an upgrade. My preference would be to set the service type if creating the service. If an existing service, leave the type alone.
Thank you @csrwng. I wrapped the type assignment in a svc.Spec.Type == "" check so it only applies on initial creation. Existing services will keep their current type on upgrade. I also added a test to verify that an existing LoadBalancer service on a non-cloud platform stays LoadBalancer after reconcile.
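The create-only behavior described in this reply can be sketched as follows. This is an illustration under assumptions, not the PR's code: ServiceSpec is a simplified stand-in for corev1.ServiceSpec, and setInitialServiceType and its nonCloud parameter are hypothetical names.

```go
package main

import "fmt"

// ServiceType and ServiceSpec are simplified stand-ins for the corev1
// types (assumption: only the Type field is needed here).
type ServiceType string

const (
	ServiceTypeNodePort     ServiceType = "NodePort"
	ServiceTypeLoadBalancer ServiceType = "LoadBalancer"
)

type ServiceSpec struct {
	Type ServiceType
}

// setInitialServiceType (hypothetical helper) picks a type only when none is
// set yet, which is the case for a Service that is being created. A Service
// that already has a type keeps it, so an upgrade does not flip it.
func setInitialServiceType(spec *ServiceSpec, nonCloud bool) {
	if spec.Type != "" {
		return
	}
	if nonCloud {
		spec.Type = ServiceTypeNodePort
		return
	}
	spec.Type = ServiceTypeLoadBalancer
}

func main() {
	// New Service on a non-cloud platform.
	created := ServiceSpec{}
	setInitialServiceType(&created, true)
	fmt.Println(created.Type)

	// Existing LoadBalancer Service on a non-cloud platform.
	existing := ServiceSpec{Type: ServiceTypeLoadBalancer}
	setInitialServiceType(&existing, true)
	fmt.Println(existing.Type)
}
```

The trade-off is that existing hosted clusters whose LoadBalancer Service is already stuck pending are not repaired automatically by this check alone.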
On non-cloud platforms (Agent, KubeVirt, OpenStack, None), the HCP router Service was unconditionally created as LoadBalancer. Without a cloud LB controller, the Service stays pending forever, blocking InfrastructureReady.
- ReconcileRouterService: use NodePort for non-cloud platforms
- reconcileRouterServiceStatus: guard against NodePort before calling CollectLBMessageIfNotProvisioned
- kas.ReconcileServiceStatus: add NodePort guard in Route case
Signed-off-by: Vimal Solanki <vsolanki@redhat.com>
Closes: OCPBUGS-77856
Force-pushed from fb5f71e to d926fc8.
|
@vsolanki12: all tests passed!
Test Failure Analysis Complete
Job Information
Test Failure Analysis
Error
Summary
These two Enterprise Contract (EC) failures are not caused by PR #8439's code changes. They are a pre-existing infrastructure issue caused by a Konflux EC policy enforcement change that occurred between 06:55 and 08:46 UTC on May 22, 2026. Prior to this policy update, the 2
Root Cause
The root cause is a Konflux Enterprise Contract policy enforcement change combined with pre-existing container image compliance gaps in the
Policy change timeline:
Pre-existing compliance gaps (confirmed from build attestations):
These same 2 preflight failures exist in:
Why the control-plane-operator passes: The
Recommendations
Evidence
|
What this PR does / why we need it:
On non-cloud platforms (Agent, KubeVirt, OpenStack, None), the HCP router Service was unconditionally created as LoadBalancer. Since these platforms lack a cloud LB controller, the Service stays pending forever, blocking InfrastructureReady and preventing the hosted control plane from becoming available.
Three code paths are fixed:
- ReconcileRouterService: use NodePort for non-cloud platforms instead of unconditional LoadBalancer
- reconcileRouterServiceStatus: add NodePort guard before calling CollectLBMessageIfNotProvisioned
- kas.ReconcileServiceStatus: add NodePort guard in the Route case
Which issue(s) this PR fixes:
Fixes OCPBUGS-77856
Special notes for your reviewer:
- LoadBalancerSourceRanges on NodePort services is harmless: Kubernetes ignores this field for non-LoadBalancer types.
- etcd-upload/s3_uploader_mock.go build failure in CI is a pre-existing issue on main, unrelated to this PR.
Checklist: