RFE-9138: feat: label some hosted control plane services#8298

Open
ingvagabund wants to merge 1 commit into
openshift:mainfrom
ingvagabund:label-some-services

@ingvagabund
Member

@ingvagabund ingvagabund commented Apr 21, 2026

What this PR does / why we need it:

Not every hosted control plane service is labeled, which makes it impossible to list a service, or a set of services, using a label selector. One example is a Metal load balancer that needs to select the Hosted Cluster's Services for the IPAddressPool.

The service labeling is aligned with the corresponding component labels:

  • oauth-openshift: app: oauth-openshift (from the component's Deployment spec)
  • konnectivity-server: app: konnectivity-agent (from the component's DS spec)
  • ignition-server: app: ignition-server (from the component's Deployment spec)
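
To illustrate what a label on the Service object makes possible, here is a minimal sketch in plain Go. It is not the HyperShift or client-go code: the `Service` type, `matches`, and `selectServices` are hypothetical stand-ins that model only equality-based selection on metadata labels, and the `unlabeled-service` entry is made up for contrast.

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for a Kubernetes Service:
// only the name and the object's own metadata labels are modeled.
type Service struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every key/value pair in selector is present in labels
// (equality-based selection only).
func matches(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectServices returns the names of the services whose labels satisfy selector.
func selectServices(services []Service, selector map[string]string) []string {
	var names []string
	for _, s := range services {
		if matches(s.Labels, selector) {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	services := []Service{
		{Name: "oauth-openshift", Labels: map[string]string{"app": "oauth-openshift"}},
		{Name: "ignition-server", Labels: map[string]string{"app": "ignition-server"}},
		{Name: "unlabeled-service"}, // hypothetical Service without labels
	}
	// An unlabeled Service can never be picked up by a non-empty selector.
	fmt.Println(selectServices(services, map[string]string{"app": "oauth-openshift"}))
}
```

A Service without labels is invisible to any non-empty selector, which is the gap the description refers to.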

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Chores
    • Added metadata labels to several Kubernetes Service resources to improve service identification (OAuth server, konnectivity server, ignition-server).
  • Tests
    • Updated unit test expectations to include the new Service metadata labels.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 21, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 21, 2026

@ingvagabund: This pull request references RFE-9138 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature request to target the "5.0.0" version, but no target version was set.

Details

In response to this:

To allow the services to be listed through a label selector. The service labeling is aligned with the corresponding component labels:

  • oauth-openshift: app: oauth-openshift (from the component's Deployment spec)
  • konnectivity-server: app: konnectivity-agent (from the component's DS spec)
  • ignition-server: app: ignition-server (from the component's Deployment spec)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Apr 21, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Apr 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e3482b73-f96b-4cc8-8b44-c0317819fb92

📥 Commits

Reviewing files that changed from the base of the PR and between 80c768b and a692ea4.

⛔ Files ignored due to path filters (16)
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_ARO_Route_SharedIngress.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_ARO_Route_SharedIngress_And_Swift.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Private_KAS_LoadBalancer.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Private_Route.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_PublicAndPrivate_KAS_LoadBalancer.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_PublicAndPrivate_Route.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Public_KAS_LoadBalancer.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_AWS_Public_Route.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_Azure_Private_KAS_LoadBalancer.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_Azure_Private_OAuth_LoadBalancer.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_When_Azure_self_managed_cluster_has_KAS_Route_with_hostname__it_should_need_an_external_router.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/ignition-server/AROSwift/zz_fixture_TestControlPlaneComponents_ignition_server_service.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/ignition-server/GCP/zz_fixture_TestControlPlaneComponents_ignition_server_service.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/ignition-server/IBMCloud/zz_fixture_TestControlPlaneComponents_ignition_server_service.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/ignition-server/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_ignition_server_service.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/ignition-server/zz_fixture_TestControlPlaneComponents_ignition_server_service.yaml is excluded by !**/testdata/**
📒 Files selected for processing (3)
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go
  • control-plane-operator/controllers/hostedcontrolplane/manifests/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/ignition-server/service.yaml
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/ignition-server/service.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • control-plane-operator/controllers/hostedcontrolplane/manifests/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go

📝 Walkthrough

Walkthrough

The changes add metadata labels to three Kubernetes Service resources. In control-plane-operator/controllers/hostedcontrolplane/manifests/infra.go, the OAuth and Konnectivity Services now include the labels app: oauth-openshift and app: konnectivity-server respectively. The ignition-server Service YAML asset gains metadata.labels.app: ignition-server. The OAuth Service unit test was updated to expect the new app: oauth-openshift label. No other fields or control flow were modified.

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (11 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding labels to hosted control plane services (oauth-openshift, konnectivity-server, and ignition-server). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR uses standard Go testing (func Test*), not Ginkgo. No Ginkgo test framework declarations (It, Describe, Context, When) found in modified files. Custom check for Ginkgo test names is not applicable. |
| Test Structure And Quality | ✅ Passed | Check not applicable. PR modifies standard Go tests (*testing.T), not Ginkgo tests (It blocks). Custom check requires reviewing Ginkgo code. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests added. PR only adds labels to Service manifests and updates unit test fixtures. Custom check for MicroShift e2e test compatibility is not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not add new Ginkgo e2e tests. Changes are only Service label additions and unit test fixture updates. The custom check's trigger condition is not met. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Adds labels to Service objects only. No affinity, nodeSelector, topology constraints, or pod scheduling changes introduced. Service metadata does not affect pod scheduling across topologies. |
| Ote Binary Stdout Contract | ✅ Passed | OTE Binary Stdout Contract check not applicable. PR only modifies Kubernetes service manifests and their test fixtures using standard Go testing package, not OTE binaries or framework. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR does not add new Ginkgo e2e tests. Changes only add labels to Service manifests and update existing unit test fixtures. Custom check applies only to Ginkgo e2e tests. |


@openshift-ci openshift-ci Bot requested review from clebs and devguyio April 21, 2026 10:53
@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels Apr 21, 2026
@ingvagabund ingvagabund marked this pull request as draft April 21, 2026 10:55
@ingvagabund
Member Author

/retest-required

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/manifests/infra.go`:
- Around line 144-146: The KonnectivityServerService resource's metadata Labels
map currently uses "app: konnectivity-agent" which mismatches the service
selector produced by kasLabels(); update the Labels in the
KonnectivityServerService definition (infra.go, the Labels map in the service
metadata) to use "app: kube-apiserver" so the resource labels reflect the actual
selector used by kasLabels() and remain consistent at reconciliation time.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: bab30a8d-90ad-4b47-8043-248f6a5a7bfc

📥 Commits

Reviewing files that changed from the base of the PR and between d0a4024 and b39164b.

📒 Files selected for processing (2)
  • control-plane-operator/controllers/hostedcontrolplane/manifests/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/ignition-server/service.yaml

Comment on lines +144 to +146
Labels: map[string]string{
	"app": "konnectivity-agent",
},
Contributor

@coderabbitai coderabbitai Bot Apr 21, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify konnectivity service selector and deployment labels

echo "=== Searching for konnectivity-server service full specification ==="
rg -n -A 20 "func.*KonnectivityServerService" --type=go

echo -e "\n=== Searching for konnectivity service selector configuration ==="
rg -n "konnectivity.*selector" --type=go --type=yaml -C 3

echo -e "\n=== Searching for konnectivity-server deployment/daemonset ==="
fd -e yaml -e go "konnectivity-server" -x echo "File: {}" \; -x head -30 {}

echo -e "\n=== Searching for konnectivity-agent deployment/daemonset ==="
fd -e yaml -e go "konnectivity-agent" -x echo "File: {}" \; -x head -30 {}

echo -e "\n=== Checking if service selector gets populated elsewhere ==="
rg -n "KonnectivityServerService.*Spec.*Selector" --type=go -C 5

Repository: openshift/hypershift

Length of output: 10600


🏁 Script executed:

#!/bin/bash
# Find the kasLabels() function definition and context

echo "=== Searching for kasLabels() function definition ==="
rg -n "func kasLabels" --type=go -A 5

echo -e "\n=== Checking kas/service.go for context around selector assignment ==="
rg -n "svc.Spec.Selector = kasLabels" --type=go -B 3 -A 5

echo -e "\n=== Searching for kasLabels usage in other services ==="
rg -n "kasLabels()" --type=go -B 2 -A 2

echo -e "\n=== Checking if metadata labels are used elsewhere ==="
rg -n "konnectivity-agent" --type=go -C 3

Repository: openshift/hypershift

Length of output: 47414


Change metadata labels to match the actual service selector.

The KonnectivityServerService has metadata labels set to app: konnectivity-agent, but the service selector is set to app: kube-apiserver (via kasLabels() in the reconciler at kas/service.go:281). This mismatch is misleading—the metadata labels should reflect what the selector actually targets. Update the labels in infra.go lines 144-145 to app: kube-apiserver for consistency with the selector set at reconciliation time.

Member

This is invalid. The label used to identify the Service is separate from the selector the Service uses to select specific pods.
It makes sense to have a label for finding the Service, and adding one does not change its current behavior of routing to pods with the label app: kube-apiserver.

Contributor

@clebs, you're absolutely right, and I apologize for the confusion. I incorrectly conflated two distinct Kubernetes concepts:

  • metadata.labels — labels on the Service object itself, used by external selectors (like a Metal LB IPAddressPool) to find the Service.
  • spec.selector — used by the Service to route traffic to matching pods.

The label app: konnectivity-agent on the Service metadata is entirely independent of the spec.selector (set to app: kube-apiserver via kasLabels()). Adding the metadata label does not affect pod routing at all. The PR's intent is valid and the implementation is correct.
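
The distinction can be sketched in a few lines of plain Go. This is an illustration under assumptions, not the HyperShift code: `Service`, `FoundBy`, and `RoutesTo` are hypothetical names, only equality-based matching is modeled, and the label value is the one under discussion at this point in the thread.

```go
package main

import "fmt"

// Service is a hypothetical, simplified stand-in for a Kubernetes Service.
type Service struct {
	Name     string
	Labels   map[string]string // models metadata.labels
	Selector map[string]string // models spec.selector
}

// subset reports whether every pair in selector appears in labels.
func subset(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// FoundBy reports whether an external label selector finds the Service object.
// Only the Service's own labels are consulted.
func (s Service) FoundBy(selector map[string]string) bool {
	return subset(selector, s.Labels)
}

// RoutesTo reports whether the Service selects a pod carrying podLabels.
// Only spec.selector is consulted; a Service without a selector selects no pods.
func (s Service) RoutesTo(podLabels map[string]string) bool {
	if len(s.Selector) == 0 {
		return false
	}
	return subset(s.Selector, podLabels)
}

func main() {
	svc := Service{
		Name:     "konnectivity-server",
		Labels:   map[string]string{"app": "konnectivity-agent"},
		Selector: map[string]string{"app": "kube-apiserver"},
	}
	kasPod := map[string]string{"app": "kube-apiserver"}

	fmt.Println("found by app=konnectivity-agent:", svc.FoundBy(map[string]string{"app": "konnectivity-agent"}))
	fmt.Println("routes to kube-apiserver pod:", svc.RoutesTo(kasPod))
}
```

Changing `Labels` has no effect on what `RoutesTo` returns, which is the point made in the reply above.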


@ingvagabund ingvagabund force-pushed the label-some-services branch 2 times, most recently from 3e8a572 to 80c768b Compare April 21, 2026 12:05
@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.39%. Comparing base (5c06422) to head (a692ea4).
⚠️ Report is 348 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../controllers/hostedcontrolplane/manifests/infra.go | 0.00% | 6 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8298      +/-   ##
==========================================
+ Coverage   35.70%   39.39%   +3.69%     
==========================================
  Files         767      773       +6     
  Lines       93401    94678    +1277     
==========================================
+ Hits        33353    37303    +3950     
+ Misses      57346    54676    -2670     
+ Partials     2702     2699       -3     

| Files with missing lines | Coverage Δ |
|---|---|
| .../controllers/hostedcontrolplane/manifests/infra.go | 0.00% <0.00%> (ø) |

... and 165 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| cmd-support | 32.53% <ø> (?) |
| cpo-hostedcontrolplane | 40.57% <0.00%> (?) |
| cpo-other | 40.14% <ø> (?) |
| hypershift-operator | 50.61% <ø> (?) |
| other | 31.54% <ø> (?) |
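
The totals in the coverage diff above can be cross-checked with a little arithmetic. The sketch below assumes coverage is hits divided by hits plus misses plus partials, truncated (not rounded) to two decimals; the truncation is inferred from the reported figures rather than taken from Codecov documentation, and `truncatedBasisPoints` is a hypothetical helper.

```go
package main

import "fmt"

// truncatedBasisPoints returns hits/total in hundredths of a percent,
// truncated by integer division (an assumption inferred from the report).
func truncatedBasisPoints(hits, total int) int {
	return hits * 10000 / total
}

func main() {
	// Numbers copied from the coverage diff: hits, misses, partials.
	base := truncatedBasisPoints(33353, 33353+57346+2702)
	head := truncatedBasisPoints(37303, 37303+54676+2699)
	delta := head - base // non-negative here; negative deltas would need extra formatting care

	fmt.Printf("base:  %d.%02d%%\n", base/100, base%100)
	fmt.Printf("head:  %d.%02d%%\n", head/100, head%100)
	fmt.Printf("delta: +%d.%02d%%\n", delta/100, delta%100)
}
```

Under these assumptions the computed values agree with the 35.70%, 39.39%, and +3.69% shown in the report.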

@ingvagabund ingvagabund marked this pull request as ready for review April 21, 2026 14:36
@openshift-ci openshift-ci Bot requested review from jparrill and sjenning April 21, 2026 14:37
@ingvagabund
Member Author

/retest-required

@clebs
Member

clebs commented Apr 23, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@clebs
Member

clebs commented Apr 23, 2026

/unassign
/assign @devguyio

@openshift-ci openshift-ci Bot assigned devguyio and unassigned clebs Apr 23, 2026
@ingvagabund ingvagabund changed the title wip: RFE-9138: feat: label some hosted control plane services RFE-9138: feat: label some hosted control plane services Apr 23, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2026
@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2047239919058817024 | Cost: $3.5705067499999994 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2047239918689718272 | Cost: $1.9215299499999998 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@cwbotbot

cwbotbot commented Apr 23, 2026

Test Results

e2e-aks

Failed Tests

Total failed tests: 18

  • TestAutoscaling
  • TestAutoscaling/ValidateHostedCluster
  • TestAzureScheduler
  • TestAzureScheduler/ValidateHostedCluster
  • TestCreateCluster

... and 13 more failed tests

e2e-aws

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aks | Build: 2047239918647775232 | Cost: $2.3195519000000004 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@ingvagabund
Member Author

@clebs thank you for the review. Is there a CI job from which we can validate the service labels are present in the rendered objects?

@clebs
Member

clebs commented Apr 27, 2026

> @clebs thank you for the review. Is there a CI job from which we can validate the service labels are present in the rendered objects?

Hi @ingvagabund, I think the unit tests already check this, given the fixtures you updated with the labels. If the resulting manifests differ from the fixtures, the unit tests fail.
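
As a rough illustration of the fixture mechanism described here, the following plain-Go sketch compares a rendered manifest against a stored expectation. It is not the repository's test helper: `render` and `compareWithFixture` are hypothetical, the text format is a simplified stand-in for YAML, and the fixture is an in-memory string rather than a file under testdata.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// render produces a deterministic text form of a Service's name and labels.
// Hypothetical stand-in for serializing a manifest to YAML.
func render(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable order so the comparison is reproducible

	var b strings.Builder
	fmt.Fprintf(&b, "name: %s\nlabels:\n", name)
	for _, k := range keys {
		fmt.Fprintf(&b, "  %s: %s\n", k, labels[k])
	}
	return b.String()
}

// compareWithFixture returns an error when the rendered output deviates
// from the stored fixture content.
func compareWithFixture(got, fixture string) error {
	if got != fixture {
		return fmt.Errorf("rendered manifest differs from fixture:\n--- got ---\n%s--- want ---\n%s", got, fixture)
	}
	return nil
}

func main() {
	fixture := "name: ignition-server\nlabels:\n  app: ignition-server\n"

	withLabel := render("ignition-server", map[string]string{"app": "ignition-server"})
	withoutLabel := render("ignition-server", nil)

	fmt.Println("labeled manifest matches fixture:", compareWithFixture(withLabel, fixture) == nil)
	fmt.Println("unlabeled manifest matches fixture:", compareWithFixture(withoutLabel, fixture) == nil)
}
```

If the label were dropped from the rendered Service, the comparison would report a mismatch, which is how a fixture-based unit test would catch the regression.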

@clebs
Member

clebs commented Apr 27, 2026

/retest

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2048722208833933312 | Cost: $2.6430307499999994 | Failed step: hypershift-azure-run-e2e-self-managed

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@ingvagabund
Member Author

/retest-required

return &corev1.Service{
	ObjectMeta: metav1.ObjectMeta{
		Name:      konnectivityServerServiceName,
		Namespace: hostedClusterNamespace,
Contributor

Nit: This Service (konnectivity-server) selects KAS pods via kasLabels() — the konnectivity server runs as a sidecar in the KAS pod. Labeling it app: konnectivity-agent is misleading since the agent is the client that connects to this Service, not the backend it exposes. Should this be app: konnectivity-server instead?

Member Author

No objection here. I will update the PR.

Member Author

Done

To allow the services to be listed through a label selector.
The service labeling is aligned with the corresponding component labels:
- oauth-openshift: app: oauth-openshift (from the component's Deployment spec)
- konnectivity-server: app: konnectivity-server
- ignition-server: app: ignition-server (from the component's Deployment spec)
@ingvagabund ingvagabund force-pushed the label-some-services branch from 80c768b to a692ea4 Compare May 19, 2026 16:53
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 19, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ingvagabund
Once this PR has been reviewed and has the lgtm label, please ask for approval from devguyio. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ingvagabund
Member Author

@clebs anyone who can approve the PR?

@clebs
Member

clebs commented May 19, 2026

/lgtm

@clebs
Member

clebs commented May 19, 2026

@ingvagabund according to @openshift-ci-robot, the chosen one to approve is @devguyio

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 19, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@ingvagabund
Member Author

/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@ingvagabund: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws | a692ea4 | link | true | /test e2e-aws |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented May 21, 2026

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

fixture.go:333: Failed to wait for infra resources in guest cluster to be deleted: context deadline exceeded
fixture.go:340: Failed to clean up 3 remaining resources for guest cluster:
  - 2 EC2 EBS volumes (attached to worker nodes private-r7fl4-us-east-1b-rgwxf-frklt and private-r7fl4-us-east-1b-rgwxf-chdhv)
  - 1 NLB load balancer (openshift-ingress/router-default)

Summary

The TestCreateClusterPrivate test passed all functional sub-tests (cluster creation, private→public switching, validation, metrics, network policies) but failed during Teardown when the infrastructure controller timed out waiting for 3 AWS resources (2 EBS volumes and 1 NLB) to be deleted. This is a known flaky teardown race condition unrelated to the PR's changes, which only add app labels to Service manifests for oauth-openshift, konnectivity-server, and ignition-server. Three other non-Prow checks also failed: codecov/patch due to a pre-existing 0% file coverage gap in manifests/infra.go (348 commits behind baseline), and two Konflux Enterprise Contract checks due to 2/280 supply-chain policy failures on the container image snapshot—also unrelated to source code changes.

Root Cause

e2e-aws (Prow): Flaky AWS infrastructure teardown timeout

The test framework's Teardown sub-test waits for the hosted cluster's guest infrastructure resources to be garbage-collected by the HyperShift infrastructure controller. In this run, 3 AWS resources were not released within the timeout:

  1. 2 EBS volumes — Attached to EC2 worker node instances (private-r7fl4-us-east-1b-rgwxf-frklt and private-r7fl4-us-east-1b-rgwxf-chdhv). These volumes could not be deleted because the EC2 instances had not fully terminated yet — a known race condition in the AWS detach-then-delete sequence.
  2. 1 NLB load balancer — Tied to openshift-ingress/router-default. The load balancer was still draining connections when the cleanup deadline expired.

The CLI-side destroy.log shows VPC, subnets, security groups, IAM roles, and DNS zones were all cleaned up successfully — confirming this is purely an AWS eventual-consistency timing issue, not a functional regression.

Cross-run evidence confirms flakiness: This PR has had 4 e2e-aws runs — Run 2 passed fully; Runs 1, 3, and 4 each failed on different teardown tests (TestCreateCluster, TestKarpenter, TestCreateClusterPrivate), the hallmark pattern of infrastructure flakiness.

codecov/patch: Pre-existing coverage gap

The manifests/infra.go file has 0% test coverage — none of the manifest constructor functions (OauthServerService(), KonnectivityServerService(), etc.) are exercised by unit tests. The PR added 6 lines to these constructors, all uncovered. The codecov baseline is 348 commits behind main, making the threshold comparison stale. Overall project coverage actually increased by +3.69%.

Konflux Enterprise Contract (both checks): Image policy failure

Both Konflux checks ran against the same container image snapshot (hypershift-operator-20260519-165311-000) and produced identical results: 254 pass, 24 warn, 2 fail out of 280 policy checks. These checks validate container image signing, provenance, and SLSA compliance — they do not inspect Go source code or Kubernetes labels. The same checks show NEUTRAL on neighboring PRs and main, indicating a transient issue with the specific image build.

Recommendations
  1. e2e-aws: Re-trigger with /retest e2e-aws. The failure is a flaky teardown timeout unrelated to the PR's label additions. All functional test sub-tests passed.

  2. codecov/patch: This can be safely ignored. The manifests/infra.go file has 0% coverage pre-existing. If desired, add unit tests for OauthServerService() and KonnectivityServerService() constructors in a follow-up PR, but this is not a regression introduced by this PR.

  3. Konflux Enterprise Contract (both): Retry the Konflux pipeline. The 2 policy failures affect the container image snapshot, not source code. If failures persist, investigate the Konflux pipeline run logs for which specific EC rules are failing.

  4. No code changes needed: The PR's changes (adding app labels to 3 Service manifests and updating corresponding test fixtures) are correct and all functional tests validated them successfully.

Evidence
| Evidence | Detail |
|---|---|
| Failed test | TestCreateClusterPrivate/Teardown: context deadline exceeded waiting for 3 AWS resources |
| Leaked AWS resources | 2 EBS volumes (worker nodes), 1 NLB (router-default) |
| All functional sub-tests | PASSED: ValidateHostedCluster, SwitchFromPrivateToPublic, SwitchFromPublicToPrivate, EnsureNoCrashingPods, all EnsureHostedCluster sub-tests |
| PR scope | Adds app labels to oauth-openshift, konnectivity-server, ignition-server Services + test fixture updates; none of these are involved in the teardown failure |
| Cross-run pattern | Run 2: ✅ PASS; Run 1: ❌ different tests; Run 3: ❌ TestKarpenter/Teardown; Run 4: ❌ TestCreateClusterPrivate/Teardown (classic flaky teardown pattern) |
| codecov/patch | 0% patch coverage on 6 new lines in manifests/infra.go (file has 0% total coverage, baseline 348 commits stale) |
| Konflux EC (both) | 2/280 policy failures on image snapshot hypershift-operator-20260519-165311-000 (supply-chain policy, not source code) |
| Build log | destroy.log confirms CLI-side cleanup succeeded (VPC, subnets, SGs, IAM, DNS all removed) |

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
