
OCPBUGS-54790: Move packageserver PDB from guest cluster to management cluster#8459

Open
dhgautam99 wants to merge 2 commits into
openshift:mainfrom
dhgautam99:remove-packageserver-pdb-from-guest

Conversation

@dhgautam99

@dhgautam99 dhgautam99 commented May 7, 2026

What this PR does / why we need it:

The packageserver PodDisruptionBudget was being created in the guest cluster's
openshift-operator-lifecycle-manager namespace by CVO. However, packageserver
pods run on the management cluster in the clusters-<hosted-cluster> namespace,
making the guest cluster PDB ineffective.

This PR:

  • Prevents CVO from creating the packageserver PDB in the guest cluster (via manifestsToOmit)
  • Cleans up the orphaned PDB on existing clusters during upgrade (via resourcesToRemove)
  • Creates the PDB in the management cluster namespace using the cpov2 framework

Which issue(s) this PR fixes:

Fixes OCPBUGS-54790

Special notes for your reviewer:

The PDB cleanup applies to all platforms (both IBM/PowerVS and default) since
packageserver runs on the management cluster regardless of platform.

Checklist:

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Improvements

    • Added availability protection for the packageserver component (PodDisruptionBudget) to reduce service disruption during maintenance and ensure consistent behavior across platforms.
  • Tests

    • Added a unit test validating the packageserver component is constructed correctly.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 7, 2026
@openshift-ci-robot

@dhgautam99: This pull request references Jira Issue OCPBUGS-54790, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

📝 Walkthrough

Walkthrough

A PodDisruptionBudget manifest for the packageserver workload was added (packageserver-pdb with minAvailable: 1). The packageserver component now adapts this PDB manifest and a unit test for NewComponent was added. The CVO deployment code imports policy/v1 and updates its resourcesToRemove logic to include the packageserver-pdb in the cleanup list for the modified platform branches.

Sequence Diagram(s)

sequenceDiagram
    participant Operator as HostedControlPlane Operator
    participant Component as packageserver Component
    participant CVO as CVO / payload generator
    participant Kube as Kubernetes API

    Operator->>Component: NewComponent()
    Component->>Component: WithManifestAdapter(pdb.yaml -> AdaptPodDisruptionBudget)
    Operator->>CVO: preparePayload / resourcesToRemove
    CVO->>CVO: include packageserver-pdb in resourcesToRemove
    CVO->>Kube: omit pdb from generated payload (when applicable)
    Operator->>Kube: apply manifests (ensure packageserver-pdb exists for cleanup)
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning)
    Explanation: Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.
    Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Structure And Quality (⚠️ Warning)
    Explanation: Test lacks meaningful failure messages on lines 14-15. Assertions should include context like: g.Expect(component).ToNot(BeNil(), "NewComponent should return a non-nil component")
    Resolution: Add descriptive failure messages to Expect assertions per check requirement: g.Expect(component).ToNot(BeNil(), "...") and g.Expect(component.Name()).To(Equal("packageserver"), "...") to match codebase patterns.
✅ Passed checks (10 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The PR title directly and clearly describes the main change: moving the packageserver PodDisruptionBudget from the guest cluster to the management cluster, which aligns perfectly with the changeset.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names (✅ Passed): The test added (TestNewComponent) uses standard Go testing, not Ginkgo. The test name is static, descriptive, and deterministic, containing no dynamic information that changes between runs.
  • Microshift Test Compatibility (✅ Passed): PR contains only standard Go unit tests, not Ginkgo e2e tests, so the MicroShift compatibility check does not apply.
  • Single Node Openshift (Sno) Test Compatibility (✅ Passed): No Ginkgo e2e tests added in this PR. The new test is a standard Go unit test (TestNewComponent) using Gomega assertions, not Ginkgo patterns. Custom check scope is limited to Ginkgo e2e tests.
  • Topology-Aware Scheduling Compatibility (✅ Passed): PDB uses topology-aware AdaptPodDisruptionBudget() which adjusts per ControllerAvailabilityPolicy: SingleReplica sets minAvailable=1, HighlyAvailable sets maxUnavailable=1.
  • Ote Binary Stdout Contract (✅ Passed): PR introduces no process-level stdout writes: no main(), init(), TestMain(), BeforeSuite/AfterSuite functions; no fmt.Print/log/klog calls; test is properly isolated using Gomega assertions.
  • Ipv6 And Disconnected Network Test Compatibility (✅ Passed): No Ginkgo e2e tests added. Only a standard Go unit test (component_test.go) using testing.T and Gomega. Check applies only to Ginkgo-style e2e tests.
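
The Topology-Aware Scheduling Compatibility check above describes how the PDB is adjusted per availability policy. The following is a minimal sketch of that rule under stated assumptions: `AvailabilityPolicy`, `pdbSpec`, and `adaptPDB` are simplified stand-ins written for this illustration, not the HyperShift `AdaptPodDisruptionBudget()` implementation or the Kubernetes API types.

```go
package main

import "fmt"

// AvailabilityPolicy is a stand-in for the controller availability policy;
// only the two values named in the check are modeled.
type AvailabilityPolicy string

const (
	SingleReplica   AvailabilityPolicy = "SingleReplica"
	HighlyAvailable AvailabilityPolicy = "HighlyAvailable"
)

// pdbSpec is a simplified stand-in for a PodDisruptionBudget spec.
// A nil pointer means the field is unset; a PDB sets only one of the two.
type pdbSpec struct {
	MinAvailable   *int
	MaxUnavailable *int
}

// adaptPDB applies the rule quoted in the check: HighlyAvailable sets
// maxUnavailable=1, anything else (SingleReplica) sets minAvailable=1.
func adaptPDB(policy AvailabilityPolicy) pdbSpec {
	one := 1
	if policy == HighlyAvailable {
		return pdbSpec{MaxUnavailable: &one}
	}
	return pdbSpec{MinAvailable: &one}
}

func main() {
	for _, policy := range []AvailabilityPolicy{SingleReplica, HighlyAvailable} {
		spec := adaptPDB(policy)
		fmt.Printf("%s: minAvailable set=%t, maxUnavailable set=%t\n",
			policy, spec.MinAvailable != nil, spec.MaxUnavailable != nil)
	}
}
```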


@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels May 7, 2026
@openshift-ci-robot

@dhgautam99: This pull request references Jira Issue OCPBUGS-54790, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Summary by CodeRabbit

  • Improvements
  • Enhanced availability protection for the packageserver component to minimize service disruptions during cluster maintenance operations.
  • Improved component configuration management for operational consistency and platform-specific handling.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@codecov

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.36%. Comparing base (2fc8a13) to head (86fec73).
⚠️ Report is 141 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8459      +/-   ##
==========================================
+ Coverage   37.50%   40.36%   +2.85%     
==========================================
  Files         751      755       +4     
  Lines       91992    93173    +1181     
==========================================
+ Hits        34505    37606    +3101     
+ Misses      54844    52864    -1980     
- Partials     2643     2703      +60     
Files with missing lines Coverage Δ
...ontrollers/hostedcontrolplane/v2/cvo/deployment.go 41.17% <100.00%> (+0.58%) ⬆️
...stedcontrolplane/v2/olm/packageserver/component.go 73.91% <100.00%> (+73.91%) ⬆️

... and 70 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.30% <ø> (+1.61%) ⬆️
cpo-hostedcontrolplane 41.86% <100.00%> (+5.09%) ⬆️
cpo-other 40.14% <ø> (+2.41%) ⬆️
hypershift-operator 50.72% <ø> (+2.79%) ⬆️
other 31.54% <ø> (+3.76%) ⬆️

Flags with carried forward coverage won't be shown.


@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from 46bdd14 to ffe96be Compare May 7, 2026 13:51
@dhgautam99 dhgautam99 marked this pull request as ready for review May 7, 2026 14:10
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@openshift-ci openshift-ci Bot requested review from cblecker and enxebre May 7, 2026 14:12
@cblecker
Member

cblecker commented May 7, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 7, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@cwbotbot

cwbotbot commented May 7, 2026

Test Results

e2e-aws

e2e-aks

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2052414791430967296 | Cost: $3.95796535 | Failed step: hypershift-azure-run-e2e-self-managed


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2052414791355469824 | Cost: $3.2685650000000006 | Failed step: hypershift-aws-run-e2e-nested


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@dhgautam99
Author

/test e2e-aws
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2052722353925787648 | Cost: $2.8874605499999997 | Failed step: hypershift-aws-run-e2e-nested


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@dhgautam99
Author

/test e2e-aws

"0000_50_olm_07-collect-profiles.cronjob.yaml",
"0000_50_olm_08-catalog-operator.deployment.ibm-cloud-managed.yaml",
"0000_50_olm_08-catalog-operator.deployment.yaml",
"0000_50_olm_00-packageserver.pdb.yaml",
Member

Author


done

…ster

The packageserver PDB was being created in the guest cluster's
openshift-operator-lifecycle-manager namespace by CVO, but packageserver
pods run on the management cluster in the clusters-<hosted-cluster>
namespace. This moves the PDB to the correct location.

- Add packageserver PDB manifest to manifestsToOmit to prevent CVO from
  creating it in guest clusters
- Add packageserver-pdb to resourcesToRemove for all platforms to clean
  up the orphaned PDB on existing clusters during upgrade
- Register PDB manifest adapter in packageserver component to create the
  PDB in the management cluster namespace

Regenerate CVO deployment and packageserver component test fixtures
to reflect the packageserver PDB being omitted from the guest cluster
CVO payload and added to the management cluster namespace.
@dhgautam99 dhgautam99 force-pushed the remove-packageserver-pdb-from-guest branch from ffe96be to 86fec73 Compare May 20, 2026 06:40
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhgautam99
Once this PR has been reviewed and has the lgtm label, please ask for approval from cblecker. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go (1)

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test name should follow "When...it should..." pattern.

As per coding guidelines, unit tests should use the format "When ... it should ..." for test case descriptions.

📝 Suggested fix
-func TestNewComponent(t *testing.T) {
+func TestNewComponent_WhenCreatingComponent_ItShouldReturnValidPackageserverComponent(t *testing.T) {

As per coding guidelines: "Always use 'When ... it should ...' format for describing test cases when creating unit tests".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go`
at line 9, Test descriptions in the OLM package server tests must follow the
"When ... it should ..." pattern; locate any test declarations (e.g.,
t.Run("..."), It("..."), or DescribeTable entries) related to the PackageServer
component and rename their string descriptions to the form "When <condition> it
should <expected outcome>" (for example change "validates X" to "When X it
should validate Y"). Ensure all changed descriptions remain clear and update any
related test helpers or snapshots that assert on the test name.
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component_test.go (1)

9-9: ⚡ Quick win

Rename test to follow Gherkin "When... it should..." pattern.

The test name should follow the Gherkin syntax pattern as specified in the coding guidelines. Consider renaming to something like TestNewComponent_WhenCalled_ItShouldReturnComponentNamedPackageserver to align with project standards.

As per coding guidelines, "Always use 'When ... it should ...' format for describing test cases when creating unit tests."

♻️ Proposed fix for test naming
-func TestNewComponent(t *testing.T) {
+func TestNewComponent_WhenCalled_ItShouldReturnComponentNamedPackageserver(t *testing.T) {
 	t.Parallel()
 	g := NewWithT(t)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component_test.go`
at line 9, Rename the unit test function TestNewComponent to follow the Gherkin
"When... it should..." pattern; update the function name to something like
TestNewComponent_WhenCalled_ItShouldReturnComponentNamedPackageserver and adjust
any references or test runners accordingly so the test still compiles and
executes (look for the TestNewComponent function in component_test.go and rename
it consistently).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go`:
- Line 270: Add the packageserver PodDisruptionBudget manifest to the
manifestsToOmit list so it is never deployed (rather than created then removed);
specifically append the PodDisruptionBudget entry for Name="packageserver-pdb",
Namespace="openshift-operator-lifecycle-manager" (the same object literal used
in the diff: &policyv1.PodDisruptionBudget{ObjectMeta: metav1.ObjectMeta{Name:
"packageserver-pdb", Namespace: "openshift-operator-lifecycle-manager"}}) into
the manifestsToOmit array where other omitted manifests are listed (look for the
manifestsToOmit slice/variable in this file) so the CVO will skip deploying that
PDB on new clusters.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3c0b9051-33db-4f0b-b4b2-c45a059fff33

📥 Commits

Reviewing files that changed from the base of the PR and between ffe96be and 86fec73.

⛔ Files ignored due to path filters (15)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/AROSwift/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/GCP/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/IBMCloud/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/AROSwift/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/AROSwift/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/GCP/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/GCP/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/IBMCloud/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/IBMCloud/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/TechPreviewNoUpgrade/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/zz_fixture_TestControlPlaneComponents_packageserver_controlplanecomponent.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/testdata/packageserver/zz_fixture_TestControlPlaneComponents_packageserver_pdb_poddisruptionbudget.yaml is excluded by !**/testdata/**
📒 Files selected for processing (4)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/packageserver/pdb.yaml
  • control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component_test.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/v2/assets/packageserver/pdb.yaml

switch platformType {
case hyperv1.IBMCloudPlatform, hyperv1.PowerVSPlatform:
	return []client.Object{
		&policyv1.PodDisruptionBudget{ObjectMeta: metav1.ObjectMeta{Name: "packageserver-pdb", Namespace: "openshift-operator-lifecycle-manager"}},
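
For readers without the full diff, here is a dependency-free sketch of the pattern in the snippet above: a per-platform cleanup list in which every branch includes the orphaned guest-cluster PDB. The `resourceRef` type, the string platform names, and the extra default-branch entry are assumptions made for this illustration; the real code returns `client.Object` values and its other list entries are not shown here.

```go
package main

import "fmt"

// resourceRef is a simplified stand-in for a typed Kubernetes object
// reference (kind, namespace, name).
type resourceRef struct {
	Kind, Namespace, Name string
}

// packageserverPDB identifies the orphaned guest-cluster PDB named in the PR.
var packageserverPDB = resourceRef{
	Kind:      "PodDisruptionBudget",
	Namespace: "openshift-operator-lifecycle-manager",
	Name:      "packageserver-pdb",
}

// resourcesToRemove lists the objects to clean up for a platform. The PDB is
// present in every branch; the extra default-branch entry is hypothetical.
func resourcesToRemove(platform string) []resourceRef {
	switch platform {
	case "IBMCloud", "PowerVS":
		return []resourceRef{packageserverPDB}
	default:
		return []resourceRef{
			packageserverPDB,
			{Kind: "ConfigMap", Namespace: "example-namespace", Name: "example-leftover"}, // hypothetical
		}
	}
}

// contains reports whether refs includes target.
func contains(refs []resourceRef, target resourceRef) bool {
	for _, r := range refs {
		if r == target {
			return true
		}
	}
	return false
}

func main() {
	for _, platform := range []string{"IBMCloud", "PowerVS", "AWS"} {
		fmt.Println(platform, contains(resourcesToRemove(platform), packageserverPDB))
	}
}
```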
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for packageserver PDB manifest in the codebase and verify manifestsToOmit contains it.

# Search for packageserver PDB manifest file
echo "=== Searching for packageserver PDB manifest ==="
fd -e yaml -e yml . | rg -i 'packageserver.*pdb|pdb.*packageserver'

# Check if it's referenced in manifestsToOmit
echo -e "\n=== Checking manifestsToOmit array ==="
rg -A 60 'manifestsToOmit.*=.*\[\]string' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | rg -i 'packageserver.*pdb'

# Search for the actual manifest name in CVO payload or OLM manifests
echo -e "\n=== Searching for OLM packageserver manifests ==="
rg -l '0000_50_olm.*packageserver' --type yaml

echo -e "\n=== Checking if PDB manifest name follows OLM pattern ==="
rg 'kind:\s*PodDisruptionBudget' --type yaml -A 5 -B 5 | rg -i packageserver

Repository: openshift/hypershift

Length of output: 17195


🏁 Script executed:

cat -n control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | sed -n '140,200p'

Repository: openshift/hypershift

Length of output: 3942


🏁 Script executed:

cat -n control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | sed -n '265,290p'

Repository: openshift/hypershift

Length of output: 2747


🏁 Script executed:

# Search for packageserver PDB manifest and check if it exists with a standard OLM naming pattern
cd control-plane-operator/controllers/hostedcontrolplane/v2/assets/packageserver
ls -la
cat pdb.yaml

Repository: openshift/hypershift

Length of output: 464


🏁 Script executed:

# Check how packageserver PDB is loaded and deployed in deployment.go
# Look for references to pdb.yaml or how assets are loaded
rg -A 10 -B 5 'packageserver.*pdb|pdb.*packageserver' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go

Repository: openshift/hypershift

Length of output: 2425


🏁 Script executed:

# Search for how packageserver component is structured and deployed
# Check the packageserver component code to understand asset handling
rg -l 'packageserver' control-plane-operator/controllers/hostedcontrolplane/v2/ --type go | head -5

Repository: openshift/hypershift

Length of output: 502


🏁 Script executed:

# Check the packageserver component to see how it deploys assets
cat -n control-plane-operator/controllers/hostedcontrolplane/v2/olm/packageserver/component.go | head -50

Repository: openshift/hypershift

Length of output: 1782


🏁 Script executed:

# The component.go shows pdb.yaml is deployed by hypershift packageserver component.
# Now check if there's a packageserver PDB from CVO payload that needs to be omitted.
# Search for OLM manifests with packageserver and pdb pattern

# First, let's check what the CVO payload looks like in test fixtures
rg 'kind: PodDisruptionBudget' control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/ -A 5 -B 5 | grep -A 5 -B 5 packageserver

Repository: openshift/hypershift

Length of output: 11022


🏁 Script executed:

# Check the CVO deployment script to see what manifest filename the packageserver PDB has
# Extract more context from one of the CVO deployment test files
grep -n "0000_01_cleanup.yaml" control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml | head -1

Repository: openshift/hypershift

Length of output: 142


🏁 Script executed:

# Search for the actual packageserver PDB manifest file in the CVO payload
# Check what manifests are actually in the payload vs what's in cleanup
rg 'packageserver.*pdb|0000_.*packageserver.*pdb' control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml | head -20

Repository: openshift/hypershift

Length of output: 100


🏁 Script executed:

# Look at the preparePayloadScript function to understand how manifests are processed
# Search for the actual manifest list operations
rg -A 200 'func preparePayloadScript' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go | grep -i 'packageserver\|manifest' | head -20

Repository: openshift/hypershift

Length of output: 1654


🏁 Script executed:

# Check what the actual cleanup script contains in the test fixtures
sed -n '190,250p' control-plane-operator/controllers/hostedcontrolplane/testdata/cluster-version-operator/zz_fixture_TestControlPlaneComponents_cluster_version_operator_deployment.yaml

Repository: openshift/hypershift

Length of output: 2468


🏁 Script executed:

# Check if there's an actual packageserver PDB manifest in the CVO release that should be omitted
# Look at the preparePayloadScript more carefully to see where release-manifests come from
sed -n '198,280p' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go

Repository: openshift/hypershift

Length of output: 4453


🏁 Script executed:

# Look for any comments or references to packageserver PDB manifest naming
# and check if there's a specific manifest file for it
rg '0000_.*packageserver' --type yaml control-plane-operator/controllers/hostedcontrolplane/v2/

Repository: openshift/hypershift

Length of output: 46


🏁 Script executed:

# Check if there's a packageserver PDB manifest in the CVO payload by searching git history or release content
# First, let's check what files are being copied into the payload
rg 'release-manifests' control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go -B 5 -A 5 | head -30

Repository: openshift/hypershift

Length of output: 1720


🏁 Script executed:

# Check if there's any packageserver PDB manifest from OpenShift release in the test data
# Look for what manifests are in the release-manifests directory
find control-plane-operator/controllers/hostedcontrolplane/testdata -name "*.yaml" | xargs grep -l 'kind: PodDisruptionBudget' | xargs grep -l 'packageserver'

Repository: openshift/hypershift

Length of output: 2598


🏁 Script executed:

# Final verification: Check if there are any git changes or recent commits related to manifestsToOmit
git log --oneline -20 -- control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go 2>/dev/null || echo "Git log not available"

Repository: openshift/hypershift

Length of output: 136


Add packageserver PDB manifest to manifestsToOmit array.

The packageserver PDB ships in the CVO release payload (the resourcesToRemove cleanup entry confirms this), but it is missing from the manifestsToOmit array. resourcesToRemove deletes the PDB on upgraded clusters, while manifestsToOmit is what prevents the manifest from being deployed on new clusters. Without the manifestsToOmit entry, CVO creates the packageserver PDB on a new cluster only for the cleanup manifest to remove it afterwards.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go`
at line 270, Add the packageserver PodDisruptionBudget manifest to the
manifestsToOmit list so it is never deployed (rather than created then removed);
specifically append the PodDisruptionBudget entry for Name="packageserver-pdb",
Namespace="openshift-operator-lifecycle-manager" (the same object literal used
in the diff: &policyv1.PodDisruptionBudget{ObjectMeta: metav1.ObjectMeta{Name:
"packageserver-pdb", Namespace: "openshift-operator-lifecycle-manager"}}) into
the manifestsToOmit array where other omitted manifests are listed (look for the
manifestsToOmit slice/variable in this file) so the CVO will skip deploying that
PDB on new clusters.
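
The distinction this finding relies on can be sketched in a few lines. This is a minimal, self-contained illustration and not the code in deployment.go: the objectRef type, the filterPayload and pendingRemovals helpers, and the example-*.yaml file names are all hypothetical stand-ins. It assumes that manifestsToOmit filters payload manifests by file name and that resourcesToRemove names objects to delete from clusters that already have them; both should be checked against the actual file.

```go
package main

import "fmt"

// objectRef is a hypothetical stand-in for a typed Kubernetes object reference.
type objectRef struct {
	Kind, Namespace, Name string
}

// filterPayload returns the payload manifests that remain after dropping
// every file named in omit. Omitted manifests are never applied.
func filterPayload(payload, omit []string) []string {
	skip := make(map[string]bool, len(omit))
	for _, name := range omit {
		skip[name] = true
	}
	kept := make([]string, 0, len(payload))
	for _, name := range payload {
		if !skip[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

// pendingRemovals returns the entries of remove that currently exist,
// i.e. objects left behind by an earlier release that cleanup must delete.
func pendingRemovals(existing map[objectRef]bool, remove []objectRef) []objectRef {
	var out []objectRef
	for _, ref := range remove {
		if existing[ref] {
			out = append(out, ref)
		}
	}
	return out
}

func main() {
	pdb := objectRef{
		Kind:      "PodDisruptionBudget",
		Namespace: "openshift-operator-lifecycle-manager",
		Name:      "packageserver-pdb",
	}

	// Hypothetical payload file names, used only for illustration.
	payload := []string{"example-olm-deployment.yaml", "example-packageserver-pdb.yaml"}
	omit := []string{"example-packageserver-pdb.yaml"}

	// New cluster: the PDB manifest is filtered out, so cleanup finds nothing.
	fmt.Println(filterPayload(payload, omit))
	fmt.Println(len(pendingRemovals(map[objectRef]bool{}, []objectRef{pdb})))

	// Upgraded cluster: the PDB already exists, so cleanup must delete it.
	fmt.Println(len(pendingRemovals(map[objectRef]bool{pdb: true}, []objectRef{pdb})))
}
```

Under these assumptions the two lists cover different cluster states, which is why the finding asks for both: the omit list handles fresh installs and the removal list handles clusters that already carry the object.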

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@dhgautam99: all tests passed!


@dhgautam99
Author

/retest

@hypershift-jira-solve-ci


Test Failure Analysis Complete

Job Information

  • Prow Job: Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main
  • Build ID: hypershift-operator-main-enterprise-contract-ksk2b (check run 77131750563)
  • Second Job: Red Hat Konflux / hypershift-operator-enterprise-contract / hypershift-operator-main
  • Second Build ID: hypershift-operator-enterprise-contract-2nq2x (check run 77131749131)
  • PR: #8459 — OCPBUGS-54790: Move packageserver PDB from guest cluster to management cluster
  • Commit: 86fec73
  • Snapshot: hypershift-operator-20260521-081421-000
  • EC Verify Results: 254 successes, 24 warnings, 2 failures (both checks show identical counts)

Test Failure Analysis

Error

Integration test for component hypershift-operator-main snapshot
hypershift-operator-20260521-081421-000 and scenario
hypershift-operator-main-enterprise-contract has failed

Task: verify | Status: FAILURE | 254 success(es), 24 warning(s), 2 failure(s)

Summary

These two Konflux Enterprise Contract (EC) failures are not caused by PR #8459. They are a pre-existing, intermittent infrastructure issue specific to the hypershift-operator-main component's EC policy validation. The same 2 failures (with identical counts: 254 pass / 24 warn / 2 fail) appear across 13+ unrelated open PRs dating back to May 19, including trivial dependency bumps (#8555) and unrelated code changes (#8540, #8548, #8550, #8553, #8554, #8556). Meanwhile, other PRs (#8560, #8561, #8562, #8563) running within minutes of this PR show the neutral (passing) result (256 pass / 8 warn / 0 fail). The build pipeline itself succeeded — all tasks (init, clone, build, Clair scan, ClamAV scan, Snyk SAST, etc.) completed successfully. Only the post-build EC verification verify task fails. These checks are non-blocking — PRs #8552 and #8557 merged successfully with the same checks at neutral status.

Root Cause

The 2 EC policy failures are a pre-existing, intermittent infrastructure issue in the Konflux Enterprise Contract verification pipeline — not caused by this PR's code changes.

Key evidence proving this is not PR-related:

  1. Identical failure pattern across unrelated PRs: At least 13 open PRs show the exact same failure signature (254/24/2), including PR #8555, "build(deps): bump google.golang.org/api from 0.279.0 to 0.280.0 in the misc-dependencies group across 1 directory" (a go.sum-only dependency bump), and PR #8548, "ci(deps): bump idna from 3.10 to 3.15 in /hypershift-ci-python" (a Python idna package bump). The PR's actual code change (moving a PodDisruptionBudget from guest to management cluster) cannot cause EC policy violations.

  2. Intermittent, not deterministic: PR #8563 (OCPBUGS-86310: Handle CA bundle aggregation delay by requeuing revocation) ran at 08:10 UTC and passed (neutral). PR #8459 (OCPBUGS-54790: Move packageserver PDB from guest cluster to management cluster) ran at 08:14 UTC, just 4 minutes later, and failed. Both built the same component (hypershift-operator-main) using the same Tekton pipeline, same base images (ubi9/go-toolset:1.25.9-1778054913 builder, ubi9/ubi-minimal:9.7-1777857961 runtime), and identical container labels.

  3. Component-specific, not systemic: The control-plane-operator-enterprise-contract check passes with success on the same PR, despite using the same build pipeline (.tekton/pipelines/common-operator-build.yaml) and the same base images. The enterprise-contract-mce-50 checks also pass with neutral.

  4. Rule count delta points to intermittent policy evaluation: Passing runs evaluate 264 rules (256 pass + 8 warn), while failing runs evaluate 280 rules (254 pass + 24 warn + 2 fail). The 16 additional rules in failing runs suggest the EC validator is intermittently evaluating additional policy rules (likely CVE-related or attestation-related checks) that don't always execute.

  5. Most likely root cause: Based on the pattern, the 2 intermittently failing rules are most likely CVE scan threshold checks (cve_results_found or cve.blocking_cve_check) where the Clair vulnerability database state intermittently produces different severity classifications for borderline CVEs in the hypershift-operator image's RPM packages. Alternatively, they could be test result attestation checks (test.no_test_warnings) sensitive to timing or build metadata.

  6. Cannot determine exact failing rule names: The Konflux UI (konflux-ui.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com) requires authentication, preventing direct inspection of the EC violation details in the verify task logs.
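
The rule-count arithmetic in point 4 can be checked with a short sketch. The ecResult type is a hypothetical container for the counts reported in this analysis, not a Konflux or Enterprise Contract API. Using only those counts, the failing run evaluates 16 more rules in total, which breaks down into 16 more warnings, 2 failures, and 2 fewer passes.

```go
package main

import "fmt"

// ecResult holds the pass/warn/fail counts reported for one EC verify run.
type ecResult struct {
	Pass, Warn, Fail int
}

// Total is the number of rules the run evaluated.
func (r ecResult) Total() int { return r.Pass + r.Warn + r.Fail }

// delta returns the per-category change from run a to run b.
func delta(a, b ecResult) ecResult {
	return ecResult{Pass: b.Pass - a.Pass, Warn: b.Warn - a.Warn, Fail: b.Fail - a.Fail}
}

func main() {
	passing := ecResult{Pass: 256, Warn: 8, Fail: 0}  // neutral run, e.g. PR #8563
	failing := ecResult{Pass: 254, Warn: 24, Fail: 2} // failing run, e.g. PR #8459

	d := delta(passing, failing)
	fmt.Printf("totals: %d vs %d (delta %d)\n",
		passing.Total(), failing.Total(), failing.Total()-passing.Total())
	fmt.Printf("pass %+d, warn %+d, fail %+d\n", d.Pass, d.Warn, d.Fail)
}
```

The breakdown does not identify which rules differ; it only shows that the extra rules in failing runs surface as warnings, while the two failures replace two passes.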

Recommendations
  1. No action needed for this PR: The packageserver PDB changes are unrelated to the EC failures. All actual Prow CI tests and the Konflux build pipeline succeeded. The EC failures do not block merging.

  2. To merge this PR: Proceed with standard lgtm + approve workflow. The EC check failures are informational and non-blocking: recent PRs #8552 (OCPBUGS-86238: set limits for aro.openshift.io/swift-nic in request overrides for ARO swift) and #8557 (NO-JIRA: Update Konflux Tekton task bundles) merged successfully with the same checks at neutral status.

  3. To clear the EC failure (optional): Push an empty commit or retrigger the Konflux build. The failure is intermittent, so a re-run has a reasonable chance of producing the neutral (passing) result, as demonstrated by PR #8563 (OCPBUGS-86310: Handle CA bundle aggregation delay by requeuing revocation) passing just 4 minutes before this PR's check ran.

  4. For the Konflux/build infrastructure team: Investigate the intermittent 2-rule failure in the hypershift-operator-main EC scenario by:

    • Inspecting the pipeline logs at hypershift-operator-main-enterprise-contract-ksk2b to identify the exact 2 failing policy rules
    • Comparing the EC policy configuration between hypershift-operator-main and control-plane-operator-main to understand why one fails and the other passes
    • Checking if the CVE database sync timing correlates with pass/fail outcomes

Evidence

| Evidence | Detail |
| --- | --- |
| PR #8459 EC result | failure: 254 pass, 24 warn, 2 fail (snapshot 20260521-081421, completed 08:25 UTC) |
| PR #8563 EC result | neutral: 256 pass, 8 warn, 0 fail (snapshot 20260521-081028, completed 08:21 UTC, 4 min before PR #8459) |
| PR #8555 EC result (unrelated dependency bump) | failure: 254 pass, 24 warn, 2 fail (same pattern) |
| control-plane-operator EC on same PR | success: passes with identical base images and build pipeline |
| Affected open PRs | #8459, #8540, #8543, #8545, #8548, #8549, #8550, #8553, #8554, #8555, #8556 (13+ PRs) |
| Passing open PRs (same timeframe) | #8560, #8561, #8562, #8563: all neutral |
| Recently merged with neutral EC | PR #8552 (merged May 21 07:12), PR #8557 (merged May 20 16:05) |
| Build pipeline status | success: all 17 build tasks passed (init, clone, build, Clair, ClamAV, Snyk, etc.) |
| Builder base image | registry.access.redhat.com/ubi9/go-toolset:1.25.9-1778054913 |
| Runtime base image | registry.access.redhat.com/ubi9/ubi-minimal:9.7-1777857961 |
| Container labels | Identical between FAILURE and NEUTRAL images |
| Merge-blocking? | No: EC checks are not required for merge |
| Pipeline run (check 1) | hypershift-operator-main-enterprise-contract-ksk2b |
| Pipeline run (check 2) | hypershift-operator-enterprise-contract-2nq2x |


Labels

  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.


6 participants