Skip to content

[DO NOT MERGE] testing install: Split multi-document YAML and add missing annotations#1398

Open
jhadvig wants to merge 8 commits into
openshift:mainfrom
jhadvig:fix-console-plugin-manifests
Open

[DO NOT MERGE] testing install: Split multi-document YAML and add missing annotations#1398
jhadvig wants to merge 8 commits into
openshift:mainfrom
jhadvig:fix-console-plugin-manifests

Conversation

@jhadvig

@jhadvig jhadvig commented Jun 4, 2026

Copy link
Copy Markdown
Member

Summary

Fixes the CI failures in #1388 caused by 0000_50_cluster-update-console-plugin_80_servicemonitor.yaml containing two YAML documents (ServiceMonitor + PrometheusRule) in a single file separated by ---. The kube-apiserver manifest renderer cannot decode multi-document YAML, causing bootstrap to fail with:

unable to decode ".../0000_50_cluster-update-console-plugin_80_servicemonitor.yaml"
couldn't get version/kind; json parse error: invalid character '-' after top-level value

Changes:

  • Split the ServiceMonitor and PrometheusRule into separate files (_80_servicemonitor.yaml and _81_prometheusrule.yaml)
  • Add missing capability.openshift.io/name: Console and release.openshift.io/feature-set: TechPreviewNoUpgrade annotations to both resources, matching all other console-plugin manifests in this set

Test plan

  • CI e2e-aws-ovn-techpreview job should pass (was failing due to bootstrap crash)
  • CI e2e-agnostic-ovn-techpreview-serial jobs should pass
  • Unit tests pass locally

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added cluster-update-console-plugin to OpenShift console for managing cluster updates with integrated monitoring and alerting infrastructure.
  • Chores

    • Enhanced manifest rendering to support dynamic image reference resolution in payload deployment.

wking and others added 5 commits May 12, 2026 18:44
The console folks are pushing to decentralize console implementation
from [1], so we've created a new console plugin for cluster updates
[2].  It's built by both CI [3] and ART [4].  Checking on app.ci
ImageStreams:

  $ oc whoami -c
  default/api-ci-l2s4-p1-openshiftapps-com:6443/wking
  $ oc -n ocp get -o json imagestream 5.0 | jq -r '.status.tags[] | select(.tag == "cluster-update-console-plugin").items[] | .created + " " + .image'
  2026-05-01T20:55:38Z sha256:10e4f1b5763f40372823173b2a9528777ff8e97d416c5447e10df023c0e35656
  2026-04-29T05:00:56Z sha256:b0433455cbbff13bdda03ee78371e18c139adebc00432dea716e8cdf83eeb042
  2026-04-17T11:10:31Z sha256:e1296b64ffb35757fb2fb56bb5dd9cbd55c7f17f9f59c0a10a2d71a0ad6702d3
  $ oc -n ocp get -o json imagestream 5.0-art-latest | jq -r '.status.tags[] | select(.tag == "cluster-update-console-plugin").items[] | .created + " " + .image'
  2026-05-12T19:45:33Z sha256:b943be0ae0eba97c27741d0184e99a77ea928749cc578ae7e17a8e5329652642
  2026-05-12T14:42:55Z sha256:c29ba37ef5a426de5320d680d3fb58befc530274e6df4c32b4dc4fd0acaaaae0
  2026-05-12T09:02:35Z sha256:901bc6aac1142fe1da2a756bb4d91ae8fe14b459ce2bf9904ceae6d2fc818fc2
  2026-05-12T04:38:56Z sha256:1f4e8b200d97f82db1784b4d7fc9f0bd3ccaf2cc664fd5f0ff6b80485da16950
  2026-05-11T22:54:34Z sha256:d369ca7c73d7a3abe159e9e6f5644f63e0b091f0b75f202773a023e91c7faaf6

This commit sets up an image-references file [5], so 'oc adm release
new ...' knows that we'll want that image injected in the Deployment
manifest.  I'm using placeholder.url.oc.will.replace.this.example.org
as part of my placeholder name.  That's similar to the machine-config
operator's use of placeholder.url.oc.will.replace.this.org [6], except
that I'm using a subdomain of the reserved example.com [7], to avoid
any possible confusion with an actually in-use domain.

The new manifests are in run-level 50, which is the default, so they
can roll out in parallel with other components to avoid slowing
updates.

The new manifests are tried to the Console capability [8] and the
TechPreviewNoUpgrade feature set [9] (in the absence of a specific
feature gate for this functionality).

I'm just carrying the old
exclude.release.openshift.io/internal-openshift-hosted annotation over
from other CVO manifests.  It predates cluster profiles [10], and I'm
not sure anyone still uses it, but it seems like the CVO should be
consistent about whether it matters or not anymore.  Perhaps we can
drop it from all CVO manifests in follow-up work.

Otherwise these manifests are loosely based on my attempts to meld the
plugin's Help chart templates [11] with existing CVO manifest
conventions.

[1]: https://github.com/openshift/console
[2]: https://github.com/openshift/cluster-update-console-plugin
[3]: openshift/release#77945
[4]: openshift-eng/ocp-build-data#10393
[5]: https://github.com/openshift/enhancements/blob/4f67eee19ad16f1d5e9e8a2622b708e2ea6d8e6a/dev-guide/cluster-version-operator/dev/operators.md#how-do-i-ensure-the-right-images-get-used-by-my-manifests
[6]: https://github.com/openshift/machine-config-operator/blob/99cb8a46e6a31b2b72d6a8371c6cd4ee45393263/install/image-references#L10
[7]: https://www.rfc-editor.org/rfc/rfc6761#section-6.5
[8]: https://github.com/openshift/enhancements/blob/4f67eee19ad16f1d5e9e8a2622b708e2ea6d8e6a/enhancements/installer/component-selection.md#manifest-annotations
[9]: https://github.com/openshift/enhancements/blob/4f67eee19ad16f1d5e9e8a2622b708e2ea6d8e6a/enhancements/update/cvo-techpreview-manifests.md#proposal
[10]: https://github.com/openshift/enhancements/blob/4f67eee19ad16f1d5e9e8a2622b708e2ea6d8e6a/enhancements/update/ibm-public-cloud-support.md#cluster-version-operator-changes-for-beta
[11]: https://github.com/openshift/cluster-update-console-plugin/tree/9778f4fc0c19e60cad55a45591a066b6b7a3cb12/charts/openshift-console-plugin/templates
Avoid:

  $ go test ./pkg/payload
  ...
  --- FAIL: Test_cvoManifests (0.02s)
      --- FAIL: Test_cvoManifests/install_dir (0.02s)
          render_test.go:355: failed to load manifests: error parsing: error unmarshaling JSON: while decoding JSON: Resource with fields Group: "image.openshift.io" Kind: "ImageStream" Name: "" must contain kubernetes required fields kind and name
  ...

These image-refernces files are helpers for 'oc adm release new ...',
they don't need all the properties set that they'd need to be pushed
into a cluster.
…Users false

Cluster Bot 'launch 5.0.0-0.ci gcp,techpreview' [1].  Also asked for a release image that has my change via Cluster Bot 'build 5.0.0-0.ci,openshift#1388' [2].  Launch the update:

  $ oc adm release info registry.build10.ci.openshift.org/ci-ln-rmnm9yt/release:latest | grep Pull
  Pull From: registry.build10.ci.openshift.org/ci-ln-rmnm9yt/release@sha256:2cdcefbd1857a6f58a538e1ccc4460c89798318c35b9d51a2d2ff74d05e2fc1f
  $ oc adm upgrade --force --allow-explicit-upgrade --to-image registry.build10.ci.openshift.org/ci-ln-rmnm9yt/release@sha256:2cdcefbd1857a6f58a538e1ccc4460c89798318c35b9d51a2d2ff74d05e2fc1f

--force because the CI image is unsigned, and --allow-explicit-upgrade
because it is not recommended by an Update Service, neither one would
be something I'd recommend outside of testing on a throw-away cluster.
gather-extra artifacts in the run [3] show the Deployment struggling:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/release-openshift-origin-installer-launch-gcp-modern/2060016448293572608/artifacts/launch/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Failing").message'
  deployment openshift-cluster-update-console-plugin/cluster-update-console-plugin has a replica failure FailedCreate: pods "cluster-update-console-plugin-547878cc4d-" is forbidden: unable to validate against any security context constraint: provider restricted-v3: .spec.hostUsers: Invalid value: null: Host Users must be set to false

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-gcp-modern/2060016448293572608
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-aws-modern/2060017269097893888
[3]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/release-openshift-origin-installer-launch-gcp-modern/2060016448293572608/artifacts/launch/gather-extra/artifacts/
…ra nodeSelector

Infra Nodes are optional, but not required, and CI clusters don't have them, e.g. [1,2]:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-serial-1of2/2059320133616144384/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/nodes.json | jq -r '.items[].metadata.labels' | grep node-role | sort | uniq -c
      3   "node-role.kubernetes.io/control-plane": "",
      3   "node-role.kubernetes.io/master": "",
      3   "node-role.kubernetes.io/worker": "",

Without this change, the Pods fail to schedule on that kind of CI cluster:

  $ oc -n openshift-cluster-update-console-plugin get -o json pods | jq -r '.items[].status.conditions[].message'
  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint(s). no new claims to deallocate, preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.

We'll probably need to have the CVO manage this Deployment more
actively, so it can set useful nodeSelector on clusters which do have
infra Nodes.  But for now, just drop the selector.

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/5-dev-preview/release/5.0.0-ec.2
[2]: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-serial-1of2/2059320133616144384
The servicemonitor manifest contained both ServiceMonitor and
PrometheusRule in a single file separated by `---`. The kube-apiserver
manifest renderer cannot decode multi-document YAML, causing bootstrap
to fail with "invalid character '-' after top-level value".

Split into separate files and add the missing capability.openshift.io/name
and release.openshift.io/feature-set annotations to match the other
console-plugin manifests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Walkthrough

This PR introduces a new cluster-update-console-plugin component to the OpenShift cluster-version-operator. It adds templated image reference support to the manifest rendering pipeline, defines Kubernetes infrastructure (namespace, network policy, deployment, service), configures Prometheus monitoring and alerting, and registers the plugin with the OpenShift console.

Changes

Cluster Update Console Plugin

Layer / File(s) Summary
Image reference template rendering infrastructure
pkg/payload/render.go, pkg/payload/payload.go, pkg/payload/render_test.go
Extends manifest rendering to load image stream references from the release payload and expose them via a template-accessible Images map. Adds imagesFromImageRef helper to derive short-name→URI mappings, and threads the image reference through LoadUpdateloadPayloadTasksmanifestRenderConfig.
Namespace and network policy baseline
install/0000_50_cluster-update-console-plugin_10_namespace.yaml, install/0000_50_cluster-update-console-plugin_20_networkpolicy.yaml
Establishes the openshift-cluster-update-console-plugin namespace with restricted pod security labels and a default-deny network policy blocking all ingress/egress traffic by default.
Plugin deployment and service
install/0000_50_cluster-update-console-plugin_50_deployment.yaml, install/0000_50_cluster-update-console-plugin_60_service.yaml
Defines the plugin deployment using a templated image reference (supplied by the rendering infrastructure), exposes HTTPS port 9001, enforces restrictive security (non-root, dropped capabilities, seccomp runtime default, no privilege escalation), mounts a TLS certificate volume, and uses a ClusterIP service to expose the deployment to consumers.
Monitoring infrastructure
install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml, install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml
Adds ServiceMonitor for Prometheus scraping of cluster-version-operator metrics over HTTPS with TLS client authentication. Defines alerting rules for operator health (absence, flapping, degraded), update availability (retrieval failures, upstream recommendations), and release acceptance timeout; includes complex PromQL aggregation and Console URL templating.
OpenShift console plugin registration
install/0000_50_cluster-update-console-plugin_90_consoleplugin.yaml
Registers the cluster-update-console-plugin as a console plugin, wiring its backend service endpoint and setting display configuration and feature/release targeting.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title '[DO NOT MERGE] testing install: Split multi-document YAML and add missing annotations' is only partially related to the main changeset. While it mentions splitting YAML and adding annotations (which appear in the install manifests), it omits the significant code changes in pkg/payload/* that add image reference support to manifest templates. The [DO NOT MERGE] prefix also indicates this is a work-in-progress. Remove the [DO NOT MERGE] prefix since this appears to be a real fix for CI failures. Consider revising the title to better reflect the broader scope: e.g., 'Split multi-document YAML manifest and add image reference support to CVO templates' or similar to capture both manifest refactoring and the payload changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR introduces no Ginkgo tests. It modifies standard Go tests (render_test.go) and YAML manifests, neither of which contain dynamic test names or Ginkgo BDD-style test definitions.
Test Structure And Quality ✅ Passed Custom check for Ginkgo test quality does not apply. Repository uses standard Go testing, not Ginkgo framework. No Ginkgo tests found anywhere in the codebase.
Microshift Test Compatibility ✅ Passed All new Ginkgo e2e tests that use unavailable APIs (ClusterVersion, PrometheusRule) have proper util.SkipIfMicroshift() runtime guards in BeforeEach blocks.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. Changes are Kubernetes manifests and unit tests (standard Go testing, not Ginkgo). SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed Deployment uses safe scheduling: specific infra toleration only, maxUnavailable: 1, no control-plane nodeSelector, no affinity constraints. Compatible with all topologies.
Ote Binary Stdout Contract ✅ Passed PR contains no stdout writes in process-level code. Only logging via klog.Warningf(), which is configured with alsologtostderr=true to write to stderr.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds Kubernetes manifests and unit tests only, no Ginkgo e2e tests. Custom check for IPv6 compatibility applies to new Ginkgo e2e tests, which are not present.
No-Weak-Crypto ✅ Passed No weak cryptography detected. Only FNV-64 hashing for manifest integrity checks; no MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or insecure comparisons found.
Container-Privileges ✅ Passed All container manifests properly restrict privileges: allowPrivilegeEscalation=false, all capabilities dropped, runs as non-root, restricted-v3 SCC, and no host access settings.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data (passwords, tokens, API keys, PII, session IDs, internal hostnames, customer data) is exposed in any logging statements across the PR changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jhadvig
Once this PR has been reviewed and has the lgtm label, please assign davidhurta for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jhadvig jhadvig changed the title install: Split multi-document YAML and add missing annotations [DO NOT MERGE] testing install: Split multi-document YAML and add missing annotations Jun 4, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml (1)

44-51: 💤 Low value

Consider adding a for: duration to prevent transient alerts.

The UpdateAvailable alert fires immediately when cluster_version_available_updates > 0. While this may be intentional for prompt notification, a short for: duration (e.g., 5m) would prevent transient fluctuations from generating alert noise.

♻️ Suggested addition
       expr: |
         sum by (channel, namespace, upstream) (cluster_version_available_updates) > 0
+      for: 5m
       labels:
         severity: info
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml` around
lines 44 - 51, The UpdateAvailable PrometheusRule alert currently fires
immediately; open the alert definition for alert: UpdateAvailable and add a for:
duration (e.g., for: 5m) directly under the alert metadata so the rule only
transitions to firing after the condition (sum by (channel, namespace, upstream)
(cluster_version_available_updates) > 0) holds for that sustained period; update
the alert block (alert: UpdateAvailable, expr: ...) to include the new for:
field.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@install/0000_50_cluster-update-console-plugin_20_networkpolicy.yaml`:
- Around line 13-16: The NetworkPolicy currently uses podSelector: {} with
policyTypes: - Ingress and - Egress which creates a default-deny for all pods;
update the policy to either target specific pods instead of podSelector: {}
(e.g., use a labelSelector matching the console-plugin pods) or add explicit
allow rules for required traffic (ingress rules to permit backend/monitoring
sources and egress rules to permit upstream destinations) so the console plugin
service and monitoring can communicate; reference the podSelector and
policyTypes (Ingress, Egress) in the NetworkPolicy being modified and ensure the
new rules explicitly allow the necessary CIDRs/namespaceSelectors/port/protocols
for plugin traffic.

In `@install/0000_50_cluster-update-console-plugin_50_deployment.yaml`:
- Around line 37-46: The plugin container currently defines resources.requests
but no resources.limits and its securityContext lacks readOnlyRootFilesystem;
update the container spec for the container named "plugin" to add a
resources.limits block (set cpu and memory limits equal to or higher than the
existing requests, e.g., cpu and memory) and set
securityContext.readOnlyRootFilesystem: true; ensure you modify the container
entry that contains resources.requests and securityContext so both limits and
readOnlyRootFilesystem are present and compliant.
- Around line 30-37: The pod spec for the container named "plugin" is missing
liveness and readiness probes; add both probes under the container "plugin" that
use the existing port/name (https / containerPort 9001) — for example configure
an httpGet (or tcpSocket) probe hitting a health endpoint (e.g. /health or
/ready) on port 9001, set sensible initialDelaySeconds, periodSeconds and
failureThreshold values, and include identical or appropriately different probe
settings for liveness (to restart unhealthy containers) and readiness (to gate
service traffic) so the deployment meets the Liveness + readiness probes
requirement.

In `@install/0000_50_cluster-update-console-plugin_60_service.yaml`:
- Around line 6-11: The Service manifest for the cluster-update console plugin
is missing the serving-cert request annotation; add the annotation key
service.beta.openshift.io/serving-cert-secret-name with value
"cluster-update-console-plugin-cert" to the Service's metadata.annotations so
the cluster will provision the same secret that the Deployment mounts
(secretName: cluster-update-console-plugin-cert), ensuring the TLS secret is
created for the pods waiting on /var/cert.

In `@install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml`:
- Around line 1-27: The ServiceMonitor in this manifest uses the same
metadata.name and namespace as the cluster-version-operator ServiceMonitor,
causing the later 0000_90 manifest to overwrite annotations; fix by giving this
console plugin ServiceMonitor a unique metadata.name (e.g., append "-console" or
similar) and ensure any related references (labels/selectors like
metadata.labels k8s-app and selector.matchLabels) remain correct, or
alternatively merge the annotations (capability.openshift.io/name and
release.openshift.io/feature-set) into the later ServiceMonitor so they are not
lost when 0000_90 is applied.

In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml`:
- Around line 1-13: This PrometheusRule duplicates an existing PrometheusRule
with metadata.name "cluster-version-operator" in namespace
"openshift-cluster-version" and lacks runbookUrl fields and proper
last_over_time wrapping; remove or rename this duplicate resource (or merge its
alerts into the existing PrometheusRule) so only one PrometheusRule with kind
PrometheusRule and name "cluster-version-operator" exists for that namespace,
and update each alert expression to wrap range queries with
last_over_time(...[5m]) and add appropriate runbookUrl annotations for each
alert to preserve Console/feature-set gating annotations (ensure the resource
that remains includes the Console gating annotations shown in
metadata.annotations).

In `@install/0000_50_cluster-update-console-plugin_90_consoleplugin.yaml`:
- Line 20: The manifest sets backend.service.port to the string "https" but
ConsolePluginService.port is an int32 and the actual Service listens on 9001;
change the backend.service.port value from "https" to the numeric 9001 so it
matches ConsolePluginService.port and the Service target port. Update the field
named backend.service.port in the ConsolePlugin YAML to the integer 9001 to
align with the Service configuration.
- Around line 18-19: The ConsolePlugin manifest sets spec.backend.service.name
to cluster-update-console-plugin which doesn't match the actual Service resource
name openshift-cluster-update-console-plugin; update the ConsolePlugin's
spec.backend.service.name to "openshift-cluster-update-console-plugin" (or
alternatively rename the Service to "cluster-update-console-plugin") so the
ConsolePlugin backend service name and the Service resource name match; check
the ConsolePlugin field spec.backend.service.port.targetPort and the Service
port too to ensure ports align after renaming.

---

Nitpick comments:
In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml`:
- Around line 44-51: The UpdateAvailable PrometheusRule alert currently fires
immediately; open the alert definition for alert: UpdateAvailable and add a for:
duration (e.g., for: 5m) directly under the alert metadata so the rule only
transitions to firing after the condition (sum by (channel, namespace, upstream)
(cluster_version_available_updates) > 0) holds for that sustained period; update
the alert block (alert: UpdateAvailable, expr: ...) to include the new for:
field.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a03fc91b-d22b-41d8-8ed5-8cf949246c0e

📥 Commits

Reviewing files that changed from the base of the PR and between 8a13554 and 994f964.

📒 Files selected for processing (9)
  • install/0000_50_cluster-update-console-plugin_10_namespace.yaml
  • install/0000_50_cluster-update-console-plugin_20_networkpolicy.yaml
  • install/0000_50_cluster-update-console-plugin_50_deployment.yaml
  • install/0000_50_cluster-update-console-plugin_60_service.yaml
  • install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml
  • install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml
  • install/0000_50_cluster-update-console-plugin_90_consoleplugin.yaml
  • install/image-references
  • pkg/payload/render_test.go

Comment on lines +13 to +16
podSelector: {}
policyTypes:
- Ingress
- Egress

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default-deny policy blocks plugin traffic without companion allow policies.

podSelector: {} + Ingress/Egress with no rules denies all traffic. As-is, backend access to the plugin service (and monitoring ingress) will be blocked unless additional allow policies are added in this namespace.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@install/0000_50_cluster-update-console-plugin_20_networkpolicy.yaml` around
lines 13 - 16, The NetworkPolicy currently uses podSelector: {} with
policyTypes: - Ingress and - Egress which creates a default-deny for all pods;
update the policy to either target specific pods instead of podSelector: {}
(e.g., use a labelSelector matching the console-plugin pods) or add explicit
allow rules for required traffic (ingress rules to permit backend/monitoring
sources and egress rules to permit upstream destinations) so the console plugin
service and monitoring can communicate; reference the podSelector and
policyTypes (Ingress, Egress) in the NetworkPolicy being modified and ensure the
new rules explicitly allow the necessary CIDRs/namespaceSelectors/port/protocols
for plugin traffic.

Comment on lines +30 to +37
containers:
- name: plugin
image: placeholder.url.oc.will.replace.this.example.org:cluster-update-console-plugin
imagePullPolicy: IfNotPresent
ports:
- name: https
containerPort: 9001
resources:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add readiness/liveness probes for the plugin container.

The container currently has no probes, so rollout/health gating is weaker than required.

Suggested patch
       - name: plugin
         image: placeholder.url.oc.will.replace.this.example.org:cluster-update-console-plugin
         imagePullPolicy: IfNotPresent
         ports:
         - name: https
           containerPort: 9001
+        readinessProbe:
+          httpGet:
+            path: /
+            port: https
+            scheme: HTTPS
+        livenessProbe:
+          httpGet:
+            path: /
+            port: https
+            scheme: HTTPS

As per coding guidelines, **/*.{yaml,yml} requires Liveness + readiness probes defined.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@install/0000_50_cluster-update-console-plugin_50_deployment.yaml` around
lines 30 - 37, The pod spec for the container named "plugin" is missing liveness
and readiness probes; add both probes under the container "plugin" that use the
existing port/name (https / containerPort 9001) — for example configure an
httpGet (or tcpSocket) probe hitting a health endpoint (e.g. /health or /ready)
on port 9001, set sensible initialDelaySeconds, periodSeconds and
failureThreshold values, and include identical or appropriately different probe
settings for liveness (to restart unhealthy containers) and readiness (to gate
service traffic) so the deployment meets the Liveness + readiness probes
requirement.

Comment on lines +37 to +46
resources:
requests:
cpu: 20m
memory: 50Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
terminationMessagePolicy: FallbackToLogsOnError

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add container limits and read-only root filesystem.

Container plugin has requests but no limits, and readOnlyRootFilesystem is not set.

Suggested patch
       resources:
         requests:
           cpu: 20m
           memory: 50Mi
+        limits:
+          cpu: 100m
+          memory: 200Mi
       securityContext:
         allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: true
         capabilities:
           drop:
           - ALL

As per coding guidelines, **/*.{yaml,yml} requires readOnlyRootFilesystem and Resource limits (cpu, memory) on every container.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
resources:
requests:
cpu: 20m
memory: 50Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
terminationMessagePolicy: FallbackToLogsOnError
resources:
requests:
cpu: 20m
memory: 50Mi
limits:
cpu: 100m
memory: 200Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
terminationMessagePolicy: FallbackToLogsOnError
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@install/0000_50_cluster-update-console-plugin_50_deployment.yaml` around
lines 37 - 46, The plugin container currently defines resources.requests but no
resources.limits and its securityContext lacks readOnlyRootFilesystem; update
the container spec for the container named "plugin" to add a resources.limits
block (set cpu and memory limits equal to or higher than the existing requests,
e.g., cpu and memory) and set securityContext.readOnlyRootFilesystem: true;
ensure you modify the container entry that contains resources.requests and
securityContext so both limits and readOnlyRootFilesystem are present and
compliant.

Comment thread install/0000_50_cluster-update-console-plugin_60_service.yaml
Comment on lines +1 to +27
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
k8s-app: cluster-version-operator
name: cluster-version-operator
namespace: openshift-cluster-version
annotations:
kubernetes.io/description: Configure Prometheus to monitor cluster-version operator metrics.
capability.openshift.io/name: Console
release.openshift.io/feature-set: TechPreviewNoUpgrade
exclude.release.openshift.io/internal-openshift-hosted: "true"
include.release.openshift.io/self-managed-high-availability: "true"
spec:
endpoints:
- interval: 30s
port: metrics
scheme: https
tlsConfig:
serverName: cluster-version-operator.openshift-cluster-version.svc
scrapeClass: tls-client-certificate-auth
namespaceSelector:
matchNames:
- openshift-cluster-version
selector:
matchLabels:
k8s-app: cluster-version-operator

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for ServiceMonitor resources with the same name/namespace
echo "=== ServiceMonitors named 'cluster-version-operator' ==="
rg -l "name: cluster-version-operator" --glob '*servicemonitor*.yaml' install/

echo ""
echo "=== Feature-set annotations in matching files ==="
for f in $(fd -e yaml servicemonitor install/); do
  if rg -q "name: cluster-version-operator" "$f"; then
    echo "--- $f ---"
    rg "release.openshift.io/feature-set|name:|namespace:" "$f" | head -10
  fi
done

Repository: openshift/cluster-version-operator

Length of output: 1174


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml
  install/0000_90_cluster-version-operator_02_servicemonitor.yaml
)

for f in "${files[@]}"; do
  echo "===== $f ====="
  # Print only metadata/annotations area (first ~80 lines around 'annotations:')
  if rg -n "annotations:" "$f" >/dev/null; then
    start=$(rg -n "annotations:" "$f" | head -1 | cut -d: -f1)
    # show 40 lines starting at annotations
    sed -n "$start,$((start+60))p" "$f" | cat -n
  else
    echo "(no annotations key found)"
  fi
  echo
done

Repository: openshift/cluster-version-operator

Length of output: 5204


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml
  install/0000_90_cluster-version-operator_02_servicemonitor.yaml
)

for f in "${files[@]}"; do
  echo "===== $f ====="
  if rg -n "annotations:" "$f" >/dev/null; then
    start=$(rg -n "annotations:" "$f" | head -1 | cut -d: -f1)
    sed -n "$start,$((start+80))p" "$f" | cat -n
  else
    echo "(no annotations key found)"
  fi
  echo
done

Repository: openshift/cluster-version-operator

Length of output: 6778


Fix duplicate ServiceMonitor name/namespace to avoid annotation overwrite

install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml and install/0000_90_cluster-version-operator_02_servicemonitor.yaml both define a ServiceMonitor named cluster-version-operator in openshift-cluster-version. Since 0000_90 is applied after 0000_50, it will replace the resource and drop capability.openshift.io/name: Console and release.openshift.io/feature-set: TechPreviewNoUpgrade from the console plugin manifest. Use a unique ServiceMonitor name for the console plugin (or merge the required annotations into the later manifest).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@install/0000_50_cluster-update-console-plugin_80_servicemonitor.yaml` around
lines 1 - 27, The ServiceMonitor in this manifest uses the same metadata.name
and namespace as the cluster-version-operator ServiceMonitor, causing the later
0000_90 manifest to overwrite annotations; fix by giving this console plugin
ServiceMonitor a unique metadata.name (e.g., append "-console" or similar) and
ensure any related references (labels/selectors like metadata.labels k8s-app and
selector.matchLabels) remain correct, or alternatively merge the annotations
(capability.openshift.io/name and release.openshift.io/feature-set) into the
later ServiceMonitor so they are not lost when 0000_90 is applied.

Comment on lines +1 to +13
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
k8s-app: cluster-version-operator
name: cluster-version-operator
namespace: openshift-cluster-version
annotations:
kubernetes.io/description: Alerting rules for when cluster-version operator metrics call for administrator attention.
capability.openshift.io/name: Console
release.openshift.io/feature-set: TechPreviewNoUpgrade
exclude.release.openshift.io/internal-openshift-hosted: "true"
include.release.openshift.io/self-managed-high-availability: "true"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for PrometheusRule resources with the same name/namespace
echo "=== PrometheusRules named 'cluster-version-operator' ==="
rg -l "kind: PrometheusRule" install/ | xargs -I{} sh -c 'rg -q "name: cluster-version-operator" {} && echo {}'

echo ""
echo "=== Alert definitions in 0000_90 file for comparison ==="
rg "alert:" install/0000_90_cluster-version-operator_02_servicemonitor.yaml 2>/dev/null || echo "File not found or no alerts"

Repository: openshift/cluster-version-operator

Length of output: 736


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  install/0000_90_cluster-version-operator_02_servicemonitor.yaml
  install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml
  install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml
)

for f in "${files[@]}"; do
  echo "=== $f (metadata) ==="
  rg -n "^(kind:|name:|namespace:)" "$f" || true
  echo
done

echo "=== alerts present in 0000_50_cluster-update-console-plugin_81_prometheusrule.yaml ==="
rg -n "^[[:space:]]*- alert:" install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml || true

echo
echo "=== alerts present in 0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml ==="
rg -n "^[[:space:]]*- alert:" install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml || true

Repository: openshift/cluster-version-operator

Length of output: 1034


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  install/0000_90_cluster-version-operator_02_servicemonitor.yaml
  install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml
  install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml
)

for f in "${files[@]}"; do
  echo "=== $f (PrometheusRule metadata.name/namespace blocks) ==="
  awk '
    $0 ~ /kind:[[:space:]]*PrometheusRule/ {inpr=1}
    inpr && $0 ~ /^[[:space:]]*name:[[:space:]]*/ {print "name:", $0}
    inpr && $0 ~ /^[[:space:]]*namespace:[[:space:]]*/ {print "namespace:", $0}
    $0 ~ /^---/ {inpr=0}
  ' "$f" || true
  echo
done

echo "=== $PWD / sanity: show exact metadata snippet from 0000_50 file ==="
rg -n "kind: PrometheusRule|name: |namespace: |annotations:|labels:" install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml

Repository: openshift/cluster-version-operator

Length of output: 1572


🏁 Script executed:

#!/bin/bash
set -euo pipefail

f="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"

echo "=== Check last_over_time usage ==="
rg -n "last_over_time" "$f" || true

echo
echo "=== Check for '== 0' vs '!= 0' in alert expressions ==="
rg -n "expr:.*(==[[:space:]]*0|!=[[:space:]]*0)" "$f" || true

echo
echo "=== Extract UpdateAvailable block (up to next alert) ==="
python3 - <<'PY'
import re, itertools, sys
path="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
lines=open(path,'r',encoding='utf-8').read().splitlines()
start=None
for i,l in enumerate(lines):
    if re.match(r'^\s*-\s*alert:\s*UpdateAvailable\s*$', l):
        start=i
        break
if start is None:
    print("UpdateAvailable not found")
    sys.exit(0)
# print until next "- alert:" line or EOF
for l in lines[start:]:
    if l!=lines[start] and re.match(r'^\s*-\s*alert:\s*', l):
        break
    print(l)
PY

echo
echo "=== Check runbookUrl and description/summary annotations for each alert ==="
python3 - <<'PY'
import re, sys
path="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
text=open(path,'r',encoding='utf-8').read().splitlines()
alerts=[]
current=None
buf=[]
for l in text:
    m=re.match(r'^\s*-\s*alert:\s*(.+)\s*$', l)
    if m:
        if current:
            alerts.append((current, buf))
        current=m.group(1).strip()
        buf=[l]
    else:
        if current:
            buf.append(l)
if current:
    alerts.append((current, buf))

for name,blk in alerts:
    # capture first occurrences
    run=None; desc=None; summ=None
    for l in blk:
        if 'runbookUrl:' in l:
            run=l.strip()
        if re.match(r'^\s*description:\s*', l):
            desc=l.strip()
        if re.match(r'^\s*summary:\s*', l):
            summ=l.strip()
    print(f"--- {name} ---")
    if run: print(run)
    if summ: print(summ)
    if desc: print(desc)
    # for presence
    for_present=any(re.match(r'^\s*for:\s*', l) for l in blk)
    print("for:", for_present)
PY

Repository: openshift/cluster-version-operator

Length of output: 6708


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) ServiceMonitor + PrometheusRule names/namespaces inside the combined servicemonitor file
f1="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
echo "=== $f1: all name/namespace blocks by resource kind ==="
python3 - <<'PY'
import re
path="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
lines=open(path,'r',encoding='utf-8').read().splitlines()
kind=None
out=[]
for i,l in enumerate(lines):
    m=re.match(r'^\s*kind:\s*(.+)\s*$', l)
    if m:
        kind=m.group(1)
    if kind and re.match(r'^\s*name:\s*', l):
        out.append(("name",kind,l.strip()))
        kind=None  # only first name after kind
    if re.match(r'^\s*namespace:\s*', l):
        # heuristic: namespace nearby to metadata block
        out.append(("namespace",kind,l.strip()))
# just print all lines that look relevant
for l in lines:
    if re.match(r'^\s*(kind|name|namespace):', l):
        print(l)
PY

echo
echo "=== $f1: last_over_time + runbookUrl presence ==="
rg -n "last_over_time|runbookUrl:" "$f1" || true

echo
echo "=== $f1: UpdateAvailable alert block (if present) for expr/for/runbookUrl ==="
python3 - <<'PY'
import re
path="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
lines=open(path,'r',encoding='utf-8').read().splitlines()
# Find block for alert: UpdateAvailable
start=None
for i,l in enumerate(lines):
    if re.match(r'^\s*-\s*alert:\s*UpdateAvailable\s*$', l):
        start=i
        break
if start is None:
    print("No UpdateAvailable alert in file")
    raise SystemExit
for l in lines[start:]:
    if l!=lines[start] and re.match(r'^\s*-\s*alert:\s*', l):
        break
    print(l)
PY

echo
echo "=== $f1: ClusterVersionOperatorDown block (if present): for + runbookUrl ==="
python3 - <<'PY'
import re
path="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
lines=open(path,'r',encoding='utf-8').read().splitlines()
start=None
target="ClusterVersionOperatorDown"
for i,l in enumerate(lines):
    if re.match(r'^\s*-\s*alert:\s*%s\s*$' % re.escape(target), l):
        start=i; break
if start is None:
    print("No ClusterVersionOperatorDown alert in file"); raise SystemExit
for l in lines[start:]:
    if l!=lines[start] and re.match(r'^\s*-\s*alert:\s*', l):
        break
    if 'runbookUrl:' in l or re.match(r'^\s*for:\s*', l) or re.match(r'^\s*-\s*alert:\s*', l) or re.match(r'^\s*expr:\s*|^\s*summary:\s*', l) or re.match(r'^\s*description:\s*', l):
        print(l)
PY

echo
echo "=== 0000_90_prometheusrule_servicemonitor.yaml: last_over_time + runbookUrl ==="
f2="install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml"
rg -n "last_over_time|runbookUrl:" "$f2" || true

echo
echo "=== 0000_90_prometheusrule_servicemonitor.yaml: all alert names ==="
rg -n "^[[:space:]]*-[[:space:]]*alert:" "$f2"

Repository: openshift/cluster-version-operator

Length of output: 2408


🏁 Script executed:

#!/bin/bash
set -euo pipefail

new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"

echo "=== Alert names in new PrometheusRule ==="
rg -n "^[[:space:]]*-[[:space:]]*alert:" "$new" | sed -E 's/.*alert:[[:space:]]*//'

echo
echo "=== Alert names in old PrometheusRule (cluster-version-operator) ==="
# Extract only the PrometheusRule section for name: cluster-version-operator, then list alerts.
python3 - <<'PY'
import re
path="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
text=open(path,'r',encoding='utf-8').read().splitlines()

# Find PrometheusRule with metadata.name == cluster-version-operator
in_pr=False
in_meta=False
found=False
blocks=[]
current=[]
for i,l in enumerate(text):
    if re.match(r'^\s*kind:\s*PrometheusRule\s*$', l):
        in_pr=True
        in_meta=False
        found=False
        current=[l]
        continue
    if in_pr:
        current.append(l)
        if re.match(r'^\s*metadata:\s*$', l):
            in_meta=True
            continue
        if in_meta and re.match(r'^\s*name:\s*cluster-version-operator\s*$', l):
            found=True
        # end of document heuristic: next 'kind:' at same indent (assume file has single document or multiple separated by ---)
        if re.match(r'^---\s*$', l) and current:
            in_pr=False
    # For simplicity, when found alert names we will re-scan using a regex window.
PY

# Regex window approach: take from "kind: PrometheusRule" near "name: cluster-version-operator" to next "kind:" at same indent or EOF
python3 - <<'PY'
import re
path="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
s=open(path,'r',encoding='utf-8').read()

# Grab the PrometheusRule block that contains "kind: PrometheusRule" and "name: cluster-version-operator"
m=re.search(r'(kind:\s*PrometheusRule[\s\S]*?name:\s*cluster-version-operator[\s\S]*?)(?=kind:\s|$)', s)
if not m:
    print("Could not isolate PrometheusRule block"); raise SystemExit(1)
block=m.group(1)
alerts=re.findall(r'^\s*-\s*alert:\s*(.+?)\s*$', block, flags=re.M)
for a in alerts:
    print(a)
PY

Repository: openshift/cluster-version-operator

Length of output: 618


🏁 Script executed:

#!/bin/bash
set -euo pipefail

new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
old2="install/0000_90_cluster-version-operator_02_prometheusrule_servicemonitor.yaml"

echo "=== new: last_over_time/runbookUrl presence ==="
rg -n "last_over_time\\(|runbookUrl:" "$new" || true

echo
echo "=== new: expr contains == 0 or != 0 ==="
rg -n "(==[[:space:]]*0|!=[[:space:]]*0)" "$new" || true

echo
echo "=== new: metric name patterns with _conditions/_condition ==="
rg -n "cluster_.*_conditions|_conditions\\b|_condition\\b" "$new" || true

echo
echo "=== new: for: durations by alert ==="
python3 - <<'PY'
import re
path="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
lines=open(path,'r',encoding='utf-8').read().splitlines()
alerts=[]
cur=None; buf=[]
for l in lines:
    m=re.match(r'^\s*-\s*alert:\s*(.+)\s*$', l)
    if m:
        if cur: alerts.append((cur,buf))
        cur=m.group(1).strip(); buf=[l]
    else:
        if cur: buf.append(l)
if cur: alerts.append((cur,buf))
for name,blk in alerts:
    dur=None
    for l in blk:
        m=re.match(r'^\s*for:\s*(.+)\s*$', l)
        if m:
            dur=m.group(1).strip(); break
    print(f"{name}: for={dur}")
PY

echo
echo "=== old: last_over_time/runbookUrl presence (same check) ==="
rg -n "last_over_time\\(|runbookUrl:" "$old" || true

echo
echo "=== old2: last_over_time/runbookUrl presence (same check) ==="
rg -n "last_over_time\\(|runbookUrl:" "$old2" || true

Repository: openshift/cluster-version-operator

Length of output: 3632


🏁 Script executed:

#!/bin/bash
set -euo pipefail

new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"

python3 - <<'PY'
import re, json

def parse_prometheusrule(path, target_name="cluster-version-operator"):
    text=open(path,'r',encoding='utf-8').read().splitlines()

    # isolate first PrometheusRule document whose metadata.name matches target_name
    # (heuristic: find "kind: PrometheusRule", then metadata.name line)
    docs=[]
    cur_kind=None
    cur=[]
    for l in text:
        if l.strip()=="kind: PrometheusRule":
            cur_kind="PrometheusRule"
        cur.append(l)
    s="\n".join(text)

    # extract block between "kind: PrometheusRule" and next "---" at doc boundary
    # We'll approximate by searching for "kind: PrometheusRule" segments and then checking name.
    parts=re.split(r'(?m)^---\s*$', s)
    for p in parts:
        if "kind: PrometheusRule" not in p:
            continue
        # check metadata.name
        if not re.search(r'(?m)^\s*name:\s*%s\s*$' % re.escape(target_name), p):
            continue
        docs.append(p)

    if not docs:
        raise SystemExit(f"No PrometheusRule with name={target_name} found in {path}")
    p=docs[0]

    alerts={}
    # For each alert block, capture until next "- alert:"
    # We'll operate on YAML lines for stability.
    lines=p.splitlines()
    cur=None; buf=[]
    for l in lines:
        m=re.match(r'^\s*-\s*alert:\s*(.+?)\s*$', l)
        if m:
            if cur:
                alerts[cur]=buf
            cur=m.group(1).strip()
            buf=[l]
        else:
            if cur:
                buf.append(l)
    if cur:
        alerts[cur]=buf

    out={}
    for alert, blk in alerts.items():
        item={}
        for l in blk:
            m=re.match(r'^\s*expr:\s*(.*)$', l)
            if m:
                item['expr_firstline']=m.group(1).strip()
            m=re.match(r'^\s*for:\s*(.*)$', l)
            if m:
                item['for']=m.group(1).strip()
            m=re.match(r'^\s*runbookUrl:\s*(.*)$', l)
            if m:
                item['runbookUrl']=m.group(1).strip()
            m=re.match(r'^\s*summary:\s*(.*)$', l)
            if m:
                item['summary_firstline']=m.group(1).strip()
            if re.match(r'^\s*description:\s*', l):
                # keep first line only
                item['description_firstline']=l.strip()
        out[alert]=item
    return out

old_map=parse_prometheusrule(old)
new_map=parse_prometheusrule(new)

alerts=sorted(set(old_map)|set(new_map))
report=[]
for a in alerts:
    o=old_map.get(a,{})
    n=new_map.get(a,{})
    keys=set(o)|set(n)
    diff={}
    for k in sorted(keys):
        if o.get(k)!=n.get(k):
            diff[k]={'old':o.get(k),'new':n.get(k)}
    report.append((a,diff))

print(json.dumps(report, indent=2, sort_keys=False))
PY

Repository: openshift/cluster-version-operator

Length of output: 211


🏁 Script executed:

#!/bin/bash
set -euo pipefail

new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"

python3 - <<'PY'
import re, json

def parse_prometheusrule(path, target_name="cluster-version-operator"):
    text=open(path,'r',encoding='utf-8').read().splitlines()
    s="\n".join(text)

    parts=re.split(r'(?m)^---\s*$', s)
    docs=[]
    for p in parts:
        if "kind: PrometheusRule" not in p:
            continue
        if re.search(r'(?m)^\s*name:\s*%s\s*$' % re.escape(target_name), p):
            docs.append(p)

    if not docs:
        raise SystemExit(f"No PrometheusRule with name={target_name} found in {path}")

    p=docs[0]

    alerts={}
    lines=p.splitlines()
    cur=None; buf=[]
    for l in lines:
        m=re.match(r'^\s*-\s*alert:\s*(.+?)\s*$', l)
        if m:
            if cur:
                alerts[cur]=buf
            cur=m.group(1).strip()
            buf=[l]
        else:
            if cur:
                buf.append(l)
    if cur:
        alerts[cur]=buf

    out={}
    for alert, blk in alerts.items():
        item={}
        for l in blk:
            m=re.match(r'^\s*expr:\s*(.*)$', l)
            if m:
                item['expr_firstline']=m.group(1).strip()
            m=re.match(r'^\s*for:\s*(.*)$', l)
            if m:
                item['for']=m.group(1).strip()
            m=re.match(r'^\s*runbookUrl:\s*(.*)$', l)
            if m:
                item['runbookUrl']=m.group(1).strip()
            if re.match(r'^\s*summary:\s*', l):
                item['summary_firstline']=l.strip().removeprefix('summary:').strip()
            if re.match(r'^\s*description:\s*', l):
                item['description_firstline']=l.strip().removeprefix('description:').strip()
        out[alert]=item
    return out

old_map=parse_prometheusrule(old)
new_map=parse_prometheusrule(new)

alerts=sorted(set(old_map)|set(new_map))
report=[]
for a in alerts:
    o=old_map.get(a,{})
    n=new_map.get(a,{})
    diff={}
    for k in sorted(set(o)|set(n)):
        if o.get(k)!=n.get(k):
            diff[k]={'old':o.get(k),'new':n.get(k)}
    report.append((a,diff))

print(json.dumps([{'alert':a,'diff':d} for a,d in report if d], indent=2))
print("\nUnchanged alerts:", [a for a,d in report if not d])
PY

Repository: openshift/cluster-version-operator

Length of output: 211


🏁 Script executed:

#!/bin/bash
set -euo pipefail

old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"

python3 - <<PY
import re, json

def parse_prometheusrule(path, target_name="cluster-version-operator"):
    text=open(path,'r',encoding='utf-8').read().splitlines()
    s="\\n".join(text)

    parts=re.split(r'(?m)^---\\s*$', s)
    docs=[]
    for p in parts:
        if "kind: PrometheusRule" not in p:
            continue
        if re.search(r'(?m)^\\s*name:\\s*%s\\s*$' % re.escape(target_name), p):
            docs.append(p)
    if not docs:
        raise SystemExit(f"No PrometheusRule with name={target_name} found in {path}")
    p=docs[0]

    alerts={}
    lines=p.splitlines()
    cur=None; buf=[]
    for l in lines:
        m=re.match(r'^\\s*-\\s*alert:\\s*(.+?)\\s*$', l)
        if m:
            if cur:
                alerts[cur]=buf
            cur=m.group(1).strip()
            buf=[l]
        else:
            if cur:
                buf.append(l)
    if cur:
        alerts[cur]=buf

    out={}
    for alert, blk in alerts.items():
        item={}
        for l in blk:
            m=re.match(r'^\\s*expr:\\s*(.*)$', l)
            if m and 'expr_firstline' not in item:
                item['expr_firstline']=m.group(1).strip()
            m=re.match(r'^\\s*for:\\s*(.*)$', l)
            if m:
                item['for']=m.group(1).strip()
            m=re.match(r'^\\s*runbookUrl:\\s*(.*)$', l)
            if m:
                item['runbookUrl']=m.group(1).strip()
            m=re.match(r'^\\s*summary:\\s*(.*)$', l)
            if m:
                item['summary']=m.group(1).strip()
            m=re.match(r'^\\s*description:\\s*(.*)$', l)
            if m:
                item['description']=m.group(1).strip()
        out[alert]=item
    return out

old_map=parse_prometheusrule("$old")
new_map=parse_prometheusrule("$new")

alerts=sorted(set(old_map)|set(new_map))
report=[]
unchanged=[]
for a in alerts:
    o=old_map.get(a,{})
    n=new_map.get(a,{})
    diff={}
    for k in sorted(set(o)|set(n)):
        if o.get(k)!=n.get(k):
            diff[k]={'old':o.get(k),'new':n.get(k)}
    if diff:
        report.append({'alert':a,'diff':diff})
    else:
        unchanged.append(a)

print("Modified alerts:")
print(json.dumps(report, indent=2))
print("\\nUnchanged alerts:")
print(unchanged)
PY

Repository: openshift/cluster-version-operator

Length of output: 359


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import re, json
path="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"
lines=open(path,'r',encoding='utf-8').read().splitlines()

alerts={}
cur=None; buf=[]
for l in lines:
    m=re.match(r'^\s*-\s*alert:\s*(.+?)\s*$', l)
    if m:
        if cur: alerts[cur]=buf
        cur=m.group(1).strip(); buf=[l]
    else:
        if cur: buf.append(l)
if cur: alerts[cur]=buf

def get_label_severity(blk):
    sev=None
    # find first "severity:" under labels: within the alert block
    for i,l in enumerate(blk):
        if re.match(r'^\s*labels:\s*$', l):
            # scan until next "labels:" or next alert header
            for j in range(i+1, len(blk)):
                ll=blk[j]
                if re.match(r'^\s*-\s*alert:\s*', ll): break
                if re.match(r'^\s*[a-zA-Z0-9_-]+\s*:\s*', ll):
                    # first severity line
                    m=re.match(r'^\s*severity:\s*(.+?)\s*$', ll)
                    if m:
                        return m.group(1).strip()
    return sev

for name in ["UpdateAvailable","CannotRetrieveUpdates","CannotEvaluateConditionalUpdates"]:
    blk=alerts.get(name)
    print(name, "severity=", get_label_severity(blk) if blk else None)
PY

Repository: openshift/cluster-version-operator

Length of output: 200


🏁 Script executed:

#!/bin/bash
set -euo pipefail

old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"

echo "=== Console-related metadata in old ($old) ==="
rg -n "capability\\.openshift\\.io/name|release\\.openshift\\.io/feature-set|console_url" "$old" || true
echo
echo "=== Console-related metadata in new ($new) ==="
rg -n "capability\\.openshift\\.io/name|release\\.openshift\\.io/feature-set|console_url" "$new" || true

Repository: openshift/cluster-version-operator

Length of output: 9664


🏁 Script executed:

#!/bin/bash
set -euo pipefail

old="install/0000_90_cluster-version-operator_02_servicemonitor.yaml"
new="install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml"

echo "=== Old file: release/include/exclude/feature-set annotations ==="
rg -n "exclude\\.release\\.openshift\\.io/|include\\.release\\.openshift\\.io/|feature-set|capability\\.openshift\\.io/name" "$old" || true

echo
echo "=== New file: release/include/exclude/feature-set annotations ==="
rg -n "exclude\\.release\\.openshift\\.io/|include\\.release\\.openshift\\.io/|feature-set|capability\\.openshift\\.io/name" "$new" || true

Repository: openshift/cluster-version-operator

Length of output: 753


Fix duplicate PrometheusRule and missing PrometheusRule annotations

  • install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml defines kind: PrometheusRule with metadata.name: cluster-version-operator and metadata.namespace: openshift-cluster-version, which matches the existing PrometheusRule in install/0000_90_cluster-version-operator_02_servicemonitor.yaml; since the alert set/name is the same, one will overwrite the other depending on apply order, potentially dropping the Console/feature-set gating annotations.
  • The alerts in this PrometheusRule have no runbookUrl fields and the expressions are not wrapped with last_over_time(...[5m]) per PrometheusRule conventions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@install/0000_50_cluster-update-console-plugin_81_prometheusrule.yaml` around
lines 1 - 13, This PrometheusRule duplicates an existing PrometheusRule with
metadata.name "cluster-version-operator" in namespace
"openshift-cluster-version" and lacks runbookUrl fields and proper
last_over_time wrapping; remove or rename this duplicate resource (or merge its
alerts into the existing PrometheusRule) so only one PrometheusRule with kind
PrometheusRule and name "cluster-version-operator" exists for that namespace,
and update each alert expression to wrap range queries with
last_over_time(...[5m]) and add appropriate runbookUrl annotations for each
alert to preserve Console/feature-set gating annotations (ensure the resource
that remains includes the Console gating annotations shown in
metadata.annotations).

Comment thread install/0000_50_cluster-update-console-plugin_90_consoleplugin.yaml Outdated
Comment thread install/0000_50_cluster-update-console-plugin_90_consoleplugin.yaml Outdated
jhadvig and others added 3 commits June 4, 2026 10:20
The ConsolePlugin referenced service name "cluster-update-console-plugin"
but the actual Service is named "openshift-cluster-update-console-plugin",
preventing the console from reaching the plugin backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the service.beta.openshift.io/serving-cert-secret-name annotation to
the Service so the service-ca operator provisions the TLS secret that the
Deployment mounts at /var/cert. Without it the pod gets stuck waiting for
the secret volume.

Also fix the ConsolePlugin backend port from string "https" to numeric
9001, since ConsolePluginService.Port is int32.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CVO manifests in /manifests/ use Go template rendering but only had
access to ReleaseImage and ClusterProfile variables. Manifests that
need to reference other component images (like cluster-update-console-
plugin) had no way to resolve placeholder image URLs at deploy time.

Add an Images map to manifestRenderConfig, populated from the release
payload's /release-manifests/image-references ImageStream. CVO manifests
can now use {{index .Images "component-name"}} to reference any image
in the release.

Remove install/image-references since it was not processed from
/manifests/ by oc adm release new, and the image is already registered
in the release's image-references via CI/ART configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/payload/render.go (1)

195-207: ⚡ Quick win

Consider validating that image URIs are non-empty.

The function does not check whether tag.From.Name is non-empty before inserting into the map. If an ImageStream tag has an empty DockerImage reference, templates using {{index .Images "tag-name"}} would render with an empty string, causing invalid manifest application.

🛡️ Proposed validation
 func imagesFromImageRef(imageRef *imagev1.ImageStream) map[string]string {
 	images := make(map[string]string)
 	if imageRef == nil {
 		return images
 	}
 	for _, tag := range imageRef.Spec.Tags {
-		if tag.From != nil && tag.From.Kind == "DockerImage" {
+		if tag.From != nil && tag.From.Kind == "DockerImage" && tag.From.Name != "" {
 			images[tag.Name] = tag.From.Name
 		}
 	}
 	return images
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/payload/render.go` around lines 195 - 207, imagesFromImageRef currently
inserts tag.From.Name into the images map without checking for emptiness; update
imagesFromImageRef to skip tags where tag.From == nil or tag.From.Kind !=
"DockerImage" or tag.From.Name is empty (do not add entries with an empty URI),
and optionally record/debug-log skipped tags if desired; reference the function
imagesFromImageRef and the fields tag.From, tag.From.Kind and tag.From.Name when
making the change.
pkg/payload/render_test.go (1)

316-318: ⚡ Quick win

Consider adding unit tests for imagesFromImageRef.

While Test_cvoManifests exercises the Images map indirectly through rendering, there's no direct unit test for the imagesFromImageRef helper function. A table-driven test would document expected behavior and improve confidence for edge cases such as nil imageRef, empty tags, non-DockerImage kinds, and duplicate tag names.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/payload/render_test.go` around lines 316 - 318, Add a new table-driven
unit test for the helper function imagesFromImageRef (e.g.,
Test_imagesFromImageRef) in pkg/payload/render_test.go to cover nil imageRef,
empty tags, non-DockerImage kinds, and duplicate tag names; for each case
construct an imageRef input, call imagesFromImageRef, and assert the returned
map matches expected keys/values (including handling duplicates and empty tags),
keeping Test_cvoManifests as-is but ensuring the new test directly verifies
imagesFromImageRef behavior and edge cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/payload/render.go`:
- Around line 42-47: The code currently silences errors from loadImageReferences
causing renderConfig.Images to be empty and templates that reference .Images
(e.g., install/0000_50_cluster-update-console-plugin_50_deployment.yaml) to
render invalid empty image fields; update the logic around loadImageReferences
in pkg/payload/render.go so that if loadImageReferences(releaseManifestsDir)
returns an error you propagate/return that error from the rendering path (or
explicitly gate and return an error when templates require .Images), rather than
only logging a warning — ensure renderConfig.Images is only left unset when that
is acceptable and otherwise call imagesFromImageRef(imageRef) and fail fast when
image refs cannot be loaded.

---

Nitpick comments:
In `@pkg/payload/render_test.go`:
- Around line 316-318: Add a new table-driven unit test for the helper function
imagesFromImageRef (e.g., Test_imagesFromImageRef) in pkg/payload/render_test.go
to cover nil imageRef, empty tags, non-DockerImage kinds, and duplicate tag
names; for each case construct an imageRef input, call imagesFromImageRef, and
assert the returned map matches expected keys/values (including handling
duplicates and empty tags), keeping Test_cvoManifests as-is but ensuring the new
test directly verifies imagesFromImageRef behavior and edge cases.

In `@pkg/payload/render.go`:
- Around line 195-207: imagesFromImageRef currently inserts tag.From.Name into
the images map without checking for emptiness; update imagesFromImageRef to skip
tags where tag.From == nil or tag.From.Kind != "DockerImage" or tag.From.Name is
empty (do not add entries with an empty URI), and optionally record/debug-log
skipped tags if desired; reference the function imagesFromImageRef and the
fields tag.From, tag.From.Kind and tag.From.Name when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b86d3e40-c869-4f98-bc00-cff997f23625

📥 Commits

Reviewing files that changed from the base of the PR and between a1d4346 and 624ade1.

📒 Files selected for processing (4)
  • install/0000_50_cluster-update-console-plugin_50_deployment.yaml
  • pkg/payload/payload.go
  • pkg/payload/render.go
  • pkg/payload/render_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • install/0000_50_cluster-update-console-plugin_50_deployment.yaml

Comment thread pkg/payload/render.go
Comment on lines +42 to +47
imageRef, err := loadImageReferences(releaseManifestsDir)
if err != nil {
klog.Warningf("Failed to load image references for manifest rendering: %v", err)
} else {
renderConfig.Images = imagesFromImageRef(imageRef)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if manifests in install/ use the .Images template variable

rg -n '{{.*\.Images' install/ --type yaml

Repository: openshift/cluster-version-operator

Length of output: 166


🏁 Script executed:

rg -n '\{\{[^}]*\.Images' install/ --type yaml

Repository: openshift/cluster-version-operator

Length of output: 213


Fail hard (or gate) image ref rendering when templates depend on .Images.

install/0000_50_cluster-update-console-plugin_50_deployment.yaml uses image: '{{index .Images "cluster-update-console-plugin"}}'. In pkg/payload/render.go, if loadImageReferences(...) fails the code only logs a warning and does not populate renderConfig.Images, so this template can render image: "", producing invalid rendered manifests. Rendering should return an error (or explicitly detect required .Images usage and fail) when image references can’t be loaded.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/payload/render.go` around lines 42 - 47, The code currently silences
errors from loadImageReferences causing renderConfig.Images to be empty and
templates that reference .Images (e.g.,
install/0000_50_cluster-update-console-plugin_50_deployment.yaml) to render
invalid empty image fields; update the logic around loadImageReferences in
pkg/payload/render.go so that if loadImageReferences(releaseManifestsDir)
returns an error you propagate/return that error from the rendering path (or
explicitly gate and return an error when templates require .Images), rather than
only logging a warning — ensure renderConfig.Images is only left unset when that
is acceptable and otherwise call imagesFromImageRef(imageRef) and fail fast when
image refs cannot be loaded.

@openshift-ci

openshift-ci Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

@jhadvig: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-hypershift 624ade1 link true /test e2e-hypershift
ci/prow/e2e-agnostic-ovn-upgrade-into-change 624ade1 link true /test e2e-agnostic-ovn-upgrade-into-change
ci/prow/e2e-aws-ovn-techpreview 624ade1 link true /test e2e-aws-ovn-techpreview
ci/prow/e2e-agnostic-ovn-techpreview-serial-1of3 624ade1 link true /test e2e-agnostic-ovn-techpreview-serial-1of3
ci/prow/e2e-agnostic-ovn-techpreview-serial-3of3 624ade1 link true /test e2e-agnostic-ovn-techpreview-serial-3of3
ci/prow/e2e-agnostic-ovn 624ade1 link true /test e2e-agnostic-ovn
ci/prow/e2e-agnostic-ovn-techpreview-serial-2of3 624ade1 link true /test e2e-agnostic-ovn-techpreview-serial-2of3

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jhadvig

jhadvig commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants