
Add install-trustee-operator step for CoCo tests #80168

Open
tbuskey wants to merge 1 commit into openshift:main from tbuskey:install-trustee-260605

Conversation

@tbuskey
Contributor

@tbuskey tbuskey commented Jun 5, 2026

Summary

This PR adds the install-trustee-operator step to enable automated Trustee operator installation in sandboxed-containers-operator CoCo tests.

Changes

  • Add install-trustee-operator step to step-registry
  • Enable TRUSTEE_INSTALL=true for all CoCo periodic jobs (aws, azure, aro)
  • Configure Trustee catalog source and installation parameters
  • Add step to sandboxed-containers-operator-pre chain

Details

The install-trustee-operator step:

  • Installs Trustee operator via OLM (CatalogSource → Subscription → InstallPlan → CSV → Deployment)
  • Deploys Trustee operands (KBS, Attestation Service, Routes)
  • Generates INITDATA for confidential containers (TLS certs, KBS URLs, image security policy)
  • Verifies connectivity using kbs-client pod
  • Updates osc-config ConfigMap with TRUSTEE_URL and INITDATA
  • Works with network restrictions (uses pre-rendered manifests with runtime substitution)

Testing

The step will be validated via periodic job execution on all CoCo test variants.

🤖 Generated with Claude Code

Summary by CodeRabbit

This PR adds automated Trustee operator installation support to the OpenShift CI infrastructure for sandboxed-containers-operator CoCo (confidential containers) tests across multiple deployment platforms.

What changed

Configuration updates (ci-operator/config/openshift/sandboxed-containers-operator/):

  • Six variant configuration files now inject Trustee-related environment variables into three CoCo test job definitions: azure-ipi-coco, aro-ipi-coco, and aws-ipi-coco
  • Each receives three new variables:
    • TRUSTEE_INSTALL: "true" — enables the installation workflow
    • TRUSTEE_CATALOG_SOURCE_IMAGE — points to the trustee-test-fbc operator image (1.1.0-1776506656)
    • TRUSTEE_CATALOG_SOURCE_NAME — set to trustee-catalog

New step registry (ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/):

  • Added a comprehensive shell-based CI step that orchestrates the complete Trustee operator lifecycle:
    • Operator installation via OLM: Creates CatalogSource (if needed), applies Namespace/OperatorGroup/Subscription, waits through install stages (CatalogSource ready → InstallPlan → CSV → Deployment)
    • Operand deployment of KBS (Key Broker Service) and Attestation Service with TLS-secured Routes
    • INITDATA generation for confidential containers: Gathers TLS certificates, KBS URLs, and container image security policies; encodes as gzip/base64
    • Configuration patching of the osc-config ConfigMap with Trustee URL and INITDATA
    • Connectivity validation: Deploys a temporary kbs-client pod to test KBS endpoint accessibility; logs diagnostics on failure
    • Version mapping logic to automatically select appropriate kbs-client image tags based on Trustee CSV version
  • Added step definition metadata including resource requirements, environment variable documentation, and ownership/review lists
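
The version mapping mentioned above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual `get_kbs_client_tag()`: the helper names, the CSV name format (`trustee-operator.vX.Y.Z`), and every version-to-tag pair in the `case` statement are assumptions made up for the example. Only the precedence (explicit `KBS_CLIENT_TAG` first, then a mapping from the Trustee CSV version, then a fallback of `v0.17.0`) follows what the review comments report about the script. The registry lookup via skopeo is left out because it needs network access.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Final-resort tag; the review reports v0.17.0 as the script's fallback.
DEFAULT_KBS_CLIENT_TAG="v0.17.0"

# Extract "1.1.0" from a CSV name such as "trustee-operator.v1.1.0".
# The CSV naming scheme is an assumption for this sketch.
csv_version() {
    local csv_name="$1"
    echo "${csv_name##*.v}"
}

# Hypothetical mapping table: these pairs are placeholders, not the PR's values.
map_trustee_to_kbs_client_tag() {
    local trustee_version="$1"
    case "${trustee_version}" in
        1.1.*) echo "v0.17.0" ;;
        1.0.*) echo "v0.16.0" ;;
        *)     return 1 ;;
    esac
}

# Resolution order: explicit override, then mapping, then fallback.
select_kbs_client_tag() {
    local csv_name="$1"
    if [[ -n "${KBS_CLIENT_TAG:-}" ]]; then
        echo "${KBS_CLIENT_TAG}"
        return 0
    fi
    local tag
    if tag="$(map_trustee_to_kbs_client_tag "$(csv_version "${csv_name}")")"; then
        echo "${tag}"
        return 0
    fi
    echo "${DEFAULT_KBS_CLIENT_TAG}"
}

select_kbs_client_tag "trustee-operator.v1.1.0"
```

Keeping the override check first means a pinned `KBS_CLIENT_TAG` never depends on the CSV name or on any lookup succeeding.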

Workflow integration (ci-operator/step-registry/sandboxed-containers-operator/pre/):

  • Inserted the new sandboxed-containers-operator-install-trustee-operator step into the sandboxed-containers-operator-pre chain, positioned before metadata recording

Impact

The changes enable automated, repeatable Trustee operator provisioning in CoCo test environments, handling manifest generation, credential setup, endpoint exposure, and validation without requiring external network access (using pre-rendered manifests with runtime substitution). This supports testing of confidential container workloads with full operator-managed service dependencies across AWS, Azure, and ARO deployments.

This PR adds the install-trustee-operator step to enable automated
Trustee operator installation in sandboxed-containers-operator CoCo tests.

Changes:
- Add install-trustee-operator step to step-registry
- Enable TRUSTEE_INSTALL for all CoCo periodic jobs (aws, azure, aro)
- Configure Trustee catalog source and installation parameters
- Add step to sandboxed-containers-operator-pre chain

The step:
- Installs Trustee operator via OLM (CatalogSource → Subscription → CSV)
- Deploys Trustee operands (KBS, Attestation Service)
- Generates INITDATA for confidential containers
- Verifies connectivity using kbs-client
- Updates osc-config ConfigMap with TRUSTEE_URL and INITDATA
- Works with network restrictions (uses pre-rendered manifests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026

Walkthrough

This PR introduces Trustee operator installation and configuration for sandboxed containers testing. It adds a new CI step that orchestrates operator deployment, operand configuration, INITDATA generation, and KBS connectivity validation; enables the step across six downstream-candidate test configurations by setting Trustee environment variables for the cloud-based CoCo tests; and integrates the step into the pre-testing workflow.

Changes

Trustee Operator Installation and CI Integration

| Layer / File(s) | Summary |
| --- | --- |
| CI test configuration updates for Trustee: ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate*.yaml (6 files) | Six downstream-candidate configurations add TRUSTEE_CATALOG_SOURCE_IMAGE, TRUSTEE_CATALOG_SOURCE_NAME, and TRUSTEE_INSTALL: "true" environment variables to the azure-ipi-coco, aro-ipi-coco, and aws-ipi-coco test jobs, enabling Trustee setup for all test variants. |
| Trustee operator installation step implementation: ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml, sandboxed-containers-operator-install-trustee-operator-ref.metadata.json, OWNERS | New CI step that conditionally installs the Trustee operator and operands into a target namespace, derives cluster domain and TLS certificates, configures KBS access URLs, generates base64-encoded INITDATA (gzipped initdata.toml with token and policy configuration), patches osc-config ConfigMaps, validates KBS connectivity through a temporary kbs-client Pod, and exports Trustee connectivity details to shared directories. Step includes metadata governance (approvers/reviewers) and comprehensive error diagnostics. |
| Pre-testing workflow chain integration: ci-operator/step-registry/sandboxed-containers-operator/pre/sandboxed-containers-operator-pre-chain.yaml | Adds the new Trustee operator installation step to the pre-testing chain, positioned before record-metadata, making it part of the standard pre-test initialization. |
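
The INITDATA encoding described in this walkthrough (a gzipped initdata.toml, base64-encoded) could look roughly like the sketch below. The function names and the placeholder TOML content are assumptions for illustration; the real initdata.toml is generated by the step and is not reproduced here. The sketch assumes a Linux environment where `base64 -d` is available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# encode_initdata FILE: gzip the file and emit single-line base64 on stdout.
encode_initdata() {
    gzip -c "$1" | base64 | tr -d '\n'
}

# decode_initdata STRING: reverse of encode_initdata, useful for verification.
decode_initdata() {
    printf '%s' "$1" | base64 -d | gzip -dc
}

workdir="$(mktemp -d)"
cat > "${workdir}/initdata.toml" <<'EOF'
# Placeholder content only; not the format produced by the step.
example_key = "example_value"
EOF

INITDATA="$(encode_initdata "${workdir}/initdata.toml")"
echo "INITDATA is ${#INITDATA} characters long"
```

Stripping newlines from the base64 output keeps the value on a single line, which is convenient when it is later placed into a ConfigMap field or an annotation.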

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm, approved, rehearsals-ack, ok-to-test


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| No-Sensitive-Data-In-Logs | ❌ Error | The script logs sensitive KBS resource data: line 845 echoes retrieved resource value, line 861 cats the resource file contents to console, and line 866 combines both in debug output. | Remove logging of KBS resource values (lines 845, 861, 866 in verify_trustee_connectivity). Log only success/failure status without echoing actual resource data that could contain tokens or secrets. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
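
One way to satisfy the No-Sensitive-Data-In-Logs check is to report only whether a resource was retrieved, plus non-sensitive metadata such as its size. The sketch below is an assumption about how that might look; `report_kbs_resource` is a hypothetical helper, not a function from the PR's script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# report_kbs_resource FILE: log success or failure without printing the content.
report_kbs_resource() {
    local resource_file="$1"
    if [[ -s "${resource_file}" ]]; then
        local size
        size="$(wc -c < "${resource_file}" | tr -d ' ')"
        echo ">>> KBS resource retrieved successfully (${size} bytes, content not logged)"
    else
        echo ">>> ERROR: KBS resource is empty or missing"
        return 1
    fi
}

# Demo with a throwaway file standing in for a retrieved resource.
demo_file="$(mktemp)"
printf 'super-secret-token' > "${demo_file}"
report_kbs_resource "${demo_file}"
```
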
✅ Passed checks (13 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add install-trustee-operator step for CoCo tests' directly and clearly describes the main change: adding a new CI/CD step for Trustee operator installation in CoCo test workflows. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR contains no Ginkgo test definitions - only CI operator configs, bash scripts, and metadata files. Check for stable test names is not applicable. |
| Test Structure And Quality | ✅ Passed | PR contains no Ginkgo test code; check is not applicable. Changes are CI automation YAML configs and bash installation scripts, not Go test suites. |
| Microshift Test Compatibility | ✅ Passed | PR adds no new Ginkgo e2e tests, only CI infrastructure (YAML configs and installation script), so MicroShift test compatibility check does not apply. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not add any Ginkgo e2e tests; it only adds CI/operator infrastructure (YAML configs, bash scripts, OWNERS files). The custom check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | No topology-breaking scheduling constraints found. PR adds trustee operator without affinity, nodeSelectors, or topology assumptions that would break SNO, Two-Node, or HyperShift topologies. |
| Ote Binary Stdout Contract | ✅ Passed | PR contains no Go test binaries or OTE integration, only YAML CI configs, a bash script, and metadata files. OTE Binary Stdout Contract check is not applicable. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR adds CI infrastructure (YAML, bash, registry files), not Ginkgo tests. The check only applies to new Ginkgo e2e tests and is not applicable here. |
| No-Weak-Crypto | ✅ Passed | No weak crypto patterns detected. Code uses standard openssl for TLS retrieval, base64/gzip for encoding, SHA256 in config only, with no custom crypto or secret comparisons. |
| Container-Privileges | ✅ Passed | No privileged container settings found. The kbs-client pod explicitly disables privilege escalation, runs non-root, drops all capabilities, and uses RuntimeDefault seccomp. |

@openshift-ci
Contributor

openshift-ci Bot commented Jun 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tbuskey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jun 5, 2026
@openshift-ci openshift-ci Bot requested review from bpradipt and vvoronko June 5, 2026 18:02
@tbuskey
Contributor Author

tbuskey commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco

@openshift-merge-bot
Contributor

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@tbuskey
Contributor Author

tbuskey commented Jun 5, 2026

/pj-rehearse list

@openshift-merge-bot
Contributor

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@tbuskey
Contributor Author

tbuskey commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco

@openshift-merge-bot
Contributor

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml (1)

32-41: 💤 Low value

Clarify deprecated environment variables.

TRUSTEE_IMAGE_REPO and TRUSTEE_IMAGE_TAG are marked as deprecated but still have default values. If these are truly deprecated and superseded by TRUSTEE_CATALOG_SOURCE_IMAGE, consider removing their defaults or documenting why they're retained (e.g., backward compatibility).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`
around lines 32 - 41, TRUSTEE_IMAGE_REPO and TRUSTEE_IMAGE_TAG are marked
deprecated but still set with defaults; either remove their default values or
add a clear comment explaining why defaults are retained for backward
compatibility and when TRUSTEE_CATALOG_SOURCE_IMAGE supersedes them. Update the
YAML entries for TRUSTEE_IMAGE_REPO and TRUSTEE_IMAGE_TAG to either drop the
default fields (leaving them unset) or expand the documentation block to state
explicitly that they are deprecated, retained only for backward compatibility,
and will be ignored when TRUSTEE_CATALOG_SOURCE_IMAGE is provided (include the
exact variable names TRUSTEE_IMAGE_REPO, TRUSTEE_IMAGE_TAG, and
TRUSTEE_CATALOG_SOURCE_IMAGE in the doc text).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`:
- Around line 704-717: The pod spec for the kbs-client probe should disable
mounting the service account token; update the Pod manifest where the container
named "kbs-client" is defined to add automountServiceAccountToken: false at the
Pod spec level (alongside restartPolicy and containers) so the probe pod does
not receive a service account token; ensure you modify the Pod spec containing
the "kbs-client" container rather than the container-level securityContext.
- Around line 886-891: The guard that decides whether to copy "${log_file}" uses
"${ARTIFACT_DIR}" directly and can fail under "set -u"; update the if condition
around the copy (the block referencing ARTIFACT_DIR, SHARED_DIR, and log_file)
to use the same safe default expansion as the log_file assignment, e.g. replace
checks of "${ARTIFACT_DIR}" with "${ARTIFACT_DIR:-}" (both in the -n test and
the comparison) so the script won't abort if ARTIFACT_DIR is unset.
- Around line 318-425: The current per-stage 60s polling loops (CatalogSource
check using TRUSTEE_CATALOG_SOURCE_NAME/TRUSTEE_CATALOG_SOURCE_IMAGE,
Subscription -> installplan lookup, InstallPlan phase check, CSV Succeeded check
exporting TRUSTEE_CSV_NAME, and deployment Available check using
control-plane=controller-manager in TRUSTEE_NAMESPACE) are too short; replace
these repetitive 12-iteration sleep loops with oc wait commands (or extend to
multi-minute timeouts like 5m–10m) that target the specific resources and
conditions (e.g., oc wait --for=condition=READY catalogsource/<name> -n
openshift-marketplace --timeout=5m, oc wait
--for=jsonpath='{.status.installplan.name}' subscription/trustee-operator -n
${TRUSTEE_NAMESPACE} or loop with a 5m timeout for .status.installplan.name, oc
wait installplan/<name> --for=condition=Complete --timeout=5m, oc wait csv
--for=condition=Succeeded -n ${TRUSTEE_NAMESPACE} --timeout=5m and oc wait
deployment -l control-plane=controller-manager -n ${TRUSTEE_NAMESPACE}
--for=condition=Available --timeout=5m) so OLM has multi-minute budget and
transient delays stop failing the job.

In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`:
- Around line 46-51: Update the ref YAML docs to accurately reflect runtime
behavior: clarify that automatic discovery in get_kbs_client_tag() (which may
invoke "skopeo list-tags docker://quay.io/confidential-containers/kbs-client"
when KBS_CLIENT_TAG is empty and no trustee→kbs-client mapping applies) can
require network access or access via the cluster registry proxy even when
restrict_network_access is true, and state that registry tag discovery may fail
without that access; also align the documented fallback tag with the script by
changing the documented fallback from v0.19.0 to v0.17.0 (or alternatively
change the script’s final-resort fallback to v0.19.0) so the YAML’s fallback
matches the actual value used by get_kbs_client_tag().

---

Nitpick comments:
In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`:
- Around line 32-41: TRUSTEE_IMAGE_REPO and TRUSTEE_IMAGE_TAG are marked
deprecated but still set with defaults; either remove their default values or
add a clear comment explaining why defaults are retained for backward
compatibility and when TRUSTEE_CATALOG_SOURCE_IMAGE supersedes them. Update the
YAML entries for TRUSTEE_IMAGE_REPO and TRUSTEE_IMAGE_TAG to either drop the
default fields (leaving them unset) or expand the documentation block to state
explicitly that they are deprecated, retained only for backward compatibility,
and will be ignored when TRUSTEE_CATALOG_SOURCE_IMAGE is provided (include the
exact variable names TRUSTEE_IMAGE_REPO, TRUSTEE_IMAGE_TAG, and
TRUSTEE_CATALOG_SOURCE_IMAGE in the doc text).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3f79f954-a0a1-4119-ad55-3c5330008f6d

📥 Commits

Reviewing files that changed from the base of the PR and between 5adb631 and 08a0b06.

📒 Files selected for processing (11)
  • ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate.yaml
  • ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate417.yaml
  • ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate418.yaml
  • ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate419.yaml
  • ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate420.yaml
  • ci-operator/config/openshift/sandboxed-containers-operator/openshift-sandboxed-containers-operator-devel__downstream-candidate421.yaml
  • ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/OWNERS
  • ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh
  • ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.metadata.json
  • ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml
  • ci-operator/step-registry/sandboxed-containers-operator/pre/sandboxed-containers-operator-pre-chain.yaml

Comment on lines +318 to +425
  # Stage 1: Wait for CatalogSource to be READY (60s)
  # Skip if using existing catalog (no TRUSTEE_CATALOG_SOURCE_IMAGE provided)
  if [[ -n "${TRUSTEE_CATALOG_SOURCE_IMAGE}" ]]; then
    echo ">>> Waiting for CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} to be READY..."
    local catalog_ready=false
    for i in {1..12}; do
      local state
      state=$(oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null || echo "")
      if [[ "${state}" == "READY" ]]; then
        echo ">>> CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} is READY"
        catalog_ready=true
        break
      fi
      [[ ${i} -lt 12 ]] && sleep 5
    done

    if [[ "${catalog_ready}" != "true" ]]; then
      echo ">>> ERROR: CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME} not READY after 60s"
      oc get catalogsource -n openshift-marketplace "${TRUSTEE_CATALOG_SOURCE_NAME}" -o yaml || true
      oc get pods -n openshift-marketplace -l olm.catalogSource="${TRUSTEE_CATALOG_SOURCE_NAME}" || true
      oc describe pods -n openshift-marketplace -l olm.catalogSource="${TRUSTEE_CATALOG_SOURCE_NAME}" | tail -50 || true
      return 1
    fi
  else
    echo ">>> Using existing CatalogSource ${TRUSTEE_CATALOG_SOURCE_NAME}, skipping readiness check"
  fi

  # Stage 2: Wait for Subscription to reference an InstallPlan (60s)
  echo ">>> Waiting for Subscription to reference InstallPlan..."
  local installplan_ref=""
  for i in {1..12}; do
    installplan_ref=$(oc get subscription -n "${TRUSTEE_NAMESPACE}" trustee-operator -o jsonpath='{.status.installplan.name}' 2>/dev/null || echo "")
    if [[ -n "${installplan_ref}" ]]; then
      echo ">>> Subscription references InstallPlan: ${installplan_ref}"
      break
    fi
    [[ ${i} -lt 12 ]] && sleep 5
  done

  if [[ -z "${installplan_ref}" ]]; then
    echo ">>> ERROR: Subscription has no InstallPlan reference after 60s"
    oc get subscription -n "${TRUSTEE_NAMESPACE}" trustee-operator -o yaml || true
    return 1
  fi

  # Stage 3: Wait for InstallPlan to be Complete (60s)
  echo ">>> Waiting for InstallPlan ${installplan_ref} to be Complete..."
  local installplan_complete=false
  for i in {1..12}; do
    local phase
    phase=$(oc get installplan -n "${TRUSTEE_NAMESPACE}" "${installplan_ref}" -o jsonpath='{.status.phase}' 2>/dev/null || echo "")
    if [[ "${phase}" == "Complete" ]]; then
      echo ">>> InstallPlan is Complete"
      installplan_complete=true
      break
    fi
    [[ ${i} -lt 12 ]] && sleep 5
  done

  if [[ "${installplan_complete}" != "true" ]]; then
    echo ">>> ERROR: InstallPlan not Complete after 60s"
    oc get installplan -n "${TRUSTEE_NAMESPACE}" "${installplan_ref}" -o yaml || true
    return 1
  fi

  # Stage 4: Wait for CSV to be Succeeded (60s)
  echo ">>> Waiting for CSV to be Succeeded..."
  local csv_succeeded=false
  local csv_name=""
  for i in {1..12}; do
    local csv_phase
    csv_phase=$(oc get csv -n "${TRUSTEE_NAMESPACE}" -o jsonpath='{.items[0].status.phase}' 2>/dev/null || echo "")
    if [[ "${csv_phase}" == "Succeeded" ]]; then
      csv_name=$(oc get csv -n "${TRUSTEE_NAMESPACE}" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || echo "")
      echo ">>> CSV ${csv_name} is Succeeded"
      csv_succeeded=true
      break
    fi
    [[ ${i} -lt 12 ]] && sleep 5
  done

  if [[ "${csv_succeeded}" != "true" ]]; then
    echo ">>> ERROR: CSV not Succeeded after 60s"
    oc get csv -n "${TRUSTEE_NAMESPACE}" -o yaml || true
    return 1
  fi

  # Export CSV name for kbs-client version mapping
  export TRUSTEE_CSV_NAME="${csv_name}"

  # Stage 5: Wait for Deployment to be Available (60s)
  echo ">>> Waiting for operator deployment to be Available..."
  local deployment_ready=false
  for i in {1..12}; do
    if oc get deployment -n "${TRUSTEE_NAMESPACE}" -l control-plane=controller-manager -o jsonpath='{.items[0].status.conditions[?(@.type=="Available")].status}' 2>/dev/null | grep -q "True"; then
      echo ">>> Operator deployment is Available"
      deployment_ready=true
      break
    fi
    [[ ${i} -lt 12 ]] && sleep 5
  done

  if [[ "${deployment_ready}" != "true" ]]; then
    echo ">>> ERROR: Operator deployment not Available after 60s"
    oc get deployment -n "${TRUSTEE_NAMESPACE}" || true
    oc get pods -n "${TRUSTEE_NAMESPACE}" || true
    oc describe pods -n "${TRUSTEE_NAMESPACE}" -l control-plane=controller-manager || true
    return 1

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Increase the OLM wait budget before this starts flaking periodics.

Lines 323-425 give CatalogSource readiness, InstallPlan creation, CSV success, and controller rollout only 60s each. That is tight for OLM on busy CI clusters, especially now that this step is inserted into the shared CoCo pre-chain and enabled in downstream candidate jobs. Please switch these stages to a multi-minute timeout or oc wait on the relevant conditions so transient install latency does not fail the job.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`
around lines 318 - 425, The current per-stage 60s polling loops (CatalogSource
check using TRUSTEE_CATALOG_SOURCE_NAME/TRUSTEE_CATALOG_SOURCE_IMAGE,
Subscription -> installplan lookup, InstallPlan phase check, CSV Succeeded check
exporting TRUSTEE_CSV_NAME, and deployment Available check using
control-plane=controller-manager in TRUSTEE_NAMESPACE) are too short; replace
these repetitive 12-iteration sleep loops with oc wait commands (or extend to
multi-minute timeouts like 5m–10m) that target the specific resources and
conditions (e.g., oc wait --for=condition=READY catalogsource/<name> -n
openshift-marketplace --timeout=5m, oc wait
--for=jsonpath='{.status.installplan.name}' subscription/trustee-operator -n
${TRUSTEE_NAMESPACE} or loop with a 5m timeout for .status.installplan.name, oc
wait installplan/<name> --for=condition=Complete --timeout=5m, oc wait csv
--for=condition=Succeeded -n ${TRUSTEE_NAMESPACE} --timeout=5m and oc wait
deployment -l control-plane=controller-manager -n ${TRUSTEE_NAMESPACE}
--for=condition=Available --timeout=5m) so OLM has multi-minute budget and
transient delays stop failing the job.
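
If the loop-based approach is kept rather than `oc wait`, the five near-identical polling loops could share one helper with a configurable budget. The following is a minimal sketch under that assumption; `wait_until` is a hypothetical name, not a function from the PR, and the predicate it polls is supplied by the caller.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 on success and 1 on timeout.
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0
        fi
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "${interval}"
    done
}

# Trivial demo: a predicate that succeeds immediately.
wait_until 5 1 true && echo "condition met"
```

Each stage would then reduce to a small predicate function (for example, one that compares the `oc get ... -o jsonpath` output against `READY`) passed to something like `wait_until 300 5 <predicate>`, which makes the per-stage timeout a single number to tune.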

Comment on lines +704 to +717
spec:
  containers:
  - name: kbs-client
    image: KBS_CLIENT_IMAGE_PLACEHOLDER
    command: ["sleep", "infinity"]
    securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop:
        - ALL
  restartPolicy: Never

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disable service-account token mounting on the kbs-client probe pod.

This pod is only used as an exec target for kbs-client; it never talks to the Kubernetes API itself. Leaving the default token mounted needlessly exposes credentials inside the trustee namespace.

Suggested hardening
 spec:
+  automountServiceAccountToken: false
   containers:
   - name: kbs-client
     image: KBS_CLIENT_IMAGE_PLACEHOLDER
     command: ["sleep", "infinity"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`
around lines 704 - 717, The pod spec for the kbs-client probe should disable
mounting the service account token; update the Pod manifest where the container
named "kbs-client" is defined to add automountServiceAccountToken: false at the
Pod spec level (alongside restartPolicy and containers) so the probe pod does
not receive a service account token; ensure you modify the Pod spec containing
the "kbs-client" container rather than the container-level securityContext.
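
A sketch of how the hardened probe pod might be rendered from a pre-rendered manifest with runtime substitution. The function name, the pod name, and the `NAMESPACE_PLACEHOLDER` token are assumptions for illustration; only `KBS_CLIENT_IMAGE_PLACEHOLDER` and the securityContext fields come from the snippet under review.

```shell
#!/usr/bin/env bash
set -euo pipefail

# render_kbs_client_pod IMAGE NAMESPACE: print the Pod manifest on stdout
# with placeholders replaced. "|" is used as the sed delimiter because
# image references contain "/".
render_kbs_client_pod() {
    local image="$1" namespace="$2"
    sed -e "s|KBS_CLIENT_IMAGE_PLACEHOLDER|${image}|g" \
        -e "s|NAMESPACE_PLACEHOLDER|${namespace}|g" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kbs-client
  namespace: NAMESPACE_PLACEHOLDER
spec:
  automountServiceAccountToken: false
  containers:
  - name: kbs-client
    image: KBS_CLIENT_IMAGE_PLACEHOLDER
    command: ["sleep", "infinity"]
    securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop:
        - ALL
  restartPolicy: Never
EOF
}

render_kbs_client_pod "quay.io/confidential-containers/kbs-client:v0.17.0" "example-namespace"
```

The rendered output could then be piped to `oc apply -f -`. Because the template is a quoted heredoc, no shell expansion happens inside the manifest; only the explicit sed substitutions change it.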

Comment on lines +886 to +891
    local log_file="${ARTIFACT_DIR:-${SHARED_DIR}}/kbs-attestation-logs.txt"
    # Strip ANSI color codes from logs for cleaner output
    oc logs "${kbs_pod}" -n "${TRUSTEE_NAMESPACE}" --since=5m 2>&1 | sed 's/\x1b\[[0-9;]*m//g' > "${log_file}" || true

    if [[ -n "${ARTIFACT_DIR}" && "${ARTIFACT_DIR}" != "${SHARED_DIR}" ]]; then
      cp "${log_file}" "${SHARED_DIR}/kbs-attestation-logs.txt" 2>/dev/null || true

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the same default expansion for ARTIFACT_DIR in the copy guard.

Line 886 already handles an unset ARTIFACT_DIR, but Line 890 dereferences ${ARTIFACT_DIR} directly under set -u. If that variable is missing, the step aborts after the log capture path was supposed to fall back to SHARED_DIR.

Suggested fix
-    if [[ -n "${ARTIFACT_DIR}" && "${ARTIFACT_DIR}" != "${SHARED_DIR}" ]]; then
+    if [[ -n "${ARTIFACT_DIR:-}" && "${ARTIFACT_DIR:-}" != "${SHARED_DIR}" ]]; then
       cp "${log_file}" "${SHARED_DIR}/kbs-attestation-logs.txt" 2>/dev/null || true
     fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh`
around lines 886 - 891, The guard that decides whether to copy "${log_file}"
uses "${ARTIFACT_DIR}" directly and can fail under "set -u"; update the if
condition around the copy (the block referencing ARTIFACT_DIR, SHARED_DIR, and
log_file) to use the same safe default expansion as the log_file assignment,
e.g. replace checks of "${ARTIFACT_DIR}" with "${ARTIFACT_DIR:-}" (both in the
-n test and the comparison) so the script won't abort if ARTIFACT_DIR is unset.
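
The effect of the `${ARTIFACT_DIR:-}` expansion can be checked in isolation. The sketch below is an assumption-laden stand-in for the real code: `save_kbs_log` is a hypothetical helper that writes a string instead of capturing `oc logs`, and the directories are temporary ones created for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

SHARED_DIR="$(mktemp -d)"   # stand-in for the CI-provided SHARED_DIR
unset ARTIFACT_DIR          # simulate a run where ARTIFACT_DIR is missing

# save_kbs_log CONTENT: write the log to ARTIFACT_DIR (or SHARED_DIR as a
# fallback) and mirror it into SHARED_DIR when the two differ.
save_kbs_log() {
    local content="$1"
    local log_file="${ARTIFACT_DIR:-${SHARED_DIR}}/kbs-attestation-logs.txt"
    printf '%s\n' "${content}" > "${log_file}"
    # "${ARTIFACT_DIR:-}" expands to an empty string when the variable is
    # unset, so this test is safe under "set -u".
    if [[ -n "${ARTIFACT_DIR:-}" && "${ARTIFACT_DIR:-}" != "${SHARED_DIR}" ]]; then
        cp "${log_file}" "${SHARED_DIR}/kbs-attestation-logs.txt"
    fi
}

save_kbs_log "example log line"
echo "log written to ${SHARED_DIR}/kbs-attestation-logs.txt"
```
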

Comment on lines +46 to +51
- name: KBS_CLIENT_TAG
  default: ""
  documentation: |-
    The kbs-client image tag to use for connectivity testing. If empty, automatically
    discovers the latest v.X.Y.Z tag using skopeo. Override to pin a specific version
    (e.g., "v0.19.0"). Fallback is v0.19.0 if skopeo lookup fails.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if KBS_CLIENT_TAG auto-discovery could cause reproducibility issues

# Search for skopeo usage in the commands script
echo "=== Searching for skopeo usage in commands script ==="
fd -e sh 'sandboxed-containers-operator-install-trustee-operator-commands.sh' --exec cat {}

echo ""
echo "=== Checking for network restriction handling ==="
# Look for how the script handles network-restricted environments
fd -e sh 'sandboxed-containers-operator-install-trustee-operator-commands.sh' --exec grep -A5 -B5 'skopeo\|KBS_CLIENT_TAG\|restrict_network' {}

Repository: openshift/release

Length of output: 31765


🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo_root="."
file_ref="ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml"
file_cmd="ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh"

echo "==> Ref YAML (KBS_CLIENT_TAG docs + restrict_network_access claim)"
sed -n '1,120p' "$file_ref" | nl -ba | sed -n '1,120p'
echo "----"
rg -n "KBS_CLIENT_TAG|restrict_network_access|NO NETWORK ACCESS REQUIRED|skopeo" "$file_ref" || true
echo "---- context around KBS_CLIENT_TAG"
rg -n "name: KBS_CLIENT_TAG" "$file_ref" -n || true
ln=$(rg -n "name: KBS_CLIENT_TAG" "$file_ref" | head -n1 | cut -d: -f1)
start=$((ln-20)); end=$((ln+60))
sed -n "${start},${end}p" "$file_ref" | nl -ba

echo
echo "==> Commands SH (get_kbs_client_tag + skopeo usage + fallback)"
rg -n "get_kbs_client_tag|list-tags|docker://|KBS_CLIENT_TAG|TRUSTEE_CSV_NAME|v0\.|skopeo" "$file_cmd" || true

echo "---- context around get_kbs_client_tag"
ln=$(rg -n "get_kbs_client_tag" "$file_cmd" | head -n1 | cut -d: -f1)
start=$((ln-80)); end=$((ln+180))
sed -n "${start},${end}p" "$file_cmd" | nl -ba

echo "---- specific fallback string occurrences"
rg -n "v0\.19\.0|v0\.17\.0" "$file_cmd" || true

Repository: openshift/release

Length of output: 189


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file_ref="ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml"
file_cmd="ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-commands.sh"

echo "==> Basic sanity (files exist?)"
ls -l "$file_ref" "$file_cmd"

echo
echo "==> Ref YAML: lines matching KBS_CLIENT_TAG / restrict_network_access / NO NETWORK ACCESS"
rg -n "KBS_CLIENT_TAG|restrict_network_access|NO NETWORK ACCESS REQUIRED|skopeo" "$file_ref" || true

echo
echo "==> Ref YAML: surrounding KBS_CLIENT_TAG block (approx)"
ln=$(rg -n "name: KBS_CLIENT_TAG" "$file_ref" | head -n1 | cut -d: -f1)
start=$((ln-25)); end=$((ln+70))
sed -n "${start},${end}p" "$file_ref" | awk -v s="$start" '{printf "%5d  %s\n", s+NR-1, $0}'

echo
echo "==> Commands SH: lines matching skopeo / get_kbs_client_tag / list-tags / fallback versions"
rg -n "get_kbs_client_tag|list-tags|docker://|KBS_CLIENT_TAG|TRUSTEE_CSV_NAME|v0\.19\.0|v0\.17\.0|restrict_network_access|NO NETWORK ACCESS" "$file_cmd" || true

echo
echo "==> Commands SH: surrounding get_kbs_client_tag function"
ln=$(rg -n "get_kbs_client_tag" "$file_cmd" | head -n1 | cut -d: -f1)
start=$((ln-60)); end=$((ln+220))
sed -n "${start},${end}p" "$file_cmd" | awk -v s="$start" '{printf "%5d  %s\n", s+NR-1, $0}'

echo
echo "==> Commands SH: explicit fallback behavior for latest_tag empty / KBS_CLIENT_TAG unset"
rg -n "if .*latest_tag.*empty|latest_tag.*=\"\"|fallback|v0\.17\.0|v0\.19\.0" "$file_cmd" || true

Repository: openshift/release

Length of output: 15812


Fix misleading “NO NETWORK ACCESS REQUIRED” docs and correct KBS_CLIENT_TAG fallback

  • sandboxed-containers-operator-install-trustee-operator-ref.yaml claims “NO NETWORK ACCESS REQUIRED … Works with restrict_network_access: true”, but get_kbs_client_tag() in sandboxed-containers-operator-install-trustee-operator-commands.sh can run skopeo list-tags docker://quay.io/confidential-containers/kbs-client to auto-discover the latest semver tag when KBS_CLIENT_TAG is empty and no trustee→kbs-client mapping applies; ensure this works under restrict_network_access (e.g., via the cluster registry proxy) or adjust the documentation to reflect that registry tag discovery may need network access.
  • The ref YAML documents a v0.19.0 fallback “if skopeo lookup fails”, but the script’s actual last-resort fallback is v0.17.0; align the documented fallback with the implementation (or change the implementation).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/sandboxed-containers-operator/install-trustee-operator/sandboxed-containers-operator-install-trustee-operator-ref.yaml`
around lines 46 - 51, Update the ref YAML docs to accurately reflect runtime
behavior: clarify that automatic discovery in get_kbs_client_tag() (which may
invoke "skopeo list-tags docker://quay.io/confidential-containers/kbs-client"
when KBS_CLIENT_TAG is empty and no trustee→kbs-client mapping applies) can
require network access or access via the cluster registry proxy even when
restrict_network_access is true, and state that registry tag discovery may fail
without that access; also align the documented fallback tag with the script by
changing the documented fallback from v0.19.0 to v0.17.0 (or alternatively
change the script’s final-resort fallback to v0.19.0) so the YAML’s fallback
matches the actual value used by get_kbs_client_tag().
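
The precedence this finding describes (explicit override, then registry discovery, then a last-resort fallback) could be sketched as follows. This is an illustration, not the PR's `get_kbs_client_tag()`: the helper names `pick_latest_semver_tag` and `resolve_kbs_client_tag` are hypothetical, the trustee→kbs-client mapping step is omitted, and `sort -V` assumes GNU coreutils. The fallback value v0.17.0 is the one the review reports for the script.

```shell
#!/bin/bash
set -eu

# Last-resort tag; v0.17.0 is the value the review found in the commands script.
FALLBACK_TAG="v0.17.0"

# Hypothetical helper: read tag names (one per line) on stdin and print the
# highest vX.Y.Z tag, or nothing when no tag matches.
pick_latest_semver_tag() {
    grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Hypothetical resolver: an explicit KBS_CLIENT_TAG wins, then the newest
# semver tag read from stdin, then the fallback. Reading tags from stdin keeps
# the registry query outside the function.
resolve_kbs_client_tag() {
    if [ -n "${KBS_CLIENT_TAG:-}" ]; then
        echo "${KBS_CLIENT_TAG}"
        return 0
    fi
    discovered="$(pick_latest_semver_tag || true)"
    echo "${discovered:-${FALLBACK_TAG}}"
}
```

In a job with registry access, the tag list could come from `skopeo list-tags docker://quay.io/confidential-containers/kbs-client | jq -r '.Tags[]'`. When that lookup produces no output, for example because network access is restricted, the sketch falls through to the fallback, which is why the documented and implemented fallback values need to agree.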

@openshift-merge-bot

@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed

@openshift-ci

openshift-ci Bot commented Jun 5, 2026

@tbuskey: all tests passed!

@openshift-merge-bot

[REHEARSALNOTIFIER]
@tbuskey: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-azure-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aws-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aws-ipi-coco | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aro-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-azure-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-aro-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aro-ipi-coco | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-aws-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aro-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aws-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-azure-ipi-coco | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-aro-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-aws-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-azure-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-peerpods | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-azure-ipi-kata | N/A | periodic | Registry content changed |
| periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-azure-ipi-peerpods | N/A | periodic | Registry content changed |

A total of 31 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here
The following jobs are not rehearsable unless both the network-access-rehearsals-ok and approved labels are present on this PR, because their restrict_network_access field is set to false. Any openshift org member other than the PR's author can add the network-access-rehearsals-ok label by commenting /pj-rehearse network-access-allowed:

Test name
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-aws-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-azure-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate420-aro-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-azure-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-aro-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate417-aws-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aro-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aro-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aws-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aws-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-azure-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate421-aro-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-azure-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-aro-ipi-coco
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate418-aws-ipi-coco

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@tbuskey

tbuskey commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco

@openshift-merge-bot

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot

@tbuskey: job(s): periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco either don't exist or were not found to be affected, and cannot be rehearsed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.