[DEBUG] [DO NOT MERGE] pre-PR #71127

tbuskey · 2025-11-06T19:45:24Z

Pre-PR of the whole mess, to be redone properly. Think of it as a heads-up.

This installs trustee onto the cluster and configures it.
Written with help from Cursor, so it went through lots of iterations.

sandboxed-containers-operator-trustee-install-commands.sh can run against an existing cluster to install and configure Trustee. It assumes that sandboxed-containers-operator-env-cm-commands.sh has already configured the cluster for CoCo in Prow.

It should create a script that contains the encoded (gzip + base64) INITDATA for peer-pods-cm and patches it, so it could be used on another cluster.
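A minimal sketch of what such a generated patch script could look like, assuming the INITDATA value has already been computed and that peer-pods-cm lives in the openshift-sandboxed-containers-operator namespace (as in the OSC docs). The helper name `write_initdata_patch_script` and the output path are hypothetical:

```bash
# Hypothetical helper: write a standalone script that patches peer-pods-cm
# with a precomputed INITDATA value, so the same value can be applied to
# another cluster.
write_initdata_patch_script() {
    local initdata="$1"    # gzip+base64 encoded initdata.toml
    local out="$2"         # path of the script to generate
    # Unquoted here-doc: ${initdata} is expanded now, "\\" becomes a literal
    # backslash so the generated oc command keeps its line continuations.
    cat > "$out" <<EOF
#!/usr/bin/env bash
set -euo pipefail
# Generated: applies the INITDATA captured from the source cluster.
oc patch configmap peer-pods-cm \\
    -n openshift-sandboxed-containers-operator \\
    --type merge \\
    -p '{"data":{"INITDATA":"${initdata}"}}'
EOF
    chmod +x "$out"
}
```

For example, `write_initdata_patch_script "$INITDATA_STRING" patch-initdata.sh` produces a script that can be copied to and run against another cluster. Depending on the OSC version, the peer-pods components may also need a restart before they pick up the new value.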

Changes sandboxed-containers-operator-create-prowjob-commands.sh to allow it to create openshift-sandboxed-containers-operator-devel__downstream-release.yaml and openshift-sandboxed-containers-operator-devel__downstream-candidate.yaml. We should use the script to update those files and not edit them directly.

sandboxed-containers-operator-create-prowjob-commands.sh needs more of the options for sandboxed-containers-operator-trustee-install-commands.sh added.

README files haven't been reviewed yet.

openshift-ci · 2025-11-06T19:46:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tbuskey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~ci-operator/config/openshift/sandboxed-containers-operator/OWNERS~~ [tbuskey]
~~ci-operator/jobs/openshift/sandboxed-containers-operator/OWNERS~~ [tbuskey]
~~ci-operator/step-registry/sandboxed-containers-operator/OWNERS~~ [tbuskey]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tbuskey · 2025-11-06T19:58:42Z

/assign @beraldoleal
/assign @ldoktor
/assign @wainersm
/assign @vvoronko

tbuskey · 2025-11-06T20:00:09Z

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods

openshift-ci-robot · 2025-11-06T20:00:11Z

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

tbuskey · 2025-11-10T12:51:41Z

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods

openshift-ci-robot · 2025-11-10T12:51:43Z

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

tbuskey · 2025-11-10T13:10:00Z

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods

openshift-ci-robot · 2025-11-10T13:10:02Z

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

tbuskey · 2025-11-10T13:47:51Z

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods

openshift-ci-robot · 2025-11-10T13:47:54Z

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

tbuskey · 2025-11-10T18:35:12Z

/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods

openshift-ci-robot · 2025-11-10T18:35:14Z

@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

ldoktor · 2025-11-11T09:28:43Z

...-containers-operator/create-prowjob/sandboxed-containers-operator-create-prowjob-commands.sh

-          subfolder="trustee-fbc/"
+      # Trustee Catalog Configuration - get latest or use provided
+      if [[ -z "${TRUSTEE_CATALOG_TAG:-}" ]]; then
+          TRUSTEE_CATALOG_TAG=$(get_latest_catalog_tag "https://quay.io/api/v1/repository/redhat-user-workloads/ose-osc-tenant/trustee-test-fbc")


IIUC the regexp changed from "^trustee-fbc-${OCP_VER}-on-push-.*-build-image-index$" to "^[0-9]+\.[0-9]+(\.[0-9]+)?-[0-9]+$"; is this fine? (It'd be easier to review if you made incremental changes. This doesn't seem related to the current PR, yet it's hard to tell what is and what isn't.)

Trustee's catalog tags were changed to match the format OSC uses, so the same regexp now works for both.
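Since both catalogs now share one tag format, the filtering can be a single reusable step. A sketch, assuming the tag names have already been fetched one per line; the helper name `latest_matching_tag` is hypothetical and `sort -V` is a GNU coreutils option:

```bash
# Hypothetical helper: keep only tags in the shared OSC/Trustee format
# (e.g. "1.10-3" or "1.10.1-12") and print the newest one by version sort.
# Tag names are read one per line on stdin, so fetching stays separate.
latest_matching_tag() {
    grep -E '^[0-9]+\.[0-9]+(\.[0-9]+)?-[0-9]+$' | sort -V | tail -n 1
}
```

Fetching the names could look like `curl -s 'https://quay.io/api/v1/repository/redhat-user-workloads/ose-osc-tenant/trustee-test-fbc/tag/?onlyActiveTags=true&limit=100' | jq -r '.tags[].name' | latest_matching_tag`. The Quay tag endpoint is paginated, so a repository with many tags may need more than one request.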

ldoktor · 2025-11-11T10:06:53Z

ci-operator/step-registry/sandboxed-containers-operator/trustee-install/README.md

+
+## Troubleshooting
+
+If the step fails:


I like this section, we should add it to other steps as well :-)

ldoktor · 2025-11-11T10:28:41Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    echo "Waiting for subscription '${subscription_name}' to be ready (state: AtLatestKnown)..."
+    local subscription_ready=false
+
+    for i in $(seq 1 "$max_attempts"); do


I'd prefer looping until a specific deadline rather than for a number of iterations; it seems more predictable.

Me too.
There is a deadline, though. It sleeps T seconds each time, so the deadline is T * $max_attempts.

The problem with those deadlines is that they are very rough and depend on nothing in between hanging or failing, nobody changing them, etc. (It should be at least sleep_seconds * max_attempts, but who knows what else adds to it, where someone forgets to add something, what interrupts the sleep, and so on.)

Btw, I take this back: I recently learned the hard way that SECONDS is not monotonic, so this seems fine :-)
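For reference, the iteration-bounded pattern discussed here can be factored into one helper. A sketch only; the name `poll_until` and its argument order are hypothetical. As noted in the thread, the real worst case is max_attempts * sleep_seconds plus however long each check takes:

```bash
# Hypothetical helper: run a check command up to max_attempts times,
# sleeping sleep_seconds between failed attempts. Returns 0 on the first
# success, 1 if every attempt failed.
poll_until() {
    local max_attempts="$1" sleep_seconds="$2"
    shift 2
    local i
    for ((i = 1; i <= max_attempts; i++)); do
        # Run the check exactly as passed (shell function or binary).
        if "$@"; then
            return 0
        fi
        echo "Attempt ${i}/${max_attempts} failed: $*" >&2
        # No point sleeping after the final attempt.
        if ((i < max_attempts)); then
            sleep "$sleep_seconds"
        fi
    done
    return 1
}
```

Usage with a hypothetical check function: `subscription_at_latest() { [[ "$(oc get subscription "$1" -n "$2" -o jsonpath='{.status.state}')" == AtLatestKnown ]]; }` followed by `poll_until 30 10 subscription_at_latest trustee-operator trustee-operator-system` (the subscription name is illustrative).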

ldoktor · 2025-11-11T10:31:43Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+
+    if [ "$subscription_ready" = false ]; then
+        echo "Warning: Subscription '${subscription_name}' did not reach AtLatestKnown state after $((max_attempts * sleep_seconds)) seconds"
+        echo "Please check the subscription status with: oc get subscription ${subscription_name} -n ${namespace} -o yaml"


It'd be nice to print the oc get subscription as part of the script before exiting.

ldoktor · 2025-11-11T10:32:12Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+
+    if [ -z "$csv_name" ]; then
+        echo "Warning: Could not get installedCSV from subscription '${subscription_name}'"
+        echo "Please check the subscription status with: oc get subscription ${subscription_name} -n ${namespace} -o yaml"


The same here: if we know something might help, we should IMO print it and have it as part of the logs.

ldoktor · 2025-11-11T10:34:44Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    for i in $(seq 1 "$max_attempts"); do
+        # Check if CSV (ClusterServiceVersion) is in Succeeded phase with InstallSucceeded reason
+        local csv_status=$(oc get csv "${csv_name}" -n "${namespace}" -o jsonpath='{.status.phase}{.status.reason}' 2>/dev/null || echo "")
+        if [[ "$csv_status" == "SucceededInstallSucceeded" ]]; then


Do we need to check the reason? What if it was installed before and we only updated to the latest?

Also, if you decide to keep the check, would you mind separating them at least with a space?

This is the same check we have in the test automation.
We needed to check both the reason and the phase.
The jsonpath mashes them together.

Sure, but it's possible to put a space in between, something like csv_status=$(oc get csv "${csv_name}" -n "${namespace}" -o jsonpath='{.status.phase} {.status.reason}') (I haven't tested it now, but I've used it before, either like this or with extra quotation).
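With the space-separated jsonpath suggested above, the two fields can be split and compared individually. A sketch, assuming the status string comes from `oc get csv NAME -n NS -o jsonpath='{.status.phase} {.status.reason}'`; the function name `csv_install_succeeded` is hypothetical:

```bash
# Hypothetical helper: succeed only if phase is Succeeded AND reason is
# InstallSucceeded, compared as separate fields instead of one mashed string.
csv_install_succeeded() {
    local status="$1"   # e.g. "Succeeded InstallSucceeded"
    local phase reason
    read -r phase reason <<< "$status"
    [[ "$phase" == "Succeeded" && "$reason" == "InstallSucceeded" ]]
}
```

Splitting the fields also makes it easy to relax the check later (for example to look only at the phase) if the reason turns out to differ after an upgrade.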

ldoktor · 2025-11-11T10:36:26Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    if [ "$installed" = false ]; then
+        echo "Warning: CSV '${csv_name}' did not finish after $((max_attempts * sleep_seconds)) seconds"
+        echo "Please check the subscription status with: oc get subscription ${subscription_name} -n ${namespace}"
+        echo "And check the CSV status with: oc get csv ${csv_name} -n ${namespace} -o yaml"


Please attempt to get and print the useful debug info before exiting.
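One way to address the debug-output requests in this review: a sketch of a helper (hypothetical name `dump_operator_debug`) that prints the objects a human would otherwise be told to inspect, called right before the script gives up. Errors from `oc` are ignored because it runs on a failure path:

```bash
# Hypothetical helper: dump subscription, CSV, install plans and recent
# events into the CI log. Every oc call is allowed to fail.
dump_operator_debug() {
    local namespace="$1" subscription_name="$2" csv_name="${3:-}"
    echo "===== Debug info for namespace ${namespace} ====="
    oc get subscription "$subscription_name" -n "$namespace" -o yaml || true
    # The CSV name may be unknown if installedCSV was never populated.
    if [[ -n "$csv_name" ]]; then
        oc get csv "$csv_name" -n "$namespace" -o yaml || true
    fi
    oc get installplan -n "$namespace" || true
    oc get events -n "$namespace" --sort-by=.lastTimestamp || true
}
```

A call such as `dump_operator_debug "${namespace}" "${subscription_name}" "${csv_name}"` could replace the "Please check ... with: oc get ..." hints in the warning branches.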

ldoktor · 2025-11-11T10:38:18Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    # Create OperatorGroup if it doesn't exist
+    if ! resource_exists "operatorgroup" "trustee-operator-group" "${operator_namespace}"; then
+        echo "Creating OperatorGroup 'trustee-operator-group'..."
+        cat > trustee-operatorgroup.yaml << EOF


Are we likely to use this file later? If not, then I'd suggest using oc apply -f - << EOF (or cat << EOF | oc apply -f -) instead.

... everywhere

Good idea.
It can be useful for debugging, but we delete it later anyway, and the object should be in the cluster if we really need it.
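A sketch of the here-document approach for the OperatorGroup, so no trustee-operatorgroup.yaml is left behind. The manifest is intended to mirror the one in the OSC documentation; the function name `apply_trustee_operatorgroup` is hypothetical:

```bash
# Hypothetical helper: feed the manifest straight into oc apply via stdin.
apply_trustee_operatorgroup() {
    local ns="${1:-trustee-operator-system}"
    oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: trustee-operator-group
  namespace: ${ns}
spec:
  targetNamespaces:
  - ${ns}
EOF
}
```

Note that the here-document has to be attached to `oc apply` itself (or to a `cat` piped into it); in `cat | oc apply -f - << EOF` the here-document replaces oc's stdin and `cat` ends up reading the terminal.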

ldoktor · 2025-11-11T10:41:30Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+        # Create the secret with public key
+        oc create secret generic kbs-auth-public-key --from-file=publicKey -n trustee-operator-system
+
+        echo "Created kbs-auth-public-key secret"


Do we need the key files later? It'd be good practice to delete them after creating the secret otherwise.

They do get deleted at the end, but deleting them ASAP is probably better.

If it were production code, I'd insist on shred directly afterwards. Since it's CI, we should at least use rm, ideally right here.
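A sketch of deleting the key material immediately, assuming nothing later in the script needs privateKey (that is still an open question above). The key-generation commands follow the usual OSC docs pattern for kbs-auth-public-key; the function name `create_kbs_auth_secret` is hypothetical:

```bash
# Hypothetical helper: generate the KBS auth key pair in a private temp dir
# (mktemp -d creates it with mode 0700), create the secret, then remove the
# keys right away instead of at the end of the script.
create_kbs_auth_secret() {
    local ns="${1:-trustee-operator-system}"
    local workdir rc
    workdir=$(mktemp -d) || return 1
    # Subshell so the cd does not leak into the caller.
    (
        cd "$workdir" || exit 1
        openssl genpkey -algorithm ed25519 > privateKey &&
            openssl pkey -in privateKey -pubout -out publicKey &&
            oc create secret generic kbs-auth-public-key --from-file=publicKey -n "$ns"
    )
    rc=$?
    # shred is GNU coreutils; the rm -rf below removes everything regardless.
    shred -u "$workdir"/* 2>/dev/null || true
    rm -rf "$workdir"
    return "$rc"
}
```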

ldoktor · 2025-11-11T15:59:23Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    default allow = false
+EOF
+        # Add allow rules if provided
+        if [ ${#allow_rules[@]} -gt 0 ]; then


I don't think the if is needed here, if the allow_rules is empty, it will simply iterate through nothing and proceed.
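A sketch showing that an empty array just produces zero loop iterations. The rule strings are treated as opaque lines here, and the function name `render_policy` is hypothetical. One caveat: under `set -u`, bash older than 4.4 reports `"${arr[@]}"` of an empty array as an unbound variable, which may be why such guards appear:

```bash
# Hypothetical helper: print the default-deny line followed by any rules.
# With no arguments the loop body never runs; no length check needed.
render_policy() {
    local -a rules=("$@")
    local rule
    echo "default allow = false"
    for rule in "${rules[@]}"; do
        echo "$rule"
    done
}
```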

ldoktor · 2025-11-11T16:03:26Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+
+    echo "Creating attestation token secret..."
+    if resource_exists "secret" "attestation-token"; then
+        echo "Secret 'attestation-token' already exists"


There are many places with the pattern "if it exists, echo that we're using the existing one". The question is: will that work? For some tokens we can read the existing secrets back, but for others we rely on our generated values. If these already exist, will the following steps still be able to reach Trustee? (I'm not talking about this particular step, but about all the others where we accept an existing value. My main concern is whether we should allow that or fail the workflow in such a case.)

I'll need to rethink that too

ldoktor · 2025-11-11T16:07:11Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    "ghcr.io/confidential-containers/test-container-image-rs|sigstoreSigned|kbs:///default/cosign-public-key/test"
+
+if resource_exists "secret" "security-policy"; then
+    echo "Secret 'security-policy' already exists"


For example here, do we want to proceed when this exists? Do we want to use it or overwrite it with our expected secret? Will the testing work with any pre-existing setting?
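One way to make the choice raised here explicit rather than implicit: a sketch with a hypothetical `EXISTING_RESOURCE_POLICY` knob (not part of the PR) that either fails, reuses, or deletes-and-recreates a pre-existing object. Defaulting to `fail` is an illustrative choice favouring reproducible CI, not a decision from the thread:

```bash
# Hypothetical helper, called only when a resource already exists.
# Prints "skip" or "create" on stdout; returns non-zero to abort.
#   fail    - abort the step
#   reuse   - keep the existing object (current script behaviour)
#   replace - delete it so the caller recreates it with expected values
handle_existing_resource() {
    local kind="$1" name="$2" ns="$3"
    case "${EXISTING_RESOURCE_POLICY:-fail}" in
        fail)
            echo "Error: ${kind}/${name} already exists in ${ns}; refusing to reuse it" >&2
            return 1
            ;;
        reuse)
            echo "Reusing existing ${kind}/${name}; values may not match later steps" >&2
            echo skip
            ;;
        replace)
            echo "Deleting existing ${kind}/${name} so it is recreated" >&2
            oc delete "$kind" "$name" -n "$ns" >&2 || return 1
            echo create
            ;;
        *)
            echo "Unknown EXISTING_RESOURCE_POLICY: ${EXISTING_RESOURCE_POLICY}" >&2
            return 2
            ;;
    esac
}
```

A caller could then do `if resource_exists "secret" "security-policy"; then action=$(handle_existing_resource secret security-policy trustee-operator-system) || exit 1; else action=create; fi` and only create the secret when `action` is `create`.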

ldoktor · 2025-11-11T16:08:44Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+    else
+        echo "Warning: kbs-https-certificate secret not found, using placeholder certificate"
+        KBS_CERT="-----BEGIN CERTIFICATE-----
+MIICertificatePlaceholder


How is this cert going to work? Is it usable?
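A placeholder string such as "MIICertificatePlaceholder" cannot be parsed as a certificate, so presumably it cannot be used to verify the KBS endpoint either. A sketch of failing loudly instead; the data key `tls.crt` is an assumption (it is the standard key for kubernetes.io/tls secrets, so check how kbs-https-certificate was actually created), and `get_kbs_cert` is a hypothetical name:

```bash
# Hypothetical helper: read and validate the KBS HTTPS certificate, or fail.
get_kbs_cert() {
    local ns="${1:-trustee-operator-system}"
    local cert
    # ASSUMPTION: the PEM is stored under the "tls.crt" data key.
    cert=$(oc get secret kbs-https-certificate -n "$ns" \
        -o jsonpath='{.data.tls\.crt}' 2>/dev/null | base64 -d 2>/dev/null)
    # Reject empty output and anything openssl cannot parse as X.509.
    if [[ -z "$cert" ]] || ! openssl x509 -noout <<< "$cert" 2>/dev/null; then
        echo "Error: no valid certificate in secret kbs-https-certificate (${ns})" >&2
        return 1
    fi
    printf '%s\n' "$cert"
}
```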

ldoktor · 2025-11-11T16:11:17Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+fi
+
+# Convert initdata.toml to base64 for INITDATA
+INITDATA_STRING=$(cat initdata.toml | gzip| base64 -w0 )


Rather than magically named files created by called functions, I'd prefer calling a function and redirecting its output to a file that I'm going to use there:

create_initdata_config "${TRUSTEE_URL}" "${KBS_CERT}" > initdata.toml ... INITDATA_STRING=$(cat initdata.toml | gzip| base64 -w0 )

What do you think? (I'm afraid people might forget where this came from, and this seems more explicit to me; anyway, not a strong opinion.)

I don't like magic names either. It's always useful to be able to define it.
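A sketch of the suggested shape: the generator writes to stdout and the caller picks the file name. The TOML body is an abbreviated placeholder, not the complete initdata schema (the real content, including the CDH config and policy, should follow the docs); `create_initdata_config` comes from the comment above, while its body and `encode_initdata` are hypothetical:

```bash
# Hypothetical body: print an abbreviated initdata document to stdout.
create_initdata_config() {
    local trustee_url="$1" kbs_cert="$2"
    cat <<EOF
algorithm = "sha384"
version = "0.1.0"

[data]
"aa.toml" = '''
[token_configs.kbs]
url = "${trustee_url}"
cert = """
${kbs_cert}
"""
'''
EOF
}

# gzip + single-line base64, matching the pipeline used in the script.
encode_initdata() {
    gzip -c < "$1" | base64 -w0
}
```

Usage: `create_initdata_config "${TRUSTEE_URL}" "${KBS_CERT}" > initdata.toml` and then `INITDATA_STRING=$(encode_initdata initdata.toml)`, so the file name is visible at the call site.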

ldoktor · 2025-11-11T16:16:22Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+echo ""
+echo "Next steps:"
+echo "1. Trustee operator subscription is automatically handled by this script"
+echo "   - To use a different catalog: TRUSTEE_CATALOG_SOURCE_NAME='my-catalog' ./trustee_configure.sh"


Where does trustee_configure.sh come from? I'm missing some context here (probably obvious, but I haven't deployed Trustee yet).

I created the standalone script in a different directory and tested it. Then I moved it and converted it into a Prow step.
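Since the script was renamed when it became a step, one option is to derive the name at runtime instead of hard-coding trustee_configure.sh in the "Next steps" hint. A sketch; `print_next_steps` is a hypothetical name:

```bash
# Hypothetical helper: print the hint using the script's actual file name.
print_next_steps() {
    # ${0##*/} strips the directory part of $0 (the script's own path).
    local self="${0##*/}"
    echo "Next steps:"
    echo "1. Trustee operator subscription is automatically handled by this script"
    echo "   - To use a different catalog: TRUSTEE_CATALOG_SOURCE_NAME='my-catalog' ./${self}"
}
```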

ldoktor · 2025-11-11T16:18:28Z

...ontainers-operator/trustee-install/sandboxed-containers-operator-trustee-install-commands.sh

+
+# trustee install
+# Installs and configures the Trustee Operator for Confidential Containers
+# Can run standalone or as part of sandboxed-containers-operator-pre chain


Could you please link the documentation based on which this file is generated? Either here or in the steps documentation so it's easier found and verified (also to find more info in case things malfunction). I think it'd be nice to include the actual version so people know what it's based on and eventually we might update it once in a while.

I pulled in the HTML pages from https://docs.redhat.com/en/documentation/openshift_sandboxed_containers/1.10/ (which covers Trustee 0.4.2) and had Cursor use them to generate the script. Then I iterated with Cursor to add things that open up Trustee.

Trustee 1.0 changes things, and there is an internal docs PR with the changes, so I need to revisit the script.

I'm definitely going to break out the trustee bits from this PR!

ldoktor

Thanks, what a beast. Apart from the obvious splitting out of unrelated changes, I left a few comments here and there. It'd be nice to include the documentation link so we know what it's based on, to print debug output on failures so we have at least something to start with, and to fix a few small things here and there.

I have not executed this yet, so functionality-wise I can't say much; hopefully I'll get to that tomorrow. (Is it ready? Is it supposed to work? Is Kubernetes enough, or do I need full OCP?)

openshift-ci-robot · 2025-11-11T20:26:04Z

[REHEARSALNOTIFIER]
@tbuskey: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name	Repo	Type	Reason
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aws-ipi-peerpods	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-aws-ipi-coco	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-coco	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-peerpods	N/A	periodic	Registry content changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-coco	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-aws-ipi-coco	N/A	periodic	Ci-operator config changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-aws-ipi-peerpods	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-azure-ipi-coco	N/A	periodic	Ci-operator config changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-kata	N/A	periodic	Registry content changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-azure-ipi-kata	N/A	periodic	Registry content changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-azure-ipi-peerpods	N/A	periodic	Registry content changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-coco	N/A	periodic	Ci-operator config changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aws-ipi-coco	N/A	periodic	Ci-operator config changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-kata	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-aws-ipi-coco	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate-azure-ipi-peerpods	N/A	periodic	Registry content changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-kata	N/A	periodic	Periodic changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release-aws-ipi-peerpods	N/A	periodic	Registry content changed
periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods	N/A	periodic	Periodic changed

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

openshift-ci · 2025-11-11T20:43:25Z

@tbuskey: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/rehearse/periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods	`d2d4a73`	link	unknown	`/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods`
ci/rehearse/periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods	`d2d4a73`	link	unknown	`/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods`
ci/prow/step-registry-shellcheck	`3c1a6df`	link	true	`/test step-registry-shellcheck`


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

[DEBUG] [DO NOT MERGE] pre-PR (7ee1161)

openshift-ci bot requested review from c3d and jensfr November 6, 2025 19:46

openshift-ci bot added the approved label Nov 6, 2025

openshift-ci bot assigned beraldoleal, ldoktor, vvoronko and wainersm Nov 6, 2025

changes to pass shellcheck (0455db8)

add rpm for 4.19 candidate (d2d4a73)

ldoktor reviewed Nov 11, 2025

changes from internal docs (3c1a6df)

tbuskey closed this Nov 17, 2025
