-
Notifications
You must be signed in to change notification settings - Fork 2k
[DEBUG] [DO NOT MERGE] pre-PR #71127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tbuskey The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/assign @beraldoleal |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-release419-azure-ipi-peerpods |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-sandboxed-containers-operator-devel-downstream-candidate419-azure-ipi-peerpods |
|
@tbuskey: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
| subfolder="trustee-fbc/" | ||
| # Trustee Catalog Configuration - get latest or use provided | ||
| if [[ -z "${TRUSTEE_CATALOG_TAG:-}" ]]; then | ||
| TRUSTEE_CATALOG_TAG=$(get_latest_catalog_tag "https://quay.io/api/v1/repository/redhat-user-workloads/ose-osc-tenant/trustee-test-fbc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the regexp changed from "^trustee-fbc-${OCP_VER}-on-push-.*-build-image-index$" to "^[0-9]+\.[0-9]+(\.[0-9]+)?-[0-9]+$", is this fine? (it'd be easier to review if you made incremental changes, this doesn't seem related to the current PR still it's hard to extract what is and what isn't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trustee's catalog is changed to match the format that OSC uses. So, same regexp for both now.
|
|
||
| ## Troubleshooting | ||
|
|
||
| If the step fails: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this section, we should add it to other steps as well :-)
| echo "Waiting for subscription '${subscription_name}' to be ready (state: AtLatestKnown)..." | ||
| local subscription_ready=false | ||
|
|
||
| for i in $(seq 1 "$max_attempts"); do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer looping over a specific deadline than number of iterations, it seems more predictable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too.
There is a deadline though. It sleeps T sec every time so the deadline is T * $max_attempts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with those deadlines is they are very rough and depend on whether things in between don't hang/fail/someone changes them, etc. (it should be at least sleep_seconds * max_attempts, but who knows what else adds and where someone forgets to put something, what interrupts the sleep and so on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw I take this back as I recently learned the hard way that SECONDS is not monotonic. So this seems fine :-)
|
|
||
| if [ "$subscription_ready" = false ]; then | ||
| echo "Warning: Subscription '${subscription_name}' did not reach AtLatestKnown state after $((max_attempts * sleep_seconds)) seconds" | ||
| echo "Please check the subscription status with: oc get subscription ${subscription_name} -n ${namespace} -o yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to print the oc get subscription as part of the script before exiting.
|
|
||
| if [ -z "$csv_name" ]; then | ||
| echo "Warning: Could not get installedCSV from subscription '${subscription_name}'" | ||
| echo "Please check the subscription status with: oc get subscription ${subscription_name} -n ${namespace} -o yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here, if we know something might help, we should IMO print it and have it as part of the logs.
| for i in $(seq 1 "$max_attempts"); do | ||
| # Check if CSV (ClusterServiceVersion) is in Succeeded phase with InstallSucceeded reason | ||
| local csv_status=$(oc get csv "${csv_name}" -n "${namespace}" -o jsonpath='{.status.phase}{.status.reason}' 2>/dev/null || echo "") | ||
| if [[ "$csv_status" == "SucceededInstallSucceeded" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check the reason? What if it was installed before and we only updated to the latest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if you decide to keep the check, would you mind splitting them at least by space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same check we have in the test automation.
We needed to check the reason and phase.
The jsonpath mashes them together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but it's possible to put there a space in between, something like csv_status=$(oc get csv "${csv_name}" -n "${namespace}" -o jsonpath='{.status.phase} {.status.reason}' (haven't tested it now, but used it before, either like this or with extra quotation)
| if [ "$installed" = false ]; then | ||
| echo "Warning: CSV '${csv_name}' did not finish after $((max_attempts * sleep_seconds)) seconds" | ||
| echo "Please check the subscription status with: oc get subscription ${subscription_name} -n ${namespace}" | ||
| echo "And check the CSV status with: oc get csv ${csv_name} -n ${namespace} -o yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please attempt to get&print the useful debug info before exiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree!
| # Create OperatorGroup if it doesn't exist | ||
| if ! resource_exists "operatorgroup" "trustee-operator-group" "${operator_namespace}"; then | ||
| echo "Creating OperatorGroup 'trustee-operator-group'..." | ||
| cat > trustee-operatorgroup.yaml << EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we likely to use this file later? If not than I'd suggest using cat | oc apply -f - << EOF instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
It can be useful for debugging, but we delete later & it should be in the cluster if we really needed it
| # Create the secret with public key | ||
| oc create secret generic kbs-auth-public-key --from-file=publicKey -n trustee-operator-system | ||
|
|
||
| echo "Created kbs-auth-public-key secret" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the key files later? It'd be good practice to delete them after creating the secret otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do get deleted at the end, but deleting them ASAP is probably better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was production code, I'd insist on shred directly afterwards. Since it's CI we should at least use rm and ideally here
| default allow = false | ||
| EOF | ||
| # Add allow rules if provided | ||
| if [ ${#allow_rules[@]} -gt 0 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the if is needed here, if the allow_rules is empty, it will simply iterate through nothing and proceed.
|
|
||
| echo "Creating attestation token secret..." | ||
| if resource_exists "secret" "attestation-token"; then | ||
| echo "Secret 'attestation-token' already exists" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many places where if exists; echo we're using existing.... The question is, will it work? What I mean is for some tokens we can get the existing secrets, but for some we rely on our generated values. If these exist in before, will the following steps be able to get to the trustee? (not talking about this particular steps, but about all the other where we accept existing value, my main concern is whether we should allow that or fail the workflow in such case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to rethink that too
| "ghcr.io/confidential-containers/test-container-image-rs|sigstoreSigned|kbs:///default/cosign-public-key/test" | ||
|
|
||
| if resource_exists "secret" "security-policy"; then | ||
| echo "Secret 'security-policy' already exists" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example here, do we want to proceed when this exists? Do we want to use it or overwrite it with our expected secret? Will the testing work with any pre-existing setting?
| else | ||
| echo "Warning: kbs-https-certificate secret not found, using placeholder certificate" | ||
| KBS_CERT="-----BEGIN CERTIFICATE----- | ||
| MIICertificatePlaceholder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this cert gona work? Is it usable?
| fi | ||
|
|
||
| # Convert initdata.toml to base64 for INITDATA | ||
| INITDATA_STRING=$(cat initdata.toml | gzip| base64 -w0 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than magically named files created by called functions I'd prefer calling a function and redirect it to file which I'm going to use there:
create_initdata_config "${TRUSTEE_URL}" "${KBS_CERT}" > initdata.toml
...
INITDATA_STRING=$(cat initdata.toml | gzip| base64 -w0 )
What do you think? (I'm afraid people might forget where this came from and this seems more explicit to me, anyway not a strong opinion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like magic names either. Its always useful to be able to define it
| echo "" | ||
| echo "Next steps:" | ||
| echo "1. Trustee operator subscription is automatically handled by this script" | ||
| echo " - To use a different catalog: TRUSTEE_CATALOG_SOURCE_NAME='my-catalog' ./trustee_configure.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the trustee_configure.sh comes from? I'm missing some context here (probably obvious but I haven't deployed trustee yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created the standalone script in a different directory and tested. Then moved & converted it into a prow step.
|
|
||
| # trustee install | ||
| # Installs and configures the Trustee Operator for Confidential Containers | ||
| # Can run standalone or as part of sandboxed-containers-operator-pre chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please link the documentation based on which this file is generated? Either here or in the steps documentation so it's easier found and verified (also to find more info in case things malfunction). I think it'd be nice to include the actual version so people know what it's based on and eventually we might update it once in a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled in the htmls from https://docs.redhat.com/en/documentation/openshift_sandboxed_containers/1.10/ (which is trustee 0.4.2) and had cursor use it to generate the script. Then I iterated w/ cursor to add things to open up for trustee.
Trustee 1.0 changes things & there is an internal docs PR with the changes. So I need to revisit the script.
I'm definitely going to break out the trustee bits from this PR!
ldoktor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, what a beast. Except of the obvious splitting the unrelated changes I left a few comments here and there. It'd be nice to include the documentation link to know what it's based on, then it'd be nice to print debug output in case of failures so we get at least something to start with and a few little things here and there.
I have not executed this yet so functionally-wise I can't tell much, hopefully tomorrow I'll get to that (is it ready, is it suppose to work? Is kubernetes enough or do I need full OCP?)
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
@tbuskey: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Pre PR of the whole mess to be redone properly. Think of it as a heads up.
This installs trustee onto the cluster and configures it.
Helped by cursor so lots of iterations.
sandboxed-containers-operator-trustee-install-commands.sh can run against an existing cluster to install/config trustee. It assumes sandboxed-containers-operator-env-cm-commands.sh in prow configured things for coco.
It should create a script that has the encrypted INITDATA for peer-pods-cm and will patch it. So it could be used on another cluster.
Changes sandboxed-containers-operator-create-prowjob-commands.sh to allow it to create
openshift-sandboxed-containers-operator-devel__downstream-release.yamlandopenshift-sandboxed-containers-operator-devel__downstream-candidate.yaml. We should use the script to update those files and not edit them directly.sandboxed-containers-operator-create-prowjob-commands.sh needs more of the options for sandboxed-containers-operator-trustee-install-commands.sh added.
Readme files haven't been reviewed yet