AGENT-1533: Increase iso-no-registry time by moving out of generic workflow #80124

Open
bfournie wants to merge 1 commit into openshift:main from bfournie:agent-installer-increase-timeout

@bfournie
Contributor

@bfournie bfournie commented Jun 4, 2026

Split agent_create_cluster out of baremetalds-devscripts-setup into a new dedicated step (agent-e2e-iso-no-registry-create-cluster) with a 3h timeout, so ISO_NO_REGISTRY jobs are no longer killed by the 2h default step timeout before cluster installation completes.

Summary by CodeRabbit

This PR updates OpenShift CI configuration in the openshift/release repository to prevent ISO_NO_REGISTRY agent-installer jobs from being killed by the default 2-hour step timeout by moving cluster creation into a separate step with a longer timeout.

What changed and practical impact:

  • New dedicated create-cluster step:

    • Adds a ci-operator step-registry ref (agent-e2e-iso-no-registry-create-cluster) and a commands script that runs the agent create-cluster flow over SSH, captures kubeconfig and kubeadmin credentials, injects a proxy URL when needed, gathers agent logs/artifacts, redacts secrets from outputs, produces install-status and console.url artifacts, and writes dev-scripts variables for downstream steps.
    • The ref sets a 3h timeout (with a 10m grace period) and resource requests, and documents DEVSCRIPTS_CONFIG. This gives ISO_NO_REGISTRY installs more time to complete without being killed by the default step timeout.
  • Workflow and preparation split:

    • The generic agent iso-no-registry workflow now invokes the new create-cluster ref in its pre section (agent-e2e-iso-no-registry-create-cluster) alongside the existing pre steps.
    • DEVSCRIPTS_TARGET in the workflow was expanded into an explicit sequence of dev-scripts targets (agent_requirements, requirements, configure, agent_build_installer, agent_prepare_release, agent_configure) to separate preparation tasks from the time-consuming cluster creation.
  • Dev-scripts setup adjustments:

    • The baremetalds-devscripts-setup commands now gate post-install actions (copying /tmp/ds-vars.conf, publishing console URL, log capture) on the presence of the remote kubeconfig to avoid duplicate or premature post-install steps for jobs using the separate create-cluster step (e.g., ISO_NO_REGISTRY).
  • Release job timeout edits:

    • Removed explicit 4h per-job timeouts from two periodic nightly job entries (e2e-agent-ha-dualstack-iso-no-registry-techpreview and e2e-agent-ha5-dualstack-iso-no-registry-techpreview), relying on step-level timeouts instead.
  • OWNERS updates:

    • Added approvers to iso-no-registry and create-cluster OWNERS (andfasano, bfournie, pamoedom, pawanpinjarkar, rwsu, zaneb).

Why this matters:

  • Extracting agent_create_cluster into its own step with a 3-hour timeout prevents ISO_NO_REGISTRY CI runs from being terminated by the default 2-hour step timeout, while preserving artifact collection, log capture, and secret redaction for debugging and reporting.
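
The kubeconfig gate described in the summary can be illustrated with a small shell sketch. This is not the code from the PR: the function names `kubeconfig_present` and `post_install_if_ready` are hypothetical, and the check runs against a local path so the sketch can be executed anywhere, whereas the summary says the real script looks for the kubeconfig on the remote host.

```shell
# Hypothetical sketch: run post-install actions only when a kubeconfig exists.
# Function names and messages are illustrative, not taken from the PR.

# Succeeds if the file at $1 exists and is non-empty.
kubeconfig_present() {
  [ -s "$1" ]
}

# Decides whether the shared setup step should perform post-install actions.
post_install_if_ready() {
  if kubeconfig_present "$1"; then
    echo "kubeconfig found: running post-install actions"
  else
    echo "kubeconfig missing: skipping post-install actions"
  fi
}
```

With a gate like this, a job that creates the cluster in a later step would skip the post-install branch in the shared setup step, because no kubeconfig has been written yet.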

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Jun 4, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jun 4, 2026

@bfournie: This pull request references AGENT-1533 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Details

In response to this:

… workflow

Split agent_create_cluster out of baremetalds-devscripts-setup into a new dedicated step (agent-e2e-iso-no-registry-create-cluster) with a 3h timeout, so ISO_NO_REGISTRY jobs are no longer killed by the 2h default step timeout before cluster installation completes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@bfournie bfournie changed the title from "AGENT-1533: Increase iso-no-registry timeout by moving out of generic…" to "AGENT-1533: Increase iso-no-registry time by moving out of generic workflow" Jun 4, 2026
@openshift-ci openshift-ci Bot requested review from neisw and stbenjam June 4, 2026 20:00
@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a279755a-e002-4e2e-acac-920fb69e84cc

📥 Commits

Reviewing files that changed from the base of the PR and between f419ff9 and 2d7df0a.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/release/openshift-release-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (8)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-5.0.yaml
  • ci-operator/step-registry/agent/e2e/generic/conformance/iso-no-registry/agent-e2e-generic-conformance-iso-no-registry-workflow.yaml
  • ci-operator/step-registry/agent/e2e/iso-no-registry/OWNERS
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/OWNERS
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-commands.sh
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-ref.metadata.json
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-ref.yaml
  • ci-operator/step-registry/baremetalds/devscripts/setup/baremetalds-devscripts-setup-commands.sh
💤 Files with no reviewable changes (1)
  • ci-operator/config/openshift/release/openshift-release-main__nightly-5.0.yaml
✅ Files skipped from review due to trivial changes (2)
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/OWNERS
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-ref.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-ref.metadata.json
  • ci-operator/step-registry/agent/e2e/generic/conformance/iso-no-registry/agent-e2e-generic-conformance-iso-no-registry-workflow.yaml
  • ci-operator/step-registry/baremetalds/devscripts/setup/baremetalds-devscripts-setup-commands.sh

Walkthrough

This PR separates the agent ISO-no-registry cluster creation into a dedicated step with independent timeout management. It adds a new SSH-based create-cluster commands script and step-ref, expands workflow dev-scripts targets and pre-steps to invoke the new step, gates shared post-install artifact publication on kubeconfig presence, removes two explicit test timeouts, and updates OWNERS files.

Changes

Agent ISO-no-registry cluster creation step separation

Layer: Cluster creation orchestration and artifacts
File(s): ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-commands.sh, ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-ref.yaml, ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/OWNERS, ci-operator/step-registry/agent/e2e/iso-no-registry/create-cluster/agent-e2e-iso-no-registry-create-cluster-ref.metadata.json
Summary: Adds a new SSH-driven agent_create_cluster commands script with getExtraVal() and finished() handlers to capture kubeconfig, credentials, logs, redact secrets, copy dev-scripts variables, and write console.url; adds a step-registry ref and metadata with timeout/resource/env; adds approvers for this step.

Layer: Workflow wiring and OWNERS
File(s): ci-operator/step-registry/agent/e2e/generic/conformance/iso-no-registry/agent-e2e-generic-conformance-iso-no-registry-workflow.yaml, ci-operator/step-registry/agent/e2e/iso-no-registry/OWNERS
Summary: Expands DEVSCRIPTS_TARGET to a multi-stage agent setup/build/release/configure sequence and inserts agent-e2e-iso-no-registry-create-cluster into the workflow pre chain; updates iso-no-registry OWNERS approvers.

Layer: Conditional post-install steps in shared setup and timeout removals
File(s): ci-operator/step-registry/baremetalds/devscripts/setup/baremetalds-devscripts-setup-commands.sh, ci-operator/config/openshift/release/openshift-release-main__nightly-5.0.yaml
Summary: Gates post-install artifact sharing (ds-vars and console.url upload) on remote kubeconfig existence to skip when cluster creation runs in a separate step; removes two explicit timeout: 4h0m0s entries from nightly release job definitions.
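
The walkthrough mentions a finished() handler and an install-status artifact, but the script itself is not shown in this conversation. As a guess at the general pattern only, an exit handler that records the step's exit status could be sketched with a shell EXIT trap. The helper name `record_status_on_exit` and the status file path are hypothetical.

```shell
# Hypothetical sketch of an exit handler; not the PR's finished() function.

# Registers an EXIT trap that writes the shell's exit status to the file in $1.
record_status_on_exit() {
  status_file="$1"
  # The trap body is evaluated at exit, when $? still holds the exit status.
  trap 'echo "$?" > "${status_file}"' EXIT
}
```

Because the trap fires on any exit path, a later step can read the status file even when the install command failed partway through.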

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • stbenjam
  • andfasano
  • pawanpinjarkar
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main objective: extracting agent_create_cluster into a dedicated step with extended timeout to prevent timeouts in iso-no-registry jobs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR contains no Ginkgo test definitions. Changes are CI/CD infrastructure configuration (YAML, shell scripts, OWNERS files), making this check inapplicable.
Test Structure And Quality ✅ Passed This PR contains no Ginkgo test files. It only modifies workflow YAML, bash scripts, configuration files, and OWNERS files. The check is not applicable.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests were added in this PR. Changes are limited to CI/CD infrastructure (bash scripts, YAML configs, OWNERS files) and do not include any test code.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. Changes consist of CI/CD infrastructure (YAML workflows, shell scripts, configs) only, making the SNO test compatibility check inapplicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only CI test infrastructure (step-registry, workflows, test config) in ci-operator/. No deployment manifests, operator code, or pod scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed OTE Binary Stdout Contract check is not applicable: PR contains only YAML workflows, bash scripts, and configuration files with no Go code or test binaries.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR does not add any Ginkgo e2e tests (Go test files). Changes are CI/CD configuration files (YAML, shell scripts, OWNERS) only. The IPv6/disconnected network compatibility check is not applicable.
No-Weak-Crypto ✅ Passed No weak cryptography patterns detected. Modified files contain no MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, or custom crypto implementations. No non-constant-time secret comparisons found.
Container-Privileges ✅ Passed No K8s container privilege escalation settings found. The --privileged flag in bash script is Docker/Podman CLI, not K8s securityContext.
No-Sensitive-Data-In-Logs ✅ Passed All SSH output is piped through sed redaction filters removing auth tokens, passwords, and secrets. Logs are post-processed to redact sensitive fields. No unredacted credentials detected.
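
The No-Sensitive-Data-In-Logs check above refers to piping SSH output through sed redaction filters. The actual filters are not quoted in this conversation, so the following is only a minimal sketch of the idea. The function name `redact_secrets` and both patterns are assumptions chosen for illustration.

```shell
# Hypothetical redaction filter; the patterns are illustrative only.
# Reads text on stdin and writes it to stdout with secret values masked.
redact_secrets() {
  sed -E \
    -e 's/(password: ).*/\1REDACTED/' \
    -e 's/("auth": *")[^"]*/\1REDACTED/g'
}
```

A command's output would be piped through the filter before it reaches the job log, for example `some_command 2>&1 | redact_secrets`, where `some_command` stands in for whatever is being run.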

@bfournie bfournie force-pushed the agent-installer-increase-timeout branch from b60c134 to 478b7f8 June 4, 2026 20:16
@bfournie
Contributor Author

bfournie commented Jun 4, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-compact-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie bfournie force-pushed the agent-installer-increase-timeout branch from 478b7f8 to f419ff9 June 4, 2026 20:30
@openshift-ci
Contributor

openshift-ci Bot commented Jun 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bfournie
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

… workflow

Split agent_create_cluster out of baremetalds-devscripts-setup into a new
dedicated step (agent-e2e-iso-no-registry-create-cluster) with a 3h timeout,
so ISO_NO_REGISTRY jobs are no longer killed by the 2h default step timeout
before cluster installation completes.
@bfournie bfournie force-pushed the agent-installer-increase-timeout branch from f419ff9 to 2d7df0a June 4, 2026 21:11
@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@bfournie: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-openshift-frr-main-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-main-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-5.1-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-5.1-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-5.0-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-5.0-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.23-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.23-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.22-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.22-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.21-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.21-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.20-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.20-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.19-metallb-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.19-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.18-metallb-e2e-metal-frrk8s openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.18-frrk8s-e2e-metal openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.18-frrk8s-e2e-metal-cno openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.17-metallb-e2e-metal-frrk8s openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.17-metallb-e2e-metal-ipi-ovn openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.16-metallb-e2e-metal-frrk8s openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.16-metallb-e2e-metal-ipi-ovn openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.15-metallb-e2e-metal-ipi openshift/frr presubmit Registry content changed
pull-ci-openshift-frr-release-4.15-metallb-e2e-metal-ipi-ovn openshift/frr presubmit Registry content changed

A total of 4361 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@bfournie
Contributor Author

bfournie commented Jun 4, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-compact-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 4, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha5-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha5-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha5-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@bfournie
Contributor Author

bfournie commented Jun 5, 2026

/pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

@openshift-merge-bot
Contributor

@bfournie: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Contributor

openshift-ci Bot commented Jun 6, 2026

@bfournie: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview 2d7df0a link unknown /pj-rehearse periodic-ci-openshift-release-main-nightly-5.0-e2e-agent-ha-dualstack-iso-no-registry-techpreview

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bfournie
Contributor Author

bfournie commented Jun 6, 2026

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Jun 6, 2026

Labels

do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.