Skip to content

Enforce that PR CI jobs are a subset of Auto CI jobs modulo carve-outs #144244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jieyouxu
Copy link
Member

@jieyouxu jieyouxu commented Jul 21, 2025

Background

Currently, it is possible for a PR with red PR-only CI to pass Auto CI, then all subsequent PR CI runs will be red until that is fixed, even in completely unrelated PRs. For instance, this happened with PR-CI-only Spellcheck (#144183).

See more discussions at #t-infra > Spellcheck workflow now fails on all PRs (tree bad?).

CI invariant: PR CI jobs are a subset of Auto CI jobs modulo carve-outs

To prevent red PR CI in completely unrelated subsequent PRs and PR CI runs, we need to maintain an invariant that PR CI jobs are a subset of Auto CI jobs modulo carve-outs.

This is not a "strict" subset relationship: some jobs necessarily have to differ under PR CI and Auto CI environments, at least in the current setup. Still, we can try to enforce a weaker "subset modulo carve-outs" relationship between CI jobs and their corresponding Auto jobs. For instance:

  • x86_64-gnu-tools will have auto-only env vars like DEPLOY_TOOLSTATES_JSON: toolstates-linux.json.
  • tidy will want to continue_on_error: true in PR CI to allow for more "useful" compilation errors to also be reported, whereas it should be continue_on_error: false in Auto CI to prevent wasting Auto CI resources.

The carve-outs are:

  1. env variables.
  2. continue_on_error.

We enforce this invariant through citool, so only affects job definitions that are handled by citool. Notably, this is not sufficient alone to address the CI-only Spellcheck issue (#144183). To carry out this enforcement, we modify citool to auto-register PR jobs as Auto jobs with continue_on_error overridden to false unless there's an overriding Auto job for the PR job of the same name that only differs by the permitted carve-outs.

Addressing the Spellcheck PR-only CI issue

Note that Spellcheck currently does not go through citool or bootstrap, and is its own GitHub Actions workflow. To actually address the PR-CI-only Spellcheck issue (#144183), and carry out the subset-modulo-carve-outs enforcement universally, this PR additionally removes the current Spellcheck implementation (a separate GitHub Actions Workflow). That is incompatible with Homu unless we do some hacks in the main CI workflow.

This effectively partially reverts #134006 (the separate workflow part, not the tidy extra checks component), but is not prejudice against relanding the typos-based spellcheck in another implementation that goes through the usual bootstrap CI workflow so that it does work with Homu. The typos-based spellcheck seems to have a good false-positive rate.

Closes #144183.


r? infra-ci

@jieyouxu
Copy link
Member Author

FYI @rust-lang/infra as this slightly changes the main r-l/r CI setup.

@jieyouxu jieyouxu added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. A-CI Area: Our Github Actions CI labels Jul 21, 2025
@@ -6,7 +6,7 @@ const TEST_JOBS_YML_PATH: &str = concat!(env!("CARGO_MANIFEST_DIR"), "/tests/tes
fn auto_jobs() {
let stdout = get_matrix("push", "commit", "refs/heads/auto");
insta::assert_snapshot!(stdout, @r#"
jobs=[{"name":"aarch64-gnu","full_name":"auto - aarch64-gnu","os":"ubuntu-22.04-arm","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","TOOLSTATE_PUBLISH":1},"free_disk":true},{"name":"x86_64-gnu-llvm-18-1","full_name":"auto - x86_64-gnu-llvm-18-1","os":"ubuntu-24.04","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","DOCKER_SCRIPT":"stage_2_test_set1.sh","IMAGE":"x86_64-gnu-llvm-18","READ_ONLY_SRC":"0","RUST_BACKTRACE":1,"TOOLSTATE_PUBLISH":1},"free_disk":true},{"name":"aarch64-apple","full_name":"auto - aarch64-apple","os":"macos-14","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","MACOSX_DEPLOYMENT_TARGET":11.0,"MACOSX_STD_DEPLOYMENT_TARGET":11.0,"NO_DEBUG_ASSERTIONS":1,"NO_LLVM_ASSERTIONS":1,"NO_OVERFLOW_CHECKS":1,"RUSTC_RETRY_LINKER_ON_SEGFAULT":1,"RUST_CONFIGURE_ARGS":"--enable-sanitizers --enable-profiler --set rust.jemalloc","SCRIPT":"./x.py --stage 2 test --host=aarch64-apple-darwin --target=aarch64-apple-darwin","SELECT_XCODE":"/Applications/Xcode_15.4.app","TOOLSTATE_PUBLISH":1,"USE_XCODE_CLANG":1}},{"name":"dist-i686-msvc","full_name":"auto - dist-i686-msvc","os":"windows-2022","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","CODEGEN_BACKENDS":"llvm,cranelift","DEPLOY_BUCKET":"rust-lang-ci2","DIST_REQUIRE_ALL_TOOLS":1,"RUST_CONFIGURE_ARGS":"--build=i686-pc-windows-msvc --host=i686-pc-windows-msvc --target=i686-pc-windows-msvc,i586-pc-windows-msvc --enable-full-tools --enable-profiler","SCRIPT":"python x.py dist bootstrap --include-default-paths","TOOLSTATE_PUBLISH":1}}]
jobs=[{"name":"aarch64-gnu","full_name":"auto - aarch64-gnu","os":"ubuntu-22.04-arm","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","TOOLSTATE_PUBLISH":1},"free_disk":true},{"name":"x86_64-gnu-llvm-18-1","full_name":"auto - x86_64-gnu-llvm-18-1","os":"ubuntu-24.04","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","DOCKER_SCRIPT":"stage_2_test_set1.sh","IMAGE":"x86_64-gnu-llvm-18","READ_ONLY_SRC":"0","RUST_BACKTRACE":1,"TOOLSTATE_PUBLISH":1},"free_disk":true},{"name":"aarch64-apple","full_name":"auto - aarch64-apple","os":"macos-14","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","MACOSX_DEPLOYMENT_TARGET":11.0,"MACOSX_STD_DEPLOYMENT_TARGET":11.0,"NO_DEBUG_ASSERTIONS":1,"NO_LLVM_ASSERTIONS":1,"NO_OVERFLOW_CHECKS":1,"RUSTC_RETRY_LINKER_ON_SEGFAULT":1,"RUST_CONFIGURE_ARGS":"--enable-sanitizers --enable-profiler --set rust.jemalloc","SCRIPT":"./x.py --stage 2 test --host=aarch64-apple-darwin --target=aarch64-apple-darwin","SELECT_XCODE":"/Applications/Xcode_15.4.app","TOOLSTATE_PUBLISH":1,"USE_XCODE_CLANG":1}},{"name":"dist-i686-msvc","full_name":"auto - dist-i686-msvc","os":"windows-2022","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","CODEGEN_BACKENDS":"llvm,cranelift","DEPLOY_BUCKET":"rust-lang-ci2","DIST_REQUIRE_ALL_TOOLS":1,"RUST_CONFIGURE_ARGS":"--build=i686-pc-windows-msvc --host=i686-pc-windows-msvc --target=i686-pc-windows-msvc,i586-pc-windows-msvc --enable-full-tools --enable-profiler","SCRIPT":"python x.py dist bootstrap --include-default-paths","TOOLSTATE_PUBLISH":1}},{"name":"pr-check-1","full_name":"auto - pr-check-1","os":"ubuntu-24.04","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","TOOLSTATE_PUBLISH":1},"continue_on_error":false,"free_disk":true},{"name":"pr-check-2","full_name":"auto - pr-check-2","os":"ubuntu-24.04","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","TOOLSTATE_PUBLISH":1},"continue_on_error":false,"free_disk":true},{"name":"tidy","full_name":"auto - tidy","os":"ubuntu-24.04","env":{"ARTIFACTS_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZN24CBO55","AWS_REGION":"us-west-1","CACHES_AWS_ACCESS_KEY_ID":"AKIA46X5W6CZI5DHEBFL","DEPLOY_BUCKET":"rust-lang-ci2","TOOLSTATE_PUBLISH":1},"continue_on_error":false,"free_disk":true,"doc_url":"https://foo.bar"}]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark: this is not very review-friendly, but the diff is that the 3 PR-only jobs were auto-registered in Auto jobs...

@jieyouxu
Copy link
Member Author

FYI @klensy @nnethercote as this partially reverts the separate GHA workflow introduced by #134006. I didn't remove the tidy extra check, that seems fine if @klensy or someone else wants to work on a reland that doesn't use a separate GHA workflow.

Copy link
Member

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really took this seriously 😆 Pretty nice.

for pr_job in &db.pr_jobs {
let Some(auto_job) = db.auto_jobs.iter().find(|auto_job| auto_job.name == pr_job.name)
else {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point this should be a hard error, right? Otherwise we missed adding some PR jobs to auto jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also db.auto_jobs.iter().find(|auto_job| auto_job.name == pr_job.name) repeats a lot, might be worth adding some helper function for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let me double-check this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point this should be a hard error, right? Otherwise we missed adding some PR jobs to auto jobs.

I changed this to a panic, because it shouldn't be reachable at this point.

Also db.auto_jobs.iter().find(|auto_job| auto_job.name == pr_job.name) repeats a lot, might be worth adding some helper function for it.

Add JobDatabase::find_auto_job_by_name helper.

jieyouxu added 3 commits July 21, 2025 18:06
To prevent possibility of a PR with red PR-only CI passing Auto CI, then
all subsequent PR CI runs will be red until that is fixed.

Note that this is **not** a "strict" subset relationship: some jobs
necessarily have to differ under PR CI and Auto CI environments. For
instance:

- `x86_64-gnu-tools` will have auto-only env vars like
  `DEPLOY_TOOLSTATES_JSON: toolstates-linux.json`.
- `tidy` will want to `continue_on_error: true` in PR CI to allow for
  more "useful" compilation errors to also be reported, whereas it needs
  to `continue_on_error: false` in Auto CI to prevent wasting Auto CI
  resources.

The carve-outs are:

1. `env` variables.
2. `continue_on_error`.
Unfortunately, the separate spellcheck GHA workflow does not really work
with homu, if we would like to enforce the invariant that PR CI is a
subset of Full CI (modulo carve outs).

This is not prejudice against a reland of a `typos`-based spellcheck, it
just probably has to go through the "usual" CI flow with bootstrap, so
that it can work with homu.
@rustbot rustbot added the A-testsuite Area: The testsuite used to check the correctness of rustc label Jul 21, 2025
@jieyouxu
Copy link
Member Author

jieyouxu commented Jul 21, 2025

Changes since last review: https://github.com/rust-lang/rust/compare/4b597f074fcdda86c5ce64693da972e61e42a352..523594d7aec3293c6f1926e4f1edcec141ef496b

  • Removed outdated comment.
  • Added helper.
  • Changed continue; to panic, should be unreachable at that point.

@Kobzol
Copy link
Member

Kobzol commented Jul 21, 2025

Thanks! You can r=me once CI is green.

@jieyouxu
Copy link
Member Author

Infra team vibe-check: #t-infra > meeting 2025-07-21 @ 💬. Noted that env is quite a big carve-out (as it can substantially influence what gets run or not...), but still stricter than before. Also discussed maybe using alternative schemes for non-blocking jobs. I'll open a follow-up issue tracking differences between PR CI and Auto CI.

PR CI is 🍏, so
@bors r=Kobzol rollup=never

@bors
Copy link
Collaborator

bors commented Jul 21, 2025

📌 Commit 523594d has been approved by Kobzol

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-CI Area: Our Github Actions CI A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spellcheck is PR CI only
4 participants