Use ubuntu-24.04-arm again #1867

Merged 2 commits into GitoxideLabs:main on Feb 27, 2025

EliahKagan
Member

Closes #1780
Fixes #1866

This changes ubuntu-22.04-arm back to ubuntu-24.04-arm in both the test-fast ARM job and the test-32bit ARM job, because the underlying cause of the problems that we changed them to 22.04 to avoid (rust-lang/rust#135867) has been fixed to the extent that it affects GitHub Actions runners. See #1866 for details on that, or actions/partner-runner-images#36 (comment) for a quick summary of the underlying fix.

The first commit here includes a heavy test, running hundreds of matrix jobs, to confirm the above and look for any remaining problems with ubuntu-24.04-arm before switching back to it. The second commit removes the separate workflow for those tests, since we should not run these extra jobs regularly, and since if they are ever useful again then future tests would still need to modify them significantly.

The heavy-test workflow runs, where the first run didn't have fail-fast: false on test-32bit, were:

Enough test jobs were run that failures were to be expected: even outside of ARM runners, there is a low but nonzero rate of nondeterministic failure in gitoxide CI. The failures I observed were as follows:

  • In test-fast on 24.04, in one of the runs that tested a modified tool installation procedure using cargo quickinstall, downloading failed with HTTP 403, even after retries.

  • In test-32bit on 22.04, gix-prompt::prompt ask::askpass_only failed with EOF on this sub-case. I vaguely recall seeing similar failures in these expectrl-based tests, very rarely. But I am not sure. In any case, this is with 22.04, not 24.04.

  • In test-fast on 22.04, gix-worktree-state-tests::worktree state::checkout::dangling_symlinks_can_be_created failed because the probe did not detect the ability to create symlinks. This is due to #1816. It makes the third known occurrence, after #1789 and #1816 (comment). It is interesting because the first two occurrences were with gix-worktree-state-tests::worktree state::checkout::overwriting_files_and_lone_directories_works instead.

  • In test-32bit on 22.04, gix-status-tests::status index_as_worktree_with_renames::changed_and_untracked_and_renamed failed. Examining the actual vs. expected diffs reveals that it is due to #1832:

    <    DirwalkEntry {
    <        rela_path: "dir/untracked",
    <        status: Untracked,
    <        disk_kind: Some(
    <            File,
    <        ),
    >    Rewrite {
    >        source_rela_path: "empty",
    >        dest_rela_path: "dir/untracked",
    >        dest_dirwalk_status: Untracked,
    >        diff: None,
    >        copy: true,
    <    DirwalkEntry {
    <        rela_path: "untracked",
    <        status: Untracked,
    <        disk_kind: Some(
    <            File,
    <        ),
    >    Rewrite {
    >        source_rela_path: "empty",
    >        dest_rela_path: "untracked",
    >        dest_dirwalk_status: Untracked,
    >        diff: None,
    >        copy: true,

These are four failures in hundreds of runs. Only one of the four failures is on 24.04, with a download procedure that we do not use outside of these tests and the experiments that precede them (ci.yml does not use cargo quickinstall) and that I believe I have seen locally a number of times on different platforms. The other three failures were on 22.04, all of them look like they are not runner-specific in any way, and none of them resemble any of the failures that had motivated us to avoid ubuntu-24.04-arm before.
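
To make the "failures were to be expected" reasoning concrete, here is a minimal sketch of how the chance of seeing at least one failure grows with the number of jobs. The per-job failure rate and the job count used below are hypothetical placeholders, not figures measured in these runs, and the calculation assumes that jobs fail independently of one another, which real CI jobs only approximate.

```python
def prob_at_least_one_failure(per_job_failure_rate: float, jobs: int) -> float:
    """Probability that at least one of `jobs` independent jobs fails."""
    return 1.0 - (1.0 - per_job_failure_rate) ** jobs


def expected_failures(per_job_failure_rate: float, jobs: int) -> float:
    """Expected number of failing jobs under the same independence assumption."""
    return per_job_failure_rate * jobs


if __name__ == "__main__":
    # Hypothetical values for illustration only; neither number comes from the PR.
    rate = 0.01
    jobs = 400
    print(f"P(at least one failure) = {prob_at_least_one_failure(rate, jobs):.3f}")
    print(f"Expected failures       = {expected_failures(rate, jobs):.1f}")
```

Under assumptions like these, a handful of failures across hundreds of jobs is the expected outcome even for a healthy runner, so the useful question is whether the failures resemble the earlier ones, not whether any occurred at all.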

Since the known issue looks like it has been fixed by switching the
runners to different virtual machines.

This repeats experiment 3 as well as doing a comparable experiment
for the 32-bit containerized test, which had also failed to run
some commands in Docker, where it was unclear if it was somehow
related or a separate problem.
Member

Byron commented Feb 27, 2025

Thanks so much!

I love the approach taken here, 'simply' testing stability by taking a large number of samples, quickly.
Locally I do the same, typically using hyperfine to run a test binary many times to trigger non-deterministic failures (the last one triggered when a file was created in second N, but was checked in second N+1, so it's a rare occurrence but not at all impossible), but I never dared to think this can be done on CI.
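
The local technique described above, running the same test binary many times and watching for the occasional failure, can be sketched roughly as follows. This is not the hyperfine invocation itself: it is a small stand-in written with Python's standard library, and the function name `find_failures` and the randomly failing demo command are assumptions made purely for illustration.

```python
import subprocess
import sys


def find_failures(cmd, runs):
    """Run `cmd` `runs` times and return (run_index, exit_code) for each nonzero exit."""
    failures = []
    for i in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append((i, result.returncode))
    return failures


if __name__ == "__main__":
    # Hypothetical flaky command standing in for a real test binary:
    # it exits with status 1 roughly 5% of the time.
    flaky = [
        sys.executable,
        "-c",
        "import random, sys; sys.exit(1 if random.random() < 0.05 else 0)",
    ]
    runs = 20
    failures = find_failures(flaky, runs)
    print(f"{len(failures)} of {runs} runs failed: {failures}")
```

In practice the command list would point at the compiled test binary, and the run count would be raised until a rare failure becomes likely to show up.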

Byron merged commit d0ef276 into GitoxideLabs:main on Feb 27, 2025
21 checks passed
EliahKagan deleted the run-ci/arm-segv-revert branch on March 1, 2025 22:49
Successfully merging this pull request may close these issues.

ubuntu-24.04-arm can probably be used again
Container creation sometimes fails in the 32-bit ARM test job