Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(IDX): only enable flaky = True for tests that are >= 1% flaky #4325

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

basvandijk
Copy link
Collaborator

Tests marked with flaky = True are not cached by bazel which means they'll run for every PR regardless of whether they ran successfully before. This hurts CI performance. Therefor we only mark tests as flaky for tests that were actually more than 1% flaky over the last month.

We also remove flaky = False settings since that's the default.

@basvandijk basvandijk added the CI_ALL_BAZEL_TARGETS Runs all bazel targets and uploads them to S3 label Mar 11, 2025
@github-actions github-actions bot added the test label Mar 11, 2025
@basvandijk basvandijk removed the CI_ALL_BAZEL_TARGETS Runs all bazel targets and uploads them to S3 label Mar 11, 2025
@basvandijk basvandijk marked this pull request as ready for review March 11, 2025 17:50
@basvandijk basvandijk requested review from a team as code owners March 11, 2025 17:50
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pull request affects the behavior of any canister owned by
the Governance team, remember to update the corresponding
unreleased_changes.md file(s).

To acknowldge this reminder (and unblock the PR), dismiss this
code review by going to the bottom of the pull request page, and
supply one of the following reasons:

  1. Done.

  2. No canister behavior changes.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pull request affects the behavior of any canister owned by
the Governance team, remember to update the corresponding
unreleased_changes.md file(s).

To acknowldge this reminder (and unblock the PR), dismiss this
code review by going to the bottom of the pull request page, and
supply one of the following reasons:

  1. Done.

  2. No canister behavior changes.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pull request affects the behavior of any canister owned by
the Governance team, remember to update the corresponding
unreleased_changes.md file(s).

To acknowldge this reminder (and unblock the PR), dismiss this
code review by going to the bottom of the pull request page, and
supply one of the following reasons:

  1. Done.

  2. No canister behavior changes.

@basvandijk basvandijk dismissed github-actions[bot]’s stale review March 11, 2025 18:02

No canister behavior changes.

@mbjorkqvist
Copy link
Member

Thanks for your relentless work on improving our testing, and in particular the flakiness @basvandijk! Do you have some concrete numbers or estimates on the impact on CI performance? Looking at the issue from another PoV, this will potentially increase the time that developers need to spend on PRs, either looking into the logs of failed (now non-flaky) tests to figure out if they were caused by their changes, or alternatively simply clicking retry once or twice (possibly with some delay) before deciding to investigate further? Yet another viewpoint is that some of the tests that are now being marked non-flaky are actually not flaky (anymore), with the rare failures being due to e.g., unrelated transient infrastructure issues, in which case removing the flaky flag definitely makes sense.

@basvandijk
Copy link
Collaborator Author

@mbjorkqvist

Do you have some concrete numbers or estimates on the impact on CI performance?

No I don't have concrete numbers nor an estimate. It's hard to predict what effect this will have in practise.

Looking at the issue from another PoV, this will potentially increase the time that developers need to spend on PRs, either looking into the logs of failed (now non-flaky) tests to figure out if they were caused by their changes, or alternatively simply clicking retry once or twice (possibly with some delay) before deciding to investigate further?

True, which is why we limited the removal of flake = True to those tests with a flakiness rate of < 1%.

Yet another viewpoint is that some of the tests that are now being marked non-flaky are actually not flaky (anymore), with the rare failures being due to e.g., unrelated transient infrastructure issues, in which case removing the flaky flag definitely makes sense.

Yes, the vast majority of tests where I removed flaky = True had a flakiness rate of 0%.

Note that we will extend Superset with some new charts that show the tests that were run as part of multiple attempts of a single workflow (i.e. those workflows that were manually retried) and that had both more than 0 failures and more than 0 successes, i.e. flaky tests.

@basvandijk basvandijk enabled auto-merge March 11, 2025 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.