Test | Add flaky test quarantine zone #3856
Pull request overview
This PR simplifies test filtering to support a quarantine zone for flaky tests by updating the --filter expressions in test execution targets. The changes remove platform-specific category exclusions and standardize filtering to only exclude tests marked as "failing" or "flaky" across all test types (Functional and Manual).
- Simplifies test filters to exclude only "failing" and "flaky" categories
- Removes platform-specific category filters that are no longer needed
- Applies consistent filtering logic across Functional and Manual test targets for both Windows and Unix platforms
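As a rough illustration of the simplified filter described above, a test step might look like the sketch below. The project path and display name are hypothetical, and the trait key `category` is an assumption about how the tests are tagged; the PR's actual templates may differ.

```yaml
steps:
  # Main (blocking) run: exclude anything tagged as failing or flaky.
  # The project path is a placeholder, not a file from this PR.
  - script: >-
      dotnet test tests/Example.FunctionalTests/Example.FunctionalTests.csproj
      --filter "category!=failing&category!=flaky"
    displayName: 'Run Functional Tests (excluding failing and flaky)'
```

Because the expression no longer mentions platform-specific categories, the same step can be reused unchanged for the Windows and Unix targets.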
Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
Copilot reviewed 8 out of 8 changed files in this pull request and generated 13 comments.
eng/pipelines/common/templates/steps/build-and-run-tests-netcore-step.yml
Can we also include the diagnostic testing please? These seemed to be flaky on ARM64.
paulmedynski left a comment
Approved, assuming the Copilot comments will be addressed.
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| Metric   | main   | #3856  | +/-     |
|----------|--------|--------|---------|
| Coverage | 90.82% | 69.68% | -21.15% |
| Files    | 6      | 254    | +248    |
| Lines    | 316    | 64067  | +63751  |
| Hits     | 287    | 44642  | +44355  |
| Misses   | 29     | 19425  | +19396  |
At the moment, we have a number of flaky tests in our pipelines (a couple of which I'll take credit for) that cause random failures, necessitate pipeline reruns, and cheapen the value of our regression testing suite. When tests fail regularly, we pay less attention when we see a failing test. A common pattern to deal with flaky tests is to establish a "quarantine zone" where these tests can live temporarily until they're improved. The quarantine zone is separate from the main build pipelines and does not block main builds, but runs regularly so that we can keep an eye on how flaky tests are performing.
To establish a quarantine zone, I'm planning to use the "flaky" category introduced in #3488. Tests in this category will run in a separate testing stage immediately after the corresponding Unit/Functional/Manual testing stage (e.g. Unit => UnitFlaky => Functional => FunctionalFlaky ...). The flaky test stage will not block the pipeline; its failures will be ignored via the continueOnError property (see "steps.task definition" on Microsoft Learn).
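A minimal sketch of what a quarantine step could look like, assuming the flaky tests are selected with a `category=flaky` filter. The project path and display name are hypothetical, and the real templates in this PR may be structured differently:

```yaml
steps:
  # Quarantine run: execute only the tests tagged as flaky.
  # The project path is a placeholder, not a file from this PR.
  - script: >-
      dotnet test tests/Example.FunctionalTests/Example.FunctionalTests.csproj
      --filter "category=flaky"
    displayName: 'Run Flaky Functional Tests (quarantine)'
    # A failure here is reported but does not stop the steps that follow.
    continueOnError: true
```

With `continueOnError: true`, a failing flaky run should show up as a warning on the step while the rest of the pipeline proceeds, so the results stay visible without blocking the build.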
The ActiveIssue tag will remain reserved for tests that cannot pass due to a platform-specific bug, pipeline issue, or other limitation that causes the test to always fail.
I started off by tagging our top offenders based on this dashboard. As we discover more flaky tests, the DRI (or anyone) can open a PR adding the flaky tag to the offending tests. Flaky tests can be addressed either via prioritization during sprint planning or when working on a related section of code.