-
Notifications
You must be signed in to change notification settings - Fork 13.3k
compiletest: Use the new non-libtest executor by default #139998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r? @onur-ozkan rustbot has assigned @onur-ozkan. Use |
Some changes occurred in src/tools/compiletest cc @jieyouxu |
Makes sense, thanks! @bors r+ rollup |
Just in case, have you benchmarked the time to execute the UI suite between the two executors? |
With libtest:
With the new executor:
Based on a few local runs, the new executor seems to consistently be a couple of seconds slower across a 3 minute ui test run, which is a little surprising to me since it should be doing marginally less work overall. |
It's interesting that the new executor seems to be more stable. 5% slower isn't that terrible, but it would be nice to investigate more why it seems to be slower. |
Bug found in deadline detection (#139660 (comment)); I'll push a fix. @bors r- |
Let me also do a few smoke runs of the new executor on msvc in case anything funny occurs. |
I'd also mark this as rollup-never since we're changing the |
Maybe there's another issue with the deadline logic (with the // tests/ui/foo.rs
//@ run-pass
fn main() {
std::thread::sleep(std::time::Duration::from_secs(70)); // 70 to hit the libtest default slow test warning threshold of 60s
} Due to the way how the libtest-esque test filtering logic works,
|
Yeah, I'm seeing the same thing. I think it's because we don't actually remove completed tests from the deadline queue when they finish, so tests are treated as having timed out long after they have already finished. This was being masked by the other bug, which made us not emit any timeout events due to the backwards check. |
I wonder if we can come up with some kind of self-tests for this 🤔 |
Well the good news is that fixing the deadline bugs also seems to eliminate the perf difference. 😅 |
I think what I want to do here is open a separate PR to fix the deadline bugs (hopefully with some unit tests), and then we can revisit this PR once that has landed. |
Sounds good |
Separate PR to fix deadline bugs: #140031. |
compiletest: Fix deadline bugs in new executor The experimental new executor for compiletest (rust-lang#139660) was found to have two major bugs in deadline handling for detecting slow tests: - The comparison between `now` and test deadlines was reversed, causing no timeouts to ever be recognised. - After fixing that bug, it was found that the existing code would issue timeouts for any test that had started more than 60 seconds ago, even if the test had finished long before its deadline was reached. This PR fixes those bugs. (The new executor is not yet enabled by default, so this PR has no immediate effect on contributors.) --- I noted in rust-lang#139998 (comment) that I hoped to have some unit tests to accompany these fixes. Unfortunately that turned out to be infeasible, because `DeadlineQueue` is tightly coupled to concrete `mpsc::Receiver` APIs (in addition to `Instant::now`), and trying to mock all of those would make the code much more complicated. I did, however, add a few assertions that would have caught the failure to remove tests from the queue after their deadline. r? jieyouxu
Rollup merge of rust-lang#140031 - Zalathar:deadline, r=jieyouxu compiletest: Fix deadline bugs in new executor The experimental new executor for compiletest (rust-lang#139660) was found to have two major bugs in deadline handling for detecting slow tests: - The comparison between `now` and test deadlines was reversed, causing no timeouts to ever be recognised. - After fixing that bug, it was found that the existing code would issue timeouts for any test that had started more than 60 seconds ago, even if the test had finished long before its deadline was reached. This PR fixes those bugs. (The new executor is not yet enabled by default, so this PR has no immediate effect on contributors.) --- I noted in rust-lang#139998 (comment) that I hoped to have some unit tests to accompany these fixes. Unfortunately that turned out to be infeasible, because `DeadlineQueue` is tightly coupled to concrete `mpsc::Receiver` APIs (in addition to `Instant::now`), and trying to mock all of those would make the code much more complicated. I did, however, add a few assertions that would have caught the failure to remove tests from the queue after their deadline. r? jieyouxu
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
@bors retry |
☀️ Test successful - checks-actions |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 645d0ad (parent) -> be181dd (this PR) Test differencesShow 33042 test diffsStage 1
Stage 2
(and 16426 additional test diffs) Additionally, 16516 doctest diffs were found. These are ignored, as they are noisy. Job group index Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard be181dd75c83d72fcc95538e235768bc367b76b9 --output-dir test-dashboard And then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
Some of these test changes are explained by I think that maybe we should revert this and land it again with some fake compiler change, just to make sure that |
Hm, might be worth doing a reland with the synthetic compiler change just to compare the timing too... @Zalathar would you be okay with a reland? It didn't occur to me that |
Finished benchmarking commit (be181dd): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -0.7%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary 0.6%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 776.892s -> 775.475s (-0.18%) |
OK, let's revert this and re-land with a synthetic compiler change. |
Revert <rust-lang#139998> because the original merge triggered download-rustc, which messes with test metrics and prevents us from properly comparing them before/after the change. The plan is to re-land this PR as-is, combined with a trivial compiler change to avoid download-rustc and get proper test metrics. This reverts commit be181dd, reversing changes made to 645d0ad.
Revert <rust-lang#139998> because the original merge triggered download-rustc, which messes with test metrics and prevents us from properly comparing them before/after the change. The plan is to re-land this PR as-is, combined with a trivial compiler change to avoid download-rustc and get proper test metrics for comparison. This reverts commit be181dd, reversing changes made to 645d0ad.
…ouxu Revert compiletest new-executor, to re-land without download-rustc Revert <rust-lang#139998> because the original merge triggered download-rustc, which messes with test metrics and prevents us from properly comparing them before/after the change. The plan is to re-land this PR as-is, combined with a trivial compiler change to avoid download-rustc and get proper test metrics for comparison. This reverts commit be181dd, reversing changes made to 645d0ad. r? ghost
…ouxu Revert compiletest new-executor, to re-land without download-rustc Revert <rust-lang#139998> because the original merge triggered download-rustc, which messes with test metrics and prevents us from properly comparing them before/after the change. The plan is to re-land this PR as-is, combined with a trivial compiler change to avoid download-rustc and get proper test metrics for comparison. This reverts commit be181dd, reversing changes made to 645d0ad. r? ghost
…ouxu Revert compiletest new-executor, to re-land without download-rustc Revert <rust-lang#139998> because the original merge triggered download-rustc, which messes with test metrics and prevents us from properly comparing them before/after the change. The plan is to re-land this PR as-is, combined with a trivial compiler change to avoid download-rustc and get proper test metrics for comparison. This reverts commit be181dd, reversing changes made to 645d0ad. r? ghost
(Re-landing rust-lang#139998, with a compiler change to inhibit download-rustc.) Currently the new executor can be explicitly disabled by passing the `-N` flag to compiletest (e.g. `./x test ui -- -N`), but eventually that flag will be removed, alongside the removal of the libtest dependency.
Revert <rust-lang#139998> because the original merge triggered download-rustc, which messes with test metrics and prevents us from properly comparing them before/after the change. The plan is to re-land this PR as-is, combined with a trivial compiler change to avoid download-rustc and get proper test metrics for comparison. This reverts commit be181dd, reversing changes made to 645d0ad.
The new executor was implemented in #139660, but required a manual opt-in. This PR activates the new executor by default, but leaves the old libtest-based executor in place (temporarily) to make reverting easier if something unexpectedly goes horribly wrong.
Currently the new executor can be explicitly disabled by passing the
-N
flag to compiletest (e.g../x test ui -- -N
), but eventually that flag will be removed, alongside the removal of the libtest dependency. The flag is mostly there to make manual comparative testing easier if something does go wrong.As before, there should be no user-visible difference between the old executor and the new executor.
I didn't get much of a response to my call for testing thread on Zulip, and the reports I did get (along with my own usage) indicate that there aren't any problems. So I think it's reasonable to move forward with making this the default, in the hopes of being able to remove the libtest dependency relatively soon.
When the libtest dependency is removed, it should be reasonable to build compiletest against pre-built stage0 std by default, even after the stage0 redesign. (Though we should probably have at least one CI job using in-tree stage1 std instead, to guard against the possibility of the
#![feature(internal_output_capture)]
API actually changing.)