Only work-steal in the main loop for rustc_thread_pool #143035
These commits modify the If this was unintentional then you should revert the changes before this PR is merged.
@@ -52,6 +54,12 @@ struct ScopeBase<'scope> {
    /// latch to track job counts
    job_completed_latch: CountLatch,

    /// Jobs that have been spawned, but not yet started.
    pending_jobs: Mutex<IndexSet<JobRefId>>,
Why is this swapped to IndexSet?
The lint is "prefer FxHashSet over HashSet, it has better performance".
Should I suppress it?
The impact on performance needs to be measured with rustc-perf. For now, we can keep the original implementation :)
It has been restored.
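For readers following this thread, here is a minimal sketch of what a "spawned, but not yet started" set like `pending_jobs` might look like. It is an illustration under stated assumptions, not the PR's code: `JobId`, `PendingJobs`, `mark_spawned`, `mark_started`, and `all_started` are hypothetical names, and the standard-library `HashSet` stands in for whichever set type the PR ends up using.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the `JobRefId` type in the diff.
type JobId = usize;

/// Hypothetical, simplified tracker for jobs that were spawned
/// but have not yet been started by any thread.
struct PendingJobs {
    pending: Mutex<HashSet<JobId>>,
}

impl PendingJobs {
    fn new() -> Self {
        PendingJobs { pending: Mutex::new(HashSet::new()) }
    }

    /// Record a freshly spawned job.
    fn mark_spawned(&self, id: JobId) {
        self.pending.lock().unwrap().insert(id);
    }

    /// Remove a job when it starts running.
    /// Returns `true` if the job was still pending.
    fn mark_started(&self, id: JobId) -> bool {
        self.pending.lock().unwrap().remove(&id)
    }

    /// `true` once every spawned job has been started.
    fn all_started(&self) -> bool {
        self.pending.lock().unwrap().is_empty()
    }
}
```

Only insertion, removal, and emptiness checks are used here, so iteration order does not matter in this sketch; whether a different hasher or set type is faster in practice would still need to be measured, as noted above.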
You should add me as a co-author for proper copyright assignment.
Done
@@ -796,14 +797,83 @@ impl WorkerThread {
    /// stealing tasks as necessary.
I'm not sure if this comment, which contains "stealing tasks", is still correct after we removed work-steal.
Co-authored-by: Zoxc <[email protected]>
@bors try @rust-timer queue
Only work-steal in the main loop for rustc_thread_pool This PR is a replica of <rust-lang/rustc-rayon#12> that only retained work-steal in the main loop for rustc_thread_pool. r? `@oli-obk` cc `@SparrowLii` `@Zoxc` `@cuviper` Updates #113349
Sorry about the build failure, that was a temporary bug. It shouldn't have affected any perf numbers though.
AFAIK the perf tool only tests single-threaded scenarios, so there should not be such performance regressions. We need to identify the problem.
Once the perf tool is fixed, we can try again.
There is one benchmark that runs with 4 threads in rustc-perf now. We can't compare it by icount though, ofc.
@bors2 try @rust-timer queue
Finished benchmarking commit (c41093d): comparison URL.
Overall result: no relevant changes - no action needed
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
@bors rollup=never
Instruction count
This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage)
Results (primary 3.0%, secondary 1.0%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
Results (primary 1.8%, secondary 2.2%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 461.861s -> 463.58s (0.37%)
How about this performance? Is it acceptable?
Oh I saw it.
The wall time of some cases has regressed. I am confused, since these cases run single-threaded and should not be affected by We need to identify the cause. I think you can do more local performance testing under single-threaded/multi-threaded settings (like I listed here)
Note that we are switching the benchmarking collector to a different machine today, and this result was only the second benchmark run produced on the new machine. So if you want to get more stable results, with noise threshold updated, I would wait a few days. That being said, we currently only have one multithreaded benchmark in the suite. Local benchmarks would probably be more useful here.
I ran some benchmarks on a local machine. Here is the result I summarized. There is little impact (below 4%) when the number of threads exceeds 8, but there is probably a significant regression with fewer than 8 threads (the average wall time increased by 5%, and by over 9% for full compilation on 4 threads).
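For anyone reproducing these numbers locally, the percentages in this thread are relative changes in wall time. A minimal sketch of that arithmetic follows; `percent_change` is a hypothetical helper name, and the check values are the Bootstrap timings reported by the benchmark bot in this thread, not the local measurements summarized above.

```rust
/// Relative change from `before` to `after`, in percent.
/// Positive means slower (a regression) when the inputs are wall times.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

With the bot's Bootstrap figures, `percent_change(461.861, 463.58)` comes out at roughly 0.37 and `percent_change(466.974, 464.368)` at roughly -0.56, which agrees with the reported values.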
Personally I think the results are acceptable. Especially in My suggestion is that we can merge this PR and mark the original
@bors r+
☀️ Test successful - checks-actions
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 8df4a58 (parent) -> 25cf7d1 (this PR)
Test differences
Show 49 test diffs
Stage 1
Additionally, 22 doctest diffs were found. These are ignored, as they are noisy.
Job group index
Test dashboard
Run
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 25cf7d13c960a3ac47d1424ca354077efb6946ff --output-dir test-dashboard
And then open
Job duration changes
How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (25cf7d1): comparison URL.
Overall result: ❌ regressions - no action needed
@rustbot label: -perf-regression
Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)
Results (secondary 0.3%)
A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles
This benchmark run did not return any relevant results for this metric.
Binary size
This benchmark run did not return any relevant results for this metric.
Bootstrap: 466.974s -> 464.368s (-0.56%)
This PR is a replica of rust-lang/rustc-rayon#12: it retains work stealing only in the main loop of rustc_thread_pool.
r? @oli-obk
cc @SparrowLii @Zoxc @cuviper
Updates #113349
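To illustrate the policy named in the title, here is a deliberately simplified, single-threaded sketch. It is not the code from this PR or from rustc-rayon: `Worker`, `WaitStep`, `next_while_waiting`, and `next_in_main_loop` are hypothetical names, and plain `VecDeque`s stand in for the real work-stealing deques. The assumption being illustrated is that a worker steals from other queues only in its main loop, while a worker that is waiting for its own jobs looks only at its local queue.

```rust
use std::collections::VecDeque;

/// Hypothetical job identifier.
type JobId = usize;

/// What a waiting worker decides to do next (hypothetical).
#[derive(Debug, PartialEq)]
enum WaitStep {
    /// Run a job from the local queue that the worker is waiting for.
    RunLocal(JobId),
    /// Nothing suitable locally: block instead of stealing.
    Block,
}

/// Hypothetical worker with only a local job queue.
struct Worker {
    local: VecDeque<JobId>,
}

impl Worker {
    /// While waiting, consider only the most recently pushed local job,
    /// and only if it is one of the awaited jobs. Never steal here.
    fn next_while_waiting(&mut self, is_awaited: impl Fn(JobId) -> bool) -> WaitStep {
        let candidate = self.local.back().copied();
        match candidate {
            Some(id) if is_awaited(id) => {
                self.local.pop_back();
                WaitStep::RunLocal(id)
            }
            _ => WaitStep::Block,
        }
    }

    /// In the main loop, prefer local work, then steal from other queues.
    fn next_in_main_loop(&mut self, others: &mut [VecDeque<JobId>]) -> Option<JobId> {
        if let Some(id) = self.local.pop_back() {
            return Some(id);
        }
        // Steal from the opposite end of another worker's queue.
        others.iter_mut().find_map(|queue| queue.pop_front())
    }
}
```

In this sketch a waiting worker can end up blocked even though other queues hold work, which is one plausible reason wall time could change at some thread counts; the benchmarks discussed above are the actual evidence for the real implementation.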