-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Introduce a fast path that avoids the debug_tuple
abstraction when deriving Debug for unit-like enum variants.
#88832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…deriving Debug for unit-like enum variants. The intent here is to allow LLVM to remove the switch entirely in favor of an indexed load from a table of constant strings, which is likely what the programmer would write in C. Unfortunately, LLVM currently doesn't perform this optimization due to a bug, but there is [a patch](https://reviews.llvm.org/D109565) that fixes this issue. I've verified that, with that patch applied on top of this commit, Debug for unit-like tuple variants becomes a load, reducing the O(n) code bloat to O(1). Note that inlining `DebugTuple::finish()` wasn't enough to allow LLVM to optimize the code properly; I had to avoid the abstraction entirely. Not using the abstraction is likely better for compile time anyway. Part of rust-lang#88793.
If anyone else wants to review, feel free. |
@bors try @rust-timer queue Since we're trying to make LLVM behave here, I think we should have a codegen test that checks that the generated code is indeed match-free |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 79bc538 with merge 4d514a123db701cd4bb7d0e4c3d98b23d0604e67... |
☀️ Try build successful - checks-actions |
Queued 4d514a123db701cd4bb7d0e4c3d98b23d0604e67 with parent 9f85cd6, future comparison URL. |
Finished benchmarking commit (4d514a123db701cd4bb7d0e4c3d98b23d0604e67): comparison url. Summary: This change led to large relevant mixed results 🤷 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
I can't tell where the keccak regression is coming from, it doesn't have or use any debug impls. So I'm assuming it's changes in rustc possibly due to changed inlining. |
Unfortunately that's not possible right now because https://reviews.llvm.org/D109565 hasn't landed yet. |
Insufficient permissions to issue commands to rust-timer. |
@bors r+ we're keeping the issue open, I'm going to mark it as needs-test and blocked |
📌 Commit 79bc538 has been approved by |
⌛ Testing commit 79bc538 with merge ffbb23d7425f8443ff49055ddfa4fcf0db86c29f... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
@oli-obk I think we probably want a little more investigation on perf regressions before approving PRs. In particular, in this case, I would say that the instruction count regressions are justified because the same benchmarks do not regress in cycles, so the instruction count regressions are not representative of a real performance loss. (They're just a proxy for potential performance losses). It'd be good to justify regressions and mark them as "triaged" before approving a PR if the performance loss is noted in a pre-merge perf run. @rustbot label: +perf-regression-triaged |
⌛ Testing commit 79bc538 with merge 01fb7bac7823c96679ed7b93186036992e89cc2a... |
💥 Test timed out |
@bors retry |
@pcwalton: 🔑 Insufficient privileges: not in try users |
@bors retry |
☀️ Test successful - checks-actions |
The intent here is to allow LLVM to remove the switch entirely in favor of an
indexed load from a table of constant strings, which is likely what the
programmer would write in C. Unfortunately, LLVM currently doesn't perform this
optimization due to a bug, but there is a
patch that fixes this issue. I've verified
that, with that patch applied on top of this commit, Debug for unit-like tuple
variants becomes a load, reducing the O(n) code bloat to O(1).
Note that inlining
DebugTuple::finish()
wasn't enough to allow LLVM tooptimize the code properly; I had to avoid the abstraction entirely. Not using
the abstraction is likely better for compile time anyway.
Part of #88793.
r? @oli-obk