Tweak the vec-calloc runtime check to only apply to shortish-arrays #96596
Would a codegen test be feasible to make sure LLVM keeps doing it (and maybe a test for "one more" where LLVM doesn't)? r=me with that added, or if it proves too difficult to be worthwhile.
I've added a couple codegen tests that different kinds of
@bors r=Mark-Simulacrum rollup=iffy
I've intentionally not included a negative codegen test, because it's really hard for those to detect anything meaningful, and it's not clear to me that having them break for an improved implementation is more valuable than troublesome. Let me know if there's something specific you'd like to see, though, and I'm happy to add things.
📌 Commit 2830dbd has been approved by
☀️ Test successful - checks-actions
Finished benchmarking commit (6b6c1ff): comparison url. Summary: This benchmark run did not return any relevant results. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
@scottmcm It seems that the constant-folding optimization with arrays is even more fragile, because if there is another call to
If you add this function to your vec-calloc.rs test, your codegen test would fail.
It seems that we should either measure the performance gain from checking the array (which is tricky, since different allocators are available with different performance) or limit our implementation to a very small array length, like 8 for u8 or 3 for u32. Also, there is a small issue for me: the IsZero trait is now lying, since it says that
Also, godbolt link with compilation results.
r? @Mark-Simulacrum
@nbdd0121 pointed out in #95362 (comment) that LLVM currently doesn't constant-fold the IsZero check for long arrays, so that seems like a reasonable justification for limiting it.
It appears that it's based on length, not byte size (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR. Maybe it's a "the number of inlining shall be three" sort of situation.
Certainly there's more that could be done here -- that generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.
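For readers who want the shape of a length-limited check, here is a minimal sketch. It is not the code from this PR: the trait below is a local stand-in for the standard library's private IsZero, and both the cutoff constant MAX_CHECKED_LEN (with its value) and the helper can_use_zeroed_alloc are hypothetical names chosen only for illustration.

```rust
/// Local stand-in for the standard library's private `IsZero` trait
/// (an assumption for this sketch; the real trait is not public).
trait IsZero {
    /// Returns true only if the value is known to be all-zero.
    fn is_zero(&self) -> bool;
}

impl IsZero for u8 {
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

impl IsZero for u32 {
    fn is_zero(&self) -> bool {
        *self == 0
    }
}

/// Hypothetical cutoff, counted in elements rather than bytes.
/// The value is illustrative only, not the one chosen in the PR.
const MAX_CHECKED_LEN: usize = 16;

impl<T: IsZero, const N: usize> IsZero for [T; N] {
    fn is_zero(&self) -> bool {
        // `N` is a compile-time constant, so for long arrays the whole
        // expression is trivially `false` and no element scan is emitted.
        // A `false` result is conservative: it means "not known to be
        // zero", not "definitely contains a nonzero element".
        N <= MAX_CHECKED_LEN && self.iter().all(IsZero::is_zero)
    }
}

/// Hypothetical helper: decides whether a `vec![elem; n]`-style fill
/// could be served by a zeroed allocation instead of a write loop.
fn can_use_zeroed_alloc<T: IsZero>(elem: &T) -> bool {
    elem.is_zero()
}
```

With this shape, can_use_zeroed_alloc(&[0u8; 4]) takes the zeroed path, while a long array such as [0u32; 64] falls back to the ordinary fill even though it is all zeros. That is the trade-off discussed above: the fast path is skipped for long arrays in exchange for never paying for a runtime scan of them.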