Skip to content

Tweak the vec-calloc runtime check to only apply to shortish-arrays #96596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 2, 2022

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented May 1, 2022

r? @Mark-Simulacrum

@nbdd0121 pointed out in #95362 (comment) that LLVM currently doesn't constant-fold the IsZero check for long arrays, so that seems like a reasonable justification for limiting it.

It appears that it's based on length, not byte size, (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR. Maybe it's a "the number of inlining shall be three" sort of situation.

Certainly there's more that could be done here -- that generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.

@rust-highfive

This comment was marked as resolved.

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 1, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 1, 2022
@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented May 1, 2022

Would a codegen test be feasible to make sure llvm keeps doing it (and maybe a test for "one more" where llvm doesn't?

r=me with that added, or if it proves too difficult to be worthwhile

@scottmcm
Copy link
Member Author

scottmcm commented May 2, 2022

I've added a couple codegen tests that different kinds of vec![ZERO; n] fold away the non-zeroed allocation calls. I'm marking rollup=iffy because while they pass locally and in CI, the variety of builders in the full bors set is somewhat more prone to failing codegen tests.

@bors r=Mark-Simulacrum rollup=iffy

I've intentionally not included a negative codegen test, because those are really hard to detect anything meaningful, and it's not clear to me that them breaking for an improved implementation is more valuable than troublesome. Let me know if there's something specific you'd like to see, though, and I'm happy to add things.

@bors
Copy link
Collaborator

bors commented May 2, 2022

📌 Commit 2830dbd has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 2, 2022
@bors
Copy link
Collaborator

bors commented May 2, 2022

⌛ Testing commit 2830dbd with merge 6b6c1ff...

@bors
Copy link
Collaborator

bors commented May 2, 2022

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 6b6c1ff to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 2, 2022
@bors bors merged commit 6b6c1ff into rust-lang:master May 2, 2022
@rustbot rustbot added this to the 1.62.0 milestone May 2, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (6b6c1ff): comparison url.

Summary: This benchmark run did not return any relevant results.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@AngelicosPhosphoros
Copy link
Contributor

@scottmcm It seems that constant-folding optimization with arrays even more fragile because if there is another call to Vec::from_element with same type parameter, it prefers not to inline it so constant folding breaks.

If you add to your vec-calloc.rs test this function, your codegen test would fail.

// CHECK-LABEL: @vec_one_array_32
#[no_mangle]
pub fn vec_one_array_32(n: usize) -> Vec<[i64; 32]> {
    vec![[1_i64; 32]; n]
}

It seems that we should either measure performance gain from checking array (which is tricky since there is different allocators available with different performance) or limit our implementation by very low size of array like 8 for u8 or 3 for u32.

Also, there is a little issue for me that IsZero trait is now lying: it says that [0;1024] is not zero. Maybe we should rename trait somehow?

@AngelicosPhosphoros
Copy link
Contributor

Also, godbolt link with compilation results.
godbolt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants