
Iteration over arrays until end pointer (C++ vector iterator, Rust iterators) doesn't preserve number of iterations for the optimizer #48309


Open
sdroege opened this issue Jan 30, 2021 · 3 comments
Labels: bugzilla (Issues migrated from bugzilla), llvm:optimizations

Comments


sdroege commented Jan 30, 2021

Bugzilla Link 48965
Version 11.0
OS Linux
CC @dwblaikie, @fhahn, @jrmuizel, @LebedevRI, @nikic

Extended Description

This was originally reported here: rust-lang/rust#75935

In the end it boils down to the following minimal testcases in C/C++/Rust.

Godbolt links: https://godbolt.org/z/s6reoz and https://rust.godbolt.org/z/a7f6rj

All of them have in common that the iteration is driven not by a counter but by advancing a pointer until it reaches the end pointer. When the iterations are additionally counted, the maximum (or here: exact) number of iterations does not seem to be available during optimization. As a result, the two assertions are not recognized as dead code and are not optimized away; in this minimal case, the whole function could have been optimized away entirely.

The testcase is extremely contrived, but the same pattern occurs in real code (the Rust issue contains something closer to my original code) and causes missed optimization opportunities. It is especially problematic in Rust because the automatic bounds checks for array indexing can become impossible to eliminate, which in turn prevents other optimizations (such as auto-vectorization or loop unrolling) from kicking in; see the sketch after the Rust testcase below.

C

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

void foo(const uint32_t *y, size_t y_len) {
    const uint32_t *y_end = y + y_len;
    size_t c = 0;
    for (const uint32_t *y_iter = y; y_iter != y_end; y_iter++, c++) {
        assert(c < y_len);
    }
    assert(c == y_len);
}

C++

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

void foo(const std::vector<uint32_t>& y) {
    size_t c = 0;
    for (auto y_iter = y.cbegin(); y_iter != y.cend(); y_iter++, c++) {
        assert(c < y.size());
    }
    assert(c == y.size());
}

Rust

pub fn foo(y: &[u32]) {
    let mut x = 0;
    for (c, _y) in y.iter().enumerate() {
        assert!(c < y.len());
        x = c + 1;
    }
    assert!(x == y.len());
}
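
As a minimal sketch of my own of the bounds-check pattern just mentioned (hypothetical code, not from the original report): using the enumerate counter to index a second slice keeps a bounds check alive, and the check's panic path then blocks auto-vectorization.

pub fn add_offsets(dst: &mut [u32], src: &[u32]) {
    assert!(dst.len() == src.len());
    for (c, s) in src.iter().enumerate() {
        // Eliminating this bounds check requires proving that
        // c < src.len() (== dst.len()); if the end-pointer iteration
        // hides the trip count, the check and its panic path remain
        // and the loop cannot be auto-vectorized.
        dst[c] = dst[c].wrapping_add(*s);
    }
}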


nikic commented Jan 31, 2021

https://llvm.godbolt.org/z/YThjs9

define i64 @test.i8(i8* %p.base, i64 %len) {
entry:
  %p.end = getelementptr inbounds i8, i8* %p.base, i64 %len
  %g1 = icmp ne i64 %len, 0
  br i1 %g1, label %header, label %trap

header:
  %p = phi i8* [ %p.base, %entry ], [ %p.inc, %latch ]
  %i = phi i64 [ 0, %entry ], [ %i.inc, %latch ]
  %p.inc = getelementptr inbounds i8, i8* %p, i64 1
  %i.inc = add nsw nuw i64 %i, 1
  %g2 = icmp ult i64 %i, %len
  br i1 %g2, label %latch, label %trap

latch:
  %c = icmp ne i8* %p.inc, %p.end
  br i1 %c, label %header, label %exit

exit:
  ret i64 %i

trap:
  ret i64 -1
}

define i64 @test.i32(i32* %p.base, i64 %len) {
entry:
  %p.end = getelementptr inbounds i32, i32* %p.base, i64 %len
  %g1 = icmp ne i64 %len, 0
  br i1 %g1, label %header, label %trap

header:
  %p = phi i32* [ %p.base, %entry ], [ %p.inc, %latch ]
  %i = phi i64 [ 0, %entry ], [ %i.inc, %latch ]
  %p.inc = getelementptr inbounds i32, i32* %p, i64 1
  %i.inc = add nsw nuw i64 %i, 1
  %g = icmp ult i64 %i, %len
  br i1 %g, label %latch, label %trap

latch:
  %c = icmp ne i32* %p.inc, %p.end
  br i1 %c, label %header, label %exit

exit:
  ret i64 %i

trap:
  ret i64 -1
}

We fold the condition in the first case, but not the second. In both cases an unnecessary umin is retained for the result calculation.
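
For what it's worth, the retained umin is indeed redundant under the entry guard: once %len != 0, (-1 + %len) umin %len is always -1 + %len. A quick sanity check of that identity, as a sketch of my own:

fn main() {
    // (-1 + len) umin len: under the guard len != 0 this is always
    // len - 1, so the umin contributes nothing to the exit value.
    let btc = |len: u64| len.wrapping_sub(1).min(len);
    assert!((1u64..1000).all(|len| btc(len) == len - 1));
}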


nikic commented Jan 31, 2021

I think the main problem here is the signed GEP semantics. The backedge-taken counts for the two examples are:

((-1 + %len) umin %len)
(((-4 + (4 * %len)) /u 4) umin %len)

The first is not great in that it should fold to just -1 + %len once %len != 0 is taken into account, but at least we still catch this when folding the exit condition.

The second one we simply can't fold, even knowing that %len != 0, because we only have an nsw flag on the multiplication, not a nuw flag.
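
To see why the nsw flag alone is not enough, here is a counterexample of my own (not from the comment): pick a %len whose product with 4 has no signed wrap but does wrap unsigned, so folding ((-4 + (4 * %len)) /u 4) to -1 + %len would change the result.

fn main() {
    // len == 2^64 - 2^61: interpreted as signed this is -2^61, so
    // 4 * len == -2^63 has no signed wrap (nsw holds), but the
    // unsigned product wraps around to 2^63.
    let len: u64 = u64::MAX - (1u64 << 61) + 1;
    let folded = len - 1; // what the fold would produce
    let actual = (len.wrapping_mul(4) - 4) / 4; // == 2^61 - 1
    assert_ne!(actual, folded); // the fold is invalid without nuw
}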

I'm wondering if it might be worthwhile to add a flag to GEP that indicates that the offsets are unsigned... certainly not the only place where the current signed semantics cause issues.


fhahn commented Feb 20, 2021

I put up a patch that tries to improve the reasoning in SCEV to handle a slight variation of the C/C++ case, where unsigned is used for y_len. In that case there is a zext, which makes it obvious that y_len must be non-negative: https://reviews.llvm.org/D97122
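
As a hypothetical Rust analogue of that variation (my sketch, not from the patch or the report): taking the length as u32 and widening it with "as usize" is a zero-extend in the IR, so the optimizer can see that the high half of the 64-bit value is zero and, in particular, that the byte offset 4 * y_len cannot wrap.

pub fn foo_u32(y: &[u32], y_len: u32) {
    // u32 -> usize is a zero-extend: the widened value is known to
    // fit in 32 bits, so 4 * (y_len as usize) cannot wrap.
    let y = &y[..y_len as usize];
    let mut c = 0usize;
    for _ in y.iter() {
        c += 1;
    }
    assert!(c == y_len as usize);
}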

As Nikita mentioned, for index types that match the pointer width, we need a way to convey the fact that the values are unsigned. Once we are able to handle the ZExt case, I am planning on trying to put something together.

@llvmbot transferred this issue from llvm/llvm-bugzilla-archive on Dec 11, 2021