[LV] Consider whether vscale is a known power of two for iteration check #144963
Conversation
Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in-tree targets have vscale as a power of two, and we have an existing TTI hook for that.
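For context, this is the entry-block guard the patch makes conditional. The instruction pattern is lifted from the test diffs below; the function and block names are illustrative, and the comments are a paraphrase of the rationale rather than the wording in LoopVectorize.cpp. Roughly: the branch to the scalar loop is taken when the trip count is close enough to UINT64_MAX that stepping the induction variable by vscale x VF could wrap, and a power-of-two step is safe because any wrap can only land exactly on zero.

; Minimal sketch of the guard, assuming a step of vscale x 4 as in the tests below.
define void @overflow_guard_sketch(i64 %n) {
entry:
  %0 = sub i64 -1, %n                          ; UINT64_MAX - %n
  %1 = call i64 @llvm.vscale.i64()
  %2 = mul i64 %1, 4                           ; step = vscale x 4
  %3 = icmp ult i64 %0, %2                     ; true iff %n + step does not fit in 64 bits
  br i1 %3, label %scalar.ph, label %vector.ph ; becomes 'br i1 false' in the tests once the step is a known power of two
scalar.ph:
  ret void
vector.ph:
  ret void
}
declare i64 @llvm.vscale.i64()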
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-vectorizers

Author: Philip Reames (preames)

Changes

Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in-tree targets have vscale as a power of two, and we have an existing TTI hook for that.

Patch is 74.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144963.diff

19 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2f4416d2782e8..fd23e438553ee 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2434,7 +2434,7 @@ Value *InnerLoopVectorizer::createIterationCountCheck(ElementCount VF,
// check is known to be true, or known to be false.
CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
} // else step known to be < trip count, use CheckMinIters preset to false.
- } else if (VF.isScalable() &&
+ } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() &&
!isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
// vscale is not necessarily a power-of-2, which means we cannot guarantee
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll b/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
index 71d03afa6b6f1..8a28196fd4f7e 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll
@@ -54,11 +54,7 @@ define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features
; DATA-LABEL: @simple_memset_tailfold(
; DATA-NEXT: entry:
; DATA-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
-; DATA-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; DATA-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DATA-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; DATA-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; DATA-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; DATA-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; DATA: vector.ph:
; DATA-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; DATA-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -98,11 +94,7 @@ define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features
; DATA_NO_LANEMASK-LABEL: @simple_memset_tailfold(
; DATA_NO_LANEMASK-NEXT: entry:
; DATA_NO_LANEMASK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
-; DATA_NO_LANEMASK-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; DATA_NO_LANEMASK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DATA_NO_LANEMASK-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; DATA_NO_LANEMASK-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; DATA_NO_LANEMASK-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; DATA_NO_LANEMASK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; DATA_NO_LANEMASK: vector.ph:
; DATA_NO_LANEMASK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; DATA_NO_LANEMASK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -150,11 +142,7 @@ define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features
; DATA_AND_CONTROL-LABEL: @simple_memset_tailfold(
; DATA_AND_CONTROL-NEXT: entry:
; DATA_AND_CONTROL-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
-; DATA_AND_CONTROL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
-; DATA_AND_CONTROL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; DATA_AND_CONTROL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; DATA_AND_CONTROL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; DATA_AND_CONTROL-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; DATA_AND_CONTROL-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; DATA_AND_CONTROL: vector.ph:
; DATA_AND_CONTROL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; DATA_AND_CONTROL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
index 8e90287bac2a2..e85d5579bb332 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
@@ -344,16 +344,12 @@ define i32 @smin(ptr %a, i64 %n, i32 %start) {
;
; IF-EVL-OUTLOOP-LABEL: @smin(
; IF-EVL-OUTLOOP-NEXT: entry:
-; IF-EVL-OUTLOOP-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-OUTLOOP-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-OUTLOOP-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-OUTLOOP-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-OUTLOOP-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; IF-EVL-OUTLOOP-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; IF-EVL-OUTLOOP: vector.ph:
; IF-EVL-OUTLOOP-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-OUTLOOP-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
; IF-EVL-OUTLOOP-NEXT: [[TMP6:%.*]] = sub i64 [[TMP5]], 1
-; IF-EVL-OUTLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP6]]
+; IF-EVL-OUTLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP6]]
; IF-EVL-OUTLOOP-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
; IF-EVL-OUTLOOP-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; IF-EVL-OUTLOOP-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
@@ -401,16 +397,12 @@ define i32 @smin(ptr %a, i64 %n, i32 %start) {
;
; IF-EVL-INLOOP-LABEL: @smin(
; IF-EVL-INLOOP-NEXT: entry:
-; IF-EVL-INLOOP-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N:%.*]]
-; IF-EVL-INLOOP-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-INLOOP-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-INLOOP-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-INLOOP-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; IF-EVL-INLOOP-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; IF-EVL-INLOOP: vector.ph:
; IF-EVL-INLOOP-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-INLOOP-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
; IF-EVL-INLOOP-NEXT: [[TMP6:%.*]] = sub i64 [[TMP5]], 1
-; IF-EVL-INLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N]], [[TMP6]]
+; IF-EVL-INLOOP-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP6]]
; IF-EVL-INLOOP-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP5]]
; IF-EVL-INLOOP-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; IF-EVL-INLOOP-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
index 3a929f3e43a73..a5f2f9f5345d6 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
@@ -10,11 +10,7 @@ define void @type_info_cache_clobber(ptr %dstv, ptr %src, i64 %wide.trip.count)
; CHECK-SAME: ptr [[DSTV:%.*]], ptr [[SRC:%.*]], i64 [[WIDE_TRIP_COUNT:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[WIDE_TRIP_COUNT]], 1
-; CHECK-NEXT: [[TMP1:%.*]] = sub i64 -1, [[TMP0]]
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 8
-; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
-; CHECK-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; CHECK: [[VECTOR_MEMCHECK]]:
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DSTV]], i64 1
; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[WIDE_TRIP_COUNT]], 1
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll
index 3128e40144e30..a89c469921fe6 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll
@@ -16,12 +16,7 @@ define void @vp_smax(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -130,12 +125,7 @@ define void @vp_smin(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -244,12 +234,7 @@ define void @vp_umax(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -358,12 +343,7 @@ define void @vp_umin(ptr %a, ptr %b, ptr %c, i64 %N) {
; IF-EVL-NEXT: [[C3:%.*]] = ptrtoint ptr [[C]] to i64
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 13, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -472,11 +452,7 @@ define void @vp_ctlz(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT: br i1 [[TMP3]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -571,11 +547,7 @@ define void @vp_cttz(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
-; IF-EVL-NEXT: br i1 [[TMP3]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -670,12 +642,7 @@ define void @vp_lrint(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 9, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -778,12 +745,7 @@ define void @vp_llrint(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 9, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP22]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
@@ -886,12 +848,7 @@ define void @vp_abs(ptr %a, ptr %b, i64 %N) {
; IF-EVL-NEXT: [[ENTRY:.*]]:
; IF-EVL-NEXT: [[B2:%.*]] = ptrtoint ptr [[B]] to i64
; IF-EVL-NEXT: [[A1:%.*]] = ptrtoint ptr [[A]] to i64
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 8, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP19:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP19]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; IF-EVL-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
index 50b600f8e8bda..0672df2b3d80c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll
@@ -13,12 +13,7 @@ define void @vp_sext(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_sext(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 18, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 3
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -112,12 +107,7 @@ define void @vp_zext(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_zext(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 18, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 3
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -211,12 +201,7 @@ define void @vp_trunc(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_trunc(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 18, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 2
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -310,12 +295,7 @@ define void @vp_fpext(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_fpext(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 14, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
; IF-EVL: [[VECTOR_MEMCHECK]]:
; IF-EVL-NEXT: [[TMP5:%.*]] = shl i64 [[N]], 3
; IF-EVL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
@@ -409,12 +389,7 @@ define void @vp_fptrunc(ptr %a, ptr %b, i64 %N) {
; IF-EVL-LABEL: define void @vp_fptrunc(
; IF-EVL-SAME: ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
; IF-EVL-NEXT: [[ENTRY:.*]]:
-; IF-EVL-NEXT: [[TMP0:%.*]] = sub i64 -1, [[N]]
-; IF-EVL-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; IF-EVL-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 2
-; IF-EVL-NEXT: [[TMP3:%.*]] = call i64 @llvm.umax.i64(i64 14, i64 [[TMP2]])
-; IF-EVL-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP0]], [[TMP3]]
-; IF-EVL-NEXT: br i1 [[TMP4]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; IF-EVL-NEXT: br i1 false, label %[[SCALAR_PH:...
[truncated]
LGTM
@@ -2434,7 +2434,7 @@ Value *InnerLoopVectorizer::createIterationCountCheck(ElementCount VF,
// check is known to be true, or known to be false.
CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
} // else step known to be < trip count, use CheckMinIters preset to false.
- } else if (VF.isScalable() &&
+ } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() &&
Do you know if the code here is now exercised at all in any tests? I understand the rationale for adding the check, but I'm a bit worried that there is no longer anything to defend it.
I think this should hopefully be covered by tests using -force-target-supports-scalable-vectors=true -scalable-vectorization=on without a target. @preames it would be good to double-check that's indeed the case before landing.
I just did some checking and I couldn't find any tests that cover this; we should probably add one then.
LGTM, thanks
LGTM if we can add a test for that code path
I don't believe this is possible to test on a target without scalable vector support. I tried, but the costs returned are all invalid, and thus we refuse to vectorize entirely (which seems like the correct result). Would folks rather I land this with the code path untested, or convert the check to an assertion?
Does …
Nope, I continue to get "remark: :0:0: UserVF ignored because of invalid costs." My attempt, in case anyone has suggestions:
Fair enough, I don't have any better ideas. I'm fine to convert it to an assertion or to leave it untested.
If it really is impossible to test, then surely we should just delete the entire block of code to avoid having to maintain something that's impossible to test? What do you think @fhahn?
+1, I'm thinking that the assertion would just be …
Thinking about this now, the problem may be because the target also has to support masked loads and stores. I'm going to see what happens if I can vectorise a loop without any memory ops.
I took an existing test from Transforms/LoopVectorize/AArch64/tail-folding-styles.ll, modified it, and ran opt on it, and it seems to now hit that code path. It might be a little fragile because there are no memory ops in there, and in future I can imagine we might decide there are no vector operations in the loop and abandon vectorisation early on. I think the problem is finding a way to force the target to also support masked loads and stores.
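The snippets that accompanied this exchange did not survive the page, so purely as an illustration of the kind of target-independent test being described, here is a rough sketch. The RUN-line options are existing opt/LoopVectorize flags, but this exact combination, the loop shape, and the CHECK lines are assumptions; the test that actually landed with the patch may look quite different.

; Hypothetical sketch only - not the test from this thread or the one that landed.
; RUN: opt -passes=loop-vectorize -force-target-supports-scalable-vectors=true \
; RUN:   -scalable-vectorization=on -force-target-instruction-cost=1 \
; RUN:   -prefer-predicate-over-epilogue=predicate-dont-vectorize -S %s | FileCheck %s

; CHECK-LABEL: define i32 @iv_only_reduction(
; CHECK: vector.ph:

; No loads or stores, so the synthetic scalable-vector target is never asked to
; cost a masked memory operation; the returned reduction keeps the loop body live.
define i32 @iv_only_reduction(i64 %n) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %sum = phi i32 [ 0, %entry ], [ %sum.next, %loop ]
  %iv.trunc = trunc i64 %iv to i32
  %sum.next = add i32 %sum, %iv.trunc
  %iv.next = add i64 %iv, 1
  %done = icmp eq i64 %iv.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret i32 %sum.next
}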
If the load is guaranteed to be dereferenceable it will be treated as a normal load, so that should also work.
I had just pushed a change which turned this into an assert instead, but am happy to go with whatever reviewers prefer. Honestly, I'm wondering if we should just update LangRef to require vscale to be a power of two. If a target comes along for which this property doesn't hold, we could just relax it then.
The last time this was discussed the conclusion was that if we specify vscale must be a power-of-two then realistically we can never walk it back because it'll be too hard to find the places that rely on the assumption. The compromise was to change the definition of …
I am increasingly thinking this was the wrong decision. The only hardware we know of has power-of-two sizes, and keeping complexity around for a potential future use case is generally undesirable. Finding the places that assume power of two again (if ever needed) doesn't seem that hard - compared to implementing any reasonably complicated feature, at least.
This reverts commit d4351da.
I landed this in the original form (i.e. making the check conditional) with the suggested test. I'm going to post a separate patch to continue the discussion on whether vscale should just be defined as a power of two.
See #145098 for the follow-up discussion.
[LV] Consider whether vscale is a known power of two for iteration check (llvm#144963)

Going mostly by the comment here - but it says "vscale is not necessarily a power-of-2". Both in-tree targets have vscale as a power of two, and we have an existing TTI hook for that.