[VPlan] Materialize constant vector trip counts before final opts. #142309

fhahn · 2025-06-01T14:24:28Z

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.

llvmbot · 2025-06-01T14:25:06Z

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.

Patch is 624.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142309.diff

165 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-3)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/invariant-replicate-region.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+16-21)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+5-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll (+6-8)
(modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-5)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+42-42)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-16)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleaving.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+51-51)
(modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+45-45)
(modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+4-18)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/replicate-uniform-call.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+27-31)
(modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/blend-in-header.ll (+12-16)
(modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/constantfolder-infer-correct-gepty.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/constantfolder.ll (+14-14)
(modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+3-5)
(modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/debugloc-optimize-vfuf-term.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+32-32)
(modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+39-46)
(modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/induction.ll (+62-68)
(modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/interleave-with-i65-induction.ll (+4-5)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+20-24)
(modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+121-74)
(modified) llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+15-15)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-neg-off.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+3-4)
(modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+28-29)
(modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+23-24)
(modified) llvm/test/Transforms/LoopVectorize/phi-cost.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+9-12)
(modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/reverse_induction.ll (+18-21)
(modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/runtime-checks-difference-simplifications.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/trunc-extended-icmps.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/trunc-loads-p16.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/trunc-reductions.ll (+6-7)
(modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/uitofp-preserve-nneg.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+38-38)
(modified) llvm/test/Transforms/LoopVectorize/unused-blend-mask-for-first-operand.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/widen-gep-all-indices-invariant.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/widen-intrinsic.ll (+2-2)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..67062f36c6986 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7560,6 +7560,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -10536,6 +10537,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                                VF.MinProfitableTripCount, IC, &CM, BFI, PSI,
                                Checks, BestPlan);
+
+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.
+        if (BestPlan.hasScalarTail() &&
+            !CM.requiresScalarEpilogue(VF.Width.isVector())) {
+          VPValue *TC = BestPlan.getTripCount();
+          if (TC->isLiveIn()) {
+            ScalarEvolution &SE = *PSE.getSE();
+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =
+                SE.getMulExpr(SE.getUDivExpr(TCScev, VFxUF), VFxUF);
+            if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+              BestPlan.getVectorTripCount().setUnderlyingValue(
+                  NewC->getValue());
+          }
+        }
+
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 165b57c87beb1..9a3a08a3e06a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -941,7 +941,8 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
     BackedgeTakenCount->setUnderlyingValue(TCMO);
   }
 
-  VectorTripCount.setUnderlyingValue(VectorTripCountV);
+  if (!VectorTripCount.getUnderlyingValue())
+    VectorTripCount.setUnderlyingValue(VectorTripCountV);
 
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b214498229e52..d5f4807b0dfe7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1842,9 +1842,7 @@ void VPlanTransforms::truncateToMinimalBitwidths(
   }
 }
 
-/// Remove BranchOnCond recipes with true or false conditions together with
-/// removing dead edges to their successors.
-static void removeBranchOnConst(VPlan &Plan) {
+void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   using namespace llvm::VPlanPatternMatch;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..653eeebe1531f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -202,6 +202,10 @@ struct VPlanTransforms {
   /// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
   static void simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy);
 
+  /// Remove BranchOnCond recipes with true or false conditions together with
+  /// removing dead edges to their successors.
+  static void removeBranchOnConst(VPlan &Plan);
+
   /// If there's a single exit block, optimize its phi recipes that use exiting
   /// IV values by feeding them precomputed end values instead, possibly taken
   /// one step backwards.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..ff3d43eb2d52f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -368,7 +368,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK-NEXT:    [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
 ; CHECK-NEXT:    br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -388,7 +388,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[LOOP_LATCH]]:
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV1]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 100
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..1cc4af7c4a3dd 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -32,7 +32,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -47,7 +47,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    store i64 [[OR]], ptr [[GEP_DST]], align 8
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 100
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -86,9 +86,9 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index f36161703dba5..35ad81fce40ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -680,7 +680,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[DST]], %[[ENTRY]] ]
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ 512, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -696,7 +696,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[PTR_IV_NEXT]] = getelementptr i8, ptr [[PTR_IV]], i64 8
 ; DEFAULT-NEXT:    [[IV_CLAMP:%.*]] = and i64 [[IV]], 4294967294
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_CLAMP]], 512
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
@@ -1492,7 +1492,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
 ; DEFAULT-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; DEFAULT-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -1506,7 +1506,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[T:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
 ; DEFAULT-NEXT:    store i32 [[T]], ptr [[DST]], align 4
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 21
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 73ef8534bb66c..06e6306da2368 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -469,7 +469,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..6adb5470e1dc4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -72,9 +72,9 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP34]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
index 403a5f186a8d6..98e52098d9766 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
@@ -40,9 +40,9 @@ define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -121,9 +121,9 @@ define void @fmax32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -202,9 +202,9 @@ define void @fmin64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -283,9 +283,9 @@ define void @fmax64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -364,9 +364,9 @@ define void @fmin16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -445,9 +445,9 @@ define void @fmax16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 095ac222e1789..0214c4188314f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -19,11 +19,11 @@ define double @test_reduction_costs() {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ [[TMP1]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_1:.*]]
 ; CHECK:       [[LOOP_1]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_1]] ]
@@ -103,11 +103,11 @@ define void @test_iv_cost(ptr %ptr.start, i8 %a, i64 %b) {
 ; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[N_VEC3]]
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
-; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[INDEX]]
+; CHECK-NEXT:    [[INDEX4:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[NEXT_...
[truncated]

llvmbot · 2025-06-01T14:25:07Z

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.

Patch is 624.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142309.diff

165 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-3)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/invariant-replicate-region.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+16-21)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+5-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll (+6-8)
(modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-5)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+42-42)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-16)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleaving.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+51-51)
(modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+45-45)
(modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+4-18)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/replicate-uniform-call.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+27-31)
(modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/blend-in-header.ll (+12-16)
(modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/constantfolder-infer-correct-gepty.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/constantfolder.ll (+14-14)
(modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+3-5)
(modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/debugloc-optimize-vfuf-term.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+32-32)
(modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+39-46)
(modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/induction.ll (+62-68)
(modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/interleave-with-i65-induction.ll (+4-5)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+20-24)
(modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+121-74)
(modified) llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+15-15)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-neg-off.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+3-4)
(modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+28-29)
(modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+23-24)
(modified) llvm/test/Transforms/LoopVectorize/phi-cost.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+9-12)
(modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/reverse_induction.ll (+18-21)
(modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/runtime-checks-difference-simplifications.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/trunc-extended-icmps.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/trunc-loads-p16.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/trunc-reductions.ll (+6-7)
(modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/uitofp-preserve-nneg.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+38-38)
(modified) llvm/test/Transforms/LoopVectorize/unused-blend-mask-for-first-operand.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/widen-gep-all-indices-invariant.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/widen-intrinsic.ll (+2-2)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..67062f36c6986 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7560,6 +7560,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -10536,6 +10537,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                                VF.MinProfitableTripCount, IC, &CM, BFI, PSI,
                                Checks, BestPlan);
+
+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.
+        if (BestPlan.hasScalarTail() &&
+            !CM.requiresScalarEpilogue(VF.Width.isVector())) {
+          VPValue *TC = BestPlan.getTripCount();
+          if (TC->isLiveIn()) {
+            ScalarEvolution &SE = *PSE.getSE();
+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =
+                SE.getMulExpr(SE.getUDivExpr(TCScev, VFxUF), VFxUF);
+            if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+              BestPlan.getVectorTripCount().setUnderlyingValue(
+                  NewC->getValue());
+          }
+        }
+
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 165b57c87beb1..9a3a08a3e06a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -941,7 +941,8 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
     BackedgeTakenCount->setUnderlyingValue(TCMO);
   }
 
-  VectorTripCount.setUnderlyingValue(VectorTripCountV);
+  if (!VectorTripCount.getUnderlyingValue())
+    VectorTripCount.setUnderlyingValue(VectorTripCountV);
 
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b214498229e52..d5f4807b0dfe7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1842,9 +1842,7 @@ void VPlanTransforms::truncateToMinimalBitwidths(
   }
 }
 
-/// Remove BranchOnCond recipes with true or false conditions together with
-/// removing dead edges to their successors.
-static void removeBranchOnConst(VPlan &Plan) {
+void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   using namespace llvm::VPlanPatternMatch;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..653eeebe1531f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -202,6 +202,10 @@ struct VPlanTransforms {
   /// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
   static void simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy);
 
+  /// Remove BranchOnCond recipes with true or false conditions together with
+  /// removing dead edges to their successors.
+  static void removeBranchOnConst(VPlan &Plan);
+
   /// If there's a single exit block, optimize its phi recipes that use exiting
   /// IV values by feeding them precomputed end values instead, possibly taken
   /// one step backwards.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..ff3d43eb2d52f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -368,7 +368,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK-NEXT:    [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
 ; CHECK-NEXT:    br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -388,7 +388,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[LOOP_LATCH]]:
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV1]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 100
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..1cc4af7c4a3dd 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -32,7 +32,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -47,7 +47,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    store i64 [[OR]], ptr [[GEP_DST]], align 8
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 100
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -86,9 +86,9 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index f36161703dba5..35ad81fce40ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -680,7 +680,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[DST]], %[[ENTRY]] ]
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ 512, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -696,7 +696,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[PTR_IV_NEXT]] = getelementptr i8, ptr [[PTR_IV]], i64 8
 ; DEFAULT-NEXT:    [[IV_CLAMP:%.*]] = and i64 [[IV]], 4294967294
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_CLAMP]], 512
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
@@ -1492,7 +1492,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
 ; DEFAULT-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; DEFAULT-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -1506,7 +1506,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[T:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
 ; DEFAULT-NEXT:    store i32 [[T]], ptr [[DST]], align 4
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 21
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 73ef8534bb66c..06e6306da2368 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -469,7 +469,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..6adb5470e1dc4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -72,9 +72,9 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP34]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
index 403a5f186a8d6..98e52098d9766 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
@@ -40,9 +40,9 @@ define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -121,9 +121,9 @@ define void @fmax32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -202,9 +202,9 @@ define void @fmin64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -283,9 +283,9 @@ define void @fmax64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -364,9 +364,9 @@ define void @fmin16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -445,9 +445,9 @@ define void @fmax16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 095ac222e1789..0214c4188314f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -19,11 +19,11 @@ define double @test_reduction_costs() {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ [[TMP1]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_1:.*]]
 ; CHECK:       [[LOOP_1]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_1]] ]
@@ -103,11 +103,11 @@ define void @test_iv_cost(ptr %ptr.start, i8 %a, i64 %b) {
 ; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[N_VEC3]]
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
-; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[INDEX]]
+; CHECK-NEXT:    [[INDEX4:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[NEXT_...
[truncated]

llvmbot · 2025-06-01T14:25:07Z

@llvm/pr-subscribers-backend-amdgpu

Author: Florian Hahn (fhahn)

Changes

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.

Patch is 624.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142309.diff

165 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-3)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/invariant-replicate-region.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+16-21)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+5-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll (+6-8)
(modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll (+24-24)
(modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-5)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+42-42)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+17-17)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+5-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-16)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/X86/interleaving.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+51-51)
(modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+45-45)
(modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+4-18)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/replicate-uniform-call.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+27-31)
(modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/blend-in-header.ll (+12-16)
(modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/constantfolder-infer-correct-gepty.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/constantfolder.ll (+14-14)
(modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+3-5)
(modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/debugloc-optimize-vfuf-term.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+32-32)
(modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+39-46)
(modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/induction.ll (+62-68)
(modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/interleave-with-i65-induction.ll (+4-5)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+20-24)
(modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+121-74)
(modified) llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+15-15)
(modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-neg-off.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+3-4)
(modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+28-29)
(modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+23-24)
(modified) llvm/test/Transforms/LoopVectorize/phi-cost.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+9-12)
(modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/reverse_induction.ll (+18-21)
(modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/runtime-checks-difference-simplifications.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+9-9)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+3-9)
(modified) llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll (+8-8)
(modified) llvm/test/Transforms/LoopVectorize/trunc-extended-icmps.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/trunc-loads-p16.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/trunc-reductions.ll (+6-7)
(modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/uitofp-preserve-nneg.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+38-38)
(modified) llvm/test/Transforms/LoopVectorize/unused-blend-mask-for-first-operand.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+10-10)
(modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+16-16)
(modified) llvm/test/Transforms/LoopVectorize/widen-gep-all-indices-invariant.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/widen-intrinsic.ll (+2-2)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..67062f36c6986 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7560,6 +7560,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -10536,6 +10537,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                                VF.MinProfitableTripCount, IC, &CM, BFI, PSI,
                                Checks, BestPlan);
+
+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.
+        if (BestPlan.hasScalarTail() &&
+            !CM.requiresScalarEpilogue(VF.Width.isVector())) {
+          VPValue *TC = BestPlan.getTripCount();
+          if (TC->isLiveIn()) {
+            ScalarEvolution &SE = *PSE.getSE();
+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =
+                SE.getMulExpr(SE.getUDivExpr(TCScev, VFxUF), VFxUF);
+            if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+              BestPlan.getVectorTripCount().setUnderlyingValue(
+                  NewC->getValue());
+          }
+        }
+
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 165b57c87beb1..9a3a08a3e06a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -941,7 +941,8 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
     BackedgeTakenCount->setUnderlyingValue(TCMO);
   }
 
-  VectorTripCount.setUnderlyingValue(VectorTripCountV);
+  if (!VectorTripCount.getUnderlyingValue())
+    VectorTripCount.setUnderlyingValue(VectorTripCountV);
 
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b214498229e52..d5f4807b0dfe7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1842,9 +1842,7 @@ void VPlanTransforms::truncateToMinimalBitwidths(
   }
 }
 
-/// Remove BranchOnCond recipes with true or false conditions together with
-/// removing dead edges to their successors.
-static void removeBranchOnConst(VPlan &Plan) {
+void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   using namespace llvm::VPlanPatternMatch;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..653eeebe1531f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -202,6 +202,10 @@ struct VPlanTransforms {
   /// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
   static void simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy);
 
+  /// Remove BranchOnCond recipes with true or false conditions together with
+  /// removing dead edges to their successors.
+  static void removeBranchOnConst(VPlan &Plan);
+
   /// If there's a single exit block, optimize its phi recipes that use exiting
   /// IV values by feeding them precomputed end values instead, possibly taken
   /// one step backwards.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..ff3d43eb2d52f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -368,7 +368,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK-NEXT:    [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
 ; CHECK-NEXT:    br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -388,7 +388,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[LOOP_LATCH]]:
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV1]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 100
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..1cc4af7c4a3dd 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -32,7 +32,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -47,7 +47,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    store i64 [[OR]], ptr [[GEP_DST]], align 8
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 100
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -86,9 +86,9 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index f36161703dba5..35ad81fce40ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -680,7 +680,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[DST]], %[[ENTRY]] ]
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ 512, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -696,7 +696,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[PTR_IV_NEXT]] = getelementptr i8, ptr [[PTR_IV]], i64 8
 ; DEFAULT-NEXT:    [[IV_CLAMP:%.*]] = and i64 [[IV]], 4294967294
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_CLAMP]], 512
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
@@ -1492,7 +1492,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
 ; DEFAULT-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; DEFAULT-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -1506,7 +1506,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[T:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
 ; DEFAULT-NEXT:    store i32 [[T]], ptr [[DST]], align 4
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 21
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 73ef8534bb66c..06e6306da2368 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -469,7 +469,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..6adb5470e1dc4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -72,9 +72,9 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP34]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
index 403a5f186a8d6..98e52098d9766 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
@@ -40,9 +40,9 @@ define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -121,9 +121,9 @@ define void @fmax32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -202,9 +202,9 @@ define void @fmin64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -283,9 +283,9 @@ define void @fmax64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -364,9 +364,9 @@ define void @fmin16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -445,9 +445,9 @@ define void @fmax16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 095ac222e1789..0214c4188314f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -19,11 +19,11 @@ define double @test_reduction_costs() {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ [[TMP1]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_1:.*]]
 ; CHECK:       [[LOOP_1]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_1]] ]
@@ -103,11 +103,11 @@ define void @test_iv_cost(ptr %ptr.start, i8 %a, i64 %b) {
 ; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[N_VEC3]]
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
-; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[INDEX]]
+; CHECK-NEXT:    [[INDEX4:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[NEXT_...
[truncated]

david-arm · 2025-06-09T12:21:03Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =


I'm a bit confused - even before this patch surely the vector trip count must have already been set to (OriginalTC / (VF * VF)) * (VF * UF), otherwise the vectorised code wouldn't even work? I just wonder if you can simply check that the existing vector trip count is already a constant.

Initially VectorTripCount is an opaque live-in VPValue and we can only compute it once we picked VF and UF. It is set just before executing the VPlan in VPlan::prepareToExecute, so it's available during codegen, but not during earlier VPlan transforms.

With this patch, it now gets set slightly earlier if it is constant so VPlan transform can use it for optimizations.

I updated VPlan::prepareToExecute to also assert the value set earlier matches.

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.

…stant-vector-tc

ayalz · 2025-06-26T08:30:31Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.


Can this be a VPlanTransform, perhaps part of optimizeForVFAndUF()? In any case worth outlining.

Should requiresScalarEpilogue be known to VPlan, as a step towards modeling the epilog in VPlan along with the main loop?

Can this be a VPlanTransform, perhaps part of optimizeForVFAndUF()? In any case worth outlining.

Yep, moved to a separate transform; for now it must be separate and run here, as more work will be needed to make it work with epilogue vectorization.

Should requiresScalarEpilogue be known to VPlan, as a step towards modeling the epilog in VPlan along with the main loop?

We can check if middle.block unconditionally branches to scalar.ph. Updated, thanks

ayalz · 2025-06-26T08:39:17Z

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

@@ -970,7 +970,7 @@ loop:
  %red.next = fadd double %for, %red
  %for.next = sitofp i32 %iv to double
  %iv.next = add nsw i32 %iv, 1
-  %ec = icmp eq i32 %iv.next, 1024
+  %ec = icmp eq i32 %iv.next, 1025


This change in the test is intentional - to prevent branch on const folding?

Yep, for the reduction test we should probably make sure that the edge to scalar.ph remains, to make sure resume values are set up correctly.

ayalz · 2025-06-26T08:49:19Z

llvm/test/Transforms/LoopVectorize/iv_outside_user.ll

@@ -90,29 +90,53 @@ for.end:
 }

 define i32 @constpre()  {


Extensive changes here?

Those were caused by not materializing when interleaving; update the patch and additional tests when only interleaving

ayalz · 2025-06-26T08:51:16Z

llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll

@@ -112,8 +112,7 @@ define void @f(ptr noundef captures(none) %a, float noundef %x) {
 ; CHECK-NEXT:    [[MUL_7:%.*]] = fmul float [[X]], [[LOAD_7]]
 ; CHECK-NEXT:    store float [[MUL_7]], ptr [[ARRAYIDX_7]], align 4
 ; CHECK-NEXT:    [[IV_NEXT_7]] = add nuw nsw i64 [[IV]], 8
-; CHECK-NEXT:    [[COMP_7:%.*]] = icmp eq i64 [[IV_NEXT_7]], 1024
-; CHECK-NEXT:    br i1 [[COMP_7]], label %[[EXIT_LOOPEXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 true, label %[[EXIT_LOOPEXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]


Simplified to branch-on-true but not removed?

This is due to unrolling (run by the test) now being able to reason about the remaining iterations, after removing an incoming value from the phi.

ayalz · 2025-06-26T08:54:19Z

llvm/test/Transforms/LoopVectorize/select-reduction.ll

-; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
-; CHECK-NEXT:    br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]


There are numerous changes that seem irrelevant.

Snuck in when batch-updating, removed, thanks

ayalz · 2025-06-26T08:55:55Z

llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll

@@ -1,12 +1,11 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5


Are these changes desired?

Changes snuck in by accident, removed, thanks

ayalz · 2025-06-26T08:57:20Z

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll

@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --filter-out-after "scalar.ph:" --version 2


Filter no longer desired, now that the branch to scalar.ph is folded?

Changes made by accident, undone, thanks

…stant-vector-tc

…stant-vector-tc-tmp

github-actions · 2025-06-27T21:59:40Z

✅ With the latest revision this PR passed the C/C++ code formatter.

…stant-vector-tc

ayalz

LGTM, with few final comments.

ayalz · 2025-07-03T09:27:44Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -3120,6 +3115,29 @@ void VPlanTransforms::materializeBroadcasts(VPlan &Plan) {
  }
 }

+void VPlanTransforms::materializeVectorTripCount(


Suggested change

void VPlanTransforms::materializeVectorTripCount(

void VPlanTransforms::materializeConstantVectorTripCount(

?

ayalz · 2025-07-03T15:45:50Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+    Plan.getVectorTripCount().setUnderlyingValue(NewC->getValue());


Suggested change

if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))

Plan.getVectorTripCount().setUnderlyingValue(NewC->getValue());

if (auto *ConstVecTC = dyn_cast<SCEVConstant>(VecTCScev))

Plan.getVectorTripCount().setUnderlyingValue(ConstVecTC->getValue());

ayalz · 2025-07-03T16:10:45Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  assert(Plan.hasUF(BestUF) && "BestUF is not available in Plan");
+
+  VPValue *TC = Plan.getTripCount();
+  // Skip cases for which the trip count may be non-trivial to materialize.


Suggested change

// Skip cases for which the trip count may be non-trivial to materialize.

// Skip cases for which the trip count may be non-trivial to materialize.

// I.e., when a scalar tail is absent - due to tail folding, or when a scalar tail is required.

An absent scalar tail due to knowing that VecTC is a multiple of VF*UF, requires materializing VecTC, so can be ignored here?

ayalz · 2025-07-03T16:14:03Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      !TC->isLiveIn())
+    return;
+  // Materialize vector trip counts for constants early if it can simply
+  // be computed as (Original TC / VF * UF) * VF * UF.


Worth adding a TODO to materialize constant VecTC also when tail is folded and when scalar epilog is required? Computing it then is quite simple too?

ayalz · 2025-07-03T16:39:32Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7329,6 +7329,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
                             BestVPlan, BestVF);
  VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
  VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);


This is in addition to running it as part of optimize(), rather than instead of?

ayalz · 2025-07-03T16:46:39Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

-      assert((!isa<VPIRPhi>(&R) || RemovedSucc->getNumPredecessors() == 1) &&
-             "VPIRPhis must have a single predecessor");


Now VPIRPhi's of middle-block may be handled here as well, having multiple predecessors, right?

ayalz · 2025-07-03T16:52:27Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  // Materialize vector trip counts for constants early if it can simply
+  // be computed as (Original TC / VF * UF) * VF * UF.
+  ScalarEvolution &SE = *PSE.getSE();
+  auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());


Worth bailing out early if TCScev is not a SCEVConstant?

ayalz · 2025-07-03T19:19:40Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      // TODO: Move to general VPlan pipeline once epilogue loops are also
+      // supported.
+      VPlanTransforms::runPass(VPlanTransforms::materializeVectorTripCount,
+                               BestPlan, VF.Width, IC, PSE);
+


This disables materializeVectorTripCount() from running on both main and epilog loops, when both are vectorized, as opposed to running it in LVP.executePlan() under if (!VectorizingEpilogue) - which would disable it only when vectorizing the epilog loop?

Generalize the code added in llvm#147214 to also support re-using pointer LCSSA phis when expanding integer SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with llvm#142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown.

Generalize the code added in #147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with #142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: #147824

…Vs. (#147824) Generalize the code added in llvm/llvm-project#147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with llvm/llvm-project#142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: llvm/llvm-project#147824

…stant-vector-tc

Update tests for which checking both the scalar resume and exit values is interesting, because they have first-order recurrences to have variable trip-counts, to avoid the branch in the middle.block being folded away by #142309. For similar reasons, also update check-prof-info.ll

…stant-vector-tc

Update tests for which checking both the scalar resume and exit values is interesting, because they have first-order recurrences to have variable trip-counts, to avoid the branch in the middle.block being folded away by llvm/llvm-project#142309. For similar reasons, also update check-prof-info.ll

…stant-vector-tc

llvm-ci · 2025-07-26T16:28:03Z

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/24134

Here is the relevant piece of the build log for the reference

Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[57/59] Linking CXX executable External/HIP/cmath-hip-6.3.0
[58/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
[59/59] Linking CXX executable External/HIP/TheNextWeek-hip-6.3.0
@@@BUILD_STEP Testing HIP test-suite@@@
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
+ ninja check-hip-simple
[0/1] cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP && /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/llvm-lit -sv array-hip-6.3.0.test empty-hip-6.3.0.test with-fopenmp-hip-6.3.0.test saxpy-hip-6.3.0.test memmove-hip-6.3.0.test split-kernel-args-hip-6.3.0.test builtin-logb-scalbn-hip-6.3.0.test TheNextWeek-hip-6.3.0.test algorithm-hip-6.3.0.test cmath-hip-6.3.0.test complex-hip-6.3.0.test math_h-hip-6.3.0.test new-hip-6.3.0.test blender.test
-- Testing: 14 tests, 14 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: test-suite :: External/HIP/blender.test (14 of 14)
******************** TEST 'test-suite :: External/HIP/blender.test' FAILED ********************

/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out --redirect-input /dev/null --summary /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.time /bin/bash test_blender.sh
/bin/bash verify_blender.sh /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out
Begin Blender test.
TEST_SUITE_HIP_ROOT=/opt/botworker/llvm/External/hip
Render /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend
Blender 4.1.1 (hash e1743a0317bc built 2024-04-15 23:47:45)
Read blend: "/opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend"
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
I0726 16:22:05.569768 347166 device.cpp:39] HIPEW initialization succeeded
I0726 16:22:05.571796 347166 device.cpp:45] Found HIPCC hipcc
I0726 16:22:05.638209 347166 device.cpp:207] Device has compute preemption or is not used for display.
I0726 16:22:05.638234 347166 device.cpp:211] Added device "" with id "HIP__0000:a3:00".
I0726 16:22:05.638309 347166 device.cpp:568] Mapped host memory limit set to 536,444,985,344 bytes. (499.60G)
I0726 16:22:05.638552 347166 device_impl.cpp:63] Using AVX2 CPU kernels.
Fra:1 Mem:524.00M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Eyepiece_rim
Fra:1 Mem:524.00M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.004
Fra:1 Mem:524.00M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.012
Fra:1 Mem:524.07M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.018
Fra:1 Mem:524.20M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Cables.004
Fra:1 Mem:524.34M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.026
Fra:1 Mem:524.40M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.039
Fra:1 Mem:525.06M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Hoses.003
Fra:1 Mem:525.58M (Peak 525.58M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Wires
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
+ ninja check-hip-simple
[0/1] cd /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP && /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/llvm-lit -sv array-hip-6.3.0.test empty-hip-6.3.0.test with-fopenmp-hip-6.3.0.test saxpy-hip-6.3.0.test memmove-hip-6.3.0.test split-kernel-args-hip-6.3.0.test builtin-logb-scalbn-hip-6.3.0.test TheNextWeek-hip-6.3.0.test algorithm-hip-6.3.0.test cmath-hip-6.3.0.test complex-hip-6.3.0.test math_h-hip-6.3.0.test new-hip-6.3.0.test blender.test
-- Testing: 14 tests, 14 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: test-suite :: External/HIP/blender.test (14 of 14)
******************** TEST 'test-suite :: External/HIP/blender.test' FAILED ********************

/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out --redirect-input /dev/null --summary /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.time /bin/bash test_blender.sh
/bin/bash verify_blender.sh /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/test-suite-build/External/HIP/Output/blender.test.out
Begin Blender test.
TEST_SUITE_HIP_ROOT=/opt/botworker/llvm/External/hip
Render /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend
Blender 4.1.1 (hash e1743a0317bc built 2024-04-15 23:47:45)
Read blend: "/opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo_release.blend"
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
Could not open as Ogawa file from provided streams.
Unable to open /opt/botworker/llvm/External/hip/Blender_Scenes/290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.002", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.003", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.004", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
WARN (bke.modifier): source/blender/blenkernel/intern/modifier.cc:425 BKE_modifier_set_error: Object: "GEO-flag.001", Modifier: "MeshSequenceCache", Could not create reader for file //290skydemo2_flags.abc
I0726 16:22:05.569768 347166 device.cpp:39] HIPEW initialization succeeded
I0726 16:22:05.571796 347166 device.cpp:45] Found HIPCC hipcc
I0726 16:22:05.638209 347166 device.cpp:207] Device has compute preemption or is not used for display.
I0726 16:22:05.638234 347166 device.cpp:211] Added device "" with id "HIP__0000:a3:00".
I0726 16:22:05.638309 347166 device.cpp:568] Mapped host memory limit set to 536,444,985,344 bytes. (499.60G)
I0726 16:22:05.638552 347166 device_impl.cpp:63] Using AVX2 CPU kernels.
Fra:1 Mem:524.00M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Eyepiece_rim
Fra:1 Mem:524.00M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.004
Fra:1 Mem:524.00M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.012
Fra:1 Mem:524.07M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.018
Fra:1 Mem:524.20M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Cables.004
Fra:1 Mem:524.34M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.026
Fra:1 Mem:524.40M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Rivets.039
Fra:1 Mem:525.06M (Peak 525.14M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Hoses.003
Fra:1 Mem:525.58M (Peak 525.58M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Wires
Fra:1 Mem:525.57M (Peak 525.58M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors
Fra:1 Mem:526.31M (Peak 526.31M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.001
Fra:1 Mem:527.33M (Peak 527.53M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.002
Fra:1 Mem:528.05M (Peak 528.07M) | Time:00:00.67 | Mem:0.00M, Peak:0.00M | Scene, View Layer | Synchronizing object | GEO-Curve_Connectors.004

…al opts. (#142309) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: llvm/llvm-project#142309

…47824) Generalize the code added in llvm#147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with llvm#142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: llvm#147824

Update tests for which checking both the scalar resume and exit values is interesting, because they have first-order recurrences to have variable trip-counts, to avoid the branch in the middle.block being folded away by llvm#142309. For similar reasons, also update check-prof-info.ll

…lvm#142309) Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required. This enables removing a number of redundant branches from the middle block. For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together. PR: llvm#142309

…47824) Generalize the code added in llvm#147214 to also support re-using pointer LCSSA phis when expanding SCEVs with AddRecs. A common source of integer AddRecs with pointer bases are runtime checks emitted by LV based on the distance between 2 pointer AddRecs. This improves codegen in some cases when vectorizing and prevents regressions with llvm#142309, which turns some phis into single-entry ones, which SCEV will look through now (and expand the whole AddRec), whereas before it would have to treat the LCSSA phi as SCEVUnknown. Compile-time impact neutral: https://llvm-compile-time-tracker.com/compare.php?from=fd5fc76c91538871771be2c3be2ca3a5f2dcac31&to=ca5fc2b3d8e6efc09f1624a17fdbfbe909f14eb4&stat=instructions:u PR: llvm#147824

fhahn requested review from rengolin, ayalz and aniragil June 1, 2025 14:24

llvmbot added backend:AMDGPU backend:RISC-V backend:SystemZ vectorizers llvm:transforms labels Jun 1, 2025

david-arm reviewed Jun 9, 2025

View reviewed changes

fhahn added 2 commits June 9, 2025 22:56

!fixup add assert when VectorTripCount is already set

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

Loading
Loading status checks…

4122565

fhahn force-pushed the vplan-materialize-constant-vector-tc branch from 0129d1d to 4122565 Compare June 9, 2025 22:05

fhahn added 3 commits June 20, 2025 19:04

Merge remote-tracking branch 'origin/main' into vplan-materialize-con…

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

68a47f7

…stant-vector-tc

Merge remote-tracking branch 'origin/main' into vplan-materialize-con…

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

c5e6bee

…stant-vector-tc

!fixup update remaining tests.

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

Loading
Loading status checks…

66c8d73

ayalz reviewed Jun 26, 2025

View reviewed changes

fhahn added 4 commits June 27, 2025 12:50

Merge remote-tracking branch 'origin/main' into vplan-materialize-con…

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

97043ee

…stant-vector-tc

Merge remote-tracking branch 'origin/main' into vplan-materialize-con…

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: 3596DCFCBB5C67CC

Verified
Learn about vigilant mode

e8d1dce

…stant-vector-tc-tmp

Merge remote-tracking branch 'origin/main' into vplan-materialize-con…

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: 3596DCFCBB5C67CC

Verified
Learn about vigilant mode

6cd2715

…stant-vector-tc-tmp

!fixup address comments, thanks

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

Loading
Loading status checks…

8d001b2

fhahn added 2 commits June 28, 2025 12:56

!fixup fix formatting

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

c49b0d6

ayalz approved these changes Jul 3, 2025

View reviewed changes

fhahn mentioned this pull request Jul 9, 2025

[SCEV] Try to re-use pointer LCSSA phis when expanding SCEVs. #147824

Merged

fhahn mentioned this pull request Jul 9, 2025

[SCEV] Try to re-use existing LCSSA phis when expanding SCEVAddRecExpr. #147214

Merged

Merge remote-tracking branch 'origin/main' into vplan-materialize-con…

Verified

This commit was signed with the committer’s verified signature.

fhahn Florian Hahn

GPG key ID: C8B0D7090F9127E6

Verified
Learn about vigilant mode

4618886

…stant-vector-tc

fhahn force-pushed the vplan-materialize-constant-vector-tc branch from 719ee52 to daa636a Compare July 26, 2025 14:52

fhahn merged commit fa3ec0c into llvm:main Jul 26, 2025
9 checks passed

fhahn deleted the vplan-materialize-constant-vector-tc branch July 26, 2025 16:16

		// Materialize vector trip counts for constants early if it can simply
		// be computed as (Original TC / VF * UF) * VF * UF.

		; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
		; CHECK-NEXT: br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]

		@@ -1,12 +1,11 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5

		@@ -1,4 +1,4 @@
		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --filter-out-after "scalar.ph:" --version 2

	void VPlanTransforms::materializeVectorTripCount(
	void VPlanTransforms::materializeConstantVectorTripCount(

		if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
		Plan.getVectorTripCount().setUnderlyingValue(NewC->getValue());

	// Skip cases for which the trip count may be non-trivial to materialize.
	// Skip cases for which the trip count may be non-trivial to materialize.
	// I.e., when a scalar tail is absent - due to tail folding, or when a scalar tail is required.

		assert((!isa<VPIRPhi>(&R) \|\| RemovedSucc->getNumPredecessors() == 1) &&
		"VPIRPhis must have a single predecessor");

[VPlan] Materialize constant vector trip counts before final opts. #142309

[VPlan] Materialize constant vector trip counts before final opts. #142309

Conversation

fhahn commented Jun 1, 2025

Uh oh!

llvmbot commented Jun 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 1, 2025

Uh oh!

llvmbot commented Jun 1, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ayalz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ayalz Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented Jul 26, 2025

Uh oh!

llvmbot commented Jun 1, 2025 •

edited

Loading

github-actions bot commented Jun 27, 2025 •

edited

Loading

ayalz Jul 3, 2025 •

edited

Loading