[LangRef] Require that vscale be a power of two #145098

Status: Open. Wants to merge 1 commit into base: main
13 changes: 7 additions & 6 deletions llvm/docs/LangRef.rst
@@ -4442,10 +4442,11 @@ elementtype may be any integer, floating-point, pointer type, or a sized
target extension type that has the ``CanBeVectorElement`` property. Vectors
of size zero are not allowed. For scalable vectors, the total number of
elements is a constant multiple (called vscale) of the specified number
of elements; vscale is a positive integer that is unknown at compile time
and the same hardware-dependent constant for all scalable vectors at run
time. The size of a specific scalable vector type is thus constant within
IR, even if the exact size in bytes cannot be determined until run time.
of elements; vscale is a positive power-of-two integer that is unknown
at compile time and the same hardware-dependent constant for all scalable
vectors at run time. The size of a specific scalable vector type is thus
constant within IR, even if the exact size in bytes cannot be determined
until run time.
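
As a small illustration of the wording above, the following C++ sketch spells out the two properties the paragraph relies on: that vscale is a positive power of two, and that the element count of a scalable vector is vscale times the specified minimum. The helper names `isPositivePowerOfTwo` and `totalElements` are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when v is a positive power of two, which is the
// property the revised LangRef text requires of vscale.
constexpr bool isPositivePowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical helper: total element count of <vscale x minElts x Ty>,
// i.e. the "constant multiple" described in the paragraph above.
constexpr std::uint64_t totalElements(std::uint64_t vscale,
                                      std::uint64_t minElts) {
  return vscale * minElts;
}
```

For instance, under this model a `<vscale x 16 x i8>` vector with vscale equal to 4 holds `totalElements(4, 16)` elements.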

:Examples:

@@ -30398,8 +30399,8 @@ vectors such as ``<vscale x 16 x i8>``.
Semantics:
""""""""""

``vscale`` is a positive value that is constant throughout program
execution, but is unknown at compile time.
``vscale`` is a positive power-of-two value that is constant throughout
program execution, but is unknown at compile time.
If the result value does not fit in the result type, then the result is
a :ref:`poison value <poisonvalues>`.

3 changes: 0 additions & 3 deletions llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1220,9 +1220,6 @@ class TargetTransformInfo {
/// \return the value of vscale to tune the cost model for.
LLVM_ABI std::optional<unsigned> getVScaleForTuning() const;

/// \return true if vscale is known to be a power of 2
LLVM_ABI bool isVScaleKnownToBeAPowerOfTwo() const;

/// \return True if the vectorization factor should be chosen to
/// make the vector of the smallest element type match the size of a
/// vector register. For wider element types, this could result in
1 change: 0 additions & 1 deletion llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -591,7 +591,6 @@ class TargetTransformInfoImplBase {
virtual std::optional<unsigned> getVScaleForTuning() const {
return std::nullopt;
}
virtual bool isVScaleKnownToBeAPowerOfTwo() const { return false; }

virtual bool
shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const {
1 change: 0 additions & 1 deletion llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -864,7 +864,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
std::optional<unsigned> getVScaleForTuning() const override {
return std::nullopt;
}
bool isVScaleKnownToBeAPowerOfTwo() const override { return false; }

/// Estimate the overhead of scalarizing an instruction. Insert and Extract
/// are set if the demanded result elements need to be inserted and/or
3 changes: 0 additions & 3 deletions llvm/include/llvm/CodeGen/TargetLowering.h
@@ -623,9 +623,6 @@ class LLVM_ABI TargetLoweringBase {
return BypassSlowDivWidths;
}

/// Return true only if vscale must be a power of two.
virtual bool isVScaleKnownToBeAPowerOfTwo() const { return false; }

/// Return true if Flow Control is an expensive operation that should be
/// avoided.
bool isJumpExpensive() const { return JumpIsExpensive; }
4 changes: 0 additions & 4 deletions llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -795,10 +795,6 @@ std::optional<unsigned> TargetTransformInfo::getVScaleForTuning() const {
return TTIImpl->getVScaleForTuning();
}

bool TargetTransformInfo::isVScaleKnownToBeAPowerOfTwo() const {
return TTIImpl->isVScaleKnownToBeAPowerOfTwo();
}

bool TargetTransformInfo::shouldMaximizeVectorBandwidth(
TargetTransformInfo::RegisterKind K) const {
return TTIImpl->shouldMaximizeVectorBandwidth(K);
8 changes: 3 additions & 5 deletions llvm/lib/Analysis/ValueTracking.cpp
@@ -2474,11 +2474,9 @@ bool llvm::isKnownToBeAPowerOfTwo(const Value *V, bool OrZero,
if (!I)
return false;

if (Q.CxtI && match(V, m_VScale())) {
const Function *F = Q.CxtI->getFunction();
// The vscale_range indicates vscale is a power-of-two.
return F->hasFnAttribute(Attribute::VScaleRange);
}
// vscale is a power-of-two by definition
Review comment (Collaborator, Author):

Just to highlight: with this change, we no longer need a context instruction from which to find the function. This matters both because having two functions in the same module that disagree on whether vscale is a power of two is more than a bit weird, and because many (most?) callers of this API do not pass a context. As such, the version without the context is significantly more powerful in practice.

if (match(V, m_VScale()))
return true;

// 1 << X is clearly a power of two if the one is not shifted off the end. If
// it is shifted off the end then the result is undefined.
1 change: 0 additions & 1 deletion llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4660,7 +4660,6 @@ bool SelectionDAG::isKnownToBeAPowerOfTwo(SDValue Val, unsigned Depth) const {

// vscale(power-of-two) is a power-of-two for some targets
Review comment (Contributor), suggested change:

- // vscale(power-of-two) is a power-of-two for some targets
+ // vscale(power-of-two) is a power-of-two

if (Val.getOpcode() == ISD::VSCALE &&
getTargetLoweringInfo().isVScaleKnownToBeAPowerOfTwo() &&
isKnownToBeAPowerOfTwo(Val.getOperand(0), Depth + 1))
return true;
Review comment (Collaborator):

if (Val.getOpcode() == ISD::VSCALE)
  return isKnownToBeAPowerOfTwo(Val.getOperand(0), Depth + 1);


2 changes: 0 additions & 2 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -517,8 +517,6 @@ class AArch64TargetLowering : public TargetLowering {
SDValue Chain, SDValue InGlue, unsigned Condition,
SDValue PStateSM = SDValue()) const;

bool isVScaleKnownToBeAPowerOfTwo() const override { return true; }

// Normally SVE is only used for byte size vectors that do not fit within a
// NEON vector. This changes when OverrideNEON is true, allowing SVE to be
// used for 64bit and 128bit vectors as well.
2 changes: 0 additions & 2 deletions llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -156,8 +156,6 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
return ST->getVScaleForTuning();
}

bool isVScaleKnownToBeAPowerOfTwo() const override { return true; }

bool shouldMaximizeVectorBandwidth(
TargetTransformInfo::RegisterKind K) const override;

12 changes: 0 additions & 12 deletions llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -23434,18 +23434,6 @@ const MCExpr *RISCVTargetLowering::LowerCustomJumpTableEntry(
return MCSymbolRefExpr::create(MBB->getSymbol(), Ctx);
}

bool RISCVTargetLowering::isVScaleKnownToBeAPowerOfTwo() const {
// We define vscale to be VLEN/RVVBitsPerBlock. VLEN is always a power
// of two >= 64, and RVVBitsPerBlock is 64. Thus, vscale must be
// a power of two as well.
// FIXME: This doesn't work for zve32, but that's already broken
// elsewhere for the same reason.
assert(Subtarget.getRealMinVLen() >= 64 && "zve32* unsupported");
static_assert(RISCV::RVVBitsPerBlock == 64,
"RVVBitsPerBlock changed, audit needed");
return true;
}

bool RISCVTargetLowering::getIndexedAddressParts(SDNode *Op, SDValue &Base,
SDValue &Offset,
ISD::MemIndexedMode &AM,
2 changes: 0 additions & 2 deletions llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -394,8 +394,6 @@ class RISCVTargetLowering : public TargetLowering {
unsigned uid,
MCContext &Ctx) const override;

bool isVScaleKnownToBeAPowerOfTwo() const override;

bool getIndexedAddressParts(SDNode *Op, SDValue &Base, SDValue &Offset,
ISD::MemIndexedMode &AM, SelectionDAG &DAG) const;
bool getPreIndexedAddressParts(SDNode *N, SDValue &Base, SDValue &Offset,
4 changes: 0 additions & 4 deletions llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -335,10 +335,6 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {

bool isLegalMaskedCompressStore(Type *DataTy, Align Alignment) const override;

bool isVScaleKnownToBeAPowerOfTwo() const override {
return TLI->isVScaleKnownToBeAPowerOfTwo();
}

/// \returns How the target needs this vector-predicated operation to be
/// transformed.
TargetTransformInfo::VPLegalization
16 changes: 1 addition & 15 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2429,20 +2429,6 @@ Value *InnerLoopVectorizer::createIterationCountCheck(ElementCount VF,
// check is known to be true, or known to be false.
CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
} // else step known to be < trip count, use CheckMinIters preset to false.
} else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() &&
!isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
// vscale is not necessarily a power-of-2, which means we cannot guarantee
// an overflow to zero when updating induction variables and so an
// additional overflow check is required before entering the vector loop.

// Get the maximum unsigned value for the type.
Value *MaxUIntTripCount =
ConstantInt::get(CountTy, cast<IntegerType>(CountTy)->getMask());
Value *LHS = Builder.CreateSub(MaxUIntTripCount, Count);

// Don't execute the vector loop if (UMax - n) < (VF * UF).
CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, CreateStep());
}
return CheckMinIters;
}
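
The removed comment in this hunk rests on a piece of modular arithmetic: an induction variable that starts at 0 and advances by a power-of-two step wraps exactly to zero, because 2^W is a multiple of the step, whereas a non-power-of-two step can wrap to a nonzero value. The sketch below illustrates this on an 8-bit counter. It is a toy model only, and `wrapsToZero` is a hypothetical name that does not appear in LoopVectorize.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: advances an 8-bit induction variable from 0 by `step`
// until the addition wraps around, then reports whether the wrapped value is
// exactly 0. `step` must be nonzero, otherwise the loop never terminates.
bool wrapsToZero(std::uint8_t step) {
  std::uint8_t index = 0;
  for (;;) {
    std::uint8_t next = static_cast<std::uint8_t>(index + step);
    if (next < index)  // unsigned wraparound happened on this update
      return next == 0;
    index = next;
  }
}
```

In this model `wrapsToZero(16)` holds, while `wrapsToZero(12)` does not, since 252 + 12 wraps to 8. That second case is the situation the removed runtime overflow check guarded against.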
@@ -3830,7 +3816,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
MaxFactors.FixedVF.getFixedValue();
if (MaxFactors.ScalableVF) {
std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI);
if (MaxVScale && TTI.isVScaleKnownToBeAPowerOfTwo()) {
if (MaxVScale) {
MaxPowerOf2RuntimeVF = std::max<unsigned>(
*MaxPowerOf2RuntimeVF,
*MaxVScale * MaxFactors.ScalableVF.getKnownMinValue());
3 changes: 2 additions & 1 deletion llvm/test/Transforms/InstCombine/rem-mul-shl.ll
@@ -859,7 +859,8 @@ define i64 @urem_shl_vscale() {
; CHECK-LABEL: @urem_shl_vscale(
; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[SHIFT:%.*]] = shl nuw nsw i64 [[VSCALE]], 2
; CHECK-NEXT: [[REM:%.*]] = urem i64 1024, [[SHIFT]]
; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[SHIFT]], 2047
; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], 1024
; CHECK-NEXT: ret i64 [[REM]]
;
%vscale = call i64 @llvm.vscale.i64()
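
The updated CHECK lines replace `urem i64 1024, (vscale << 2)` with `(shift + 2047) & 1024`. The following C++ sketch checks that the two forms agree whenever vscale is a power of two, and shows that they can differ otherwise. The function names are hypothetical, and this models only the arithmetic, not InstCombine itself.

```cpp
#include <cassert>
#include <cstdint>

// Old form: urem i64 1024, (vscale << 2).
std::uint64_t remByUrem(std::uint64_t vscale) {
  return 1024 % (vscale << 2);
}

// New form from the updated CHECK lines: (shift + 2047) & 1024.
std::uint64_t remByMask(std::uint64_t vscale) {
  return ((vscale << 2) + 2047) & 1024;
}

// Compares both forms for vscale = 2^0 .. 2^60, a range chosen so that
// neither the shift nor the addition overflows 64 bits.
bool agreeForAllPowersOfTwo() {
  for (unsigned k = 0; k <= 60; ++k) {
    std::uint64_t vscale = std::uint64_t{1} << k;
    if (remByUrem(vscale) != remByMask(vscale))
      return false;
  }
  return true;
}
```

With a non-power-of-two value such as vscale = 3, the two forms disagree (4 versus 0), which is why the fold needs vscale to be known to be a power of two.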
21 changes: 0 additions & 21 deletions llvm/test/Transforms/InstSimplify/po2-shift-add-and-to-zero.ll
@@ -61,27 +61,6 @@ define i64 @test_pow2_or_zero(i64 %arg) {
ret i64 %rem
}

;; Make sure it doesn't work if the value isn't known to be a power of 2.
;; In this case a vscale without a `vscale_range` attribute on the function.
define i64 @no_pow2() {
; CHECK-LABEL: define i64 @no_pow2() {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 4
; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[TMP0]], 3
; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], -1
; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], [[TMP3]]
; CHECK-NEXT: ret i64 [[REM]]
;
entry:
%0 = call i64 @llvm.vscale.i64()
%1 = shl i64 %0, 4
%2 = shl i64 %0, 3
%3 = add i64 %2, -1
%rem = and i64 %1, %3
ret i64 %rem
}

;; Make sure it doesn't work if the shift on the -1 side is greater
define i64 @minus_shift_greater(i64 %arg) {
; CHECK-LABEL: define i64 @minus_shift_greater
7 changes: 3 additions & 4 deletions llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
@@ -20,9 +20,8 @@ define void @cond_ind64(ptr noalias nocapture %a, ptr noalias nocapture readonly
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[TMP2]], -4
; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], [[DOTNEG]]
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
@@ -42,7 +41,7 @@ define void @cond_ind64(ptr noalias nocapture %a, ptr noalias nocapture readonly
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
;
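
In this test the vector trip count changes from `N - (N urem (vscale << 2))` to `N & (vscale * -4)`. A minimal C++ sketch of why the two agree for a power-of-two vscale follows: rounding N down to a multiple of a power-of-two step is the same as masking with the negated step. The function names are hypothetical and the code models only the 64-bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Old form: N - (N urem step), with step = vscale << 2.
std::uint64_t nVecByUrem(std::uint64_t n, std::uint64_t vscale) {
  std::uint64_t step = vscale << 2;
  return n - n % step;
}

// New form: N & (vscale * -4), evaluated modulo 2^64 as in the IR.
std::uint64_t nVecByMask(std::uint64_t n, std::uint64_t vscale) {
  return n & (vscale * static_cast<std::uint64_t>(-4));
}
```

For example, with N = 1003 and vscale = 4 both forms give 992. With vscale = 3 they give 996 and 992 respectively, so the rewrite is only valid under the power-of-two guarantee.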
@@ -1373,9 +1373,8 @@ define void @interleave_deinterleave_factor3(ptr writeonly noalias %dst, ptr rea
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]
; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[TMP2]], 2044
; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
@@ -1411,8 +1410,8 @@ define void @interleave_deinterleave_factor3(ptr writeonly noalias %dst, ptr rea
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0
; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
; CHECK: scalar.ph:
;
entry:
@@ -1467,9 +1466,8 @@ define void @interleave_deinterleave(ptr writeonly noalias %dst, ptr readonly %a
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]
; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[TMP2]], 2044
; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 2
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
@@ -1500,8 +1498,8 @@ define void @interleave_deinterleave(ptr writeonly noalias %dst, ptr readonly %a
; CHECK-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP43:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0
; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
; CHECK: scalar.ph:
;
entry: