[NVPTX] Fix NaN + overflow semantics of f2ll/d2i #159530
base: main
Conversation
Fix the NaN-handling semantics of various NVVM intrinsics that convert from floating-point types to integer types. Previously, ConstantFolding would fold NaN inputs to 0. However, v9.0 of the PTX spec states that in float-to-integer conversions, depending on the conversion types, a NaN input produces the following value:
* Zero, if the source is not `.f64` and the destination is not `.s64` or `.u64`.
* Otherwise, `1 << (BitWidth(dst) - 1)`, corresponding to `(MAXINT >> 1) + 1` for an unsigned type or `MININT` for a signed type.
Also, support for constant-folding +/-Inf and values which overflow/underflow the integer output type has been added (they clamp to the min/max integer value). Because of this NaN-handling semantic difference, we also need to disable transforming several intrinsics to FPToSI/FPToUI: the LLVM instructions return poison for these inputs, whereas the intrinsics have defined behaviour for edge cases such as NaN, Inf, and overflow.
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-nvptx @llvm/pr-subscribers-llvm-transforms
Author: Lewis Crawford (LewisCrawford)
Patch is 30.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159530.diff 6 Files Affected:
diff --git a/llvm/include/llvm/IR/NVVMIntrinsicUtils.h b/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
index cc4929a1ff8da..0e2d903bb2500 100644
--- a/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
+++ b/llvm/include/llvm/IR/NVVMIntrinsicUtils.h
@@ -180,6 +180,70 @@ inline bool FPToIntegerIntrinsicResultIsSigned(Intrinsic::ID IntrinsicID) {
"Checking invalid f2i/d2i intrinsic for signed int conversion");
}
+inline bool FPToIntegerIntrinsicNaNZero(Intrinsic::ID IntrinsicID) {
+ switch (IntrinsicID) {
+ // f2i
+ case Intrinsic::nvvm_f2i_rm:
+ case Intrinsic::nvvm_f2i_rn:
+ case Intrinsic::nvvm_f2i_rp:
+ case Intrinsic::nvvm_f2i_rz:
+ case Intrinsic::nvvm_f2i_rm_ftz:
+ case Intrinsic::nvvm_f2i_rn_ftz:
+ case Intrinsic::nvvm_f2i_rp_ftz:
+ case Intrinsic::nvvm_f2i_rz_ftz:
+ // f2ui
+ case Intrinsic::nvvm_f2ui_rm:
+ case Intrinsic::nvvm_f2ui_rn:
+ case Intrinsic::nvvm_f2ui_rp:
+ case Intrinsic::nvvm_f2ui_rz:
+ case Intrinsic::nvvm_f2ui_rm_ftz:
+ case Intrinsic::nvvm_f2ui_rn_ftz:
+ case Intrinsic::nvvm_f2ui_rp_ftz:
+ case Intrinsic::nvvm_f2ui_rz_ftz:
+ return true;
+ // d2i
+ case Intrinsic::nvvm_d2i_rm:
+ case Intrinsic::nvvm_d2i_rn:
+ case Intrinsic::nvvm_d2i_rp:
+ case Intrinsic::nvvm_d2i_rz:
+ // d2ui
+ case Intrinsic::nvvm_d2ui_rm:
+ case Intrinsic::nvvm_d2ui_rn:
+ case Intrinsic::nvvm_d2ui_rp:
+ case Intrinsic::nvvm_d2ui_rz:
+ // f2ll
+ case Intrinsic::nvvm_f2ll_rm:
+ case Intrinsic::nvvm_f2ll_rn:
+ case Intrinsic::nvvm_f2ll_rp:
+ case Intrinsic::nvvm_f2ll_rz:
+ case Intrinsic::nvvm_f2ll_rm_ftz:
+ case Intrinsic::nvvm_f2ll_rn_ftz:
+ case Intrinsic::nvvm_f2ll_rp_ftz:
+ case Intrinsic::nvvm_f2ll_rz_ftz:
+ // f2ull
+ case Intrinsic::nvvm_f2ull_rm:
+ case Intrinsic::nvvm_f2ull_rn:
+ case Intrinsic::nvvm_f2ull_rp:
+ case Intrinsic::nvvm_f2ull_rz:
+ case Intrinsic::nvvm_f2ull_rm_ftz:
+ case Intrinsic::nvvm_f2ull_rn_ftz:
+ case Intrinsic::nvvm_f2ull_rp_ftz:
+ case Intrinsic::nvvm_f2ull_rz_ftz:
+ // d2ll
+ case Intrinsic::nvvm_d2ll_rm:
+ case Intrinsic::nvvm_d2ll_rn:
+ case Intrinsic::nvvm_d2ll_rp:
+ case Intrinsic::nvvm_d2ll_rz:
+ // d2ull
+ case Intrinsic::nvvm_d2ull_rm:
+ case Intrinsic::nvvm_d2ull_rn:
+ case Intrinsic::nvvm_d2ull_rp:
+ case Intrinsic::nvvm_d2ull_rz:
+ return false;
+ }
+ llvm_unreachable("Checking NaN result for invalid f2i/d2i intrinsic");
+}
+
inline APFloat::roundingMode
GetFPToIntegerRoundingMode(Intrinsic::ID IntrinsicID) {
switch (IntrinsicID) {
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index a3b2e62a1b8ba..b03f0bca3c26c 100755
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -2625,8 +2625,17 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
case Intrinsic::nvvm_d2ull_rp:
case Intrinsic::nvvm_d2ull_rz: {
// In float-to-integer conversion, NaN inputs are converted to 0.
- if (U.isNaN())
- return ConstantInt::get(Ty, 0);
+ if (U.isNaN()) {
+ // In float-to-integer conversion, NaN inputs are converted to 0
+ // when the source and destination bitwidths are both less than 64.
+ if (nvvm::FPToIntegerIntrinsicNaNZero(IntrinsicID))
+ return ConstantInt::get(Ty, 0);
+
+ // Otherwise, the most significant bit is set.
+ unsigned BitWidth = Ty->getIntegerBitWidth();
+ uint64_t Val = 1ULL << (BitWidth - 1);
+ return ConstantInt::get(Ty, APInt(BitWidth, Val, /*IsSigned=*/false));
+ }
APFloat::roundingMode RMode =
nvvm::GetFPToIntegerRoundingMode(IntrinsicID);
@@ -2636,13 +2645,11 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
APSInt ResInt(Ty->getIntegerBitWidth(), !IsSigned);
auto FloatToRound = IsFTZ ? FTZPreserveSign(U) : U;
+ // Return max/min value for integers if the result is +/-inf or
+ // is too large to fit in the result's integer bitwidth.
bool IsExact = false;
- APFloat::opStatus Status =
- FloatToRound.convertToInteger(ResInt, RMode, &IsExact);
-
- if (Status != APFloat::opInvalidOp)
- return ConstantInt::get(Ty, ResInt);
- return nullptr;
+ FloatToRound.convertToInteger(ResInt, RMode, &IsExact);
+ return ConstantInt::get(Ty, ResInt);
}
}
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
index f4f89613b358d..4647b3cf8039d 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
@@ -281,21 +281,12 @@ static Instruction *convertNvvmIntrinsicToLlvm(InstCombiner &IC,
return {Intrinsic::trunc, FTZ_MustBeOn};
// NVVM intrinsics that map to LLVM cast operations.
- //
- // Note that llvm's target-generic conversion operators correspond to the rz
- // (round to zero) versions of the nvvm conversion intrinsics, even though
- // most everything else here uses the rn (round to nearest even) nvvm ops.
- case Intrinsic::nvvm_d2i_rz:
- case Intrinsic::nvvm_f2i_rz:
- case Intrinsic::nvvm_d2ll_rz:
- case Intrinsic::nvvm_f2ll_rz:
- return {Instruction::FPToSI};
- case Intrinsic::nvvm_d2ui_rz:
- case Intrinsic::nvvm_f2ui_rz:
- case Intrinsic::nvvm_d2ull_rz:
- case Intrinsic::nvvm_f2ull_rz:
- return {Instruction::FPToUI};
- // Integer to floating-point uses RN rounding, not RZ
+ // Note - we cannot map intrinsics like nvvm_d2ll_rz to LLVM's
+ // FPToSI, as NaN to int conversion with FPToSI is considered UB and is
+ // eliminated. NVVM conversion intrinsics are translated to PTX cvt
+ // instructions which define the outcome for NaN rather than leaving as UB.
+ // Therefore, translate NVVM intrinsics to sitofp/uitofp, but not to
+ // fptosi/fptoui.
case Intrinsic::nvvm_i2d_rn:
case Intrinsic::nvvm_i2f_rn:
case Intrinsic::nvvm_ll2d_rn:
diff --git a/llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll b/llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
index 1819f4ed181c0..4d856699b2d24 100644
--- a/llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
+++ b/llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll
@@ -185,52 +185,63 @@ define float @trunc_float_ftz(float %a) #0 {
}
; Check NVVM intrinsics that correspond to LLVM cast operations.
+; fp -> integer casts should not be converted, as the semantics
+; for NaN/Inf/Overflow inputs are different.
+; Only integer -> fp casts should be converted.
; CHECK-LABEL: @test_d2i
define i32 @test_d2i(double %a) #0 {
-; CHECK: fptosi double %a to i32
+; CHECK: call i32 @llvm.nvvm.d2i.rz(double %a)
+; CHECK-NOT: fptosi double %a to i32
%ret = call i32 @llvm.nvvm.d2i.rz(double %a)
ret i32 %ret
}
; CHECK-LABEL: @test_f2i
define i32 @test_f2i(float %a) #0 {
-; CHECK: fptosi float %a to i32
+; CHECK: call i32 @llvm.nvvm.f2i.rz(float %a)
+; CHECK-NOT: fptosi float %a to i32
%ret = call i32 @llvm.nvvm.f2i.rz(float %a)
ret i32 %ret
}
; CHECK-LABEL: @test_d2ll
define i64 @test_d2ll(double %a) #0 {
-; CHECK: fptosi double %a to i64
+; CHECK: call i64 @llvm.nvvm.d2ll.rz(double %a)
+; CHECK-NOT: fptosi double %a to i64
%ret = call i64 @llvm.nvvm.d2ll.rz(double %a)
ret i64 %ret
}
; CHECK-LABEL: @test_f2ll
define i64 @test_f2ll(float %a) #0 {
-; CHECK: fptosi float %a to i64
+; CHECK: call i64 @llvm.nvvm.f2ll.rz(float %a)
+; CHECK-NOT: fptosi float %a to i64
%ret = call i64 @llvm.nvvm.f2ll.rz(float %a)
ret i64 %ret
}
; CHECK-LABEL: @test_d2ui
define i32 @test_d2ui(double %a) #0 {
-; CHECK: fptoui double %a to i32
+; CHECK: call i32 @llvm.nvvm.d2ui.rz(double %a)
+; CHECK-NOT: fptoui double %a to i32
%ret = call i32 @llvm.nvvm.d2ui.rz(double %a)
ret i32 %ret
}
; CHECK-LABEL: @test_f2ui
define i32 @test_f2ui(float %a) #0 {
-; CHECK: fptoui float %a to i32
+; CHECK: call i32 @llvm.nvvm.f2ui.rz(float %a)
+; CHECK-NOT: fptoui float %a to i32
%ret = call i32 @llvm.nvvm.f2ui.rz(float %a)
ret i32 %ret
}
; CHECK-LABEL: @test_d2ull
define i64 @test_d2ull(double %a) #0 {
-; CHECK: fptoui double %a to i64
+; CHECK: call i64 @llvm.nvvm.d2ull.rz(double %a)
+; CHECK-NOT: fptoui double %a to i64
%ret = call i64 @llvm.nvvm.d2ull.rz(double %a)
ret i64 %ret
}
; CHECK-LABEL: @test_f2ull
define i64 @test_f2ull(float %a) #0 {
-; CHECK: fptoui float %a to i64
+; CHECK: call i64 @llvm.nvvm.f2ull.rz(float %a)
+; CHECK-NOT: fptoui float %a to i64
%ret = call i64 @llvm.nvvm.f2ull.rz(float %a)
ret i64 %ret
}
@@ -497,4 +508,4 @@ declare float @llvm.nvvm.ui2f.rn(i32)
declare double @llvm.nvvm.ull2d.rn(i64)
declare float @llvm.nvvm.ull2f.rn(i64)
declare i32 @llvm.nvvm.fshr.clamp.i32(i32, i32, i32)
-declare i32 @llvm.nvvm.fshl.clamp.i32(i32, i32, i32)
\ No newline at end of file
+declare i32 @llvm.nvvm.fshl.clamp.i32(i32, i32, i32)
diff --git a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2i-d2i.ll b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2i-d2i.ll
index 543c73137c1b6..b1a1e6b86c293 100644
--- a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2i-d2i.ll
+++ b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2i-d2i.ll
@@ -334,8 +334,7 @@ define i32 @test_neg_1_5_d2i_rz() {
;+-------------------------------------------------------------+
define i32 @test_neg_1_5_f2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rm(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rm(float -1.5)
ret i32 %res
@@ -343,8 +342,7 @@ define i32 @test_neg_1_5_f2ui_rm() {
define i32 @test_neg_1_5_f2ui_rn() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rn() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rn(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rn(float -1.5)
ret i32 %res
@@ -353,8 +351,7 @@ define i32 @test_neg_1_5_f2ui_rn() {
define i32 @test_neg_1_5_f2ui_rp() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rp() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rp(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rp(float -1.5)
ret i32 %res
@@ -362,8 +359,7 @@ define i32 @test_neg_1_5_f2ui_rp() {
define i32 @test_neg_1_5_f2ui_rz() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rz(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rz(float -1.5)
ret i32 %res
@@ -374,8 +370,7 @@ define i32 @test_neg_1_5_f2ui_rz() {
;+-------------------------------------------------------------+
define i32 @test_neg_1_5_f2ui_rm_ftz() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rm_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rm.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rm.ftz(float -1.5)
ret i32 %res
@@ -383,8 +378,7 @@ define i32 @test_neg_1_5_f2ui_rm_ftz() {
define i32 @test_neg_1_5_f2ui_rn_ftz() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rn_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rn.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rn.ftz(float -1.5)
ret i32 %res
@@ -392,8 +386,7 @@ define i32 @test_neg_1_5_f2ui_rn_ftz() {
define i32 @test_neg_1_5_f2ui_rp_ftz() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rp_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rp.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rp.ftz(float -1.5)
ret i32 %res
@@ -401,8 +394,7 @@ define i32 @test_neg_1_5_f2ui_rp_ftz() {
define i32 @test_neg_1_5_f2ui_rz_ftz() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rz_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rz.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rz.ftz(float -1.5)
ret i32 %res
@@ -412,8 +404,7 @@ define i32 @test_neg_1_5_f2ui_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_neg_1_5_d2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rm(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rm(double -1.5)
ret i32 %res
@@ -421,8 +412,7 @@ define i32 @test_neg_1_5_d2ui_rm() {
define i32 @test_neg_1_5_d2ui_rn() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rn() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rn(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rn(double -1.5)
ret i32 %res
@@ -431,8 +421,7 @@ define i32 @test_neg_1_5_d2ui_rn() {
define i32 @test_neg_1_5_d2ui_rp() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rp() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rp(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rp(double -1.5)
ret i32 %res
@@ -440,8 +429,7 @@ define i32 @test_neg_1_5_d2ui_rp() {
define i32 @test_neg_1_5_d2ui_rz() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rz(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rz(double -1.5)
ret i32 %res
@@ -526,7 +514,7 @@ define i32 @test_nan_f2i_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_nan_d2i_rm() {
; CHECK-LABEL: define i32 @test_nan_d2i_rm() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rm(double 0xFFF8000000000000)
ret i32 %res
@@ -534,7 +522,7 @@ define i32 @test_nan_d2i_rm() {
define i32 @test_nan_d2i_rn() {
; CHECK-LABEL: define i32 @test_nan_d2i_rn() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rn(double 0xFFF8000000000000)
ret i32 %res
@@ -543,7 +531,7 @@ define i32 @test_nan_d2i_rn() {
define i32 @test_nan_d2i_rp() {
; CHECK-LABEL: define i32 @test_nan_d2i_rp() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rp(double 0xFFF8000000000000)
ret i32 %res
@@ -551,7 +539,7 @@ define i32 @test_nan_d2i_rp() {
define i32 @test_nan_d2i_rz() {
; CHECK-LABEL: define i32 @test_nan_d2i_rz() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rz(double 0xFFF8000000000000)
ret i32 %res
@@ -632,7 +620,7 @@ define i32 @test_nan_f2ui_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_nan_d2ui_rm() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rm() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rm(double 0xFFF8000000000000)
ret i32 %res
@@ -640,7 +628,7 @@ define i32 @test_nan_d2ui_rm() {
define i32 @test_nan_d2ui_rn() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rn() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rn(double 0xFFF8000000000000)
ret i32 %res
@@ -649,7 +637,7 @@ define i32 @test_nan_d2ui_rn() {
define i32 @test_nan_d2ui_rp() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rp() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rp(double 0xFFF8000000000000)
ret i32 %res
@@ -657,7 +645,7 @@ define i32 @test_nan_d2ui_rp() {
define i32 @test_nan_d2ui_rz() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rz() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rz(double 0xFFF8000000000000)
ret i32 %res
@@ -994,8 +982,7 @@ define i32 @test_neg_subnormal_d2i_rz() {
;+-------------------------------------------------------------+
define i32 @test_neg_subnormal_f2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_subnormal_f2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rm(float 0xB80FFFFFC0000000)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rm(float 0xB80FFFFFC0000000)
ret i32 %res
@@ -1065,8 +1052,7 @@ define i32 @test_neg_subnormal_f2ui_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_neg_subnormal_d2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_subnormal_d2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rm(double 0x800FFFFFFFFFFFFF)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rm(double 0x800fffffffffffff)
ret i32 %res
diff --git a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll
index be38177dce2c3..ffadf26f3c5b5 100644
--- a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll
+++ b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll
@@ -334,8 +334,7 @@ define i64 @test_neg_1_5_d2ll_rz() {
;+-------------------------------------------------------------+
define i64 @test_neg_1_5_f2ull_rm() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rm(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rm(float -1.5)
ret i64 %res
@@ -343,8 +342,7 @@ define i64 @test_neg_1_5_f2ull_rm() {
define i64 @test_neg_1_5_f2ull_rn() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rn() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rn(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rn(float -1.5)
ret i64 %res
@@ -353,8 +351,7 @@ define i64 @test_neg_1_5_f2ull_rn() {
define i64 @test_neg_1_5_f2ull_rp() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rp() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rp(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rp(float -1.5)
ret i64 %res
@@ -362,8 +359,7 @@ define i64 @test_neg_1_5_f2ull_rp() {
define i64 @test_neg_1_5_f2ull_rz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rz(float -1.5)
ret i64 %res
@@ -374,8 +370,7 @@ define i64 @test_neg_1_5_f2ull_rz() {
;+-------------------------------------------------------------+
define i64 @test_neg_1_5_f2ull_rm_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rm_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rm.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rm.ftz(float -1.5)
ret i64 %res
@@ -383,8 +378,7 @@ define i64 @test_neg_1_5_f2ull_rm_ftz() {
define i64 @test_neg_1_5_f2ull_rn_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rn_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rn.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rn.ftz(float -1.5)
ret i64 %res
@@ -392,8 +386,7 @@ define i64 @test_neg_1_5_f2ull_rn_ftz() {
define i64 @test_neg_1_5_f2ull_rp_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rp_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rp.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rp.ftz(float -1.5)
ret i64 %res
@@ -401,8 +394,7 @@ define i64 @test_neg_1_5_f2ull_rp_ftz() {
define i64 @test_neg_1_5_f2ull_rz_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rz_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rz.ftz...
[truncated]
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rp.ftz(float -1.5)
ret i32 %res
@@ -401,8 +394,7 @@ define i32 @test_neg_1_5_f2ui_rp_ftz() {
define i32 @test_neg_1_5_f2ui_rz_ftz() {
; CHECK-LABEL: define i32 @test_neg_1_5_f2ui_rz_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rz.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rz.ftz(float -1.5)
ret i32 %res
@@ -412,8 +404,7 @@ define i32 @test_neg_1_5_f2ui_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_neg_1_5_d2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rm(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rm(double -1.5)
ret i32 %res
@@ -421,8 +412,7 @@ define i32 @test_neg_1_5_d2ui_rm() {
define i32 @test_neg_1_5_d2ui_rn() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rn() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rn(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rn(double -1.5)
ret i32 %res
@@ -431,8 +421,7 @@ define i32 @test_neg_1_5_d2ui_rn() {
define i32 @test_neg_1_5_d2ui_rp() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rp() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rp(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rp(double -1.5)
ret i32 %res
@@ -440,8 +429,7 @@ define i32 @test_neg_1_5_d2ui_rp() {
define i32 @test_neg_1_5_d2ui_rz() {
; CHECK-LABEL: define i32 @test_neg_1_5_d2ui_rz() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rz(double -1.500000e+00)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rz(double -1.5)
ret i32 %res
@@ -526,7 +514,7 @@ define i32 @test_nan_f2i_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_nan_d2i_rm() {
; CHECK-LABEL: define i32 @test_nan_d2i_rm() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rm(double 0xFFF8000000000000)
ret i32 %res
@@ -534,7 +522,7 @@ define i32 @test_nan_d2i_rm() {
define i32 @test_nan_d2i_rn() {
; CHECK-LABEL: define i32 @test_nan_d2i_rn() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rn(double 0xFFF8000000000000)
ret i32 %res
@@ -543,7 +531,7 @@ define i32 @test_nan_d2i_rn() {
define i32 @test_nan_d2i_rp() {
; CHECK-LABEL: define i32 @test_nan_d2i_rp() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rp(double 0xFFF8000000000000)
ret i32 %res
@@ -551,7 +539,7 @@ define i32 @test_nan_d2i_rp() {
define i32 @test_nan_d2i_rz() {
; CHECK-LABEL: define i32 @test_nan_d2i_rz() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2i.rz(double 0xFFF8000000000000)
ret i32 %res
@@ -632,7 +620,7 @@ define i32 @test_nan_f2ui_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_nan_d2ui_rm() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rm() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rm(double 0xFFF8000000000000)
ret i32 %res
@@ -640,7 +628,7 @@ define i32 @test_nan_d2ui_rm() {
define i32 @test_nan_d2ui_rn() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rn() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rn(double 0xFFF8000000000000)
ret i32 %res
@@ -649,7 +637,7 @@ define i32 @test_nan_d2ui_rn() {
define i32 @test_nan_d2ui_rp() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rp() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rp(double 0xFFF8000000000000)
ret i32 %res
@@ -657,7 +645,7 @@ define i32 @test_nan_d2ui_rp() {
define i32 @test_nan_d2ui_rz() {
; CHECK-LABEL: define i32 @test_nan_d2ui_rz() {
-; CHECK-NEXT: ret i32 0
+; CHECK-NEXT: ret i32 -2147483648
;
%res = call i32 @llvm.nvvm.d2ui.rz(double 0xFFF8000000000000)
ret i32 %res
@@ -994,8 +982,7 @@ define i32 @test_neg_subnormal_d2i_rz() {
;+-------------------------------------------------------------+
define i32 @test_neg_subnormal_f2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_subnormal_f2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.f2ui.rm(float 0xB80FFFFFC0000000)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.f2ui.rm(float 0xB80FFFFFC0000000)
ret i32 %res
@@ -1065,8 +1052,7 @@ define i32 @test_neg_subnormal_f2ui_rz_ftz() {
;+-------------------------------------------------------------+
define i32 @test_neg_subnormal_d2ui_rm() {
; CHECK-LABEL: define i32 @test_neg_subnormal_d2ui_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.nvvm.d2ui.rm(double 0x800FFFFFFFFFFFFF)
-; CHECK-NEXT: ret i32 [[RES]]
+; CHECK-NEXT: ret i32 0
;
%res = call i32 @llvm.nvvm.d2ui.rm(double 0x800fffffffffffff)
ret i32 %res
diff --git a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll
index be38177dce2c3..ffadf26f3c5b5 100644
--- a/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll
+++ b/llvm/test/Transforms/InstSimplify/const-fold-nvvm-f2ll-d2ll.ll
@@ -334,8 +334,7 @@ define i64 @test_neg_1_5_d2ll_rz() {
;+-------------------------------------------------------------+
define i64 @test_neg_1_5_f2ull_rm() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rm() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rm(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rm(float -1.5)
ret i64 %res
@@ -343,8 +342,7 @@ define i64 @test_neg_1_5_f2ull_rm() {
define i64 @test_neg_1_5_f2ull_rn() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rn() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rn(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rn(float -1.5)
ret i64 %res
@@ -353,8 +351,7 @@ define i64 @test_neg_1_5_f2ull_rn() {
define i64 @test_neg_1_5_f2ull_rp() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rp() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rp(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rp(float -1.5)
ret i64 %res
@@ -362,8 +359,7 @@ define i64 @test_neg_1_5_f2ull_rp() {
define i64 @test_neg_1_5_f2ull_rz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rz(float -1.5)
ret i64 %res
@@ -374,8 +370,7 @@ define i64 @test_neg_1_5_f2ull_rz() {
;+-------------------------------------------------------------+
define i64 @test_neg_1_5_f2ull_rm_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rm_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rm.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rm.ftz(float -1.5)
ret i64 %res
@@ -383,8 +378,7 @@ define i64 @test_neg_1_5_f2ull_rm_ftz() {
define i64 @test_neg_1_5_f2ull_rn_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rn_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rn.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rn.ftz(float -1.5)
ret i64 %res
@@ -392,8 +386,7 @@ define i64 @test_neg_1_5_f2ull_rn_ftz() {
define i64 @test_neg_1_5_f2ull_rp_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rp_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rp.ftz(float -1.500000e+00)
-; CHECK-NEXT: ret i64 [[RES]]
+; CHECK-NEXT: ret i64 0
;
%res = call i64 @llvm.nvvm.f2ull.rp.ftz(float -1.5)
ret i64 %res
@@ -401,8 +394,7 @@ define i64 @test_neg_1_5_f2ull_rp_ftz() {
define i64 @test_neg_1_5_f2ull_rz_ftz() {
; CHECK-LABEL: define i64 @test_neg_1_5_f2ull_rz_ftz() {
-; CHECK-NEXT: [[RES:%.*]] = call i64 @llvm.nvvm.f2ull.rz.ftz...
[truncated]
Did the behavior change in PTX 9.0, or did the instructions already behave that way and it was only documented in PTX 9.0? If it's the new behavior, then
Fix the NaN-handling semantics of various NVVM intrinsics converting from fp types to integer types.
Previously in ConstantFolding, NaN inputs would be constant-folded to 0. However, v9.0 of the PTX spec states that:
In float-to-integer conversions, depending upon conversion types, NaN input results in following value:
* Zero if source is not `.f64` and destination is not `.s64`, `.u64`.
* Otherwise `1 << (BitWidth(dst) - 1)`, corresponding to the value of `(MAXINT >> 1) + 1` for unsigned type or `MININT` for signed type.

Also, support for constant-folding +/-Inf and values which overflow/underflow the integer output type has been added (they clamp to min/max int).
Because of this NaN-handling semantic difference, we also need to disable transforming several intrinsics to FPToSI/FPToUI, as the LLVM instruction returns poison for these inputs, while the intrinsics have defined behaviour for edge cases like NaN/Inf/overflow.
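The folding rule above can be sketched for one representative case, `d2ui.rz` (f64 source, u32 destination). This is a hypothetical Python model for illustration only, not LLVM's actual ConstantFolding code; the function name is invented. Since the source is `.f64`, NaN maps to `1 << 31`, and out-of-range values clamp to the destination's min/max:

```python
import math

U32_MAX = 2**32 - 1

def emulate_d2ui_rz(x: float) -> int:
    """Hypothetical model of cvt.rzi.u32.f64 edge-case semantics."""
    if math.isnan(x):
        return 1 << 31     # f64 source: 1 << (BitWidth(dst) - 1)
    if x <= 0.0:
        return 0           # negative / underflowing inputs clamp to 0
    if x >= U32_MAX:
        return U32_MAX     # +Inf and overflowing inputs clamp to max
    return int(x)          # in-range: round toward zero (.rz)
```

This matches the updated FileCheck expectations in the diff: `emulate_d2ui_rz(float("nan"))` gives `0x80000000` (printed as i32 `-2147483648` in `test_nan_d2ui_rz`), and `emulate_d2ui_rz(-1.5)` gives `0` (as in `test_neg_1_5_d2ui_rz`).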