[ValueTracking] Return true for AddrSpaceCast in canCreateUndefOrPoison #144686

Open
wants to merge 7 commits into
base: main
Conversation

wenju-he
Contributor

In our downstream GPU target, the following IR is valid before instcombine, even though the second addrspacecast causes UB:
  define i1 @test(ptr addrspace(1) noundef %v) {
    %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
    %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
    %2 = icmp eq i32 %1, 0
    %3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3)
    %4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3
    %5 = icmp eq ptr addrspace(3) %4, null
    ret i1 %5
  }
We have a custom optimization that replaces an invalid addrspacecast with poison, and the IR is still valid because `select` stops poison propagation.

However, the instcombine pass optimizes the `select` to `or`:
    %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
    %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
    %2 = icmp eq i32 %1, 0
    %3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3)
    %4 = icmp eq ptr addrspace(3) %3, null
    %5 = or i1 %2, %4
    ret i1 %5
The transform is invalid for our target: unlike `select`, `or` propagates poison from either operand.
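
The difference between `select` and `or` described above can be sketched with a toy three-valued model of an i1. This is only an illustration, not LLVM code: `I1`, `selectModel`, `orModel`, and `demoSelectVersusOr` are hypothetical names introduced here, and the model assumes the invalid addrspacecast has already been replaced with poison.

```cpp
#include <cassert>

// Toy three-valued model of an LLVM i1: false, true, or poison.
// (Hypothetical type for illustration; not an LLVM class.)
enum class I1 { False, True, Poison };

// select: only the chosen arm is observed, so poison in the other arm is
// blocked. A poison condition still poisons the result.
I1 selectModel(I1 Cond, I1 TrueVal, I1 FalseVal) {
  if (Cond == I1::Poison)
    return I1::Poison;
  return Cond == I1::True ? TrueVal : FalseVal;
}

// Bitwise or: poison in either operand poisons the result.
I1 orModel(I1 A, I1 B) {
  if (A == I1::Poison || B == I1::Poison)
    return I1::Poison;
  return (A == I1::True || B == I1::True) ? I1::True : I1::False;
}

// Mirrors the PR example when the pointer is not in shared memory:
// the condition is true and the comparison on the invalid cast is poison.
void demoSelectVersusOr() {
  I1 NotShared = I1::True;     // models %2
  I1 CastIsNull = I1::Poison;  // models icmp on the poison addrspacecast
  assert(selectModel(NotShared, I1::True, CastIsNull) == I1::True);
  assert(orModel(NotShared, CastIsNull) == I1::Poison);
}
```

In this model the `select` form returns a well-defined `true`, while the folded `or` form returns poison, which is the refinement violation the description reports.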

…invalid addrspacecast inst

@wenju-he wenju-he requested a review from nikic as a code owner June 18, 2025 12:24
@llvmbot llvmbot added backend:AMDGPU llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Jun 18, 2025
@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: Wenju He (wenju-he)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/144686.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+29-6)
  • (added) llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll (+23)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 73ba0f78e8053..a2335640f917b 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -3194,8 +3194,23 @@ static Instruction *foldNestedSelects(SelectInst &OuterSelVal,
 /// Return true if V is poison or \p Expected given that ValAssumedPoison is
 /// already poison. For example, if ValAssumedPoison is `icmp samesign X, 10`
 /// and V is `icmp ne X, 5`, impliesPoisonOrCond returns true.
-static bool impliesPoisonOrCond(const Value *ValAssumedPoison, const Value *V,
-                                bool Expected) {
+static bool impliesPoisonOrCond(
+    const Value *ValAssumedPoison, const Value *V, bool Expected,
+    llvm::function_ref<bool(unsigned, unsigned)> isValidAddrSpaceCast) {
+  // Handle the case that ValAssumedPoison is `icmp eq ptr addrspace(3) X, null`
+  // and X is `addrspacecast ptr addrspace(1) Y to ptr addrspace(3)`. Target can
+  // replace X with poison if the addrspacecast is invalid. However, `V` might
+  // not be poison.
+  if (auto *ICmp = dyn_cast<ICmpInst>(ValAssumedPoison)) {
+    auto CanCreatePoison = [&](Value *Op) {
+      auto *ASC = dyn_cast<AddrSpaceCastInst>(Op);
+      return ASC && !isValidAddrSpaceCast(ASC->getDestAddressSpace(),
+                                          ASC->getSrcAddressSpace());
+    };
+    if (llvm::any_of(ICmp->operands(), CanCreatePoison))
+      return false;
+  }
+
   if (impliesPoison(ValAssumedPoison, V))
     return true;
 
@@ -3241,17 +3256,23 @@ Instruction *InstCombinerImpl::foldSelectOfBools(SelectInst &SI) {
   auto *Zero = ConstantInt::getFalse(SelType);
   Value *A, *B, *C, *D;
 
+  auto IsValidAddrSpaceCast = [&](unsigned FromAS, unsigned ToAS) {
+    return isValidAddrSpaceCast(FromAS, ToAS);
+  };
+
   // Folding select to and/or i1 isn't poison safe in general. impliesPoison
   // checks whether folding it does not convert a well-defined value into
   // poison.
   if (match(TrueVal, m_One())) {
-    if (impliesPoisonOrCond(FalseVal, CondVal, /*Expected=*/false)) {
+    if (impliesPoisonOrCond(FalseVal, CondVal, /*Expected=*/false,
+                            IsValidAddrSpaceCast)) {
       // Change: A = select B, true, C --> A = or B, C
       return BinaryOperator::CreateOr(CondVal, FalseVal);
     }
 
     if (match(CondVal, m_OneUse(m_Select(m_Value(A), m_One(), m_Value(B)))) &&
-        impliesPoisonOrCond(FalseVal, B, /*Expected=*/false)) {
+        impliesPoisonOrCond(FalseVal, B, /*Expected=*/false,
+                            IsValidAddrSpaceCast)) {
       // (A || B) || C --> A || (B | C)
       return replaceInstUsesWith(
           SI, Builder.CreateLogicalOr(A, Builder.CreateOr(B, FalseVal)));
@@ -3287,13 +3308,15 @@ Instruction *InstCombinerImpl::foldSelectOfBools(SelectInst &SI) {
   }
 
   if (match(FalseVal, m_Zero())) {
-    if (impliesPoisonOrCond(TrueVal, CondVal, /*Expected=*/true)) {
+    if (impliesPoisonOrCond(TrueVal, CondVal, /*Expected=*/true,
+                            IsValidAddrSpaceCast)) {
       // Change: A = select B, C, false --> A = and B, C
       return BinaryOperator::CreateAnd(CondVal, TrueVal);
     }
 
     if (match(CondVal, m_OneUse(m_Select(m_Value(A), m_Value(B), m_Zero()))) &&
-        impliesPoisonOrCond(TrueVal, B, /*Expected=*/true)) {
+        impliesPoisonOrCond(TrueVal, B, /*Expected=*/true,
+                            IsValidAddrSpaceCast)) {
       // (A && B) && C --> A && (B & C)
       return replaceInstUsesWith(
           SI, Builder.CreateLogicalAnd(A, Builder.CreateAnd(B, TrueVal)));
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll b/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll
new file mode 100644
index 0000000000000..4791d2c434884
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/addrspacecast.ll
@@ -0,0 +1,23 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=instcombine %s | FileCheck %s
+
+; Check that `select B, true, C` isn't optimized to `or B, C`.
+define i1 @not_fold_select(ptr addrspace(1) noundef %x) {
+; CHECK-LABEL: define i1 @not_fold_select(
+; CHECK-SAME: ptr addrspace(1) noundef [[X:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = addrspacecast ptr addrspace(1) [[X]] to ptr
+; CHECK-NEXT:    [[TMP1:%.*]] = tail call i1 @llvm.amdgcn.is.shared(ptr [[TMP0]])
+; CHECK-NEXT:    [[TMP2:%.*]] = addrspacecast ptr addrspace(1) [[X]] to ptr addrspace(3)
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq ptr addrspace(3) [[TMP2]], null
+; CHECK-NEXT:    [[TMP4:%.*]] = select i1 [[TMP1]], i1 true, i1 [[TMP3]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
+;
+  entry:
+  %0 = addrspacecast ptr addrspace(1) %x to ptr
+  %1 = tail call i1 @llvm.amdgcn.is.shared(ptr %0)
+  %2 = addrspacecast ptr %0 to ptr addrspace(3)
+  %3 = select i1 %1, ptr addrspace(3) null, ptr addrspace(3) %2
+  %4 = icmp eq ptr addrspace(3) %3, null
+  ret i1 %4
+}

@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-transforms

Contributor

@nikic nikic left a comment

If an invalid addrspacecast can produce poison, then this must be specified in canCreateUndefOrPoison, not here:

static bool canCreateUndefOrPoison(const Operator *Op, UndefPoisonKind Kind,

LangRef should also be adjusted in https://llvm.org/docs/LangRef.html#addrspacecast-to-instruction to specify that addrspacecast can introduce poison.

@nikic nikic requested a review from arsenm June 18, 2025 12:29
@arsenm
Contributor

arsenm commented Jun 18, 2025

addrspacecast can produce poison. That's how we're handling the cases of addrspacecasts that are unimplementable

; CHECK-NEXT: ret i1 [[TMP4]]
;
entry:
%0 = addrspacecast ptr addrspace(1) %x to ptr
Contributor

Use named values in tests

Contributor Author

done

-                                bool Expected) {
+static bool impliesPoisonOrCond(
+    const Value *ValAssumedPoison, const Value *V, bool Expected,
+    llvm::function_ref<bool(unsigned, unsigned)> isValidAddrSpaceCast) {
Contributor

Don't need the `llvm::` prefix.

Contributor Author

done

@wenju-he
Contributor Author

If an invalid addrspacecast can produce poison, then this must be specified in

static bool canCreateUndefOrPoison(const Operator *Op, UndefPoisonKind Kind,

This PR uses the isValidAddrSpaceCast query from TTI, which is available in InstCombine, but TTI isn't available in the canCreateUndefOrPoison function above.
If we unconditionally return true for addrspacecast in canCreateUndefOrPoison, the following test regresses, probably because returning true for a valid addrspacecast prevents the optimization:

define amdgpu_kernel void @__omp_offloading_fd00_2c00523__ZN11qmcplusplus7ompBLAS9gemv_implIfEEiRiciiT_PKS3_iS5_iS3_PS3_i_l383() {
; CHECK: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn
; CHECK-LABEL: define {{[^@]+}}@__omp_offloading_fd00_2c00523__ZN11qmcplusplus7ompBLAS9gemv_implIfEEiRiciiT_PKS3_iS5_iS3_PS3_i_l383
; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP1:%.*]] = alloca [0 x [0 x float]], i32 0, align 8, addrspace(5)
; CHECK-NEXT: [[TMP2:%.*]] = addrspacecast ptr addrspace(5) [[TMP1]] to ptr
; CHECK-NEXT: store ptr [[TMP2]], ptr addrspace(5) [[TMP1]], align 8
; CHECK-NEXT: [[TMP3:%.*]] = call fastcc i32 @__kmpc_nvptx_parallel_reduce_nowait_v2(ptr nofree noundef readonly align 8 captures(none) dereferenceable_or_null(8) [[TMP2]], i1 noundef false)
; CHECK-NEXT: ret void
;
%1 = alloca [0 x [0 x float]], i32 0, align 8, addrspace(5)
%2 = addrspacecast ptr addrspace(5) %1 to ptr
store ptr %2, ptr addrspace(5) %1, align 8
%3 = call fastcc i32 @__kmpc_nvptx_parallel_reduce_nowait_v2(ptr %2, i1 false)
ret void
}

LangRef should also be adjusted in https://llvm.org/docs/LangRef.html#addrspacecast-to-instruction to specify that addrspacecast can introduce poison.

done in ffff2c3, please review

@wenju-he wenju-he requested review from arsenm and nikic June 19, 2025 03:03
@dtcxzyw
Member

dtcxzyw commented Jun 19, 2025

I don't think special handling of addrspacecast in the logical and/or -> bitwise and/or fold works. There are many places that call isGuaranteedNotToBeUndefOrPoison. For example, eliminating the freeze from `freeze ptr (addrspacecast ptr addrspace(5) noundef %x to ptr)` is still invalid. That is, I can construct an input that bypasses the workaround and causes a miscompilation.
It would be better to default to returning true for addrspacecast in canCreateUndefOrPoison, and then add an optional callback for isValidAddrSpaceCast queries.
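
The suggestion above (conservative default plus an optional validity callback) could look roughly like the following toy sketch. It is not the ValueTracking implementation: `Opcode`, `Op`, `CastValidityFn`, `canCreatePoisonModel`, and `demoOptionalCallback` are hypothetical names, and the address space numbers are arbitrary.

```cpp
#include <cassert>
#include <functional>

// Hypothetical opcodes for a toy IR (not the LLVM enum).
enum class Opcode { Add, BitCast, AddrSpaceCast };

struct Op {
  Opcode Code;
  unsigned SrcAS; // only meaningful for AddrSpaceCast
  unsigned DstAS; // only meaningful for AddrSpaceCast
};

// Hypothetical callback type; an empty function means "no target information".
using CastValidityFn = std::function<bool(unsigned FromAS, unsigned ToAS)>;

// Returns true if the operation may turn non-poison inputs into poison.
bool canCreatePoisonModel(const Op &O,
                          const CastValidityFn &IsValidCast = CastValidityFn()) {
  switch (O.Code) {
  case Opcode::AddrSpaceCast:
    // Conservative default: without a callback, any cast may be unsupported.
    if (!IsValidCast)
      return true;
    return !IsValidCast(O.SrcAS, O.DstAS);
  case Opcode::Add: // modeled without nuw/nsw flags
  case Opcode::BitCast:
    return false;
  }
  return true; // unknown opcode: stay conservative
}

void demoOptionalCallback() {
  Op Cast{Opcode::AddrSpaceCast, 5, 0};
  // No callback: the cast is assumed to be able to create poison.
  assert(canCreatePoisonModel(Cast));

  // A callback that knows only the 5 -> 0 cast is supported.
  CastValidityFn OnlyFiveToZero = [](unsigned From, unsigned To) {
    return From == 5 && To == 0;
  };
  assert(!canCreatePoisonModel(Cast, OnlyFiveToZero));

  Op Other{Opcode::AddrSpaceCast, 1, 3};
  assert(canCreatePoisonModel(Other, OnlyFiveToZero));
}
```

The point of the default is that callers which pass no callback can never be told that an unsupported cast is poison-free; the callback can only recover precision, not reduce safety.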

@@ -12621,6 +12621,9 @@ have no side effects, and must not capture the value of the pointer.
 If the source is :ref:`poison <poisonvalues>`, the result is
 :ref:`poison <poisonvalues>`.
 
+If the source is not :ref:`poison <poisonvalues>`, and the result pointer is
+non-dereferenceable, the result is :ref:`poison <poisonvalues>`.
Contributor

This is incorrect, not every addrspacecast of a non-dereferenceable pointer should automatically result in poison (e.g. the pointer to the end of an object is non-dereferenceable, but should certainly not turn into poison).

This should instead say something along the lines of:

Which address space casts are supported depends on the target. Unsupported address space casts return a poison value.

Contributor Author

done in 3128b33, thanks @nikic

…butorAttributes to restore previous behavior in aapointer_info_map_invalidation.ll
@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Jun 20, 2025
@wenju-he
Contributor Author

Thanks @dtcxzyw. I've changed canCreateUndefOrPoison to return true for addrspacecast in 3128b33. Adding an optional callback looks equivalent to querying isValidAddrSpaceCast right after the canCreateUndefOrPoison call, so I added an isValidAddrSpaceCast query in llvm/lib/Transforms/IPO/AttributorAttributes.cpp to restore the previous behavior of llvm/test/Transforms/Attributor/reduced/aapointer_info_map_invalidation.ll. The test is also updated to specify a triple so that TTI is available.

Would it make sense to add an optional TTI argument to canCreateUndefOrPoison/isGuaranteedNotToBePoison/isGuaranteedNotToBeUndefOrPoison? If so, we wouldn't need to query isValidAddrSpaceCast right after the isGuaranteedNotToBeUndefOrPoison call in AttributorAttributes, and potentially in some other places.

@wenju-he wenju-he requested review from nikic and dtcxzyw June 20, 2025 04:06
@wenju-he wenju-he changed the title from "[InstCombine] Don't folder select to or if value argument is user of invalid addrspacecast inst" to "[ValueTracking] Return true for AddrSpaceCast in canCreateUndefOrPoison" Jun 20, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 20, 2025
   if (IRP.getPositionKind() != IRPosition::IRP_RETURNED &&
-      isGuaranteedNotToBeUndefOrPoison(&Val)) {
+      (isGuaranteedNotToBeUndefOrPoison(&Val) ||
+       IsTargetGuaranteedNotPoison(Val))) {
Contributor

This is incorrect, it will not handle poison introduced higher up the chain if it ends in an addrspacecast. It's also sub-optimal, because it will not handle known-valid addrspacecasts higher up the chain. This needs to be handled fully inside canCreateUndefOrPoison.
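
Both objections in this comment can be illustrated with a toy def-use chain. This is a sketch under assumptions: `Kind`, `Node`, `AddrSpaceCastQuery`, `isNotPoisonModel`, and `postHocModel` are hypothetical names, and `postHocModel` only guesses at the shape of the rejected check (the body of `IsTargetGuaranteedNotPoison` is not shown in the thread).

```cpp
#include <cassert>
#include <functional>

// Hypothetical value kinds for a toy def-use chain (not LLVM classes).
enum class Kind { NoundefArg, PoisonConst, BitCast, AddrSpaceCast };

struct Node {
  Kind K;
  const Node *Operand; // nullptr for leaves
  unsigned SrcAS;      // only meaningful for AddrSpaceCast
  unsigned DstAS;      // only meaningful for AddrSpaceCast
};

// Hypothetical target query; an empty function means "no target information".
using AddrSpaceCastQuery = std::function<bool(unsigned FromAS, unsigned ToAS)>;

// Walks the whole chain and consults the query at every addrspacecast.
bool isNotPoisonModel(const Node &N, const AddrSpaceCastQuery &IsValidCast) {
  switch (N.K) {
  case Kind::NoundefArg:
    return true;
  case Kind::PoisonConst:
    return false;
  case Kind::AddrSpaceCast:
    // Without target information, every cast may create poison.
    if (!IsValidCast || !IsValidCast(N.SrcAS, N.DstAS))
      return false;
    break;
  case Kind::BitCast:
    break; // never creates poison by itself
  }
  return N.Operand && isNotPoisonModel(*N.Operand, IsValidCast);
}

// Assumed shape of the rejected approach: a conservative walk first, then a
// target query applied to the outermost value only.
bool postHocModel(const Node &N, const AddrSpaceCastQuery &IsValidCast) {
  if (isNotPoisonModel(N, AddrSpaceCastQuery()))
    return true;
  return N.K == Kind::AddrSpaceCast && IsValidCast &&
         IsValidCast(N.SrcAS, N.DstAS);
}

void demoPostHocVersusRecursive() {
  AddrSpaceCastQuery AllValid = [](unsigned, unsigned) { return true; };

  // Incorrect: a valid cast of a poison operand is accepted post hoc.
  Node PoisonLeaf{Kind::PoisonConst, nullptr, 0, 0};
  Node CastOfPoison{Kind::AddrSpaceCast, &PoisonLeaf, 5, 0};
  assert(postHocModel(CastOfPoison, AllValid));
  assert(!isNotPoisonModel(CastOfPoison, AllValid));

  // Sub-optimal: a valid cast below a bitcast is still rejected post hoc.
  Node Arg{Kind::NoundefArg, nullptr, 0, 0};
  Node CastOfArg{Kind::AddrSpaceCast, &Arg, 5, 0};
  Node Outer{Kind::BitCast, &CastOfArg, 0, 0};
  assert(!postHocModel(Outer, AllValid));
  assert(isNotPoisonModel(Outer, AllValid));
}
```

In this model, only the version that consults the query inside the recursive walk gets both cases right, which matches the request to handle this fully inside canCreateUndefOrPoison.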

@nikic
Contributor

nikic commented Jun 20, 2025

For the purposes of this PR, I think you should only change canCreateUndefOrPoison to return true and just eat the regressions. We can add the isValidAddrSpaceCast() query in a follow-up, because it will be less straightforward. I'm not willing to accept direct use of TTI in ValueTracking without laundering it through an abstraction first (see https://discourse.llvm.org/t/constant-propagation-for-target-specific-intrinsics/85881/5).

@wenju-he
Contributor Author

thanks @nikic
Shall I file an issue for the task of using isValidAddrSpaceCast()?

Contributor

@nikic nikic left a comment

LGTM

@wenju-he
Contributor Author

For the purposes of this PR, I think you should only change canCreateUndefOrPoison to return true and just eat the regressions.

Done. Please let me know if we should add a FIXME for the regressions in clang/test/CodeGenOpenCL/amdgcn-buffer-rsrc-type.cl and llvm/test/Transforms/Attributor/reduced/aapointer_info_map_invalidation.ll.

The change in clang/test/CodeGenOpenCL/as_type.cl should not be a regression since spir isn't a concrete target.

; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=instcombine %s | FileCheck %s

; Check that `select B, true, C` isn't optimized to `or B, C`,
; because the invalid addrspacecast %asc.shared may introduce poison.
Contributor

Description is slightly inaccurate. In this case it will always be poison

Contributor Author

done

@wenju-he wenju-he requested a review from arsenm June 23, 2025 00:17
Labels
backend:AMDGPU clang Clang issues not falling into any other category llvm:analysis Includes value tracking, cost tables and constant folding llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:ir llvm:transforms
5 participants