Skip to content

[InlineCost] Simplify extractvalue across callsite #145054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tobias-stadler
Copy link
Contributor

@tobias-stadler tobias-stadler commented Jun 20, 2025

Motivation: When using libc++, std::bitset<64>::count() doesn't optimize to a single popcount instruction on AArch64, because we fail to inline the library code completely. Inlining fails, because the internal bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The value is built using insertvalue instructions and only one of the array entries is constant. If we know that this entry is constant, we can prove that half the function becomes dead. However, InlineCost only considers operands for simplification if they are Constants, which %arg is not. Without this simplification the function is too expensive to inline.

Therefore, we had to teach InlineCost to support simplifying values which aren't Constants and which may originate from the caller function (PR #145083). Now, we use this for extractvalue, so it can be simplified against insertvalues from the caller function. This is enough to get bitset::count fully optimized.

There are similar opportunities we can explore for BinOps in the future (e.g. cmp eq %arg1, %arg2 when the caller passes the same value into both arguments), but we need to be careful here, because InstSimplify isn't completely safe to use with operands owned by different functions.

@tobias-stadler tobias-stadler requested review from nikic and fhahn June 20, 2025 15:33
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jun 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Tobias Stadler (tobias-stadler)

Changes

Motivation: When using libc++, std::bitset&lt;64&gt;::count() doesn't optimize to a single popcount instruction on AArch64, because we fail to inline the library code completely. Inlining fails, because the internal bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The value is built using insertvalue instructions and only one of the array entries is constant. If we know that this entry is constant, we can prove that half the function becomes dead. However, InlineCost only considers operands for simplification if they are Constants, which %arg is not. Without this simplification the function is too expensive to inline.

Therefore, we need to teach InlineCost to simplify values which aren't Constants and which may originate from the caller function.

As a first step, we enable this for extractvalue, so it can be simplified against insertvalues from the caller function. This is enough to get bitset::count fully optimized.

There are similar opportunities we can explore for BinOps in the future (e.g. cmp eq %arg1, %arg2 when the caller passes the same value into both arguments), but we need to be careful here, because InstSimplify isn't completely safe to use with operands owned by different functions.


Full diff: https://github.com/llvm/llvm-project/pull/145054.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/InlineCost.cpp (+63-28)
  • (added) llvm/test/Transforms/Inline/simplify-crosscallsite.ll (+44)
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index 7bd1f18004580..dd3ea1b437d14 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -391,7 +391,8 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   /// likely simplifications post-inlining. The most important aspect we track
   /// is CFG altering simplifications -- when we prove a basic block dead, that
   /// can cause dramatic shifts in the cost of inlining a function.
-  DenseMap<Value *, Constant *> SimplifiedValues;
+  /// Note: The simplified Value may be owned by the caller function.
+  DenseMap<Value *, Value *> SimplifiedValues;
 
   /// Keep track of the values which map back (through function arguments) to
   /// allocas on the caller stack which could be simplified through SROA.
@@ -432,7 +433,7 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   template <typename T> T *getDirectOrSimplifiedValue(Value *V) const {
     if (auto *Direct = dyn_cast<T>(V))
       return Direct;
-    return dyn_cast_if_present<T>(SimplifiedValues.lookup(V));
+    return getSimplifiedValue<T>(V);
   }
 
   // Custom simplification helper routines.
@@ -525,11 +526,33 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
 
   InlineResult analyze();
 
-  std::optional<Constant *> getSimplifiedValue(Instruction *I) {
-    auto It = SimplifiedValues.find(I);
-    if (It != SimplifiedValues.end())
-      return It->second;
-    return std::nullopt;
+  // Lookup simplified Value. May return a value owned by the caller.
+  Value *getSimplifiedValueUnchecked(Value *V) const {
+    return SimplifiedValues.lookup(V);
+  }
+
+  // Lookup simplified Value, but return nullptr if the simplified value is
+  // owned by the caller.
+  template <typename T> T *getSimplifiedValue(Value *V) const {
+    Value *SimpleV = SimplifiedValues.lookup(V);
+    if (!SimpleV)
+      return nullptr;
+
+    // Skip checks if we know T is a global. This has a small, but measurable
+    // impact on compile-time.
+    if constexpr (std::is_base_of_v<Constant, T>)
+      return dyn_cast<T>(SimpleV);
+
+    // Make sure the simplified Value is owned by this function
+    if (auto *I = dyn_cast<Instruction>(SimpleV)) {
+      if (I->getFunction() != &F)
+        return nullptr;
+    } else if (auto *Arg = dyn_cast<Argument>(SimpleV)) {
+      if (Arg->getParent() != &F)
+        return nullptr;
+    } else if (!isa<Constant>(SimpleV))
+      return nullptr;
+    return dyn_cast<T>(SimpleV);
   }
 
   // Keep a bunch of stats about the cost savings found so we can print them
@@ -921,12 +944,11 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
         if (BranchInst *BI = dyn_cast<BranchInst>(&I)) {
           // Count a conditional branch as savings if it becomes unconditional.
           if (BI->isConditional() &&
-              isa_and_nonnull<ConstantInt>(
-                  SimplifiedValues.lookup(BI->getCondition()))) {
+              getSimplifiedValue<ConstantInt>(BI->getCondition())) {
             CurrentSavings += InstrCost;
           }
         } else if (SwitchInst *SI = dyn_cast<SwitchInst>(&I)) {
-          if (isa_and_present<ConstantInt>(SimplifiedValues.lookup(SI->getCondition())))
+          if (getSimplifiedValue<ConstantInt>(SI->getCondition()))
             CurrentSavings += InstrCost;
         } else if (Value *V = dyn_cast<Value>(&I)) {
           // Count an instruction as savings if we can fold it.
@@ -1423,10 +1445,17 @@ void InlineCostAnnotationWriter::emitInstructionAnnot(
     if (Record->hasThresholdChanged())
       OS << ", threshold delta = " << Record->getThresholdDelta();
   }
-  auto C = ICCA->getSimplifiedValue(const_cast<Instruction *>(I));
-  if (C) {
+  auto *V = ICCA->getSimplifiedValueUnchecked(const_cast<Instruction *>(I));
+  if (V) {
     OS << ", simplified to ";
-    (*C)->print(OS, true);
+    V->print(OS, true);
+    if (auto *VI = dyn_cast<Instruction>(V)) {
+      if (VI->getFunction() != I->getFunction())
+        OS << " (caller instruction)";
+    } else if (auto *VArg = dyn_cast<Argument>(V)) {
+      if (VArg->getParent() != I->getFunction())
+        OS << " (caller argument)";
+    }
   }
   OS << "\n";
 }
@@ -1483,7 +1512,7 @@ bool CallAnalyzer::isGEPFree(GetElementPtrInst &GEP) {
   SmallVector<Value *, 4> Operands;
   Operands.push_back(GEP.getOperand(0));
   for (const Use &Op : GEP.indices())
-    if (Constant *SimpleOp = SimplifiedValues.lookup(Op))
+    if (Constant *SimpleOp = getSimplifiedValue<Constant>(Op))
       Operands.push_back(SimpleOp);
     else
       Operands.push_back(Op);
@@ -1498,7 +1527,7 @@ bool CallAnalyzer::visitAlloca(AllocaInst &I) {
   // Check whether inlining will turn a dynamic alloca into a static
   // alloca and handle that case.
   if (I.isArrayAllocation()) {
-    Constant *Size = SimplifiedValues.lookup(I.getArraySize());
+    Constant *Size = getSimplifiedValue<Constant>(I.getArraySize());
     if (auto *AllocSize = dyn_cast_or_null<ConstantInt>(Size)) {
       // Sometimes a dynamic alloca could be converted into a static alloca
       // after this constant prop, and become a huge static alloca on an
@@ -2287,9 +2316,18 @@ bool CallAnalyzer::visitStore(StoreInst &I) {
 }
 
 bool CallAnalyzer::visitExtractValue(ExtractValueInst &I) {
-  // Constant folding for extract value is trivial.
-  if (simplifyInstruction(I))
-    return true;
+  Value *Op = I.getAggregateOperand();
+
+  // Special handling, because we want to simplify extractvalue with a
+  // potential insertvalue from the caller.
+  if (Value *SimpleOp = getSimplifiedValueUnchecked(Op)) {
+    SimplifyQuery SQ(DL);
+    Value *SimpleV = simplifyExtractValueInst(SimpleOp, I.getIndices(), SQ);
+    if (SimpleV) {
+      SimplifiedValues[&I] = SimpleV;
+      return true;
+    }
+  }
 
   // SROA can't look through these, but they may be free.
   return Base::visitExtractValue(I);
@@ -2388,7 +2426,7 @@ bool CallAnalyzer::visitCallBase(CallBase &Call) {
     // Check if this happens to be an indirect function call to a known function
     // in this inline context. If not, we've done all we can.
     Value *Callee = Call.getCalledOperand();
-    F = dyn_cast_or_null<Function>(SimplifiedValues.lookup(Callee));
+    F = getSimplifiedValue<Function>(Callee);
     if (!F || F->getFunctionType() != Call.getFunctionType()) {
       onCallArgumentSetup(Call);
 
@@ -2483,8 +2521,7 @@ bool CallAnalyzer::visitSelectInst(SelectInst &SI) {
 
   Constant *TrueC = getDirectOrSimplifiedValue<Constant>(TrueVal);
   Constant *FalseC = getDirectOrSimplifiedValue<Constant>(FalseVal);
-  Constant *CondC =
-      dyn_cast_or_null<Constant>(SimplifiedValues.lookup(SI.getCondition()));
+  Constant *CondC = getSimplifiedValue<Constant>(SI.getCondition());
 
   if (!CondC) {
     // Select C, X, X => X
@@ -2833,8 +2870,9 @@ InlineResult CallAnalyzer::analyze() {
   auto CAI = CandidateCall.arg_begin();
   for (Argument &FAI : F.args()) {
     assert(CAI != CandidateCall.arg_end());
-    if (Constant *C = dyn_cast<Constant>(CAI))
-      SimplifiedValues[&FAI] = C;
+    SimplifiedValues[&FAI] = *CAI;
+    if (isa<Constant>(*CAI))
+      ++NumConstantArgs;
 
     Value *PtrArg = *CAI;
     if (ConstantInt *C = stripAndComputeInBoundsConstantOffsets(PtrArg)) {
@@ -2849,7 +2887,6 @@ InlineResult CallAnalyzer::analyze() {
     }
     ++CAI;
   }
-  NumConstantArgs = SimplifiedValues.size();
   NumConstantOffsetPtrArgs = ConstantOffsetPtrs.size();
   NumAllocaArgs = SROAArgValues.size();
 
@@ -2911,8 +2948,7 @@ InlineResult CallAnalyzer::analyze() {
     if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
       if (BI->isConditional()) {
         Value *Cond = BI->getCondition();
-        if (ConstantInt *SimpleCond =
-                dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
+        if (ConstantInt *SimpleCond = getSimplifiedValue<ConstantInt>(Cond)) {
           BasicBlock *NextBB = BI->getSuccessor(SimpleCond->isZero() ? 1 : 0);
           BBWorklist.insert(NextBB);
           KnownSuccessors[BB] = NextBB;
@@ -2922,8 +2958,7 @@ InlineResult CallAnalyzer::analyze() {
       }
     } else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
       Value *Cond = SI->getCondition();
-      if (ConstantInt *SimpleCond =
-              dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
+      if (ConstantInt *SimpleCond = getSimplifiedValue<ConstantInt>(Cond)) {
         BasicBlock *NextBB = SI->findCaseValue(SimpleCond)->getCaseSuccessor();
         BBWorklist.insert(NextBB);
         KnownSuccessors[BB] = NextBB;
diff --git a/llvm/test/Transforms/Inline/simplify-crosscallsite.ll b/llvm/test/Transforms/Inline/simplify-crosscallsite.ll
new file mode 100644
index 0000000000000..23cee55092a99
--- /dev/null
+++ b/llvm/test/Transforms/Inline/simplify-crosscallsite.ll
@@ -0,0 +1,44 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=inline -inline-threshold=0 | FileCheck %s
+; RUN: opt < %s -S -passes='cgscc(inline)' -inline-threshold=0 | FileCheck %s
+
+declare void @foo()
+
+define i32 @callee([2 x i32] %agg) {
+; CHECK-LABEL: define i32 @callee(
+; CHECK-SAME: [2 x i32] [[AGG:%.*]]) {
+; CHECK-NEXT:    [[V:%.*]] = extractvalue [2 x i32] [[AGG]], 0
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i32 [[V]], 0
+; CHECK-NEXT:    br i1 [[C]], label %[[IS_NULL:.*]], label %[[NON_NULL:.*]]
+; CHECK:       [[IS_NULL]]:
+; CHECK-NEXT:    ret i32 0
+; CHECK:       [[NON_NULL]]:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    ret i32 1
+;
+  %v = extractvalue [2 x i32] %agg, 0
+  %c = icmp eq i32 %v, 0
+  br i1 %c, label %is_null, label %non_null
+
+is_null:
+  ret i32 0
+
+non_null:
+  call void @foo()
+  call void @foo()
+  ret i32 1
+}
+
+define i32 @caller(i32 %arg) {
+; CHECK-LABEL: define i32 @caller(
+; CHECK-SAME: i32 [[ARG:%.*]]) {
+; CHECK-NEXT:    [[AGG0:%.*]] = insertvalue [2 x i32] poison, i32 0, 0
+; CHECK-NEXT:    [[AGG1:%.*]] = insertvalue [2 x i32] [[AGG0]], i32 [[ARG]], 1
+; CHECK-NEXT:    ret i32 0
+;
+  %agg0 = insertvalue [2 x i32] poison, i32 0, 0
+  %agg1 = insertvalue [2 x i32] %agg0, i32 %arg, 1
+  %v = call i32 @callee([2 x i32] %agg1)
+  ret i32 %v
+}

@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Tobias Stadler (tobias-stadler)

Changes

Motivation: When using libc++, std::bitset&lt;64&gt;::count() doesn't optimize to a single popcount instruction on AArch64, because we fail to inline the library code completely. Inlining fails, because the internal bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The value is built using insertvalue instructions and only one of the array entries is constant. If we know that this entry is constant, we can prove that half the function becomes dead. However, InlineCost only considers operands for simplification if they are Constants, which %arg is not. Without this simplification the function is too expensive to inline.

Therefore, we need to teach InlineCost to simplify values which aren't Constants and which may originate from the caller function.

As a first step, we enable this for extractvalue, so it can be simplified against insertvalues from the caller function. This is enough to get bitset::count fully optimized.

There are similar opportunities we can explore for BinOps in the future (e.g. cmp eq %arg1, %arg2 when the caller passes the same value into both arguments), but we need to be careful here, because InstSimplify isn't completely safe to use with operands owned by different functions.


Full diff: https://github.com/llvm/llvm-project/pull/145054.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/InlineCost.cpp (+63-28)
  • (added) llvm/test/Transforms/Inline/simplify-crosscallsite.ll (+44)
diff --git a/llvm/lib/Analysis/InlineCost.cpp b/llvm/lib/Analysis/InlineCost.cpp
index 7bd1f18004580..dd3ea1b437d14 100644
--- a/llvm/lib/Analysis/InlineCost.cpp
+++ b/llvm/lib/Analysis/InlineCost.cpp
@@ -391,7 +391,8 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   /// likely simplifications post-inlining. The most important aspect we track
   /// is CFG altering simplifications -- when we prove a basic block dead, that
   /// can cause dramatic shifts in the cost of inlining a function.
-  DenseMap<Value *, Constant *> SimplifiedValues;
+  /// Note: The simplified Value may be owned by the caller function.
+  DenseMap<Value *, Value *> SimplifiedValues;
 
   /// Keep track of the values which map back (through function arguments) to
   /// allocas on the caller stack which could be simplified through SROA.
@@ -432,7 +433,7 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
   template <typename T> T *getDirectOrSimplifiedValue(Value *V) const {
     if (auto *Direct = dyn_cast<T>(V))
       return Direct;
-    return dyn_cast_if_present<T>(SimplifiedValues.lookup(V));
+    return getSimplifiedValue<T>(V);
   }
 
   // Custom simplification helper routines.
@@ -525,11 +526,33 @@ class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
 
   InlineResult analyze();
 
-  std::optional<Constant *> getSimplifiedValue(Instruction *I) {
-    auto It = SimplifiedValues.find(I);
-    if (It != SimplifiedValues.end())
-      return It->second;
-    return std::nullopt;
+  // Lookup simplified Value. May return a value owned by the caller.
+  Value *getSimplifiedValueUnchecked(Value *V) const {
+    return SimplifiedValues.lookup(V);
+  }
+
+  // Lookup simplified Value, but return nullptr if the simplified value is
+  // owned by the caller.
+  template <typename T> T *getSimplifiedValue(Value *V) const {
+    Value *SimpleV = SimplifiedValues.lookup(V);
+    if (!SimpleV)
+      return nullptr;
+
+    // Skip checks if we know T is a global. This has a small, but measurable
+    // impact on compile-time.
+    if constexpr (std::is_base_of_v<Constant, T>)
+      return dyn_cast<T>(SimpleV);
+
+    // Make sure the simplified Value is owned by this function
+    if (auto *I = dyn_cast<Instruction>(SimpleV)) {
+      if (I->getFunction() != &F)
+        return nullptr;
+    } else if (auto *Arg = dyn_cast<Argument>(SimpleV)) {
+      if (Arg->getParent() != &F)
+        return nullptr;
+    } else if (!isa<Constant>(SimpleV))
+      return nullptr;
+    return dyn_cast<T>(SimpleV);
   }
 
   // Keep a bunch of stats about the cost savings found so we can print them
@@ -921,12 +944,11 @@ class InlineCostCallAnalyzer final : public CallAnalyzer {
         if (BranchInst *BI = dyn_cast<BranchInst>(&I)) {
           // Count a conditional branch as savings if it becomes unconditional.
           if (BI->isConditional() &&
-              isa_and_nonnull<ConstantInt>(
-                  SimplifiedValues.lookup(BI->getCondition()))) {
+              getSimplifiedValue<ConstantInt>(BI->getCondition())) {
             CurrentSavings += InstrCost;
           }
         } else if (SwitchInst *SI = dyn_cast<SwitchInst>(&I)) {
-          if (isa_and_present<ConstantInt>(SimplifiedValues.lookup(SI->getCondition())))
+          if (getSimplifiedValue<ConstantInt>(SI->getCondition()))
             CurrentSavings += InstrCost;
         } else if (Value *V = dyn_cast<Value>(&I)) {
           // Count an instruction as savings if we can fold it.
@@ -1423,10 +1445,17 @@ void InlineCostAnnotationWriter::emitInstructionAnnot(
     if (Record->hasThresholdChanged())
       OS << ", threshold delta = " << Record->getThresholdDelta();
   }
-  auto C = ICCA->getSimplifiedValue(const_cast<Instruction *>(I));
-  if (C) {
+  auto *V = ICCA->getSimplifiedValueUnchecked(const_cast<Instruction *>(I));
+  if (V) {
     OS << ", simplified to ";
-    (*C)->print(OS, true);
+    V->print(OS, true);
+    if (auto *VI = dyn_cast<Instruction>(V)) {
+      if (VI->getFunction() != I->getFunction())
+        OS << " (caller instruction)";
+    } else if (auto *VArg = dyn_cast<Argument>(V)) {
+      if (VArg->getParent() != I->getFunction())
+        OS << " (caller argument)";
+    }
   }
   OS << "\n";
 }
@@ -1483,7 +1512,7 @@ bool CallAnalyzer::isGEPFree(GetElementPtrInst &GEP) {
   SmallVector<Value *, 4> Operands;
   Operands.push_back(GEP.getOperand(0));
   for (const Use &Op : GEP.indices())
-    if (Constant *SimpleOp = SimplifiedValues.lookup(Op))
+    if (Constant *SimpleOp = getSimplifiedValue<Constant>(Op))
       Operands.push_back(SimpleOp);
     else
       Operands.push_back(Op);
@@ -1498,7 +1527,7 @@ bool CallAnalyzer::visitAlloca(AllocaInst &I) {
   // Check whether inlining will turn a dynamic alloca into a static
   // alloca and handle that case.
   if (I.isArrayAllocation()) {
-    Constant *Size = SimplifiedValues.lookup(I.getArraySize());
+    Constant *Size = getSimplifiedValue<Constant>(I.getArraySize());
     if (auto *AllocSize = dyn_cast_or_null<ConstantInt>(Size)) {
       // Sometimes a dynamic alloca could be converted into a static alloca
       // after this constant prop, and become a huge static alloca on an
@@ -2287,9 +2316,18 @@ bool CallAnalyzer::visitStore(StoreInst &I) {
 }
 
 bool CallAnalyzer::visitExtractValue(ExtractValueInst &I) {
-  // Constant folding for extract value is trivial.
-  if (simplifyInstruction(I))
-    return true;
+  Value *Op = I.getAggregateOperand();
+
+  // Special handling, because we want to simplify extractvalue with a
+  // potential insertvalue from the caller.
+  if (Value *SimpleOp = getSimplifiedValueUnchecked(Op)) {
+    SimplifyQuery SQ(DL);
+    Value *SimpleV = simplifyExtractValueInst(SimpleOp, I.getIndices(), SQ);
+    if (SimpleV) {
+      SimplifiedValues[&I] = SimpleV;
+      return true;
+    }
+  }
 
   // SROA can't look through these, but they may be free.
   return Base::visitExtractValue(I);
@@ -2388,7 +2426,7 @@ bool CallAnalyzer::visitCallBase(CallBase &Call) {
     // Check if this happens to be an indirect function call to a known function
     // in this inline context. If not, we've done all we can.
     Value *Callee = Call.getCalledOperand();
-    F = dyn_cast_or_null<Function>(SimplifiedValues.lookup(Callee));
+    F = getSimplifiedValue<Function>(Callee);
     if (!F || F->getFunctionType() != Call.getFunctionType()) {
       onCallArgumentSetup(Call);
 
@@ -2483,8 +2521,7 @@ bool CallAnalyzer::visitSelectInst(SelectInst &SI) {
 
   Constant *TrueC = getDirectOrSimplifiedValue<Constant>(TrueVal);
   Constant *FalseC = getDirectOrSimplifiedValue<Constant>(FalseVal);
-  Constant *CondC =
-      dyn_cast_or_null<Constant>(SimplifiedValues.lookup(SI.getCondition()));
+  Constant *CondC = getSimplifiedValue<Constant>(SI.getCondition());
 
   if (!CondC) {
     // Select C, X, X => X
@@ -2833,8 +2870,9 @@ InlineResult CallAnalyzer::analyze() {
   auto CAI = CandidateCall.arg_begin();
   for (Argument &FAI : F.args()) {
     assert(CAI != CandidateCall.arg_end());
-    if (Constant *C = dyn_cast<Constant>(CAI))
-      SimplifiedValues[&FAI] = C;
+    SimplifiedValues[&FAI] = *CAI;
+    if (isa<Constant>(*CAI))
+      ++NumConstantArgs;
 
     Value *PtrArg = *CAI;
     if (ConstantInt *C = stripAndComputeInBoundsConstantOffsets(PtrArg)) {
@@ -2849,7 +2887,6 @@ InlineResult CallAnalyzer::analyze() {
     }
     ++CAI;
   }
-  NumConstantArgs = SimplifiedValues.size();
   NumConstantOffsetPtrArgs = ConstantOffsetPtrs.size();
   NumAllocaArgs = SROAArgValues.size();
 
@@ -2911,8 +2948,7 @@ InlineResult CallAnalyzer::analyze() {
     if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
       if (BI->isConditional()) {
         Value *Cond = BI->getCondition();
-        if (ConstantInt *SimpleCond =
-                dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
+        if (ConstantInt *SimpleCond = getSimplifiedValue<ConstantInt>(Cond)) {
           BasicBlock *NextBB = BI->getSuccessor(SimpleCond->isZero() ? 1 : 0);
           BBWorklist.insert(NextBB);
           KnownSuccessors[BB] = NextBB;
@@ -2922,8 +2958,7 @@ InlineResult CallAnalyzer::analyze() {
       }
     } else if (SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
       Value *Cond = SI->getCondition();
-      if (ConstantInt *SimpleCond =
-              dyn_cast_or_null<ConstantInt>(SimplifiedValues.lookup(Cond))) {
+      if (ConstantInt *SimpleCond = getSimplifiedValue<ConstantInt>(Cond)) {
         BasicBlock *NextBB = SI->findCaseValue(SimpleCond)->getCaseSuccessor();
         BBWorklist.insert(NextBB);
         KnownSuccessors[BB] = NextBB;
diff --git a/llvm/test/Transforms/Inline/simplify-crosscallsite.ll b/llvm/test/Transforms/Inline/simplify-crosscallsite.ll
new file mode 100644
index 0000000000000..23cee55092a99
--- /dev/null
+++ b/llvm/test/Transforms/Inline/simplify-crosscallsite.ll
@@ -0,0 +1,44 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=inline -inline-threshold=0 | FileCheck %s
+; RUN: opt < %s -S -passes='cgscc(inline)' -inline-threshold=0 | FileCheck %s
+
+declare void @foo()
+
+define i32 @callee([2 x i32] %agg) {
+; CHECK-LABEL: define i32 @callee(
+; CHECK-SAME: [2 x i32] [[AGG:%.*]]) {
+; CHECK-NEXT:    [[V:%.*]] = extractvalue [2 x i32] [[AGG]], 0
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i32 [[V]], 0
+; CHECK-NEXT:    br i1 [[C]], label %[[IS_NULL:.*]], label %[[NON_NULL:.*]]
+; CHECK:       [[IS_NULL]]:
+; CHECK-NEXT:    ret i32 0
+; CHECK:       [[NON_NULL]]:
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    ret i32 1
+;
+  %v = extractvalue [2 x i32] %agg, 0
+  %c = icmp eq i32 %v, 0
+  br i1 %c, label %is_null, label %non_null
+
+is_null:
+  ret i32 0
+
+non_null:
+  call void @foo()
+  call void @foo()
+  ret i32 1
+}
+
+define i32 @caller(i32 %arg) {
+; CHECK-LABEL: define i32 @caller(
+; CHECK-SAME: i32 [[ARG:%.*]]) {
+; CHECK-NEXT:    [[AGG0:%.*]] = insertvalue [2 x i32] poison, i32 0, 0
+; CHECK-NEXT:    [[AGG1:%.*]] = insertvalue [2 x i32] [[AGG0]], i32 [[ARG]], 1
+; CHECK-NEXT:    ret i32 0
+;
+  %agg0 = insertvalue [2 x i32] poison, i32 0, 0
+  %agg1 = insertvalue [2 x i32] %agg0, i32 %arg, 1
+  %v = call i32 @callee([2 x i32] %agg1)
+  ret i32 %v
+}

@tobias-stadler
Copy link
Contributor Author


// Special handling, because we want to simplify extractvalue with a
// potential insertvalue from the caller.
if (Value *SimpleOp = getSimplifiedValueUnchecked(Op)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to split off the change from constant -> constant and then improve extract value handling in a separate pqtch

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the test, this looks good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test also passes without your change. You could use a recursive call to prove that the recursive code path is eliminated. Alternative could check debug output.

Copy link
Contributor Author

@tobias-stadler tobias-stadler Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, x86 has lower cost than AArch64 here. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add a negative test where extractvalue does not simplify (e.g. same as current but swap 0 and 1).

tobias-stadler added a commit that referenced this pull request Jun 20, 2025
Allow mapping callee Values to arbitrary (non-Constant) simplified
values. The simplified values can also originate from the caller. This
enables us to simplify instructions in the callee with instructions from
the caller.

The first use case for this is simplifying extractvalues (#145054).
@tobias-stadler tobias-stadler force-pushed the inline-simplify-callerinst-pr branch from da859c7 to 875e3d9 Compare June 23, 2025 14:33
@tobias-stadler tobias-stadler changed the base branch from main to users/tobias-stadler/inline-arbval-pr June 23, 2025 14:34
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add a negative test where extractvalue does not simplify (e.g. same as current but swap 0 and 1).

tobias-stadler added a commit that referenced this pull request Jun 23, 2025
Allow mapping callee Values to arbitrary (non-Constant) simplified
values. The simplified values can also originate from the caller. This
enables us to simplify instructions in the callee with instructions from
the caller.

The first use case for this is simplifying extractvalues (PR #145054).
@tobias-stadler tobias-stadler deleted the branch llvm:main June 23, 2025 15:30
@tobias-stadler tobias-stadler changed the base branch from users/tobias-stadler/inline-arbval-pr to main June 23, 2025 16:10
miguelcsx pushed a commit to miguelcsx/llvm-project that referenced this pull request Jun 23, 2025
…5083)

Allow mapping callee Values to arbitrary (non-Constant) simplified
values. The simplified values can also originate from the caller. This
enables us to simplify instructions in the callee with instructions from
the caller.

The first use case for this is simplifying extractvalues (PR llvm#145054).
@tobias-stadler tobias-stadler force-pushed the inline-simplify-callerinst-pr branch from 875e3d9 to 0767ab9 Compare June 23, 2025 17:22
Jaddyen pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 23, 2025
…5083)

Allow mapping callee Values to arbitrary (non-Constant) simplified
values. The simplified values can also originate from the caller. This
enables us to simplify instructions in the callee with instructions from
the caller.

The first use case for this is simplifying extractvalues (PR llvm#145054).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants