Skip to content

[ICP] Add a few tunings to indirect-call-promotion #149892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

xur-llvm
Copy link
Contributor

@xur-llvm xur-llvm commented Jul 21, 2025

[ICP] Add a few tunings to indirect-call-promtion

Indirect-call promotion (ICP) has been adjusted with the following tunings.
(1) Candidate functions can be now ICP'd even if only a declaration is present.
(2) All non-cold candidate functions are now considered by ICP. Previously, only hot targets were considered.
(3) If one target cannot be ICP'd, proceed with the remaining targets instead of exiting the callsite.

This update hides all tunings under internal options and disables them by default. They'll be enabled in a later update. There'll also be another update to address the "not found" issue with indirect targets.

@llvmbot llvmbot added PGO Profile Guided Optimizations LTO Link time optimization (regular/full LTO or ThinLTO) llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jul 21, 2025
@xur-llvm xur-llvm requested a review from teresajohnson July 21, 2025 20:10
@llvmbot
Copy link
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (xur-llvm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149892.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/ProfileSummaryInfo.cpp (+12-2)
  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+34-3)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+1)
diff --git a/llvm/lib/Analysis/ProfileSummaryInfo.cpp b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
index e8d4e37a4eb7e..7f361f47576d3 100644
--- a/llvm/lib/Analysis/ProfileSummaryInfo.cpp
+++ b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
@@ -121,8 +121,18 @@ void ProfileSummaryInfo::computeThresholds() {
       ProfileSummaryBuilder::getHotCountThreshold(DetailedSummary);
   ColdCountThreshold =
       ProfileSummaryBuilder::getColdCountThreshold(DetailedSummary);
-  assert(ColdCountThreshold <= HotCountThreshold &&
-         "Cold count threshold cannot exceed hot count threshold!");
+  // When the hot and cold thresholds are identical, we would classify
+  // a count value as both hot and cold since we are doing an inclusitve check
+  // (see ::is{Hot|Cold}Count(). To avoid this undesirable overlap, ensure the
+  // thresholds are distinct.
+  if (HotCountThreshold == ColdCountThreshold) {
+    if (ColdCountThreshold > 0)
+      (*ColdCountThreshold)--;
+    else
+      (*HotCountThreshold)++;
+  }
+  assert(ColdCountThreshold < HotCountThreshold &&
+         "Cold count threshold should be less than hot count threshold!");
   if (!hasPartialSampleProfile() || !ScalePartialSampleProfileWorkingSetSize) {
     HasHugeWorkingSetSize =
         HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 854db0ff6864c..eafac58f0580a 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -80,6 +80,18 @@ static cl::opt<unsigned>
     ICPCSSkip("icp-csskip", cl::init(0), cl::Hidden,
               cl::desc("Skip Callsite up to this number for this compilation"));
 
+// ICP the candidate function even when only a declaration is present.
+static cl::opt<bool>
+    ICPAllowDeclOnly("icp-allow-decl-only", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even when the defintion "
+                         " is not available"));
+
+// ICP all non-cold candidate functions. When it's false, only ICP hot functions.
+static cl::opt<bool>
+    ICPAllowWarmFunc("icp-allow-warm-func", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even if it is not "
+                         "hot"));
+
 // Set if the pass is called in LTO optimization. The difference for LTO mode
 // is the pass won't prefix the source module name to the internal linkage
 // symbols.
@@ -477,7 +489,7 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
     // the case where the symbol is globally dead in the binary and removed by
     // ThinLTO.
     Function *TargetFunction = Symtab->getFunction(Target);
-    if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
+    if (TargetFunction == nullptr) {
       LLVM_DEBUG(dbgs() << " Not promote: Cannot find the target\n");
       ORE.emit([&]() {
         return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", &CB)
@@ -486,6 +498,16 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
       });
       break;
     }
+    if (!ICPAllowDeclOnly && TargetFunction->isDeclaration()) {
+      LLVM_DEBUG(dbgs() << " Not promote: target definition is not available\n");
+      ORE.emit([&]() {
+        return OptimizationRemarkMissed(DEBUG_TYPE, "NoTargetDef", &CB)
+               << "Do not promote indirect call: target with md5sum "
+               << ore::NV("target md5sum", Target)
+               << " definition not available";
+      });
+      break;
+    }
 
     const char *Reason = nullptr;
     if (!isLegalToPromote(CB, TargetFunction, &Reason)) {
@@ -822,9 +844,18 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
     uint64_t TotalCount;
     auto ICallProfDataRef = ICallAnalysis.getPromotionCandidatesForInstruction(
         CB, TotalCount, NumCandidates);
-    if (!NumCandidates ||
-        (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
+    if (!NumCandidates)
       continue;
+    if (PSI && PSI->hasProfileSummary()) {
+      // Don't perform if ICPAllowWarmFunc is true AND the count is cold, OR
+      // ICPAllowWarmFunc is false AND the count is NOT hot.
+      if ((ICPAllowWarmFunc && PSI->isColdCount(TotalCount)) ||
+          (!ICPAllowWarmFunc && !PSI->isHotCount(TotalCount))) {
+        LLVM_DEBUG(dbgs() << " TotalCount=" << TotalCount
+                          << " is not large enough.\n");
+        continue;
+      }
+    }
 
     auto PromotionCandidates = getPromotionCandidatesForCallSite(
         *CB, ICallProfDataRef, TotalCount, NumCandidates);
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index dbc532ee52828..87e5435a4229d 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -229,6 +229,7 @@
 ; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
 ; RUN:	-import-instr-limit=0 \
 ; RUN:	-memprof-require-definition-for-promotion \
+; RUN:	-icp-allow-decl-only=false \
 ; RUN:	-enable-memprof-indirect-call-support=true \
 ; RUN:  -supports-hot-cold-new \
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \

@llvmbot
Copy link
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-lto

Author: None (xur-llvm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149892.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/ProfileSummaryInfo.cpp (+12-2)
  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+34-3)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+1)
diff --git a/llvm/lib/Analysis/ProfileSummaryInfo.cpp b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
index e8d4e37a4eb7e..7f361f47576d3 100644
--- a/llvm/lib/Analysis/ProfileSummaryInfo.cpp
+++ b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
@@ -121,8 +121,18 @@ void ProfileSummaryInfo::computeThresholds() {
       ProfileSummaryBuilder::getHotCountThreshold(DetailedSummary);
   ColdCountThreshold =
       ProfileSummaryBuilder::getColdCountThreshold(DetailedSummary);
-  assert(ColdCountThreshold <= HotCountThreshold &&
-         "Cold count threshold cannot exceed hot count threshold!");
+  // When the hot and cold thresholds are identical, we would classify
+  // a count value as both hot and cold since we are doing an inclusitve check
+  // (see ::is{Hot|Cold}Count(). To avoid this undesirable overlap, ensure the
+  // thresholds are distinct.
+  if (HotCountThreshold == ColdCountThreshold) {
+    if (ColdCountThreshold > 0)
+      (*ColdCountThreshold)--;
+    else
+      (*HotCountThreshold)++;
+  }
+  assert(ColdCountThreshold < HotCountThreshold &&
+         "Cold count threshold should be less than hot count threshold!");
   if (!hasPartialSampleProfile() || !ScalePartialSampleProfileWorkingSetSize) {
     HasHugeWorkingSetSize =
         HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 854db0ff6864c..eafac58f0580a 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -80,6 +80,18 @@ static cl::opt<unsigned>
     ICPCSSkip("icp-csskip", cl::init(0), cl::Hidden,
               cl::desc("Skip Callsite up to this number for this compilation"));
 
+// ICP the candidate function even when only a declaration is present.
+static cl::opt<bool>
+    ICPAllowDeclOnly("icp-allow-decl-only", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even when the defintion "
+                         " is not available"));
+
+// ICP all non-cold candidate functions. When it's false, only ICP hot functions.
+static cl::opt<bool>
+    ICPAllowWarmFunc("icp-allow-warm-func", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even if it is not "
+                         "hot"));
+
 // Set if the pass is called in LTO optimization. The difference for LTO mode
 // is the pass won't prefix the source module name to the internal linkage
 // symbols.
@@ -477,7 +489,7 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
     // the case where the symbol is globally dead in the binary and removed by
     // ThinLTO.
     Function *TargetFunction = Symtab->getFunction(Target);
-    if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
+    if (TargetFunction == nullptr) {
       LLVM_DEBUG(dbgs() << " Not promote: Cannot find the target\n");
       ORE.emit([&]() {
         return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", &CB)
@@ -486,6 +498,16 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
       });
       break;
     }
+    if (!ICPAllowDeclOnly && TargetFunction->isDeclaration()) {
+      LLVM_DEBUG(dbgs() << " Not promote: target definition is not available\n");
+      ORE.emit([&]() {
+        return OptimizationRemarkMissed(DEBUG_TYPE, "NoTargetDef", &CB)
+               << "Do not promote indirect call: target with md5sum "
+               << ore::NV("target md5sum", Target)
+               << " definition not available";
+      });
+      break;
+    }
 
     const char *Reason = nullptr;
     if (!isLegalToPromote(CB, TargetFunction, &Reason)) {
@@ -822,9 +844,18 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
     uint64_t TotalCount;
     auto ICallProfDataRef = ICallAnalysis.getPromotionCandidatesForInstruction(
         CB, TotalCount, NumCandidates);
-    if (!NumCandidates ||
-        (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
+    if (!NumCandidates)
       continue;
+    if (PSI && PSI->hasProfileSummary()) {
+      // Don't perform if ICPAllowWarmFunc is true AND the count is cold, OR
+      // ICPAllowWarmFunc is false AND the count is NOT hot.
+      if ((ICPAllowWarmFunc && PSI->isColdCount(TotalCount)) ||
+          (!ICPAllowWarmFunc && !PSI->isHotCount(TotalCount))) {
+        LLVM_DEBUG(dbgs() << " TotalCount=" << TotalCount
+                          << " is not large enough.\n");
+        continue;
+      }
+    }
 
     auto PromotionCandidates = getPromotionCandidatesForCallSite(
         *CB, ICallProfDataRef, TotalCount, NumCandidates);
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index dbc532ee52828..87e5435a4229d 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -229,6 +229,7 @@
 ; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
 ; RUN:	-import-instr-limit=0 \
 ; RUN:	-memprof-require-definition-for-promotion \
+; RUN:	-icp-allow-decl-only=false \
 ; RUN:	-enable-memprof-indirect-call-support=true \
 ; RUN:  -supports-hot-cold-new \
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \

@llvmbot
Copy link
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-llvm-analysis

Author: None (xur-llvm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149892.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/ProfileSummaryInfo.cpp (+12-2)
  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+34-3)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+1)
diff --git a/llvm/lib/Analysis/ProfileSummaryInfo.cpp b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
index e8d4e37a4eb7e..7f361f47576d3 100644
--- a/llvm/lib/Analysis/ProfileSummaryInfo.cpp
+++ b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
@@ -121,8 +121,18 @@ void ProfileSummaryInfo::computeThresholds() {
       ProfileSummaryBuilder::getHotCountThreshold(DetailedSummary);
   ColdCountThreshold =
       ProfileSummaryBuilder::getColdCountThreshold(DetailedSummary);
-  assert(ColdCountThreshold <= HotCountThreshold &&
-         "Cold count threshold cannot exceed hot count threshold!");
+  // When the hot and cold thresholds are identical, we would classify
+  // a count value as both hot and cold since we are doing an inclusitve check
+  // (see ::is{Hot|Cold}Count(). To avoid this undesirable overlap, ensure the
+  // thresholds are distinct.
+  if (HotCountThreshold == ColdCountThreshold) {
+    if (ColdCountThreshold > 0)
+      (*ColdCountThreshold)--;
+    else
+      (*HotCountThreshold)++;
+  }
+  assert(ColdCountThreshold < HotCountThreshold &&
+         "Cold count threshold should be less than hot count threshold!");
   if (!hasPartialSampleProfile() || !ScalePartialSampleProfileWorkingSetSize) {
     HasHugeWorkingSetSize =
         HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 854db0ff6864c..eafac58f0580a 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -80,6 +80,18 @@ static cl::opt<unsigned>
     ICPCSSkip("icp-csskip", cl::init(0), cl::Hidden,
               cl::desc("Skip Callsite up to this number for this compilation"));
 
+// ICP the candidate function even when only a declaration is present.
+static cl::opt<bool>
+    ICPAllowDeclOnly("icp-allow-decl-only", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even when the defintion "
+                         " is not available"));
+
+// ICP all non-cold candidate functions. When it's false, only ICP hot functions.
+static cl::opt<bool>
+    ICPAllowWarmFunc("icp-allow-warm-func", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even if it is not "
+                         "hot"));
+
 // Set if the pass is called in LTO optimization. The difference for LTO mode
 // is the pass won't prefix the source module name to the internal linkage
 // symbols.
@@ -477,7 +489,7 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
     // the case where the symbol is globally dead in the binary and removed by
     // ThinLTO.
     Function *TargetFunction = Symtab->getFunction(Target);
-    if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
+    if (TargetFunction == nullptr) {
       LLVM_DEBUG(dbgs() << " Not promote: Cannot find the target\n");
       ORE.emit([&]() {
         return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", &CB)
@@ -486,6 +498,16 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
       });
       break;
     }
+    if (!ICPAllowDeclOnly && TargetFunction->isDeclaration()) {
+      LLVM_DEBUG(dbgs() << " Not promote: target definition is not available\n");
+      ORE.emit([&]() {
+        return OptimizationRemarkMissed(DEBUG_TYPE, "NoTargetDef", &CB)
+               << "Do not promote indirect call: target with md5sum "
+               << ore::NV("target md5sum", Target)
+               << " definition not available";
+      });
+      break;
+    }
 
     const char *Reason = nullptr;
     if (!isLegalToPromote(CB, TargetFunction, &Reason)) {
@@ -822,9 +844,18 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
     uint64_t TotalCount;
     auto ICallProfDataRef = ICallAnalysis.getPromotionCandidatesForInstruction(
         CB, TotalCount, NumCandidates);
-    if (!NumCandidates ||
-        (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
+    if (!NumCandidates)
       continue;
+    if (PSI && PSI->hasProfileSummary()) {
+      // Don't perform if ICPAllowWarmFunc is true AND the count is cold, OR
+      // ICPAllowWarmFunc is false AND the count is NOT hot.
+      if ((ICPAllowWarmFunc && PSI->isColdCount(TotalCount)) ||
+          (!ICPAllowWarmFunc && !PSI->isHotCount(TotalCount))) {
+        LLVM_DEBUG(dbgs() << " TotalCount=" << TotalCount
+                          << " is not large enough.\n");
+        continue;
+      }
+    }
 
     auto PromotionCandidates = getPromotionCandidatesForCallSite(
         *CB, ICallProfDataRef, TotalCount, NumCandidates);
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index dbc532ee52828..87e5435a4229d 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -229,6 +229,7 @@
 ; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
 ; RUN:	-import-instr-limit=0 \
 ; RUN:	-memprof-require-definition-for-promotion \
+; RUN:	-icp-allow-decl-only=false \
 ; RUN:	-enable-memprof-indirect-call-support=true \
 ; RUN:  -supports-hot-cold-new \
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \

@llvmbot
Copy link
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-pgo

Author: None (xur-llvm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149892.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/ProfileSummaryInfo.cpp (+12-2)
  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+34-3)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+1)
diff --git a/llvm/lib/Analysis/ProfileSummaryInfo.cpp b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
index e8d4e37a4eb7e..7f361f47576d3 100644
--- a/llvm/lib/Analysis/ProfileSummaryInfo.cpp
+++ b/llvm/lib/Analysis/ProfileSummaryInfo.cpp
@@ -121,8 +121,18 @@ void ProfileSummaryInfo::computeThresholds() {
       ProfileSummaryBuilder::getHotCountThreshold(DetailedSummary);
   ColdCountThreshold =
       ProfileSummaryBuilder::getColdCountThreshold(DetailedSummary);
-  assert(ColdCountThreshold <= HotCountThreshold &&
-         "Cold count threshold cannot exceed hot count threshold!");
+  // When the hot and cold thresholds are identical, we would classify
+  // a count value as both hot and cold since we are doing an inclusitve check
+  // (see ::is{Hot|Cold}Count(). To avoid this undesirable overlap, ensure the
+  // thresholds are distinct.
+  if (HotCountThreshold == ColdCountThreshold) {
+    if (ColdCountThreshold > 0)
+      (*ColdCountThreshold)--;
+    else
+      (*HotCountThreshold)++;
+  }
+  assert(ColdCountThreshold < HotCountThreshold &&
+         "Cold count threshold should be less than hot count threshold!");
   if (!hasPartialSampleProfile() || !ScalePartialSampleProfileWorkingSetSize) {
     HasHugeWorkingSetSize =
         HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 854db0ff6864c..eafac58f0580a 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -80,6 +80,18 @@ static cl::opt<unsigned>
     ICPCSSkip("icp-csskip", cl::init(0), cl::Hidden,
               cl::desc("Skip Callsite up to this number for this compilation"));
 
+// ICP the candidate function even when only a declaration is present.
+static cl::opt<bool>
+    ICPAllowDeclOnly("icp-allow-decl-only", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even when the defintion "
+                         " is not available"));
+
+// ICP all non-cold candidate functions. When it's false, only ICP hot functions.
+static cl::opt<bool>
+    ICPAllowWarmFunc("icp-allow-warm-func", cl::init(true), cl::Hidden,
+                cl::desc("Promote the target candidate even if it is not "
+                         "hot"));
+
 // Set if the pass is called in LTO optimization. The difference for LTO mode
 // is the pass won't prefix the source module name to the internal linkage
 // symbols.
@@ -477,7 +489,7 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
     // the case where the symbol is globally dead in the binary and removed by
     // ThinLTO.
     Function *TargetFunction = Symtab->getFunction(Target);
-    if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
+    if (TargetFunction == nullptr) {
       LLVM_DEBUG(dbgs() << " Not promote: Cannot find the target\n");
       ORE.emit([&]() {
         return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", &CB)
@@ -486,6 +498,16 @@ IndirectCallPromoter::getPromotionCandidatesForCallSite(
       });
       break;
     }
+    if (!ICPAllowDeclOnly && TargetFunction->isDeclaration()) {
+      LLVM_DEBUG(dbgs() << " Not promote: target definition is not available\n");
+      ORE.emit([&]() {
+        return OptimizationRemarkMissed(DEBUG_TYPE, "NoTargetDef", &CB)
+               << "Do not promote indirect call: target with md5sum "
+               << ore::NV("target md5sum", Target)
+               << " definition not available";
+      });
+      break;
+    }
 
     const char *Reason = nullptr;
     if (!isLegalToPromote(CB, TargetFunction, &Reason)) {
@@ -822,9 +844,18 @@ bool IndirectCallPromoter::processFunction(ProfileSummaryInfo *PSI) {
     uint64_t TotalCount;
     auto ICallProfDataRef = ICallAnalysis.getPromotionCandidatesForInstruction(
         CB, TotalCount, NumCandidates);
-    if (!NumCandidates ||
-        (PSI && PSI->hasProfileSummary() && !PSI->isHotCount(TotalCount)))
+    if (!NumCandidates)
       continue;
+    if (PSI && PSI->hasProfileSummary()) {
+      // Don't perform if ICPAllowWarmFunc is true AND the count is cold, OR
+      // ICPAllowWarmFunc is false AND the count is NOT hot.
+      if ((ICPAllowWarmFunc && PSI->isColdCount(TotalCount)) ||
+          (!ICPAllowWarmFunc && !PSI->isHotCount(TotalCount))) {
+        LLVM_DEBUG(dbgs() << " TotalCount=" << TotalCount
+                          << " is not large enough.\n");
+        continue;
+      }
+    }
 
     auto PromotionCandidates = getPromotionCandidatesForCallSite(
         *CB, ICallProfDataRef, TotalCount, NumCandidates);
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index dbc532ee52828..87e5435a4229d 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -229,6 +229,7 @@
 ; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
 ; RUN:	-import-instr-limit=0 \
 ; RUN:	-memprof-require-definition-for-promotion \
+; RUN:	-icp-allow-decl-only=false \
 ; RUN:	-enable-memprof-indirect-call-support=true \
 ; RUN:  -supports-hot-cold-new \
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \

@xur-llvm xur-llvm requested a review from david-xl July 21, 2025 20:10
Copy link

github-actions bot commented Jul 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a description summarizing the changes. Also the changes need tests.

assert(ColdCountThreshold <= HotCountThreshold &&
"Cold count threshold cannot exceed hot count threshold!");
// When the hot and cold thresholds are identical, we would classify
// a count value as both hot and cold since we are doing an inclusitve check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/inclusitve/inclusive/

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test cases?

// (see ::is{Hot|Cold}Count(). To avoid this undesirable overlap, ensure the
// thresholds are distinct.
if (HotCountThreshold == ColdCountThreshold) {
if (ColdCountThreshold > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplify it to just bump the hot count threshold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not a good idea as this will lead tons of test changes. Also change existing opt behaviors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean bump it when the cold and hot thresholds equal -- that should not be frequent.

@@ -80,6 +80,18 @@ static cl::opt<unsigned>
ICPCSSkip("icp-csskip", cl::init(0), cl::Hidden,
cl::desc("Skip Callsite up to this number for this compilation"));

// ICP the candidate function even when only a declaration is present.
static cl::opt<bool>
ICPAllowDeclOnly("icp-allow-decl-only", cl::init(true), cl::Hidden,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be off initially to avoid behavior change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do want the behavior change, right?

continue;
if (PSI && PSI->hasProfileSummary()) {
// Don't perform if ICPAllowWarmFunc is true AND the count is cold, OR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add early check: if (is_cold) continue;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// ICP all non-cold candidate functions. When it's false, only ICP hot functions.
static cl::opt<bool>
ICPAllowWarmFunc("icp-allow-warm-func", cl::init(true), cl::Hidden,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change it to 'icp-allow-hot-only' and set it to true by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (PSI && PSI->hasProfileSummary()) {
// Don't perform if ICPAllowWarmFunc is true AND the count is cold, OR
// ICPAllowWarmFunc is false AND the count is NOT hot.
if ((ICPAllowWarmFunc && PSI->isColdCount(TotalCount)) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is cleaner to do this:
if (is_cold)
continue;
if (is_hot_only && !is_hot)
continue;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

If hot and cold thresholds are identical, both isColdCount() and
isHotCount() can return true due to inclusive checks in these
functions. This is undesirable.

This patch ensures distinct hot and cold counter thresholds.
An alternative fix is to make the check in isColdCount() exclusive.
But that requires many test case changes.
@xur-llvm xur-llvm force-pushed the users/xur-llvm/icp_work branch 4 times, most recently from 884231d to 66524b3 Compare July 22, 2025 00:28
@xur-llvm
Copy link
Contributor Author

Integrated reviews from David and Teresa in the latest patchset.

@xur-llvm xur-llvm force-pushed the users/xur-llvm/icp_work branch from 66524b3 to b2ddd5c Compare July 22, 2025 19:02
@@ -80,6 +80,27 @@ static cl::opt<unsigned>
ICPCSSkip("icp-csskip", cl::init(0), cl::Hidden,
cl::desc("Skip Callsite up to this number for this compilation"));

// ICP the candidate function even when only a declaration is present.
static cl::opt<bool> ICPAllowDeclOnly(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming can be confused as 'only allow decls (without definition)' while it means 'allow symbols with only declarations'. To avoid the confusion, rename it to just 'ICPAllowDecls'.


// If one target cannot be ICP'd, proceed with the remaining targets instead
// of exiting the callsite
static cl::opt<bool> ICPAllowAllocSkip(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ICPAllowCandidateSkip matches the meaning more.

; RUN: opt < %s -passes=pgo-icall-prom -icp-allow-decl-only=false -icp-allow-hot-only=true -icp-allow-skip=false -S -pass-remarks=pgo-icall-prom 2>&1 | FileCheck %s --check-prefix=REMARK1
; RUN: opt < %s -passes=pgo-icall-prom -icp-allow-decl-only=false -icp-allow-hot-only=false -icp-allow-skip=false -S -pass-remarks=pgo-icall-prom 2>&1 | FileCheck %s --check-prefixes=REMARK1
; RUN: opt < %s -passes=pgo-icall-prom -icp-allow-decl-only=false -icp-allow-hot-only=false -icp-allow-skip=true -S -pass-remarks=pgo-icall-prom 2>&1 | FileCheck %s --check-prefixes=REMARK1,REMARK3
; RUN: opt < %s -passes=pgo-icall-prom -icp-allow-hot-only=false -icp-allow-decl-only=true -icp-allow-skip=true -S -pass-remarks=pgo-icall-prom 2>&1 | FileCheck %s --check-prefixes=REMARK1,REMARK2,REMARK4,REMARK5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you put the flags in the same order, so it is easier to compare the different RUN lines? I think they mostly are except for the final invocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when allow skip is true can you double check that the VP metadata is updated correctly? I.e. because unlike previously, we aren't always needing to remove the first N.

@xur-llvm xur-llvm force-pushed the users/xur-llvm/icp_work branch from b2ddd5c to b1cbd3e Compare July 23, 2025 05:19
@xur-llvm
Copy link
Contributor Author

Updated based on David and Teresa reviews.
(1) Changed option names
(2) Fixed the VP metadata update issue.
(3) Added a test case to check the metadata update.

Indirect-call promotion (ICP) has been adjusted with the following
tunings.
  (1) Candidate functions can be now ICP'd even if only a declaration
      is present.
  (2) All non-cold candidate functions are now considered by ICP.
      Previously, only hot targets were considered.
  (3) If one target cannot be ICP'd, proceed with the remaining
      targets instead of exiting the callsite.

In this patch, all tunings are disabled by default. They will be
enabled in a subsequent patch.
@xur-llvm xur-llvm force-pushed the users/xur-llvm/icp_work branch from b1cbd3e to d81f611 Compare July 23, 2025 05:24
@xur-llvm xur-llvm requested a review from mingmingl-llvm July 23, 2025 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO) PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants