[LLDB] Add optional callback function to `TypeMatcher` #143748

Nerixyz · 2025-06-11T16:55:12Z

To be able to have multiple formatters that match on the same type name but for different types, this adds a callback that can be specified, which is called with the match candidate. As mentlerd pointed out, it's similar to passing --recognizer-function. This isn't implemented as another match type, because the TypeMatcher still needs to have some name for GetMatchString.

The only use I currently see here is for the C++ STL, specifically libstdc++ and MSVC's STL.
In #143177, I used this to add formatters for std::string and friends from MSVC's STL.

One potential issue of adding the callback here is that the summary/synthetic will be cached for a type name - disregarding the callback. The scenario here would be two shared libraries that use a different type with the same name and two formatters (e.g. they use two different standard libraries). Once the first type was formatted, its formatter is cached. Now, the second type will always be formatted with the wrong formatter (the cached one), even though the callback for the cached one wouldn't match.
For one, --recognizer-function has the same issue and secondly, I don't think this is really realistic.

llvmbot · 2025-06-11T16:55:46Z

@llvm/pr-subscribers-lldb

Author: nerix (Nerixyz)

Changes

To be able to have multiple formatters that match on the same type name but for different types, this adds a callback that can be specified, which is called with the match candidate. As mentlerd pointed out, it's similar to passing --recognizer-function. This isn't implemented as another match type, because the TypeMatcher still needs to have some name for GetMatchString.

The only use I currently see here is for the C++ STL, specifically libstdc++ and MSVC's STL.
In #143177, I used this to add formatters for std::string and friends from MSVC's STL.

One potential issue of adding the callback here is that the summary/synthetic will be cached for a type name - disregarding the callback. The scenario here would be two shared libraries that use a different type with the same name and two formatters (e.g. they use two different standard libraries). Once the first type was formatted, its formatter is cached. Now, the second type will always be formatted with the wrong formatter (the cached one), even though the callback for the cached one wouldn't match.
For one, --recognizer-function has the same issue and secondly, I don't think this is really realistic.

Full diff: https://github.com/llvm/llvm-project/pull/143748.diff

3 Files Affected:

(modified) lldb/include/lldb/DataFormatters/FormatClasses.h (+8-2)
(modified) lldb/include/lldb/DataFormatters/FormattersContainer.h (+37-21)
(modified) lldb/unittests/DataFormatter/FormattersContainerTest.cpp (+64)

diff --git a/lldb/include/lldb/DataFormatters/FormatClasses.h b/lldb/include/lldb/DataFormatters/FormatClasses.h
index 89d74649ecc64..16685594db663 100644
--- a/lldb/include/lldb/DataFormatters/FormatClasses.h
+++ b/lldb/include/lldb/DataFormatters/FormatClasses.h
@@ -149,13 +149,16 @@ class FormattersMatchData {
   CandidateLanguagesVector m_candidate_languages;
 };
 
+using CxxTypeCandidateMatchFn = bool(const FormattersMatchCandidate &);
+
 class TypeNameSpecifierImpl {
 public:
   TypeNameSpecifierImpl() = default;
 
   TypeNameSpecifierImpl(llvm::StringRef name,
-                        lldb::FormatterMatchType match_type)
-      : m_match_type(match_type) {
+                        lldb::FormatterMatchType match_type,
+                        CxxTypeCandidateMatchFn *match_fn = nullptr)
+      : m_match_type(match_type), m_match_fn(match_fn) {
     m_type.m_type_name = std::string(name);
   }
 
@@ -188,6 +191,8 @@ class TypeNameSpecifierImpl {
     return CompilerType();
   }
 
+  CxxTypeCandidateMatchFn *GetCxxMatchFunction() const { return m_match_fn; }
+
   lldb::FormatterMatchType GetMatchType() { return m_match_type; }
 
   bool IsRegex() { return m_match_type == lldb::eFormatterMatchRegex; }
@@ -200,6 +205,7 @@ class TypeNameSpecifierImpl {
     CompilerType m_compiler_type;
   };
   TypeOrName m_type;
+  CxxTypeCandidateMatchFn *m_match_fn = nullptr;
 
   TypeNameSpecifierImpl(const TypeNameSpecifierImpl &) = delete;
   const TypeNameSpecifierImpl &
diff --git a/lldb/include/lldb/DataFormatters/FormattersContainer.h b/lldb/include/lldb/DataFormatters/FormattersContainer.h
index f7465fc65538d..2cf33347b3c53 100644
--- a/lldb/include/lldb/DataFormatters/FormattersContainer.h
+++ b/lldb/include/lldb/DataFormatters/FormattersContainer.h
@@ -50,6 +50,11 @@ class TypeMatcher {
   /// - eFormatterMatchCallback: run the function in m_name to decide if a type
   ///   matches or not.
   lldb::FormatterMatchType m_match_type;
+  /// An optional additional check for a FormattersMatchCandidate.
+  ///
+  /// If set, the name/regex is still checked first and this function will be
+  /// called if the name matches.
+  CxxTypeCandidateMatchFn *m_match_fn = nullptr;
 
   // if the user tries to add formatters for, say, "struct Foo" those will not
   // match any type because of the way we strip qualifiers from typenames this
@@ -86,32 +91,19 @@ class TypeMatcher {
   /// name specifier.
   TypeMatcher(lldb::TypeNameSpecifierImplSP type_specifier)
       : m_name(type_specifier->GetName()),
-        m_match_type(type_specifier->GetMatchType()) {
+        m_match_type(type_specifier->GetMatchType()),
+        m_match_fn(type_specifier->GetCxxMatchFunction()) {
     if (m_match_type == lldb::eFormatterMatchRegex)
       m_type_name_regex = RegularExpression(type_specifier->GetName());
   }
 
   /// True iff this matches the given type.
   bool Matches(FormattersMatchCandidate candidate_type) const {
-    ConstString type_name = candidate_type.GetTypeName();
-    switch (m_match_type) {
-    case lldb::eFormatterMatchExact:
-      return m_name == type_name ||
-             StripTypeName(m_name) == StripTypeName(type_name);
-    case lldb::eFormatterMatchRegex:
-      return m_type_name_regex.Execute(type_name.GetStringRef());
-    case lldb::eFormatterMatchCallback:
-      // CommandObjectType{Synth,Filter}Add tries to prevent the user from
-      // creating both a synthetic child provider and a filter for the same type
-      // in the same category, but we don't have a type object at that point, so
-      // it creates a dummy candidate without type or script interpreter.
-      // Skip callback matching in these cases.
-      if (candidate_type.GetScriptInterpreter())
-        return candidate_type.GetScriptInterpreter()->FormatterCallbackFunction(
-            m_name.AsCString(),
-            std::make_shared<TypeImpl>(candidate_type.GetType()));
-    }
-    return false;
+    if (!MatchesName(candidate_type))
+      return false;
+    if (m_match_fn && !m_match_fn(candidate_type))
+      return false;
+    return true;
   }
 
   lldb::FormatterMatchType GetMatchType() const { return m_match_type; }
@@ -134,7 +126,31 @@ class TypeMatcher {
   /// (lldb) type summary add --summary-string \"A\" -x TypeName
   /// (lldb) type summary delete TypeName
   bool CreatedBySameMatchString(TypeMatcher other) const {
-    return GetMatchString() == other.GetMatchString();
+    return GetMatchString() == other.GetMatchString() &&
+           m_match_fn == other.m_match_fn;
+  }
+
+private:
+  bool MatchesName(FormattersMatchCandidate candidate_type) const {
+    ConstString type_name = candidate_type.GetTypeName();
+    switch (m_match_type) {
+    case lldb::eFormatterMatchExact:
+      return m_name == type_name ||
+             StripTypeName(m_name) == StripTypeName(type_name);
+    case lldb::eFormatterMatchRegex:
+      return m_type_name_regex.Execute(type_name.GetStringRef());
+    case lldb::eFormatterMatchCallback:
+      // CommandObjectType{Synth,Filter}Add tries to prevent the user from
+      // creating both a synthetic child provider and a filter for the same type
+      // in the same category, but we don't have a type object at that point, so
+      // it creates a dummy candidate without type or script interpreter.
+      // Skip callback matching in these cases.
+      if (candidate_type.GetScriptInterpreter())
+        return candidate_type.GetScriptInterpreter()->FormatterCallbackFunction(
+            m_name.AsCString(),
+            std::make_shared<TypeImpl>(candidate_type.GetType()));
+    }
+    return false;
   }
 };
 
diff --git a/lldb/unittests/DataFormatter/FormattersContainerTest.cpp b/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
index 41b01adfb9ecd..a8e9c1a1cccc0 100644
--- a/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
+++ b/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
@@ -165,3 +165,67 @@ TEST(TypeMatcherTests, CreatedBySameMatchStringExactNamePrefixes) {
     EXPECT_TRUE(without_prefix.CreatedBySameMatchString(without_prefix));
   }
 }
+
+TEST(TypeMatcherTests, CxxTypeCandidateMatchTest) {
+  auto alwaysTrue = +[](const FormattersMatchCandidate &) { return true; };
+  auto alwaysFalse = +[](const FormattersMatchCandidate &) { return false; };
+
+  TypeMatcher regex_matcher_true(std::make_shared<TypeNameSpecifierImpl>(
+      "^a[a-z]c$", eFormatterMatchRegex, alwaysTrue));
+  EXPECT_TRUE(regex_matcher_true.Matches(CandidateFromTypeName("abc")));
+  EXPECT_TRUE(regex_matcher_true.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(regex_matcher_true.Matches(CandidateFromTypeName("a")));
+
+  TypeMatcher regex_matcher_false(std::make_shared<TypeNameSpecifierImpl>(
+      "^a[a-z]c$", eFormatterMatchRegex, alwaysFalse));
+  EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("abc")));
+  EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("a")));
+
+  TypeMatcher string_matcher_true(std::make_shared<TypeNameSpecifierImpl>(
+      "abc", eFormatterMatchExact, alwaysTrue));
+  EXPECT_TRUE(string_matcher_true.Matches(CandidateFromTypeName("abc")));
+  EXPECT_FALSE(string_matcher_true.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(string_matcher_true.Matches(CandidateFromTypeName("a")));
+
+  TypeMatcher string_matcher_false(std::make_shared<TypeNameSpecifierImpl>(
+      "abc", eFormatterMatchExact, alwaysFalse));
+  EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("abc")));
+  EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("a")));
+
+  EXPECT_TRUE(regex_matcher_true.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_FALSE(
+      regex_matcher_true.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_FALSE(
+      regex_matcher_true.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_FALSE(
+      regex_matcher_true.CreatedBySameMatchString(string_matcher_false));
+
+  EXPECT_FALSE(
+      regex_matcher_false.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_TRUE(
+      regex_matcher_false.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_FALSE(
+      regex_matcher_false.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_FALSE(
+      regex_matcher_false.CreatedBySameMatchString(string_matcher_false));
+
+  EXPECT_FALSE(
+      string_matcher_true.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_FALSE(
+      string_matcher_true.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_TRUE(
+      string_matcher_true.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_FALSE(
+      string_matcher_true.CreatedBySameMatchString(string_matcher_false));
+
+  EXPECT_FALSE(
+      string_matcher_false.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_FALSE(
+      string_matcher_false.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_FALSE(
+      string_matcher_false.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_TRUE(
+      string_matcher_false.CreatedBySameMatchString(string_matcher_false));
+}

jimingham · 2025-06-11T20:13:05Z

We would very much like to support debug sessions with two targets on different systems. For instance, imagine debugging a Windows client on the host system, and a Linux server running on another machine. There's all sorts of opportunities to do coordinated debugging between server and client, so we really want to support this scenario.

But it shouldn't be that hard to support this. When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search. Running the validator should be much quicker than a full type search, so you still get a lot of the benefit of caching, but you won't use the cache when it isn't appropriate.

jimingham · 2025-06-11T20:17:38Z

You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO.

That way we could have both the STL version formatters in the cache but never choose the wrong one when dealing with heterogeneous targets.

jimingham · 2025-06-11T20:18:17Z

Another way to do this would be to make the data formatters held per target not per debugger. But that's a much bigger change.

Nerixyz · 2025-06-11T20:25:18Z

When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search.

I think for this, it's easier to save the validator per formatter/summary/synthetic, because we need the validator when retrieving from the cache, right? That shouldn't be a big issue, just a refactoring.

You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO.

I think in almost all cases, having one entry in the cache is better. Sure, there might be scenarios where you have different types with the same name, but realistically, when you're formatting you're formatting (e.g. at a breakpoint), you'll most likely have one type per name.

Nerixyz · 2025-06-11T21:46:30Z

It's still missing tests (and still has the wrong commit message), but I updated it to validate in the cache too. I also updated #143177 where it's implemented for std::string.

jimingham · 2025-06-11T22:26:16Z

When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search.

I think for this, it's easier to save the validator per formatter/summary/synthetic, because we need the validator when retrieving from the cache, right? That shouldn't be a big issue, just a refactoring.

You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO.

I think in almost all cases, having one entry in the cache is better. Sure, there might be scenarios where you have different types with the same name, but realistically, when you're formatting you're formatting (e.g. at a breakpoint), you'll most likely have one type per name.

If you have two Targets in lldb, and they each have a variant of some type that you've discriminated by a TypeValidator, then it's quite possible that Target A and Target B hit breakpoints in a location that contains a local of the given type in sequence. For instance, if I were trying to follow some RPC from client to server and back, I'd be stopping with the "client" view of the message first, then stopping on receipt with the "server" view of the message, etc.

So one {typename, validator} pair will get set in the cache when you first stop in the client. Then when you stop in the server, we'll answer that the type is not cached because the validator fails. I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching...

jimingham · 2025-06-11T22:27:44Z

Also, when you go to write tests be sure to write a test with this case. If there were a way to add the validator from Python you could do this pretty easily. Maybe you can do it a unit test w/o going that far?

Nerixyz · 2025-06-12T09:35:59Z

I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching...

Yes, if you're going back and forth, then the current caching won't help.

I'm not that familiar with all the use cases of LLDB. In my thinking, you'd break in the client, evaluate multiple variables and then break in the server where you'd also evaluate multiple variables. My assumption would be that some of these have the same type. In that case you'd get two misses but all other requests would be cached.

Zooming out a bit more: what's the concrete use-case where this would be a problem? As explained above, having two types with the same name but different layouts seems really rare. Moreover, you'd need to be debugging both at the same time to run into this. And even then, it's not a logic bug but a performance bug.

Maybe you can do it a unit test w/o going that far?

I'll try to do this.

labath · 2025-06-12T13:58:00Z

A low-tech way to support both STL versions would be to do a dispatch inside the data formatter:

StdStringSummaryProvider(value) {
  if (IsGnuSTL(value))
    return GnuStringSummaryProvider(value);
  return MSVCStringSummaryProvider(value);
}

It's not a complete solution as it doesn't allow for a distributed setup (vendor A writes a formatter for type Foo, vendor B writes a formatter for its type Foo, user C needs to debug both Foo types, so it installs both formatters), but for formatters that are a part of lldb (i.e. STL types), that should work fine. And maybe STL types are the only place where this matters?

jimingham · 2025-06-12T16:26:06Z

On Jun 12, 2025, at 2:36 AM, nerix ***@***.***> wrote: Nerixyz left a comment (llvm/llvm-project#143748) <#143748 (comment)> I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching... Yes, if you're going back and forth, then the current caching won't help. I'm not that familiar with all the use cases of LLDB. In my thinking, you'd break in the client, evaluate multiple variables and then break in the server where you'd also evaluate multiple variables. My assumption would be that some of these have the same type. In that case you'd get two misses but all other requests would be cached. Zooming out a bit more: what's the concrete use-case where this would be a problem? As explained above, having two types with the same name but different layouts seems really rare. Moreover, you'd need to be debugging both at the same time to run into this. And even then, it's not a logic bug but a performance bug.

I've described a use case a couple times already. If you have a client-server architecture, it would be really nice to be able to debug both the client and the server in the same lldb process, because then you can add tooling on top of lldb to do things like "step in across remote protocol dispatch". And in client-server systems it is often the case that the client and the server run on different OS-es. Jim

…

Maybe you can do it a unit test w/o going that far? I'll try to do this. — Reply to this email directly, view it on GitHub <#143748 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW6JZMPGUK4XCV5INZL3DFCZRAVCNFSM6AAAAAB7DDQ4KCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDSNRVHA3TSOJYGI>. You are receiving this because you are on a team that was mentioned.

Nerixyz · 2025-06-13T08:02:03Z

I've described a use case a couple times already. If you have a client-server architecture, it would be really nice to be able to debug both the client and the server in the same lldb process, because then you can add tooling on top of lldb to do things like "step in across remote protocol dispatch". And in client-server systems it is often the case that the client and the server run on different OS-es. Jim

Sorry if I sounded mean before, that wasn't my intention at all. I assumed you meant just using lldb-server, but with this explanation it makes sense why you'd debug both and now I can see why you run into this.

What do you think about caching at most two instances per kind (summary/snythetic/formatter) per entry? The two isn't from "client&server" but from having at most two different ones (gnu&msvc).

Nerixyz · 2025-06-13T08:10:06Z

A low-tech way to support both STL versions would be to do a dispatch inside the data formatter

For summaries this would work, but it Gerts more involved for synthetic children providers, where you'd need to check for the kind in every method.

Though now that I think about it, couldn't I inspect the type before creating the frontend for a synthetic provider and create the appropriate frontend?

labath · 2025-06-13T10:05:32Z

Though now that I think about it, couldn't I inspect the type before creating the frontend for a synthetic provider and create the appropriate frontend?

That is what I was imagining, yes.

Nerixyz · 2025-06-17T20:56:57Z

Thank you for the feedback!

Closing this, as it's not needed to allow different summary/synthetic providers for one type name.

Nerixyz requested a review from JDevlieghere as a code owner June 11, 2025 16:55

llvmbot added the lldb label Jun 11, 2025

[LLDB] Add optional callback function to TypeMatcher

fc36d22

Nerixyz force-pushed the feat/lldb-formatter-cxx-callback branch from d1a9078 to fc36d22 Compare June 11, 2025 21:45

Nerixyz closed this Jun 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LLDB] Add optional callback function to `TypeMatcher` #143748

[LLDB] Add optional callback function to `TypeMatcher` #143748

Uh oh!

Nerixyz commented Jun 11, 2025

Uh oh!

llvmbot commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

Nerixyz commented Jun 11, 2025

Uh oh!

Nerixyz commented Jun 11, 2025 •

edited

Loading

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

Nerixyz commented Jun 12, 2025

Uh oh!

labath commented Jun 12, 2025

Uh oh!

jimingham commented Jun 12, 2025 via email

Uh oh!

Nerixyz commented Jun 13, 2025

Uh oh!

Nerixyz commented Jun 13, 2025

Uh oh!

labath commented Jun 13, 2025

Uh oh!

Nerixyz commented Jun 17, 2025

Uh oh!

Uh oh!

[LLDB] Add optional callback function to TypeMatcher #143748

[LLDB] Add optional callback function to TypeMatcher #143748

Uh oh!

Conversation

Nerixyz commented Jun 11, 2025

Uh oh!

llvmbot commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

Nerixyz commented Jun 11, 2025

Uh oh!

Nerixyz commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

jimingham commented Jun 11, 2025

Uh oh!

Nerixyz commented Jun 12, 2025

Uh oh!

labath commented Jun 12, 2025

Uh oh!

jimingham commented Jun 12, 2025 via email

Uh oh!

Nerixyz commented Jun 13, 2025

Uh oh!

Nerixyz commented Jun 13, 2025

Uh oh!

labath commented Jun 13, 2025

Uh oh!

Nerixyz commented Jun 17, 2025

Uh oh!

Uh oh!

[LLDB] Add optional callback function to `TypeMatcher` #143748

[LLDB] Add optional callback function to `TypeMatcher` #143748

Nerixyz commented Jun 11, 2025 •

edited

Loading