Skip to content

[LLDB] Add optional callback function to TypeMatcher #143748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Nerixyz
Copy link
Contributor

@Nerixyz Nerixyz commented Jun 11, 2025

To be able to have multiple formatters that match on the same type name but for different types, this adds a callback that can be specified, which is called with the match candidate. As mentlerd pointed out, it's similar to passing --recognizer-function. This isn't implemented as another match type, because the TypeMatcher still needs to have some name for GetMatchString.

The only use I currently see here is for the C++ STL, specifically libstdc++ and MSVC's STL.
In #143177, I used this to add formatters for std::string and friends from MSVC's STL.

One potential issue of adding the callback here is that the summary/synthetic will be cached for a type name - disregarding the callback. The scenario here would be two shared libraries that use a different type with the same name and two formatters (e.g. they use two different standard libraries). Once the first type was formatted, its formatter is cached. Now, the second type will always be formatted with the wrong formatter (the cached one), even though the callback for the cached one wouldn't match.
For one, --recognizer-function has the same issue and secondly, I don't think this is really realistic.

@Nerixyz Nerixyz requested a review from JDevlieghere as a code owner June 11, 2025 16:55
@llvmbot llvmbot added the lldb label Jun 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-lldb

Author: nerix (Nerixyz)

Changes

To be able to have multiple formatters that match on the same type name but for different types, this adds a callback that can be specified, which is called with the match candidate. As mentlerd pointed out, it's similar to passing --recognizer-function. This isn't implemented as another match type, because the TypeMatcher still needs to have some name for GetMatchString.

The only use I currently see here is for the C++ STL, specifically libstdc++ and MSVC's STL.
In #143177, I used this to add formatters for std::string and friends from MSVC's STL.

One potential issue of adding the callback here is that the summary/synthetic will be cached for a type name - disregarding the callback. The scenario here would be two shared libraries that use a different type with the same name and two formatters (e.g. they use two different standard libraries). Once the first type was formatted, its formatter is cached. Now, the second type will always be formatted with the wrong formatter (the cached one), even though the callback for the cached one wouldn't match.
For one, --recognizer-function has the same issue and secondly, I don't think this is really realistic.


Full diff: https://github.com/llvm/llvm-project/pull/143748.diff

3 Files Affected:

  • (modified) lldb/include/lldb/DataFormatters/FormatClasses.h (+8-2)
  • (modified) lldb/include/lldb/DataFormatters/FormattersContainer.h (+37-21)
  • (modified) lldb/unittests/DataFormatter/FormattersContainerTest.cpp (+64)
diff --git a/lldb/include/lldb/DataFormatters/FormatClasses.h b/lldb/include/lldb/DataFormatters/FormatClasses.h
index 89d74649ecc64..16685594db663 100644
--- a/lldb/include/lldb/DataFormatters/FormatClasses.h
+++ b/lldb/include/lldb/DataFormatters/FormatClasses.h
@@ -149,13 +149,16 @@ class FormattersMatchData {
   CandidateLanguagesVector m_candidate_languages;
 };
 
+using CxxTypeCandidateMatchFn = bool(const FormattersMatchCandidate &);
+
 class TypeNameSpecifierImpl {
 public:
   TypeNameSpecifierImpl() = default;
 
   TypeNameSpecifierImpl(llvm::StringRef name,
-                        lldb::FormatterMatchType match_type)
-      : m_match_type(match_type) {
+                        lldb::FormatterMatchType match_type,
+                        CxxTypeCandidateMatchFn *match_fn = nullptr)
+      : m_match_type(match_type), m_match_fn(match_fn) {
     m_type.m_type_name = std::string(name);
   }
 
@@ -188,6 +191,8 @@ class TypeNameSpecifierImpl {
     return CompilerType();
   }
 
+  CxxTypeCandidateMatchFn *GetCxxMatchFunction() const { return m_match_fn; }
+
   lldb::FormatterMatchType GetMatchType() { return m_match_type; }
 
   bool IsRegex() { return m_match_type == lldb::eFormatterMatchRegex; }
@@ -200,6 +205,7 @@ class TypeNameSpecifierImpl {
     CompilerType m_compiler_type;
   };
   TypeOrName m_type;
+  CxxTypeCandidateMatchFn *m_match_fn = nullptr;
 
   TypeNameSpecifierImpl(const TypeNameSpecifierImpl &) = delete;
   const TypeNameSpecifierImpl &
diff --git a/lldb/include/lldb/DataFormatters/FormattersContainer.h b/lldb/include/lldb/DataFormatters/FormattersContainer.h
index f7465fc65538d..2cf33347b3c53 100644
--- a/lldb/include/lldb/DataFormatters/FormattersContainer.h
+++ b/lldb/include/lldb/DataFormatters/FormattersContainer.h
@@ -50,6 +50,11 @@ class TypeMatcher {
   /// - eFormatterMatchCallback: run the function in m_name to decide if a type
   ///   matches or not.
   lldb::FormatterMatchType m_match_type;
+  /// An optional additional check for a FormattersMatchCandidate.
+  ///
+  /// If set, the name/regex is still checked first and this function will be
+  /// called if the name matches.
+  CxxTypeCandidateMatchFn *m_match_fn = nullptr;
 
   // if the user tries to add formatters for, say, "struct Foo" those will not
   // match any type because of the way we strip qualifiers from typenames this
@@ -86,32 +91,19 @@ class TypeMatcher {
   /// name specifier.
   TypeMatcher(lldb::TypeNameSpecifierImplSP type_specifier)
       : m_name(type_specifier->GetName()),
-        m_match_type(type_specifier->GetMatchType()) {
+        m_match_type(type_specifier->GetMatchType()),
+        m_match_fn(type_specifier->GetCxxMatchFunction()) {
     if (m_match_type == lldb::eFormatterMatchRegex)
       m_type_name_regex = RegularExpression(type_specifier->GetName());
   }
 
   /// True iff this matches the given type.
   bool Matches(FormattersMatchCandidate candidate_type) const {
-    ConstString type_name = candidate_type.GetTypeName();
-    switch (m_match_type) {
-    case lldb::eFormatterMatchExact:
-      return m_name == type_name ||
-             StripTypeName(m_name) == StripTypeName(type_name);
-    case lldb::eFormatterMatchRegex:
-      return m_type_name_regex.Execute(type_name.GetStringRef());
-    case lldb::eFormatterMatchCallback:
-      // CommandObjectType{Synth,Filter}Add tries to prevent the user from
-      // creating both a synthetic child provider and a filter for the same type
-      // in the same category, but we don't have a type object at that point, so
-      // it creates a dummy candidate without type or script interpreter.
-      // Skip callback matching in these cases.
-      if (candidate_type.GetScriptInterpreter())
-        return candidate_type.GetScriptInterpreter()->FormatterCallbackFunction(
-            m_name.AsCString(),
-            std::make_shared<TypeImpl>(candidate_type.GetType()));
-    }
-    return false;
+    if (!MatchesName(candidate_type))
+      return false;
+    if (m_match_fn && !m_match_fn(candidate_type))
+      return false;
+    return true;
   }
 
   lldb::FormatterMatchType GetMatchType() const { return m_match_type; }
@@ -134,7 +126,31 @@ class TypeMatcher {
   /// (lldb) type summary add --summary-string \"A\" -x TypeName
   /// (lldb) type summary delete TypeName
   bool CreatedBySameMatchString(TypeMatcher other) const {
-    return GetMatchString() == other.GetMatchString();
+    return GetMatchString() == other.GetMatchString() &&
+           m_match_fn == other.m_match_fn;
+  }
+
+private:
+  bool MatchesName(FormattersMatchCandidate candidate_type) const {
+    ConstString type_name = candidate_type.GetTypeName();
+    switch (m_match_type) {
+    case lldb::eFormatterMatchExact:
+      return m_name == type_name ||
+             StripTypeName(m_name) == StripTypeName(type_name);
+    case lldb::eFormatterMatchRegex:
+      return m_type_name_regex.Execute(type_name.GetStringRef());
+    case lldb::eFormatterMatchCallback:
+      // CommandObjectType{Synth,Filter}Add tries to prevent the user from
+      // creating both a synthetic child provider and a filter for the same type
+      // in the same category, but we don't have a type object at that point, so
+      // it creates a dummy candidate without type or script interpreter.
+      // Skip callback matching in these cases.
+      if (candidate_type.GetScriptInterpreter())
+        return candidate_type.GetScriptInterpreter()->FormatterCallbackFunction(
+            m_name.AsCString(),
+            std::make_shared<TypeImpl>(candidate_type.GetType()));
+    }
+    return false;
   }
 };
 
diff --git a/lldb/unittests/DataFormatter/FormattersContainerTest.cpp b/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
index 41b01adfb9ecd..a8e9c1a1cccc0 100644
--- a/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
+++ b/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
@@ -165,3 +165,67 @@ TEST(TypeMatcherTests, CreatedBySameMatchStringExactNamePrefixes) {
     EXPECT_TRUE(without_prefix.CreatedBySameMatchString(without_prefix));
   }
 }
+
+TEST(TypeMatcherTests, CxxTypeCandidateMatchTest) {
+  auto alwaysTrue = +[](const FormattersMatchCandidate &) { return true; };
+  auto alwaysFalse = +[](const FormattersMatchCandidate &) { return false; };
+
+  TypeMatcher regex_matcher_true(std::make_shared<TypeNameSpecifierImpl>(
+      "^a[a-z]c$", eFormatterMatchRegex, alwaysTrue));
+  EXPECT_TRUE(regex_matcher_true.Matches(CandidateFromTypeName("abc")));
+  EXPECT_TRUE(regex_matcher_true.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(regex_matcher_true.Matches(CandidateFromTypeName("a")));
+
+  TypeMatcher regex_matcher_false(std::make_shared<TypeNameSpecifierImpl>(
+      "^a[a-z]c$", eFormatterMatchRegex, alwaysFalse));
+  EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("abc")));
+  EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("a")));
+
+  TypeMatcher string_matcher_true(std::make_shared<TypeNameSpecifierImpl>(
+      "abc", eFormatterMatchExact, alwaysTrue));
+  EXPECT_TRUE(string_matcher_true.Matches(CandidateFromTypeName("abc")));
+  EXPECT_FALSE(string_matcher_true.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(string_matcher_true.Matches(CandidateFromTypeName("a")));
+
+  TypeMatcher string_matcher_false(std::make_shared<TypeNameSpecifierImpl>(
+      "abc", eFormatterMatchExact, alwaysFalse));
+  EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("abc")));
+  EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("azc")));
+  EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("a")));
+
+  EXPECT_TRUE(regex_matcher_true.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_FALSE(
+      regex_matcher_true.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_FALSE(
+      regex_matcher_true.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_FALSE(
+      regex_matcher_true.CreatedBySameMatchString(string_matcher_false));
+
+  EXPECT_FALSE(
+      regex_matcher_false.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_TRUE(
+      regex_matcher_false.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_FALSE(
+      regex_matcher_false.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_FALSE(
+      regex_matcher_false.CreatedBySameMatchString(string_matcher_false));
+
+  EXPECT_FALSE(
+      string_matcher_true.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_FALSE(
+      string_matcher_true.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_TRUE(
+      string_matcher_true.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_FALSE(
+      string_matcher_true.CreatedBySameMatchString(string_matcher_false));
+
+  EXPECT_FALSE(
+      string_matcher_false.CreatedBySameMatchString(regex_matcher_true));
+  EXPECT_FALSE(
+      string_matcher_false.CreatedBySameMatchString(regex_matcher_false));
+  EXPECT_FALSE(
+      string_matcher_false.CreatedBySameMatchString(string_matcher_true));
+  EXPECT_TRUE(
+      string_matcher_false.CreatedBySameMatchString(string_matcher_false));
+}

@jimingham
Copy link
Collaborator

We would very much like to support debug sessions with two targets on different systems. For instance, imagine debugging a Windows client on the host system, and a Linux server running on another machine. There's all sorts of opportunities to do coordinated debugging between server and client, so we really want to support this scenario.

But it shouldn't be that hard to support this. When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search. Running the validator should be much quicker than a full type search, so you still get a lot of the benefit of caching, but you won't use the cache when it isn't appropriate.

@jimingham
Copy link
Collaborator

You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO.

That way we could have both the STL version formatters in the cache but never choose the wrong one when dealing with heterogeneous targets.

@jimingham
Copy link
Collaborator

Another way to do this would be to make the data formatters held per target not per debugger. But that's a much bigger change.

@Nerixyz
Copy link
Contributor Author

Nerixyz commented Jun 11, 2025

When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search.

I think for this, it's easier to save the validator per formatter/summary/synthetic, because we need the validator when retrieving from the cache, right? That shouldn't be a big issue, just a refactoring.

You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO.

I think in almost all cases, having one entry in the cache is better. Sure, there might be scenarios where you have different types with the same name, but realistically, when you're formatting you're formatting (e.g. at a breakpoint), you'll most likely have one type per name.

@Nerixyz Nerixyz force-pushed the feat/lldb-formatter-cxx-callback branch from d1a9078 to fc36d22 Compare June 11, 2025 21:45
@Nerixyz
Copy link
Contributor Author

Nerixyz commented Jun 11, 2025

It's still missing tests (and still has the wrong commit message), but I updated it to validate in the cache too. I also updated #143177 where it's implemented for std::string.

@jimingham
Copy link
Collaborator

When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search.

I think for this, it's easier to save the validator per formatter/summary/synthetic, because we need the validator when retrieving from the cache, right? That shouldn't be a big issue, just a refactoring.

You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO.

I think in almost all cases, having one entry in the cache is better. Sure, there might be scenarios where you have different types with the same name, but realistically, when you're formatting you're formatting (e.g. at a breakpoint), you'll most likely have one type per name.

If you have two Targets in lldb, and they each have a variant of some type that you've discriminated by a TypeValidator, then it's quite possible that Target A and Target B hit breakpoints in a location that contains a local of the given type in sequence. For instance, if I were trying to follow some RPC from client to server and back, I'd be stopping with the "client" view of the message first, then stopping on receipt with the "server" view of the message, etc.

So one {typename, validator} pair will get set in the cache when you first stop in the client. Then when you stop in the server, we'll answer that the type is not cached because the validator fails. I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching...

@jimingham
Copy link
Collaborator

Also, when you go to write tests be sure to write a test with this case. If there were a way to add the validator from Python you could do this pretty easily. Maybe you can do it a unit test w/o going that far?

@Nerixyz
Copy link
Contributor Author

Nerixyz commented Jun 12, 2025

I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching...

Yes, if you're going back and forth, then the current caching won't help.

I'm not that familiar with all the use cases of LLDB. In my thinking, you'd break in the client, evaluate multiple variables and then break in the server where you'd also evaluate multiple variables. My assumption would be that some of these have the same type. In that case you'd get two misses but all other requests would be cached.

Zooming out a bit more: what's the concrete use-case where this would be a problem? As explained above, having two types with the same name but different layouts seems really rare. Moreover, you'd need to be debugging both at the same time to run into this. And even then, it's not a logic bug but a performance bug.

Maybe you can do it a unit test w/o going that far?

I'll try to do this.

@labath
Copy link
Collaborator

labath commented Jun 12, 2025

A low-tech way to support both STL versions would be to do a dispatch inside the data formatter:

StdStringSummaryProvider(value) {
  if (IsGnuSTL(value))
    return GnuStringSummaryProvider(value);
  return MSVCStringSummaryProvider(value);
}

It's not a complete solution as it doesn't allow for a distributed setup (vendor A writes a formatter for type Foo, vendor B writes a formatter for its type Foo, user C needs to debug both Foo types, so it installs both formatters), but for formatters that are a part of lldb (i.e. STL types), that should work fine. And maybe STL types are the only place where this matters?

@jimingham
Copy link
Collaborator

jimingham commented Jun 12, 2025 via email

@Nerixyz
Copy link
Contributor Author

Nerixyz commented Jun 13, 2025

I've described a use case a couple times already. If you have a client-server architecture, it would be really nice to be able to debug both the client and the server in the same lldb process, because then you can add tooling on top of lldb to do things like "step in across remote protocol dispatch". And in client-server systems it is often the case that the client and the server run on different OS-es. Jim

Sorry if I sounded mean before, that wasn't my intention at all. I assumed you meant just using lldb-server, but with this explanation it makes sense why you'd debug both and now I can see why you run into this.

What do you think about caching at most two instances per kind (summary/snythetic/formatter) per entry? The two isn't from "client&server" but from having at most two different ones (gnu&msvc).

@Nerixyz
Copy link
Contributor Author

Nerixyz commented Jun 13, 2025

A low-tech way to support both STL versions would be to do a dispatch inside the data formatter

For summaries this would work, but it Gerts more involved for synthetic children providers, where you'd need to check for the kind in every method.

Though now that I think about it, couldn't I inspect the type before creating the frontend for a synthetic provider and create the appropriate frontend?

@labath
Copy link
Collaborator

labath commented Jun 13, 2025

Though now that I think about it, couldn't I inspect the type before creating the frontend for a synthetic provider and create the appropriate frontend?

That is what I was imagining, yes.

@Nerixyz
Copy link
Contributor Author

Nerixyz commented Jun 17, 2025

Thank you for the feedback!

Closing this, as it's not needed to allow different summary/synthetic providers for one type name.

@Nerixyz Nerixyz closed this Jun 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants