-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[LLDB] Add optional callback function to TypeMatcher
#143748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lldb Author: nerix (Nerixyz) ChangesTo be able to have multiple formatters that match on the same type name but for different types, this adds a callback that can be specified, which is called with the match candidate. As mentlerd pointed out, it's similar to passing The only use I currently see here is for the C++ STL, specifically libstdc++ and MSVC's STL. One potential issue of adding the callback here is that the summary/synthetic will be cached for a type name - disregarding the callback. The scenario here would be two shared libraries that use a different type with the same name and two formatters (e.g. they use two different standard libraries). Once the first type was formatted, its formatter is cached. Now, the second type will always be formatted with the wrong formatter (the cached one), even though the callback for the cached one wouldn't match. Full diff: https://github.com/llvm/llvm-project/pull/143748.diff 3 Files Affected:
diff --git a/lldb/include/lldb/DataFormatters/FormatClasses.h b/lldb/include/lldb/DataFormatters/FormatClasses.h
index 89d74649ecc64..16685594db663 100644
--- a/lldb/include/lldb/DataFormatters/FormatClasses.h
+++ b/lldb/include/lldb/DataFormatters/FormatClasses.h
@@ -149,13 +149,16 @@ class FormattersMatchData {
CandidateLanguagesVector m_candidate_languages;
};
+using CxxTypeCandidateMatchFn = bool(const FormattersMatchCandidate &);
+
class TypeNameSpecifierImpl {
public:
TypeNameSpecifierImpl() = default;
TypeNameSpecifierImpl(llvm::StringRef name,
- lldb::FormatterMatchType match_type)
- : m_match_type(match_type) {
+ lldb::FormatterMatchType match_type,
+ CxxTypeCandidateMatchFn *match_fn = nullptr)
+ : m_match_type(match_type), m_match_fn(match_fn) {
m_type.m_type_name = std::string(name);
}
@@ -188,6 +191,8 @@ class TypeNameSpecifierImpl {
return CompilerType();
}
+ CxxTypeCandidateMatchFn *GetCxxMatchFunction() const { return m_match_fn; }
+
lldb::FormatterMatchType GetMatchType() { return m_match_type; }
bool IsRegex() { return m_match_type == lldb::eFormatterMatchRegex; }
@@ -200,6 +205,7 @@ class TypeNameSpecifierImpl {
CompilerType m_compiler_type;
};
TypeOrName m_type;
+ CxxTypeCandidateMatchFn *m_match_fn = nullptr;
TypeNameSpecifierImpl(const TypeNameSpecifierImpl &) = delete;
const TypeNameSpecifierImpl &
diff --git a/lldb/include/lldb/DataFormatters/FormattersContainer.h b/lldb/include/lldb/DataFormatters/FormattersContainer.h
index f7465fc65538d..2cf33347b3c53 100644
--- a/lldb/include/lldb/DataFormatters/FormattersContainer.h
+++ b/lldb/include/lldb/DataFormatters/FormattersContainer.h
@@ -50,6 +50,11 @@ class TypeMatcher {
/// - eFormatterMatchCallback: run the function in m_name to decide if a type
/// matches or not.
lldb::FormatterMatchType m_match_type;
+ /// An optional additional check for a FormattersMatchCandidate.
+ ///
+ /// If set, the name/regex is still checked first and this function will be
+ /// called if the name matches.
+ CxxTypeCandidateMatchFn *m_match_fn = nullptr;
// if the user tries to add formatters for, say, "struct Foo" those will not
// match any type because of the way we strip qualifiers from typenames this
@@ -86,32 +91,19 @@ class TypeMatcher {
/// name specifier.
TypeMatcher(lldb::TypeNameSpecifierImplSP type_specifier)
: m_name(type_specifier->GetName()),
- m_match_type(type_specifier->GetMatchType()) {
+ m_match_type(type_specifier->GetMatchType()),
+ m_match_fn(type_specifier->GetCxxMatchFunction()) {
if (m_match_type == lldb::eFormatterMatchRegex)
m_type_name_regex = RegularExpression(type_specifier->GetName());
}
/// True iff this matches the given type.
bool Matches(FormattersMatchCandidate candidate_type) const {
- ConstString type_name = candidate_type.GetTypeName();
- switch (m_match_type) {
- case lldb::eFormatterMatchExact:
- return m_name == type_name ||
- StripTypeName(m_name) == StripTypeName(type_name);
- case lldb::eFormatterMatchRegex:
- return m_type_name_regex.Execute(type_name.GetStringRef());
- case lldb::eFormatterMatchCallback:
- // CommandObjectType{Synth,Filter}Add tries to prevent the user from
- // creating both a synthetic child provider and a filter for the same type
- // in the same category, but we don't have a type object at that point, so
- // it creates a dummy candidate without type or script interpreter.
- // Skip callback matching in these cases.
- if (candidate_type.GetScriptInterpreter())
- return candidate_type.GetScriptInterpreter()->FormatterCallbackFunction(
- m_name.AsCString(),
- std::make_shared<TypeImpl>(candidate_type.GetType()));
- }
- return false;
+ if (!MatchesName(candidate_type))
+ return false;
+ if (m_match_fn && !m_match_fn(candidate_type))
+ return false;
+ return true;
}
lldb::FormatterMatchType GetMatchType() const { return m_match_type; }
@@ -134,7 +126,31 @@ class TypeMatcher {
/// (lldb) type summary add --summary-string \"A\" -x TypeName
/// (lldb) type summary delete TypeName
bool CreatedBySameMatchString(TypeMatcher other) const {
- return GetMatchString() == other.GetMatchString();
+ return GetMatchString() == other.GetMatchString() &&
+ m_match_fn == other.m_match_fn;
+ }
+
+private:
+ bool MatchesName(FormattersMatchCandidate candidate_type) const {
+ ConstString type_name = candidate_type.GetTypeName();
+ switch (m_match_type) {
+ case lldb::eFormatterMatchExact:
+ return m_name == type_name ||
+ StripTypeName(m_name) == StripTypeName(type_name);
+ case lldb::eFormatterMatchRegex:
+ return m_type_name_regex.Execute(type_name.GetStringRef());
+ case lldb::eFormatterMatchCallback:
+ // CommandObjectType{Synth,Filter}Add tries to prevent the user from
+ // creating both a synthetic child provider and a filter for the same type
+ // in the same category, but we don't have a type object at that point, so
+ // it creates a dummy candidate without type or script interpreter.
+ // Skip callback matching in these cases.
+ if (candidate_type.GetScriptInterpreter())
+ return candidate_type.GetScriptInterpreter()->FormatterCallbackFunction(
+ m_name.AsCString(),
+ std::make_shared<TypeImpl>(candidate_type.GetType()));
+ }
+ return false;
}
};
diff --git a/lldb/unittests/DataFormatter/FormattersContainerTest.cpp b/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
index 41b01adfb9ecd..a8e9c1a1cccc0 100644
--- a/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
+++ b/lldb/unittests/DataFormatter/FormattersContainerTest.cpp
@@ -165,3 +165,67 @@ TEST(TypeMatcherTests, CreatedBySameMatchStringExactNamePrefixes) {
EXPECT_TRUE(without_prefix.CreatedBySameMatchString(without_prefix));
}
}
+
+TEST(TypeMatcherTests, CxxTypeCandidateMatchTest) {
+ auto alwaysTrue = +[](const FormattersMatchCandidate &) { return true; };
+ auto alwaysFalse = +[](const FormattersMatchCandidate &) { return false; };
+
+ TypeMatcher regex_matcher_true(std::make_shared<TypeNameSpecifierImpl>(
+ "^a[a-z]c$", eFormatterMatchRegex, alwaysTrue));
+ EXPECT_TRUE(regex_matcher_true.Matches(CandidateFromTypeName("abc")));
+ EXPECT_TRUE(regex_matcher_true.Matches(CandidateFromTypeName("azc")));
+ EXPECT_FALSE(regex_matcher_true.Matches(CandidateFromTypeName("a")));
+
+ TypeMatcher regex_matcher_false(std::make_shared<TypeNameSpecifierImpl>(
+ "^a[a-z]c$", eFormatterMatchRegex, alwaysFalse));
+ EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("abc")));
+ EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("azc")));
+ EXPECT_FALSE(regex_matcher_false.Matches(CandidateFromTypeName("a")));
+
+ TypeMatcher string_matcher_true(std::make_shared<TypeNameSpecifierImpl>(
+ "abc", eFormatterMatchExact, alwaysTrue));
+ EXPECT_TRUE(string_matcher_true.Matches(CandidateFromTypeName("abc")));
+ EXPECT_FALSE(string_matcher_true.Matches(CandidateFromTypeName("azc")));
+ EXPECT_FALSE(string_matcher_true.Matches(CandidateFromTypeName("a")));
+
+ TypeMatcher string_matcher_false(std::make_shared<TypeNameSpecifierImpl>(
+ "abc", eFormatterMatchExact, alwaysFalse));
+ EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("abc")));
+ EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("azc")));
+ EXPECT_FALSE(string_matcher_false.Matches(CandidateFromTypeName("a")));
+
+ EXPECT_TRUE(regex_matcher_true.CreatedBySameMatchString(regex_matcher_true));
+ EXPECT_FALSE(
+ regex_matcher_true.CreatedBySameMatchString(regex_matcher_false));
+ EXPECT_FALSE(
+ regex_matcher_true.CreatedBySameMatchString(string_matcher_true));
+ EXPECT_FALSE(
+ regex_matcher_true.CreatedBySameMatchString(string_matcher_false));
+
+ EXPECT_FALSE(
+ regex_matcher_false.CreatedBySameMatchString(regex_matcher_true));
+ EXPECT_TRUE(
+ regex_matcher_false.CreatedBySameMatchString(regex_matcher_false));
+ EXPECT_FALSE(
+ regex_matcher_false.CreatedBySameMatchString(string_matcher_true));
+ EXPECT_FALSE(
+ regex_matcher_false.CreatedBySameMatchString(string_matcher_false));
+
+ EXPECT_FALSE(
+ string_matcher_true.CreatedBySameMatchString(regex_matcher_true));
+ EXPECT_FALSE(
+ string_matcher_true.CreatedBySameMatchString(regex_matcher_false));
+ EXPECT_TRUE(
+ string_matcher_true.CreatedBySameMatchString(string_matcher_true));
+ EXPECT_FALSE(
+ string_matcher_true.CreatedBySameMatchString(string_matcher_false));
+
+ EXPECT_FALSE(
+ string_matcher_false.CreatedBySameMatchString(regex_matcher_true));
+ EXPECT_FALSE(
+ string_matcher_false.CreatedBySameMatchString(regex_matcher_false));
+ EXPECT_FALSE(
+ string_matcher_false.CreatedBySameMatchString(string_matcher_true));
+ EXPECT_TRUE(
+ string_matcher_false.CreatedBySameMatchString(string_matcher_false));
+}
|
We would very much like to support debug sessions with two targets on different systems. For instance, imagine debugging a Windows client on the host system, and a Linux server running on another machine. There's all sorts of opportunities to do coordinated debugging between server and client, so we really want to support this scenario. But it shouldn't be that hard to support this. When you are about to use a cached formatter, just run the validator function from that formatter on the candidate VO. If it doesn't pass the validator, fall back to the full search. Running the validator should be much quicker than a full type search, so you still get a lot of the benefit of caching, but you won't use the cache when it isn't appropriate. |
You could even make the cache hold the "match string" and a LIST of matching formatters. If a VO passes the typename string match, then we run through the list of matching formatters and choose the first one whose validator accepts this VO. That way we could have both the STL version formatters in the cache but never choose the wrong one when dealing with heterogeneous targets. |
Another way to do this would be to make the data formatters held per target not per debugger. But that's a much bigger change. |
I think for this, it's easier to save the validator per formatter/summary/synthetic, because we need the validator when retrieving from the cache, right? That shouldn't be a big issue, just a refactoring.
I think in almost all cases, having one entry in the cache is better. Sure, there might be scenarios where you have different types with the same name, but realistically, when you're formatting you're formatting (e.g. at a breakpoint), you'll most likely have one type per name. |
d1a9078
to
fc36d22
Compare
It's still missing tests (and still has the wrong commit message), but I updated it to validate in the cache too. I also updated #143177 where it's implemented for std::string. |
If you have two Targets in lldb, and they each have a variant of some type that you've discriminated by a TypeValidator, then it's quite possible that Target A and Target B hit breakpoints in a location that contains a local of the given type in sequence. For instance, if I were trying to follow some RPC from client to server and back, I'd be stopping with the "client" view of the message first, then stopping on receipt with the "server" view of the message, etc. So one {typename, validator} pair will get set in the cache when you first stop in the client. Then when you stop in the server, we'll answer that the type is not cached because the validator fails. I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching... |
Also, when you go to write tests be sure to write a test with this case. If there were a way to add the validator from Python you could do this pretty easily. Maybe you can do it a unit test w/o going that far? |
Yes, if you're going back and forth, then the current caching won't help. I'm not that familiar with all the use cases of LLDB. In my thinking, you'd break in the client, evaluate multiple variables and then break in the server where you'd also evaluate multiple variables. My assumption would be that some of these have the same type. In that case you'd get two misses but all other requests would be cached. Zooming out a bit more: what's the concrete use-case where this would be a problem? As explained above, having two types with the same name but different layouts seems really rare. Moreover, you'd need to be debugging both at the same time to run into this. And even then, it's not a logic bug but a performance bug.
I'll try to do this. |
A low-tech way to support both STL versions would be to do a dispatch inside the data formatter:
It's not a complete solution as it doesn't allow for a distributed setup (vendor A writes a formatter for type |
On Jun 12, 2025, at 2:36 AM, nerix ***@***.***> wrote:
Nerixyz
left a comment
(llvm/llvm-project#143748)
<#143748 (comment)>
I think we'll then cache the {typename, second_validator} so at that point we're getting no benefit from caching...
Yes, if you're going back and forth, then the current caching won't help.
I'm not that familiar with all the use cases of LLDB. In my thinking, you'd break in the client, evaluate multiple variables and then break in the server where you'd also evaluate multiple variables. My assumption would be that some of these have the same type. In that case you'd get two misses but all other requests would be cached.
Zooming out a bit more: what's the concrete use-case where this would be a problem? As explained above, having two types with the same name but different layouts seems really rare. Moreover, you'd need to be debugging both at the same time to run into this. And even then, it's not a logic bug but a performance bug.
I've described a use case a couple times already. If you have a client-server architecture, it would be really nice to be able to debug both the client and the server in the same lldb process, because then you can add tooling on top of lldb to do things like "step in across remote protocol dispatch". And in client-server systems it is often the case that the client and the server run on different OS-es.
Jim
… Maybe you can do it a unit test w/o going that far?
I'll try to do this.
—
Reply to this email directly, view it on GitHub <#143748 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW6JZMPGUK4XCV5INZL3DFCZRAVCNFSM6AAAAAB7DDQ4KCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDSNRVHA3TSOJYGI>.
You are receiving this because you are on a team that was mentioned.
|
Sorry if I sounded mean before, that wasn't my intention at all. I assumed you meant just using lldb-server, but with this explanation it makes sense why you'd debug both and now I can see why you run into this. What do you think about caching at most two instances per kind (summary/snythetic/formatter) per entry? The two isn't from "client&server" but from having at most two different ones (gnu&msvc). |
For summaries this would work, but it Gerts more involved for synthetic children providers, where you'd need to check for the kind in every method. Though now that I think about it, couldn't I inspect the type before creating the frontend for a synthetic provider and create the appropriate frontend? |
That is what I was imagining, yes. |
Thank you for the feedback! Closing this, as it's not needed to allow different summary/synthetic providers for one type name. |
To be able to have multiple formatters that match on the same type name but for different types, this adds a callback that can be specified, which is called with the match candidate. As mentlerd pointed out, it's similar to passing
--recognizer-function
. This isn't implemented as another match type, because theTypeMatcher
still needs to have some name forGetMatchString
.The only use I currently see here is for the C++ STL, specifically libstdc++ and MSVC's STL.
In #143177, I used this to add formatters for
std::string
and friends from MSVC's STL.One potential issue of adding the callback here is that the summary/synthetic will be cached for a type name - disregarding the callback. The scenario here would be two shared libraries that use a different type with the same name and two formatters (e.g. they use two different standard libraries). Once the first type was formatted, its formatter is cached. Now, the second type will always be formatted with the wrong formatter (the cached one), even though the callback for the cached one wouldn't match.
For one,
--recognizer-function
has the same issue and secondly, I don't think this is really realistic.