-
Notifications
You must be signed in to change notification settings - Fork 25.4k
ES|QL: Fix Fork field reference tracking #131723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
9f24a18
594d76a
69fff05
b64ec91
666382c
15480b6
96951c7
8a98b45
bb7082d
a20fb5b
b159bd7
3b7f546
f9f26d4
917b273
cb412e6
c5564f0
7d67986
2479bfa
3ebe919
482fabe
300f169
6378f76
f337b02
82f189d
199392f
b7a7854
5dd7462
98aa887
7bc4a87
b017321
d14bf24
2931d83
82672be
94da34b
9606c04
3f55419
00a885a
4313222
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
pr: 131723 | ||
summary: Tests for FORK's evaluation of field names used in `field_caps` resolve calls | ||
area: Search | ||
type: bug | ||
issues: [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,6 +49,7 @@ | |
import java.util.HashSet; | ||
import java.util.List; | ||
import java.util.Set; | ||
import java.util.function.BiConsumer; | ||
import java.util.stream.Collectors; | ||
|
||
import static org.elasticsearch.xpack.esql.core.util.StringUtils.WILDCARD; | ||
|
@@ -76,11 +77,6 @@ public static PreAnalysisResult resolveFieldNames(LogicalPlan parsed, EnrichReso | |
return new PreAnalysisResult(enrichResolution, IndexResolver.ALL_FIELDS, Set.of()); | ||
} | ||
|
||
// TODO: Improve field resolution for FORK - right now we request all fields | ||
if (parsed.anyMatch(p -> p instanceof Fork)) { | ||
return new PreAnalysisResult(enrichResolution, IndexResolver.ALL_FIELDS, Set.of()); | ||
} | ||
|
||
Holder<Boolean> projectAll = new Holder<>(false); | ||
parsed.forEachExpressionDown(UnresolvedStar.class, us -> {// explicit "*" fields selection | ||
if (projectAll.get()) { | ||
|
@@ -93,7 +89,7 @@ public static PreAnalysisResult resolveFieldNames(LogicalPlan parsed, EnrichReso | |
return new PreAnalysisResult(enrichResolution, IndexResolver.ALL_FIELDS, Set.of()); | ||
} | ||
|
||
var referencesBuilder = AttributeSet.builder(); | ||
var referencesBuilder = new Holder<>(AttributeSet.builder()); | ||
// "keep" and "drop" attributes are special whenever a wildcard is used in their name, as the wildcard can cover some | ||
// attributes ("lookup join" generated columns among others); steps like removal of Aliases should ignore fields matching the | ||
// wildcards. | ||
|
@@ -110,19 +106,49 @@ public static PreAnalysisResult resolveFieldNames(LogicalPlan parsed, EnrichReso | |
// lookup indices where we request "*" because we may require all their fields | ||
Set<String> wildcardJoinIndices = new java.util.HashSet<>(); | ||
|
||
boolean[] canRemoveAliases = new boolean[] { true }; | ||
var canRemoveAliases = new Holder<>(true); | ||
|
||
var forEachDownProcessor = new Holder<BiConsumer<LogicalPlan, Holder<Boolean>>>(); | ||
forEachDownProcessor.set((LogicalPlan p, Holder<Boolean> breakEarly) -> {// go over each plan top-down | ||
if (p instanceof Fork fork) { | ||
// Early return from forEachDown. We will iterate over the children manually and end the recursion via forEachDown early. | ||
var forkRefsResult = AttributeSet.builder(); | ||
forkRefsResult.addAll(referencesBuilder.get()); | ||
|
||
for (var forkBranch : fork.children()) { | ||
referencesBuilder.set(AttributeSet.builder()); | ||
var isNestedFork = forkBranch.forEachDownMayReturnEarly(forEachDownProcessor.get()); | ||
// This assert is just for good measure. FORKs within FORKs is yet not supported. | ||
assert isNestedFork == false : "Nested FORKs are not yet supported"; | ||
// This is a safety measure for fork where the list of fields returned is empty. | ||
// It can be empty for a branch that does need all the fields. For example "fork (where true) (where a is not null)" | ||
// but it can also be empty for queries where NO fields are needed from ES, | ||
// for example "fork (eval x = 1 | keep x) (eval y = 1 | keep y)" but we cannot establish this yet. | ||
if (referencesBuilder.get().isEmpty()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the logic behind this check? Why would an empty There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See below on line 232:
This has the comment
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure that's correct. The logic at line 232 is about queries like this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree that the logic seems pretty confusing, but at the same time I believe the statement you make regarding the fields is only partially true. Consider the following query with match.
Consider the "stats" stage which looks like this
It has no referenced columns (p.references() is empty). Thus by the logic of the comment I mentioned above we should retain all columns. This information needs to be propagated back to the caller (the Fork). Thus if any of the Fork's children need all the fields, the we can break out early and just retain all fields. I can try to make this a bit clear in the following way. The lambda can return a pair <bool, ColumnSet>. The bool can indicate if we need to retain all columns. How does this sound? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not about the lambda here, but about the logic of the field name resolution for I have tested this query
and with the logic as Thank you for bringing up the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So I have added the query you have in your comment as a test, and it returns *. I'm not following if there is another change your are proposing or you simply have a question/observation about the existing (pre-fork) code? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not following this part. I'm not sure I understand how fork behaves differently here... If we end up with empty set for resolved fields, we will convert to _index, no? In that regard this behavior is the same as it should be, but maybe I'm missing something in your explanation. |
||
projectAll.set(true); | ||
// Return early, we'll be returning all references no matter what the remainder of the query is. | ||
breakEarly.set(true); | ||
return; | ||
} | ||
forkRefsResult.addAll(referencesBuilder.get()); | ||
} | ||
|
||
forkRefsResult.removeIf(attr -> attr.name().equals(Fork.FORK_FIELD)); | ||
referencesBuilder.set(forkRefsResult); | ||
|
||
parsed.forEachDown(p -> {// go over each plan top-down | ||
if (p instanceof RegexExtract re) { // for Grok and Dissect | ||
// Return early, we've already explored all fork branches. | ||
breakEarly.set(true); | ||
return; | ||
} else if (p instanceof RegexExtract re) { // for Grok and Dissect | ||
// keep the inputs needed by Grok/Dissect | ||
referencesBuilder.addAll(re.input().references()); | ||
referencesBuilder.get().addAll(re.input().references()); | ||
} else if (p instanceof Enrich enrich) { | ||
AttributeSet enrichFieldRefs = Expressions.references(enrich.enrichFields()); | ||
AttributeSet.Builder enrichRefs = enrichFieldRefs.combine(enrich.matchField().references()).asBuilder(); | ||
// Enrich adds an EmptyAttribute if no match field is specified | ||
// The exact name of the field will be added later as part of enrichPolicyMatchFields Set | ||
enrichRefs.removeIf(attr -> attr instanceof EmptyAttribute); | ||
referencesBuilder.addAll(enrichRefs); | ||
referencesBuilder.get().addAll(enrichRefs); | ||
} else if (p instanceof LookupJoin join) { | ||
if (join.config().type() instanceof JoinTypes.UsingJoinType usingJoinType) { | ||
joinRefs.addAll(usingJoinType.columns()); | ||
|
@@ -135,15 +161,15 @@ public static PreAnalysisResult resolveFieldNames(LogicalPlan parsed, EnrichReso | |
joinRefs.addAll(keepRefs); | ||
} | ||
} else { | ||
referencesBuilder.addAll(p.references()); | ||
referencesBuilder.get().addAll(p.references()); | ||
if (p instanceof UnresolvedRelation ur && ur.indexMode() == IndexMode.TIME_SERIES) { | ||
// METRICS aggs generally rely on @timestamp without the user having to mention it. | ||
referencesBuilder.add(new UnresolvedAttribute(ur.source(), MetadataAttribute.TIMESTAMP_FIELD)); | ||
referencesBuilder.get().add(new UnresolvedAttribute(ur.source(), MetadataAttribute.TIMESTAMP_FIELD)); | ||
} | ||
// special handling for UnresolvedPattern (which is not an UnresolvedAttribute) | ||
p.forEachExpression(UnresolvedNamePattern.class, up -> { | ||
var ua = new UnresolvedAttribute(up.source(), up.name()); | ||
referencesBuilder.add(ua); | ||
referencesBuilder.get().add(ua); | ||
if (p instanceof Keep) { | ||
keepRefs.add(ua); | ||
} else if (p instanceof Drop) { | ||
|
@@ -168,10 +194,10 @@ public static PreAnalysisResult resolveFieldNames(LogicalPlan parsed, EnrichReso | |
// | ||
// and ips_policy enriches the results with the same name ip field), | ||
// these aliases should be kept in the list of fields. | ||
if (canRemoveAliases[0] && p.anyMatch(FieldNameUtils::couldOverrideAliases)) { | ||
canRemoveAliases[0] = false; | ||
if (canRemoveAliases.get() && p.anyMatch(FieldNameUtils::couldOverrideAliases)) { | ||
canRemoveAliases.set(false); | ||
} | ||
if (canRemoveAliases[0]) { | ||
if (canRemoveAliases.get()) { | ||
// remove any already discovered UnresolvedAttributes that are in fact aliases defined later down in the tree | ||
// for example "from test | eval x = salary | stats max = max(x) by gender" | ||
// remove the UnresolvedAttribute "x", since that is an Alias defined in "eval" | ||
|
@@ -187,21 +213,25 @@ public static PreAnalysisResult resolveFieldNames(LogicalPlan parsed, EnrichReso | |
if (fieldNames.contains(ne.name())) { | ||
return; | ||
} | ||
referencesBuilder.removeIf( | ||
attr -> matchByName(attr, ne.name(), keepRefs.contains(attr) || dropWildcardRefs.contains(attr)) | ||
); | ||
referencesBuilder.get() | ||
.removeIf(attr -> matchByName(attr, ne.name(), keepRefs.contains(attr) || dropWildcardRefs.contains(attr))); | ||
}); | ||
} | ||
}); | ||
parsed.forEachDownMayReturnEarly(forEachDownProcessor.get()); | ||
|
||
if (projectAll.get()) { | ||
return new PreAnalysisResult(enrichResolution, IndexResolver.ALL_FIELDS, Set.of()); | ||
} | ||
|
||
// Add JOIN ON column references afterward to avoid Alias removal | ||
referencesBuilder.addAll(joinRefs); | ||
referencesBuilder.get().addAll(joinRefs); | ||
// If any JOIN commands need wildcard field-caps calls, persist the index names | ||
|
||
// remove valid metadata attributes because they will be filtered out by the IndexResolver anyway | ||
// otherwise, in some edge cases, we will fail to ask for "*" (all fields) instead | ||
referencesBuilder.removeIf(a -> a instanceof MetadataAttribute || MetadataAttribute.isSupported(a.name())); | ||
Set<String> fieldNames = referencesBuilder.build().names(); | ||
referencesBuilder.get().removeIf(a -> a instanceof MetadataAttribute || MetadataAttribute.isSupported(a.name())); | ||
Set<String> fieldNames = referencesBuilder.get().build().names(); | ||
|
||
if (fieldNames.isEmpty() && enrichPolicyMatchFields.isEmpty()) { | ||
// there cannot be an empty list of fields, we'll ask the simplest and lightest one instead: _index | ||
|
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quoting the previous comment from @astefan from here #131723 (comment) here since it became outdated.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback. To my understanding you are unhappy with the API I originally had which had a lambda returning true/false to indicate early exit. As you pointed out, existing code needs to be aware of the correct default value (true) to return if it does not with to return the traverse early. This is fair criticism.
I've made a small tweak below. Essentially instead of returning true/false, I pass a boolean holder. This holder needs to be updated to true only if the lambda needs the traverse to end early, by default no action is needed. Thus only the fork code is aware of the need to exit early and the non-fork code stays as is.
I think this should satisfy the criteria you mentioned. I'd be happy to iterate further as needed.