-
Notifications
You must be signed in to change notification settings - Fork 3.6k
[feature](search) add variant subcolumn suppport for search function #56718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-DS: Total hot run time: 189957 ms |
ClickBench: Total hot run time: 0.03 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for variant subcolumn access in search functions, enabling search queries to target specific JSON paths within variant columns using dot notation (e.g., field.subcolumn). The feature extends the search DSL to handle variant data types with subcolumn paths, allowing more granular search capabilities on semi-structured data.
Key changes:
- Extended search grammar to support dot notation for variant subcolumn paths
- Modified field binding structures to handle variant subcolumn metadata
- Updated query processing to create ElementAt expressions for variant subcolumns
- Enhanced backend search evaluation to gracefully handle missing variant subcolumns
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/variant_p0/test_variant_search_subcolumn.groovy | Test suite validating variant subcolumn search functionality |
| regression-test/data/variant_p0/test_variant_search_subcolumn.out | Expected test output for variant subcolumn search tests |
| gensrc/thrift/Exprs.thrift | Extended TSearchFieldBinding with variant subcolumn metadata fields |
| fe/fe-core/src/main/antlr4/org/apache/doris/nereids/search/SearchParser.g4 | Updated grammar to support field.subcolumn path syntax |
| fe/fe-core/src/main/antlr4/org/apache/doris/nereids/search/SearchLexer.g4 | Added DOT token for subcolumn path parsing |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/SearchDslParser.java | Modified to parse field paths with dot notation |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/SearchExpression.java | Updated to accept ElementAt expressions for variant subcolumns |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/RewriteSearchToSlots.java | Enhanced to create ElementAt expressions for variant subcolumn access |
| fe/fe-core/src/main/java/org/apache/doris/nereids/jobs/executor/Rewriter.java | Repositioned search rewriting before variant subpath pruning |
| fe/fe-core/src/main/java/org/apache/doris/analysis/SearchPredicate.java | Added variant subcolumn metadata to thrift parameter building |
| be/src/vec/functions/function_search.h | Extended FieldReaderResolver with variant subcolumn detection |
| be/src/vec/functions/function_search.cpp | Enhanced query building to handle missing variant subcolumns gracefully |
| be/src/vec/exprs/vsearch.h | Added getter for search parameters |
| be/src/vec/exprs/vsearch.cpp | Updated search input collection to handle variant subcolumns |
| be/src/olap/rowset/segment_v2/segment_iterator.cpp | Modified to enable search expression evaluation without column-level indexes |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| if (ctx.fieldName() == null) { | ||
| throw new RuntimeException("Invalid field query: missing field name"); | ||
| if (ctx.fieldPath() == null) { | ||
| throw new RuntimeException("Invalid field query: missing field path"); |
Copilot
AI
Oct 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message should be more descriptive and specify what constitutes a valid field path.
| throw new RuntimeException("Invalid field query: missing field path"); | |
| throw new RuntimeException("Invalid field query: missing field path. A valid field path consists of one or more segments (e.g., field, field.subfield), where each segment is an identifier or a quoted string if it contains special characters."); |
| if (iterators.empty() || data_type_with_names.empty()) { | ||
| LOG(INFO) << "No indexed columns or iterators available, returning empty result"; | ||
| LOG(INFO) << "No indexed columns or iterators available, returning empty result, dsl:" | ||
| << search_param.original_dsl; | ||
| bitmap_result = InvertedIndexResultBitmap(std::make_shared<roaring::Roaring>(), | ||
| std::make_shared<roaring::Roaring>()); |
Copilot
AI
Oct 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The creation of an empty InvertedIndexResultBitmap is duplicated in multiple places. Consider extracting this into a helper function to reduce code duplication.
| // Check if this is ElementAt expression (for variant subcolumn access) | ||
| if (child->expr_name() == "element_at" && child_index < field_bindings.size() && | ||
| field_bindings[child_index].__isset.is_variant_subcolumn && | ||
| field_bindings[child_index].is_variant_subcolumn) { |
Copilot
AI
Oct 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded string 'element_at' should be defined as a constant to avoid magic strings and improve maintainability.
| } | ||
|
|
||
| private Slot findSlotByName(String fieldName, LogicalOlapScan scan) { | ||
| // Direct match only - variant subcolumns are handled by caller |
Copilot
AI
Oct 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment should explain why variant subcolumns are handled by the caller and provide a brief explanation of the handling mechanism.
| // Direct match only - variant subcolumns are handled by caller | |
| /* | |
| * This method performs a direct match for slot names only and does not attempt to resolve variant subcolumns. | |
| * Variant subcolumns (e.g., fields within complex or nested types) require special parsing and resolution logic, | |
| * which is handled by the caller prior to invoking this method. The caller is responsible for extracting the | |
| * appropriate subcolumn name from the field path and ensuring that the correct slot or sub-slot is passed here | |
| * for direct matching. This separation keeps this method simple and focused on direct slot lookup. | |
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
…56718) ### What problem does this PR solve? Issue Number: close #xxx Related PR: #56139 Problem Summary: This PR adds support for variant subcolumn access in search functions, enabling search queries to target specific JSON paths within variant columns using dot notation (e.g., field.subcolumn). The feature extends the search DSL to handle variant data types with subcolumn paths, allowing more granular search capabilities on semi-structured data. ``` SELECT * FROM test_variant_search_subcolumn WHERE search('variantColumn.subcolumn:textMatched'); ```
…ch function #56718 (#57049) Cherry-picked from #56718 Co-authored-by: Jack <[email protected]>
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #56139
Problem Summary:
This PR adds support for variant subcolumn access in search functions, enabling search queries to target specific JSON paths within variant columns using dot notation (e.g., field.subcolumn). The feature extends the search DSL to handle variant data types with subcolumn paths, allowing more granular search capabilities on semi-structured data.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)