@airborne12
Member

@airborne12 airborne12 commented Oct 8, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #56139

Problem Summary:
This PR adds support for variant subcolumn access in search functions, enabling search queries to target specific JSON paths within variant columns using dot notation (e.g., field.subcolumn). The feature extends the search DSL to handle variant data types with subcolumn paths, allowing more granular search capabilities on semi-structured data.

SELECT * FROM test_variant_search_subcolumn WHERE search('variantColumn.subcolumn:textMatched');
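
To make the dot-notation syntax concrete, here is a minimal C++ sketch of how a DSL string such as `variantColumn.subcolumn:textMatched` could be split into a root column, a subcolumn path, and a search term. This is not the parser from this PR (which is generated from the ANTLR grammar on the FE side); `FieldQuery` and `split_field_query` are hypothetical names, and the sketch assumes the first `:` ends the field path and the first `.` ends the root column name. Quoted segments are not handled.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch; not a type from the PR.
struct FieldQuery {
    std::string root_column;     // e.g. "variantColumn"
    std::string subcolumn_path;  // e.g. "subcolumn"; empty for a plain column
    std::string term;            // e.g. "textMatched"
};

// Splits "root.sub.path:term" into its parts. Returns std::nullopt when the
// field path or the term is missing, or when the path starts or ends with '.'.
// Assumption: no quoted segments and no escaping.
std::optional<FieldQuery> split_field_query(std::string_view dsl) {
    const auto colon = dsl.find(':');
    if (colon == std::string_view::npos || colon == 0 || colon + 1 == dsl.size()) {
        return std::nullopt;
    }
    const std::string_view path = dsl.substr(0, colon);
    FieldQuery q;
    q.term = std::string(dsl.substr(colon + 1));

    const auto dot = path.find('.');
    if (dot == std::string_view::npos) {
        q.root_column = std::string(path);  // plain column, no subcolumn
        return q;
    }
    if (dot == 0 || dot + 1 == path.size()) {
        return std::nullopt;  // ".sub" or "col." has an empty segment
    }
    q.root_column = std::string(path.substr(0, dot));
    q.subcolumn_path = std::string(path.substr(dot + 1));
    return q;
}
```

Everything after the first dot is kept as one path string, so a deeper path such as `a.b.c` would yield root column `a` and subcolumn path `b.c` under these assumptions.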

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@airborne12
Member Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.35% (1632/1958)
Line Coverage 67.89% (28847/42488)
Region Coverage 68.15% (14226/20876)
Branch Coverage 58.43% (7572/12960)

@doris-robot

TPC-DS: Total hot run time: 189957 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b48920d30d5842d950168e97e2468d24f6b00a4f, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1058	448	416	416
query2	6555	1759	1702	1702
query3	6747	223	217	217
query4	26805	23956	23433	23433
query5	5378	625	513	513
query6	359	250	232	232
query7	4652	502	300	300
query8	309	272	251	251
query9	8698	2551	2568	2551
query10	511	329	285	285
query11	15514	15121	14826	14826
query12	171	122	137	122
query13	1666	560	430	430
query14	11123	9251	9141	9141
query15	200	191	175	175
query16	7677	718	517	517
query17	1172	745	610	610
query18	2088	483	353	353
query19	231	238	219	219
query20	144	133	136	133
query21	218	145	118	118
query22	4726	4751	4740	4740
query23	34628	33714	33795	33714
query24	8599	2486	2477	2477
query25	601	565	453	453
query26	1424	291	166	166
query27	2783	549	388	388
query28	4352	2222	2220	2220
query29	874	693	544	544
query30	339	250	209	209
query31	937	1062	769	769
query32	83	69	69	69
query33	609	405	338	338
query34	839	893	538	538
query35	860	922	819	819
query36	974	1033	908	908
query37	120	109	128	109
query38	3494	3513	3552	3513
query39	1479	1439	1416	1416
query40	215	132	120	120
query41	59	58	63	58
query42	126	111	111	111
query43	485	478	474	474
query44	1397	838	825	825
query45	185	180	171	171
query46	877	1004	648	648
query47	1777	1814	1754	1754
query48	409	421	325	325
query49	769	499	400	400
query50	667	694	409	409
query51	3941	3870	3929	3870
query52	107	107	98	98
query53	236	282	195	195
query54	597	585	526	526
query55	97	84	87	84
query56	318	313	300	300
query57	1170	1207	1128	1128
query58	282	271	272	271
query59	2553	2600	2508	2508
query60	349	338	320	320
query61	157	158	155	155
query62	812	746	693	693
query63	226	189	199	189
query64	4380	1171	875	875
query65	4075	3959	3982	3959
query66	1031	424	337	337
query67	15624	15308	15048	15048
query68	8146	953	588	588
query69	515	335	277	277
query70	1354	1284	1265	1265
query71	533	336	327	327
query72	6019	4934	4822	4822
query73	682	577	358	358
query74	8858	8931	8737	8737
query75	4081	3430	2815	2815
query76	3841	1167	750	750
query77	822	397	329	329
query78	9654	9728	8852	8852
query79	2213	838	596	596
query80	633	553	505	505
query81	502	340	225	225
query82	418	156	134	134
query83	277	266	242	242
query84	257	116	94	94
query85	863	468	445	445
query86	381	321	303	303
query87	3718	3759	3670	3670
query88	3628	2241	2196	2196
query89	390	337	305	305
query90	2020	213	217	213
query91	229	167	132	132
query92	81	64	68	64
query93	1876	991	638	638
query94	682	443	325	325
query95	399	321	323	321
query96	486	588	282	282
query97	2934	2980	2869	2869
query98	248	213	211	211
query99	1346	1405	1282	1282
Total cold run time: 280004 ms
Total hot run time: 189957 ms

@doris-robot

ClickBench: Total hot run time: 0.03 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b48920d30d5842d950168e97e2468d24f6b00a4f, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.01	0.01
query2	0.12	0.00	0.01
query3	0.31	0.01	0.00
query4	1.74	0.00	0.01
query5	0.29	0.01	0.01
query6	1.68	0.00	0.01
query7	0.05	0.00	0.00
query8	0.08	0.00	0.01
query9	0.66	0.00	0.00
query10	0.60	0.01	0.01
query11	0.27	0.00	0.00
query12	0.26	0.00	0.01
query13	0.66	0.01	0.00
query14	1.05	0.00	0.00
query15	0.96	0.01	0.00
query16	0.38	0.00	0.00
query17	1.09	0.00	0.00
query18	0.24	0.00	0.00
query19	1.99	0.00	0.00
query20	0.03	0.00	0.00
query21	15.37	0.00	0.00
query22	5.71	0.00	0.00
query23	15.62	0.00	0.00
query24	1.80	0.00	0.00
query25	0.16	0.01	0.00
query26	0.19	0.00	0.00
query27	0.11	0.00	0.00
query28	1.80	0.00	0.00
query29	12.76	0.00	0.01
query30	0.35	0.01	0.00
query31	2.53	0.00	0.01
query32	6.33	0.00	0.00
query33	4.35	0.00	0.00
query34	8.74	0.01	0.00
query35	7.57	0.01	0.00
query36	0.66	0.00	0.00
query37	0.23	0.01	0.00
query38	0.21	0.00	0.00
query39	0.06	0.01	0.00
query40	0.21	0.00	0.00
query41	0.12	0.00	0.00
query42	0.09	0.00	0.00
query43	0.07	0.00	0.01
Total cold run time: 97.55 s
Total hot run time: 0.03 s

@doris-robot

BE UT Coverage Report

Increment line coverage 58.33% (49/84) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.49% (17711/33741)
Line Coverage 37.66% (160776/426911)
Region Coverage 32.15% (122789/381970)
Branch Coverage 33.54% (53852/160565)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 80.95% (68/84) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.18% (23537/33066)
Line Coverage 57.61% (245707/426494)
Region Coverage 52.86% (204473/386813)
Branch Coverage 54.59% (88094/161380)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 71.43% (40/56) 🎉
Increment coverage report
Complete coverage report

@airborne12 airborne12 requested a review from Copilot October 8, 2025 13:42

Copilot AI left a comment

Pull Request Overview

This PR adds support for variant subcolumn access in search functions, enabling search queries to target specific JSON paths within variant columns using dot notation (e.g., field.subcolumn). The feature extends the search DSL to handle variant data types with subcolumn paths, allowing more granular search capabilities on semi-structured data.

Key changes:

  • Extended search grammar to support dot notation for variant subcolumn paths
  • Modified field binding structures to handle variant subcolumn metadata
  • Updated query processing to create ElementAt expressions for variant subcolumns
  • Enhanced backend search evaluation to gracefully handle missing variant subcolumns

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| regression-test/suites/variant_p0/test_variant_search_subcolumn.groovy | Test suite validating variant subcolumn search functionality |
| regression-test/data/variant_p0/test_variant_search_subcolumn.out | Expected test output for variant subcolumn search tests |
| gensrc/thrift/Exprs.thrift | Extended TSearchFieldBinding with variant subcolumn metadata fields |
| fe/fe-core/src/main/antlr4/org/apache/doris/nereids/search/SearchParser.g4 | Updated grammar to support field.subcolumn path syntax |
| fe/fe-core/src/main/antlr4/org/apache/doris/nereids/search/SearchLexer.g4 | Added DOT token for subcolumn path parsing |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/SearchDslParser.java | Modified to parse field paths with dot notation |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/SearchExpression.java | Updated to accept ElementAt expressions for variant subcolumns |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/RewriteSearchToSlots.java | Enhanced to create ElementAt expressions for variant subcolumn access |
| fe/fe-core/src/main/java/org/apache/doris/nereids/jobs/executor/Rewriter.java | Repositioned search rewriting before variant subpath pruning |
| fe/fe-core/src/main/java/org/apache/doris/analysis/SearchPredicate.java | Added variant subcolumn metadata to thrift parameter building |
| be/src/vec/functions/function_search.h | Extended FieldReaderResolver with variant subcolumn detection |
| be/src/vec/functions/function_search.cpp | Enhanced query building to handle missing variant subcolumns gracefully |
| be/src/vec/exprs/vsearch.h | Added getter for search parameters |
| be/src/vec/exprs/vsearch.cpp | Updated search input collection to handle variant subcolumns |
| be/src/olap/rowset/segment_v2/segment_iterator.cpp | Modified to enable search expression evaluation without column-level indexes |

- if (ctx.fieldName() == null) {
-     throw new RuntimeException("Invalid field query: missing field name");
+ if (ctx.fieldPath() == null) {
+     throw new RuntimeException("Invalid field query: missing field path");

Copilot AI Oct 8, 2025

The error message should be more descriptive and specify what constitutes a valid field path.

Suggested change
- throw new RuntimeException("Invalid field query: missing field path");
+ throw new RuntimeException("Invalid field query: missing field path. A valid field path consists of one or more segments (e.g., field, field.subfield), where each segment is an identifier or a quoted string if it contains special characters.");

Comment on lines 195 to +199
  if (iterators.empty() || data_type_with_names.empty()) {
-     LOG(INFO) << "No indexed columns or iterators available, returning empty result";
+     LOG(INFO) << "No indexed columns or iterators available, returning empty result, dsl:"
+               << search_param.original_dsl;
      bitmap_result = InvertedIndexResultBitmap(std::make_shared<roaring::Roaring>(),
                                                std::make_shared<roaring::Roaring>());

Copilot AI Oct 8, 2025

The creation of an empty InvertedIndexResultBitmap is duplicated in multiple places. Consider extracting this into a helper function to reduce code duplication.
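
One possible shape for such a helper is sketched below. To stay self-contained it uses stand-in types rather than the real Doris and CRoaring classes: `RowIdSet` stands in for `roaring::Roaring`, `ResultBitmap` for `InvertedIndexResultBitmap`, and `make_empty_result` is a hypothetical name. The roles given to the two sets (matched rows and null rows) are an assumption, not something the diff states.

```cpp
#include <cstdint>
#include <memory>
#include <set>

// Stand-in for roaring::Roaring so the sketch compiles without CRoaring.
using RowIdSet = std::set<uint32_t>;

// Stand-in for InvertedIndexResultBitmap, which the diff constructs from two
// shared bitmaps. The member names here are assumptions.
struct ResultBitmap {
    std::shared_ptr<RowIdSet> data;
    std::shared_ptr<RowIdSet> null_rows;
};

// Hypothetical helper: the single place that builds an empty result, so each
// early-return path can call it instead of repeating the two make_shared calls.
ResultBitmap make_empty_result() {
    return ResultBitmap{std::make_shared<RowIdSet>(), std::make_shared<RowIdSet>()};
}
```

Each call returns freshly allocated sets, so a caller that later mutates its result cannot affect another caller's empty result.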

Comment on lines +97 to +100
// Check if this is ElementAt expression (for variant subcolumn access)
if (child->expr_name() == "element_at" && child_index < field_bindings.size() &&
field_bindings[child_index].__isset.is_variant_subcolumn &&
field_bindings[child_index].is_variant_subcolumn) {

Copilot AI Oct 8, 2025

The hardcoded string 'element_at' should be defined as a constant to avoid magic strings and improve maintainability.
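
A minimal sketch of what this could look like follows. The constant name `kElementAtExprName`, the `FieldBinding` struct, and the `is_variant_subcolumn_access` function are all hypothetical; `FieldBinding` only stands in for the thrift-generated binding type, and the `__isset` check and index bounds check from the diff are omitted for brevity.

```cpp
#include <string_view>

// Hypothetical constant name; the diff compares against the literal "element_at".
inline constexpr std::string_view kElementAtExprName = "element_at";

// Stand-in for the per-field binding metadata used in the diff.
struct FieldBinding {
    bool is_variant_subcolumn = false;
};

// Returns true when the child expression is an element_at call and its binding
// is marked as a variant subcolumn, mirroring the condition under review.
bool is_variant_subcolumn_access(std::string_view expr_name, const FieldBinding& binding) {
    return expr_name == kElementAtExprName && binding.is_variant_subcolumn;
}
```

With the name defined once, any other comparison against the same expression name can reuse the constant instead of repeating the literal.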

}

private Slot findSlotByName(String fieldName, LogicalOlapScan scan) {
// Direct match only - variant subcolumns are handled by caller

Copilot AI Oct 8, 2025

The comment should explain why variant subcolumns are handled by the caller and provide a brief explanation of the handling mechanism.

Suggested change
- // Direct match only - variant subcolumns are handled by caller
+ /*
+  * This method performs a direct match for slot names only and does not attempt to resolve variant subcolumns.
+  * Variant subcolumns (e.g., fields within complex or nested types) require special parsing and resolution logic,
+  * which is handled by the caller prior to invoking this method. The caller is responsible for extracting the
+  * appropriate subcolumn name from the field path and ensuring that the correct slot or sub-slot is passed here
+  * for direct matching. This separation keeps this method simple and focused on direct slot lookup.
+  */

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Oct 9, 2025

PR approved by anyone and no changes requested.

Member

@eldenmoon eldenmoon left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 16, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@airborne12 airborne12 merged commit 9451276 into apache:master Oct 16, 2025
35 of 41 checks passed
@airborne12 airborne12 deleted the feature-variant branch October 16, 2025 07:37
github-actions bot pushed a commit that referenced this pull request Oct 16, 2025
…56718)

yiguolei pushed a commit that referenced this pull request Oct 17, 2025
Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.1-merged, reviewed
