[fix](mv) Avoid unioning query-unused MV partitions #63081

Open
foxtail463 wants to merge 1 commit into apache:master from foxtail463:bugfix-partition-compensate

@foxtail463
Contributor

Problem Summary:

In MV union rewrite, rewrittenPlanUsePartitionNameSet may contain extra MV partitions that lie outside the query range. For example, the query may use only {p20260401, p20260402, p20260403}, while the rewritten MV scan selects partitions from p20260301 to p20260428. The old compensation logic mapped all removed MV partitions back to base table partitions and added them to baseTableNeedUnionPartitionNameSet. This created unnecessary base table union branches, inflated the cost of the MV candidate, and could cause explain to report the MV under MaterializedViewRewriteSuccessButNotChose.

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@foxtail463
Contributor Author

run buildall

foxtail463 changed the title from "[fix](fe) Avoid unioning query-unused MV partitions" to "[fix](mv) Avoid unioning query-unused MV partitions" on May 8, 2026
foxtail463 force-pushed the bugfix-partition-compensate branch from fce5497 to 8b8e561 on May 8, 2026 10:09
@foxtail463
Contributor Author

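Problem

During MV union rewrite, rewrittenPlanUsePartitionNameSet may contain MV partitions that fall outside the range of the original query.

The old logic mapped every partition that had to be removed from the MV scan back to base table partitions and added all of them to baseTableNeedUnionPartitionNameSet.

This can generate unnecessary base table UNION ALL branches, which inflates the estimated cost of the MV candidate and can ultimately cause an otherwise usable MV rewrite to show up under MaterializedViewRewriteSuccessButNotChose.

Example

Suppose there is a base table, sales, partitioned by date: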

sales partitions:
p20260301, p20260302, ..., p20260331,
p20260401, p20260402, p20260403, ..., p20260428

The original query only needs to read 3 base table partitions:

queryUsedBaseTablePartitionNameSet =
{p20260401, p20260402, p20260403}

However, during MV union rewrite, the temporarily generated MV scan may select a wider partition range:

rewrittenPlanUsePartitionNameSet =
{p20260301, ..., p20260331, p20260401, p20260402, p20260403, ..., p20260428}

Suppose the only currently valid MV partitions are the ones the query actually needs:

mvValidPartitionNameSet =
{p20260401, p20260402, p20260403}

The partitions that need to be removed from the MV scan are then:

mvNeedRemovePartitionNameSet =
    rewrittenPlanUsePartitionNameSet - mvValidPartitionNameSet

= {
    p20260301, ..., p20260331,
    p20260404, ..., p20260428
  }

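None of these removed MV partitions falls within the range of the original query.

They should be removed from the MV scan, but they should not be unioned back in from the base table, because the original query never needs them.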
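
The set difference above can be reproduced with a small standalone sketch. This is not the Doris code: the class PartitionDiffSketch and its helpers are hypothetical names used only for illustration, and the sketch assumes that each "..." in the example stands for every daily partition in between.

```java
import java.util.Set;
import java.util.TreeSet;

class PartitionDiffSketch {
    // Hypothetical helper: builds daily partition names such as p20260301.
    static Set<String> dayPartitions(int year, int month, int fromDay, int toDay) {
        Set<String> names = new TreeSet<>();
        for (int day = fromDay; day <= toDay; day++) {
            names.add(String.format("p%04d%02d%02d", year, month, day));
        }
        return names;
    }

    // Computes rewrittenPlanUsePartitionNameSet - mvValidPartitionNameSet
    // without modifying either input set.
    static Set<String> partitionsToRemove(Set<String> rewritten, Set<String> valid) {
        Set<String> removed = new TreeSet<>(rewritten);
        removed.removeAll(valid);
        return removed;
    }

    public static void main(String[] args) {
        // Assumed expansion of the example: all of March plus April 1 to 28.
        Set<String> rewritten = dayPartitions(2026, 3, 1, 31);
        rewritten.addAll(dayPartitions(2026, 4, 1, 28));
        Set<String> valid = dayPartitions(2026, 4, 1, 3);

        Set<String> removed = partitionsToRemove(rewritten, valid);
        // 31 March partitions + 25 April partitions (p20260404..p20260428)
        System.out.println(removed.size()); // 56
    }
}
```

Under that assumption, 56 MV partitions are removed from the scan, and none of them overlaps the 3 partitions the query reads.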

Before this PR

The old logic added every mapped base table partition to baseTableNeedUnionPartitionNameSet:

baseTableNeedUnionPartitionNameSet.addAll(baseTablePartitions);

As a result, the final rewritten plan could logically look like this:

SELECT ...
FROM mv_sales_daily
WHERE dt IN ('2026-04-01', '2026-04-02', '2026-04-03')

UNION ALL

SELECT ...
FROM sales
WHERE dt IN (
    '2026-03-01', ..., '2026-03-31',
    '2026-04-04', ..., '2026-04-28'
);

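The second UNION ALL branch is unnecessary, because those partitions are not within the range of the original query.

This causes two problems:

  1. A redundant base table scan is generated.
  2. The cost of the MV candidate is inflated, so the MV rewrite may succeed but still not be chosen in the end.

After this PR

This PR takes the intersection first, so only the base table partitions actually used by the original query are added to baseTableNeedUnionPartitionNameSet: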

baseTableNeedUnionPartitionNameSet.addAll(
        Sets.intersection(baseTablePartitions, queryUsedBaseTablePartitionNameSet));

In the example above:

baseTablePartitions =
{
  p20260301, ..., p20260331,
  p20260404, ..., p20260428
}

queryUsedBaseTablePartitionNameSet =
{
  p20260401, p20260402, p20260403
}

intersection =
{}

So no unnecessary base table union branch is generated.

The final rewritten plan only needs:

SELECT ...
FROM mv_sales_daily
WHERE dt IN ('2026-04-01', '2026-04-02', '2026-04-03');
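
The before/after difference in the compensation step can be sketched in isolation as follows. This is a minimal illustration, not the actual patch: UnionCompensationSketch and its two methods are hypothetical names, the sets are shortened to a few representative partitions, and retainAll from java.util stands in for the Guava Sets.intersection call so that the sketch has no external dependency.

```java
import java.util.HashSet;
import java.util.Set;

class UnionCompensationSketch {
    // Old behavior: every mapped base table partition becomes a union candidate.
    static Set<String> unionPartitionsBefore(Set<String> baseTablePartitions) {
        Set<String> needUnion = new HashSet<>();
        needUnion.addAll(baseTablePartitions);
        return needUnion;
    }

    // New behavior: keep only the partitions the original query actually reads.
    static Set<String> unionPartitionsAfter(Set<String> baseTablePartitions,
                                            Set<String> queryUsedBaseTablePartitions) {
        Set<String> needUnion = new HashSet<>(baseTablePartitions);
        needUnion.retainAll(queryUsedBaseTablePartitions); // set intersection
        return needUnion;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the sets in the example above.
        Set<String> baseTablePartitions =
                Set.of("p20260301", "p20260331", "p20260404", "p20260428");
        Set<String> queryUsed = Set.of("p20260401", "p20260402", "p20260403");

        System.out.println(unionPartitionsBefore(baseTablePartitions).size());           // 4
        System.out.println(unionPartitionsAfter(baseTablePartitions, queryUsed).size()); // 0

        // Assumed variation: a removed partition that the query does read
        // is still kept for the base table union.
        Set<String> mixed = Set.of("p20260331", "p20260402");
        System.out.println(unionPartitionsAfter(mixed, queryUsed)); // [p20260402]
    }
}
```

The last case is an assumption about the intended behavior rather than something stated above: if a removed MV partition does fall inside the query range, the intersection keeps it, so it is still read from the base table.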

@foxtail463
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29739 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8b8e561c5bee0f4ca0549b8ff71e867fced75f42, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17818	3972	4000	3972
q2	q3	10710	871	596	596
q4	4655	456	341	341
q5	7447	1363	1137	1137
q6	221	186	144	144
q7	924	953	753	753
q8	9942	1428	1325	1325
q9	6389	5315	5349	5315
q10	6332	2078	1803	1803
q11	481	270	264	264
q12	694	434	308	308
q13	18187	3318	2702	2702
q14	301	285	267	267
q15	q16	908	871	796	796
q17	1000	1086	865	865
q18	6451	5733	5591	5591
q19	1737	1205	960	960
q20	517	394	309	309
q21	4739	2389	1952	1952
q22	472	399	339	339
Total cold run time: 99925 ms
Total hot run time: 29739 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4786	4676	4646	4646
q2	q3	4683	4829	4182	4182
q4	2157	2179	1407	1407
q5	5048	4967	5206	4967
q6	207	182	151	151
q7	2037	1834	1629	1629
q8	3384	3150	3116	3116
q9	8542	8457	8461	8457
q10	4526	4531	4264	4264
q11	616	463	419	419
q12	728	743	519	519
q13	3378	3516	2916	2916
q14	301	301	266	266
q15	q16	763	789	701	701
q17	1405	1355	1298	1298
q18	8050	7268	7238	7238
q19	1169	1150	1186	1150
q20	2250	2241	1953	1953
q21	6303	5374	4941	4941
q22	530	479	408	408
Total cold run time: 60863 ms
Total hot run time: 54628 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171885 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8b8e561c5bee0f4ca0549b8ff71e867fced75f42, data reload: false

query5	4334	676	534	534
query6	334	228	203	203
query7	4224	547	295	295
query8	331	233	213	213
query9	8839	4144	4131	4131
query10	459	354	303	303
query11	5780	2421	2233	2233
query12	186	133	129	129
query13	1339	644	446	446
query14	6687	5433	5109	5109
query14_1	4420	4427	4377	4377
query15	213	211	186	186
query16	1012	466	442	442
query17	1155	786	636	636
query18	2716	505	371	371
query19	234	212	175	175
query20	145	137	132	132
query21	216	145	125	125
query22	13618	13570	13342	13342
query23	17075	16369	16031	16031
query23_1	16132	16204	16079	16079
query24	7367	1748	1360	1360
query24_1	1362	1368	1379	1368
query25	603	524	462	462
query26	1321	325	179	179
query27	2661	595	361	361
query28	4402	2010	2014	2010
query29	1039	681	538	538
query30	309	245	200	200
query31	1127	1086	982	982
query32	101	81	76	76
query33	550	375	305	305
query34	1179	1112	661	661
query35	760	788	692	692
query36	1285	1361	1110	1110
query37	156	106	93	93
query38	3196	3117	3051	3051
query39	936	927	892	892
query39_1	865	869	857	857
query40	237	159	136	136
query41	64	65	65	65
query42	113	109	108	108
query43	335	355	307	307
query44	
query45	213	201	194	194
query46	1080	1229	720	720
query47	2277	2297	2191	2191
query48	396	426	301	301
query49	640	533	420	420
query50	718	302	219	219
query51	4359	4326	4213	4213
query52	108	109	96	96
query53	258	287	204	204
query54	330	316	254	254
query55	95	91	86	86
query56	311	309	297	297
query57	1393	1397	1293	1293
query58	306	273	268	268
query59	1549	1622	1436	1436
query60	348	342	334	334
query61	160	157	155	155
query62	671	630	567	567
query63	246	206	224	206
query64	2375	818	685	685
query65	
query66	1684	531	426	426
query67	29934	29921	29933	29921
query68	
query69	482	349	316	316
query70	1017	996	944	944
query71	317	288	277	277
query72	2964	2774	2446	2446
query73	839	751	429	429
query74	5051	4916	4737	4737
query75	2771	2663	2315	2315
query76	2277	1167	810	810
query77	428	461	361	361
query78	12906	12830	12395	12395
query79	1527	1005	736	736
query80	1386	601	503	503
query81	526	281	246	246
query82	983	159	124	124
query83	356	274	255	255
query84	258	146	113	113
query85	952	525	453	453
query86	440	339	314	314
query87	3411	3360	3221	3221
query88	3586	2704	2688	2688
query89	451	391	341	341
query90	2014	187	187	187
query91	181	167	140	140
query92	76	82	79	79
query93	1181	984	569	569
query94	728	340	297	297
query95	668	389	358	358
query96	1017	790	354	354
query97	2696	2708	2614	2614
query98	244	236	235	235
query99	1126	1105	965	965
Total cold run time: 254639 ms
Total hot run time: 171885 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉
Increment coverage report
Complete coverage report
