[opt](csv reader) optimize stream load CSV read performance #60920

Open
liaoxin01 wants to merge 1 commit into apache:master from liaoxin01:opt/csv-reader-stream-load-perf

@liaoxin01 commented Feb 28, 2026

Proposed changes

Optimize stream load CSV read performance for nullable string columns by eliminating per-row overhead from the SerDe abstraction layer.

Changes

  1. Cache nullable string column pointers per batch: Resolve the assert_cast results (ColumnStr and NullMap pointers) once per batch instead of once per row per column, and store them in NullableStringColumnCache.

  2. Inline nullable string write path: Bypass _deserialize_nullable_string and StringSerDe::deserialize_one_cell_from_csv in the hot loop, and perform the null check, escape handling, and insert_data/push_back directly.

  3. Pre-reserve column capacity: Reserve offsets, chars, and null_map capacity at batch start to reduce PODArray realloc overhead during the row loop.
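
A minimal sketch of the first two changes, assuming simplified stand-in types rather than the real Doris classes. `StrColumn`, `NullableStrColumn`, `build_cache`, `write_row`, and the fields of `NullableStringColumnCache` are hypothetical names for illustration, `dynamic_cast` stands in for assert_cast, and the `\N` null token is an assumption. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-ins for the column types (hypothetical, not the Doris classes).
struct StrColumn {
    std::vector<char> chars;        // concatenated string bytes
    std::vector<uint32_t> offsets;  // end offset of each row inside `chars`
    void insert_data(const char* p, size_t n) {
        chars.insert(chars.end(), p, p + n);
        offsets.push_back(static_cast<uint32_t>(chars.size()));
    }
    void insert_default() { offsets.push_back(static_cast<uint32_t>(chars.size())); }
};
struct Column { virtual ~Column() = default; };
struct NullableStrColumn : Column {
    StrColumn nested;
    std::vector<uint8_t> null_map;  // 1 = NULL, 0 = value present
};

// Pointers resolved once per batch; the field names are assumptions.
struct NullableStringColumnCache {
    StrColumn* str_col = nullptr;
    std::vector<uint8_t>* null_map = nullptr;
};

// Change 1: one checked cast per column per batch instead of one per cell.
std::vector<NullableStringColumnCache> build_cache(std::vector<Column*>& cols) {
    std::vector<NullableStringColumnCache> cache;
    cache.reserve(cols.size());
    for (Column* c : cols) {
        auto* n = dynamic_cast<NullableStrColumn*>(c);
        assert(n != nullptr);
        cache.push_back({&n->nested, &n->null_map});
    }
    return cache;
}

// Change 2: the hot loop writes through the cached pointers directly.
void write_row(std::vector<NullableStringColumnCache>& cache,
               const std::vector<std::string_view>& fields,
               std::string_view null_token = "\\N") {
    for (size_t i = 0; i < cache.size(); ++i) {
        if (fields[i] == null_token) {
            cache[i].str_col->insert_default();
            cache[i].null_map->push_back(1);
        } else {
            cache[i].str_col->insert_data(fields[i].data(), fields[i].size());
            cache[i].null_map->push_back(0);
        }
    }
}
```

In this sketch the cache is rebuilt at the start of every batch and `write_row` is called once per parsed line, so the cast cost scales with the number of columns per batch rather than with the number of cells.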

Performance

Tested with ClickBench dataset stream load:

  • Import time reduced from 571s to 476s (16.6% improvement)
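
For reference, the quoted percentage is the relative reduction in wall-clock time. The same two figures correspond to roughly a 1.20x speedup:

```latex
\[
\frac{571\,\mathrm{s} - 476\,\mathrm{s}}{571\,\mathrm{s}}
  = \frac{95}{571} \approx 0.166 = 16.6\%,
\qquad
\frac{571\,\mathrm{s}}{476\,\mathrm{s}} \approx 1.20
\]
```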

Flame graph analysis

Before optimization, the _deserialize_nullable_string path dominated the profile, with +96s of self-time attributed to:

  • Per-row assert_cast<ColumnNullable&> (+65s)
  • StringSerDe::deserialize_one_cell_from_csv intermediate layer (+54s)
  • Repeated PODArray reserve/realloc during column growth

After optimization, these costs are eliminated or amortized to per-batch.
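
The effect of reserving up front can be illustrated with a small experiment. This is a sketch only: `std::vector` stands in for PODArray (whose growth policy may differ), and `count_reallocs` is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the buffer's capacity changes while appending `rows`
// null-map bytes, with or without a single up-front reserve.
size_t count_reallocs(size_t rows, bool reserve_first) {
    std::vector<unsigned char> null_map;
    if (reserve_first) {
        null_map.reserve(rows);  // one allocation at batch start
    }
    size_t reallocs = 0;
    size_t cap = null_map.capacity();
    for (size_t i = 0; i < rows; ++i) {
        null_map.push_back(0);
        if (null_map.capacity() != cap) {  // capacity changed: the buffer was reallocated
            ++reallocs;
            cap = null_map.capacity();
        }
    }
    return reallocs;
}
```

With `reserve_first` set, the loop never reallocates, because the standard guarantees no reallocation while the size stays within the reserved capacity. Without it, the count grows with the number of rows at a rate that depends on the container's growth factor.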

Cache nullable string column pointers per-batch to eliminate per-row
assert_cast, inline the write path to bypass StringSerDe layer, and
pre-reserve ColumnStr/NullMap capacity to reduce realloc overhead.
Copilot AI review requested due to automatic review settings February 28, 2026 14:59

Thearas commented Feb 28, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

Optimizes vectorized CSV stream-load parsing for nullable string columns by removing per-row SerDe overhead and reducing reallocations in the hot loop.

Changes:

  • Adds per-batch caching of ColumnNullable nested string column and null-map pointers to avoid repeated assert_cast per row.
  • Inlines the nullable-string CSV decode path (null detection + escape handling + insert_data / push_back) instead of calling through SerDe layers.
  • Pre-reserves offsets, chars, and null_map capacity per batch to reduce PODArray growth overhead.
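
The inlined decode step (null detection followed by escape handling) might look roughly like the sketch below. `decode_cell` and its parameters are hypothetical, and the unescape rule shown (drop the escape character and keep the next byte) is an assumption rather than a description of the Doris implementation. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Returns std::nullopt for a NULL cell, otherwise the unescaped value.
// `null_token` and `escape` are illustrative parameters, not Doris option names.
std::optional<std::string> decode_cell(std::string_view raw,
                                       std::string_view null_token = "\\N",
                                       char escape = '\\') {
    if (raw == null_token) {
        return std::nullopt;  // null detection runs on the raw bytes first
    }
    std::string out;
    out.reserve(raw.size());
    for (size_t i = 0; i < raw.size(); ++i) {
        // Drop the escape character and keep the byte that follows it.
        if (raw[i] == escape && i + 1 < raw.size()) {
            ++i;
        }
        out.push_back(raw[i]);
    }
    return out;
}
```

Checking for the null token before unescaping keeps a literal `\N` cell from being rewritten into `N` when the escape character is a backslash.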

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| be/src/vec/exec/format/csv/csv_reader.h | Adds nullable string column cache structures/members and required column includes. |
| be/src/vec/exec/format/csv/csv_reader.cpp | Initializes/uses the cache per batch, inlines nullable-string deserialization, and adds per-batch reserves. |

@liaoxin01
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28649 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9711a75daa28fdfe7ea12f51580249a917ba3665, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
============================================
q1	17611	4469	4274	4274
q2	
q3	10652	796	525	525
q4	4676	361	251	251
q5	7546	1203	1038	1038
q6	176	177	150	150
q7	766	834	668	668
q8	9296	1441	1320	1320
q9	4925	4760	4647	4647
q10	6803	1877	1641	1641
q11	474	269	229	229
q12	733	566	465	465
q13	17766	4183	3434	3434
q14	226	227	211	211
q15	965	797	786	786
q16	743	715	663	663
q17	729	929	440	440
q18	5967	5412	5310	5310
q19	1416	985	607	607
q20	509	501	389	389
q21	4623	1835	1362	1362
q22	342	282	239	239
Total cold run time: 96944 ms
Total hot run time: 28649 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
============================================
q1	4429	4350	4349	4349
q2	
q3	1758	2161	1714	1714
q4	837	1148	762	762
q5	4017	4305	4328	4305
q6	178	169	143	143
q7	1723	1600	1482	1482
q8	2418	2629	2517	2517
q9	7501	7356	7446	7356
q10	2684	2884	2516	2516
q11	521	465	414	414
q12	507	593	458	458
q13	3973	4429	3689	3689
q14	283	298	280	280
q15	905	863	786	786
q16	720	771	718	718
q17	1204	1644	1310	1310
q18	7154	6820	6575	6575
q19	942	890	964	890
q20	2138	2255	2097	2097
q21	4248	3502	3444	3444
q22	476	470	381	381
Total cold run time: 48616 ms
Total hot run time: 46186 ms

@doris-robot

TPC-DS: Total hot run time: 184062 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9711a75daa28fdfe7ea12f51580249a917ba3665, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4955	635	513	513
query6	330	215	191	191
query7	4215	478	278	278
query8	341	239	238	238
query9	8783	2759	2709	2709
query10	558	372	339	339
query11	17089	16854	16609	16609
query12	194	129	130	129
query13	1272	445	354	354
query14	6605	3202	3010	3010
query14_1	2862	2782	2771	2771
query15	203	196	179	179
query16	1003	482	468	468
query17	1064	719	617	617
query18	2716	441	355	355
query19	214	212	184	184
query20	137	132	131	131
query21	223	147	120	120
query22	5437	5869	5265	5265
query23	17977	17147	16920	16920
query23_1	17142	17027	17079	17027
query24	7895	1600	1193	1193
query24_1	1210	1216	1221	1216
query25	529	446	398	398
query26	1241	254	152	152
query27	2744	469	292	292
query28	4483	1860	1871	1860
query29	809	573	528	528
query30	312	244	214	214
query31	889	744	640	640
query32	78	71	65	65
query33	511	334	286	286
query34	921	932	549	549
query35	632	681	600	600
query36	1083	1121	927	927
query37	134	90	84	84
query38	3003	2874	2864	2864
query39	924	870	862	862
query39_1	819	830	833	830
query40	231	149	136	136
query41	61	59	56	56
query42	105	102	100	100
query43	367	376	342	342
query44	
query45	196	183	194	183
query46	904	973	603	603
query47	2118	2150	2000	2000
query48	316	313	233	233
query49	622	463	376	376
query50	683	274	222	222
query51	4134	4073	4111	4073
query52	107	106	96	96
query53	288	332	275	275
query54	290	272	255	255
query55	89	83	83	83
query56	316	312	327	312
query57	1354	1335	1281	1281
query58	293	282	276	276
query59	2480	2653	2547	2547
query60	341	335	346	335
query61	145	148	147	147
query62	608	564	542	542
query63	316	285	279	279
query64	4809	1262	977	977
query65	
query66	1389	451	363	363
query67	16410	16352	16228	16228
query68	
query69	418	300	292	292
query70	1024	955	981	955
query71	339	312	299	299
query72	2740	2690	2549	2549
query73	545	550	318	318
query74	10018	9832	9750	9750
query75	2866	2744	2459	2459
query76	2314	1035	684	684
query77	364	385	337	337
query78	11191	11420	10698	10698
query79	1162	794	609	609
query80	1365	661	552	552
query81	550	277	260	260
query82	1028	150	117	117
query83	344	277	269	269
query84	254	128	106	106
query85	966	551	432	432
query86	406	307	295	295
query87	3174	3112	2988	2988
query88	3560	2676	2656	2656
query89	421	365	338	338
query90	1983	163	171	163
query91	162	158	134	134
query92	77	77	74	74
query93	928	868	510	510
query94	639	320	284	284
query95	587	340	318	318
query96	640	514	225	225
query97	2451	2475	2442	2442
query98	234	217	218	217
query99	1002	975	913	913
Total cold run time: 256112 ms
Total hot run time: 184062 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/48) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.57% (19628/37337) |
| Line Coverage | 36.18% (183228/506401) |
| Region Coverage | 32.48% (142119/437595) |
| Branch Coverage | 33.44% (61653/184383) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (48/48) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 73.28% (26792/36559) |
| Line Coverage | 56.59% (285680/504848) |
| Region Coverage | 54.05% (238763/441732) |
| Branch Coverage | 55.62% (102861/184947) |
