perf: improve field indexing in JSON StructArrayDecoder (1.7x speed up) #9086

Weijun-H · 2026-01-02T13:31:04Z

Which issue does this PR close?

Closes #NNN.

Rationale for this change

Optimize JSON struct decoding on wide objects by reducing per-row allocations and repeated field lookups.

What changes are included in this PR?

Reuse a flat child-position buffer in StructArrayDecoder and add an optional field-name index for object mode.
Skip building the field-name index for list mode; add overflow/allocation checks.
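
To make the two ideas above concrete, here is a minimal sketch using only the standard library. It is not the arrow-json implementation: `INDEX_THRESHOLD`, `lookup`, `ChildPositions`, and the use of `0` as a "field absent" marker are all assumptions made for illustration. Only the name `build_field_index` and its `Option<HashMap<String, usize>>` return type come from the diff under review.

```rust
use std::collections::HashMap;

/// Hypothetical cutoff: below this many fields a linear scan is assumed
/// to be cheap enough that no index is built.
const INDEX_THRESHOLD: usize = 16;

/// Build a name -> position index, or `None` for narrow schemas.
fn build_field_index(names: &[&str]) -> Option<HashMap<String, usize>> {
    if names.len() < INDEX_THRESHOLD {
        return None;
    }
    let mut index = HashMap::with_capacity(names.len());
    for (pos, name) in names.iter().enumerate() {
        // Keep the first position if a name appears more than once.
        index.entry(name.to_string()).or_insert(pos);
    }
    Some(index)
}

/// Resolve a field name, using the index when present and a scan otherwise.
fn lookup(names: &[&str], index: Option<&HashMap<String, usize>>, key: &str) -> Option<usize> {
    match index {
        Some(idx) => idx.get(key).copied(),
        None => names.iter().position(|n| *n == key),
    }
}

/// One flat buffer of `rows * num_fields` slots, reused across batches
/// instead of allocating a fresh vector per row.
struct ChildPositions {
    buf: Vec<u32>,
    num_fields: usize,
}

impl ChildPositions {
    fn new(num_fields: usize) -> Self {
        Self { buf: Vec::new(), num_fields }
    }

    /// Resize for a new batch; returns `None` if `rows * num_fields` overflows.
    fn reset(&mut self, rows: usize) -> Option<()> {
        let len = rows.checked_mul(self.num_fields)?;
        self.buf.clear();
        self.buf.resize(len, 0); // 0 marks "field absent" in this sketch
        Some(())
    }

    fn set(&mut self, row: usize, field: usize, pos: u32) {
        self.buf[row * self.num_fields + field] = pos;
    }

    fn get(&self, row: usize, field: usize) -> u32 {
        self.buf[row * self.num_fields + field]
    }
}
```

Because `clear` followed by `resize` keeps the vector's existing capacity, repeated batches of similar size would not reallocate in this sketch.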

decode_wide_object_i64_json
                        time:   [11.828 ms 11.865 ms 11.905 ms]
                        change: [−67.828% −67.378% −67.008%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

decode_wide_object_i64_serialize
                        time:   [7.6923 ms 7.7402 ms 7.7906 ms]
                        change: [−75.652% −75.483% −75.331%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Are these changes tested?

Yes

Are there any user-facing changes?

No

scovich

Not sure I understand the indexing code well enough to say whether that part is correct, but the idea of using an optional index for field name lookups makes a lot of sense to me.

scovich · 2026-01-05T21:33:03Z

arrow-json/src/reader/struct_array.rs

    }
 }
+
+fn build_field_index(fields: &Fields) -> Option<HashMap<String, usize>> {


qq: Do lifetimes coincide so that we could return Option<HashMap<&str, usize>> instead?

Yes, the lifetimes do coincide. We can use HashMap<&'a str, usize> by taking fields: &'a Fields as a parameter, which avoids the self-referential struct problem. However, this would require threading the lifetime parameter <'a> through the entire decoder system across many files. Since the lookup performance is identical, I don’t think it’s worth the added complexity.

maybe it would be a good follow on PR
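
For readers following the lifetime discussion, the borrowed-key variant could look roughly like the sketch below. `FieldNames`, `build_borrowed_index`, and `BorrowingDecoder` are hypothetical stand-ins, not arrow types. The point is only to show why the `<'a>` parameter has to appear on every type that stores the index.

```rust
use std::collections::HashMap;

/// Stand-in for a schema's field list (hypothetical; not arrow's `Fields`).
struct FieldNames(Vec<String>);

/// Keys borrow from `fields`, so the map cannot outlive it.
fn build_borrowed_index<'a>(fields: &'a FieldNames) -> HashMap<&'a str, usize> {
    fields
        .0
        .iter()
        .enumerate()
        .map(|(pos, name)| (name.as_str(), pos))
        .collect()
}

/// Any decoder that stores the borrowed index must carry `'a` as well,
/// and so must every type that contains such a decoder.
struct BorrowingDecoder<'a> {
    index: HashMap<&'a str, usize>,
}

impl<'a> BorrowingDecoder<'a> {
    fn new(fields: &'a FieldNames) -> Self {
        Self { index: build_borrowed_index(fields) }
    }

    fn position(&self, name: &str) -> Option<usize> {
        self.index.get(name).copied()
    }
}
```

The owned `HashMap<String, usize>` pays one string copy per field at construction time and in exchange keeps the decoder types free of lifetime parameters.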

alamb

Thanks @Weijun-H and @scovich

alamb · 2026-01-06T22:30:41Z

arrow-json/benches/reader.rs

+use std::fmt::Write;
+use std::sync::Arc;
+
+fn build_schema(field_count: usize) -> Arc<Schema> {


can you please add some comments here with an example of what this code does / what patterns of input it creates?

Also, it would help me to reproduce your results if you could make a separate PR with the benchmarks (so I can compare main to the PR)

separate benchmark here

#9107
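
As an illustration of the kind of input a wide-object benchmark might generate, here is a small sketch. It is not the code from the benchmark file: `build_wide_json`, the `f0`, `f1`, ... field names, and the value pattern are assumptions chosen only to show the shape of the data.

```rust
use std::fmt::Write;

/// Build `rows` newline-delimited JSON objects, each with `field_count`
/// integer fields named `f0`, `f1`, ... (hypothetical naming).
fn build_wide_json(rows: usize, field_count: usize) -> String {
    let mut out = String::new();
    for row in 0..rows {
        out.push('{');
        for field in 0..field_count {
            if field > 0 {
                out.push(',');
            }
            // Writing into a `String` cannot fail, so `unwrap` is safe here.
            write!(out, "\"f{}\":{}", field, row * field_count + field).unwrap();
        }
        out.push_str("}\n");
    }
    out
}
```

Calling `build_wide_json(2, 3)` yields two lines, `{"f0":0,"f1":1,"f2":2}` and `{"f0":3,"f1":4,"f2":5}`. A wide-object benchmark would use a much larger `field_count` so that per-key field lookup dominates the decode time.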

alamb · 2026-01-06T22:31:31Z

arrow-json/src/reader/struct_array.rs

    }
 }
+
+fn build_field_index(fields: &Fields) -> Option<HashMap<String, usize>> {


maybe it would be a good follow on PR

Weijun-H added 2 commits January 2, 2026 15:17

feat: add benchmark for JSON reader performance and improve field indexing in StructArrayDecoder

79477a9

refactor: streamline StructArrayDecoder initialization by combining decoders and field index creation

1ffd2c0

github-actions bot added the arrow Changes to the arrow crate label Jan 2, 2026

chore

7e3077e

Weijun-H marked this pull request as ready for review January 2, 2026 13:57

Weijun-H changed the title ~~perf: improve field indexing in StructArrayDecoder~~ perf: improve field indexing in StructArrayDecoder (1.5x speed up) Jan 2, 2026

Weijun-H changed the title ~~perf: improve field indexing in StructArrayDecoder (1.5x speed up)~~ perf: improve field indexing in StructArrayDecoder (2x speed up) Jan 2, 2026

Weijun-H changed the title ~~perf: improve field indexing in StructArrayDecoder (2x speed up)~~ perf: improve field indexing in StructArrayDecoder (1.7x speed up) Jan 2, 2026

scovich reviewed Jan 5, 2026

alamb reviewed Jan 6, 2026

alamb changed the title ~~perf: improve field indexing in StructArrayDecoder (1.7x speed up)~~ perf: improve field indexing in JSON StructArrayDecoder (1.7x speed up) Jan 7, 2026
