-
Notifications
You must be signed in to change notification settings - Fork 1.1k
perf: optimize hex decoding in json (1.8x faster in binary-heavy) #9091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
30f160e to
4a6b5d4
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Weijun-H - -this looks pretty close to me
| impl ArrayDecoder for FixedSizeBinaryArrayDecoder { | ||
| fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> { | ||
| let mut builder = FixedSizeBinaryBuilder::with_capacity(pos.len(), self.len); | ||
| let mut scratch = Vec::with_capacity(self.len as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why self.len? Isn't that the length of the input? Also the scratch used below is different (no initial capacity)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Here self.len comes from FixedSizeBinary and represents the decoded byte width, not the input hex string length. The input string length is expected to be 2 * len. For variable-width types we use Vec::new() and reserve per value instead.
|
run benchmark json-reader |
|
(I fixed a bug in my benchmark runner that didn't allow |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Weijun-H -- this looks good to me
Nice work
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
c8ed14d to
2571386
Compare
|
🤖 |
|
🤖: Benchmark completed Details
|
Which issue does this PR close?
Rationale for this change
Improve JSON binary decoding performance by avoiding per-value allocations and enabling direct hex decoding into builders.
What changes are included in this PR?
Optimized binary hex decoding paths to reduce allocations and improve throughput.
Are these changes tested?
Yes
Are there any user-facing changes?
No