perf(io): Avoid per-entry KeyValue allocation in HFileDataBlock.seekTo by wombatu-kun · Pull Request #19021 · apache/hudi

wombatu-kun · 2026-06-16T09:33:26Z

Describe the issue this Pull Request addresses

The native HFile reader's HFileDataBlock.seekTo is the hottest inner loop on the metadata-table read path (record-level index, bloom filter and column-stats point lookups, which run on essentially every write). For each entry it scanned it allocated a KeyValue and its Key just to compare the entry key against the lookup key and to compute the stride to the next entry, producing two short-lived objects per scanned entry and avoidable GC pressure under point-lookup workloads.

Summary and Changelog

HFileDataBlock.seekTo now compares the entry key directly against the backing block buffer and computes the stride from the on-disk length fields, instead of materializing a KeyValue/Key for every scanned entry. A KeyValue is materialized only on an exact match. For the "in range" and end-of-block cases the cursor is pointed at the previous offset and the read is deferred, which getKeyValue() already performs lazily. The lookup key may be a UTF8StringKey, so its polymorphic content accessors are used for the comparison. No other class is touched and the original Option-based cursor is unchanged.
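The buffer-direct scan described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class name, the return-code constants, and the simplified entry layout (`[int keyLength][int valueLength][key][value]`) are all hypothetical, and the real HFile entry layout carries more fields.

```java
/** Hypothetical, simplified sketch; this is not Hudi's HFileDataBlock. */
final class BufferSeekSketch {
  // Illustrative return codes (assumed names, not Hudi constants).
  static final int BEFORE_FIRST = -1;
  static final int FOUND = 0;
  static final int IN_RANGE = 1;

  // Assumed entry layout: [int keyLength][int valueLength][key bytes][value bytes].
  static final int KEY_OFFSET = 8;

  /** Reads a big-endian int straight from the block buffer. */
  static int readInt(byte[] buf, int offset) {
    return ((buf[offset] & 0xFF) << 24)
        | ((buf[offset + 1] & 0xFF) << 16)
        | ((buf[offset + 2] & 0xFF) << 8)
        | (buf[offset + 3] & 0xFF);
  }

  /** Unsigned lexicographic comparison of two byte ranges; no objects are created. */
  static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int diff = (a[aOff + i] & 0xFF) - (b[bOff + i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return aLen - bLen;
  }

  /**
   * Scans entries in key order and returns {code, entryOffset}.
   * Only the FOUND case would need to build an entry object; the other
   * cases just remember the previous offset so the read can be deferred.
   */
  static int[] seekTo(byte[] block, byte[] lookup) {
    int offset = 0;
    int previous = -1;
    while (offset < block.length) {
      int entryKeyLength = readInt(block, offset);
      int entryValueLength = readInt(block, offset + 4);
      int comp = compare(block, offset + KEY_OFFSET, entryKeyLength, lookup, 0, lookup.length);
      if (comp == 0) {
        return new int[] {FOUND, offset};
      }
      if (comp > 0) {
        break; // entry key is already greater than the lookup key
      }
      previous = offset;
      offset += KEY_OFFSET + entryKeyLength + entryValueLength; // stride from length fields
    }
    return previous < 0 ? new int[] {BEFORE_FIRST, -1} : new int[] {IN_RANGE, previous};
  }

  /** Helper for the example: encodes {key, value} pairs in the assumed layout. */
  static byte[] encode(String[][] entries) {
    java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
    for (String[] entry : entries) {
      byte[] k = entry[0].getBytes(java.nio.charset.StandardCharsets.UTF_8);
      byte[] v = entry[1].getBytes(java.nio.charset.StandardCharsets.UTF_8);
      writeInt(out, k.length);
      writeInt(out, v.length);
      out.write(k, 0, k.length);
      out.write(v, 0, v.length);
    }
    return out.toByteArray();
  }

  static void writeInt(java.io.ByteArrayOutputStream out, int value) {
    out.write(value >>> 24);
    out.write(value >>> 16);
    out.write(value >>> 8);
    out.write(value);
  }
}
```

In this sketch the end-of-block case falls out of the same loop: when every entry key is smaller than the lookup key, the result points at the last entry's offset, which mirrors the "previous offset, deferred read" behavior described above.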

Impact

No public API or on-disk format change. Lower-allocation, faster point lookups on the metadata-table read path. JMH microbenchmark over an uncompressed HFile fixture (5000 entries, 625 sorted point lookups; forks(0); gc.alloc.rate.norm):

Workload	Metric	Before	After	Delta
Point lookups	allocation (B/op)	677,729	363,721	-46%
Point lookups	throughput (ops/ms)	5.25	6.16	+17%
Full scan (not on the seekTo path)	allocation (B/op)	643,705	643,681	unchanged
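As a quick sanity check on the Delta column, the percentages can be recomputed from the Before and After figures in the table. The helper below is a hypothetical illustration and is not part of the benchmark itself.

```java
/** Hypothetical helper that recomputes the Delta column from the table above. */
final class BenchmarkDeltaSketch {
  /** Relative change from before to after, in percent. */
  static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    // Figures copied from the table: allocation in B/op, throughput in ops/ms.
    System.out.printf("point lookup allocation: %.1f%%%n", percentChange(677_729, 363_721));
    System.out.printf("point lookup throughput: %.1f%%%n", percentChange(5.25, 6.16));
    System.out.printf("full scan allocation:    %.3f%%%n", percentChange(643_705, 643_681));
  }
}
```

Rounded to whole percentages, the first two values agree with the -46% and +17% reported above, and the full-scan change is well under a hundredth of a percent.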

Risk Level

low. The change is confined to one method, preserves all seekTo return codes and the cursor's lazy-read semantics, and is exercised by the existing HFile reader suite (point, prefix, non-unique and fake-first-key seeks, sequential reads, empty file, and HBase read/write compatibility). The full hudi-io module test suite (101 tests) and checkstyle pass.

Documentation Update

none

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

HFileDataBlock.seekTo materialized a KeyValue (and its Key) for every entry it scanned, only to compare the entry key and compute the stride to the next entry. On the metadata-table read path (record-level index, bloom filter and column-stats point lookups) this is the hottest inner loop, allocating two short-lived objects per scanned entry.

This change compares the entry key directly against the backing block buffer and computes the stride from the on-disk length fields, materializing a KeyValue only on an exact match. The "in range" and end-of-block cases point the cursor at the previous offset and defer the read, which getKeyValue() already performs lazily. The lookup key may be a UTF8StringKey, so its polymorphic content accessors are used for the comparison.

No on-disk format or public API change.

JMH microbenchmark over an uncompressed HFile fixture (5000 entries, 625 sorted point lookups), forks(0), gc.alloc.rate.norm and throughput:
point lookups: 677,729 -> 363,721 B/op (-46%), 5.25 -> 6.16 ops/ms (+17%)
full scan (seekTo is not on that path): 643,705 -> 643,681 B/op (unchanged)

hudi-agent

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR avoids per-entry KeyValue/Key allocations in HFileDataBlock.seekTo by comparing entry keys directly against the backing buffer and computing the stride from the on-disk length fields, materializing a KeyValue only on an exact match. I traced the buffer layout, comparison semantics, stride computation, and the deferred-read fallback through HFileCursor, HFileReaderImpl.getKeyValue, and HFileDataBlock.next, and the behavior matches the original across all paths (exact match, in-range, before-block-first-key, end-of-block, and empty-loop). No correctness issues found. There is one naming suggestion on the inlined readInt calls in the inline comments; the rest of the diff is clean. Please take a look; after that, this should be ready for a Hudi committer or PMC member to take it from here.

cc @yihua

hudi-agent · 2026-06-16T14:31:27Z

      }
      long increment =
-          (long) KEY_OFFSET + (long) kv.getKeyLength() + (long) kv.getValueLength()
+          (long) KEY_OFFSET + (long) IOUtils.readInt(byteBuff, relativeOffset)


🤖 nit: could you extract these two readInt calls into named locals (e.g. entryKeyLength / entryValueLength) before computing increment? The original kv.getKeyLength() / kv.getValueLength() were self-documenting; the raw offset reads here require tracing back to the layout comment ~30 lines earlier to understand what each one returns.

AI-generated; verify before applying. React 👍/👎 to flag quality.

Done 35c1441
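For illustration, the named-locals shape suggested in the review could look roughly like the sketch below. The local names entryKeyLength and entryValueLength come from the review comment; the class, the constants, and the simplified layout (two 4-byte length fields followed by key and value) are assumptions, and any trailing bytes the real stride accounts for are omitted here.

```java
/** Hypothetical sketch of the stride computation with named locals. */
final class StrideSketch {
  // Assumed layout: [int keyLength][int valueLength][key bytes][value bytes].
  static final int KEY_LENGTH_FIELD_SIZE = 4;
  static final int KEY_OFFSET = 8;

  /** Reads a big-endian int from the buffer at the given offset. */
  static int readInt(byte[] buf, int offset) {
    return ((buf[offset] & 0xFF) << 24)
        | ((buf[offset + 1] & 0xFF) << 16)
        | ((buf[offset + 2] & 0xFF) << 8)
        | (buf[offset + 3] & 0xFF);
  }

  /** Distance in bytes from the start of this entry to the start of the next one. */
  static long strideToNextEntry(byte[] buf, int entryOffset) {
    // Naming each raw read documents which length field it returns.
    int entryKeyLength = readInt(buf, entryOffset);
    int entryValueLength = readInt(buf, entryOffset + KEY_LENGTH_FIELD_SIZE);
    return (long) KEY_OFFSET + (long) entryKeyLength + (long) entryValueLength;
  }
}
```

The widening to long before the addition follows the casts visible in the diff hunk above.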

hudi-agent

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR removes per-entry KeyValue allocation in HFileDataBlock.seekTo by comparing the entry key directly against the backing buffer and computing the stride from the on-disk length fields, materializing a KeyValue only on exact match and deferring otherwise to the cursor's lazy read. I traced the offset arithmetic (KEY_OFFSET + KEY_LENGTH_LENGTH matches the KeyValue layout), the comparison semantics (equivalent to compareKeys via IOUtils.compareTo), and the comp > 0 / end-of-block paths (lazy materialization through HFileReaderImpl.getKeyValue() preserves behavior). No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

danny0405 · 2026-06-17T03:25:21Z

+    // key may be a UTF8StringKey, so use the polymorphic content accessors (no 2-byte prefix).
+    byte[] lookupBytes = key.getBytes();
+    int lookupContentOffset = key.getContentOffset();
+    int lookupContentLength = key.getContentLength();


can we add some UTs to validate this.

Done 82abf7d. Added a focused test in TestHFileReader that writes a small-block-size HFile (several entries per data block) and validates seekTo across SEEK_TO_FOUND, SEEK_TO_IN_RANGE, SEEK_TO_BEFORE_FILE_FIRST_KEY, and SEEK_TO_EOF; the FOUND/IN_RANGE cases land mid-block so they exercise the new buffer-direct key comparison and the deferred cursor read.
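The "polymorphic content accessors" in the hunk above can be pictured with the following sketch. The classes are hypothetical stand-ins, not Hudi's Key or UTF8StringKey: one key type is assumed to carry a 2-byte content-length prefix, the other holds the raw content with no prefix, and both expose the same accessors so a comparison routine never needs to know which one it was given.

```java
/** Hypothetical stand-ins for a prefixed key and a prefix-free UTF-8 key. */
final class KeyAccessorSketch {
  /** Key whose bytes start with an assumed 2-byte big-endian content length. */
  static class PrefixedKey {
    static final int LENGTH_PREFIX_SIZE = 2;
    protected final byte[] bytes;

    PrefixedKey(byte[] bytes) {
      this.bytes = bytes;
    }

    byte[] getBytes() {
      return bytes;
    }

    int getContentOffset() {
      return LENGTH_PREFIX_SIZE;
    }

    int getContentLength() {
      return ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
    }
  }

  /** Key that stores only the content, so the content starts at offset 0. */
  static final class Utf8Key extends PrefixedKey {
    Utf8Key(String value) {
      super(value.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    @Override
    int getContentOffset() {
      return 0;
    }

    @Override
    int getContentLength() {
      return bytes.length;
    }
  }

  /** Compares key contents through the accessors only (unsigned, lexicographic). */
  static int compareContent(PrefixedKey a, PrefixedKey b) {
    byte[] aBytes = a.getBytes();
    byte[] bBytes = b.getBytes();
    int aOff = a.getContentOffset();
    int bOff = b.getContentOffset();
    int aLen = a.getContentLength();
    int bLen = b.getContentLength();
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int diff = (aBytes[aOff + i] & 0xFF) - (bBytes[bOff + i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return aLen - bLen;
  }
}
```

Reading the offset and length through the accessors, rather than assuming a fixed 2-byte skip, is what keeps a prefix-free lookup key from being compared two bytes off.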

hudi-agent

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR avoids per-entry KeyValue/Key allocation in HFileDataBlock.seekTo by comparing the entry key directly against the backing buffer and computing the stride from the on-disk length fields, materializing a KeyValue only on exact match. Layout, comparison semantics, and cursor lazy-read behavior all line up with the original code, and the new test exercises both buffer-direct comparison and the deferred-cursor in-range path across multiple data blocks. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

yihua

Blocked on my review, as this is a core change to the file format.

hudi-bot · 2026-06-17T06:10:45Z

CI report:

82abf7d Azure: SUCCESS

github-actions Bot added the size:S PR with lines of changes in (10, 100] label Jun 16, 2026

hudi-agent reviewed Jun 16, 2026

addressed review comments: extract entry key/value lengths into named locals in HFileDataBlock.seekTo (35c1441)

hudi-agent reviewed Jun 17, 2026

danny0405 assigned linliu-code Jun 17, 2026

danny0405 reviewed Jun 17, 2026

addressed review comments: add unit test for HFileDataBlock.seekTo buffer-direct scan (82abf7d)

hudi-agent reviewed Jun 17, 2026

yihua requested changes Jun 17, 2026

voonhous added the area:performance Performance optimizations label Jun 17, 2026

wombatu-kun wants to merge 3 commits into apache:master from wombatu-kun:perf-io-hfile-seekto-buffer-compare


7 participants
