Commit c6cf517

MINOR: Add a description document for batchLength (#20140)
Add documentation for Batch Format to explain the meaning of batchLength.

Preview of the rendered page after the change: ![image](https://github.com/user-attachments/assets/85023c48-64e6-4a33-898f-df84f6864e58)

Reviewers: Ken Huang <[email protected]>, Jhen-Yung Hsu <[email protected]>, Chia-Ping Tsai <[email protected]>
1 parent 9f09242 commit c6cf517

1 file changed (+3, -0 lines)

docs/implementation.html

Lines changed: 3 additions & 0 deletions
@@ -59,6 +59,9 @@ <h4 class="anchor-heading"><a id="recordbatch" class="anchor-link"></a><a href="
 records: [Record]</code></pre>
 <p> Note that when compression is enabled, the compressed record data is serialized directly following the count of the number of records. </p>
 
+<p>batchLength represents the number of bytes from the current position (immediately after the batchLength field) to the end of the batch.
+In other words, the total size of a record batch on disk is batchLength + 12 bytes, which includes the 8-byte baseOffset and the 4-byte batchLength field itself.</p>
+
 <p>The CRC covers the data from the attributes to the end of the batch (i.e. all the bytes that follow the CRC). It is located after the magic byte, which
 means that clients must parse the magic byte before deciding how to interpret the bytes between the batch length and the magic byte. The partition leader
 epoch field is not included in the CRC computation to avoid the need to recompute the CRC when this field is assigned for every batch that is received by
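The arithmetic in the added paragraph (total batch size = batchLength + 12) is what lets a reader skip from one batch header to the next without decoding any records. Below is a minimal Java sketch, assuming only that each batch starts with the 8-byte baseOffset followed by the 4-byte batchLength, both big-endian (the default byte order of `java.nio.ByteBuffer`). The class, method, and constant names are hypothetical and are not Kafka's API. The sample batches in `main` are filler bytes, not valid record batches (no magic byte, CRC, or records).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: walk concatenated record batches using only batchLength.
public class BatchLengthSketch {
    // 8-byte baseOffset + 4-byte batchLength; batchLength does not count these 12 bytes.
    static final int LOG_OVERHEAD = 8 + 4;

    // Total size of the batch starting at `position`, i.e. batchLength + 12.
    static int totalBatchSize(ByteBuffer buf, int position) {
        int batchLength = buf.getInt(position + 8); // absolute read just past baseOffset
        return batchLength + LOG_OVERHEAD;
    }

    // Collects the baseOffset of every complete batch in the buffer.
    static List<Long> baseOffsets(ByteBuffer buf) {
        List<Long> offsets = new ArrayList<>();
        int position = 0;
        while (buf.limit() - position >= LOG_OVERHEAD) {
            int size = totalBatchSize(buf, position);
            // Stop on a corrupt (negative) length or a trailing partial batch.
            if (size < LOG_OVERHEAD || size > buf.limit() - position) {
                break;
            }
            offsets.add(buf.getLong(position));
            position += size; // jump directly to the next batch header
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two fake batches: batchLength 20 and 5, so total sizes 32 and 17 bytes.
        ByteBuffer buf = ByteBuffer.allocate(2 * LOG_OVERHEAD + 20 + 5);
        buf.putLong(100L).putInt(20).put(new byte[20]);
        buf.putLong(130L).putInt(5).put(new byte[5]);
        buf.flip();
        System.out.println(totalBatchSize(buf, 0)); // 32
        System.out.println(baseOffsets(buf));       // [100, 130]
    }
}
```

Reading with absolute `getInt`/`getLong` leaves the buffer's position untouched, so the scan never consumes bytes it only peeks at. The bounds check matters because a batch whose batchLength + 12 exceeds the remaining bytes is incomplete and must not be parsed.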
