Add Tablet.serializedSize() and comprehensive size validation tests. #824

Draft
luoluoyuyu wants to merge 1 commit into apache:develop from luoluoyuyu:tablet-serialized-size

Conversation

@luoluoyuyu
Member

Pre-allocate serialization buffer using exact size estimation, support OBJECT type in tablet serialize/deserialize path, and consolidate serializedSize tests.

@luoluoyuyu luoluoyuyu marked this pull request as draft May 26, 2026 09:02
    private int serializedSizeOfTimes() {
      int size = Byte.BYTES;
      if (timestamps != null) {
        size += (long) Long.BYTES * rowSize;
@Caideyipi
Contributor

I found a functional issue.

Tablet.serializedSize() claims to return the exact serialized byte size, but it uses
ReadWriteIOUtils.sizeToWrite(insertTargetName) to calculate string sizes. That helper uses s.getBytes(), which
depends on the platform default charset. The actual serialization path uses ReadWriteIOUtils.write(String, ...),
which encodes strings with TSFileConfig.STRING_CHARSET (UTF-8).

So when the device/table name, measurement name, or schema properties contain non-ASCII characters, serializedSize()
can differ from the real serialized size if the process default charset is not UTF-8.
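A minimal sketch of the mismatch, using only the JDK. The class name, helper, and sample string are hypothetical and not taken from the PR; ISO-8859-1 simply stands in for "some non-UTF-8 default charset".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSizeMismatch {
  // Number of bytes s occupies when encoded with the given charset.
  static int encodedLength(String s, Charset charset) {
    return s.getBytes(charset).length;
  }

  public static void main(String[] args) {
    // Hypothetical measurement name containing one non-ASCII character (e with acute accent).
    String name = "caf\u00e9";

    // UTF-8 encodes U+00E9 as two bytes, ISO-8859-1 as one.
    System.out.println("UTF-8:      " + encodedLength(name, StandardCharsets.UTF_8));
    System.out.println("ISO-8859-1: " + encodedLength(name, StandardCharsets.ISO_8859_1));

    // No-argument getBytes() uses whatever the platform default happens to be.
    System.out.println("default (" + Charset.defaultCharset() + "): " + name.getBytes().length);
  }
}
```

If the size estimate follows the last line while the writer follows the first, the two agree only when the default charset is UTF-8 or the string is pure ASCII.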

This is probably not an issue when TsFile is used through IoTDB, because IoTDB startup sets the default charset. But
TsFile can also be used independently, and in standalone usage this can make the size estimate incorrect and break the
“exact size” guarantee.

Suggested fix: make ReadWriteIOUtils.sizeToWrite(String) use TSFileConfig.STRING_CHARSET, consistent with the
write path, and add a non-ASCII name test.
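Roughly what I have in mind, as a standalone sketch rather than a patch. Everything here is a stand-in: STRING_CHARSET mirrors TSFileConfig.STRING_CHARSET, and the layout (4-byte length prefix followed by the encoded bytes, non-null input only) is my assumption about the write path and should be checked against the real ReadWriteIOUtils.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringSizeSketch {
  // Stand-in for TSFileConfig.STRING_CHARSET (UTF-8).
  static final Charset STRING_CHARSET = StandardCharsets.UTF_8;

  // Size estimate that uses the same charset as the writer below.
  // Assumed layout: int length prefix + encoded bytes.
  static int sizeToWrite(String s) {
    return Integer.BYTES + s.getBytes(STRING_CHARSET).length;
  }

  // Hypothetical writer with the assumed layout; returns the bytes actually produced.
  static int serializedLength(String s) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      byte[] bytes = s.getBytes(STRING_CHARSET);
      out.writeInt(bytes.length);
      out.write(bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.size();
  }

  public static void main(String[] args) {
    String name = "caf\u00e9"; // hypothetical non-ASCII name
    System.out.println(sizeToWrite(name) == serializedLength(name));
  }
}
```

A non-ASCII name test along these lines would compare the estimate with the length of the bytes actually written, which holds regardless of the process default charset.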

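There is also a CodeQL alert for integer narrowing/overflow in serializedSizeOfTimes(): the compound assignment
size += (long) Long.BYTES * rowSize implicitly casts the long result back to int. Since this method is
intended to return an exact byte size, that should probably be handled as well.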
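To illustrate the narrowing, here is a self-contained sketch. The class and method names are hypothetical, the timestamps null check from the PR is left out, and throwing on overflow is only one possible way to handle it (returning long would be another).

```java
public class TimesSizeSketch {
  // Same shape as the PR snippet: "+=" on an int silently narrows the long result.
  static int narrowingSize(int rowSize) {
    int size = Byte.BYTES;
    size += (long) Long.BYTES * rowSize;
    return size;
  }

  // Alternative: do the arithmetic in long and fail loudly if it does not fit in an int.
  static int checkedSize(int rowSize) {
    long size = Byte.BYTES + (long) Long.BYTES * rowSize;
    return Math.toIntExact(size); // throws ArithmeticException on overflow
  }

  public static void main(String[] args) {
    int rows = 300_000_000; // 8 bytes * 300M rows exceeds Integer.MAX_VALUE
    System.out.println("narrowing: " + narrowingSize(rows));
    try {
      System.out.println("checked:   " + checkedSize(rows));
    } catch (ArithmeticException e) {
      System.out.println("checked:   overflow detected");
    }
  }
}
```

With the narrowing version a large enough rowSize yields a wrapped, negative size instead of an error, which then feeds into buffer pre-allocation.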
