Typed columns from ScanIterator down; LE layouts move to VortexFormat #199
Merged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Chunk's two parallel string-keyed maps (columns + columnDtypes) collapse into one SequencedMap<ColumnName, Chunk.Column>. The Array and its DType travel together in the Column carrier, so desync is unrepresentable, and schema order is now part of the contract (previously Map.of gave no order guarantee for 1-2 column chunks). ColumnName originates in ScanIterator.initialize(), parsed once from the file's DType.Struct (the parse edge has already policy-certified every name); ChunkSpec, layout lookups, zone-stat caches and the column-map builders are typed end to end. Public rims keep String sugar converted exactly once: chunk.column(String), RowFilter references, columnZoneStats. A policy-invalid query name fails fast with the policy message, since it could never match a certified column; valid-but-absent names keep the exact previous behavior and messages. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
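A minimal sketch of the carrier shape this commit describes, not the project's actual code. `ColumnNameSketch`, `ChunkSketch`, the non-blank name policy, the `long[]`/`String` stand-ins for `Array`/`DType`, and the `NoSuchElementException` for absent names are all assumptions made for illustration. An unmodifiable `LinkedHashMap` stands in for the `SequencedMap` so the sketch also compiles on JDKs older than 21.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for ColumnName: a value type that can only exist
// if it passed the (assumed) naming policy.
record ColumnNameSketch(String value) {
    ColumnNameSketch {
        // Assumed policy for illustration only: names must be non-blank.
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("invalid column name: '" + value + "'");
        }
    }

    static ColumnNameSketch of(String raw) {
        return new ColumnNameSketch(raw);
    }
}

// Hypothetical stand-in for Chunk.
final class ChunkSketch {
    // The array and its dtype live in one carrier, so there is no second
    // map that could drift out of sync with the first.
    record Column(long[] array, String dtype) {}

    private final Map<ColumnNameSketch, Column> columns;

    ChunkSketch(Map<ColumnNameSketch, Column> inSchemaOrder) {
        // Copying into a LinkedHashMap keeps the caller's insertion order,
        // which makes schema order observable through iteration.
        this.columns = Collections.unmodifiableMap(new LinkedHashMap<>(inSchemaOrder));
    }

    Map<ColumnNameSketch, Column> columns() {
        return columns;
    }

    // String sugar at the public rim: the name is converted exactly once.
    Column column(String name) {
        ColumnNameSketch typed = ColumnNameSketch.of(name); // fails fast on a policy-invalid name
        Column c = columns.get(typed);
        if (c == null) {
            // Assumption: the real behavior for absent names is not shown in the source.
            throw new NoSuchElementException("no such column: " + name);
        }
        return c;
    }
}
```

Because the record derives `equals` and `hashCode` from its `value`, a name parsed at the rim matches the key that was parsed from the schema.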
Review parity finding: as(String, ...) routes through ColumnName.of like column(String) but did not document the IllegalArgumentException. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
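To illustrate the parity this finding asks for, here is a hedged sketch in which both String-taking accessors carry the same `@throws` clause. `DocParitySketch`, the `certified` helper, the non-blank policy, and the `Class<T>` second parameter are hypothetical; the real signature of `as(String, ...)` is not shown in the source.

```java
import java.util.Map;

// Hypothetical stand-in for Chunk; only the Javadoc shape is the point here.
final class DocParitySketch {
    private final Map<String, Object> columns;

    DocParitySketch(Map<String, Object> columns) {
        this.columns = Map.copyOf(columns);
    }

    // Assumed policy for illustration only: names must be non-blank.
    private static String certified(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("invalid column name: '" + name + "'");
        }
        return name;
    }

    /**
     * Looks up a column by name.
     *
     * @throws IllegalArgumentException if {@code name} violates the column-name policy
     */
    Object column(String name) {
        return columns.get(certified(name));
    }

    /**
     * Looks up a column by name and casts it to {@code type}.
     *
     * @throws IllegalArgumentException if {@code name} violates the column-name policy
     * @throws ClassCastException if the column is not an instance of {@code type}
     */
    <T> T as(String name, Class<T> type) {
        // Routes through the same validation as column(String), so it
        // documents the same exception.
        return type.cast(column(name));
    }
}
```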
Endianness is a property of the wire format (trailer fields, spec-table indexes, scaffolding, and element values are all little-endian), so the shared unaligned LE layouts belong in VortexFormat next to the magic and trailer shape, not in PTypeIO (which keeps its real job: mapping ptypes onto those layouts). Sweeps 116 files onto the single source and deletes the six private copies (Trailer, LazyDecimalArray, GenericArray, ChunkedEncodingDecoder, PcoTansDecoder, PcoEncodingDecoder), two of which used reversed names (SHORT_LE) for byte-identical constants. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
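The ownership split described here can be sketched as follows. The source does not show how the `LE_*` constants are defined, so this sketch swaps in byte-array-view `VarHandle`s as a stand-in for the project's layouts. `FormatSketch` and `PTypeIOSketch` are hypothetical names, and only the idea (byte order stated once in the format class, ptype code merely mapping onto it) is taken from the commit message.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical stand-in for VortexFormat: the one place byte order is stated.
final class FormatSketch {
    static final VarHandle LE_SHORT =
        MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle LE_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle LE_LONG =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private FormatSketch() {}
}

// Hypothetical stand-in for PTypeIO: maps element types onto the shared
// accessors and declares no byte order of its own.
final class PTypeIOSketch {
    private PTypeIOSketch() {}

    // Plain get on a byte-array view is permitted at misaligned offsets.
    static short readI16(byte[] buf, int offset) {
        return (short) FormatSketch.LE_SHORT.get(buf, offset);
    }

    static int readI32(byte[] buf, int offset) {
        return (int) FormatSketch.LE_INT.get(buf, offset);
    }

    static long readI64(byte[] buf, int offset) {
        return (long) FormatSketch.LE_LONG.get(buf, offset);
    }
}
```

With one set of constants, a decoder that needs a little-endian short has nothing to redeclare, which is how duplicates with diverging names such as `SHORT_LE` arise in the first place.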
Found by IntelliJ inspections (value of colIdx is always 0) — dead generality from the typed-columns refactor. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
Two refactors completing the "strings at the boundary, types inside" arc, plus the new docs-with-every-change rule written into CLAUDE.md.
`f8ad15d1`: typed columns from `ScanIterator` down. `Chunk`'s two parallel string-keyed maps (`columns` + `columnDtypes`) collapse into one unmodifiable `SequencedMap<ColumnName, Chunk.Column>`. Each column's `Array` and `DType` travel together, so desync is unrepresentable, and schema order is now part of the contract (previously `Map.of` gave no order guarantee for 1–2 column chunks). `ColumnName` originates once in `ScanIterator.initialize()` from the file's already-policy-certified schema and flows typed through `ChunkSpec`, layout lookups, and zone caches. Public rims keep String sugar (`chunk.column(String)`, `RowFilter`, `columnZoneStats`) converted exactly once; a policy-invalid query name fails fast, since it could never match a certified column. Compute kernels verified: `ColumnName` construction is per-chunk-per-column, never per-row.

`c060d34f`: `LE_*` layouts move `PTypeIO` → `VortexFormat`. Endianness is a wire-format fact, not a ptype fact; `VortexFormat` (magic, trailer, version) is where format facts live. 116 files sweep onto the single source; six private duplicates deleted, two of which spelled the identical constant `SHORT_LE` instead of `LE_SHORT`.

Also rides along: the CLAUDE.md "Documentation is part of every change" section (`8a81331f`), with `docs/reference.md` and CHANGELOG updated in-branch for both refactors, plus the review-parity `@throws` fix on `Chunk.as` (`0c61bd42`).

Verification
`./mvnw verify` green after every commit: all modules including the Rust-interop failsafe suite (no wire changes; the oracle is a pure regression check here).
… `@throws` parity) applied; hot-path allocation audit explicitly confirmed clean.
`./mvnw javadoc:javadoc -pl core`: zero output.

🤖 Generated with Claude Code