3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -17,6 +17,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

+- The little-endian `ValueLayout` constants moved from `PTypeIO.LE_*` to `VortexFormat.LE_*` — endianness is a property of the wire format, not of ptypes, and `VortexFormat` is where format facts live. Six classes that carried private copies (including reversed-name duplicates like `SHORT_LE`) now share the single source; nothing outside `VortexFormat` defines a `withOrder(LITTLE_ENDIAN)` layout. ([c060d34f](https://github.com/dfa1/vortex-java/commit/c060d34f))
+- `Chunk.columns()` returns an order-preserving `SequencedMap<ColumnName, Chunk.Column>` — one map instead of two parallel string-keyed maps, with each column's `Array` and `DType` traveling together in the `Column` carrier. `column(String)` stays as boundary sugar (plus a `column(ColumnName)` overload); iteration order is the schema/projection order, now guaranteed even for 1–2 column chunks. Typed names originate in `ScanIterator` from the file's already-certified schema. ([f8ad15d1](https://github.com/dfa1/vortex-java/commit/f8ad15d1))
+
- `Compute.filteredSum` over a dictionary-encoded filter column is ~20× faster (best runs ~30×): the predicate is resolved against the dictionary's value pool once and the raw `u8` codes are scanned directly from their backing segments, instead of decoding every row through the per-element accessor — a fused `SUM(measure) WHERE category = …` over 100M rows drops from ~760 ms to ~38 ms. ([85e251cc](https://github.com/dfa1/vortex-java/commit/85e251cc))
- `Compute.filteredAggregate` takes the same dictionary code-scan lane (`COUNT(*)` included) — ~22× faster on the same workload (~980 ms → ~46 ms over 100M rows), which the Calcite `WHERE`-filtered aggregate push-down inherits on its boundary chunks. ([145791c7](https://github.com/dfa1/vortex-java/commit/145791c7), [6e6d7dd0](https://github.com/dfa1/vortex-java/commit/6e6d7dd0))
- A multi-column `AND` filter no longer forfeits the dictionary lane: the dict-encoded leaf drives the code scan and the remaining predicates are evaluated only on its matches — `SUM(…) WHERE category = 7 AND price > 500` over 100M rows drops from ~2.3 s to ~200 ms (~11×). ([12e13501](https://github.com/dfa1/vortex-java/commit/12e13501))
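
The dictionary code-scan lane described in the entries above can be illustrated with a minimal sketch. This is not the project's `Compute.filteredSum`: the class `DictCodeScanSketch`, its method signature, and the plain arrays standing in for backing segments are assumptions made purely for illustration. It shows the two steps the entry names, resolving the predicate against the value pool once and then scanning the raw `u8` codes.

```java
// Hypothetical sketch; names and types are illustrative, not the vortex-java API.
final class DictCodeScanSketch {

    /**
     * Sums measure[i] for every row whose dictionary-encoded value equals `wanted`.
     * `pool` is the dictionary's value pool, `codes` holds one unsigned 8-bit code per row.
     */
    static long filteredSum(String[] pool, byte[] codes, long[] measure, String wanted) {
        // Step 1: evaluate the predicate once per dictionary entry, not once per row.
        boolean[] matches = new boolean[pool.length];
        for (int c = 0; c < pool.length; c++) {
            matches[c] = pool[c].equals(wanted);
        }
        // Step 2: scan the raw codes; each code is only an index into the match table.
        long sum = 0;
        for (int i = 0; i < codes.length; i++) {
            int code = Byte.toUnsignedInt(codes[i]); // u8 codes are unsigned
            if (matches[code]) {
                sum += measure[i];
            }
        }
        return sum;
    }
}
```

The per-row work is a byte load and a table lookup, so no row value is decoded; that is the general reason such a lane can beat a per-element accessor, although the speedups quoted above are the project's own measurements and are not reproduced by this sketch.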
9 changes: 9 additions & 0 deletions CLAUDE.md
@@ -206,6 +206,15 @@ in the Rust source for the exact schema, then implement from spec.
- **POM deps** grouped with comments: `<!-- production -->` then `<!-- testing -->`, each with
project-internal (`io.github.dfa1.vortex:*`) deps first, then external. Omit empty sections.

+## Documentation is part of every change
+
+Living docs ship in the same commit/PR as the change they describe — never as a follow-up
+sweep. A change touching public API, module structure, wire behavior, or policy updates
+whichever apply: `docs/reference.md`, `docs/compatibility.md`, the CLAUDE.md module map /
+design decisions, and CHANGELOG (per its own rules). Historical records (`adr/`, released
+CHANGELOG sections) are exempt — they describe the past. Docs drift is a bug (2026-07-04: a
+single audit found phantom APIs, dead service files, and pre-refactor FQNs across four files).
+
## Code style

- 4-space indent, **zero SonarQube bugs/smells**, no `sun.misc.Unsafe` or internal JDK APIs.
@@ -3,6 +3,7 @@
import io.github.dfa1.vortex.reader.layout.Layout;
import io.github.dfa1.vortex.reader.SegmentSpec;
import io.github.dfa1.vortex.core.model.DType;
+import io.github.dfa1.vortex.core.model.ColumnName;
import io.github.dfa1.vortex.reader.array.Array;
import io.github.dfa1.vortex.cli.tui.term.Ansi;
import io.github.dfa1.vortex.cli.tui.term.Key;
@@ -766,11 +767,12 @@ private void runDataLoad(String columnName) {
return;
}
try (Chunk chunk = it.next()) {
-Array array = chunk.columns().get(columnName);
-if (array == null) {
+Chunk.Column column = chunk.columns().get(ColumnName.of(columnName));
+if (column == null) {
dataCache.put(columnName, new DataState.Loaded(List.of()));
return;
}
+Array array = column.array();
int n = (int) Math.min(array.length(), DATA_PREVIEW_ROWS);
List<String> out = new ArrayList<>(n);
for (int i = 0; i < n; i++) {
@@ -2,6 +2,7 @@

import io.github.dfa1.vortex.core.model.PType;
import io.github.dfa1.vortex.core.model.EncodingId;
+import io.github.dfa1.vortex.core.io.VortexFormat;
import io.github.dfa1.vortex.core.io.PTypeIO;
import io.github.dfa1.vortex.core.error.VortexException;

@@ -97,7 +98,7 @@ public static long[] toLongs(Object data, PType ptype, EncodingId encoding) {
public static MemorySegment fromLongs(long[] longs, PType ptype, SegmentAllocator arena) {
if (ptype == PType.I64 || ptype == PType.U64) {
MemorySegment dst = arena.allocate((long) longs.length * 8);
-MemorySegment.copy(MemorySegment.ofArray(longs), ValueLayout.JAVA_LONG, 0L, dst, PTypeIO.LE_LONG, 0L, longs.length);
+MemorySegment.copy(MemorySegment.ofArray(longs), ValueLayout.JAVA_LONG, 0L, dst, VortexFormat.LE_LONG, 0L, longs.length);
return dst;
}
int n = longs.length;
@@ -1,10 +1,10 @@
package io.github.dfa1.vortex.core.fbs;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_DOUBLE;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_FLOAT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_INT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_LONG;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_SHORT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_DOUBLE;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_FLOAT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_LONG;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_SHORT;

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
10 changes: 5 additions & 5 deletions core/src/main/java/io/github/dfa1/vortex/core/fbs/FbsStruct.java
@@ -1,10 +1,10 @@
package io.github.dfa1.vortex.core.fbs;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_DOUBLE;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_FLOAT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_INT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_LONG;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_SHORT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_DOUBLE;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_FLOAT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_LONG;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_SHORT;

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
10 changes: 5 additions & 5 deletions core/src/main/java/io/github/dfa1/vortex/core/fbs/FbsTable.java
@@ -1,10 +1,10 @@
package io.github.dfa1.vortex.core.fbs;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_DOUBLE;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_FLOAT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_INT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_LONG;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_SHORT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_DOUBLE;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_FLOAT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_LONG;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_SHORT;

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
22 changes: 8 additions & 14 deletions core/src/main/java/io/github/dfa1/vortex/core/io/PTypeIO.java
@@ -9,7 +9,12 @@
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;
-import java.nio.ByteOrder;

+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_DOUBLE;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_FLOAT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_LONG;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_SHORT;

/// Bulk I/O helpers for primitive ptypes, backed by `MemorySegment`/`ValueLayout`/`MethodHandle`.
///
@@ -18,21 +23,10 @@
/// the unaligned little-endian VarHandle. This lets hot loops avoid per-element `switch`
/// dispatch on `PType`.
///
-/// The LE_* layout constants are public so callers outside this package can share them
-/// without duplicating the `withOrder(LITTLE_ENDIAN)` boilerplate.
+/// The shared little-endian layouts live in [VortexFormat] — endianness is a property of the
+/// wire format, not of ptypes; this class only maps ptypes onto those layouts.
public final class PTypeIO {

-/// Unaligned little-endian layout for 16-bit shorts.
-public static final ValueLayout.OfShort LE_SHORT = ValueLayout.JAVA_SHORT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
-/// Unaligned little-endian layout for 32-bit ints.
-public static final ValueLayout.OfInt LE_INT = ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
-/// Unaligned little-endian layout for 64-bit longs.
-public static final ValueLayout.OfLong LE_LONG = ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
-/// Unaligned little-endian layout for 32-bit floats.
-public static final ValueLayout.OfFloat LE_FLOAT = ValueLayout.JAVA_FLOAT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
-/// Unaligned little-endian layout for 64-bit doubles.
-public static final ValueLayout.OfDouble LE_DOUBLE = ValueLayout.JAVA_DOUBLE_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

private static final MethodHandle[] SETTERS = buildSetters();

private PTypeIO() {
19 changes: 19 additions & 0 deletions core/src/main/java/io/github/dfa1/vortex/core/io/VortexFormat.java
@@ -1,6 +1,9 @@
package io.github.dfa1.vortex.core.io;

import java.lang.foreign.MemorySegment;
+import java.lang.foreign.ValueLayout;

+import java.nio.ByteOrder;

/// Wire-format constants for the Vortex file format.
///
@@ -24,6 +27,22 @@ public final class VortexFormat {
/// Files with any other version are rejected up front rather than silently mis-parsed.
public static final int VERSION = 1;

+// All multi-byte integers in the Vortex wire format are little-endian — trailer fields,
+// spec-table indexes, buffer scaffolding, and element values alike. These unaligned
+// little-endian layouts are the single source for every wire read/write; nothing outside
+// this class defines its own withOrder(LITTLE_ENDIAN) copy.
+
+/// Unaligned little-endian layout for 16-bit shorts.
+public static final ValueLayout.OfShort LE_SHORT = ValueLayout.JAVA_SHORT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
+/// Unaligned little-endian layout for 32-bit ints.
+public static final ValueLayout.OfInt LE_INT = ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
+/// Unaligned little-endian layout for 64-bit longs.
+public static final ValueLayout.OfLong LE_LONG = ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
+/// Unaligned little-endian layout for 32-bit floats.
+public static final ValueLayout.OfFloat LE_FLOAT = ValueLayout.JAVA_FLOAT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
+/// Unaligned little-endian layout for 64-bit doubles.
+public static final ValueLayout.OfDouble LE_DOUBLE = ValueLayout.JAVA_DOUBLE_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
+
private VortexFormat() {
}
}
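
To make the effect of these constants concrete, here is a small self-contained sketch using only the JDK's foreign-memory API (finalized in JDK 22, so that version or later is assumed). The class `LittleEndianLayoutSketch` and its helper methods are hypothetical; only the `LE_LONG` definition is copied from the diff above. It shows that an unaligned little-endian layout stores the lowest byte first and accepts a byte offset that is not a multiple of 8.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

// Hypothetical sketch; only LE_LONG mirrors the constant defined in VortexFormat.
final class LittleEndianLayoutSketch {

    static final ValueLayout.OfLong LE_LONG =
            ValueLayout.JAVA_LONG_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    /** Writes `value` at byte offset 1 and returns the first byte that was stored. */
    static byte firstByteOf(long value) {
        MemorySegment seg = MemorySegment.ofArray(new byte[Long.BYTES + 1]);
        // Offset 1 is misaligned for a long; the unaligned layout permits it.
        seg.set(LE_LONG, 1, value);
        return seg.get(ValueLayout.JAVA_BYTE, 1);
    }

    /** Writes and reads back `value` through the same layout. */
    static long roundTrip(long value) {
        MemorySegment seg = MemorySegment.ofArray(new byte[Long.BYTES + 1]);
        seg.set(LE_LONG, 1, value);
        return seg.get(LE_LONG, 1);
    }
}
```

For `0x0102_0304_0506_0708L` the first stored byte is `0x08`, which is the same property the `fromLongs_i64_writesLittleEndian` test in this PR asserts.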
@@ -2,7 +2,7 @@

import io.github.dfa1.vortex.core.error.VortexException;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_SHORT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_SHORT;

import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
@@ -1,7 +1,7 @@
package io.github.dfa1.vortex.core.proto;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_INT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_LONG;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_LONG;
import java.io.IOException;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
@@ -1,7 +1,7 @@
package io.github.dfa1.vortex.core.compute;

import io.github.dfa1.vortex.core.model.EncodingId;
-import io.github.dfa1.vortex.core.io.PTypeIO;
+import io.github.dfa1.vortex.core.io.VortexFormat;

import io.github.dfa1.vortex.core.model.PType;
import io.github.dfa1.vortex.core.error.VortexException;
@@ -142,7 +142,7 @@ void fromLongs_i64_writesLittleEndian() {

// Then it is stored little-endian (lowest byte first)
assertThat(seg.get(ValueLayout.JAVA_BYTE, 0)).isEqualTo((byte) 0x08);
-assertThat(seg.getAtIndex(PTypeIO.LE_LONG, 0)).isEqualTo(0x0102_0304_0506_0708L);
+assertThat(seg.getAtIndex(VortexFormat.LE_LONG, 0)).isEqualTo(0x0102_0304_0506_0708L);
}
}

@@ -164,9 +164,9 @@ void fromLongs_narrowWidth_keepsOnlyLowBytes() {
private static long readElement(MemorySegment seg, PType ptype, int i) {
return switch (ptype) {
case I8, U8 -> seg.get(ValueLayout.JAVA_BYTE, i);
-case I16, U16 -> seg.getAtIndex(PTypeIO.LE_SHORT, i);
-case I32, U32 -> seg.getAtIndex(PTypeIO.LE_INT, i);
-case I64, U64 -> seg.getAtIndex(PTypeIO.LE_LONG, i);
+case I16, U16 -> seg.getAtIndex(VortexFormat.LE_SHORT, i);
+case I32, U32 -> seg.getAtIndex(VortexFormat.LE_INT, i);
+case I64, U64 -> seg.getAtIndex(VortexFormat.LE_LONG, i);
default -> throw new IllegalArgumentException("not an integer ptype: " + ptype);
};
}
@@ -5,11 +5,11 @@

import java.lang.foreign.MemorySegment;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_DOUBLE;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_FLOAT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_INT;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_LONG;
-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_SHORT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_DOUBLE;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_FLOAT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_LONG;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_SHORT;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
@@ -1,6 +1,6 @@
package io.github.dfa1.vortex.encoding;

-import io.github.dfa1.vortex.core.io.PTypeIO;
+import io.github.dfa1.vortex.core.io.VortexFormat;

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
@@ -16,39 +16,39 @@ private TestSegments() {
public static MemorySegment leLongs(long... values) {
MemorySegment seg = Arena.ofAuto().allocate((long) values.length * Long.BYTES);
for (int i = 0; i < values.length; i++) {
-seg.setAtIndex(PTypeIO.LE_LONG, i, values[i]);
+seg.setAtIndex(VortexFormat.LE_LONG, i, values[i]);
}
return seg;
}

public static MemorySegment leInts(int... values) {
MemorySegment seg = Arena.ofAuto().allocate((long) values.length * Integer.BYTES);
for (int i = 0; i < values.length; i++) {
-seg.setAtIndex(PTypeIO.LE_INT, i, values[i]);
+seg.setAtIndex(VortexFormat.LE_INT, i, values[i]);
}
return seg;
}

public static MemorySegment leDoubles(double... values) {
MemorySegment seg = Arena.ofAuto().allocate((long) values.length * Double.BYTES);
for (int i = 0; i < values.length; i++) {
-seg.setAtIndex(PTypeIO.LE_DOUBLE, i, values[i]);
+seg.setAtIndex(VortexFormat.LE_DOUBLE, i, values[i]);
}
return seg;
}

public static MemorySegment leFloats(float... values) {
MemorySegment seg = Arena.ofAuto().allocate((long) values.length * Float.BYTES);
for (int i = 0; i < values.length; i++) {
-seg.setAtIndex(PTypeIO.LE_FLOAT, i, values[i]);
+seg.setAtIndex(VortexFormat.LE_FLOAT, i, values[i]);
}
return seg;
}

public static MemorySegment leShorts(short... values) {
MemorySegment seg = Arena.ofAuto().allocate((long) values.length * Short.BYTES);
for (int i = 0; i < values.length; i++) {
-seg.setAtIndex(PTypeIO.LE_SHORT, i, values[i]);
+seg.setAtIndex(VortexFormat.LE_SHORT, i, values[i]);
}
return seg;
}
21 changes: 14 additions & 7 deletions docs/reference.md
@@ -183,13 +183,20 @@ columnar buffers; closing the chunk releases the arena. After `close()`, touchin
any `Array` previously returned by `column(...)` or `columns()` raises FFM's scope
check (`IllegalStateException`).

-| Method | Notes |
-|-----------------------------------------|----------------------------------------------------------|
-| `rowCount()` | Rows in this chunk |
-| `columns()` | All columns in this chunk |
-| `<T extends Array> column(String name)` | Typed column lookup; throws `VortexException` if unknown |
-| `isClosed()` | Whether `close()` has run |
-| `close()` | Releases the chunk's arena. Idempotent. |
+Columns are stored as one order-preserving map keyed by the validated [`ColumnName`]; each
+entry is a `Chunk.Column(Array array, DType dtype)` carrier, so a column's data and type can
+never desync. `column(String)` is boundary sugar: the name is wrapped in a `ColumnName` (a
+policy-invalid name fails fast — it could never match a certified column).
+
+| Method | Notes |
+|---------------------------------------------|----------------------------------------------------------|
+| `rowCount()` | Rows in this chunk |
+| `columns()` | `SequencedMap<ColumnName, Chunk.Column>`, schema order, unmodifiable |
+| `<T extends Array> column(String name)` | Typed column lookup; throws `VortexException` if absent |
+| `<T extends Array> column(ColumnName name)` | Same, for callers that validated early |
+| `as(String name, Class<T> domainType)` | Extension column → typed `List<T>` |
+| `isClosed()` | Whether `close()` has run |
+| `close()` | Releases the chunk's arena. Idempotent. |
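
The shape described above (one ordered map from a validated name to an array-plus-type carrier) can be sketched with stand-in types. Everything in this block is hypothetical: `ColumnsMapSketch`, its nested `ColumnName` and `Column` records, the blank-name check, and the `long[]`/`String` fields are simplified placeholders, not the real `Chunk`, `ColumnName`, `Array`, or `DType`. JDK 21 or later is assumed for `SequencedMap`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.SequencedMap;

// Hypothetical stand-ins for illustration only; not the vortex-java types.
final class ColumnsMapSketch {

    /** Validated name: an invalid name fails at construction, before any lookup. */
    record ColumnName(String value) {
        ColumnName {
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException("invalid column name");
            }
        }

        static ColumnName of(String value) {
            return new ColumnName(value);
        }
    }

    /** Carrier keeping a column's data and its type together. */
    record Column(long[] array, String dtype) {
    }

    /** Builds an unmodifiable map whose iteration order is the insertion (schema) order. */
    static SequencedMap<ColumnName, Column> columns() {
        SequencedMap<ColumnName, Column> map = new LinkedHashMap<>();
        map.put(ColumnName.of("price"), new Column(new long[]{500, 700}, "i64"));
        map.put(ColumnName.of("category"), new Column(new long[]{7, 3}, "u8"));
        return Collections.unmodifiableSequencedMap(map);
    }
}
```

A lookup such as `columns().get(ColumnName.of("category"))` returns the carrier or `null`, which mirrors the null check in the `runDataLoad` hunk earlier in this PR; because the key is a record, equality is by value and a freshly wrapped name matches the stored one.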

---

@@ -1,6 +1,6 @@
package io.github.dfa1.vortex.inspect;

-import static io.github.dfa1.vortex.core.io.PTypeIO.LE_INT;
+import static io.github.dfa1.vortex.core.io.VortexFormat.LE_INT;

import io.github.dfa1.vortex.reader.ArrayStats;
import io.github.dfa1.vortex.core.model.DType;