Merged
14 changes: 13 additions & 1 deletion vortex-btrblocks/src/compressor/mod.rs
@@ -1,7 +1,19 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright the Vortex contributors

//! Compressor traits for type-specific compression.
//! Type-specific compressor traits that drive scheme selection and compression.
//!
//! [`Compressor`] defines the interface: generate statistics for an array via
//! [`Compressor::gen_stats`], and provide available [`Scheme`]s via [`Compressor::schemes`].
//!
//! [`CompressorExt`] is blanket-implemented for all `Compressor`s and adds the core logic:
//!
//! - [`CompressorExt::choose_scheme`] iterates all schemes, skips excluded ones, and calls
//! [`Scheme::expected_compression_ratio`] on each. It returns the scheme with the highest ratio
//! above 1.0, or falls back to the default scheme. See the [`scheme`](crate::scheme) module for
//! how ratio estimation works.
//! - [`CompressorExt::compress`] generates stats, calls `choose_scheme()`, and applies the
//! result. If compression did not shrink the array, the original is returned.

use vortex_array::ArrayRef;
use vortex_array::IntoArray;
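As a reading aid for the `compressor` module docs above, here is a minimal, self-contained model of the described `choose_scheme`/`compress` behavior. It is a sketch under stated assumptions, not the crate's code: `Array` (an encoding name plus a byte size), `Fixed`, `IntCompressor`, `int_compressor`, and `default_scheme` are hypothetical stand-ins, stats generation (`gen_stats`) is omitted, and the excluded set is modeled as a `HashSet` of scheme names.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `ArrayRef`: an encoding name and a size in bytes.
#[derive(Debug, Clone, PartialEq)]
struct Array {
    encoding: &'static str,
    nbytes: usize,
}

/// Simplified, hypothetical version of the `Scheme` interface.
trait Scheme {
    fn id(&self) -> &'static str;
    fn expected_compression_ratio(&self, array: &Array) -> f64;
    fn compress(&self, array: &Array) -> Array;
}

/// Simplified, hypothetical version of the `Compressor` interface (stats omitted).
trait Compressor {
    fn schemes(&self) -> &[Box<dyn Scheme>];
    fn default_scheme(&self) -> &dyn Scheme;
}

/// Selection logic, blanket-implemented for every `Compressor`.
trait CompressorExt: Compressor {
    /// Best non-excluded scheme with an estimated ratio above 1.0, else the default.
    fn choose_scheme(&self, array: &Array, excludes: &HashSet<&'static str>) -> &dyn Scheme {
        let mut best: Option<(&dyn Scheme, f64)> = None;
        for scheme in self.schemes() {
            if excludes.contains(scheme.id()) {
                continue;
            }
            let ratio = scheme.expected_compression_ratio(array);
            if ratio > 1.0 && best.map_or(true, |(_, r)| ratio > r) {
                best = Some((&**scheme, ratio));
            }
        }
        best.map_or(self.default_scheme(), |(s, _)| s)
    }

    /// Apply the chosen scheme; keep the input if the result is not smaller.
    fn compress(&self, array: &Array, excludes: &HashSet<&'static str>) -> Array {
        let compressed = self.choose_scheme(array, excludes).compress(array);
        if compressed.nbytes < array.nbytes { compressed } else { array.clone() }
    }
}

impl<C: Compressor + ?Sized> CompressorExt for C {}

/// Toy scheme with a fixed, made-up ratio.
struct Fixed {
    id: &'static str,
    ratio: f64,
}

impl Scheme for Fixed {
    fn id(&self) -> &'static str {
        self.id
    }
    fn expected_compression_ratio(&self, _array: &Array) -> f64 {
        self.ratio
    }
    fn compress(&self, array: &Array) -> Array {
        Array { encoding: self.id, nbytes: (array.nbytes as f64 / self.ratio) as usize }
    }
}

struct IntCompressor {
    schemes: Vec<Box<dyn Scheme>>,
    default: Fixed,
}

impl Compressor for IntCompressor {
    fn schemes(&self) -> &[Box<dyn Scheme>] {
        &self.schemes
    }
    fn default_scheme(&self) -> &dyn Scheme {
        &self.default
    }
}

fn int_compressor() -> IntCompressor {
    IntCompressor {
        schemes: vec![
            Box::new(Fixed { id: "bitpacking", ratio: 4.0 }) as Box<dyn Scheme>,
            Box::new(Fixed { id: "dict", ratio: 8.0 }) as Box<dyn Scheme>,
        ],
        default: Fixed { id: "uncompressed", ratio: 1.0 },
    }
}

fn main() {
    let input = Array { encoding: "primitive", nbytes: 800 };
    // "dict" has the highest ratio, so it wins: 800 / 8.0 = 100 bytes.
    println!("{:?}", int_compressor().compress(&input, &HashSet::new()));
}
```

In this sketch, returning the input when the chosen scheme does not shrink it means an over-optimistic estimate costs only wasted work, never a larger output.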
16 changes: 16 additions & 0 deletions vortex-btrblocks/src/lib.rs
@@ -18,6 +18,22 @@
//! - **Statistical Analysis**: Uses data sampling and statistics to predict compression ratios
//! - **Recursive Structure Handling**: Compresses nested structures like structs and lists
//!
//! # How It Works
//!
//! [`BtrBlocksCompressor::compress()`] takes an `&ArrayRef` and returns an `ArrayRef` that may
//! use a different encoding. It first canonicalizes the input, then dispatches by type.
//! Integer, float, and string arrays go to a type-specific `Compressor`. Compound types
//! like structs and lists recurse into their fields and elements.
//!
//! Each type-specific compressor holds a static list of `Scheme` implementations (e.g.
//! BitPacking, ALP, Dict). There is no dynamic registry. By default, the compressor estimates each
//! scheme's ratio by compressing a ~1% sample, then picks the scheme with the best ratio. See
//! `SchemeExt` for details on how sampling works.
Comment on lines +28 to +31
joseph-isaacs (Contributor), Mar 16, 2026

The type-specific one has encodings switched on or off

Contributor

But we should likely allow fully dynamic

Contributor Author

That is kind of the point of why I was trying to understand this better. I am going to see if we can easily make the compressor pluggable (so that we can compress extension types into custom encodings).

Contributor

You would need to add schemes to a registry?

Contributor Author

Yep, or something potentially more complicated, or a reworking of how we register schemes in the first place

//!
//! Schemes can produce arrays that are themselves further compressed (e.g. FoR then BitPacking),
//! up to `MAX_CASCADE` (3) layers deep. An `Excludes` set prevents the same scheme from being
//! applied twice in a chain.
//!
//! # Example
//!
//! ```rust
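The dispatch and cascade rules from the `lib.rs` docs can be modeled with a small recursive function. Only the depth cap of 3 (`MAX_CASCADE`) and the rule that a scheme is not reused within one chain come from the docs; the `Canonical` and `Compressed` enums, the `INT_SCHEMES` ranking, and the scheme names are hypothetical, and a fixed ranking stands in for the sample-based ratio estimate the docs describe.

```rust
use std::collections::HashSet;

/// Cascade depth limit; the docs give `MAX_CASCADE` as 3.
const MAX_CASCADE: usize = 3;

/// Hypothetical ranking of integer schemes, best first (stands in for ratio estimation).
const INT_SCHEMES: [&str; 4] = ["dict", "for", "bitpacking", "rle"];

/// Hypothetical stand-in for a canonicalized array.
enum Canonical {
    Int(Vec<i64>),
    Struct(Vec<Canonical>),
    List(Box<Canonical>),
}

/// Result tree: the chain of schemes applied at each integer leaf.
#[derive(Debug, PartialEq)]
enum Compressed {
    Leaf(Vec<&'static str>),
    Struct(Vec<Compressed>),
    List(Box<Compressed>),
}

/// Build a cascade: each chosen scheme's output is compressed again, up to
/// `MAX_CASCADE` layers, and a scheme already in the chain is excluded.
fn compress_ints(values: &[i64]) -> Vec<&'static str> {
    let mut excludes: HashSet<&'static str> = HashSet::new();
    let mut chain = Vec::new();
    if values.is_empty() {
        return chain; // nothing to encode
    }
    while chain.len() < MAX_CASCADE {
        match INT_SCHEMES.iter().copied().find(|s| !excludes.contains(s)) {
            Some(scheme) => {
                excludes.insert(scheme);
                chain.push(scheme);
            }
            None => break,
        }
    }
    chain
}

/// Dispatch by type: integers go to the cascade, compound types recurse.
fn compress(array: &Canonical) -> Compressed {
    match array {
        Canonical::Int(values) => Compressed::Leaf(compress_ints(values)),
        Canonical::Struct(fields) => Compressed::Struct(fields.iter().map(compress).collect()),
        Canonical::List(elements) => Compressed::List(Box::new(compress(elements))),
    }
}

fn main() {
    let input = Canonical::Struct(vec![
        Canonical::Int(vec![10, 11, 12]),
        Canonical::List(Box::new(Canonical::Int(vec![]))),
    ]);
    // Struct([Leaf(["dict", "for", "bitpacking"]), List(Leaf([]))])
    println!("{:?}", compress(&input));
}
```

In this model the exclusion set alone would already bound each chain (there are finitely many schemes); the depth cap keeps chains short even when more schemes are available.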
19 changes: 18 additions & 1 deletion vortex-btrblocks/src/scheme.rs
@@ -1,7 +1,24 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright the Vortex contributors

//! Compression scheme traits.
//! Compression scheme traits. This is the interface each encoding implements to participate in
//! compression.
//!
//! [`Scheme`] is the core trait. Each encoding (e.g. BitPacking, ALP, Dict) implements it with
//! two key methods: [`Scheme::expected_compression_ratio`] to estimate how well it compresses
//! the data, and [`Scheme::compress`] to apply the encoding. Type-specific sub-traits
//! ([`IntegerScheme`], [`FloatScheme`], [`StringScheme`]) bind schemes to the appropriate stats
//! and code types.
//!
//! [`SchemeExt`] provides the default ratio estimation strategy. It samples ~1% of the array
//! (minimum [`SAMPLE_SIZE`] values), compresses the sample, and returns the before/after byte
//! ratio. Schemes can override [`Scheme::expected_compression_ratio`] if they have a cheaper
//! heuristic.
//!
//! [`IntegerScheme`]: crate::compressor::integer::IntegerScheme
//! [`FloatScheme`]: crate::compressor::float::FloatScheme
//! [`StringScheme`]: crate::compressor::string::StringScheme
//! [`SAMPLE_SIZE`]: crate::stats::SAMPLE_SIZE

use std::fmt::Debug;
use std::hash::Hash;
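The default ratio estimate described in the `scheme` module docs (sample ~1% of the array, at least `SAMPLE_SIZE` values, compress the sample, compare byte sizes) can be sketched as below. The `SAMPLE_SIZE` value of 64, the evenly strided sampling, and the `RunLength` and `Halving` schemes are assumptions for illustration. For brevity the default lives directly on a simplified `Scheme` trait rather than in a separate `SchemeExt`.

```rust
/// Assumed value for this sketch; the real constant is `crate::stats::SAMPLE_SIZE`.
const SAMPLE_SIZE: usize = 64;

/// Take ~1% of `values` (at least `SAMPLE_SIZE`, at most all of them), evenly strided.
fn sample(values: &[u32]) -> Vec<u32> {
    let n = values.len();
    let target = (n / 100).max(SAMPLE_SIZE).min(n);
    if target == 0 {
        return Vec::new();
    }
    let stride = n / target; // >= 1 because target <= n
    values.iter().step_by(stride).take(target).copied().collect()
}

/// Simplified scheme trait; the default ratio estimate lives here for brevity.
trait Scheme {
    /// Encoded size of `values` in bytes.
    fn compressed_nbytes(&self, values: &[u32]) -> usize;

    /// Default: compress a sample and return its before/after byte ratio.
    fn expected_compression_ratio(&self, values: &[u32]) -> f64 {
        let s = sample(values);
        let after = self.compressed_nbytes(&s);
        if after == 0 {
            return 1.0; // nothing to measure: treat as "no gain"
        }
        let before = s.len() * std::mem::size_of::<u32>();
        before as f64 / after as f64
    }
}

/// Toy run-length scheme: each run costs a (value, length) pair of u32s, i.e. 8 bytes.
struct RunLength;

impl Scheme for RunLength {
    fn compressed_nbytes(&self, values: &[u32]) -> usize {
        let boundaries = values.windows(2).filter(|w| w[0] != w[1]).count();
        (boundaries + usize::from(!values.is_empty())) * 8
    }
}

/// Toy scheme that overrides the estimate with a cheap constant heuristic.
struct Halving;

impl Scheme for Halving {
    fn compressed_nbytes(&self, values: &[u32]) -> usize {
        values.len() * 2
    }
    fn expected_compression_ratio(&self, _values: &[u32]) -> f64 {
        2.0
    }
}

fn main() {
    // 10_000 values in runs of 1_000: the 100-value sample holds 10 runs of 10.
    let values: Vec<u32> = (0..10_000u32).map(|i| i / 1_000).collect();
    // 400 bytes before, 10 runs * 8 bytes = 80 after, so the estimate is 5.
    println!("run-length estimate: {}", RunLength.expected_compression_ratio(&values));
    println!("halving estimate: {}", Halving.expected_compression_ratio(&values));
}
```

Overriding the estimate, as `Halving` does, skips compressing the sample entirely, which is the trade-off the docs describe for schemes with a cheaper heuristic.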