1.0.0 #5

Open
peterrrock2 wants to merge 201 commits into 1.0.0rc from 1.0.0

@peterrrock2 peterrrock2 commented Mar 19, 2026

Owner

Summary

First stable release (1.0.0) of the rewritten binary-ensemble Python API, headlined by the new single-file .bendl bundle format, a custom-asset system with per-asset integrity checks, and a durability/safety hardening pass across the writer and readers.

Why

0.3.0 shipped a usable but pre-stable Python surface with no self-contained container: an ensemble's assignment stream, its dual graph, the node permutation, and any metadata all lived as separate files with no integrity guarantees and no crash safety. 1.0.0 consolidates that into one bundle format, locks in the public API for a stable release, and hardens the write path so a crash or power loss can't leave a half-written or silently corrupt file.
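
For reviewers who want a concrete picture of the "can't leave a half-written file" guarantee, here is a minimal Python sketch of the usual write-to-temp, fsync, rename pattern. It is an illustration using only the standard library, not the crate's actual write path; `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, payload: bytes) -> None:
    """Write payload so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the destination directory so the final rename
    # stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to stable storage first
        os.replace(tmp_path, path)  # atomic swap over any existing file
    except BaseException:
        # Best-effort cleanup; the destination is untouched on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # On POSIX, fsync the directory too so the rename itself survives power loss.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A crash before `os.replace` leaves at most a stray `.tmp-` file next to an intact original; a crash after it leaves the complete new file.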

Changes

  • .bendl bundle format: BendlEncoder / BendlDecoder read and write a single file holding the assignment stream, dual graph, node permutation, metadata, and custom assets. Adds compress_stream (recompress a bundle's stream to
    XBEN) and relabel_bundle (reorder a bundle's graph and rewrite its stream to match), both asset-preserving.
  • Custom assets: add_asset takes JSON, text, binary, or a file path, with CRC32C on every asset, transparent xz compression at ≥1 KiB, and BendlDecoder.verify() to check a whole bundle's checksums at once. Asset names are guarded against colliding with reserved bundle entries.
  • Durability & safety hardening: atomic file swap, power-loss/mid-write crash safety (ask me why I care about this), the encoder poisons itself on a failed stream finalize, payload lengths are validated against file size, and readers are hardened against oversized assets, malicious assignment lengths, and mid-file zero-byte corruption. Internal panics are converted to propagated errors.
  • Plain streams & codecs: BenEncoder / BenDecoder for .ben/.xben, frame-skip subsampling (subsample_indices / _range / _every) shared with the bundle decoder, and whole-file encode_* / decode_* helpers across JSONL/BEN/XBEN.
  • Encoding variants & graph ordering: standard, mkv_chain, and twodelta (now the default) with auto-detection on read; binary_ensemble.graph exposes MLC (multi-level clustering), RCM (reverse Cuthill-McKee), and key-based orderings. Fixes a twodelta failure on a pathological 2-district case and removes the old from_parts / try_from_parts semantics.
  • Docs & CI: full Sphinx/Furo docs site with executed tutorial notebooks, plus CI additions: cross-architecture tests, a Rust linter, a wheel smoke test, a soak test, expanded fuzzing, and a better coverage harness.
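
To make the custom-asset bullet concrete, the sketch below shows one way a per-asset CRC32C plus xz-at-1-KiB scheme can work in plain Python. It does not use the binary-ensemble API: `pack_asset`, `unpack_asset`, and the record layout are hypothetical, and checksumming the stored (possibly compressed) bytes rather than the raw bytes is an assumption, not a statement about the .bendl format.

```python
import lzma

XZ_THRESHOLD = 1024  # 1 KiB, the threshold named in the Changes list


def _make_crc32c_table() -> list:
    # Table for the reflected Castagnoli polynomial used by CRC32C.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Pure-Python CRC32C (slow, but dependency-free)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def pack_asset(raw: bytes) -> dict:
    """Compress assets of at least 1 KiB and checksum the stored bytes."""
    compressed = len(raw) >= XZ_THRESHOLD
    body = lzma.compress(raw, format=lzma.FORMAT_XZ) if compressed else raw
    return {"compressed": compressed, "crc32c": crc32c(body), "body": body}


def unpack_asset(record: dict) -> bytes:
    """Verify the checksum before decompressing or returning anything."""
    if crc32c(record["body"]) != record["crc32c"]:
        raise ValueError("asset checksum mismatch")
    if record["compressed"]:
        return lzma.decompress(record["body"], format=lzma.FORMAT_XZ)
    return record["body"]
```

Verifying before decompressing means a corrupted asset is rejected without feeding damaged bytes to the xz decoder.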

Testing

  • Tests pass locally
  • Added/updated tests
  • Manually tested relevant behavior

Verification on this branch: unit tests for both the Rust crate and Python package, an expanded fuzzing suite, a "soak" test, cross-architecture test runs, and a wheel smoke test in CI. The docs site builds clean under -W (warnings-as-errors) with notebooks executed end-to-end against the live API.

Reviewer Notes

  • This is a 0.3.0 → 1.0.0 release with breaking changes: the previous multi-file format is superseded by .bendl. The highest-risk surface is the crash-safe write path (atomic swap, finalize poisoning) and the corruption hardening in the readers.
  • It is a major goal of this release to harden the entire format against as many errors as possible. Thorough review of the write path and the readers is appreciated, especially with an eye towards corrupted inputs.
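
As a reference point for reviewing the readers, this is a minimal sketch of the "validate declared lengths against the file size" check, written against a made-up frame layout (a little-endian u32 length followed by the payload). The layout, `read_frame`, and `MAX_PAYLOAD` are all hypothetical and do not describe the real .ben or .bendl encoding.

```python
import io
import os
import struct

MAX_PAYLOAD = 64 * 1024 * 1024  # hypothetical cap, not the project's real limit


def read_frame(stream) -> bytes:
    """Read one [u32 length][payload] frame, rejecting implausible lengths."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated length header")
    (length,) = struct.unpack("<I", header)

    # Compare the declared length with the bytes that actually remain,
    # before allocating or reading anything of that size.
    here = stream.tell()
    end = stream.seek(0, os.SEEK_END)
    stream.seek(here)
    remaining = end - here
    if length > MAX_PAYLOAD or length > remaining:
        raise ValueError(
            f"declared length {length} exceeds the cap or the {remaining} bytes left"
        )

    payload = stream.read(length)
    if len(payload) != length:
        raise ValueError("short read")
    return payload
```

A corrupted or malicious header such as `0xFFFFFFFF` then fails fast with an error instead of triggering a multi-gigabyte allocation.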

@peterrrock2

Owner Author

/ci-full
