Improve `SequenceSet` memory use and performance

`SequenceSet` is very useful for managing mailbox state.  But I've known since first implementing it that, even though it uses binary searches for `O(lg n)` in many places, the current naive implementation of `SequenceSet` has some serious performance issues, especially at larger scales (e.g: when a set holds millions of members, which is very common for "All Mail" type mailboxes).  For many scenarios, it performs worse than simply keeping a sorted array of integers (using `Array#bsearch`, etc).

Most notably: by storing each element as a `[min, max]` range/run tuple, we create a 8-byte pointer to 40-byte array for each `uint32x2` run.  We could easily use 1/3rd as much memory, either by storing a flat array (runs represented by `flat.each_cons(2)`) or two "columnar" arrays (runs represented by `mins.zip(lens)`.  And that could be cut in half by storing the `uint32` values in a packed string or by merging each `uint32x2` run into a single `uint64` Integer—in practice this will almost always be less than `UINT62_MAX` and thus stored as a `FIXNUM`.

So we're basically using 6x more memory than needed, and that memory isn't even in a contiguous block.

And that's all before we get into more advanced tricks, for example: splitting large sets into `UInt16` chunks, allowing runs to be stored as `uint16x2`.  And once that's done, we can dynamically encode each chunk as either run length encoded, a bitset, or simple array of numbers.  I've tested the encoding of real-life mailbox UID sets, and this last step provides _enormous_ memory savings.  (While working on this, I accidentally re-invented a naive version of [Roaring Bitmaps](https://roaringbitmap.org/)... and when I started searching for prior art, I discovered [roaring bitmaps](https://roaringbitmap.org)!)

Additionally, I'd also like to add less naive versions of specific set operations.  Most set operations are implemented as simply `dup.mutating_version!(other)`.  In my benchmarks, some fairly straightforward tricks can be employed to make `AND` and `XOR` operations 20% to 300% faster for real-world scenarios (i.e: my own production code).

And, of course, none of this should be done without good benchmarks to justify any significant jump in complexity.
* [x] #486 
* [ ] #485 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve `SequenceSet` memory use and performance #484

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve SequenceSet memory use and performance #484

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Improve `SequenceSet` memory use and performance #484