Description
SequenceSet
is very useful for managing mailbox state. But I've known since first implementing it that, even though it uses binary searches for O(lg n)
in many places, the current naive implementation of SequenceSet
has some serious performance issues, especially at larger scales (e.g: when a set holds millions of members, which is very common for "All Mail" type mailboxes). For many scenarios, it performs worse than simply keeping a sorted array of integers (using Array#bsearch
, etc).
Most notably: by storing each element as a [min, max]
range/run tuple, we create a 8-byte pointer to 40-byte array for each uint32x2
run. We could easily use 1/3rd as much memory, either by storing a flat array (runs represented by flat.each_cons(2)
) or two "columnar" arrays (runs represented by mins.zip(lens)
. And that could be cut in half by storing the uint32
values in a packed string or by merging each uint32x2
run into a single uint64
Integer—in practice this will almost always be less than UINT62_MAX
and thus stored as a FIXNUM
.
So we're basically using 6x more memory than needed, and that memory isn't even in a contiguous block.
And that's all before we get into more advanced tricks, for example: splitting large sets into UInt16
chunks, allowing runs to be stored as uint16x2
. And once that's done, we can dynamically encode each chunk as either run length encoded, a bitset, or simple array of numbers. I've tested the encoding of real-life mailbox UID sets, and this last step provides enormous memory savings. (While working on this, I accidentally re-invented a naive version of Roaring Bitmaps... and when I started searching for prior art, I discovered roaring bitmaps!)
Additionally, I'd also like to add less naive versions of specific set operations. Most set operations are implemented as simply dup.mutating_version!(other)
. In my benchmarks, some fairly straightforward tricks can be employed to make AND
and XOR
operations 20% to 300% faster for real-world scenarios (i.e: my own production code).
And, of course, none of this should be done without good benchmarks to justify any significant jump in complexity.