Mempool storage is very space inefficient

(Created on Aug 30, 2023 by @iljakuklic)

Mempool is size limited, by default to 300MB. It tracks the size by estimating how much the data take up in memory, not the encoded size used for e.g. the block size limit. Therefore, a transaction is expected to take up more space in mempool than is its encoding size. In bitcoin, it tends to be about a factor of 3. In my measurements, the current mempool implementation achieves (very roughly) about factor of 10 overhead.

There are multiple facets to this issue:
* When a transaction is decoded, the efficient serialized version of input list, output list, and witnesses is loaded into vec. Each vec adds three extra pointers worth of size overhead. It is unclear how much can be done about it but some experimentation with `SmallVec` could potentially yield some improvements.
* Compact integers are expanded. Not much can be done about this without serious tradeoffs.
* Mempool store heavily uses `BTreeSet`s and `BTreeMap`s to look up transactions by various criteria. Some kind of sorted sequence type can be expected to be more space efficient. Similar to `Vec` but it needs to support reasonably efficient insertion and is kept sorted by the indexed attribute for fast lookup.
* Transactions are identified by transaction ID in the various lookup maps. Transaction ID is a rather heavy identifier (32 byte hash). Bitcoin uses pointers (typically 8 bytes) but dealing with raw pointers is not advisable and with references we run into the self-referential struct lifetime issues. A possible approach is to use integer IDs and keep a mapping from transaction IDs to these internal IDs. The storage for the actual transactions could then be stored in `Vec` or `Slab` rather than a `BTreeMap`. The data structure used determines possible internal ID allocation schemes. Similar approach is already used in the orphan pool.
* Mempool store keeps track of transaction insertion order by means of two maps. It's used when resetting the mempool state after a reorg. This could be improved in a way discussed above or it could even be completely eliminated if the transactions are added in an order that makes the dependencies automatically satisfied in manner similar to `collect_txs`. A middle ground solution is to allocate internal IDs monotonically as transactions come in. That prevents us from storing the transactions in `Vec` or `Slab`.
* Many more improvements could be made for sure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mempool storage is very space inefficient #1213

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Mempool storage is very space inefficient #1213

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions