Open
Description
(Created on Aug 30, 2023 by @iljakuklic)
Mempool is size limited, by default to 300MB. It tracks the size by estimating how much the data take up in memory, not the encoded size used for e.g. the block size limit. Therefore, a transaction is expected to take up more space in mempool than is its encoding size. In bitcoin, it tends to be about a factor of 3. In my measurements, the current mempool implementation achieves (very roughly) about factor of 10 overhead.
There are multiple facets to this issue:
- When a transaction is decoded, the efficient serialized version of input list, output list, and witnesses is loaded into vec. Each vec adds three extra pointers worth of size overhead. It is unclear how much can be done about it but some experimentation with
SmallVec
could potentially yield some improvements. - Compact integers are expanded. Not much can be done about this without serious tradeoffs.
- Mempool store heavily uses
BTreeSet
s andBTreeMap
s to look up transactions by various criteria. Some kind of sorted sequence type can be expected to be more space efficient. Similar toVec
but it needs to support reasonably efficient insertion and is kept sorted by the indexed attribute for fast lookup. - Transactions are identified by transaction ID in the various lookup maps. Transaction ID is a rather heavy identifier (32 byte hash). Bitcoin uses pointers (typically 8 bytes) but dealing with raw pointers is not advisable and with references we run into the self-referential struct lifetime issues. A possible approach is to use integer IDs and keep a mapping from transaction IDs to these internal IDs. The storage for the actual transactions could then be stored in
Vec
orSlab
rather than aBTreeMap
. The data structure used determines possible internal ID allocation schemes. Similar approach is already used in the orphan pool. - Mempool store keeps track of transaction insertion order by means of two maps. It's used when resetting the mempool state after a reorg. This could be improved in a way discussed above or it could even be completely eliminated if the transactions are added in an order that makes the dependencies automatically satisfied in manner similar to
collect_txs
. A middle ground solution is to allocate internal IDs monotonically as transactions come in. That prevents us from storing the transactions inVec
orSlab
. - Many more improvements could be made for sure.