Skip to content

Mempool storage is very space inefficient #1213

Open
@TheQuantumPhysicist

Description

@TheQuantumPhysicist

(Created on Aug 30, 2023 by @iljakuklic)

Mempool is size limited, by default to 300MB. It tracks the size by estimating how much the data take up in memory, not the encoded size used for e.g. the block size limit. Therefore, a transaction is expected to take up more space in mempool than is its encoding size. In bitcoin, it tends to be about a factor of 3. In my measurements, the current mempool implementation achieves (very roughly) about factor of 10 overhead.

There are multiple facets to this issue:

  • When a transaction is decoded, the efficient serialized version of input list, output list, and witnesses is loaded into vec. Each vec adds three extra pointers worth of size overhead. It is unclear how much can be done about it but some experimentation with SmallVec could potentially yield some improvements.
  • Compact integers are expanded. Not much can be done about this without serious tradeoffs.
  • Mempool store heavily uses BTreeSets and BTreeMaps to look up transactions by various criteria. Some kind of sorted sequence type can be expected to be more space efficient. Similar to Vec but it needs to support reasonably efficient insertion and is kept sorted by the indexed attribute for fast lookup.
  • Transactions are identified by transaction ID in the various lookup maps. Transaction ID is a rather heavy identifier (32 byte hash). Bitcoin uses pointers (typically 8 bytes) but dealing with raw pointers is not advisable and with references we run into the self-referential struct lifetime issues. A possible approach is to use integer IDs and keep a mapping from transaction IDs to these internal IDs. The storage for the actual transactions could then be stored in Vec or Slab rather than a BTreeMap. The data structure used determines possible internal ID allocation schemes. Similar approach is already used in the orphan pool.
  • Mempool store keeps track of transaction insertion order by means of two maps. It's used when resetting the mempool state after a reorg. This could be improved in a way discussed above or it could even be completely eliminated if the transactions are added in an order that makes the dependencies automatically satisfied in manner similar to collect_txs. A middle ground solution is to allocate internal IDs monotonically as transactions come in. That prevents us from storing the transactions in Vec or Slab.
  • Many more improvements could be made for sure.

Metadata

Metadata

Assignees

No one assigned

    Labels

    mempoolMempool-related issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions