Feat/marf compression #6654
base: develop
…ue if it's 0, store a bitmap and non-empty trie ptrs if the list is sparse, and store node patches atop full nodes (and read them back)
…then record the original node from which it was copied so that a TrieNodePatch can be calculated and stored instead
…oid repeating nodes across tries
… each node and see if we can instead patch an existing node instead of storing a (mostly-unchanged) copy
```rust
    .with_compression(true),
MARFOpenOpts::new(TrieHashCalculationMode::Immediate, "noop", true)
    .with_compression(true),
*/
```
TODO: revert this comment block
```rust
+ rusqlite::types::ToSql
+ rusqlite::types::FromSql
+ stacks_common::codec::StacksMessageCodec
+ crate::codec::StacksMessageCodec
```
TODO: use stacks_common not crate
```rust
#[cfg(test)]
use stacks_common::types::chainstate::BlockHeaderHash;
use stacks_common::types::chainstate::{
use crate::types::chainstate::BlockHeaderHash;
```
TODO: use stacks_common not crate
This PR implements work discovered in #6593. Specifically, it makes the following changes to the on-disk representation of the MARF:
- If a `TriePtr` is not a back-block pointer, then its `back_block` field is not stored, since it is always 0.
- If a node's list of child `TriePtr`s is sparse, then instead of storing `TriePtr::default()` for each empty pointer, the system now stores a bitmap of which pointers are non-empty, followed by only the non-empty `TriePtr`s. It does this only when it actually saves space over storing the entire list (storing the entire list is cheaper when the list is nearly full).
- Instead of copying a trie node from one trie to the next as part of a copy-on-write, the system stores only a patch from the old node to the new node. It will build a list of up to 16 patches across 16 tries before storing a full copy of the node.
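The first two changes can be illustrated with a minimal, hypothetical encoder. This is a sketch under stated assumptions, not the PR's code or on-disk format: it assumes a simplified `Ptr` with fields `id`, `chr`, `ptr`, and `back_block`, an assumed `BACKPTR_FLAG` bit in `id` marking back-block pointers, an `id` of 0 meaning an empty slot, and an assumed one-byte `FULL`/`SPARSE` tag selecting the layout. `encode_children` and `decode_children` are illustrative names only.

```rust
// Sketch only: `Ptr`, `BACKPTR_FLAG`, and the FULL/SPARSE tag byte are
// hypothetical stand-ins, not the PR's actual types or on-disk format.

/// Simplified child pointer; an `id` of 0 is assumed to mean "empty".
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Ptr {
    id: u8,
    chr: u8,
    ptr: u32,
    back_block: u32,
}

/// Assumed flag bit in `id` that marks a back-block pointer.
const BACKPTR_FLAG: u8 = 0x80;
/// Assumed tag bytes selecting the child-list layout.
const FULL: u8 = 0;
const SPARSE: u8 = 1;

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_be_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

impl Ptr {
    fn is_empty(&self) -> bool {
        self.id == 0
    }

    fn is_back_ptr(&self) -> bool {
        (self.id & BACKPTR_FLAG) != 0
    }

    /// `back_block` is written only for back-block pointers (it is 0 otherwise).
    fn encoded_len(&self) -> usize {
        if self.is_back_ptr() { 10 } else { 6 }
    }

    fn encode(&self, out: &mut Vec<u8>) {
        out.push(self.id);
        out.push(self.chr);
        out.extend_from_slice(&self.ptr.to_be_bytes());
        if self.is_back_ptr() {
            out.extend_from_slice(&self.back_block.to_be_bytes());
        }
    }

    fn decode(buf: &[u8], pos: &mut usize) -> Ptr {
        let (id, chr, ptr) = (buf[*pos], buf[*pos + 1], read_u32(buf, *pos + 2));
        *pos += 6;
        let mut back_block = 0;
        if (id & BACKPTR_FLAG) != 0 {
            back_block = read_u32(buf, *pos);
            *pos += 4;
        }
        Ptr { id, chr, ptr, back_block }
    }
}

/// Writes a tag byte, then either every pointer (FULL) or a presence bitmap
/// followed by only the non-empty pointers (SPARSE), whichever is smaller.
fn encode_children(children: &[Ptr], out: &mut Vec<u8>) {
    let full_len: usize = children.iter().map(|p| p.encoded_len()).sum();
    let bitmap_len = (children.len() + 7) / 8;
    let sparse_len: usize = bitmap_len
        + children.iter().filter(|p| !p.is_empty()).map(|p| p.encoded_len()).sum::<usize>();

    if sparse_len < full_len {
        out.push(SPARSE);
        let mut bitmap = vec![0u8; bitmap_len];
        for (i, p) in children.iter().enumerate() {
            if !p.is_empty() {
                bitmap[i / 8] |= 1u8 << (i % 8);
            }
        }
        out.extend_from_slice(&bitmap);
        for p in children.iter().filter(|p| !p.is_empty()) {
            p.encode(out);
        }
    } else {
        out.push(FULL);
        for p in children {
            p.encode(out);
        }
    }
}

/// Reads back `n` pointers written by `encode_children`.
fn decode_children(buf: &[u8], n: usize) -> Vec<Ptr> {
    let mut pos = 1;
    if buf[0] == FULL {
        return (0..n).map(|_| Ptr::decode(buf, &mut pos)).collect();
    }
    let bitmap_len = (n + 7) / 8;
    let bitmap = &buf[1..1 + bitmap_len];
    pos += bitmap_len;
    (0..n)
        .map(|i| {
            if (bitmap[i / 8] & (1u8 << (i % 8))) != 0 {
                Ptr::decode(buf, &mut pos)
            } else {
                Ptr::default()
            }
        })
        .collect()
}
```

In this sketch, a 256-slot list with only two occupied slots is written as a 32-byte bitmap plus the two pointers, while a nearly full list keeps the plain layout because the bitmap would only add bytes.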
In my (very small-scale) benchmarks, these changes save over 50% of space.
This PR is still a draft and will likely remain so for some time. It needs many more unit tests, and it would benefit significantly from fuzz and property testing against the current implementation. It will also need a lot of performance tuning, since having to read a list of patch nodes (rather than a single full node) will slow down reads.
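The patch scheme and its read cost can be illustrated with a minimal sketch. It assumes a simplified node that is just a fixed-length list of child offsets, and an in-memory record list standing in for on-disk trie storage; `Store`, `Record`, `put_update`, and `MAX_PATCHES` are hypothetical names, not the PR's code. Only the limit of 16 patches comes from the PR description.

```rust
// Sketch only: `Store`, `Record`, and the in-memory record list are
// hypothetical; they stand in for nodes stored across tries on disk.

/// Patches allowed before a full copy (the PR describes up to 16).
const MAX_PATCHES: usize = 16;

/// Simplified node: a fixed-length list of child offsets (assumed not to change length).
type Node = Vec<u32>;

enum Record {
    Full(Node),
    /// Slots that differ from the record at `prev`; `depth` counts patches since the last full copy.
    Patch { prev: usize, depth: usize, changes: Vec<(usize, u32)> },
}

#[derive(Default)]
struct Store {
    records: Vec<Record>,
}

impl Store {
    fn put_full(&mut self, node: Node) -> usize {
        self.records.push(Record::Full(node));
        self.records.len() - 1
    }

    fn depth(&self, at: usize) -> usize {
        match &self.records[at] {
            Record::Full(_) => 0,
            Record::Patch { depth, .. } => *depth,
        }
    }

    /// Copy-on-write of the node at `prev`: store a patch unless the chain is already full.
    fn put_update(&mut self, prev: usize, new: Node) -> usize {
        let depth = self.depth(prev);
        if depth >= MAX_PATCHES {
            return self.put_full(new);
        }
        let (old, _) = self.read(prev);
        let changes: Vec<(usize, u32)> = new
            .iter()
            .enumerate()
            .filter(|(i, v)| old[*i] != **v)
            .map(|(i, v)| (i, *v))
            .collect();
        self.records.push(Record::Patch { prev, depth: depth + 1, changes });
        self.records.len() - 1
    }

    /// Rebuilds a node; also returns how many records were visited (the read cost).
    fn read(&self, at: usize) -> (Node, usize) {
        let mut chain = Vec::new();
        let mut cur = at;
        let mut node = loop {
            match &self.records[cur] {
                Record::Full(full) => break full.clone(),
                Record::Patch { prev, changes, .. } => {
                    chain.push(changes);
                    cur = *prev;
                }
            }
        };
        // Apply patches oldest-first, i.e. in reverse order of discovery.
        for changes in chain.iter().rev() {
            for &(i, v) in changes.iter() {
                node[i] = v;
            }
        }
        (node, chain.len() + 1)
    }
}
```

In this sketch a read may visit up to `MAX_PATCHES + 1` records before it can return a node, so the patch limit trades disk space against read latency.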