-
-
Notifications
You must be signed in to change notification settings - Fork 15
Implement BlockedBloomFilter #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3.0.0
Are you sure you want to change the base?
Conversation
Sharing the benchmark on my PC
|
Would you mind basing this on the |
Sure, will do. Thanks! |
I've updated the MR to be based on the 3.0.0 branch. Here's the benchmarks. Noted that the benchmarking parameters changed compared to my above benchmark, hence the difference in the standard bloom filter benchmarks.
|
The next steps in the PR will be:
Please correct me if I'm wrong, or if you have other ideas. |
Sounds good to me One more thing is comparing the FPR of the bloom filters. Blocked should have slightly higher FPR. |
src/segment/filter/mod.rs
Outdated
pub struct AMQFilterBuilder {} | ||
|
||
impl AMQFilterBuilder { | ||
pub fn decode_from<R: Read>(reader: &mut R) -> Result<Box<dyn AMQFilter + Sync>, DecodeError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't use impl Decode, because we're not returning a AMQFilterBuilder here.
Otherwise, the method signature is similar
#[allow(clippy::len_without_is_empty)] | ||
impl StandardBloomFilter { | ||
// To be used by AMQFilter after magic bytes and filter type have been read and parsed | ||
pub(super) fn decode_from<R: Read>(reader: &mut R) -> Result<Self, DecodeError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method signature is similar to decode_from in Decode. Changed to it (super) visibility because it's only to be used by AMQFilterBuilder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, it feels slightly weird to reuse DecodeError here. Not sure if we should use another type of error instead
src/segment/filter/mod.rs
Outdated
} | ||
} | ||
|
||
pub trait AMQFilter: Sync + Send { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding Sync + Send here, otherwise we would get errors from blob_drop_after_flush
h1 = h1.wrapping_add(h2); | ||
h2 = h2.wrapping_add(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines should probably be moved to the end of the loop iteration.
Same for filter (reader)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was deliberate, to add variance between the choice of block and the first bit set in the block.
An edge case would be if num_of_blocks == cache_line_bytes. This would cause us to always set bit X of block X as the first bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is porobably not an issue though because blocks are always logically isolated. It's probably also quite an edge case.
Still pending measuring FPR for BlockedBloomFilter |
Solves #78
It's still WIP, but wanted to raise the PR earlier for feedback, if any
src/bloom
intosrc/bloom/blocked
andsrc/bloom/standard