-
Notifications
You must be signed in to change notification settings - Fork 153
Use the one flatbuffer to store all lists #489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pub(crate) fn filter_list(&self) -> fb::NetworkFilterList<'_> { | ||
unsafe { fb::root_as_network_filter_list_unchecked(self.data()) } | ||
pub(crate) fn root(&self) -> fb::Engine<'_> { | ||
unsafe { fb::root_as_engine_unchecked(self.data()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<NetworkFilterList>>, | ||
>>(Engine::VT_LISTS, None) | ||
.unwrap() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
None, | ||
) | ||
.unwrap() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust Benchmark
Benchmark suite | Current: 0bb327f | Previous: 4738d3f | Ratio |
---|---|---|---|
rule-match-browserlike/brave-list |
2268568240 ns/iter (± 20502033 ) |
2252126465 ns/iter (± 13369021 ) |
1.01 |
rule-match-first-request/brave-list |
1006882 ns/iter (± 6061 ) |
1000284 ns/iter (± 8364 ) |
1.01 |
blocker_new/brave-list |
150003954 ns/iter (± 1107279 ) |
150674514 ns/iter (± 1975624 ) |
1.00 |
blocker_new/brave-list-deserialize |
63987710 ns/iter (± 1362897 ) |
62360204 ns/iter (± 1865060 ) |
1.03 |
memory-usage/brave-list-initial |
16282069 ns/iter (± 3 ) |
16225933 ns/iter (± 3 ) |
1.00 |
memory-usage/brave-list-initial/max |
64817658 ns/iter (± 3 ) |
64817658 ns/iter (± 3 ) |
1 |
memory-usage/brave-list-initial/alloc-count |
1514486 ns/iter (± 3 ) |
1514650 ns/iter (± 3 ) |
1.00 |
memory-usage/brave-list-1000-requests |
2516487 ns/iter (± 3 ) |
2505592 ns/iter (± 3 ) |
1.00 |
memory-usage/brave-list-1000-requests/alloc-count |
66572 ns/iter (± 3 ) |
66070 ns/iter (± 3 ) |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
f8e5204
to
27de614
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked over the unsafe usage and this matches what it was previously. We changed the return type here, but it's not a concern and the implementation matches what it previously was. I'm removing those alerts
src/filters/fb_network.rs
Outdated
// TODO: do we need another feature for this? | ||
#[cfg(feature = "unsync-regex-caching")] | ||
pub(crate) type SharedStateRef = std::rc::Rc<SharedState>; | ||
#[cfg(not(feature = "unsync-regex-caching"))] | ||
pub(crate) type SharedStateRef = std::rc::Arc<SharedState>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same feature for both is fine, but it would be nice to rename it since it'd no longer be strictly about regex caching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that note though, I don't think we actually need any refcounted pointers for this? We could pass the &SharedState
in as an argument to whatever functions require it, or give Blocker
a <'a>
lifetime so it can own a shared_state: &'a SharedState
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds logical, but the issue is that Engine
(the primary owner of shared_state
) also own a Blocker instance (that also stores shared_state
).
A class member cannot use the same lifetime as another class member without stuff like https://docs.rs/ouroboros/latest/ouroboros/attr.self_referencing.html
Without using Rc
Blocker will became Blocker<'a>
with a limited lifetime and can't be stored as a member of Engine, which results in major changes in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the feature renamed to single-thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is it could be organized like this minimal example:
type FilterDataContextRef = [u8; 32];
struct Engine {
filter_data_ref: FilterDataContextRef,
}
impl Engine {
fn new() -> Self {
let filter_data_ref = [0u8; 32];
Self {
filter_data_ref,
}
}
fn blocker<'a>(&'a self) -> Blocker<'a> {
Blocker {
filter_data_ref: &self.filter_data_ref,
}
}
fn check_network_request(&self) -> bool {
self.blocker().check()
}
}
struct Blocker<'a> {
filter_data_ref: &'a FilterDataContextRef,
}
impl<'a> Blocker<'a> {
fn check(&self) -> bool {
true
}
}
Key point is Engine
owns all the data and has the blocker()
method to produce a scoped convenience struct that can be used for the current network blocking methods without moving too much code around.
The only other fields on Blocker
are:
regex_manager
, which also makes sense to keep on the top-levelEngine
so that it may be used for cosmetic filtering as needed (some of uBO's newer syntax features could benefit from it)tags_enabled
, which I don't have strong preferences about since I'd like to get rid of it anyways
src/filters/flat_builder.rs
Outdated
builder.add_filter(filter, list_id as u32); | ||
} | ||
|
||
builder.finish(if optimize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimize && id != FilterId::RemoveParam as u32 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was my initial attempt, but we can't use lambas with a capture as fn (like fn(u32) -> bool)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my other comment, but for future reference: https://doc.rust-lang.org/std/keyword.move.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10
.
Benchmark suite | Current: 0bb327f | Previous: 4738d3f | Ratio |
---|---|---|---|
blocker_new/brave-list-deserialize |
69734003 ns/iter (± 1109450 ) |
62360204 ns/iter (± 1865060 ) |
1.12 |
This comment was automatically generated by workflow using github-action-benchmark.
src/filters/flat_builder.rs
Outdated
@@ -0,0 +1,337 @@ | |||
//! Builder for creating flatbuffer with serialized engine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be rename it to fb_builder.rs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can. @antonok-edm WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asking because we have fb_network.rs
and flat_filter_map.rs
and I suppose that fb for flatbuffers
and flat
for flat containers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sounds good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more general thing - could we do these in a 0.11.x
branch rather than in master
? That way it's a little bit easier to maintain the history of breaking changes vs patch releases
src/filters/flat_builder.rs
Outdated
builder.finish(if optimize { | ||
// Don't optimize removeparam, since it can fuse filters without respecting distinct | ||
|id: u32| id != FilterId::RemoveParam as u32 | ||
} else { | ||
|_| false | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason to pass this as a lambda rather than moving the check inside finish()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, the initial idea was to make 2 layers: for the storage and for the logic how and what.
But in fact that parts of code is strongly connected to each other.
Changed to just bool
.
src/filters/flat_builder.rs
Outdated
builder.add_filter(filter, list_id as u32); | ||
} | ||
|
||
builder.finish(if optimize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my other comment, but for future reference: https://doc.rust-lang.org/std/keyword.move.html
// Reconstruct the unique_domains_hashes_map from the flatbuffer data | ||
let root = memory.root(); | ||
let mut unique_domains_hashes_map: HashMap<crate::utils::Hash, u32> = HashMap::new(); | ||
for (index, hash) in root.unique_domains_hashes().iter().enumerate() { | ||
unique_domains_hashes_map.insert(hash, index as u32); | ||
} | ||
FilterDataContextRef::new(Self { | ||
memory, | ||
unique_domains_hashes_map, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a concern for this particular PR, but does it make sense to hold data that needs to be "reconstructed" in a separate buffer? Then we don't necessarily need to hold both copies in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that stored unique_domains_hashes
is index => ShortHash (basically), but we need ShortHash => index.
Maybe it makes sense to store additional mapping (despite it will eat some small amount memory).
Well, I believe we need to dedup and index not only domains, but the other strings, so let's postpone this for some time.
|
||
Self { | ||
blocker: Blocker::new(network_filters, &blocker_options), | ||
blocker: Blocker::from_context(filter_data_context.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convention is to explicitly use Rc::clone(x)
or Arc::clone(x)
rather than x.clone()
to make it clear that it's just a refcount increase.
https://doc.rust-lang.org/book/ch15-04-rc.html#using-rct-to-share-data
In these instances it depends on the feature config, so I'd suggest FilterDataContextRef::clone(x)
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
src/filters/flat_builder.rs
Outdated
@@ -0,0 +1,337 @@ | |||
//! Builder for creating flatbuffer with serialized engine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sounds good to me
We can in theory. But the workflow isn't clear to me. |
[puLL-Merge] - brave/adblock-rust@489 DescriptionThis PR refactors the internal storage system for network filters in the adblock-rust library. The main changes include:
Possible Issues
ChangesChanges
sequenceDiagram
participant Client
participant Engine
participant FilterDataContext
participant FlatBufferBuilder
participant Blocker
Client->>Engine: from_filter_set(rules, optimize)
Engine->>FlatBufferBuilder: make_flatbuffer(network_filters, optimize)
FlatBufferBuilder->>FlatBufferBuilder: categorize filters into lists
FlatBufferBuilder->>FlatBufferBuilder: optimize if requested
FlatBufferBuilder->>FilterDataContext: new(memory)
FilterDataContext-->>Engine: FilterDataContextRef
Engine->>Blocker: from_context(filter_data_context)
Blocker-->>Engine: Blocker instance
Engine-->>Client: Engine instance
Client->>Engine: check_network_request(request)
Engine->>Blocker: check(request, resources)
Blocker->>Blocker: get_list(NetworkFilterListId)
Blocker->>FilterDataContext: access filter data
FilterDataContext-->>Blocker: NetworkFilterList
Blocker-->>Engine: BlockerResult
Engine-->>Client: result
|
The PR moves from per-NetworkList flatbuffers to a one (per-Engine).
It doesn't affect the performance metrics, but open possibilities to put cosmetic filters to the same flatbuffer.
It also simplifies the serialization and deserialization code.
Notes:
insert_dup
was dropped, currently it didn't have any effect with flatbuffers (we compared the flatbuffers offsets, not the unique ids). Maybe it makes sense to restore it the future versions.NetworkFilterLists::new
is left as-is for benches and tests to avoid excessive diff.