Remove The Hashmap from Shortest Path for Centrality Computation #1307
Conversation
Coveralls: Pull Request Test Coverage Report for Build 15007738295
I think minimizing the number of hashing operations that happen during the shortest path computation is a good idea in general. But I am not convinced this works; we need to benchmark it more carefully.
I have a feeling this method will be slower for directed graphs, where some nodes are not able to reach all other nodes in the graph. In those cases, having a small hashmap with the nodes that can be reached is much faster than having a large vector with mostly non-visited entries.
let mut verts_sorted_by_distance: Vec<G::NodeId> = Vec::with_capacity(c); // a stack
let mut predecessors: Vec<Vec<usize>> = vec![Vec::new(); max_index]; // predecessors of each node on shortest paths from the source
let mut sigma: Vec<f64> = vec![0.; max_index]; // number of shortest paths from the source to each node
let mut distance: Vec<i64> = vec![-1; max_index]; // -1 means "not reached yet"
A more appropriate type here is Option<i64> if you want to represent missing paths. I'd even say Option<usize>.
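To illustrate the suggestion, here is a minimal, self-contained sketch (a made-up helper, not the PR's actual code) where the distance array is a Vec<Option<usize>> and None means "not reached yet", removing both the -1 sentinel and the need for a signed type:

```rust
use std::collections::VecDeque;

// Hypothetical helper: BFS distances over an adjacency list, with None
// marking nodes that were never reached from the source.
fn bfs_distances(adjacency: &[Vec<usize>], source: usize) -> Vec<Option<usize>> {
    let mut distance: Vec<Option<usize>> = vec![None; adjacency.len()];
    distance[source] = Some(0);
    let mut queue = VecDeque::new();
    queue.push_back(source);
    while let Some(v) = queue.pop_front() {
        let d_v = distance[v].expect("v was assigned a distance before being enqueued");
        for &w in &adjacency[v] {
            if distance[w].is_none() {
                // First time we see w: record its distance and enqueue it.
                distance[w] = Some(d_v + 1);
                queue.push_back(w);
            }
        }
    }
    distance
}
```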
rustworkx-core/src/centrality.rs (outdated diff)
let coeff = (1.0 + delta[iw]) / path_calc.sigma[iw];
let p_w = path_calc.predecessors.get(iw).unwrap();
for iv in p_w {
    //let iv = graph.to_index(*v);
Remove this commented-out line.
for node in graph.node_identifiers() {
    predecessors.insert(node, Vec::new());
    sigma.insert(node, 0.0);
    distance.insert(node, -1);
}
See how the hashmap was filled with a value for every node of the graph. Replacing it with a vec sized by the node bound will not be a problem for cache efficiency, because the hashmap was already full when the algorithm started.
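A rough sketch of that point (illustrative names, assuming petgraph's visit traits; this is not the PR's exact code): the hashmap version already paid one insert, and therefore one hash, per node before the search even started, so a vector indexed by node index holds the same entries without the hashing:

```rust
use std::collections::HashMap;
use std::hash::Hash;

use petgraph::visit::{IntoNodeIdentifiers, NodeIndexable};

// Old approach: eagerly insert an entry for every node (one hash per node).
fn init_distance_with_hashmap<G>(graph: G) -> HashMap<G::NodeId, i64>
where
    G: IntoNodeIdentifiers,
    G::NodeId: Eq + Hash,
{
    let mut distance = HashMap::new();
    for node in graph.node_identifiers() {
        distance.insert(node, -1);
    }
    distance
}

// New approach: the same "entry per node", but indexed by node index and
// allocated in one shot, with no hashing at all.
fn init_distance_with_vec<G: NodeIndexable>(graph: G) -> Vec<i64> {
    vec![-1; graph.node_bound()]
}
```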
So I was not involved in the original review of #799. Your version is definitely better than what was submitted in #799. With that being said, I'd like to compare it to a more optimized version like in https://github.com/IvanIsCoding/rustworkx/blob/f9de1db7fd5f9efafe4d6f5d9012ae2c75364081/rustworkx-core/src/centrality.rs#L1137. But that can come in a follow-up PR.
I heard your argument, and I agree that we should benchmark to be sure it will still be faster in the cases where the hashmap would not have been fully filled.
I stand corrected: this is indeed better than the code from #799. I was not involved in the review of #799, but to be honest we can do much better than hashing every single node in the graph (multiple) times.
Your numbers are nice, but not very reproducible at the moment. I suggest you test both edge_betweenness_centrality and betweenness_centrality with two best/worst case scenarios from Python:
- https://www.rustworkx.org/apiref/rustworkx.generators.directed_path_graph.html -> worst case scenario, very sparse graph
- https://www.rustworkx.org/apiref/rustworkx.generators.complete_graph.html -> best case scenario for the optimization
With the benchmark code, we can reuse it to test performance improvements (a rough sketch of such a benchmark is included after this list). I think at the end, the state of the code will be:
- Undirected graphs always call the vector version (this PR)
- Directed graphs call an optimized version with hash map (future PR)
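For reference, here is a hedged sketch of such a benchmark on the Rust side (not the exact benchmark the reviewer has in mind, which uses the Python generators linked above). It assumes rustworkx-core's betweenness_centrality takes (graph, include_endpoints, normalized, parallel_threshold), as in the current documented API:

```rust
use std::time::Instant;

use rustworkx_core::centrality::betweenness_centrality;
use rustworkx_core::petgraph::graph::{DiGraph, UnGraph};

fn main() {
    // Worst case for the vector approach: a very sparse directed path graph,
    // where most sources reach only a few other nodes.
    let mut path: DiGraph<(), ()> = DiGraph::new();
    let path_nodes: Vec<_> = (0..10_000).map(|_| path.add_node(())).collect();
    for pair in path_nodes.windows(2) {
        path.add_edge(pair[0], pair[1], ());
    }
    let start = Instant::now();
    // Assumed argument order: (graph, include_endpoints, normalized, parallel_threshold).
    let _ = betweenness_centrality(&path, false, true, 50);
    println!("directed path graph (10,000 nodes): {:?}", start.elapsed());

    // Best case for the optimization: a dense complete graph, where every
    // source reaches every node and the hashmap would have been full anyway.
    let mut complete: UnGraph<(), ()> = UnGraph::new_undirected();
    let complete_nodes: Vec<_> = (0..1_000).map(|_| complete.add_node(())).collect();
    for (i, &u) in complete_nodes.iter().enumerate() {
        for &v in &complete_nodes[i + 1..] {
            complete.add_edge(u, v, ());
        }
    }
    let start = Instant::now();
    let _ = betweenness_centrality(&complete, false, true, 50);
    println!("complete graph (1,000 nodes): {:?}", start.elapsed());
}
```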
    G::NodeId: Eq + Hash,
    G::EdgeId: Eq + Hash,
    G::NodeId: Eq,
    G::EdgeId: Eq,
{
    let mut verts_sorted_by_distance: Vec<G::NodeId> = Vec::new(); // a stack
    let c = graph.node_count();
I believe we'd need to change this to node_bound(). It is definitely worth adding a test case where some nodes are removed. I need to check if we have any.
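A small, self-contained illustration of the point (not a proposed test; it assumes petgraph's StableGraph, which rustworkx's graphs build on): after a node is removed, a surviving node can keep an index greater than or equal to node_count(), so the scratch vectors have to be sized by node_bound():

```rust
use petgraph::stable_graph::StableUnGraph;
use petgraph::visit::NodeIndexable;

fn main() {
    let mut graph: StableUnGraph<(), ()> = StableUnGraph::with_capacity(3, 1);
    let a = graph.add_node(());
    let b = graph.add_node(());
    let c = graph.add_node(());
    graph.add_edge(a, c, ());
    graph.remove_node(b); // leaves a hole at index 1

    // Only two nodes remain, but node `c` keeps index 2, so a Vec of length
    // node_count() == 2 would be indexed out of bounds. node_bound() still
    // covers every live index.
    assert_eq!(graph.node_count(), 2);
    assert_eq!(graph.to_index(c), 2);
    assert!(graph.to_index(c) < graph.node_bound());
    let _distance: Vec<i64> = vec![-1; graph.node_bound()]; // safe to index by node index
}
```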
Also, last but not least, add a test for edge betweenness centrality with a deleted node (rustworkx/tests/graph/test_centrality.py, line 62 at 30897c5).
Hello, I have run some benchmarks with the graphs you mentioned before.
- For the directed graph of 10,000 nodes, with hashmap: 2.5117154121398926 seconds
- For the complete graph of 1,000 nodes, with hashmap: 4.231908559799194 seconds
If you want to test it yourself:
Okay, I will do it later.
Hello,
I removed the hashmap from the shortest path computation for centrality.
This may improve performance.
Tell me what you think about it :)