Remove The Hashmap from Shortest Path for Centrality Computation #1307

Open
wants to merge 17 commits into
base: main
from

Conversation

Paulo-21
Contributor

@Paulo-21 Paulo-21 commented Nov 3, 2024

Hello,
I removed the hashmap from the shortest-path computation used for centrality.

This may improve performance.

Let me know what you think :)

@coveralls

coveralls commented Nov 3, 2024

Pull Request Test Coverage Report for Build 15007738295

Details

  • 51 of 51 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.001%) to 95.235%

Totals Coverage Status
Change from base Build 14954677437: -0.001%
Covered Lines: 18727
Relevant Lines: 19664

💛 - Coveralls

Collaborator

@IvanIsCoding IvanIsCoding left a comment

I think minimizing the number of hashing operations that happen during the shortest path is a good idea in general. But I am not convinced this works, we need to benchmark this more carefully.

I have a feeling this method will be slower for directed graphs, where some nodes are not able to reach all other nodes in the graph. In those cases, having a small hashmap with the nodes that can be reached is much faster than having a large vector with mostly non-visited entries.
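
To make the trade-off concrete, here is a minimal Rust sketch (std only, not the rustworkx code) that runs the same BFS once with a hash map that only holds reached nodes and once with a vector that has one slot per node. The adjacency-list representation and the names `bfs_sparse`, `bfs_dense`, and `directed_path` are assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// BFS that records a distance only for nodes it actually reaches.
fn bfs_sparse(adj: &[Vec<usize>], source: usize) -> HashMap<usize, usize> {
    let mut distance: HashMap<usize, usize> = HashMap::new();
    let mut queue = VecDeque::new();
    distance.insert(source, 0);
    queue.push_back(source);
    while let Some(v) = queue.pop_front() {
        let d = distance[&v]; // copied out, so the map can be mutated below
        for &w in &adj[v] {
            if !distance.contains_key(&w) {
                distance.insert(w, d + 1);
                queue.push_back(w);
            }
        }
    }
    distance
}

/// Same BFS with one slot per node, allocated before the search starts.
fn bfs_dense(adj: &[Vec<usize>], source: usize) -> Vec<Option<usize>> {
    let mut distance: Vec<Option<usize>> = vec![None; adj.len()];
    let mut queue = VecDeque::new();
    distance[source] = Some(0);
    queue.push_back(source);
    while let Some(v) = queue.pop_front() {
        let d = distance[v].expect("queued nodes always have a distance");
        for &w in &adj[v] {
            if distance[w].is_none() {
                distance[w] = Some(d + 1);
                queue.push_back(w);
            }
        }
    }
    distance
}

/// Hypothetical helper: directed path 0 -> 1 -> ... -> n-1 as adjacency lists.
fn directed_path(n: usize) -> Vec<Vec<usize>> {
    (0..n)
        .map(|i| if i + 1 < n { vec![i + 1] } else { Vec::new() })
        .collect()
}
```

On a 1000-node directed path with source 990, `bfs_sparse` ends up with 10 entries, while `bfs_dense` allocates 1000 slots and fills 10 of them. Whether the hashing cost or the allocation cost dominates is what a benchmark would need to show.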

let mut verts_sorted_by_distance: Vec<G::NodeId> = Vec::with_capacity(c); // a stack
let mut predecessors: Vec<Vec<usize>> = vec![Vec::new(); max_index];
let mut sigma: Vec<f64> = vec![0.; max_index];
let mut distance: Vec<i64> = vec![-1; max_index];
Collaborator

A more appropriate type here is Option<i64> if you want to represent missing paths. I'd even say Option<usize>
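
As a rough sketch of what that could look like (the helper names `discover` and `on_shortest_path` are hypothetical, not functions from this PR), the two distance checks in the BFS become pattern matches, so an unvisited node can never be mistaken for a real distance the way `-1 + 1 == 0` could with a sentinel:

```rust
/// Handles edge v -> w during BFS. Returns true if w was reached for the
/// first time, in which case its distance is set to distance[v] + 1.
fn discover(distance: &mut [Option<usize>], v: usize, w: usize) -> bool {
    let next = distance[v].map(|d| d + 1);
    match (distance[w], next) {
        (None, Some(d)) => {
            distance[w] = Some(d);
            true
        }
        // Either w already has a distance or v itself was never reached.
        _ => false,
    }
}

/// True when edge v -> w lies on a shortest path from the source.
fn on_shortest_path(distance: &[Option<usize>], v: usize, w: usize) -> bool {
    match (distance[v], distance[w]) {
        (Some(dv), Some(dw)) => dw == dv + 1,
        // A missing distance on either end means there is no such path.
        _ => false,
    }
}
```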

let coeff = (1.0 + delta[iw]) / path_calc.sigma[iw];
let p_w = path_calc.predecessors.get(iw).unwrap();
for iv in p_w {
//let iv = graph.to_index(*v);
Collaborator

Remove this comment

Comment on lines -361 to -423

for node in graph.node_identifiers() {
predecessors.insert(node, Vec::new());
sigma.insert(node, 0.0);
distance.insert(node, -1);
}
Contributor Author

See how the hashmap was filled with a value for every node of the graph. Replacing it with a vec sized to the node bound will not be a problem for cache efficiency, because the hashmap was already full when the algorithm started.
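
For readers following along, here is a simplified, self-contained sketch of the idea, not the exact code in this PR: the per-source state (`predecessors`, `sigma`, `distance`) lives in vectors indexed by node index instead of hashmaps pre-filled for every node. The adjacency-list input, the `ShortestPathData` struct, and the function name are assumptions for illustration.

```rust
use std::collections::VecDeque;

/// Per-source state for Brandes-style path counting, indexed by node index.
struct ShortestPathData {
    order: Vec<usize>,             // nodes in non-decreasing distance (used as a stack later)
    predecessors: Vec<Vec<usize>>, // shortest-path predecessors of each node
    sigma: Vec<f64>,               // number of shortest paths from the source
    distance: Vec<Option<usize>>,  // None means "not reached"
}

fn shortest_path_counts(adj: &[Vec<usize>], source: usize) -> ShortestPathData {
    let n = adj.len();
    // One allocation per array replaces one hash insertion per node.
    let mut data = ShortestPathData {
        order: Vec::with_capacity(n),
        predecessors: vec![Vec::new(); n],
        sigma: vec![0.0; n],
        distance: vec![None; n],
    };
    data.sigma[source] = 1.0;
    data.distance[source] = Some(0);
    let mut queue = VecDeque::from([source]);
    while let Some(v) = queue.pop_front() {
        data.order.push(v);
        let next = data.distance[v].map(|d| d + 1);
        let sigma_v = data.sigma[v];
        for &w in &adj[v] {
            if data.distance[w].is_none() {
                // First time w is seen: record its distance and enqueue it.
                data.distance[w] = next;
                queue.push_back(w);
            }
            if data.distance[w] == next {
                // v is on a shortest path to w.
                data.sigma[w] += sigma_v;
                data.predecessors[w].push(v);
            }
        }
    }
    data
}
```

On the undirected diamond graph with edges 0-1, 0-2, 1-3, 2-3 and source 0, node 3 ends up with `sigma` 2.0 and predecessors `[1, 2]`, with no hashing involved.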

Collaborator

So I was not involved in the original review of #799. Your version is definitely better than what was submitted in #799. With that being said, I'd like to compare it to a more optimized version like in https://github.com/IvanIsCoding/rustworkx/blob/f9de1db7fd5f9efafe4d6f5d9012ae2c75364081/rustworkx-core/src/centrality.rs#L1137. But that can come in a follow-up PR.

@Paulo-21
Contributor Author

Paulo-21 commented Nov 8, 2024

I think minimizing the number of hashing operations that happen during the shortest path is a good idea in general. But I am not convinced this works, we need to benchmark this more carefully.

I have a feeling this method will be slower for directed graphs, where some nodes are not able to reach all other nodes in the graph. In those cases, having a small hashmap with the nodes that can be reached is much faster than having a large vector with mostly non-visited entries.

I hear your argument, and I agree that we should benchmark to be sure it is faster in cases where the hashmap is not fully populated.
But in this particular case the hashmap was fully populated before the algorithm started, so I don't think it will make any difference.
I think we can replace every hashmap that is indexed by the NodeId type and fully populated before the algorithm starts.

@Paulo-21
Contributor Author

Paulo-21 commented May 13, 2025

Hello, I have finally run some benchmarks.
This is a screenshot from the perf tool on Linux, taken with the hashmap version.
We can see that a non-negligible amount of time is spent in the hashmap.
[image: perf screenshot]

I ran some benchmarks in a single thread:

For a small graph:
with hashmap: 1789 microseconds
without hashmap: 1050 microseconds

For a very big graph:
with hashmap: 949950855 microseconds -> 950 seconds
without hashmap: 455474771 microseconds -> 455 seconds

Do you want more benchmarks?

Collaborator

@IvanIsCoding IvanIsCoding left a comment

I stand corrected: this is indeed better than the code from #799. I was not involved in the review of #799, but to be honest we can do much better than hashing every single node in the graph multiple times.

Your numbers are nice but not very reproducible at the moment. I suggest you test both edge_betweenness_centrality and betweenness_centrality with two best/worst case scenarios from Python:

With the benchmark code in place, we can reuse it to test performance improvements. I think in the end the state of the code will be:

  • Undirected graphs always call the vector version (this PR)
  • Directed graphs call an optimized version with hash map (future PR)

G::NodeId: Eq + Hash,
G::EdgeId: Eq + Hash,
G::NodeId: Eq,
G::EdgeId: Eq,
{
let mut verts_sorted_by_distance: Vec<G::NodeId> = Vec::new(); // a stack
let c = graph.node_count();
Collaborator

I believe we'd need to change this to node_bound(). It is definitely worth adding a test case where some nodes are removed. I need to check whether we already have one.
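
A small sketch of why this matters, using a hypothetical stand-in type rather than petgraph itself: when node indices stay stable after a removal, the largest live index can be greater than or equal to the number of live nodes, so a buffer sized by the count cannot be indexed safely. `SlotGraph` and `fits` are made up for this example; only the `node_count` / `node_bound` naming mirrors petgraph.

```rust
/// Hypothetical stand-in for a graph with stable indices: removing a node
/// leaves a hole instead of renumbering the remaining nodes.
struct SlotGraph {
    slots: Vec<bool>, // true if a node currently lives at that index
}

impl SlotGraph {
    fn with_nodes(n: usize) -> Self {
        SlotGraph { slots: vec![true; n] }
    }

    fn remove_node(&mut self, index: usize) {
        self.slots[index] = false;
    }

    /// Number of live nodes.
    fn node_count(&self) -> usize {
        self.slots.iter().filter(|&&live| live).count()
    }

    /// Upper bound on the indices that may be in use.
    fn node_bound(&self) -> usize {
        self.slots.len()
    }

    /// Indices of the live nodes.
    fn node_indices(&self) -> impl Iterator<Item = usize> + '_ {
        self.slots
            .iter()
            .enumerate()
            .filter_map(|(i, &live)| if live { Some(i) } else { None })
    }
}

/// Checks whether a buffer with `len` slots can be indexed by every live node.
fn fits(graph: &SlotGraph, len: usize) -> bool {
    graph.node_indices().all(|i| i < len)
}
```

With five nodes and node 1 removed, `node_count()` is 4 but index 4 is still live, so `fits(&g, g.node_count())` is false while `fits(&g, g.node_bound())` is true.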

@IvanIsCoding
Collaborator

Also, last but not least, add a test for edge betweenness centrality with a deleted node:

class TestCentralityGraphDeletedNode(unittest.TestCase):

@Paulo-21
Contributor Author

Hello,
I am back with benchmark results.

I ran some benchmarks with the graphs you mentioned before,
the supposed worst-case and best-case scenarios.

For the directed graph of 10000 nodes:

With hashmap: 2.5117154121398926 seconds
Without: 0.8792691230773926 seconds

For the complete graph of 1000 nodes:

With hashmap: 4.231908559799194 seconds
Without: 2.379117965698242 seconds

If you want to test it yourself:
pip install --force-reinstall git+https://github.com/Paulo-21/rustworkx

from rustworkx import betweenness_centrality
import rustworkx.generators
from time import time
#graph = rustworkx.generators.directed_path_graph(100000)
graph = rustworkx.generators.complete_graph(1000)
t = time()
betweenness_centrality(graph)
print(time() - t)

@Paulo-21
Contributor Author

Also, last but not least, add a test for edge betweenness centrality with a deleted node:

class TestCentralityGraphDeletedNode(unittest.TestCase):

Okay, I will do it later.
