Remove The Hashmap from Shortest Path for Centrality Computation #1307

Paulo-21 · 2024-11-03T18:53:33Z

Hello,
I removed the hashmap from the shortest-path computation used for centrality.

This may improve performance.

Tell me what you think :)

coveralls · 2024-11-03T23:23:33Z

Pull Request Test Coverage Report for Build 15007738295

Details

51 of 51 (100.0%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.001%) to 95.235%

Totals
Change from base Build 14954677437:	-0.001%
Covered Lines:	18727
Relevant Lines:	19664

💛 - Coveralls

IvanIsCoding

I think minimizing the number of hashing operations that happen during the shortest path is a good idea in general. But I am not convinced this works, we need to benchmark this more carefully.

I have a feeling this method will be slower for directed graphs, where some nodes are not able to reach all other nodes in the graph. In those cases, having a small hashmap with the nodes that can be reached is much faster than having a large vector with mostly non-visited entries.
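
A minimal sketch of the trade-off described here, assuming a plain adjacency list over `usize` indices rather than the petgraph traits the PR uses. The function names `bfs_dense` and `bfs_sparse` are hypothetical. The dense version allocates one slot per node up front; the sparse version only stores nodes that the traversal reaches.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, VecDeque};

/// BFS distances stored densely: one slot per node, `None` if unreached.
fn bfs_dense(adj: &[Vec<usize>], source: usize) -> Vec<Option<usize>> {
    let mut distance: Vec<Option<usize>> = vec![None; adj.len()];
    let mut queue = VecDeque::from([source]);
    distance[source] = Some(0);
    while let Some(v) = queue.pop_front() {
        let next = distance[v].unwrap() + 1;
        for &w in &adj[v] {
            if distance[w].is_none() {
                distance[w] = Some(next);
                queue.push_back(w);
            }
        }
    }
    distance
}

/// BFS distances stored sparsely: only reached nodes get an entry,
/// at the cost of hashing on every lookup.
fn bfs_sparse(adj: &[Vec<usize>], source: usize) -> HashMap<usize, usize> {
    let mut distance: HashMap<usize, usize> = HashMap::from([(source, 0)]);
    let mut queue = VecDeque::from([source]);
    while let Some(v) = queue.pop_front() {
        let next = distance[&v] + 1;
        for &w in &adj[v] {
            if let Entry::Vacant(slot) = distance.entry(w) {
                slot.insert(next);
                queue.push_back(w);
            }
        }
    }
    distance
}
```

On the directed path 0 -> 1 -> 2 -> 3 with source 2, the dense vector still has four slots (two of them `None`), while the map holds two entries. Which one is faster in practice depends on graph size and shape, which is why it needs benchmarking.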

IvanIsCoding · 2024-11-05T00:22:53Z

rustworkx-core/src/centrality.rs

+    let mut verts_sorted_by_distance: Vec<G::NodeId> = Vec::with_capacity(c); // a stack
+    let mut predecessors: Vec<Vec<usize>> = vec![Vec::new(); max_index];
+    let mut sigma: Vec<f64> = vec![0.; max_index];
+    let mut distance: Vec<i64> = vec![-1; max_index];


A more appropriate type here is Option<i64> if you want to represent missing paths. I'd even say Option<usize>
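
A small sketch of what this suggestion could look like, assuming distances are kept in a slice indexed by node index. The helper names `visit` and `on_shortest_path` are hypothetical and not part of the PR. With `Option<usize>`, an unreached node cannot be mistaken for a real distance, whereas `-1 + 1` silently yields `0` with the sentinel.

```rust
/// Record the first visit to `w` when reached from `v`.
/// Returns true if `w` was previously unreached.
fn visit(distance: &mut [Option<usize>], v: usize, w: usize) -> bool {
    match (distance[v], distance[w]) {
        // `v` is reached and `w` is not: `w` is one step further than `v`.
        (Some(d), None) => {
            distance[w] = Some(d + 1);
            true
        }
        _ => false,
    }
}

/// True when the edge v -> w lies on a shortest path, i.e. both nodes are
/// reached and distance[w] == distance[v] + 1.
fn on_shortest_path(distance: &[Option<usize>], v: usize, w: usize) -> bool {
    matches!((distance[v], distance[w]), (Some(dv), Some(dw)) if dw == dv + 1)
}
```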

IvanIsCoding · 2024-11-05T00:29:09Z

rustworkx-core/src/centrality.rs

+        let coeff = (1.0 + delta[iw]) / path_calc.sigma[iw];
+        let p_w = path_calc.predecessors.get(iw).unwrap();
+        for iv in p_w {
+            //let iv = graph.to_index(*v);


Remove this comment

Paulo-21 · 2024-11-08T15:38:18Z

rustworkx-core/src/centrality.rs

-
-    for node in graph.node_identifiers() {
-        predecessors.insert(node, Vec::new());
-        sigma.insert(node, 0.0);
-        distance.insert(node, -1);
-    }


See how the hashmap was filled with a value for every node of the graph. Replacing it with a Vec sized to the node bound will therefore not hurt cache efficiency, because the hashmap was already full when the algorithm started.

Paulo-21 · 2024-11-08T15:40:03Z

I think minimizing the number of hashing operations that happen during the shortest path is a good idea in general. But I am not convinced this works, we need to benchmark this more carefully.

I have a feeling this method will be slower for directed graphs, where some nodes are not able to reach all other nodes in the graph. In those cases, having a small hashmap with the nodes that can be reached is much faster than having a large vector with mostly non-visited entries.

I heard your argument, and I agree that we should benchmark to be sure it is faster in cases where the hashmap was not fully populated.
But in this particular case the hashmap was fully populated before the algorithm started, so I don't think it will make any difference.
I think we can replace every hashmap that is indexed by the NodeId type and fully populated before the algorithm starts.
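
To make the idea concrete, here is a minimal sketch of Brandes-style betweenness where all per-source state (`predecessors`, `sigma`, `distance`) lives in vectors addressed by node index instead of maps keyed by NodeId. It is an illustration under simplifying assumptions, not the code in this PR: the graph is a plain adjacency list, edges are unweighted, the result is unnormalized, and the function name `betweenness` is hypothetical.

```rust
use std::collections::VecDeque;

/// Unnormalized betweenness centrality for an unweighted directed graph
/// given as an adjacency list over node indices 0..n.
fn betweenness(adj: &[Vec<usize>]) -> Vec<f64> {
    let n = adj.len();
    let mut centrality = vec![0.0_f64; n];
    for s in 0..n {
        // Per-source state, addressed by node index (no hashing).
        let mut predecessors: Vec<Vec<usize>> = vec![Vec::new(); n];
        let mut sigma = vec![0.0_f64; n];
        let mut distance: Vec<Option<usize>> = vec![None; n];
        let mut order: Vec<usize> = Vec::with_capacity(n); // BFS visit order
        let mut queue = VecDeque::from([s]);
        sigma[s] = 1.0;
        distance[s] = Some(0);
        while let Some(v) = queue.pop_front() {
            order.push(v);
            let next = distance[v].unwrap() + 1;
            for &w in &adj[v] {
                // First time we see `w`: record its distance and enqueue it.
                if distance[w].is_none() {
                    distance[w] = Some(next);
                    queue.push_back(w);
                }
                // `v` precedes `w` on a shortest path from `s`.
                if distance[w] == Some(next) {
                    sigma[w] += sigma[v];
                    predecessors[w].push(v);
                }
            }
        }
        // Accumulate dependencies in reverse BFS order.
        let mut delta = vec![0.0_f64; n];
        for &w in order.iter().rev() {
            let coeff = (1.0 + delta[w]) / sigma[w];
            for &v in &predecessors[w] {
                delta[v] += sigma[v] * coeff;
            }
            if w != s {
                centrality[w] += delta[w];
            }
        }
    }
    centrality
}
```

For the directed path 0 -> 1 -> 2 this yields a score of 1 for the middle node and 0 for the endpoints. Every vector here is allocated at full size for each source, which mirrors the pre-filled hashmaps being replaced.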

Paulo-21 · 2025-05-13T21:57:47Z

Hello, I have finally run some benchmarks.
This is a screenshot from the perf tool on Linux for the version with the hashmap.
We can see that a non-negligible amount of time is spent in the hashmap.

I ran some single-threaded benchmarks:

For a small graph:
with hashmap: 1789 microseconds
without hashmap: 1050 microseconds

For a very big graph:
with hashmap: 949950855 microseconds -> 950 seconds
without hashmap: 455474771 microseconds -> 455 seconds

Do you want more benchmarks?

IvanIsCoding

I stand corrected: this is indeed better than the code from #799. I was not involved in the review of #799, but to be honest we can do much better than hashing every single node in the graph multiple times.

Your numbers are nice but not very reproducible at the moment. I suggest you test both edge_betweenness_centrality and betweenness_centrality with the best-case and worst-case scenarios from Python:

https://www.rustworkx.org/apiref/rustworkx.generators.directed_path_graph.html -> worst-case scenario, very sparse graph
https://www.rustworkx.org/apiref/rustworkx.generators.complete_graph.html -> best-case scenario for the optimization

We can then reuse the benchmark code to test performance improvements. I think in the end, the status of the code will be:

Undirected graphs always call the vector version (this PR)
Directed graphs call an optimized version with hash map (future PR)
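
A rough way to see why these two graphs sit at opposite ends, sketched with plain adjacency lists instead of the rustworkx generators. All names here (`directed_path`, `complete`, `reachable`, `mean_fill`) are hypothetical helpers. `mean_fill` estimates what fraction of a dense per-source vector a traversal actually touches, averaged over all sources, and assumes a non-empty graph.

```rust
/// Adjacency list of the directed path 0 -> 1 -> ... -> n-1.
fn directed_path(n: usize) -> Vec<Vec<usize>> {
    (0..n)
        .map(|i| if i + 1 < n { vec![i + 1] } else { vec![] })
        .collect()
}

/// Adjacency list of the complete graph on n nodes (no self-loops).
fn complete(n: usize) -> Vec<Vec<usize>> {
    (0..n)
        .map(|i| (0..n).filter(|&j| j != i).collect::<Vec<usize>>())
        .collect()
}

/// Number of nodes reachable from `source`, including `source` itself.
fn reachable(adj: &[Vec<usize>], source: usize) -> usize {
    let mut seen = vec![false; adj.len()];
    let mut stack = vec![source];
    seen[source] = true;
    let mut count = 0;
    while let Some(v) = stack.pop() {
        count += 1;
        for &w in &adj[v] {
            if !seen[w] {
                seen[w] = true;
                stack.push(w);
            }
        }
    }
    count
}

/// Average fraction of nodes reached per source (requires n > 0).
fn mean_fill(adj: &[Vec<usize>]) -> f64 {
    let n = adj.len();
    let total: usize = (0..n).map(|s| reachable(adj, s)).sum();
    total as f64 / (n * n) as f64
}
```

For four nodes, the directed path gives a mean fill of 0.625 and the complete graph gives 1.0. On a long directed path the value approaches one half, so roughly half of each dense vector goes unused there.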

IvanIsCoding · 2025-05-13T22:21:04Z

rustworkx-core/src/centrality.rs

-    G::NodeId: Eq + Hash,
-    G::EdgeId: Eq + Hash,
+    G::NodeId: Eq,
+    G::EdgeId: Eq,
 {
    let mut verts_sorted_by_distance: Vec<G::NodeId> = Vec::new(); // a stack
    let c = graph.node_count();


I believe we'd need to change this to node_bound(). It is definitely worth adding a test case where some nodes are removed. I need to check if we have any.
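
A toy illustration of the difference, using a hypothetical `SlotGraph` type rather than petgraph's real graph types. It assumes stable indices: removing a node leaves a hole, so surviving nodes keep their indices and the largest index can exceed the node count.

```rust
/// Toy stand-in for a stable-index graph (hypothetical, for illustration).
/// A removed node leaves a `None` slot behind.
struct SlotGraph {
    slots: Vec<Option<&'static str>>,
}

impl SlotGraph {
    /// Number of live nodes.
    fn node_count(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }

    /// One past the largest live index: the length an index-addressed
    /// Vec needs so that every live index is in bounds.
    fn node_bound(&self) -> usize {
        self.slots
            .iter()
            .rposition(|s| s.is_some())
            .map_or(0, |i| i + 1)
    }

    /// Indices of live nodes.
    fn node_indices(&self) -> impl Iterator<Item = usize> + '_ {
        self.slots
            .iter()
            .enumerate()
            .filter(|(_, s)| s.is_some())
            .map(|(i, _)| i)
    }
}

/// Checks whether every live index fits in a Vec of length `len`.
fn fits(graph: &SlotGraph, len: usize) -> bool {
    graph.node_indices().all(|i| i < len)
}
```

With nodes at indices 0 and 2 and a hole at index 1, `node_count()` is 2 but `node_bound()` is 3, so a vector sized by `node_count()` would be indexed out of bounds at index 2.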

IvanIsCoding · 2025-05-13T22:23:48Z

rustworkx-core/src/centrality.rs

-
-    for node in graph.node_identifiers() {
-        predecessors.insert(node, Vec::new());
-        sigma.insert(node, 0.0);
-        distance.insert(node, -1);
-    }


So I was not involved in the original review of #799. Your version is definitely better than what was submitted in #799. With that being said, I'd like to compare it to a more optimized version like in https://github.com/IvanIsCoding/rustworkx/blob/f9de1db7fd5f9efafe4d6f5d9012ae2c75364081/rustworkx-core/src/centrality.rs#L1137. But that can come in a follow-up PR.

IvanIsCoding · 2025-05-13T22:32:21Z

Also, last but not least, add a test for edge betweenness centrality with a deleted node:

rustworkx/tests/graph/test_centrality.py

Line 62 in 30897c5

class TestCentralityGraphDeletedNode(unittest.TestCase):

Paulo-21 · 2025-05-14T14:54:35Z

Hello,
I'm back with the benchmarks.

I ran some benchmarks with the graphs you mentioned before,
the supposed worst-case and best-case scenarios.

For the directed graph of 10000 nodes:

With hashmap: 2.5117154121398926 seconds
Without: 0.8792691230773926 seconds

For the complete graph of 1000 nodes:

With hashmap: 4.231908559799194 seconds
Without: 2.379117965698242 seconds

If you want to test it yourself:
pip install --force-reinstall git+https://github.com/Paulo-21/rustworkx

from rustworkx import betweenness_centrality
import rustworkx.generators
from time import time
#graph = rustworkx.generators.directed_path_graph(100000)
graph = rustworkx.generators.complete_graph(1000)
t = time()
betweenness_centrality(graph)
print(time() - t)
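
For reference, the speedups implied by the timings in this comment can be worked out as baseline time divided by new time. The snippet below is only arithmetic on the reported numbers; `speedup`, `REPORTED`, and `report` are hypothetical names.

```rust
/// Speedup of the new version relative to the baseline (baseline / new).
fn speedup(baseline_secs: f64, new_secs: f64) -> f64 {
    baseline_secs / new_secs
}

/// Timings reported in this comment, in seconds:
/// (label, with hashmap, without hashmap).
const REPORTED: [(&str, f64, f64); 2] = [
    ("directed graph", 2.5117154121398926, 0.8792691230773926),
    ("complete graph", 4.231908559799194, 2.379117965698242),
];

/// One formatted line per benchmark, with the speedup to two decimals.
fn report() -> Vec<String> {
    REPORTED
        .iter()
        .map(|&(label, with, without)| {
            format!("{label}: {:.2}x", speedup(with, without))
        })
        .collect()
}
```

This gives roughly 2.86x for the directed graph and 1.78x for the complete graph. These are single runs, so the ratios are indicative only.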

Paulo-21 · 2025-05-15T14:30:32Z

Also, last but not least, add a test for edge betweenness centrality with a deleted node:

rustworkx/tests/graph/test_centrality.py

Line 62 in 30897c5

class TestCentralityGraphDeletedNode(unittest.TestCase):

Okay, I will do it later.

Paulo-21 and others added 9 commits April 24, 2024 15:41

Remove the Hashmap of the katz centrality computation to avoid better…

88f5122

… performance

cargo fmt

3087023

Merge branch 'main' into main

77a3eb6

Remove the Hashmap from the ShortestPath_for_centrality computation

15cec78

Merge branch 'main' of https://github.com/Paulo-21/rustworkx

7fbbac4

clippy advice

0a2b549

Fix

a6d8171

fmt

70322a0

Fix python test

00af4af

Remove Hashmap from accumulate vertice too

d65c34e

Paulo-21 mentioned this pull request Nov 4, 2024

Avoid using HashMaps in intermediate betweenness centrality computations when not necessary #1309

Open

IvanIsCoding reviewed Nov 5, 2024

Merge branch 'Qiskit:main' into main

83cbfe6

Paulo-21 commented Nov 8, 2024

Paulo-21 and others added 5 commits November 8, 2024 17:04

Remove useless comment

55a8b91

Merge branch 'main' into main

4332b7c

Merge branch 'Qiskit:main' into main

2b37127

Remove hashmap for betwenness with edge too and clean the code

83707e0

fmt

f0b14f2

Fix clippy

26feeef

IvanIsCoding reviewed May 13, 2025
