
Commit 0ec469b

Adding more docs
1 parent fe9810c commit 0ec469b

File tree

5 files changed: +158 −2 lines changed


_static/plac.png

505 KB

aapsp.rst

+74
@@ -0,0 +1,74 @@
Shortest paths approximation
=============================

In order to limit the computation time of shortest paths for large graphs, the library provides the ability to approximate them. The approximation can be divided into four main phases:

#. Graph coarsening
#. Path calculation in the coarsened graph
#. 2-hop neighborhood distance calculation
#. Path approximation

The approximation gives a worst-case result of 3*p + 2, where p is the real shortest-path length. The result is not impressive in terms of exactness, but it preserves the ranking of vertices, so it can be used for approximating measures (such as Closeness) or in tasks where the order of vertices is important rather than the exact distance, as the sketch below illustrates.
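For instance, a real distance of p = 4 may be reported as anything up to 3 * 4 + 2 = 14. A minimal sketch of the guarantee in plain Scala (illustrative only, not part of the library API):

.. code-block:: scala

   // Worst-case bound stated above: an approximated length should
   // never exceed 3 * p + 2 for a real shortest-path length p.
   def withinBound(p: Double, approx: Double): Boolean =
     approx <= 3 * p + 2

   assert(withinBound(p = 4.0, approx = 14.0))  // 14 <= 3 * 4 + 2 = 14
   assert(!withinBound(p = 4.0, approx = 15.0)) // 15 exceeds the bound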
Algorithm block scheme
----------------------

.. image:: _static/plac.png
   :scale: 50 %
   :alt: PLAC algorithm
   :align: center
   :target: _static/plac.png
Examples
----------------------

The algorithm API lets you compute paths:

* For a single vertex:

  .. code-block:: scala

     import java.lang.{Double => JDouble}
     import ml.sparkling.graph.operators.algorithms.aproximation.ApproximatedShortestPathsAlgorithm
     import org.apache.spark.SparkContext
     import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

     implicit val ctx: SparkContext = ???
     // initialize your SparkContext as an implicit value
     val graph = ???
     // load your graph (for example using the Graph loading API)
     val sourceVertexId = 1L
     val graphWithPaths = ApproximatedShortestPathsAlgorithm.computeSingleShortestPathsLengths(graph, sourceVertexId)
     val paths: VertexRDD[Iterable[(VertexId, JDouble)]] = graphWithPaths.vertices
* For the whole graph:

  .. code-block:: scala

     import java.lang.{Double => JDouble}
     import ml.sparkling.graph.operators.algorithms.aproximation.ApproximatedShortestPathsAlgorithm
     import org.apache.spark.SparkContext
     import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

     implicit val ctx: SparkContext = ???
     // initialize your SparkContext as an implicit value
     val graph = ???
     // load your graph (for example using the Graph loading API)
     val graphWithPaths = ApproximatedShortestPathsAlgorithm.computeShortestPaths(graph)
     val paths: VertexRDD[Iterable[(VertexId, JDouble)]] = graphWithPaths.vertices
* Using the iterative approach (the second argument is a function that supplies the bucket size used to split the computation into smaller runs):

  .. code-block:: scala

     import java.lang.{Double => JDouble}
     import ml.sparkling.graph.operators.algorithms.aproximation.ApproximatedShortestPathsAlgorithm
     import org.apache.spark.SparkContext
     import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

     implicit val ctx: SparkContext = ???
     // initialize your SparkContext as an implicit value
     val graph = ???
     // load your graph (for example using the Graph loading API)
     val bucketSize = 10
     val graphWithPaths = ApproximatedShortestPathsAlgorithm.computeShortestPathsLengthsIterative(graph, (g: Graph[_, _]) => bucketSize)
     val paths: VertexRDD[Iterable[(VertexId, JDouble)]] = graphWithPaths.vertices

coarsening.rst

+2 −2
@@ -1,13 +1,13 @@
Graph coarsening
=====================

- In order to limit computation, you can decrease graph size using the coarsening operator. The new graph will be smaller, because neighbouring vertices are coarsened into single vertices. Edges are created using edges from input vertices, filtering out self loops.
+ In order to limit computation, you can decrease graph size using the coarsening operator. The new graph will be smaller, because neighbouring vertices are coarsened into single vertices. Edges are created using edges from the input graph, filtering out self loops.

Label propagation based graph coarsening
-------------------------------------------

- One of the implementations is based on label propagation. Just three iterations are enough to coarsen a graph. The implementation propagates vertex identifiers to neighbours. Each neighbour groups the received identifiers and sorts them by number of occurrences. If the numbers of occurrences are equal, the greater identifier is selected (in order to guarantee deterministic execution). After that, the last id is selected (the one with the biggest number of occurrences, or the greatest one). After three iterations, vertices have their final IDs.
+ One of the implementations is based on label propagation. The implementation propagates vertex identifiers to neighbours. Each neighbour groups the received identifiers and sorts them by number of occurrences. The identifier with the biggest number of occurrences is selected; if the numbers of occurrences are equal, the minimal identifier is selected (in order to guarantee deterministic execution).

.. code-block:: scala
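For intuition, the selection rule described in the added paragraph can be sketched in plain Scala (a hypothetical standalone helper, not the library's implementation):

.. code-block:: scala

   // Pick the label with the most occurrences among the received vertex ids;
   // break ties by taking the minimal id, making the result deterministic.
   def selectLabel(received: Seq[Long]): Long =
     received
       .groupBy(identity)
       .map { case (id, occurrences) => (id, occurrences.size) }
       .maxBy { case (id, count) => (count, -id) }
       ._1

   selectLabel(Seq(3L, 3L, 5L, 5L, 7L)) // returns 3: counts tie at 2, minimal id wins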

index.rst

+2
@@ -23,6 +23,8 @@ For bigger insight please refer to `a API`_ documentation in ScalaDocs.
   comunities
   coarsening
   measures
+  partitioning
+  aapsp
   links
   todos

partitioning.rst

+80
@@ -0,0 +1,80 @@
Partitioning methods
=====================

The library provides multiple methods for graph partitioning. By default, GraphX provides only random methods; in SparklingGraph you can find approaches that use the structural properties of graphs in order to minimize computation times and storage overheads.
Propagation based
------------------

In this approach, label propagation is used to determine vertex cluster ids. The algorithm propagates vertex ids iteratively; in each step, a vertex selects the minimal id among all the ids it received. Steps are repeated until the number of components in the graph is less than or equal to the requested number of partitions. If the number of unique cluster ids is not equal to the requested number of partitions, the algorithm selects the closest solution. A sketch of a single propagation step follows the API example below.
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.PropagationBasedPartitioning
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val numberOfRequiredPartitions = 24
   val partitionedGraph = PropagationBasedPartitioning.partitionGraphBy(graph, numberOfRequiredPartitions)
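As referenced above, a single propagation step can be sketched in plain Scala over an edge list (a hypothetical standalone helper, not the actual implementation of PropagationBasedPartitioning):

.. code-block:: scala

   // One step of min-id propagation: every vertex adopts the minimum of its
   // own label and all labels received from its neighbours.
   def propagateStep(labels: Map[Long, Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
     val received = edges
       .flatMap { case (a, b) => Seq(a -> labels(b), b -> labels(a)) }
       .groupBy { case (vertex, _) => vertex }
       .map { case (vertex, messages) => vertex -> messages.map(_._2).min }
     labels.map { case (vertex, label) => vertex -> math.min(label, received.getOrElse(vertex, label)) }
   }

   // On a path graph 1-2-3, repeated steps converge to the minimal id:
   propagateStep(Map(1L -> 1L, 2L -> 2L, 3L -> 3L), Seq((1L, 2L), (2L, 3L)))
   // == Map(1 -> 1, 2 -> 1, 3 -> 2)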
Naive PSCAN
------------------

This algorithm uses the PSCAN algorithm to determine communities in the graph and then uses them as partitions. Without configuration, the method uses the default PSCAN configuration, but that can be changed if needed.
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.CommunityBasedPartitioning
   import ml.sparkling.graph.operators.algorithms.community.pscan.PSCAN
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val communityDetectionMethod = PSCAN
   val partitionedGraph = CommunityBasedPartitioning.partitionGraphBy(graph, communityDetectionMethod)
In order to change the parameters you can use:
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.CommunityBasedPartitioning
   import ml.sparkling.graph.operators.algorithms.community.pscan.PSCAN
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val partitionedGraph = CommunityBasedPartitioning.partitionGraphBy(graph, PSCAN.computeConnectedComponents(_, epsilon = 0))
Dynamic PSCAN
------------------

This solution uses the PSCAN algorithm in conjunction with a search over the epsilon parameter. The algorithm enumerates possible epsilon values and uses binary search to find the one that returns a clustering whose size is closest to the requested number of partitions. The found clustering is used as the partitioning; a sketch of the search idea follows the API example below.
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.PSCANBasedPartitioning
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val numberOfRequiredPartitions = 24
   val partitionedGraph = PSCANBasedPartitioning.partitionGraphBy(graph, numberOfRequiredPartitions)
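As referenced above, the epsilon search can be sketched as follows (a hypothetical standalone helper, not the actual implementation of PSCANBasedPartitioning; ``clusterCount`` stands in for running PSCAN with a given epsilon, assumed here to yield more clusters for higher epsilon):

.. code-block:: scala

   // Binary search over sorted candidate epsilon values for the one whose
   // clustering size is closest to the requested number of partitions.
   def searchEpsilon(candidates: Vector[Double], clusterCount: Double => Int, target: Int): Double = {
     var lo = 0
     var hi = candidates.size - 1
     var best = candidates(0)
     while (lo <= hi) {
       val mid = (lo + hi) / 2
       val eps = candidates(mid)
       if (math.abs(clusterCount(eps) - target) < math.abs(clusterCount(best) - target))
         best = eps
       // assumption: cluster count grows with epsilon, so overshooting the
       // target means searching the lower half
       if (clusterCount(eps) > target) hi = mid - 1 else lo = mid + 1
     }
     best
   }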
