|
| 1 | +Partitioning methods |
| 2 | +===================== |
| 3 | + |
| 4 | +The library provides multiple methods for graph partitioning. By default, GraphX provides only random methods; SparklingGraph adds approaches that use structural properties of the graph to minimize computation time and storage overhead. |
| 5 | + |
| 6 | + |
| 7 | + |
| 8 | +Propagation based |
| 9 | +------------------ |
| 10 | + |
| 11 | +In this approach, label propagation is used to determine the cluster id of each vertex. The algorithm iteratively propagates vertex ids: in each step, every vertex selects the minimal id among all the ids it received. Steps are repeated until the number of components in the graph is less than or equal to the number of requested partitions. If the number of unique cluster ids is not equal to the number of requested partitions, the algorithm selects the closer solution. |
| 12 | + |
| 13 | +.. code-block:: scala |
| 14 | + |
| 15 | + import ml.sparkling.graph.operators.partitioning.PropagationBasedPartitioning |
| 16 | + import org.apache.spark.SparkContext |
| 17 | + import org.apache.spark.graphx.Graph |
| 18 | +
|
| 19 | + implicit val ctx: SparkContext = ??? |
| 20 | + // initialize your SparkContext as implicit value |
| 21 | + val graph = ??? |
| 22 | + // load your graph (for example using Graph loading API) |
| 23 | + val numberOfRequiredPartitions=24 |
| 24 | + val partitionedGraph = PropagationBasedPartitioning.partitionGraphBy(graph,numberOfRequiredPartitions) |
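
The minimal-id propagation described above can be illustrated without Spark. The following is only a sketch on a small in-memory graph, not the library's implementation: the object ``MinIdPropagationSketch``, its toy edge list, and the reading of "closer solution" as "the labelling from the last two steps whose cluster count is nearest to the request" are all assumptions made for illustration.

.. code-block:: scala

    object MinIdPropagationSketch {
      // Toy undirected graph (hypothetical data): a path 1-2-3-4 and a separate edge 5-6.
      val edges: Seq[(Long, Long)] = Seq((1L, 2L), (2L, 3L), (3L, 4L), (5L, 6L))

      // Adjacency sets, treating every edge as undirected.
      val neighbours: Map[Long, Set[Long]] =
        edges.flatMap { case (a, b) => Seq(a -> b, b -> a) }
          .groupBy(_._1)
          .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

      // One synchronous step: each vertex keeps the minimum of its own label
      // and the labels currently held by its neighbours.
      def step(labels: Map[Long, Long]): Map[Long, Long] =
        labels.map { case (v, own) =>
          v -> (neighbours.getOrElse(v, Set.empty[Long]).map(labels) + own).min
        }

      // Repeats steps until the number of distinct labels is <= requested
      // (or nothing changes any more), then returns whichever of the last two
      // labellings has a cluster count closer to the requested number.
      def partition(requested: Int): Map[Long, Long] = {
        var previous: Map[Long, Long] = neighbours.keys.map(v => v -> v).toMap
        var current = previous
        var changed = true
        while (current.values.toSet.size > requested && changed) {
          previous = current
          current = step(current)
          changed = current != previous
        }
        val previousDistance = math.abs(previous.values.toSet.size - requested)
        val currentDistance = math.abs(current.values.toSet.size - requested)
        if (previousDistance < currentDistance) previous else current
      }
    }

With this toy graph, ``MinIdPropagationSketch.partition(3)`` stops after two steps with the labels ``1, 1, 1, 2, 5, 5`` for vertices 1 to 6, that is, three clusters. When the graph has more connected components than requested partitions, the loop ends once labels stop changing, and the result has more clusters than requested.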
| 25 | +
|
| 26 | +
|
| 27 | +Naive PSCAN |
| 28 | +------------------ |
| 29 | + |
| 30 | +This method uses the PSCAN algorithm to detect communities in the graph and then uses them as partitions. Without additional configuration, the method uses the default PSCAN configuration, which can be changed if needed. |
| 31 | + |
| 32 | +.. code-block:: scala |
| 33 | + |
| 34 | + import ml.sparkling.graph.operators.partitioning.CommunityBasedPartitioning |
| 35 | + import ml.sparkling.graph.operators.algorithms.community.pscan.PSCAN |
| 36 | + import org.apache.spark.SparkContext |
| 37 | + import org.apache.spark.graphx.Graph |
| 38 | +
|
| 39 | + implicit val ctx: SparkContext = ??? |
| 40 | + // initialize your SparkContext as implicit value |
| 41 | + val graph = ??? |
| 42 | + // load your graph (for example using Graph loading API) |
| 43 | + val communityDetectionMethod=PSCAN |
| 44 | + val partitionedGraph = CommunityBasedPartitioning.partitionGraphBy(graph,communityDetectionMethod) |
| 45 | +
|
| 46 | +
|
| 47 | +To change the parameters, you can use: |
| 48 | + |
| 49 | +.. code-block:: scala |
| 50 | + |
| 51 | + import ml.sparkling.graph.operators.partitioning.CommunityBasedPartitioning |
| 52 | + import ml.sparkling.graph.operators.algorithms.community.pscan.PSCAN |
| 53 | + import org.apache.spark.SparkContext |
| 54 | + import org.apache.spark.graphx.Graph |
| 55 | +
|
| 56 | + implicit val ctx: SparkContext = ??? |
| 57 | + // initialize your SparkContext as implicit value |
| 58 | + val graph = ??? |
| 59 | + // load your graph (for example using Graph loading API) |
| 60 | + val partitionedGraph = CommunityBasedPartitioning.partitionGraphBy(graph,PSCAN.computeConnectedComponents(_,epsilon = 0)) |
| 61 | +
|
| 62 | +
|
| 63 | +
|
| 64 | +Dynamic PSCAN |
| 65 | +------------------ |
| 66 | + |
| 67 | +This solution uses the PSCAN algorithm in conjunction with a search over the epsilon parameter. The algorithm collects possible epsilon values and uses binary search to find the one that returns a clustering whose size is closest to the requested number of partitions. The clustering found is then used as the partitioning. |
| 68 | + |
| 69 | +.. code-block:: scala |
| 70 | + |
| 71 | + import ml.sparkling.graph.operators.partitioning.PSCANBasedPartitioning |
| 72 | + import org.apache.spark.SparkContext |
| 73 | + import org.apache.spark.graphx.Graph |
| 74 | +
|
| 75 | + implicit val ctx: SparkContext = ??? |
| 76 | + // initialize your SparkContext as implicit value |
| 77 | + val graph = ??? |
| 78 | + // load your graph (for example using Graph loading API) |
| 79 | + val numberOfRequiredPartitions=24 |
| 80 | + val partitionedGraph = PSCANBasedPartitioning.partitionGraphBy(graph,numberOfRequiredPartitions) |
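
The epsilon search can be pictured as a binary search over a sorted list of candidate values. The sketch below is not the library's code: ``EpsilonSearchSketch``, ``closestEpsilon`` and the ``clusterCount`` function are hypothetical names, and the search assumes that the number of clusters does not decrease as epsilon grows.

.. code-block:: scala

    object EpsilonSearchSketch {
      // Returns the candidate epsilon whose cluster count is closest to `requested`.
      // Assumptions: `candidates` is sorted ascending and `clusterCount` is
      // non-decreasing in epsilon (a stand-in for running PSCAN with that epsilon).
      def closestEpsilon(candidates: Vector[Double], requested: Int)
                        (clusterCount: Double => Int): Double = {
        require(candidates.nonEmpty, "at least one candidate epsilon is needed")
        var lo = 0
        var hi = candidates.length - 1
        var best = candidates(lo)
        var bestDistance = Int.MaxValue
        while (lo <= hi) {
          val mid = lo + (hi - lo) / 2
          val count = clusterCount(candidates(mid))
          val distance = math.abs(count - requested)
          // Remember the best candidate seen so far.
          if (distance < bestDistance) {
            best = candidates(mid)
            bestDistance = distance
          }
          if (count == requested) lo = hi + 1      // exact match, stop searching
          else if (count < requested) lo = mid + 1 // too few clusters, try larger epsilon
          else hi = mid - 1                        // too many clusters, try smaller epsilon
        }
        best
      }
    }

For example, with hypothetical cluster counts ``Map(0.1 -> 2, 0.2 -> 5, 0.3 -> 20, 0.4 -> 30, 0.5 -> 80)`` passed as ``clusterCount`` and 24 requested partitions, the search evaluates only 0.3 and 0.4 and returns 0.3. Each evaluation stands for one full clustering run, so the binary search keeps the number of runs logarithmic in the number of candidates.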