
Commit 0ec469b

Adding more docs
1 parent fe9810c commit 0ec469b

File tree

5 files changed: +158 −2 lines changed


_static/plac.png

505 KB

aapsp.rst

+74
@@ -0,0 +1,74 @@
Shortest paths approximation
=============================

In order to limit the computation time of shortest paths for large graphs, the library provides the ability to approximate them. The approximation can be divided into four main phases:

#. Graph coarsening
#. Path calculation in the coarsened graph
#. 2-hop neighborhood distance calculation
#. Path approximation

The approximation gives a worst-case result of 3*p + 2, where p is the real shortest-path length. The result is not impressive in terms of exactness, but it preserves the ranking of vertices, so it can be used for approximating measures (such as Closeness) or in tasks where the order of vertices is important rather than the exact distance, as the sketch below illustrates.
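For instance, a real distance of p = 4 may be reported as anything up to 3 * 4 + 2 = 14. A minimal sketch of the guarantee in plain Scala (illustrative only, not part of the library API):

.. code-block:: scala

   // Worst-case bound stated above: an approximated length should
   // never exceed 3 * p + 2 for a real shortest-path length p.
   def withinBound(p: Double, approx: Double): Boolean =
     approx <= 3 * p + 2

   assert(withinBound(p = 4.0, approx = 14.0))  // 14 <= 3 * 4 + 2 = 14
   assert(!withinBound(p = 4.0, approx = 15.0)) // 15 exceeds the bound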
Algorithm block scheme
----------------------

.. image:: _static/plac.png
   :scale: 50 %
   :alt: PLAC algorithm
   :align: center
   :target: _static/plac.png
Examples
----------------------

The algorithm API lets you compute paths:

* For a single vertex:

  .. code-block:: scala

     import java.lang.{Double => JDouble}
     import ml.sparkling.graph.operators.algorithms.aproximation.ApproximatedShortestPathsAlgorithm
     import org.apache.spark.SparkContext
     import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

     implicit val ctx: SparkContext = ???
     // initialize your SparkContext as an implicit value
     val graph = ???
     // load your graph (for example using the Graph loading API)
     val sourceVertexId = 1L
     val graphWithPaths = ApproximatedShortestPathsAlgorithm.computeSingleShortestPathsLengths(graph, sourceVertexId)
     val paths: VertexRDD[Iterable[(VertexId, JDouble)]] = graphWithPaths.vertices
* For the whole graph:

  .. code-block:: scala

     import java.lang.{Double => JDouble}
     import ml.sparkling.graph.operators.algorithms.aproximation.ApproximatedShortestPathsAlgorithm
     import org.apache.spark.SparkContext
     import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

     implicit val ctx: SparkContext = ???
     // initialize your SparkContext as an implicit value
     val graph = ???
     // load your graph (for example using the Graph loading API)
     val graphWithPaths = ApproximatedShortestPathsAlgorithm.computeShortestPaths(graph)
     val paths: VertexRDD[Iterable[(VertexId, JDouble)]] = graphWithPaths.vertices
* Using the iterative approach (the second argument is a function that supplies the bucket size used to split the computation into smaller runs):

  .. code-block:: scala

     import java.lang.{Double => JDouble}
     import ml.sparkling.graph.operators.algorithms.aproximation.ApproximatedShortestPathsAlgorithm
     import org.apache.spark.SparkContext
     import org.apache.spark.graphx.{Graph, VertexId, VertexRDD}

     implicit val ctx: SparkContext = ???
     // initialize your SparkContext as an implicit value
     val graph = ???
     // load your graph (for example using the Graph loading API)
     val bucketSize = 10
     val graphWithPaths = ApproximatedShortestPathsAlgorithm.computeShortestPathsLengthsIterative(graph, (g: Graph[_, _]) => bucketSize)
     val paths: VertexRDD[Iterable[(VertexId, JDouble)]] = graphWithPaths.vertices

coarsening.rst

+2 −2
@@ -1,13 +1,13 @@
Graph coarsening
=====================

- In order to limit computation, you can decrease graph size using the coarsening operator. The new graph will be smaller, because neighbouring vertices are coarsened into single vertices. Edges are created using edges from input vertices, filtering out self loops.
+ In order to limit computation, you can decrease graph size using the coarsening operator. The new graph will be smaller, because neighbouring vertices are coarsened into single vertices. Edges are created using edges from the input graph, filtering out self loops.

Label propagation based graph coarsening
-------------------------------------------

- One of the implementations is based on label propagation. Just three iterations are enough to coarsen a graph. The implementation propagates vertex identifiers to neighbours. Each neighbour groups the received identifiers and sorts them by number of occurrences. If the numbers of occurrences are equal, the greater identifier is selected (in order to guarantee deterministic execution). After that, the last id is selected (the one with the biggest number of occurrences, or the greatest one). After three iterations, vertices have their final IDs.
+ One of the implementations is based on label propagation. The implementation propagates vertex identifiers to neighbours. Each neighbour groups the received identifiers and sorts them by number of occurrences. The identifier with the biggest number of occurrences is selected; if the numbers of occurrences are equal, the minimal identifier is selected (in order to guarantee deterministic execution).

.. code-block:: scala
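For intuition, the selection rule described in the added paragraph can be sketched in plain Scala (a hypothetical standalone helper, not the library's implementation):

.. code-block:: scala

   // Pick the label with the most occurrences among the received vertex ids;
   // break ties by taking the minimal id, making the result deterministic.
   def selectLabel(received: Seq[Long]): Long =
     received
       .groupBy(identity)
       .map { case (id, occurrences) => (id, occurrences.size) }
       .maxBy { case (id, count) => (count, -id) }
       ._1

   selectLabel(Seq(3L, 3L, 5L, 5L, 7L)) // returns 3: counts tie at 2, minimal id wins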

index.rst

+2
@@ -23,6 +23,8 @@ For bigger insight please refer to `a API`_ documentation in ScalaDocs.
   comunities
   coarsening
   measures
+  partitioning
+  aapsp
   links
   todos

partitioning.rst

+80
@@ -0,0 +1,80 @@
Partitioning methods
=====================

The library provides multiple methods for graph partitioning. By default, GraphX provides only random methods; in SparklingGraph you can find approaches that use the structural properties of graphs in order to minimize computation times and storage overheads.
Propagation based
------------------

In this approach, label propagation is used to determine vertex cluster ids. The algorithm propagates vertex ids iteratively; in each step, a vertex selects the minimal id among all the ids it received. Steps are repeated until the number of components in the graph is less than or equal to the requested number of partitions. If the number of unique cluster ids is not equal to the requested number of partitions, the algorithm selects the closest solution. A sketch of a single propagation step follows the API example below.
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.PropagationBasedPartitioning
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val numberOfRequiredPartitions = 24
   val partitionedGraph = PropagationBasedPartitioning.partitionGraphBy(graph, numberOfRequiredPartitions)
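As referenced above, a single propagation step can be sketched in plain Scala over an edge list (a hypothetical standalone helper, not the actual implementation of PropagationBasedPartitioning):

.. code-block:: scala

   // One step of min-id propagation: every vertex adopts the minimum of its
   // own label and all labels received from its neighbours.
   def propagateStep(labels: Map[Long, Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
     val received = edges
       .flatMap { case (a, b) => Seq(a -> labels(b), b -> labels(a)) }
       .groupBy { case (vertex, _) => vertex }
       .map { case (vertex, messages) => vertex -> messages.map(_._2).min }
     labels.map { case (vertex, label) => vertex -> math.min(label, received.getOrElse(vertex, label)) }
   }

   // On a path graph 1-2-3, repeated steps converge to the minimal id:
   propagateStep(Map(1L -> 1L, 2L -> 2L, 3L -> 3L), Seq((1L, 2L), (2L, 3L)))
   // == Map(1 -> 1, 2 -> 1, 3 -> 2)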
Naive PSCAN
------------------

This algorithm uses the PSCAN algorithm to determine communities in the graph and then uses them as partitions. Without configuration, the method uses the default PSCAN configuration, but that can be changed if needed.
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.CommunityBasedPartitioning
   import ml.sparkling.graph.operators.algorithms.community.pscan.PSCAN
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val communityDetectionMethod = PSCAN
   val partitionedGraph = CommunityBasedPartitioning.partitionGraphBy(graph, communityDetectionMethod)
In order to change the parameters you can use:
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.CommunityBasedPartitioning
   import ml.sparkling.graph.operators.algorithms.community.pscan.PSCAN
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val partitionedGraph = CommunityBasedPartitioning.partitionGraphBy(graph, PSCAN.computeConnectedComponents(_, epsilon = 0))
Dynamic PSCAN
------------------

This solution uses the PSCAN algorithm in conjunction with a search over the epsilon parameter. The algorithm enumerates possible epsilon values and uses binary search to find the one that returns a clustering whose size is closest to the requested number of partitions. The found clustering is used as the partitioning; a sketch of the search idea follows the API example below.
.. code-block:: scala

   import ml.sparkling.graph.operators.partitioning.PSCANBasedPartitioning
   import org.apache.spark.SparkContext
   import org.apache.spark.graphx.Graph

   implicit val ctx: SparkContext = ???
   // initialize your SparkContext as an implicit value
   val graph = ???
   // load your graph (for example using the Graph loading API)
   val numberOfRequiredPartitions = 24
   val partitionedGraph = PSCANBasedPartitioning.partitionGraphBy(graph, numberOfRequiredPartitions)
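As referenced above, the epsilon search can be sketched as follows (a hypothetical standalone helper, not the actual implementation of PSCANBasedPartitioning; ``clusterCount`` stands in for running PSCAN with a given epsilon, assumed here to yield more clusters for higher epsilon):

.. code-block:: scala

   // Binary search over sorted candidate epsilon values for the one whose
   // clustering size is closest to the requested number of partitions.
   def searchEpsilon(candidates: Vector[Double], clusterCount: Double => Int, target: Int): Double = {
     var lo = 0
     var hi = candidates.size - 1
     var best = candidates(0)
     while (lo <= hi) {
       val mid = (lo + hi) / 2
       val eps = candidates(mid)
       if (math.abs(clusterCount(eps) - target) < math.abs(clusterCount(best) - target))
         best = eps
       // assumption: cluster count grows with epsilon, so overshooting the
       // target means searching the lower half
       if (clusterCount(eps) > target) hi = mid - 1 else lo = mid + 1
     }
     best
   }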
