The library supports loading graphs from multiple file formats. Support for more formats is planned for upcoming releases.
The main entry point for graph loading is the LoadGraph object.
It takes an implementation of GraphLoader and lets you configure the loading process. Configuration parameters (Parameter) are set with the using(parameter: Parameter) method and are specific to each GraphLoader.
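The configuration pattern described above can be pictured with a small, self-contained sketch. It is not the library's implementation: the names SketchParameter, SketchLoader, Lines, Separator, Builder, and SketchLoadGraph are hypothetical stand-ins, and the loader works on an in-memory list of lines instead of a SparkContext. It only illustrates how a loader-specific parameter can be chained with using before load is called.

```scala
object LoaderSketch {
  // Hypothetical marker trait for configuration parameters.
  trait SketchParameter

  // Hypothetical parameter understood by the line-based loader below.
  case class Separator(value: String) extends SketchParameter

  // Hypothetical loader: receives the collected parameters and returns edges.
  trait SketchLoader {
    def load(parameters: List[SketchParameter]): List[(String, String)]
  }

  // Loader over in-memory lines; the separator defaults to a comma.
  case class Lines(lines: List[String]) extends SketchLoader {
    def load(parameters: List[SketchParameter]): List[(String, String)] = {
      val separator = parameters.collectFirst { case Separator(s) => s }.getOrElse(",")
      lines.map { line =>
        val fields = line.split(java.util.regex.Pattern.quote(separator))
        (fields(0), fields(1))
      }
    }
  }

  // Immutable builder: each call to `using` returns a new builder
  // with the parameter prepended, so the most recent setting wins.
  case class Builder(loader: SketchLoader, parameters: List[SketchParameter] = Nil) {
    def using(parameter: SketchParameter): Builder =
      copy(parameters = parameter :: parameters)

    def load(): List[(String, String)] = loader.load(parameters)
  }

  object SketchLoadGraph {
    def from(loader: SketchLoader): Builder = Builder(loader)
  }
}
```

With these assumed definitions, SketchLoadGraph.from(Lines(List("1;2"))).using(Separator(";")).load() yields a single edge between "1" and "2". Because the parameters are interpreted by the loader, each loader is free to define its own set.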
To load a graph from a CSV file, use the CSV implementation of the GraphLoader trait:
import ml.sparkling.graph.api.loaders.GraphLoading.LoadGraph
import ml.sparkling.graph.loaders.csv.GraphFromCsv.CSV
import org.apache.spark.SparkContext
// Initialize your SparkContext as an implicit value so that it is passed automatically to the graph loading API
implicit val ctx: SparkContext = ???
val filePath="your_graph_path.csv"
val graph = LoadGraph.from(CSV(filePath)).load()
This is the simplest way to load a standard CSV file, which has the following format:
"vertex1","vertex2"
"<numerical_id_of_vertex_1>","<numerical_id_of_vertex_2>"
To change the expected file format, use parameters such as Delimiter and Quotation:
import ml.sparkling.graph.loaders.csv.GraphFromCsv.LoaderParameters.{Delimiter,Quotation}
import ml.sparkling.graph.api.loaders.GraphLoading.LoadGraph
import ml.sparkling.graph.loaders.csv.GraphFromCsv.CSV
import org.apache.spark.SparkContext
// Initialize your SparkContext as an implicit value so that it is passed automatically to the graph loading API
implicit val ctx: SparkContext = ???
val filePath="your_graph_path.csv"
val graph = LoadGraph.from(CSV(filePath)).using(Delimiter(";")).using(Quotation("'")).load()
The snippet above loads a graph from a file with the following format:
'vertex1';'vertex2'
'<numerical_id_of_vertex_1>';'<numerical_id_of_vertex_2>'
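To make the roles of the delimiter and the quotation character concrete, here is a minimal sketch in plain Scala of how one row in this format could be split into two numerical vertex identifiers. The names CsvLineSketch, parseLine, and parseEdge are hypothetical and are not part of the library, which may parse rows differently. As a simplification, the sketch does not handle delimiters that appear inside quoted fields.

```scala
object CsvLineSketch {
  // Splits one line on `delimiter` and strips one leading and one trailing
  // `quotation` from each field. Assumption: no delimiter inside a quoted field.
  def parseLine(line: String, delimiter: String, quotation: String): List[String] =
    line
      .split(java.util.regex.Pattern.quote(delimiter), -1)
      .toList
      .map(_.trim.stripPrefix(quotation).stripSuffix(quotation))

  // Parses an edge row into a pair of numerical vertex identifiers.
  def parseEdge(line: String, delimiter: String, quotation: String): (Long, Long) = {
    val fields = parseLine(line, delimiter, quotation)
    (fields(0).toLong, fields(1).toLong)
  }
}
```

For example, CsvLineSketch.parseEdge("'1';'2'", ";", "'") produces the pair (1L, 2L), while the header row would be split into the two column names.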
In some cases vertex identifiers are not numerical (for example, usernames stored as strings). You can load such a graph by specifying that Indexing is required:
import ml.sparkling.graph.api.loaders.GraphLoading.LoadGraph
import ml.sparkling.graph.loaders.csv.GraphFromCsv.CSV
import ml.sparkling.graph.loaders.csv.GraphFromCsv.LoaderParameters.Indexing
import org.apache.spark.SparkContext
// Initialize your SparkContext as an implicit value so that it is passed automatically to the graph loading API
implicit val ctx: SparkContext = ???
val filePath="your_graph_path.csv"
val graph = LoadGraph.from(CSV(filePath)).using(Indexing).load()
This approach lets you load graphs from CSV files with any structure and vertex identifiers of any type. For example:
"vertex1","vertex2"
"centralized","computation"
"is","lame"
The full list of CSV loading parameters is available here.
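The idea behind the Indexing parameter can be sketched in plain Scala: every distinct vertex name is assigned a numerical identifier, and the edges are rewritten in terms of those identifiers. This is an in-memory illustration only. The names IndexingSketch, buildIndex, and indexEdges are hypothetical, and the identifiers the library actually assigns on a distributed dataset may differ.

```scala
object IndexingSketch {
  // Assigns a distinct Long identifier to every vertex name,
  // in order of first appearance in the edge list.
  def buildIndex(edges: List[(String, String)]): Map[String, Long] =
    edges
      .flatMap { case (source, target) => List(source, target) }
      .distinct
      .zipWithIndex
      .map { case (name, position) => name -> position.toLong }
      .toMap

  // Rewrites string-based edges as edges between numerical identifiers.
  def indexEdges(edges: List[(String, String)]): List[(Long, Long)] = {
    val index = buildIndex(edges)
    edges.map { case (source, target) => (index(source), index(target)) }
  }
}
```

Applied to the two example rows with string identifiers, this sketch maps the four words to the identifiers 0 to 3 and returns the edges (0, 1) and (2, 3).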
To load a graph from a GraphML XML file, use the GraphML implementation of the GraphLoader trait:
import ml.sparkling.graph.api.loaders.GraphLoading.LoadGraph
import ml.sparkling.graph.loaders.graphml.GraphFromGraphML.GraphML
import org.apache.spark.SparkContext
// Initialize your SparkContext as an implicit value so that it is passed automatically to the graph loading API
implicit val ctx: SparkContext = ???
val filePath="your_graph_path.xml"
val graph = LoadGraph.from(GraphML(filePath)).load()
This is the simplest way to load a standard GraphML XML file (vertices are indexed automatically and receive a VertexId identifier):
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="v_name" for="node" attr.name="name" attr.type="string"/>
  <key id="v_type" for="node" attr.name="type" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0">
      <data key="v_name">name0</data>
      <data key="v_type">type0</data>
    </node>
    <node id="n1">
      <data key="v_name">name1</data>
    </node>
    <node id="n2">
      <data key="v_name">name2</data>
    </node>
    <node id="n3">
      <data key="v_name">name3</data>
    </node>
    <edge id="e1" source="n0" target="n1"/>
    <edge id="e2" source="n1" target="n2"/>
  </graph>
</graphml>
All attributes associated with vertices are put into the GraphProperties type, which expands to Map[String,Any]. By default, each edge and vertex has an id attribute.
import ml.sparkling.graph.api.loaders.GraphLoading.LoadGraph
import ml.sparkling.graph.loaders.graphml.GraphFromGraphML.{GraphProperties, GraphML}
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
// Initialize your SparkContext as an implicit value so that it is passed automatically to the graph loading API
implicit val ctx: SparkContext = ???
val filePath="your_graph_path.xml"
val graph: Graph[GraphProperties, GraphProperties] =LoadGraph.from(GraphML(filePath)).load()
val verticesIdsFromFile: Array[String] = graph.vertices.map(_._2("id").asInstanceOf[String]).collect()
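Because the property values are typed as Any, a cast with asInstanceOf fails at runtime if an attribute is missing or has an unexpected type. As a possible alternative, the following minimal sketch reads an attribute defensively. The names PropertiesSketch, Properties, and stringProperty are hypothetical helpers, not part of the library, and the Properties alias only mirrors the Map[String,Any] shape described above.

```scala
object PropertiesSketch {
  // Same shape as the type described above: attribute name -> untyped value.
  type Properties = Map[String, Any]

  // Returns the attribute as a String only if it is present
  // and its value really is a String; otherwise returns None.
  def stringProperty(properties: Properties, key: String): Option[String] =
    properties.get(key).collect { case value: String => value }
}
```

Under these assumptions, a call such as graph.vertices.flatMap(v => PropertiesSketch.stringProperty(v._2, "id")) would skip vertices without a string id instead of throwing an exception.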