[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
## What changes were proposed in this pull request?
Two test data files used by the GraphX examples live in the "graphx/data" directory.
I moved them into the top-level "data/" directory (as "data/graphx/"), because the "graphx" directory is meant for code and the other test data files (such as the mllib and streaming test data) already live there.
I also updated the GraphX documentation wherever it references the moved data files.
## How was this patch tested?
N/A
Author: WeichenXu <[email protected]>
Closes apache#14010 from WeichenXu123/move_graphx_data_dir.
docs/graphx-programming-guide.md (+9 -9)
@@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a graph, assuming an edge fro
 
 GraphX comes with static and dynamic implementations of PageRank as methods on the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows calling these algorithms directly as methods on `Graph`.
 
-GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows:
+GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of relationships between users is given in `data/graphx/followers.txt`. We compute the PageRank of each user as follows:
 
 {% highlight scala %}
 // Load the edges as a graph
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Run PageRank
 val ranks = graph.pageRank(0.0001).vertices
 // Join the ranks with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1032,11 +1032,11 @@ The connected components algorithm labels each connected component of the graph
 
 {% highlight scala %}
 // Load the graph as in the PageRank example
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Find the connected components
 val cc = graph.connectedComponents().vertices
 // Join the connected components with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1053,11 +1053,11 @@ A vertex is part of a triangle when it has two adjacent vertices with an edge be
 
 {% highlight scala %}
 // Load the edges in canonical order and partition the graph for triangle count
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
 // Find the triangle count for each vertex
 val triCounts = graph.triangleCount().vertices
 // Join the triangle counts with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1081,11 +1081,11 @@ all of this in just a few lines with GraphX:
 val sc = new SparkContext("spark://master.amplab.org", "research")
 
 // Load my user data and parse into tuples of user id and attribute list
-val users = (sc.textFile("graphx/data/users.txt")
+val users = (sc.textFile("data/graphx/users.txt")
   .map(line => line.split(",")).map( parts => (parts.head.toLong, parts.tail) ))
 
 // Parse the edge data which is already in userId -> userId format
-val followerGraph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val followerGraph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 
 // Attach the user attributes
 val graph = followerGraph.outerJoinVertices(users) {
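
The updated snippets use relative paths, so they only resolve when run from a directory that contains `data/graphx/`. Since the patch lists no tests, a small check like the following could confirm that the relocated files are where the guide now says they are. This is a minimal sketch, not part of the patch: the object name `CheckGraphxData` and the `missing` helper are hypothetical, and it assumes the argument (or the current directory) is the root of a Spark checkout.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of the patch: checks that the data files
// referenced by the updated guide exist under a given root directory.
object CheckGraphxData {
  // Relative paths exactly as they appear in the updated documentation.
  val dataFiles: Seq[String] =
    Seq("data/graphx/users.txt", "data/graphx/followers.txt")

  // Returns the subset of dataFiles that does not exist under `root`.
  def missing(root: String): Seq[String] =
    dataFiles.filterNot(f => Files.exists(Paths.get(root, f)))

  def main(args: Array[String]): Unit = {
    // Assumption: the first argument, if given, is the Spark root directory.
    val root = args.headOption.getOrElse(".")
    val absent = missing(root)
    if (absent.isEmpty) println("all GraphX example data files found")
    else absent.foreach(f => println(s"missing: $f"))
  }
}
```

Running it with the Spark root as the only argument reports each file that cannot be found, which is a quick way to catch a stale `graphx/data/` layout.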