The Pregel Programming Model with Spark GraphX
Uploaded by andrea-iacono
Agenda
- GraphX introduction
- Pregel programming model
- Code examples
The main focus will be on the programming model
GraphX is a graph processing system built on top of Apache Spark
- property graph representation
- based on RDDs
- user-defined partitioning on RDDs
GraphX / Spark software stack
Pregel Programming Model
https://kowshik.github.io/JPregel/pregel_paper.pdf
- based on vertices
- messages from/to neighbours
- bounded in supersteps
- status (active / inactive)
Pregel Sample: finding the maximum value
GraphX implementation of Pregel
Uses three functions:
- vprog computes the new vertex value
- sendMsg decides which neighbours receive the new value
- mergeMsg merges incoming values
GraphX communication diagram
Max Value implementation
    import org.apache.spark.graphx.{EdgeDirection, EdgeTriplet}

    graph.pregel(
      initialMsg = Int.MinValue,
      maxIterations = Int.MaxValue,
      activeDirection = EdgeDirection.Out
    )(
      // vprog: keep the larger of the current value and the incoming one
      (vertexId: Long, currentVertexAttr: Int, newVertexAttr: Int) =>
        if (newVertexAttr > currentVertexAttr) newVertexAttr else currentVertexAttr,
      // sendMsg: forward the source value only when it beats the destination value
      (edgeTriplet: EdgeTriplet[Int, Int]) => {
        if (edgeTriplet.srcAttr > edgeTriplet.dstAttr)
          Iterator((edgeTriplet.dstId, edgeTriplet.srcAttr))
        else
          Iterator.empty
      },
      // mergeMsg: of two incoming values, keep the larger
      (attribute1: Int, attribute2: Int) =>
        if (attribute1 > attribute2) attribute1 else attribute2
    )
Max Value implementation results:
Graph initial state
Node [1]: 3
Node [2]: 6
Node [3]: 2
Node [4]: 1
Graph final state
Node [1]: 6
Node [2]: 6
Node [3]: 6
Node [4]: 6
Max value of the graph is 6.
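How the three functions cooperate across supersteps can be mimicked without Spark. The following is a minimal single-machine sketch, not the GraphX implementation: `MiniPregel`, `Triplet` and `run` are hypothetical names, and the loop only approximates the behaviour of `activeDirection = EdgeDirection.Out` by letting a vertex send along its outgoing edges only if it received a message in the previous superstep. The node values are the ones shown in the results; the directed ring 1 -> 2 -> 3 -> 4 -> 1 is an assumed topology, since the slides do not list the edges.

```scala
object MiniPregel {
  type VertexId = Long

  // Hypothetical stand-in for GraphX's EdgeTriplet: an edge plus both endpoint values.
  final case class Triplet[VD](srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD)

  def run[VD, A](
      vertices: Map[VertexId, VD],
      edges: Seq[(VertexId, VertexId)],
      initialMsg: A
  )(
      vprog: (VertexId, VD, A) => VD,
      sendMsg: Triplet[VD] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A
  ): Map[VertexId, VD] = {
    // Superstep 0: every vertex receives the initial message.
    var state = vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var active: Set[VertexId] = state.keySet

    while (active.nonEmpty) {
      // Only edges leaving a vertex that received a message last superstep may send.
      val sent = edges.iterator
        .filter { case (src, _) => active(src) }
        .flatMap { case (src, dst) => sendMsg(Triplet(src, state(src), dst, state(dst))) }
        .toList
      // Combine all messages addressed to the same vertex into one.
      val inbox: Map[VertexId, A] =
        sent.groupBy(_._1).map { case (id, msgs) => id -> msgs.map(_._2).reduce(mergeMsg) }
      // Vertices with mail compute their new value; the others stay inactive.
      state = state ++ inbox.map { case (id, msg) => id -> vprog(id, state(id), msg) }
      active = inbox.keySet
    }
    state
  }

  // Values from the slides; the ring topology is an assumption made for this sketch.
  val maxValues: Map[VertexId, Int] = run(
    Map(1L -> 3, 2L -> 6, 3L -> 2, 4L -> 1),
    Seq((1L, 2L), (2L, 3L), (3L, 4L), (4L, 1L)),
    Int.MinValue
  )(
    (_, current, incoming) => math.max(current, incoming),
    t => if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty,
    (a, b) => math.max(a, b)
  )
}
```

On this assumed ring the value 6 travels 2 -> 3 -> 4 -> 1, one hop per superstep, and the loop stops once no vertex has anything larger to send, so `MiniPregel.maxValues` maps every node to 6.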
Dijkstra's algorithm
Unvisited nodes, as the algorithm proceeds:
- Baltimore, Detroit, Chicago, NewYork, Philadelphia
- Detroit, Chicago, NewYork, Philadelphia
- Chicago, NewYork, Philadelphia
- Chicago, Philadelphia
- Chicago
- (none)
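For comparison with the Pregel version, the classic sequential algorithm can be sketched in plain Scala with a priority queue. This is an illustrative sketch rather than code from the talk: `DijkstraSketch`, `roads` and `shortestDistances` are hypothetical names, and the edge weights are only the ones implied by the paths and distances reported in the results (for example 62.0 - 27.0 = 35.0 for Baltimore to Detroit). The graph used in the talk probably contains more edges than these.

```scala
import scala.collection.mutable

object DijkstraSketch {
  // Assumed edge list: only the edges implied by the reported shortest paths.
  val roads: Map[String, List[(String, Double)]] = Map(
    "Washington" -> List("Baltimore" -> 27.0),
    "Baltimore"  -> List("Detroit" -> 35.0),
    "Detroit"    -> List("NewYork" -> 14.0),
    "NewYork"    -> List("Chicago" -> 29.0, "Philadelphia" -> 15.0)
  )

  /** Returns the shortest distance to each reachable node and the order of visits. */
  def shortestDistances(source: String): (Map[String, Double], List[String]) = {
    val dist = mutable.Map(source -> 0.0).withDefaultValue(Double.PositiveInfinity)
    val visited = mutable.LinkedHashSet.empty[String]
    // PriorityQueue is a max-heap, so order by negated distance to pop the closest node first.
    val queue = mutable.PriorityQueue((0.0, source))(Ordering.by[(Double, String), Double](-_._1))

    while (queue.nonEmpty) {
      val (d, node) = queue.dequeue()
      if (!visited(node)) {
        visited += node
        // Relax every outgoing edge that improves the best known distance.
        for ((next, weight) <- roads.getOrElse(node, Nil) if d + weight < dist(next)) {
          dist(next) = d + weight
          queue.enqueue((d + weight, next))
        }
      }
    }
    (dist.toMap, visited.toList)
  }
}
```

Calling `DijkstraSketch.shortestDistances("Washington")` on these assumed edges visits the nodes in the same order in which they leave the unvisited list above: Baltimore, Detroit, NewYork, Philadelphia, Chicago.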
Dijkstra's algorithm implementation
Types definitions:
    type VertexId = scala.Long
    case class City(name: String, id: VertexId)
    case class VertexAttribute(cityName: String, distance: Double, path: List[City])
Dijkstra's algorithm implementation
    val shortestPathGraph = initialGraph.pregel(
      initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]()),
      maxIterations = Int.MaxValue,
      activeDirection = EdgeDirection.Out
    )(
      vprog,
      sendMsg,
      mergeMsg
    )
Dijkstra's algorithm implementation
    // vprog: keep whichever attribute carries the shorter distance
    val vprog = (
      vertexId: VertexId,
      currentVertexAttr: VertexAttribute,
      newVertexAttr: VertexAttribute
    ) =>
      if (currentVertexAttr.distance <= newVertexAttr.distance) currentVertexAttr
      else newVertexAttr

    // mergeMsg: of two incoming candidates, keep the one with the shorter distance
    val mergeMsg = (attribute1: VertexAttribute, attribute2: VertexAttribute) =>
      if (attribute1.distance < attribute2.distance) attribute1
      else attribute2
Dijkstra's algorithm implementation
    // sendMsg: offer the destination a shorter route through the source, if there is one
    val sendMsg = (edgeTriplet: EdgeTriplet[VertexAttribute, Double]) => {
      if (edgeTriplet.srcAttr.distance < (edgeTriplet.dstAttr.distance - edgeTriplet.attr)) {
        Iterator(
          (
            edgeTriplet.dstId,
            new VertexAttribute(
              edgeTriplet.dstAttr.cityName,
              edgeTriplet.srcAttr.distance + edgeTriplet.attr,
              edgeTriplet.srcAttr.path :+ new City(edgeTriplet.dstAttr.cityName, edgeTriplet.dstId)
            )
          )
        )
      } else Iterator.empty
    }
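The condition in sendMsg is the usual edge relaxation: a message is sent only when reaching the destination through the source is strictly shorter than the destination's current distance. The sketch below replays that test on a single edge in plain Scala. `RelaxationSketch`, `Triplet` and `relax` are hypothetical names standing in for GraphX's `EdgeTriplet[VertexAttribute, Double]` and the body of sendMsg, and the weight 14.0 for Detroit to NewYork is an assumption derived from the reported distances (76.0 - 62.0).

```scala
object RelaxationSketch {
  type VertexId = Long
  case class City(name: String, id: VertexId)
  case class VertexAttribute(cityName: String, distance: Double, path: List[City])

  // Hypothetical stand-in for EdgeTriplet[VertexAttribute, Double].
  case class Triplet(
      srcId: VertexId, srcAttr: VertexAttribute,
      dstId: VertexId, dstAttr: VertexAttribute,
      attr: Double)

  // Same test as sendMsg, written as src.distance + weight < dst.distance.
  def relax(t: Triplet): Option[(VertexId, VertexAttribute)] =
    if (t.srcAttr.distance + t.attr < t.dstAttr.distance)
      Some(t.dstId -> VertexAttribute(
        t.dstAttr.cityName,
        t.srcAttr.distance + t.attr,
        t.srcAttr.path :+ City(t.dstAttr.cityName, t.dstId)))
    else None

  val detroit = VertexAttribute(
    "Detroit", 62.0,
    List(City("Washington", 1L), City("Baltimore", 2L), City("Detroit", 3L)))
  // NewYork has not been reached yet, so its distance is still infinite.
  val newYorkUnreached = VertexAttribute("NewYork", Double.PositiveInfinity, Nil)

  // First pass: Detroit offers NewYork a route of length 62.0 + 14.0.
  val firstOffer = relax(Triplet(3L, detroit, 5L, newYorkUnreached, 14.0))
  // Second pass: NewYork already holds that route, so nothing is sent.
  val secondOffer = firstOffer.flatMap { case (_, reached) =>
    relax(Triplet(3L, detroit, 5L, reached, 14.0))
  }
}
```

Because the comparison is strict, an offer that merely ties the current distance produces no message, which is what eventually leaves every vertex without incoming messages and ends the computation.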
Dijkstra's algorithm implementation results:
Going from Washington to Chicago has a distance of 105.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Chicago [4]
Going from Washington to Washington has a distance of 0.0 km. Path is: Washington [1]
Going from Washington to Philadelphia has a distance of 91.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Philadelphia [6]
Going from Washington to Detroit has a distance of 62.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3]
Going from Washington to NewYork has a distance of 76.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5]
Going from Washington to Baltimore has a distance of 27.0 km. Path is: Washington [1] => Baltimore [2]
Questions & Answers
Thanks!
The code is available at https://github.com/andreaiacono/TalkGraphX