GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph...

54
GraphX: Unifying Table and Graph Analytics Presented by Joseph Gonzalez Joint work with Reynold Xin, Daniel Crankshaw, Ankur Dave, Michael Franklin, and Ion Stoica IPDPS 2014 *These slides are best viewed in PowerPoint with animation.

Transcript of GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph...

Page 1: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

GraphX: Unifying Table and Graph Analytics

Presented by Joseph Gonzalez ���Joint work with Reynold Xin, Daniel Crankshaw, Ankur Dave, Michael Franklin, and Ion Stoica IPDPS 2014

*These slides are best viewed in PowerPoint with animation.

Page 2: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Graphs are Central to Analytics

Raw Wikipedia

< / >!< / >!< / >!XML!

Hyperlinks PageRank Top 20 Pages Title PR

Text Table

Title Body Topic Model

(LDA) Word Topics Word Topic

Editor Graph Community Detection

User Community

User Com.

Term-Doc Graph

Discussion Table

User Disc.

Community Topic

Topic Com.

Page 3: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Update ranks in parallel

Iterate until convergence

Rank of user i Weighted sum of

neighbors’ ranks

3

R[i] = 0.15 +X

j2Nbrs(i)

wjiR[j]

PageRank: Identifying Leaders

Page 4: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Post

Post

Post

Mean Field Algorithm

Y2  

Y1  

Y3  

X2  

X3  

X1  

� 1,3

�1,2

2,3(x2,

x3)

�1(x1)

�2(x2)

�3(x3)

Sum over ���Neighbors

bi(xi) / �i(xi) exp

0

@X

j2Ni

f(xi, bj)

1

A

f(x

i

, b

j

) =

X

xj

b

j

(x

j

) log �

i,j

(x

i

, x

j

)

Page 5: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Low-Rank Matrix Factorization:

5

r13

r14

r24

r25

f(1)

f(2)

f(3)

f(4)

f(5) Use

r Fac

tors

(U)

Movie Factors (M

) U

sers Movies

Netflix U

sers

≈ x

Movies

f(i)

f(j)

Iterate:

f [i] = arg minw2Rd

X

j2Nbrs(i)

�rij � wT f [j]

�2+ �||w||22

Recommending Products

Page 6: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

The Graph-Parallel Pattern

6  

Model / Alg. State

Computation depends only on the neighbors

Page 7: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Many Graph-Parallel Algorithms •  Collaborative Filtering

–  Alternating Least Squares –  Stochastic Gradient Descent –  Tensor Factorization

•  Structured Prediction –  Loopy Belief Propagation –  Max-Product Linear Programs –  Gibbs Sampling

•  Semi-supervised ML –  Graph SSL –  CoEM

•  Community Detection –  Triangle-Counting –  K-core Decomposition –  K-Truss

•  Graph Analytics –  PageRank –  Personalized PageRank –  Shortest Path –  Graph Coloring

•  Classification –  Neural Networks

7

MACHINE  LEARNING  

SOCIAL  NETWORK  ANALYSIS  

GRAPH  ALGORITHMS  

Page 8: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Graph-Parallel Systems

8

oogle

Expose specialized APIs to simplify graph programming.

Page 9: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

“Think like a Vertex.” - Pregel [SIGMOD’10]

9

Page 10: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

The Pregel (Push) Abstraction Vertex-Programs interact by sending messages.

i Pregel_PageRank(i,  messages)  :        //  Receive  all  the  messages      total  =  0      foreach(  msg  in  messages)  :          total  =  total  +  msg        //  Update  the  rank  of  this  vertex      R[i]  =  0.15  +  total        //  Send  new  messages  to  neighbors      foreach(j  in  out_neighbors[i])  :          Send    msg(R[i])  to  vertex  j  

10 Malewicz et al. [PODC’09, SIGMOD’10]

Page 11: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

The GraphLab (Pull) Abstraction Vertex Programs directly access adjacent vertices and edges

GraphLab_PageRank(i)        //  Compute  sum  over  neighbors      total  =  0      foreach(  j  in  neighbors(i)):            total  =  total  +  R[j]  *  wji        //  Update  the  PageRank      R[i]  =  0.15  +  total      

11  

R[4]  *  w41  

+  +  

4 1

3 2

Data movement is managed by the system and not the user.

Page 12: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Barrier Iterative Bulk Synchronous Execution

Compute Communicate

Page 13: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Graph-Parallel Systems

13

oogle

Expose specialized APIs to simplify graph programming.

Exploit graph structure to achieve orders-of-

magnitude performance gains over more general ���data-parallel systems.

Page 14: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

PageRank on the Live-Journal Graph

22

354

1340

0 200 400 600 800 1000 1200 1400 1600

GraphLab

Naïve Spark

Mahout/Hadoop

Runtime (in seconds, PageRank for 10 iterations)

Spark is 4x faster than Hadoop GraphLab is 16x faster than Spark

Page 15: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Counted: 34.8 Billion Triangles

15  

Triangle Counting on Twitter

64 Machines 15 Seconds

1536 Machines 423 Minutes

Hadoop���[WWW’11]

S.  Suri  and  S.  Vassilvitskii,  “Coun7ng  triangles  and  the  curse  of  the  last  reducer,”  WWW’11  

1000 x Faster

40M Users, 1.4 Billion Links

Page 16: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Graph Analytics Pipeline

Raw Wikipedia

< / >!< / >!< / >!XML!

Hyperlinks PageRank Top 20 Pages Title PR

Text Table

Title Body Topic Model

(LDA) Word Topics Word Topic

Editor Graph Community Detection

User Community

User Com.

Term-Doc Graph

Discussion Table

User Disc.

Community Topic

Topic Com.

Page 17: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Tables

Raw Wikipedia

< / >!< / >!< / >!XML!

Hyperlinks PageRank Top 20 Pages Title PR

Text Table

Title Body Topic Model

(LDA) Word Topics Word Topic

Editor Graph Community Detection

User Community

User Com.

Term-Doc Graph

Discussion Table

User Disc.

Community Topic

Topic Com.

Page 18: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Graphs

Raw Wikipedia

< / >!< / >!< / >!XML!

Hyperlinks PageRank Top 20 Pages Title PR

Text Table

Title Body Topic Model

(LDA) Word Topics Word Topic

Editor Graph Community Detection

User Community

User Com.

Term-Doc Graph

Discussion Table

User Disc.

Community Topic

Topic Com.

Page 19: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Separate Systems to Support Each View Table View Graph View

Dependency Graph

6. Before

8. After

7. After

Table

Result

Row

Row

Row

Row

Page 20: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Having separate systems ���for each view is ���

difficult to use and inefficient

20  

Page 21: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Difficult to Program and Use

Users must Learn, Deploy, and Manage multiple systems

Leads to brittle and often ���complex interfaces

 21  

Page 22: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Inefficient

22  

Extensive data movement and duplication across ���the network and file system

< / >!< / >!< / >!XML!

HDFS   HDFS   HDFS   HDFS  

Limited reuse internal data-structures ���across stages

Page 23: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

GraphX Solution: Tables and Graphs are ���views of the same physical data

GraphX Unified Representation

Graph View Table View

Each view has its own operators that ���exploit the semantics of the view

to achieve efficient execution

Page 24: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Graphs à Relational Algebra 1.  Encode graphs as distributed tables

2.  Express graph computation in relational algebra

3.  Recast graph systems optimizations as: 1.  Distributed join optimization 2.  Incremental materialized maintenance

Integrate Graph and Table data processing

systems.

Achieve performance parity with specialized

systems.

Page 25: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Part. 2

Part. 1

Vertex Table

B C

A D

F E

A D

Distributed Graphs as Distributed Tables

D

Property Graph

B C

D

E

A A

F

Edge Table

A B

A C

C D

B C

A E

A F

E F

E D

B

C

D

E

A

F

Routing Table

B

C

D

E

A

F

1  

2  

1   2  

1   2  

1  

2  

2D Vertex Cut Heuristic

Page 26: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Table Operators Table operators are inherited from Spark:

26

map

filter

groupBy

sort

union

join

leftOuterJoin

rightOuterJoin

reduce

count

fold

reduceByKey

groupByKey

cogroup

cross

zip

sample

take

first

partitionBy

mapWith

pipe

save

...

Page 27: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

class Graph [ V, E ] { def Graph(vertices: Table[ (Id, V) ], edges: Table[ (Id, Id, E) ])

// Table Views ----------------- def vertices: Table[ (Id, V) ] def edges: Table[ (Id, Id, E) ] def triplets: Table [ ((Id, V), (Id, V), E) ] // Transformations ------------------------------ def reverse: Graph[V, E] def subgraph(pV: (Id, V) => Boolean,

pE: Edge[V,E] => Boolean): Graph[V,E] def mapV(m: (Id, V) => T ): Graph[T,E] def mapE(m: Edge[V,E] => T ): Graph[V,T] // Joins ---------------------------------------- def joinV(tbl: Table [(Id, T)]): Graph[(V, T), E ] def joinE(tbl: Table [(Id, Id, T)]): Graph[V, (E, T)] // Computation ---------------------------------- def mrTriplets(mapF: (Edge[V,E]) => List[(Id, T)], reduceF: (T, T) => T): Graph[T, E]

}

Graph Operators

27

Page 28: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Triplets Join Vertices and Edges The triplets operator joins vertices and edges:

The mrTriplets operator sums adjacent triplets.

SELECT t.dstId, reduce( map(t) ) AS sum FROM triplets AS t GROUPBY t.dstId

Triplets Vertices

B

A

C

D

Edges

A B A C B C C D

A B A

B A C B C C D

SELECT s.Id, d.Id, s.P, e.P, d.P FROM edges AS e JOIN vertices AS s, vertices AS d ON e.srcId = s.Id AND e.dstId = d.Id

Page 29: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

F  

E  

Example: Oldest Follower

D  

B  

A  

C  Calculate the number of older followers for each user?

val olderFollowerAge = graph .mrTriplets(

e => // Map if(e.src.age < e.dst.age) { (e.srcId, 1) else { Empty } , (a,b) => a + b // Reduce ) .vertices

23 42

30

19 75

16 29

Page 30: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

We express enhanced Pregel and GraphLab ���abstractions using the GraphX operators���

in less than 50 lines of code!

30

Page 31: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Enhanced Pregel in GraphX

31 Malewicz et al. [PODC’09, SIGMOD’10]

pregelPR(i, messageList ): !// Receive all the messages !total = 0 !foreach( msg in messageList) : ! total = total + msg!

// Update the rank of this vertex !R[i] = 0.15 + total !

// Send new messages to neighbors !foreach(j in out_neighbors[i]) : ! Send msg(R[i]/E[i,j]) to vertex!

Require  Message  Combiners  messageSum  

messageSum  

Remove  Message  Computation  

from  the  Vertex  Program  

sendMsg(iàj, R[i], R[j], E[i,j]): ! // Compute single message ! return msg(R[i]/E[i,j]) ! !

combineMsg(a, b): ! // Compute sum of two messages ! return a + b ! !

Page 32: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

PageRank in GraphX

32

// Load and initialize the graph !

val graph = GraphBuilder.text(“hdfs://web.txt”) !val prGraph = graph.joinVertices(graph.outDegrees) !!

// Implement and Run PageRank !

val pageRank = ! prGraph.pregel(initialMessage = 0.0, iter = 10)( ! (oldV, msgSum) => 0.15 + 0.85 * msgSum, ! triplet => triplet.src.pr / triplet.src.deg, !

(msgA, msgB) => msgA + msgB) !

Page 33: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Join Elimination Identify and bypass joins for unused triplet fields

33

0 2000 4000 6000 8000

10000 12000 14000

0 5 10 15 20

Com

mun

icatio

n (M

B)

Iteration

PageRank on Twitter Three Way Join

Join Elimination

Factor of 2 reduction in communication

sendMsg(iàj, R[i], R[j], E[i,j]): ! // Compute single message ! return msg(R[i]/E[i,j]) ! !

Page 34: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

We express the Pregel and GraphLab like ���abstractions using the GraphX operators���

in less than 50 lines of code!

34

By composing these operators we can ���construct entire graph-analytics pipelines.

Page 35: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Example Analytics Pipeline // Load raw data tables !

val verts = sc.textFile(“hdfs://users.txt”).map(parserV) !

val edges = sc.textFile(“hdfs://follow.txt”).map(parserE) !

// Build the graph from tables and restrict to recent links !

val graph = new Graph(verts, edges) !

val recent = graph.subgraph(edge => edge.date > LAST_MONTH) !

// Run PageRank Algorithm !

val pr = graph.PageRank(tol = 1.0e-5) !

// Extract and print the top 25 users !

val topUsers = verts.join(pr).top(25).collect !

topUsers.foreach(u => println(u.name + ‘\t’ + u.pr)) !

Page 36: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

The GraphX Stack���(Lines of Code)

GraphX (3575)

Spark

Pregel (28) + GraphLab (50)

PageRank (5)

Connected Comp. (10)

Shortest Path (10)

ALS (40) LDA

(120)

K-core (51) Triangle

Count (45)

SVD (40)

Page 37: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Performance Comparisons

22 68

207 354

1340

0 200 400 600 800 1000 1200 1400 1600

GraphLab GraphX Giraph

Naïve Spark Mahout/Hadoop

Runtime (in seconds, PageRank for 10 iterations)

GraphX is roughly 3x slower than GraphLab

Live-Journal: 69 Million Edges

Page 38: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

GraphX scales to larger graphs

203

451

749

0 200 400 600 800

GraphLab

GraphX

Giraph

Runtime (in seconds, PageRank for 10 iterations)

GraphX is roughly 2x slower than GraphLab » Scala + Java overhead: Lambdas, GC time, … » No shared memory parallelism: 2x increase in comm.

Twitter Graph: 1.5 Billion Edges

Page 39: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

PageRank is just one stage…. ������

What about a pipeline?

Page 40: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

HDFS HDFS

Compute Spark Preprocess Spark Post.

A Small Pipeline in GraphX

Timed end-to-end GraphX is faster than GraphLab

Raw Wikipedia

< / >!< / >!< / >!XML!

Hyperlinks PageRank Top 20 Pages

342

1492

0 200 400 600 800 1000 1200 1400 1600

GraphLab + Spark GraphX

Giraph + Spark Spark

Total Runtime (in Seconds)

605

375

Page 41: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Status Part of Apache Spark

In production at several large technology companies

Page 42: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

GraphX: Unified Analytics

Enabling users to easily and efficiently express the entire graph analytics pipeline

New API Blurs the distinction between

Tables and Graphs

New System Combines Data-Parallel Graph-Parallel Systems

Page 43: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

A Case for Algebra in Graphs A standard algebra is essential for graph systems:

•  e.g.: SQL à proliferation of relational system

By embedding graphs in relational algebra:

•  Integration with tables and preprocessing

•  Leverage advances in relational systems

•  Graph opt. recast to relational systems opt.

Page 44: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Conclusions Composable domain specific views and operators

Single system that efficiently spans the pipeline

Graphs through the lens of database systems » Graph-Parallel Pattern à Triplet joins in relational alg. » Graph Systems à Distributed join optimizations

44

UC  BERKELEY  

Joseph E. Gonzalez [email protected] http://tinyurl.com/ampgraphx

Page 45: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Thanks!

[email protected] [email protected]

[email protected] [email protected]

http://amplab.cs.berkeley.edu/projects/graphx/

Page 46: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Low-Rank Matrix Factorization:

46

r13

r14

r24

r25

f(1)

f(2)

f(3)

f(4)

f(5) Use

r Fac

tors

(U)

Movie Factors (M

) U

sers Movies

Netflix U

sers

≈ x

Movies

f(i)

f(j)

Iterate:

f [i] = arg minw2Rd

X

j2Nbrs(i)

�rij � wT f [j]

�2+ �||w||22

Recommending Products

Page 47: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Post

Post

Post

Mean Field Algorithm

Y2  

Y1  

Y3  

X2  

X3  

X1  

� 1,3

�1,2

2,3(x2,

x3)

�1(x1)

�2(x2)

�3(x3)

Sum over ���Neighbors

bi(xi) / �i(xi) exp

0

@X

j2Ni

f(xi, bj)

1

A

f(x

i

, b

j

) =

X

xj

b

j

(x

j

) log �

i,j

(x

i

, x

j

)

Page 48: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

GraphX System Design

Page 49: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Vertex Table

(RDD)

Caching for Iterative mrTriplets Edge Table

(RDD) A B

A C

C D

B C

A E

A F

E F

E D

Mirror Cache

B C D

A

Mirror Cache

D E F

A

B

C

D

E

A

F

B

C

D

E

A

F

A

D

Page 50: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Vertex Table

(RDD)

Edge Table (RDD)

A B

A C

C D

B C

A E

A F

E F

E D

Mirror Cache

B C D

A

Mirror Cache

D E F

A

Incremental Updates for Iterative mrTriplets

B

C

D

E

A

F

Change A A

Change E

Scan

Page 51: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Vertex Table

(RDD)

Edge Table (RDD)

A B

A C

C D

B C

A E

A F

E F

E D

Mirror Cache

B C D

A

Mirror Cache

D E F

A

Aggregation for Iterative mrTriplets

B

C

D

E

A

F

Change

Change

Scan

Change

Change

Change

Change

Local Aggregate

Local Aggregate

B C

D

F

Page 52: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Reduction in Communication Due to Cached Updates

0.1

1

10

100

1000

10000

0 2 4 6 8 10 12 14 16

Net

wor

k Co

mm

. (M

B)

Iteration

Connected Components on Twitter Graph

Most vertices are within 8 hops���of all vertices in their comp.

Page 53: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Benefit of Indexing Active Edges

0

5

10

15

20

25

30

0 2 4 6 8 10 12 14 16

Runt

ime

(Sec

onds

)

Iteration

Connected Components on Twitter Graph

Scan

Indexed

Scan All Edges

Index of “Active” Edges

Page 54: GraphX: Unifying Table and Graph Analyticsjegonzal/assets/... · GraphX: Unifying Table and Graph Analytics ! Presented by Joseph Gonzalez! " Joint work with Reynold Xin, Daniel Crankshaw,

Additional Query Optimizations

Indexing and Bitmaps: » To accelerate joins across graphs » To efficiently construct sub-graphs

Substantial Index and Data Reuse: » Reuse routing tables across graphs and sub-graphs » Reuse edge adjacency information and indices

54