Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer...

61
Families of distributed graph algorithms Divide and conquer arton Balassi 1 [email protected] 1 Hungarian Academy of Sciences – Institute for Computer Science and Control Data Mining & Search Group June 24, 2014

Transcript of Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer...

Page 1: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithmsDivide and conquer

Marton Balassi1

[email protected]

1Hungarian Academy of Sciences – Institute for Computer Science and ControlData Mining & Search Group

June 24, 2014

Page 2: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

2 / 61

Table of contents

Distributing data-intensive algorithmsMotivationMapReduce & PregelCounting the number of triangles in a graph

Families of distributed graph algorithmsLocal algorithmsGraph traversal based algorithmsMatrix multiplication based algorithms

ExperimentsRepresentative algorithmsResults

Page 3: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

3 / 61

Table of contents

Distributing data-intensive algorithmsMotivationMapReduce & PregelCounting the number of triangles in a graph

Families of distributed graph algorithmsLocal algorithmsGraph traversal based algorithmsMatrix multiplication based algorithms

ExperimentsRepresentative algorithmsResults

Page 4: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

4 / 61

A bit about myself

My background

I BSc, MSc in Computer Science, Eotvos University Budapest

I BA in Economics, TU Budapest

I Distributed algorithms

I Big data architecture

Page 5: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

5 / 61

A bit about myself

My background

I BSc, MSc in Computer Science, Eotvos University Budapest

I BA in Economics, TU Budapest

I Distributed algorithms

I Big data architecture

Page 6: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

6 / 61

A bit about myself

My background

I BSc, MSc in Computer Science, Eotvos University Budapest

I BA in Economics, TU Budapest

I Distributed algorithms

I Big data architecture

Page 7: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

7 / 61

A bit about myself

My background

I BSc, MSc in Computer Science, Eotvos University Budapest

I BA in Economics, TU Budapest

I Distributed algorithms

I Big data architecture

Page 8: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

8 / 61

Motivation

Let’s do a PageRank on this graph. . .

I A large Portugese webcrawl1

I 3.1 · 109 nodes

I 1.1 · 1011 edges

I 80 GB of compressed data

I Divide and conquer is almost mandatory

1a large Portuguese crawl of the Portuguese Web Archive obtained fromDaniel Gomes

Page 9: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

9 / 61

Motivation

Let’s do a PageRank on this graph. . .

I A large Portugese webcrawl1

I 3.1 · 109 nodes

I 1.1 · 1011 edges

I 80 GB of compressed data

I Divide and conquer is almost mandatory

1a large Portuguese crawl of the Portuguese Web Archive obtained fromDaniel Gomes

Page 10: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

10 / 61

Motivation

Let’s do a PageRank on this graph. . .

I A large Portugese webcrawl1

I 3.1 · 109 nodes

I 1.1 · 1011 edges

I 80 GB of compressed data

I Divide and conquer is almost mandatory

1a large Portuguese crawl of the Portuguese Web Archive obtained fromDaniel Gomes

Page 11: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

11 / 61

Motivation

Let’s do a PageRank on this graph. . .

I A large Portugese webcrawl1

I 3.1 · 109 nodes

I 1.1 · 1011 edges

I 80 GB of compressed data

I Divide and conquer is almost mandatory

1a large Portuguese crawl of the Portuguese Web Archive obtained fromDaniel Gomes

Page 12: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Motivation

12 / 61

Motivation

Let’s do a PageRank on this graph. . .

I A large Portugese webcrawl1

I 3.1 · 109 nodes

I 1.1 · 1011 edges

I 80 GB of compressed data

I Divide and conquer is almost mandatory

1a large Portuguese crawl of the Portuguese Web Archive obtained fromDaniel Gomes

Page 13: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

MapReduce & Pregel

13 / 61

MapReduce [DG04]

Page 14: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

MapReduce & Pregel

14 / 61

Pregel [MAB+10]

Traits

I Bulk SynchronousParallel [Val90]

I ,,Think like a vertex”

I Graph kept in memory

Scheme of the BSP systemWikipedia, public domain

Page 15: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

MapReduce & Pregel

15 / 61

Pregel [MAB+10]

Traits

I Bulk SynchronousParallel [Val90]

I ,,Think like a vertex”

I Graph kept in memory

Scheme of the BSP systemWikipedia, public domain

Page 16: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

MapReduce & Pregel

16 / 61

Pregel [MAB+10]

Traits

I Bulk SynchronousParallel [Val90]

I ,,Think like a vertex”

I Graph kept in memory

Scheme of the BSP systemWikipedia, public domain

Page 17: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

MapReduce & Pregel

17 / 61

Pregel [MAB+10]

Vertex. . .

In1

Inn

. . .

Out1

Outm

t

t

t

t − 1

t − 1

t − 1

Pregel schema as perceived from a vertex

Page 18: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Counting the number of triangles in a graph

18 / 61

Triangle Counter – Sequential algorithm

Sequential algorithm

Every vertex executes a search of itself bounded in depth of three.Thus every triangle is counted three times.

Page 19: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Counting the number of triangles in a graph

19 / 61

Triangle Counter – Sequential algorithm

Sequential algorithm

Every vertex executes a search of itself bounded in depth of three.Thus every triangle is counted three times.You can do better by making use of the ordering on the vertices.

Page 20: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Counting the number of triangles in a graph

20 / 61

Triangle Counter – distributed algorithm

Representation

0 1 21 22 03

0 1

23

Page 21: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Counting the number of triangles in a graph

21 / 61

Triangle Counter – distributed algorithm

First Map

Let’s send our ID to all of ourneighbours possessing a higher IDthan ours. Let’s send our neighboursto ourselves.

First Reduce

Let’s write out the informationreceived.

0 1

2

0

1

Page 22: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Counting the number of triangles in a graph

22 / 61

Triangle Counter – distributed algorithm

Second Map

If the ID received is smaller thenours let’s pass it on to ourneighbours.Let’s send our neighbours toourselves.

Second Reduce

If the ID received is our neighbourthen let’s increment a globalcounter.

0 [] 1 [0]

2 [1]

1

Page 23: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Distributing data-intensive algorithms

Counting the number of triangles in a graph

23 / 61

Triangle Counter – distributed algorithm

Second Map

If the ID received is smaller thenours let’s pass it on to ourneighbours.Let’s send our neighbours toourselves.

Second Reduce

If the ID received is our neighbourthen let’s increment a globalcounter.

0 + + 1

2

Page 24: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

24 / 61

Table of contents

Distributing data-intensive algorithmsMotivationMapReduce & PregelCounting the number of triangles in a graph

Families of distributed graph algorithmsLocal algorithmsGraph traversal based algorithmsMatrix multiplication based algorithms

ExperimentsRepresentative algorithmsResults

Page 25: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Local algorithms

25 / 61

Local algorithms

Traits

I Dependant on a small environment of the given vertex or edge.

I ,,Trivial” candidates for parallel computing.

I Examples are fingerprint computation, local clusteringcoefficient and the number of triangles.

Page 26: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Local algorithms

26 / 61

Local algorithms

Traits

I Dependant on a small environment of the given vertex or edge.

I ,,Trivial” candidates for parallel computing.

I Examples are fingerprint computation, local clusteringcoefficient and the number of triangles.

Page 27: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Local algorithms

27 / 61

Local algorithms

Traits

I Dependant on a small environment of the given vertex or edge.

I ,,Trivial” candidates for parallel computing.

I Examples are fingerprint computation, local clusteringcoefficient and the number of triangles.

Page 28: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Graph traversal based algorithms

28 / 61

Graph traversal based algorithms

Traits

I Dependant on taking long routes in the graph.

I Difficult to implement in a distributed environment.

I The distributed algorithm can be less effective than thesequential as the representation is less powerful.

I Examples could be accessibility, betweenness centrality andstrongly connected components.

Page 29: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Graph traversal based algorithms

29 / 61

Graph traversal based algorithms

Traits

I Dependant on taking long routes in the graph.

I Difficult to implement in a distributed environment.

I The distributed algorithm can be less effective than thesequential as the representation is less powerful.

I Examples could be accessibility, betweenness centrality andstrongly connected components.

Page 30: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Graph traversal based algorithms

30 / 61

Graph traversal based algorithms

Traits

I Dependant on taking long routes in the graph.

I Difficult to implement in a distributed environment.

I The distributed algorithm can be less effective than thesequential as the representation is less powerful.

I Examples could be accessibility, betweenness centrality andstrongly connected components.

Page 31: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Graph traversal based algorithms

31 / 61

Graph traversal based algorithms

Traits

I Dependant on taking long routes in the graph.

I Difficult to implement in a distributed environment.

I The distributed algorithm can be less effective than thesequential as the representation is less powerful.

I Examples could be accessibility, betweenness centrality andstrongly connected components.

Page 32: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Matrix multiplication based algorithms

32 / 61

Matrix multiplication based algorithms

Traits

I Are basically power iterations of matrix-vector multiplicationswith typically fast convergence traits.

I Suitable to implement in Pregel-based systems and not thatstraight-forward in plain MapReduce. [KBEK14]

I Representatives could be eigenvalue centrality, PageRank orLineRank.

Page 33: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Matrix multiplication based algorithms

33 / 61

Matrix multiplication based algorithms

Traits

I Are basically power iterations of matrix-vector multiplicationswith typically fast convergence traits.

I Suitable to implement in Pregel-based systems and not thatstraight-forward in plain MapReduce. [KBEK14]

I Representatives could be eigenvalue centrality, PageRank orLineRank.

Page 34: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Families of distributed graph algorithms

Matrix multiplication based algorithms

34 / 61

Matrix multiplication based algorithms

Traits

I Are basically power iterations of matrix-vector multiplicationswith typically fast convergence traits.

I Suitable to implement in Pregel-based systems and not thatstraight-forward in plain MapReduce. [KBEK14]

I Representatives could be eigenvalue centrality, PageRank orLineRank.

Page 35: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

35 / 61

Table of contents

Distributing data-intensive algorithmsMotivationMapReduce & PregelCounting the number of triangles in a graph

Families of distributed graph algorithmsLocal algorithmsGraph traversal based algorithmsMatrix multiplication based algorithms

ExperimentsRepresentative algorithmsResults

Page 36: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

36 / 61

Representative algorithms

Local

Graph traversal based

Matrix multiplication based

Page 37: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

37 / 61

Representative algorithms

Local

TriangleCounter already presented . . .

Graph traversal based

Matrix multiplication based

Page 38: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

38 / 61

Representative algorithms

Local

Graph traversal based

LineRank Construct the linegraph, where vertices represent theedges of the original graph and a directed edge pointsfrom e1 to e2 if the target of e1 is the source of e2.

Matrix multiplication based

Page 39: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

39 / 61

Representative algorithms

Local

Graph traversal based

LineRank Construct the linegraph, where vertices represent theedges of the original graph and a directed edge pointsfrom e1 to e2 if the target of e1 is the source of e2.Run a PageRank on this. [KPST11, PBMW98]

Matrix multiplication based

Page 40: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

40 / 61

Representative algorithms

Local

Graph traversal based

Matrix multiplication based

SCC Run a label propagation to detect connected components.Remove edges with different labels, then do a reverse labelpropagation. [MIHPR05]

Page 41: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

41 / 61

Representative algorithms

Local

Graph traversal based

Matrix multiplication based

SCC Run a label propagation to detect connected components.Remove edges with different labels, then do a reverse labelpropagation. [MIHPR05] Remove the sccs found and iterate.

Page 42: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Representative algorithms

42 / 61

Representative algorithms

Local

Graph traversal based

Matrix multiplication based

SCC Run a label propagation to detect connected components.Remove edges with different labels, then do a reverse labelpropagation. [MIHPR05] Remove the sccs found and iterate.The sequential algorithm is more efficient. [BCVDP11]

Page 43: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

43 / 61

Results [EBKK14]

Name Vertices Edges Source

web-Google 8, 7 · 105 5, 0 · 107 [LLDM08]wiki-Talk 2, 4 · 106 5, 1 · 107 [LHK10]soc-LiveJournal 4, 8 · 106 6, 9 · 108 [BHKL06]forest10M 107 2, 4 · 108 generated2

forest20M 2 · 107 4, 8 · 108 generated2

Summary of the graphs used

2A slightly modified version of the ForestFire model. [LF06, EBKK14]

Page 44: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

44 / 61

Results [EBKK14]

Implementations

sequential Plain Java.

MapReduce Apache Hadoop on 20 cores, 40 reducers.

Pregel Apache Giraph on 17 cores, 3 occupied by Zookeeper.

Page 45: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

45 / 61

Results [EBKK14]

Implementations

sequential Plain Java.

MapReduce Apache Hadoop on 20 cores, 40 reducers.

Pregel Apache Giraph on 17 cores, 3 occupied by Zookeeper.

Page 46: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

46 / 61

Results [EBKK14]

Implementations

sequential Plain Java.

MapReduce Apache Hadoop on 20 cores, 40 reducers.

Pregel Apache Giraph on 17 cores, 3 occupied by Zookeeper.

Page 47: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

47 / 61

Results [EBKK14]

Implementations

sequential Plain Java.

MapReduce Apache Hadoop on 20 cores, 40 reducers.

Pregel Apache Giraph on 17 cores, 3 occupied by Zookeeper.

Page 48: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

48 / 61

Results [EBKK14]

Page 49: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

49 / 61

Results [EBKK14]

Page 50: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

50 / 61

Results [EBKK14]

Page 51: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

51 / 61

Summary

Take home messages

I If your data is ,,big” you have to go multi-core/machine

I If multi-machine, then try MapReduce

I For distributed graph algorithms it is useful to distinguishthree families

I The families behave differently

I It is worth to distribute even for data not that ,,big”

Page 52: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

52 / 61

Summary

Take home messages

I If your data is ,,big” you have to go multi-core/machine

I If multi-machine, then try MapReduce

I For distributed graph algorithms it is useful to distinguishthree families

I The families behave differently

I It is worth to distribute even for data not that ,,big”

Page 53: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

53 / 61

Summary

Take home messages

I If your data is ,,big” you have to go multi-core/machine

I If multi-machine, then try MapReduce

I For distributed graph algorithms it is useful to distinguishthree families

I The families behave differently

I It is worth to distribute even for data not that ,,big”

Page 54: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

54 / 61

Summary

Take home messages

I If your data is ,,big” you have to go multi-core/machine

I If multi-machine, then try MapReduce

I For distributed graph algorithms it is useful to distinguishthree families

I The families behave differently

I It is worth to distribute even for data not that ,,big”

Page 55: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

55 / 61

Summary

Take home messages

I If your data is ,,big” you have to go multi-core/machine

I If multi-machine, then try MapReduce

I For distributed graph algorithms it is useful to distinguishthree families

I The families behave differently

I It is worth to distribute even for data not that ,,big”

Page 56: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

56 / 61

Marton [email protected]

Hungarian Academy of Sciences – Institute for Computer Science and ControlData Mining & Search Group

Page 57: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

57 / 61

Jirı Barnat, Jakub Chaloupka, and Jaco Van De Pol.Distributed algorithms for scc decomposition.J. Log. and Comput., 21(1):23–44, February 2011.

Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, andXiangyang Lan.Group formation in large social networks: Membership,growth, and evolution.In Proceedings of the 12th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining, pages44–54, 2006.

Jeffrey Dean and Sanjay Ghemawat.Mapreduce: Simplified data processing on large clusters.In OSDI’04: Sixth Symposium on Operating System Designand Implementation, 2004.

Page 58: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

58 / 61

Peter Englert, Marton Balassi, Balazs Kosa, and Attila Kiss.Efficiency issues of computing graph properties of socialnetworks.In Proceedings of the 9th International Conference on AppliedInformatics, page to appear, 2014.

Balazs Kosa, Marton Balassi, Peter Englert, and Attila Kiss.Betweenness versus linerank, 2014.sent to the 6th International Conference on ComputationalCollective Intelligence Technologies and Applications,download from: http://people.inf.elte.hu/balhal/

publications/BetweennessVsLinerank.pdf.

Page 59: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

59 / 61

U Kang, Spiros Papadimitriou, Jimeng Sun, and HanghangTong.Centralities in large networks: Algorithms and observations.In SDM, pages 119–130. SIAM / Omnipress, 2011.

Jure Leskovec and Christos Faloutsos.Sampling from large graphs.In Proceedings of the 12th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining, KDD’06, pages 631–636, 2006.

Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg.Signed networks in social media.In Proceedings of the SIGCHI Conference on Human Factors inComputing Systems, pages 1361–1370, 2010.

Page 60: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

60 / 61

Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, andMichael W. Mahoney.Community structure in large networks: Natural cluster sizesand the absence of large well-defined clusters, 2008.cite arxiv:0810.1355Comment: 66 pages, a much expandedversion of our WWW 2008 paper.

Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik,James C. Dehnert, Ilan Horn, Naty Leiser, and GrzegorzCzajkowski.Pregel: a system for large-scale graph processing.In Proceedings of the 2010 ACM SIGMOD InternationalConference on Management of data, SIGMOD ’10, pages135–146, 2010.

Page 61: Divide and conquer · 2014. 7. 6. · 1Hungarian Academy of Sciences { Institute for Computer Science and Control Data Mining & Search Group June 24, 2014. Families of distributed

Families of distributed graph algorithms

Experiments

Results

61 / 61

William McLendon III, Bruce Hendrickson, Steven J.Plimpton, and Lawrence Rauchwerger.Finding strongly connected components in distributed graphs.Journal of Parallel and Distributed Computing, 65(8):901–910,August 2005.

L. Page, S. Brin, R. Motwani, and T. Winograd.The pagerank citation ranking: Bringing order to the web.In Proceedings of the 7th International World Wide WebConference, pages 161–172, 1998.

Leslie G. Valiant.A bridging model for parallel computation.Commun. ACM, 33(8):103–111, August 1990.