Pregel reading circle

Pregel: A System for Large-Scale Graph Processing, 2014/5/14, Ishikawa Yasutaka
  • date post

    18-Oct-2014

description

Slides from a paper presentation given in our research lab

Transcript of Pregel reading circle

Page 1: Pregel reading circle

Pregel: A System for Large-Scale Graph Processing

2014/5/14

Ishikawa Yasutaka

Page 2: Pregel reading circle

About this Paper

• Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski

• A paper from Google

• Proceedings of the 2010 international conference on Management of data - SIGMOD '10

Page 3: Pregel reading circle

Outline

• Introduction

• Model of computation

• Pregel’s API

• Implementation

• Application

• Experiments

• Conclusion

Page 5: Pregel reading circle

Today’s problems of graph processing

• Poor locality of memory access

• Very little work per vertex

Page 6: Pregel reading circle

Methods of graph processing…(1/2)

1. Crafting a custom distributed infrastructure → typically requires substantial implementation effort

2. Relying on an existing distributed computing platform (e.g., MapReduce) → can lead to suboptimal performance and usability issues

Page 7: Pregel reading circle

Methods of graph processing…(2/2)

3. Using a single-computer graph algorithm library → limits the scale of problems that can be addressed

4. Using an existing parallel graph system → these do not address fault tolerance or other issues that are important for very large-scale distributed systems

Page 8: Pregel reading circle

What is Pregel

• Scalable graph processing model

- Based on BSP (Bulk Synchronous Parallel)

- Designed for efficient, scalable and fault-tolerant implementation on clusters

- Distribution-related details are hidden behind an abstract API

• Not open-source software

- Apache Giraph is an open-source implementation of Pregel

Page 9: Pregel reading circle

Bulk Synchronous Parallel

• Bridging model for designing parallel algorithms

• BSP iterates supersteps of computation and synchronizes all processes at the end of each superstep

Page 10: Pregel reading circle

BSP’s algorithm(1/3)

1. Concurrent computation

2. Communication

3. Barrier synchronisation

Step 1: each thread processes its own data concurrently and independently

Page 11: Pregel reading circle

BSP’s algorithm(2/3)

1. Concurrent computation

2. Communication

3. Barrier synchronisation

Step 2: the threads pass messages to one another

Page 12: Pregel reading circle

BSP’s algorithm(3/3)

1. Concurrent computation

2. Communication

3. Barrier synchronisation

Step 3: each thread waits until all other threads have finished message passing, and then the next superstep begins
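The three BSP phases can be sketched with Python threads. This is only an illustration of the model, not Pregel code: the function name `run_bsp`, the ring-shaped communication pattern, and the toy computation (each worker adds the values it received to `rank + 1`) are all assumptions made for this example.

```python
import threading

def run_bsp(num_workers=3, num_supersteps=2):
    """Toy BSP run: each worker adds what it received to (rank + 1)."""
    barrier = threading.Barrier(num_workers)
    lock = threading.Lock()
    # inbox[s][w] holds the messages that worker w reads in superstep s.
    inbox = [[[] for _ in range(num_workers)] for _ in range(num_supersteps + 1)]
    totals = [0] * num_workers

    def worker(rank):
        for step in range(num_supersteps):
            # 1. Concurrent computation on local data and received messages.
            local = rank + 1 + sum(inbox[step][rank])
            totals[rank] = local
            # 2. Communication: send the result to the next worker in a ring.
            with lock:
                inbox[step + 1][(rank + 1) % num_workers].append(local)
            # 3. Barrier synchronisation: wait until every worker has sent.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Because messages written in one superstep are only read after the barrier, the result does not depend on thread scheduling.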

Page 14: Pregel reading circle

Pregel’s input and output

• Input: graph

• Output: graph

• Iterates supersteps, each consisting of a user-defined function and message passing

[Figure: input graph → superstep → superstep → superstep → output graph]

Page 15: Pregel reading circle

Graph component

• A Pregel graph consists of vertices and edges

• Vertex:

- Consists of a unique identifier and a user-defined value

- Its value and outgoing edges are modifiable

• Edge:

- Consists of a source vertex, a target vertex, and a user-defined value

- The user-defined value is modifiable

- Not a first-class citizen (edges have no associated computation)

[Figure: vertices A to D connected by edges a to d. Vertex values are modifiable; outgoing edges and edge values are modifiable]

Page 16: Pregel reading circle

State of vertex

• A vertex has two states: Active and Inactive

• An active vertex becomes Inactive by voting to halt

• An inactive vertex becomes Active again when it receives a message

[Figure: state machine. Active → Inactive on "Vote to halt"; Inactive → Active on "Message received"]
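The two transitions of the vertex state machine can be written down as a small sketch. The class `VertexState` and the helper `all_halted` are hypothetical names for this example and are not part of Pregel's API.

```python
class VertexState:
    """Active/Inactive flag for one vertex (hypothetical helper)."""

    def __init__(self):
        # Every vertex starts out active in superstep 0.
        self.active = True

    def vote_to_halt(self):
        # Active -> Inactive: the vertex has no further work to do.
        self.active = False

    def deliver(self, messages):
        # Inactive -> Active: receiving at least one message wakes the vertex.
        if messages:
            self.active = True


def all_halted(states, messages_in_transit):
    """True when every vertex is inactive and no messages are pending."""
    return not any(s.active for s in states) and not messages_in_transit
```

A vertex that has voted to halt stays inactive as long as `deliver` is called with an empty list.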

Page 17: Pregel reading circle

Pregel’s Superstep

1. In superstep S, each vertex V runs the user-defined function on the messages sent to it in superstep S-1

2. V sends messages to other vertices, which will be received in superstep S+1

3. V may modify its own state

4. Once all vertices have finished steps 1 to 3, go to superstep S+1

• The algorithm terminates with its output when all vertices are inactive and no messages are in transit

Page 18: Pregel reading circle

Example: maximum value(1/4)

[Figure: four vertices, with a legend marking Active and Inactive vertices]

Superstep 0: 3 6 2 1

Page 19: Pregel reading circle

Example: maximum value(2/4)

[Figure: four vertices, with a legend marking Active and Inactive vertices]

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

Page 20: Pregel reading circle

Example: maximum value(3/4)

[Figure: four vertices, with a legend marking Active and Inactive vertices]

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

Superstep 2: 6 6 6 6

Page 21: Pregel reading circle

Example: maximum value(4/4)

[Figure: four vertices, with a legend marking Active and Inactive vertices]

Superstep 0: 3 6 2 1

Superstep 1: 6 6 2 6

Superstep 2: 6 6 6 6

Superstep 3: 6 6 6 6
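The maximum-value example can be replayed with a minimal single-process simulation. The edge list below (A→B, B→A, B→D, C→D, D→C) is an assumption chosen so that the values match the four rows on the slides; the transcript does not preserve the edges of the figure. The function name `run_max_value` is also this sketch's own.

```python
def run_max_value(values, out_edges):
    """Propagate the maximum value; return the vertex values after each superstep."""
    values = dict(values)
    active = set(values)                  # all vertices start active
    inbox = {v: [] for v in values}
    history = []
    superstep = 0
    while active or any(inbox.values()):
        outbox = {v: [] for v in values}
        for v in values:
            messages = inbox[v]
            if v not in active and not messages:
                continue                  # inactive and nothing received: skip
            active.add(v)                 # a received message reactivates v
            new_value = max([values[v]] + messages)
            changed = new_value > values[v]
            values[v] = new_value
            if superstep == 0 or changed:
                for target in out_edges[v]:
                    outbox[target].append(new_value)
            else:
                active.discard(v)         # vote to halt
        inbox = outbox                    # delivered in the next superstep
        history.append([values[v] for v in values])
        superstep += 1
    return history


# Assumed graph: vertex values from the slides, edges are illustrative.
VALUES = {"A": 3, "B": 6, "C": 2, "D": 1}
EDGES = {"A": ["B"], "B": ["A", "D"], "C": ["D"], "D": ["C"]}
```

With these inputs, `run_max_value(VALUES, EDGES)` returns one row per superstep, and the loop ends once every vertex has voted to halt and no messages remain.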

Page 23: Pregel reading circle

Vertex class

• Writing a Pregel program involves subclassing the predefined Vertex class

• The Compute() method is executed at each active vertex in every superstep
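Pregel's real API is a C++ template class. The following Python analogue is only a sketch of the subclassing pattern: the attribute names, the `outbox` list, and the `MaxValueVertex` example are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Vertex(ABC):
    """Python analogue of a Pregel-style vertex base class (illustrative)."""

    def __init__(self, vertex_id, value, out_edges=None):
        self.vertex_id = vertex_id
        self.value = value
        self.out_edges = dict(out_edges or {})   # target id -> edge value
        self.active = True
        self.superstep = 0                        # set by the framework
        self.outbox = []                          # (target id, message) pairs

    @abstractmethod
    def compute(self, messages):
        """User-defined function, called once per superstep while active."""

    def send_message_to(self, target_id, message):
        self.outbox.append((target_id, message))

    def vote_to_halt(self):
        self.active = False


class MaxValueVertex(Vertex):
    """User subclass: propagate the largest value seen so far."""

    def compute(self, messages):
        best = max([self.value, *messages])
        if self.superstep == 0 or best > self.value:
            self.value = best
            for target in self.out_edges:
                self.send_message_to(target, best)
        else:
            self.vote_to_halt()
```

The user only writes `compute`; scheduling, message delivery, and reactivation would be handled by the framework.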

Page 24: Pregel reading circle

Message Passing

• The type of message sent by a vertex is specified by the user as a template parameter of the Vertex class

• The order of messages in the iterator is not guaranteed, but every message is guaranteed to be delivered

Page 25: Pregel reading circle

Combiners

• Sending a message to a vertex on another machine incurs some overhead

• In some cases, combiners can reduce the number of messages

• To enable this, the user subclasses the Combiner class

[Figure: reduction of messages]
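As a sketch of what a combiner does, the function below merges all messages bound for the same target vertex into a single message by keeping the minimum, as a shortest-path computation could. The function name and the `(target, message)` tuple format are assumptions for this example; combining in this way is only safe when the operation is commutative and associative.

```python
def combine_min(outgoing):
    """Collapse messages per target vertex, keeping only the smallest.

    outgoing: list of (target_id, message) pairs queued on one worker.
    Returns a sorted list with at most one pair per target.
    """
    best = {}
    for target, message in outgoing:
        if target not in best or message < best[target]:
            best[target] = message
    return sorted(best.items())
```

For instance, three messages to vertex B and one to vertex C leave the worker as two messages.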

Page 26: Pregel reading circle

Aggregators(1/2)

• Pregel aggregators are a mechanism for global communication

• Each vertex can provide a value in Superstep S, and this value is made available to all vertices in Superstep S+1

[Figure: sum aggregator counting the number of edges. Vertices provide 4, 2, 1, … in superstep S; every vertex sees the sum (4 + 2 + 1 … = 7) in superstep S+1]

Page 27: Pregel reading circle

Aggregators(2/2)

• To define a new aggregator, a user subclasses the predefined Aggregator class
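A sum aggregator can be sketched as follows. The class and method names are hypothetical; the point is only that values contributed in superstep S become visible in superstep S+1.

```python
class SumAggregator:
    """Sum of the values provided by vertices during one superstep (sketch)."""

    def __init__(self):
        self._pending = 0     # contributions collected in the current superstep
        self.visible = None   # result readable by all vertices in the next one

    def aggregate(self, value):
        self._pending += value

    def end_superstep(self):
        # Publish the reduced value and reset for the next superstep.
        self.visible = self._pending
        self._pending = 0
```

If three vertices contribute their out-degrees 4, 2, and 1, the published value after the superstep boundary is 7, the number of edges.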


Page 28: Pregel reading circle

Topology Mutations(1/2)

• Some graph algorithms need to change the graph’s topology

- Clustering algorithm

- Minimum spanning tree algorithm

• The user’s Compute() function can issue requests to add or remove vertices or edges

- Multiple vertices may issue conflicting requests in the same superstep

Page 29: Pregel reading circle

Topology Mutations(2/2)

• Conflicts are resolved by two mechanisms

- Partial ordering: edge removal → vertex removal → vertex addition → edge addition

- Handlers: remaining conflicts are resolved by handler methods that the user can define in the Vertex subclass; by default, one request is picked arbitrarily

• Partial ordering yields deterministic results for most conflicts
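The partial ordering can be illustrated by sorting the mutation requests of one superstep before applying them. The request format, the function name, and the rule that an edge is only added when both endpoints exist are assumptions of this sketch, not Pregel's actual behaviour.

```python
# Lower number = applied earlier within one batch of requests.
ORDER = {"remove_edge": 0, "remove_vertex": 1, "add_vertex": 2, "add_edge": 3}


def apply_mutations(graph, requests):
    """Apply requests to graph (dict: vertex -> set of targets) in partial order."""
    for kind, *args in sorted(requests, key=lambda r: ORDER[r[0]]):
        if kind == "remove_edge":
            src, dst = args
            graph.get(src, set()).discard(dst)
        elif kind == "remove_vertex":
            (vertex,) = args
            graph.pop(vertex, None)
            for targets in graph.values():
                targets.discard(vertex)    # simplification: drop dangling edges
        elif kind == "add_vertex":
            (vertex,) = args
            graph.setdefault(vertex, set())
        elif kind == "add_edge":
            src, dst = args
            if src in graph and dst in graph:
                graph[src].add(dst)
    return graph
```

Because the sort fixes the order of the four request kinds, the outcome does not depend on the order in which vertices issued the requests, except for conflicts within the same kind.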

Page 30: Pregel reading circle

Input and output

• Pregel supports many file formats for input and output

- It decouples the task of interpreting an input file from the task of graph computation

- The library provides readers and writers

- Users can write their own by subclassing Reader and Writer

[Figure: file formats A and B → Reader → Compute → Writer → file formats C and D]

Page 32: Pregel reading circle

Basic architecture(1/2)

• The Pregel library divides a graph into partitions

• Assignment of a vertex to a partition depends solely on the vertex ID

- The default partitioning function is hash(ID) mod N, where N is the number of partitions
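The default partitioning rule is easy to sketch. The choice of CRC32 as the hash function is an assumption made here because it is stable across runs; the slides do not say which hash Pregel uses.

```python
import zlib


def partition_for(vertex_id: str, num_partitions: int) -> int:
    """Return hash(ID) mod N using a run-independent hash (CRC32, assumed)."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def group_by_partition(vertex_ids, num_partitions):
    """Map each partition index to the vertex IDs assigned to it."""
    partitions = {p: [] for p in range(num_partitions)}
    for vertex_id in vertex_ids:
        partitions[partition_for(vertex_id, num_partitions)].append(vertex_id)
    return partitions
```

Since the partition depends only on the ID, any worker can work out where a vertex lives without consulting the master.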

Page 33: Pregel reading circle

Basic architecture(2/2)

• The execution of a Pregel program consists of several stages

1. Many copies of the user program begin executing on a cluster of machines. One of these acts as the master

2. The master determines how many partitions the graph will have, and assigns partitions to each worker

3. The master assigns a portion of the user’s input to each worker

4. The master instructs each worker to perform a superstep


Page 34: Pregel reading circle

Fault tolerance(1/2)

• Fault tolerance is achieved through checkpointing

• At the beginning of a superstep, the master instructs the workers to save the state of their partitions to persistent storage

- Including vertex values, edge values, and incoming messages

- The master separately saves the aggregator values

Page 35: Pregel reading circle

Fault tolerance(2/2)

• Worker failures are detected using regular “ping” messages that the master issues to workers

• When one or more workers fail, the master reassigns their graph partitions to the currently available workers

- They reload their state from the most recent checkpoint and repeat the missing supersteps
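Checkpoint-based recovery can be sketched in a single process. Everything here is hypothetical scaffolding: `step_fn` stands in for one superstep of the whole computation, `fail_at` injects one simulated failure, and the checkpoint is kept in memory rather than on persistent storage.

```python
import copy


def run_with_checkpoints(state, step_fn, num_supersteps, checkpoint_every, fail_at=None):
    """Run supersteps, rolling back to the last checkpoint on a simulated failure."""
    checkpoint = (0, copy.deepcopy(state))
    executed = []            # supersteps actually run, including repeated ones
    failed = False
    superstep = 0
    while superstep < num_supersteps:
        if superstep % checkpoint_every == 0:
            # Save state at the beginning of the superstep.
            checkpoint = (superstep, copy.deepcopy(state))
        if superstep == fail_at and not failed:
            # Simulated worker failure: reload the checkpoint and redo the work.
            failed = True
            superstep, saved = checkpoint
            state = copy.deepcopy(saved)
            continue
        state = step_fn(state)
        executed.append(superstep)
        superstep += 1
    return state, executed
```

A less frequent checkpoint interval saves I/O but increases the number of supersteps that must be repeated after a failure.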

Page 36: Pregel reading circle

Worker implementation

• A worker machine maintains the state of its portion of the graph in memory

• There are two copies of the active flag and the incoming message queue: one for the current superstep and another for the next superstep

• Messages are sent in one of two ways, depending on whether the destination vertex is remote or local

Page 37: Pregel reading circle

Master implementation

• The master assigns a unique identifier to each worker at the time of registration

• The master maintains a list of all workers known to be active

• If any worker fails, the master enters recovery mode

• The master runs an HTTP server that displays statistics about the progress of the computation

Page 39: Pregel reading circle

[1]PageRank(1/2)

• The PageRank algorithm determines the importance of web pages

• It is based on the way academic papers are evaluated

- A good paper is likely to be cited by many other papers

- A paper cited by papers that are themselves cited by many papers is also likely to be good

• It is named after one of Google’s founders, Larry Page

Page 40: Pregel reading circle

[1]PageRank(2/2)
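The content of this slide did not survive in the transcript. As a stand-in, here is a minimal vertex-centric PageRank sketch in Python. It is not the code from the slide: the function name, the damping factor of 0.85, and the fixed count of 30 supersteps are assumptions of this example.

```python
def pagerank(out_edges, num_supersteps=30, damping=0.85):
    """Vertex-centric PageRank: out_edges maps each vertex to its targets."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = {v: [] for v in out_edges}
    for superstep in range(num_supersteps + 1):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if superstep >= 1:
                # New rank from the shares received in the previous superstep.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            if superstep < num_supersteps and targets:
                # Split the current rank evenly over the outgoing edges.
                share = rank[v] / len(targets)
                for target in targets:
                    outbox[target].append(share)
        inbox = outbox
    return rank
```

On a directed cycle every vertex has the same in-degree and out-degree, so all ranks stay equal; this makes a convenient sanity check.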

Page 41: Pregel reading circle

[2]Shortest Path(1/6)

• Shortest-path problem: find the shortest path between two given nodes of a weighted graph

• There are several variants of the shortest-path problem

- The single-source shortest paths problem

- The s-t shortest path problem

- The all-pairs shortest paths problem

• This paper focuses on the single-source shortest paths problem

Page 42: Pregel reading circle

[2]Shortest Path(2/6)

[Figure: weighted graph at superstep 0. Vertex distances: ∞, ∞, 0, ∞. Edge weights: 5, 3, 1, 4, 3, 2, 1, 2, 4]

Page 43: Pregel reading circle

[2]Shortest Path(3/6)

[Figure: weighted graph at superstep 1. Vertex distances: 5, ∞, 0, 3. Edge weights: 5, 3, 1, 4, 3, 2, 1, 2, 4]

Page 44: Pregel reading circle

[2]Shortest Path(4/6)

[Figure: weighted graph at superstep 2. Vertex distances: 4, 6, 0, 3. Additional labels: 6, 5. Edge weights: 5, 3, 1, 4, 3, 2, 1, 2, 4]

Page 45: Pregel reading circle

[2]Shortest Path(5/6)

[Figure: weighted graph at superstep 3. Vertex distances: 4, 5, 0, 3. Additional labels: 6, 9, 5. Edge weights: 5, 3, 1, 4, 3, 2, 1, 2, 4]

Page 46: Pregel reading circle

[2]Shortest Path(6/6)
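The content of this slide did not survive in the transcript. As a stand-in, here is a minimal vertex-centric sketch of single-source shortest paths. The function name and the example graph are assumptions: the edge weights are chosen so that the distances pass through the same values as the preceding slides (5, ∞, 0, 3, then 4, 6, 0, 3, then 4, 5, 0, 3), but the figure's real edges could not be recovered.

```python
import math


def single_source_shortest_paths(out_edges, source):
    """out_edges maps each vertex to a dict of {target: edge weight}."""
    dist = {v: math.inf for v in out_edges}
    inbox = {v: [] for v in out_edges}
    active = set(out_edges)               # all vertices run in superstep 0
    while active:
        outbox = {v: [] for v in out_edges}
        for v in active:
            candidate = min([0 if v == source else math.inf] + inbox[v])
            if candidate < dist[v]:
                dist[v] = candidate
                # Offer each neighbour a path that goes through v.
                for target, weight in out_edges[v].items():
                    outbox[target].append(candidate + weight)
        inbox = outbox
        # Every vertex votes to halt; only message recipients wake up again.
        active = {v for v, messages in inbox.items() if messages}
    return dist


# Hypothetical graph: TL = top left, TR = top right, S = source, BR = bottom right.
GRAPH = {"TL": {"TR": 1}, "TR": {}, "S": {"TL": 5, "BR": 3}, "BR": {"TL": 1}}
```

The computation stops by itself once no vertex receives a shorter distance, because no further messages are produced.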

Page 48: Pregel reading circle

Experiment details

• Three experiments with the single-source shortest paths algorithm

• Using a cluster of 300 multicore commodity PCs

• Reporting runtimes for binary trees and log-normal random graphs

- Binary tree, varying the number of worker tasks

- Binary tree, varying the graph size

- Log-normal random graphs, varying the graph size

Page 49: Pregel reading circle

[1]1 billion vertex binary tree: varying number of worker tasks

• Setting

- A billion vertices, with the number of Pregel workers varying from 50 to 800

• Result

- Using 16 times as many workers yields a speedup of about 10

Page 50: Pregel reading circle

[2]Binary tree: varying graph sizes on 800 worker tasks

• Setting

- Graph size varying from a billion to 50 billion vertices, using a fixed number of 800 worker tasks

• Result

- As the tree grows from a billion to 50 billion vertices, the runtime increases from 17.3 to 702 seconds

Page 51: Pregel reading circle

[3]Log-normal random graphs: varying graph sizes on 800 worker tasks(1/2)

• Binary trees are not representative of graphs encountered in practice

• Use a log-normal distribution of outdegrees

• In this experiment, μ = 4, σ = 1.3

$$p(d) = \frac{1}{\sqrt{2\pi}\,\sigma d}\, e^{-(\ln d - \mu)^2 / 2\sigma^2}$$
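To get a feel for these parameters, the sketch below computes the mean of a log-normal distribution, exp(μ + σ²/2), and draws sample outdegrees with Python's standard library. The function names, the rounding to integers, and the lower bound of 1 are assumptions of this example and say nothing about how the experiment's graphs were actually generated.

```python
import math
import random


def lognormal_mean(mu=4.0, sigma=1.3):
    """Mean of a log-normal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)


def sample_outdegrees(count, mu=4.0, sigma=1.3, seed=0):
    """Draw integer outdegrees from a log-normal distribution (illustrative)."""
    rng = random.Random(seed)
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(count)]
```

With μ = 4 and σ = 1.3 the mean outdegree is roughly 127, far above the binary tree's two children per vertex, and the long tail produces a few vertices with very large outdegrees.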

Page 52: Pregel reading circle

[3]Log-normal random graphs: varying graph sizes on 800 worker tasks(2/2)

• Setting

- Graph size varying from 10 million to a billion vertices

• Result

- The largest graph took a little over 10 minutes

Page 54: Pregel reading circle

Conclusion

• The authors propose a computing model that is suitable for graph processing and is scalable and fault-tolerant

• They report that programmers can easily implement graph processing algorithms with Pregel
