Authors: Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn… · 2014. 11. 20. · 4....
Authors: Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I.,
Leiser, N., Czajkowski, G.
Speaker: Chong Li
Department: Applied Health Science
Program: Master of Health Informatics
1
Term explanation
Motivation & Introduction
Computation Model
System Implementation
Experiment
Conclusion & Future Work
Application
2
Graph Database: a storage system that uses graph representations for data, where each node represents an entity with a unique id, type, and properties.
Superstep: one iteration of a graph algorithm in Pregel. Supersteps are separated by barriers: every entity executing in parallel finishes the current superstep before the next one starts.
3
4
Daddies?
Yes?
Larry Page & Sergey Brin, two geniuses, brought a surprise to this world in 1998:
5
-- 70 offices in more than 40 countries
-- Products include search tools, security tools, map-related
products, etc.
-- More and more information is collected and stored in
geographically different offices.
Distributed
computation?
6
80% of Google's distributed computation is based on MapReduce (Google Maps, Google Translate, etc.).
--It can take advantage of data locality, processing data on or near the storage assets in order to reduce the distance over which it must be transmitted
MapReduce!
7
Challenges faced by MapReduce:
Many practical computing problems concern large-scale graphs, such as shortest path.
MapReduce, however:
- Incurs a lot of I/O, because the entire state of the graph is passed from one stage to the next.
- Needs too many iterations for parallel graph processing.
MapReduce?
8
Need for a scalable distributed solution with these features:
--Scalable and fault-tolerant platform
--API with flexibility to express arbitrary graph algorithms
--Vertex-centric computation (think like a vertex) –pg.14
9
Need for a scalable distributed solution with these features:
--Scalable and fault-tolerant platform
--API with flexibility to express arbitrary graph algorithms
--Vertex-centric computation (think like a vertex)
Pregel!
10
Pregel is a system for large-scale graph processing. It provides a fault-tolerant framework for executing graph algorithms in parallel over many machines.
The Pregel model retains worker state across iterations (the same worker is responsible for the same set of nodes), so the graph can be loaded into memory once and reused across iterations.
Pregel sends only locally computed results over the network, not the entire graph state, which keeps bandwidth consumption low.
Note: Pregel is not a database. It is a computation framework and introduces no key-value store or other new means of storing data.
11
Bulk Synchronous Parallel (BSP) model
12
Input
Output
Supersteps (a sequence of iterations)
13
In each superstep, the vertices compute in parallel
Each vertex
- Receives the messages sent to it in the previous superstep
- Executes the same user-defined function
- Modifies its value or the values of its outgoing edges
- Sends messages to other vertices (to be received in the next superstep)
- May mutate the topology of the graph
- Votes to halt if it has no further work to do
--Vertex centric computation
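The per-vertex steps above can be sketched in Python. This is only an illustration: the `Vertex` class, its field names, and the `compute` signature are assumptions made for this example and are not Pregel's actual (C++) API. The user-defined function here propagates the maximum value seen so far.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical vertex record; names are illustrative only."""
    vid: str
    value: int
    out_edges: list                              # ids of target vertices
    active: bool = True
    outbox: list = field(default_factory=list)   # (target_id, message) pairs

    def send_message_to(self, target, message):
        # Queued now, delivered at the start of the next superstep.
        self.outbox.append((target, message))

    def vote_to_halt(self):
        self.active = False


def compute(vertex, messages, superstep):
    """User-defined function, run once per active vertex per superstep."""
    new_value = max([vertex.value, *messages])
    changed = new_value != vertex.value
    vertex.value = new_value
    if superstep == 0 or changed:
        # Tell the neighbours about the (possibly new) maximum.
        for target in vertex.out_edges:
            vertex.send_message_to(target, vertex.value)
    else:
        # Nothing new to report: no further work to do.
        vertex.vote_to_halt()


# One superstep for one vertex, with messages from the previous superstep.
v = Vertex(vid="b", value=3, out_edges=["a", "c"])
compute(v, messages=[6, 2], superstep=1)
print(v.value, v.outbox, v.active)  # 6 [('a', 6), ('c', 6)] True
```

The function only touches the vertex's own state and its outgoing messages, which is what "think like a vertex" refers to.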
14
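Vertex State Machine
•A vertex is active until it votes to halt; an inactive vertex becomes active again when it receives a message
•Termination condition
•All vertices are simultaneously inactive
•There are no messages in transit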
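A minimal sketch of the termination condition, assuming vertex activity is tracked in a dict of booleans and pending messages in a dict of lists. Both data structures and both function names are hypothetical and chosen only for this example.

```python
def reactivate_on_message(active, inbox):
    """An inactive vertex becomes active again if a message is waiting for it."""
    for vid, messages in inbox.items():
        if messages:
            active[vid] = True


def computation_finished(active, inbox):
    """True only when every vertex is inactive and no message is in transit."""
    no_active_vertices = not any(active.values())
    no_pending_messages = not any(inbox.values())  # empty lists are falsy
    return no_active_vertices and no_pending_messages


active = {"a": False, "b": False}
inbox = {"a": [], "b": [7]}
print(computation_finished(active, inbox))  # False: a message is in transit to b
reactivate_on_message(active, inbox)
print(active)  # {'a': False, 'b': True}
```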
15
The Pregel system also uses the master/worker model
Master
- Maintains the workers
- Recovers from worker faults
- Provides a web UI for monitoring job progress
Worker
- Processes its task
- Communicates with the other workers
Persistent data is stored as files on a distributed storage system (such as GFS or Bigtable)
Temporary data is stored on local disk
16
1. Many copies of the program begin executing on a cluster of machines
2. The master partitions the graph and assigns one or more partitions to each worker
3. The master also assigns a partition of the input to each worker
- Each worker loads the vertices and marks them as active
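The partitioning in step 2 can be sketched as follows. The Pregel paper describes `hash(ID) mod N` as the default partitioning function; using CRC32 as the hash, the round-robin assignment of partitions to workers, and all function names below are assumptions made for this illustration.

```python
import zlib


def partition_of(vertex_id, num_partitions):
    """hash(ID) mod N, with CRC32 standing in for the hash (an assumption)."""
    digest = zlib.crc32(str(vertex_id).encode("utf-8"))
    return digest % num_partitions


def assign_partitions(num_partitions, workers):
    """Give each worker one or more partitions (round-robin, illustrative)."""
    return {p: workers[p % len(workers)] for p in range(num_partitions)}


def build_worker_loads(vertex_ids, num_partitions, workers):
    """Map every vertex to the worker that owns its partition."""
    owner = assign_partitions(num_partitions, workers)
    loads = {worker: [] for worker in workers}
    for vid in vertex_ids:
        loads[owner[partition_of(vid, num_partitions)]].append(vid)
    return loads


loads = build_worker_loads(range(20), num_partitions=6, workers=["w0", "w1", "w2"])
for worker, vertices in loads.items():
    print(worker, vertices)
```

Because the function depends only on the vertex id, any worker can compute which partition a vertex belongs to without asking the master.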
17
4. The master instructs each worker to perform a superstep
- Each worker loops through its active vertices and computes for each vertex
- Messages are sent asynchronously, but are delivered before the end of the superstep
- Note: this step is repeated as long as any vertices are active or any messages are in transit
5. After the computation halts, the master may instruct each worker to save its portion of the graph
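The superstep loop in step 4 can be sketched as a single-process driver. This is a simplified illustration rather than Pregel's implementation: there is one "worker", the barrier is just the point where the outbox becomes the next inbox, and the `compute` signature is an assumption made for this example.

```python
from collections import defaultdict


def run_supersteps(values, edges, compute, max_supersteps=100):
    """Repeat supersteps while any vertex is active or any message is pending.

    values:  {vertex_id: value}
    edges:   {vertex_id: [target_id, ...]}
    compute: fn(value, messages, out_edges, superstep)
             -> (new_value, [(target, message), ...], vote_to_halt)
    """
    active = {vid: True for vid in values}
    inbox = defaultdict(list)
    for superstep in range(max_supersteps):
        # A pending message reactivates a vertex that voted to halt.
        for vid, messages in inbox.items():
            if messages:
                active[vid] = True
        if not any(active.values()):
            return values
        outbox = defaultdict(list)
        for vid in values:
            if not active[vid]:
                continue
            new_value, sent, halt = compute(
                values[vid], inbox.get(vid, []), edges.get(vid, []), superstep
            )
            values[vid] = new_value
            for target, message in sent:
                outbox[target].append(message)
            active[vid] = not halt
        inbox = outbox  # barrier: messages become visible in the next superstep
    raise RuntimeError("computation did not halt")


def max_compute(value, messages, out_edges, superstep):
    """Propagate the largest value; halt when nothing changed."""
    new_value = max([value, *messages])
    if superstep == 0 or new_value != value:
        return new_value, [(t, new_value) for t in out_edges], False
    return new_value, [], True


values = {"a": 3, "b": 6, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(run_supersteps(values, edges, max_compute))
# {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```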
18
Checkpointing
- The master periodically instructs the workers to save the state of their partitions to persistent storage (e.g., vertex values, edge values, incoming messages)
Failure detection
- The master sends regular "ping" messages to the workers
Recovery
- The master reassigns graph partitions to the currently available workers
- The workers all reload their partition state from the most recent available checkpoint
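A small sketch of checkpointing and recovery for one partition, assuming the partition state fits in a Python dict and that a local directory stands in for the persistent storage system. The file naming scheme and both function names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(directory, superstep, partition_id, state):
    """Write one partition's state for the given superstep."""
    # Zero-padded superstep numbers keep file names in chronological order.
    path = Path(directory) / f"superstep-{superstep:06d}-part-{partition_id}.pkl"
    with open(path, "wb") as handle:
        pickle.dump(state, handle)
    return path


def load_latest_checkpoint(directory, partition_id):
    """Reload the most recent checkpoint of a partition, or None if absent."""
    candidates = sorted(Path(directory).glob(f"superstep-*-part-{partition_id}.pkl"))
    if not candidates:
        return None
    with open(candidates[-1], "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as storage:
    state = {
        "vertex_values": {"a": 0, "b": 5},
        "edge_values": {("a", "b"): 5},
        "incoming_messages": {"b": [5]},
    }
    save_checkpoint(storage, superstep=3, partition_id=0, state=state)
    state["vertex_values"]["b"] = 4
    save_checkpoint(storage, superstep=6, partition_id=0, state=state)
    # After a failure, the worker that takes over partition 0 reloads it.
    restored = load_latest_checkpoint(storage, partition_id=0)
    print(restored["vertex_values"])  # {'a': 0, 'b': 4}
```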
19
Combiners: a worker can combine the messages that its vertices send to the same destination and send out one single message
This reduces message traffic and disk space
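The message combining described above (a combiner, in the Pregel paper's terms) might look like the sketch below. It assumes messages are `(target, value)` pairs and that the receiver only needs the minimum, as in shortest-path computations. The function name is hypothetical, and the combining operation should be commutative and associative because the order of messages is not guaranteed.

```python
from collections import defaultdict


def combine_outgoing(messages, combiner=min):
    """Collapse all messages bound for the same vertex into a single one."""
    grouped = defaultdict(list)
    for target, value in messages:
        grouped[target].append(value)
    return {target: combiner(values) for target, values in grouped.items()}


outgoing = [("x", 11), ("x", 14), ("y", 12), ("t", 8)]
print(combine_outgoing(outgoing))  # {'x': 11, 'y': 12, 't': 8}
```

Four messages shrink to three here; on a worker whose vertices send many messages to the same destination the saving is larger.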
20
Aggregators: used for global communication, global data and monitoring
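The Pregel paper calls this mechanism an aggregator: vertices contribute values during one superstep, the system reduces them, and the result is visible to all vertices in the next superstep. The class below is a minimal sketch of that idea; its name and methods are assumptions for illustration, not Pregel's API.

```python
import operator


class Aggregator:
    """Reduces per-vertex contributions into one globally visible value."""

    def __init__(self, reduce_fn, initial):
        self._reduce = reduce_fn
        self._initial = initial
        self._pending = initial   # contributions of the current superstep
        self.value = initial      # result of the previous superstep

    def aggregate(self, contribution):
        self._pending = self._reduce(self._pending, contribution)

    def end_superstep(self):
        # At the barrier, publish the reduced value and start over.
        self.value = self._pending
        self._pending = self._initial


# Example: count the edges of the graph by summing vertex out-degrees.
edge_count = Aggregator(operator.add, 0)
out_degree = {"a": 2, "b": 1, "c": 0}
for degree in out_degree.values():
    edge_count.aggregate(degree)
edge_count.end_superstep()
print(edge_count.value)  # 3
```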
21
22
Environment
- H/W: a cluster of 300 multicore commodity PCs
- Data: binary trees and log-normal random graphs (general graphs)
- Naïve SSSP (single-source shortest path) implementation
- The weight of all edges = 1
- No checkpointing, because of the short runtime
23
SSSP – 1 billion vertex binary tree: varying number of worker tasks
24
SSSP – binary trees: varying graph sizes on 800 worker tasks
25
SSSP – Random graphs: varying graph sizes on 800 worker tasks
26
Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms
Future work
- Relaxing the synchronicity of the model, so that faster workers do not wait for slower workers at inter-superstep barriers
- Assigning vertices to machines to minimize inter-machine communication
- Handling dense graphs in which most vertices send messages to most other vertices
27
Single Source Shortest Path
Find the shortest path from a source node to every other node
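A Pregel-style SSSP can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: the edge list and vertex names (`s`, `t`, `x`, `y`, `z`) are made up for this example, chosen to be consistent with the edge weights and vertex values visible in the example slides. Every vertex votes to halt after each superstep and is woken up only by incoming messages.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Vertex-centric single-source shortest path.

    edges: {vertex_id: [(target_id, weight), ...]}
    """
    vertices = set(edges) | {t for outs in edges.values() for t, _ in outs}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0]}          # the source starts with distance 0
    while inbox:                   # halt when no messages are in transit
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            candidate = min(messages)
            if candidate < dist[v]:
                # Shorter path found: update and notify the neighbours.
                dist[v] = candidate
                for target, weight in edges.get(v, []):
                    outbox[target].append(candidate + weight)
            # Otherwise the vertex simply votes to halt.
        inbox = outbox             # superstep barrier
    return dist


# Hypothetical graph for illustration.
edges = {
    "s": [("t", 10), ("y", 5)],
    "t": [("x", 1), ("y", 2)],
    "y": [("t", 3), ("x", 9), ("z", 2)],
    "x": [("z", 4)],
    "z": [("x", 6), ("s", 7)],
}
dist = pregel_sssp(edges, "s")
print(sorted(dist.items()))
# [('s', 0), ('t', 8), ('x', 9), ('y', 5), ('z', 7)]
```

Only vertices whose distance improved send messages, so the number of active vertices shrinks as the distances converge.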
28
[Figures, slides 29-37: step-by-step SSSP example on a small weighted graph. Legend: inactive vertex, active vertex, edge weight, message. Edge weights shown: 10, 5, 2, 3, 2, 1, 9, 7, 4, 6. The source vertex starts at 0; the vertex values shown in successive frames are (0, 10, 5), then (0, 8, 5, 11, 7), and finally (0, 8, 5, 9, 7).]
--Any questions?
38