Google, Inc. · Systems Design and Implementation (OSDI'04), pages 137-150, 2004. (2)Christophe...
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat, Google, Inc.
Presented By: Hani Khoshdel Nikkhoo
Monday, January 25, 2010
Outline
● Raison d'être for MapReduce
● Background: Functional Programming
  – Map
  – Fold (Reduce)
● MapReduce in action
● How to use MapReduce
● Why MapReduce is useful
Raison d'être for MapReduce
The need to process large amounts of raw data (e.g., 1000 GB) in a short amount of time (a few minutes) calls for:
– Parallelism
– Fault-tolerance
Functional Programming Principles
● When a function is applied to a data structure, the data structure does not change; instead, the result is stored in a new data structure.
● A function can be passed as an argument to another function.
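Both principles can be illustrated with a minimal Python sketch. The function names `add_one_to_all` and `apply_twice` are hypothetical, chosen only for this illustration. Python lists are mutable, so the first principle is a discipline the function follows here rather than something the language enforces.

```python
def add_one_to_all(values):
    """Principle 1: build and return a new list; the input is never modified."""
    return [v + 1 for v in values]

def apply_twice(f, x):
    """Principle 2: a higher-order function that takes another function f as an argument."""
    return f(f(x))

original = [1, 2, 3]
result = add_one_to_all(original)
print(original)  # [1, 2, 3]  (unchanged)
print(result)    # [2, 3, 4]  (new list)
print(apply_twice(lambda n: n * 10, 3))  # 300
```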
Map
map f lst
Creates a new list by applying f to each element of the input list; returns output in order. (Adapted from [2,3])
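[Figure: f is applied separately to each element of lst, producing the corresponding element of the output list]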
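Python's built-in `map(function, iterable)` plays the role of `map f lst`. In Python 3 it returns a lazy iterator, so this sketch wraps it in `list()`. The helper `square` is a hypothetical example function.

```python
def square(x):
    return x * x

lst = [1, 2, 3, 4, 5]

# map applies `square` to each element on its own; list() collects the
# results into a new list in the same order as the input.
squares = list(map(square, lst))
print(squares)  # [1, 4, 9, 16, 25]
print(lst)      # [1, 2, 3, 4, 5]  (input unchanged)
```

Because each application of `f` depends only on its own element, the calls can run in any order or in parallel, which is the property the map phase of MapReduce builds on.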
Fold (Reduce)
fold f x0 lst
Moves across a list, applying f to each element together with an accumulator. f returns the next accumulator value, which is then combined with the next element of the list. (Adapted from [2,3])
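[Figure: starting from the initial accumulator, f combines the accumulator with each element in turn; the final accumulator value is returned]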
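A sketch of `fold f x0 lst` using Python's `functools.reduce`, which is a left fold but takes its arguments in the order `(function, iterable, initializer)` rather than the `f x0 lst` order shown on the slide. Summation is an illustrative choice of `f`.

```python
from functools import reduce

lst = [1, 2, 3, 4, 5]
x0 = 0  # initial accumulator

# reduce(f, lst, x0) evaluates f(f(f(f(f(x0, 1), 2), 3), 4), 5):
# each return value becomes the accumulator for the next element.
total = reduce(lambda acc, x: acc + x, lst, x0)
print(total)  # 15
```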
MapReduce in Action
● Problem: counting the number of occurrences of each word in a literary collection.
(Example adapted from [4])
to be or not to be that is the question
the head is not more native to the heart
brevity is the soul of wit
Literary Collection
to be or not to be that is the question
the head is not more native to the heart
brevity is the soul of wit
Literary Collection
split
to be or not to be that is the question
the head is not more native to the heart
brevity is the soul of wit
Worker #1 Worker #2 Worker #3
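A toy version of the split step, assuming each line of the collection becomes one input split for one worker. This is a simplification: the original paper [1] describes splitting the input files into M pieces of typically 16 to 64 MB each.

```python
collection = """to be or not to be that is the question
the head is not more native to the heart
brevity is the soul of wit"""

# One split per line: each line stands in for an input split
# that would be handed to a separate map worker.
splits = collection.splitlines()

for worker_id, chunk in enumerate(splits, start=1):
    print(f"Worker #{worker_id}: {chunk}")
```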
to be or not to be that is the question map
the head is not more native to the heart map
brevity is the soul of wit map
to be or not to be that is the question map
Map Worker #1
the head is not more native to the heart map
Map Worker #2
brevity is the soul of wit map
Map Worker #3
to be or not to be that is the question map
(“to”, 1),(“be”,1),(“or”,1),(“not”,1),(“to”,1),(“be”,1), (“that”, 1),(“is”,1),(“the”, 1), (“question”,1)
Map#1
the head is not more native to the heart map
(“the”, 1),(“head”,1),(“is”,1),(“not”,1),(“more”,1),(“native”,1), (“to”,1),(“the”, 1), (“heart”,1)
Map#2
brevity is the soul of wit map
(“brevity”, 1),(“is”,1),(“the”,1),(“soul”,1),(“of”,1), (“wit”, 1)
Map#3
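A minimal word-count map function in Python that reproduces the intermediate pairs above. The name `word_count_map` is hypothetical. The pseudocode in [1] takes a (document name, document contents) key/value pair and calls `EmitIntermediate(w, "1")` for each word; this sketch drops the key and simply returns a list of pairs.

```python
def word_count_map(line):
    """Map function for word counting: emit an intermediate (word, 1)
    pair for every whitespace-separated word in the input line."""
    return [(word, 1) for word in line.split()]

print(word_count_map("brevity is the soul of wit"))
# [('brevity', 1), ('is', 1), ('the', 1), ('soul', 1), ('of', 1), ('wit', 1)]
```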
to be or not to be that is the question map
(“to”, 1),(“be”,1),(“or”,1),(“not”,1),(“to”,1),(“be”,1), (“that”, 1),(“is”,1),(“the”, 1), (“question”,1)
partition
Map#1
the head is not more native to the heart map
(“the”, 1),(“head”,1),(“is”,1),(“not”,1),(“more”,1),(“native”,1), (“to”,1),(“the”, 1), (“heart”,1)
partition
Map#2
brevity is the soul of wit map
(“brevity”, 1),(“is”,1),(“the”,1),(“soul”,1),(“of”,1), (“wit”, 1)
partition
Map#3
Hmm! How should we partition?
to be or not to be that is the question map
(“to”, 1),(“be”,1),(“or”,1),(“not”,1),(“to”,1),(“be”,1), (“that”, 1),(“is”,1),(“the”, 1), (“question”,1)
partition
(“to”,1), (“be”,1), (“to”,1),(“be”,1), (“to”,1), (“native”,1), (“heart”,1),(“more”,1), (“of”,1)
Map#1
the head is not more native to the heart map
(“the”, 1),(“head”,1),(“is”,1),(“not”,1),(“more”,1),(“native”,1), (“to”,1),(“the”, 1), (“heart”,1)
partition
(“or”,1),(“is”,1),(“the”,1),(“the”,1),(“is”,1),(“the”,1),(“is”,1),(“the”,1), (“soul”,1),(“wit”,1)
Map#2
brevity is the soul of wit map
(“brevity”, 1),(“is”,1),(“the”,1),(“soul”,1),(“of”,1), (“wit”, 1)
partition
(“not”,1),(“that”,1),(“question”,1),(“head”,1), (“not”,1),(“brevity”,1)
Map#3
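One common answer to the partitioning question is the default described in [1]: send each intermediate key to reduce task hash(key) mod R. The sketch below assumes R = 3 and uses `zlib.crc32` as the hash (an illustrative choice), so the grouping it produces differs from the hand-drawn assignment on the slide. What matters is that every pair for a given word, from every map worker, lands in the same bucket.

```python
import zlib

R = 3  # number of reduce tasks (one per reduce worker in the slides)

def partition(key, num_reducers=R):
    """Assign a key to a reducer via hash(key) mod R.

    zlib.crc32 is used instead of the built-in hash() because str hashing
    is randomized per process, while every map worker must send a given
    key to the same reducer.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

map_outputs = [
    [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1),
     ("be", 1), ("that", 1), ("is", 1), ("the", 1), ("question", 1)],
    [("the", 1), ("head", 1), ("is", 1), ("not", 1), ("more", 1),
     ("native", 1), ("to", 1), ("the", 1), ("heart", 1)],
    [("brevity", 1), ("is", 1), ("the", 1), ("soul", 1), ("of", 1),
     ("wit", 1)],
]

# buckets[r] collects every pair, from every map worker, destined for reducer r.
buckets = [[] for _ in range(R)]
for pairs in map_outputs:
    for key, value in pairs:
        buckets[partition(key)].append((key, value))

for r, bucket in enumerate(buckets, start=1):
    print(f"Reduce #{r}: {bucket}")
```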
to be or not to be that is the question map
(“to”, 1),(“be”,1),(“or”,1),(“not”,1),(“to”,1),(“be”,1), (“that”, 1),(“is”,1),(“the”, 1), (“question”,1)
partition
(“to”,1), (“be”,1), (“to”,1),(“be”,1), (“to”,1), (“native”,1), (“heart”,1),(“more”,1), (“of”,1)
shuffle
(“be”, <1,1>), (“heart”,<1>), (“more”,<1>), (“native”,<1>),(“of”,<1>), (“to”, <1,1,1>)
Map#1
the head is not more native to the heart map
(“the”, 1),(“head”,1),(“is”,1),(“not”,1),(“more”,1),(“native”,1), (“to”,1),(“the”, 1), (“heart”,1)
partition
(“or”,1),(“is”,1),(“the”,1),(“the”,1),(“is”,1),(“the”,1),(“is”,1),(“the”,1), (“soul”,1),(“wit”,1)
shuffle
(“is”, <1,1,1>),(“or”,<1>), (“soul”,<1>),(“the”,<1,1,1,1>),(“wit”,<1>)
Map#2
brevity is the soul of wit map
(“brevity”, 1),(“is”,1),(“the”,1),(“soul”,1),(“of”,1), (“wit”, 1)
partition
(“not”,1),(“that”,1),(“question”,1),(“head”,1), (“not”,1),(“brevity”,1)
shuffle
(“brevity”, <1>),(“head”,<1>), (“not”,<1,1>),(“question”,<1>),(“that”,<1>)
Map#3
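On the slide, shuffling gathers each reducer's pairs, groups the values by key, and lists the keys in sorted order. A sketch of that grouping step for the first partition; the function name `shuffle` is hypothetical.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group intermediate values by key and return (key, [values]) pairs
    sorted by key, as shown for each reducer on the slide."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

partition_1 = [("to", 1), ("be", 1), ("to", 1), ("be", 1), ("to", 1),
               ("native", 1), ("heart", 1), ("more", 1), ("of", 1)]
print(shuffle(partition_1))
# [('be', [1, 1]), ('heart', [1]), ('more', [1]), ('native', [1]), ('of', [1]), ('to', [1, 1, 1])]
```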
to be or not to be that is the question map
(“to”, 1),(“be”,1),(“or”,1),(“not”,1),(“to”,1),(“be”,1), (“that”, 1),(“is”,1),(“the”, 1), (“question”,1)
partition
(“to”,1), (“be”,1), (“to”,1),(“be”,1), (“to”,1), (“native”,1), (“heart”,1),(“more”,1), (“of”,1)
shuffle
(“be”, <1,1>), (“heart”,<1>), (“more”,<1>), (“native”,<1>),(“of”,<1>), (“to”, <1,1,1>)
reduce
(“be”, 2),(“heart”,1),(“more”,1),(“native”,1),(“of”,1),(“to”,3)
Map#1
Reduce#1
the head is not more native to the heart map
(“the”, 1),(“head”,1),(“is”,1),(“not”,1),(“more”,1),(“native”,1), (“to”,1),(“the”, 1), (“heart”,1)
partition
(“or”,1),(“is”,1),(“the”,1),(“the”,1),(“is”,1),(“the”,1),(“is”,1),(“the”,1), (“soul”,1),(“wit”,1)
shuffle
(“is”, <1,1,1>),(“or”,<1>), (“soul”,<1>),(“the”,<1,1,1,1>),(“wit”,<1>)
reduce
(“is”, 3),(“or”,1),(“soul”,1),(“the”,4),(“wit”,1)
Map#2
Reduce#2
brevity is the soul of wit map
(“brevity”, 1),(“is”,1),(“the”,1),(“soul”,1),(“of”,1), (“wit”, 1)
partition
(“not”,1),(“that”,1),(“question”,1),(“head”,1), (“not”,1),(“brevity”,1)
shuffle
(“brevity”, <1>),(“head”,<1>), (“not”,<1,1>),(“question”,<1>),(“that”,<1>)
reduce
(“brevity”, 1),(“head”,1),(“not”,2),(“question”,1), (“that”,1)
Map#3
Reduce#3
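The reduce step for word counting is a fold over each key's list of values, with addition as f and 0 as the initial accumulator. A sketch reproducing Reduce#1's output; the name `word_count_reduce` is hypothetical.

```python
from functools import reduce

def word_count_reduce(key, values):
    """Reduce function for word counting: fold the partial counts for one
    word with addition, starting from an accumulator of 0."""
    return (key, reduce(lambda acc, v: acc + v, values, 0))

grouped = [("be", [1, 1]), ("heart", [1]), ("more", [1]),
           ("native", [1]), ("of", [1]), ("to", [1, 1, 1])]
print([word_count_reduce(key, values) for key, values in grouped])
# [('be', 2), ('heart', 1), ('more', 1), ('native', 1), ('of', 1), ('to', 3)]
```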
How to use MapReduce
● The user needs to worry only about two things:
  – The Map function
  – The Reduce function
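To make this division of work concrete, here is a single-process Python simulation in which the caller supplies only a map function and a reduce function. Everything in it (`run_mapreduce`, `word_count_map`, `word_count_reduce`, the `crc32`-based partitioner) is a hypothetical illustration; it ignores distribution across machines and fault tolerance, which a real MapReduce library handles.

```python
import zlib
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn, num_reducers=3):
    """Sequential simulation of the MapReduce data flow.

    Only map_fn and reduce_fn are user code; partitioning, grouping,
    sorting and invoking the reducers are handled here.
    """
    # Map + partition: route each intermediate pair to one reduce bucket,
    # grouping values by key as they arrive.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in map_fn(split):
            r = zlib.crc32(key.encode("utf-8")) % num_reducers
            buckets[r][key].append(value)

    # Sort + reduce: each bucket processes its keys in sorted order.
    results = []
    for bucket in buckets:
        for key in sorted(bucket):
            results.append(reduce_fn(key, bucket[key]))
    return results

# The only two functions a user writes for word counting.
def word_count_map(line):
    return [(word, 1) for word in line.split()]

def word_count_reduce(key, values):
    return (key, sum(values))

lines = [
    "to be or not to be that is the question",
    "the head is not more native to the heart",
    "brevity is the soul of wit",
]
counts = dict(run_mapreduce(lines, word_count_map, word_count_reduce))
print(counts["the"], counts["is"], counts["to"])  # 4 3 3
```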
Why is MapReduce useful?
● The model is easy to use
  – Complexities are hidden from users
  – A wide variety of problems can be expressed in this framework
● Scalability
  – Parallelism
● Fault-tolerance
  – Recovery
References
(1) Jeffrey Dean and Sanjay Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters”. In Proc. Symposium on Operating Systems Design and Implementation (OSDI'04), pages 137-150, 2004.
(2) Christophe Bisciglia, Aaron Kimball, and Sierra Michels-Slettvet. “MapReduce Theory and Implementation”. Distributed Computing Seminar, Summer 2007.
(3) Aaron Kimball. “Cluster Computing and MapReduce Lecture 2”. Google Inc., Summer 2007, Google Code University. http://www.youtube.com/watch?v=-vD6PUdf3Js
(4) S. Buettcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, 2010.