An Introduction to MapReduce
Francisco Pérez-Sorrosal
Distributed Systems Lab (DSL/LSD)
Universidad Politécnica de Madrid
10/Apr/2008
Outline
1. Motivation
2. What is MapReduce?
  • Simple Example
  • What is MapReduce’s Main Goal?
  • Main Features
  • What Does MapReduce Solve?
3. Programming Model
4. Framework Overview
  • Example
5. Other Features
6. Hadoop: A MapReduce Implementation
  • Example
7. References
Motivation
• Increasing demand for large-scale processing applications: web search engines, semantic search tools, scientific applications...
• Most of these applications can be parallelized
• There are many ad-hoc implementations of such applications, but...
Motivation (II)
• ...developing, managing and executing such ad-hoc parallel applications was too complex
  • It usually implies using and managing hundreds or thousands of machines
• However, they basically share the same problems:
  • Parallelization
  • Fault tolerance
  • Data distribution
  • Load balancing
What is MapReduce?
• It is a framework to...
  • ...automatically partition jobs that have large input data sets into simpler work units or tasks and distribute them among the nodes of a cluster (map)...
  • ...and combine the intermediate results of those tasks (reduce) to produce the required results
• Presented by Google in 2004: http://labs.google.com/papers/mapreduce.html
Simple Example
(Figure: Input data; Mapped data on Node 1; Mapped data on Node 2; Result)
What is MapReduce’s Main Goal?
Simplify the parallelization and distribution of large-scale computations in clusters
MapReduce Main Features
Simple interface
Automatic partitioning, parallelization and distribution of tasks
Fault tolerance
Status and monitoring
What Does MapReduce Solve?
• It allows programmers without experience in parallel and distributed systems to use large distributed systems easily
• Used extensively in many applications inside Google and Yahoo that...
  • ...require simple processing tasks...
  • ...but have large input data sets
What Does MapReduce Solve? (II)
Examples:
• Distributed grep
• Distributed sort
• Counting URL access frequency
• Web crawling
• Representing the structure of web documents
• Generating summaries (pages crawled per host, most frequent queries, results returned...)
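Distributed grep is the simplest of these examples. In the MapReduce paper, the map function emits a line if it matches a supplied pattern, and the reduce function is an identity function that copies the intermediate data to the output. Below is a minimal single-process Java sketch of that idea, not Google's code; the class DistributedGrep, its methods, and the use of line numbers as input keys are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of distributed grep as a map function plus an identity reduce.
// ASSUMPTIONS: input key = line number, input value = line text; the class
// and method names are hypothetical and everything runs in one process.
class DistributedGrep {
    // map: emit (lineNo, line) only if the line matches the pattern.
    static List<Map.Entry<Long, String>> map(long lineNo, String line, Pattern pattern) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        if (pattern.matcher(line).find()) {
            out.add(Map.entry(lineNo, line));
        }
        return out;
    }

    // reduce: identity, copies the intermediate values to the output unchanged.
    static List<String> reduce(Long lineNo, List<String> lines) {
        return new ArrayList<>(lines);
    }

    // Local driver: every line number is a distinct key, so each reduce call
    // receives exactly one value.
    static List<String> grep(List<String> lines, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            for (Map.Entry<Long, String> kv : map(i, lines.get(i), pattern)) {
                matches.addAll(reduce(kv.getKey(), List.of(kv.getValue())));
            }
        }
        return matches;
    }
}
```

For example, grep(List.of("foo bar", "baz", "barn"), Pattern.compile("bar")) returns the two matching lines "foo bar" and "barn".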
Programming Model
• Input & Output: each one is a set of key/value pairs
• Map:
  • Processes input key/value pairs
  • Computes a set of intermediate key/value pairs
  • map(in_key, in_value) -> list(int_key, intermediate_value)
• Reduce:
  • Combines all the intermediate values that share the same key
  • Produces a set of merged output values (usually just one per key)
  • reduce(int_key, list(intermediate_value)) -> list(out_value)
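To make the two signatures concrete, here is a minimal, single-process Java sketch of the model: it applies a user map function to every input pair, groups the intermediate values by key, and calls a user reduce function once per key. The names LocalMapReduce, Mapper, Reducer, run and wordCount are hypothetical, and no distribution, partitioning or fault tolerance is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Single-process sketch of the MapReduce programming model (no cluster, no
// fault tolerance). Mapper, Reducer and run() are hypothetical names.
class LocalMapReduce {
    // map(in_key, in_value) -> list(int_key, intermediate_value)
    interface Mapper<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> map(K1 key, V1 value);
    }

    // reduce(int_key, list(intermediate_value)) -> list(out_value)
    interface Reducer<K2, V2, V3> {
        List<V3> reduce(K2 key, List<V2> values);
    }

    static <K1, V1, K2 extends Comparable<K2>, V2, V3> SortedMap<K2, List<V3>> run(
            Map<K1, V1> input, Mapper<K1, V1, K2, V2> mapper, Reducer<K2, V2, V3> reducer) {
        // Map phase: call map() on every input pair and group the intermediate
        // values by key; the TreeMap keeps the intermediate keys sorted.
        SortedMap<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> in : input.entrySet()) {
            for (Map.Entry<K2, V2> kv : mapper.map(in.getKey(), in.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce phase: one reduce() call per distinct intermediate key.
        SortedMap<K2, List<V3>> output = new TreeMap<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.put(group.getKey(), reducer.reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Example instantiation: word count. Input key = document name, value = text.
    static SortedMap<String, List<Integer>> wordCount(Map<String, String> docs) {
        return LocalMapReduce.<String, String, String, Integer, Integer>run(
            docs,
            (name, text) -> {
                List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
                for (String word : text.split("\\s+")) {
                    if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
                }
                return pairs;
            },
            (word, counts) -> List.of(counts.stream().mapToInt(Integer::intValue).sum()));
    }
}
```

With two documents "a b a" and "b c", wordCount returns {a=[2], b=[2], c=[1]}; each value is a list because reduce returns list(out_value), usually with a single element.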
Programming Model: Example
Problem: Counting URL access frequency
Input: a log of web page requests
• Map:
  • Processes the assigned chunk of the log
  • Computes a set of intermediate pairs <URL, 1>
• Reduce:
  • Processes the intermediate pairs <URL, 1>
  • Adds together all the values that share the same URL
  • Produces a set of pairs in the form <URL, total count>
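The URL access frequency example can be sketched in Java as well. This is only an illustration under an assumed log format: each request line is taken to look like "GET /index.html", so the URL is the second whitespace-separated field (real web server logs carry more fields). UrlAccessFrequency and its methods are hypothetical names, and the driver runs everything in one process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of URL access frequency as map and reduce functions.
// ASSUMPTION: each log line looks like "GET /index.html", i.e. the URL is
// the second whitespace-separated field. Real log formats differ.
class UrlAccessFrequency {
    // map: for one chunk of the log, emit <URL, 1> for every request.
    static List<Map.Entry<String, Integer>> map(List<String> logChunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : logChunk) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2) {
                pairs.add(Map.entry(fields[1], 1));
            }
        }
        return pairs;
    }

    // reduce: add together all the values that share the same URL.
    static int reduce(String url, List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    // Local driver: one map call per chunk, group by URL, one reduce per URL.
    static SortedMap<String, Integer> run(List<List<String>> chunks) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> chunk : chunks) {
            for (Map.Entry<String, Integer> kv : map(chunk)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        SortedMap<String, Integer> totals = new TreeMap<>();
        grouped.forEach((url, counts) -> totals.put(url, reduce(url, counts)));
        return totals;
    }
}
```

With two chunks ["GET /a", "GET /b"] and ["GET /a"], run returns {/a=2, /b=1}.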
Framework Overview
Example: Count the Occurrences of Each Letter in a Big File
1) Split the file into 10 pieces of 64MB
• Input: a big file of 640MB, split into pieces 1 to 10; its (illustrative) contents are the letters a t b o m a p r r e d u c e g o o o g l e a p i m a c a c a b r a a r r o z f e i j a o t o m a t e c r u i m e s s o l
• There are 26 different keys (the letters in the range [a..z])
• R = 4 output files (set by the user)
• One Master and eight Workers, all idle
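Step 1 amounts to computing the byte range of each piece. The following Java sketch, with the hypothetical names InputSplits and Split, cuts a file of a given size into fixed-size pieces numbered from 1. It ignores records that straddle a piece boundary, which a real implementation has to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 1: cut a file into fixed-size pieces (splits) and record
// each piece's byte range. Split and split() are hypothetical names.
class InputSplits {
    static final class Split {
        final int index;   // pieces are numbered from 1, as in the example
        final long offset; // first byte of the piece
        final long length; // number of bytes in the piece

        Split(int index, long offset, long length) {
            this.index = index;
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Split> split(long fileSize, long splitSize) {
        List<Split> splits = new ArrayList<>();
        int index = 1;
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            // the last piece is shorter when fileSize is not a multiple of splitSize
            splits.add(new Split(index++, offset, Math.min(splitSize, fileSize - offset)));
        }
        return splits;
    }
}
```

With the example's sizes, split(640L << 20, 64L << 20) yields 10 pieces of 64MB each, numbered 1 to 10.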
2) Assign map and reduce tasks
• The Master assigns the map tasks (one per piece, 1 to 10) and the reduce tasks to idle Workers, which become Mappers and Reducers
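The master's side of steps 2 and 8 can be pictured as a task table. The Java sketch below is a hypothetical model, not the paper's data structure: each task is IDLE (unassigned), IN_PROGRESS or COMPLETED, and a free worker asks for the lowest-numbered idle task of a given kind. In this simplification an assigned task is marked IN_PROGRESS immediately, even though the slides show assigned reduce tasks as idle until their input is available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the master's task table (steps 2 and 8).
class Master {
    enum Kind { MAP, REDUCE }
    enum State { IDLE, IN_PROGRESS, COMPLETED }

    static final class Task {
        final Kind kind;
        final int number;
        State state = State.IDLE;
        String worker; // worker currently running the task, if any

        Task(Kind kind, int number) { this.kind = kind; this.number = number; }

        @Override public String toString() { return kind + "-" + number; }
    }

    private final List<Task> tasks = new ArrayList<>();

    Master(int mapTasks, int reduceTasks) {
        for (int i = 1; i <= mapTasks; i++) tasks.add(new Task(Kind.MAP, i));
        for (int i = 1; i <= reduceTasks; i++) tasks.add(new Task(Kind.REDUCE, i));
    }

    // Hand the lowest-numbered idle task of the requested kind to a free worker.
    Optional<Task> assign(String worker, Kind kind) {
        for (Task t : tasks) {
            if (t.kind == kind && t.state == State.IDLE) {
                t.state = State.IN_PROGRESS;
                t.worker = worker;
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    // Called when a worker reports that a task has finished.
    void complete(Task t) { t.state = State.COMPLETED; }
}
```

With new Master(10, 4), four workers can take MAP-1 to MAP-4 and four others the reduce tasks; once MAP-1 completes, its worker asking for more map work receives MAP-5, as in step 8.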
3) Read the split data
• Four map tasks are in progress, each reading one of the pieces 1 to 4
• The four reduce tasks are idle
4) Process data (in memory)
• Machine 1 runs Map Task 1 (in progress)
• A partition function maps the letters a b c d e f g h i j k l m n o p q r s t u v w x y z into the regions R1, R2, R3 and R4
• Simulating the execution in memory, Map Task 1 emits the intermediate pairs (a,1) (b,1) (a,1) (m,1) (o,1) (p,1) (r,1) (y,1), each placed in the region chosen by the partition function
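The slides do not spell out the partition function. One choice that agrees with the later "Letters in Region 1: a b c d e f g" is to cut [a..z] into R contiguous ranges of ceil(26/R) letters, which gives a-g, h-n, o-u and v-z for R = 4; this range scheme is an assumption. The MapReduce paper's default partitioning function is hash(key) mod R, sketched as the second method. LetterPartitioner and its methods are hypothetical names.

```java
// Sketch of two partition functions for the letter-count example.
// ASSUMPTION: region() splits [a..z] into r contiguous ranges of
// ceil(26 / r) letters; with r = 4 this gives a-g, h-n, o-u, v-z.
class LetterPartitioner {
    // Range partitioning: returns a region number from 1 to r.
    static int region(char letter, int r) {
        int lettersPerRegion = (26 + r - 1) / r; // ceil(26 / r)
        return (Character.toLowerCase(letter) - 'a') / lettersPerRegion + 1;
    }

    // Default-style partitioning from the paper, hash(key) mod R,
    // using floorMod so negative hash codes still map to 1..r.
    static int hashRegion(String key, int r) {
        return Math.floorMod(key.hashCode(), r) + 1;
    }
}
```

Under the range assumption, (a,1) and (b,1) fall in R1, (m,1) in R2, (o,1), (p,1) and (r,1) in R3, and (y,1) in R4. Range partitioning keeps each output file's keys in one contiguous block of the alphabet, while hashing usually spreads skewed key sets more evenly.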
5) Apply the combiner function
• Map Task 1 on Machine 1 is still in progress
• Before: (a,1) (b,1) (a,1) (m,1) (o,1) (p,1) (r,1) (y,1)
• After: (a,2) (b,1) (m,1) (o,1) (p,1) (r,1) (y,1)
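A combiner performs a partial reduce on the map worker before the output is written. The minimal Java sketch below (Combiner and combine are hypothetical names) merges the counts of each key while keeping the order in which keys first appear. It is valid only because addition is associative and commutative, which is also why the Hadoop job configuration later in these slides can register the Reduce class as the combiner.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the combiner in step 5: values with the same key are summed
// locally, so (a,1) (a,1) becomes (a,2) before anything is written to disk.
class Combiner {
    static LinkedHashMap<Character, Integer> combine(List<Map.Entry<Character, Integer>> mapOutput) {
        LinkedHashMap<Character, Integer> combined = new LinkedHashMap<>(); // keeps first-seen key order
        for (Map.Entry<Character, Integer> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return combined;
    }
}
```

Applied to (a,1) (b,1) (a,1) (m,1) (o,1) (p,1) (r,1) (y,1), combine returns {a=2, b=1, m=1, o=1, p=1, r=1, y=1}, one pair fewer to store and transfer.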
6) Store the results on disk
• The combined pairs of Map Task 1, (a,2) (b,1) (m,1) (o,1) (p,1) (r,1) (y,1), are moved from memory to disk, partitioned into the regions R1 to R4
7) Inform the Master of the position of the intermediate results on the local disk
• Machine 1 (Map Task 1) sends the Master the location of the Map Task 1 results, stored in the regions R1 to R4
8) The Master assigns the next task (Map Task 5) to the recently freed Worker
• The Worker on Machine 1 starts on the data for Map Task 5, while the results of Map Task 1 stay in its regions R1 to R4
9) The Master forwards the location of the intermediate results of Map Task 1 to the reducers
• Machine 1 is now running Map Task 5
• The location of region R1 of the Map Task 1 results goes to Reduce Task 1 (still idle), and the location of each other region Rx goes to the corresponding reduce task
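Steps 7 and 9 describe bookkeeping: a finished map task reports where its R regions live, and the master forwards the location of region r to reduce task r. The Java sketch below models that table under stated assumptions; LocationTable, its methods, and the location strings it fabricates (host plus a made-up path) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 7 and 9: the master records, per region,
// where each completed map task stored its output, and hands reduce task r
// the list of locations for region r. Location strings are made up.
class LocationTable {
    private final Map<Integer, List<String>> locations = new HashMap<>();
    private final int r;

    LocationTable(int r) { this.r = r; }

    // Step 7: a worker reports the R region files of a finished map task.
    void mapTaskCompleted(int mapTask, String host) {
        for (int region = 1; region <= r; region++) {
            locations.computeIfAbsent(region, k -> new ArrayList<>())
                     .add(host + ":/tmp/mt" + mapTask + "-r" + region);
        }
    }

    // Step 9: the locations the master forwards to reduce task `region`.
    List<String> locationsFor(int region) {
        return locations.getOrDefault(region, List.of());
    }
}
```

After mapTaskCompleted(1, "machine1"), locationsFor(1) holds the single entry "machine1:/tmp/mt1-r1"; each further completed map task appends one more location per region.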
• Reduce Task 1 (idle) is responsible for region R1, which contains the letters a b c d e f g
• The region R1 pairs produced by the map tasks are:
  (a,2) (b,1) (e,1) (d,1) (c,1) (e,1)
  (g,1)
  (e,1) (a,3) (c,1) (c,1) (a,1) (b,1)
  (a,2) (f,1) (e,1)
  (a,2) (e,1) (c,1)
  (e,1)
10) Reduce Task 1, running on Machine N, reads the data in region R1 from each map task
• Data read from each map task, stored in region 1: (a,2) (b,1) (e,1) (d,1) (c,1) (e,1) (g,1) (e,1) (a,3) (c,1) (c,1) (a,1) (b,1) (a,2) (f,1) (e,1) (a,2) (e,1) (c,1) (e,1)
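Step 10 is the shuffle: reduce task r fetches only region r from every map task's output. In this Java sketch each map output is modelled, as an assumption, as an in-memory map from region number to the pairs stored there; in the real system these are files on the map workers' local disks read remotely. Shuffle and fetchRegion are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of step 10: reduce task `region` collects that region's pairs from
// every map task's output. Map outputs are in-memory stand-ins for files.
class Shuffle {
    static List<Map.Entry<Character, Integer>> fetchRegion(
            List<Map<Integer, List<Map.Entry<Character, Integer>>>> mapOutputs, int region) {
        List<Map.Entry<Character, Integer>> fetched = new ArrayList<>();
        for (Map<Integer, List<Map.Entry<Character, Integer>>> out : mapOutputs) {
            // a map task may have produced no pairs for this region
            fetched.addAll(out.getOrDefault(region, List.of()));
        }
        return fetched;
    }
}
```

The fetched pairs arrive grouped by map task, not by key, which is why the reduce task still has to sort them in the next step.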
11) Reduce Task 1 (Machine N) sorts the data by key:
(a,2) (a,3) (a,1) (a,2) (a,2) (b,1) (b,1) (c,1) (c,1) (c,1) (c,1) (d,1) (e,1) (e,1) (e,1) (e,1) (e,1) (e,1) (f,1) (g,1)
12) Reduce Task 1 then passes each key and the corresponding set of intermediate values to the user's reduce function:
(a, {2,3,1,2,2}) (b, {1,1}) (c, {1,1,1,1}) (d, {1}) (e, {1,1,1,1,1,1}) (f, {1}) (g, {1})
13) Finally, after executing the user's reduce function, Reduce Task 1 (Machine N) generates output file 1 of R
• Reduce input: (a, {2,3,1,2,2}) (b, {1,1}) (c, {1,1,1,1}) (d, {1}) (e, {1,1,1,1,1,1}) (f, {1}) (g, {1})
• Output file 1: (a,10) (b,2) (c,4) (d,1) (e,6) (f,1) (g,1)
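The reduce side of the walkthrough (sort, group, reduce) fits in a few lines of Java. This is an in-memory sketch with the hypothetical names ReduceTask, run and reduce; a real reduce worker may receive more intermediate data than fits in memory, in which case the paper mentions using an external sort instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the final reduce steps for one reduce task: sort the fetched
// pairs by key, group the values of each key, apply the user's reduce.
class ReduceTask {
    static SortedMap<Character, Integer> run(List<Map.Entry<Character, Integer>> fetched) {
        // Sort and group, e.g. (a, {2,3,1,2,2}). A TreeMap does both; values
        // keep their arrival order within a key.
        SortedMap<Character, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<Character, Integer> kv : fetched) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // One call to the user's reduce function per key.
        SortedMap<Character, Integer> output = new TreeMap<>();
        for (Map.Entry<Character, List<Integer>> g : grouped.entrySet()) {
            output.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return output;
    }

    // The user's reduce function for letter counting: add up the partial counts.
    static int reduce(char letter, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }
}
```

Feeding it the twenty region-R1 pairs from step 10 produces {a=10, b=2, c=4, d=1, e=6, f=1, g=1}, the contents of output file 1.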
Other Features: Failures
• Re-execution is the main mechanism for fault tolerance
• Worker failures:
  • The master detects worker failures via periodic heartbeats
  • The master drives the re-execution of tasks
  • Completed and in-progress map tasks are re-executed
  • In-progress reduce tasks are re-executed
• Master failure:
  • The initial implementation did not support failures of the master
  • Solutions:
    • Checkpoint the state of internal structures in the GFS
    • Use replication techniques
• Robust: it once lost 1600 of 1800 machines, but finished fine
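The worker-failure rules above can be sketched in Java. In this hypothetical model (FailureDetector, Task and the timeout value are assumptions), the master stores each worker's last heartbeat time and treats a silent worker as failed. Completed map tasks must run again because, per the MapReduce paper, their output sat on the failed machine's local disk, whereas completed reduce tasks already wrote their output to the global file system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of worker-failure handling with heartbeats.
class FailureDetector {
    enum Kind { MAP, REDUCE }
    enum State { IN_PROGRESS, COMPLETED }

    static final class Task {
        final String id;
        final Kind kind;
        State state;
        final String worker;

        Task(String id, Kind kind, State state, String worker) {
            this.id = id; this.kind = kind; this.state = state; this.worker = worker;
        }
    }

    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final long timeoutMs;

    FailureDetector(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void heartbeat(String worker, long nowMs) { lastHeartbeat.put(worker, nowMs); }

    // A worker never heard from, or silent for longer than the timeout, is failed.
    boolean isFailed(String worker, long nowMs) {
        Long last = lastHeartbeat.get(worker);
        return last == null || nowMs - last > timeoutMs;
    }

    // Tasks of a failed worker to reschedule: every map task it ran
    // (completed or in progress) and its in-progress reduce tasks.
    static List<String> toReExecute(List<Task> tasks, String failedWorker) {
        List<String> ids = new ArrayList<>();
        for (Task t : tasks) {
            if (!t.worker.equals(failedWorker)) continue;
            if (t.kind == Kind.MAP || t.state == State.IN_PROGRESS) ids.add(t.id);
        }
        return ids;
    }
}
```

For a failed worker that ran map-1 (completed), map-2 (in progress), reduce-1 (completed) and reduce-2 (in progress), toReExecute returns map-1, map-2 and reduce-2.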
Other Features: Locality
• Most input data is read locally
• Why? To avoid consuming network bandwidth
• How is this achieved?
  • The master attempts to schedule a map task on a machine that contains a replica (in the GFS) of the corresponding input data
  • If that fails, it attempts to schedule the task near a replica (e.g. on a machine attached to the same network switch)
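The placement preference above can be written as a three-tier choice. This Java sketch is an assumption-laden model, not GFS or Hadoop code: LocalityScheduler and pick are hypothetical, replica hosts and each machine's switch are passed in as plain sets and maps, and ties are broken by the order of the idle-worker list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of locality-aware placement of one map task.
class LocalityScheduler {
    static Optional<String> pick(Set<String> replicaHosts, List<String> idleWorkers,
                                 Map<String, String> switchOf) {
        // 1) a worker that stores a replica: the split is read from local disk
        for (String w : idleWorkers) {
            if (replicaHosts.contains(w)) return Optional.of(w);
        }
        // 2) a worker on the same network switch as some replica
        Set<String> replicaSwitches = new HashSet<>();
        for (String h : replicaHosts) {
            String s = switchOf.get(h);
            if (s != null) replicaSwitches.add(s);
        }
        for (String w : idleWorkers) {
            if (replicaSwitches.contains(switchOf.get(w))) return Optional.of(w);
        }
        // 3) otherwise any idle worker; the data then crosses the network core
        return idleWorkers.isEmpty() ? Optional.empty() : Optional.of(idleWorkers.get(0));
    }
}
```

With a replica on h3 and h3/h4 sharing a switch, the task goes to h3 if it is idle, otherwise to h4, and only then to an unrelated machine such as h1.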
Other Features: Backup Tasks
• Some tasks may be delayed (stragglers): a machine takes an unusually long time to complete one of the last few map or reduce tasks
• Causes: bad disk, competition with other processes, processor caches disabled
• Solution: when the job is close to completion, the master schedules backup executions of the remaining in-progress tasks; whichever copy finishes first "wins"
• Effect: dramatically shortens job completion time
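The "first copy wins" rule maps naturally onto java.util.concurrent: ExecutorService.invokeAny runs a collection of tasks and returns the result of one that completed successfully, cancelling the rest. The sketch below only illustrates the idea on one machine; BackupTasks and runWithBackup are hypothetical names, and the straggler is simulated with a sleep of made-up length. Hadoop calls the same idea speculative execution.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of a backup task: the same work runs as a primary and a backup
// copy, and whichever finishes first provides the result.
class BackupTasks {
    static String runWithBackup(long primaryDelayMs, long backupDelayMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> primary = () -> { Thread.sleep(primaryDelayMs); return "primary"; };
            Callable<String> backup = () -> { Thread.sleep(backupDelayMs); return "backup"; };
            // returns the first successful result and cancels the other copy
            return pool.invokeAny(List.of(primary, backup));
        } finally {
            pool.shutdownNow(); // shut the pool down, interrupting anything still running
        }
    }
}
```

With a slow primary, runWithBackup(2000, 10) returns "backup" after roughly 10 ms instead of waiting two seconds for the straggler.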
Performance
Tests were run on a cluster of ~1800 machines, each with:
• 4 GB of memory
• Dual-processor 2 GHz Xeons with Hyper-Threading
• Dual 160 GB IDE disks
• Gigabit Ethernet
All machines were placed in the same hosting facility.
Performance: Distributed Grep Program
• Searches for a rare three-character pattern (the pattern occurs 97337 times)
• Scans through 10^10 100-byte records (input)
• Input split into pieces of approx. 64MB: 15000 map tasks
• Entire output is placed in one file: 1 reducer
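As a quick consistency check of these figures, assuming 64MB means 64 × 2^20 bytes, the input size and the number of map tasks work out as follows; the 1 TB agrees with the amount read in the grep test.

```latex
\begin{aligned}
10^{10}\ \text{records} \times 100\ \text{bytes/record} &= 10^{12}\ \text{bytes} \approx 1\ \text{TB} \\
\frac{10^{12}\ \text{bytes}}{64 \times 2^{20}\ \text{bytes/split}} &\approx 14{,}901 \approx 15{,}000\ \text{map tasks}
\end{aligned}
```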
Performance: Grep
• The test completes in ~150 sec
• The locality optimization helps: 1800 machines read 1 TB of data at a peak of ~31 GB/s
• Without it, rack switches would limit the rate to 10 GB/s
• Startup overhead is significant for short jobs
(Figure: scan rate over time, annotated "1764 Workers" and "Maps are starting to finish")
Hadoop: A MapReduce Implementation
http://hadoop.apache.org
Installing Hadoop MapReduce:
• Install Hadoop Core
• Configure the Hadoop site in conf/hadoop-site.xml, setting the HDFS master (fs.default.name), the MapReduce master (mapred.job.tracker) and the number of replicas of each file in the cluster (dfs.replication):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Hadoop: A MapReduce Implementation
• Create a distributed filesystem:
  $ bin/hadoop namenode -format
• Start the Hadoop daemons:
  $ bin/start-all.sh   (equivalent to $ bin/start-dfs.sh + $ bin/start-mapred.sh)
• Check the namenode (HDFS): http://localhost:50070/
• Check the job tracker (MapReduce): http://localhost:50030/
Hadoop: HDFS Console
Hadoop: JobTracker Console
Hadoop: Word Count Example
$ bin/hadoop dfs -ls /tmp/fperez-hadoop/wordcount/input/
/tmp/fperez-hadoop/wordcount/input/file01
/tmp/fperez-hadoop/wordcount/input/file02
$ bin/hadoop dfs -cat /tmp/fperez-hadoop/wordcount/input/file01
Welcome To Hadoop World
$ bin/hadoop dfs -cat /tmp/fperez-hadoop/wordcount/input/file02
Goodbye Hadoop World
Hadoop: Running the Example
Run the application:
$ bin/hadoop jar /tmp/fperez-hadoop/wordcount.jar org.myorg.WordCount /tmp/fperez-hadoop/wordcount/input /tmp/fperez-hadoop/wordcount/output
Output:
$ bin/hadoop dfs -cat /tmp/fperez-hadoop/wordcount/output/part-00000
Goodbye 1
Hadoop 2
To 1
Welcome 1
World 2
Hadoop: Word Count Example
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {
    ...
    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        ... // Map task definition
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        ... // Reduce task definition
    }

    public int run(String[] args) throws Exception {
        ... // Job configuration
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
Hadoop: Job Configuration
public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    // summing counts is associative and commutative, so Reduce doubles as the combiner
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    // args is a String[]: input directory first, output directory second
    conf.setInputPath(new Path(args[0]));
    conf.setOutputPath(new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
}
Hadoop: Map Class
public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // map(WritableComparable, Writable, OutputCollector, Reporter)
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // key: byte offset of the line in the input file; value: the line itself
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        // emit <word, 1> for every whitespace-separated token
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }
}
Hadoop: Reduce Class
public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // reduce(WritableComparable, Iterator, OutputCollector, Reporter)
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        // add up all the partial counts received for this word
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
References
• Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04, San Francisco, CA, December 2004.
• Ralf Lämmel. Google's MapReduce Programming Model – Revisited. 2006-2007. Accepted for publication in the Science of Computer Programming journal.
• Jeff Dean, Sanjay Ghemawat. Slides from OSDI'04. http://labs.google.com/papers/mapreduce-osdi04-slides/index.html
• Hadoop. http://hadoop.apache.org
Questions?