Transcript of Lecture 12: MapReduce: Simplified Data Processing on Large Clusters, Xiaowei Yang (Duke University)

Page 1

Lecture 12: MapReduce: Simplified Data Processing on Large Clusters

Xiaowei Yang (Duke University)

Page 2

Review

• What is cloud computing?
• Novel cloud applications
• Inner workings of a cloud

– MapReduce: how to process large datasets using a large cluster

– Datacenter networking

Page 3

Roadmap

• Introduction
• Examples
• How it works
• Fault tolerance
• Debugging
• Performance

Page 4

What is MapReduce

• An automated parallel programming model for large clusters
  – The user implements Map() and Reduce()

• A framework
  – Libraries take care of the rest:
    • Data partition and distribution
    • Parallel computation
    • Fault tolerance
    • Load balancing

• Useful
  – Widely used at Google

Page 5

Map and Reduce

• Functions borrowed from functional programming languages (e.g., Lisp)

• Map()
  – Processes a key/value pair to generate intermediate key/value pairs
  – map (in_key, in_value) -> (out_key, intermediate_value) list

• Reduce()
  – Merges all intermediate values associated with the same key
  – reduce (out_key, intermediate_value list) -> out_value list

Page 6

Example: word counting

• Map()
  – Input: <filename, file text>
  – Parses the file and emits <word, count> pairs
    • e.g., <"hello", 1>

• Reduce()
  – Sums all values for the same key and emits <word, TotalCount>
    • e.g., <"hello", (1 1 1 1)> => <"hello", 4>

Page 7

Example: word counting

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
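To make the pseudocode above concrete, here is a minimal runnable sketch in Python (an illustration for this transcript, not Google's C++ library); the sequential grouping loop stands in for the distributed shuffle the framework performs:

    # Minimal single-process sketch of the word-count example above.
    from collections import defaultdict

    def map_fn(filename, text):
        # Emit an intermediate <word, "1"> pair for each word in the document.
        for word in text.split():
            yield word, "1"

    def reduce_fn(word, values):
        # Sum all counts emitted for the same word.
        return str(sum(int(v) for v in values))

    def run(documents):
        # Shuffle phase: group intermediate values by key.
        intermediate = defaultdict(list)
        for filename, text in documents.items():
            for key, value in map_fn(filename, text):
                intermediate[key].append(value)
        # Reduce phase: one call per unique key.
        return {key: reduce_fn(key, values) for key, values in intermediate.items()}

    if __name__ == "__main__":
        docs = {"a.txt": "hello world hello", "b.txt": "hello mapreduce"}
        print(run(docs))  # {'hello': '3', 'world': '1', 'mapreduce': '1'}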

Page 8

Google Computing Environment

• Typical clusters contain 1000s of machines

• Dual-processor x86 machines running Linux with 2-4 GB memory

• Commodity networking
  – Typically 100 Mb/s or 1 Gb/s

• IDE drives attached to individual machines
  – A distributed file system manages the data on these disks

Page 9

How does it work?

• From the user:
  – Input/output files
  – M: number of map tasks
    • M >> # of worker machines, for load balancing
  – R: number of reduce tasks
  – W: number of machines
  – Write the map and reduce functions
  – Submit the job

• Requires no knowledge of parallel or distributed systems

• What about everything else?

Page 10

Step 1: Data Partition and Distribution

• Split an input file into M pieces on the distributed file system
  – Typically ~64 MB blocks (see the sketch below)

• Intermediate files created from map tasks are written to local disk

• Output files are written to distributed file system
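As a rough sketch of this splitting step (a Python illustration under the assumption of plain local files; the real system splits input already stored on the distributed file system), the pieces can be computed as byte ranges:

    # Sketch: compute byte ranges that split one input file into ~64 MB
    # pieces, each handed to one map task. Illustrative only.
    import os

    BLOCK_SIZE = 64 * 1024 * 1024  # ~64 MB, as in the lecture

    def split_offsets(path, block_size=BLOCK_SIZE):
        size = os.path.getsize(path)
        # Each piece is a (start, end) byte range for one map task.
        return [(start, min(start + block_size, size))
                for start in range(0, size, block_size)]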

Page 11

Step 2: Parallel computation

• Many copies of user program are started

• One instance becomes the Master

• Master finds idle machines and assigns them tasks
  – M map tasks
  – R reduce tasks

Page 12

Locality

• Tries to exploit data locality by running map tasks on the machines that already hold the input data

• map() task inputs are divided into 64 MB blocks: same size as Google File System chunks

Page 13

Step 3: Map Execution

• Map workers read in contents of corresponding input partition

• Perform user-defined map computation to create intermediate <key,value> pairs

Page 14

Step 4: output intermediate data

• Periodically, buffered output pairs are written to local disk
  – Partitioned into R regions by a partitioning function

• The worker sends the locations of these buffered pairs on its local disk to the master, who is responsible for forwarding the locations to the reduce workers

Page 15

Partition Function

• Partition on the intermediate key
  – Example partition function: hash(key) mod R

• Question: why do we need this? (see the sketch after this list)

• Example scenario:
  – Want to do word counting on 10 documents
  – 5 map tasks, 2 reduce tasks
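Why the partition function is needed: each of the 5 map tasks emits words independently, yet all counts for one word must meet at the same one of the 2 reduce tasks. A minimal Python sketch, assuming CRC32 as the hash (the slides leave the hash unspecified); note that Python's built-in hash() is randomized per process, so a cross-worker partitioner needs a deterministic hash like this one:

    # Sketch of the example partition function hash(key) mod R.
    # With R = 2 reduce tasks, every map task routes each intermediate key
    # to the same region, so all counts for one word meet at one reducer.
    import zlib

    R = 2  # number of reduce tasks in the scenario above

    def partition(key, r=R):
        # zlib.crc32 is a stand-in for the unspecified hash; any
        # deterministic hash works as long as all workers agree on it.
        return zlib.crc32(key.encode()) % r

    for word in ["hello", "world", "mapreduce"]:
        print(word, "-> reduce task", partition(word))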

Page 16

Step 5: Reduce Execution

• The master notifies a reduce worker

• Reduce workers iterate over ordered intermediate data
  – Data is sorted by the intermediate keys

• Why is sorting needed? (see the sketch after this list)
  – For each unique key encountered, the values are passed to the user's reduce function
  – e.g., <key, [value1, value2, ..., valueN]>

• Output of the user's reduce function is written to an output file on the global file system

• When all tasks have completed, the master wakes up the user program
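A small Python sketch of why sorting is needed: once intermediate pairs are sorted by key, all values for a key sit adjacent, so the reduce worker can stream one <key, value list> group at a time to the user's reduce function (the sample pairs below are made up):

    # After sorting by key, each unique key and its value list can be
    # produced in a single pass over the intermediate data.
    from itertools import groupby
    from operator import itemgetter

    # Hypothetical intermediate pairs fetched from several map workers.
    pairs = [("hello", 1), ("world", 1), ("hello", 1), ("hello", 1)]

    pairs.sort(key=itemgetter(0))  # sort by intermediate key
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [v for _, v in group]
        print(key, values)  # e.g. hello [1, 1, 1] -> passed to reduce()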

Page 17

Page 18

Observations

• No reduce can begin until map is complete
  – Why?

• Tasks are scheduled based on the location of data

• If a map worker fails any time before reduce finishes, its tasks must be completely rerun

• Master must communicate locations of intermediate files

• MapReduce library does most of the hard work

Page 19

[Figure: MapReduce data flow. Map tasks read input key/value pairs from data stores 1 through n and emit intermediate (key, values) pairs. A barrier then aggregates intermediate values by output key. Reduce tasks consume <key, intermediate values> and produce the final values for key 1, key 2, key 3, ...]

Page 20

Fault Tolerance

• Workers are periodically pinged by the master (see the sketch after this list)
  – No response = failed worker

• Reassign tasks if a worker is dead

• Input file blocks stored on multiple machines
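A minimal sketch of the ping-based failure detector described above, in Python; the timeout value and data structures are assumptions, not from the paper:

    # Master-side failure detection: a worker silent for too long is
    # declared dead and its tasks become eligible for reassignment.
    import time

    PING_TIMEOUT = 10.0   # seconds of silence before declaring failure

    last_seen = {}        # worker id -> time of the last ping reply

    def record_ping(worker_id):
        # Called whenever a worker answers the master's ping.
        last_seen[worker_id] = time.monotonic()

    def failed_workers():
        # Workers silent for too long; their in-progress (and, for map
        # tasks, even completed) work must be rerun elsewhere, since a
        # dead machine's intermediate files on local disk are lost.
        now = time.monotonic()
        return [w for w, t in last_seen.items() if now - t > PING_TIMEOUT]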

Page 21

Backup tasks

• When the computation is almost done, reschedule in-progress tasks
  – Avoids “stragglers”
  – Reasons for stragglers:
    • Bad disk, background competition, bugs

Page 22

Refinements

• User-specified partition function
  – e.g., hash(Hostname(urlkey)) mod R, so all URLs from one host end up in the same output file

• Ordering guarantees

• Combiner function (see the sketch after this list)
  – Partial merging before a map worker sends the data
  – A local reduce
  – Ex: the many <the, 1> pairs from one document can be merged into a single pair
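A minimal Python sketch of a combiner for word counting (for this example the combiner is the same logic as the reduce function); it collapses a map task's repeated <"the", 1> pairs locally before anything crosses the network:

    # Merge one map task's <word, count> pairs locally, so thousands of
    # <"the", 1> pairs collapse into a single <"the", total> pair.
    from collections import Counter

    def combine(intermediate_pairs):
        # intermediate_pairs: iterable of (word, count) from one map task
        combined = Counter()
        for word, count in intermediate_pairs:
            combined[word] += count
        return list(combined.items())

    print(combine([("the", 1), ("the", 1), ("a", 1), ("the", 1)]))
    # [('the', 3), ('a', 1)]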

Page 23

Skipping Bad Records

• The MapReduce library detects which records cause deterministic crashes
  – Each worker process installs a signal handler that catches segmentation violations and bus errors
  – On a crash, the handler sends a “last gasp” UDP packet with the record's sequence number to the MapReduce master
  – When the master sees repeated failures on the same record, it tells the next re-execution to skip that record (see the sketch after this list)
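A loose Python sketch of the idea. The real library installs C++ signal handlers for SIGSEGV/SIGBUS; Python cannot survive a true segmentation fault, so ordinary exceptions stand in for crashes here, and the master address is hypothetical:

    # "Last gasp" reporting, illustrated with exceptions instead of signals.
    import socket

    MASTER_ADDR = ("127.0.0.1", 9999)   # hypothetical master address

    def run_map_task(records, user_map_fn, skip=frozenset()):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        for seqno, record in enumerate(records):
            if seqno in skip:           # master flagged this record as bad
                continue
            try:
                user_map_fn(record)
            except Exception:
                # Report the offending record's sequence number, then fail;
                # the master can skip it when it issues the re-execution.
                sock.sendto(str(seqno).encode(), MASTER_ADDR)
                raise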

Page 24

Debugging

• Offers human-readable status info on an HTTP server (see the sketch below)
  – Users can see jobs completed, in-progress, processing rates, etc.
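A toy sketch of such a status page using Python's standard http.server (purely illustrative; the counters are made-up placeholders for state the master would maintain):

    # Serve a plain-text progress page like the master's status server.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical in-memory job state the master updates as tasks finish.
    STATUS = {"maps_done": 120, "maps_total": 200,
              "reduces_done": 3, "reduces_total": 10}

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = (f"maps: {STATUS['maps_done']}/{STATUS['maps_total']}\n"
                    f"reduces: {STATUS['reduces_done']}/{STATUS['reduces_total']}\n")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(body.encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), StatusHandler).serve_forever()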

Page 25

Performance

• Tests run on 1800 machines
  – 4 GB memory
  – Dual-processor 2 GHz Xeons with Hyperthreading
  – Dual 160 GB IDE disks
  – Gigabit Ethernet per machine

• Run over a weekend, when the machines were mostly idle

• Benchmark: Sort
  – Sort 10^10 100-byte records (about 1 TB of data)

Page 26

Grep

Page 27

Sort

[Figure: Sort progress over time for three executions: normal, with no backup tasks, and with 200 tasks killed.]

Page 28

Google usage

Page 29

More examples

• Distributed Grep

• Count of URL Access Frequency: the total number of accesses to each URL in web logs

• Inverted Index: for each word, the list of documents that contain it (see the sketch below)
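A minimal Python sketch of the inverted-index example (document ids and the single-process grouping loop are illustrative):

    # Map emits <word, document id>; reduce collects, per word, the sorted
    # list of documents containing it.
    from collections import defaultdict

    def map_fn(doc_id, text):
        for word in set(text.split()):   # each document listed once per word
            yield word, doc_id

    def reduce_fn(word, doc_ids):
        return sorted(doc_ids)

    index = defaultdict(list)
    for doc_id, text in {"d1": "hello world", "d2": "hello mapreduce"}.items():
        for word, d in map_fn(doc_id, text):
            index[word].append(d)
    print({w: reduce_fn(w, ds) for w, ds in index.items()})
    # -> hello: ['d1', 'd2'], world: ['d1'], mapreduce: ['d2'] (order may vary)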

Page 30

Conclusions

• Simplifies large-scale computations that fit this model

• Allows user to focus on the problem without worrying about details

• Computer architecture is not very important
  – The model is portable

Page 31

Project proposal

Page 32

Count of URL Access Frequency

• The map function processes logs of webpage requests and outputs <URL, 1>.

• The reduce function adds together all values for the same URL and emits a <URL, total count> pair.