VGU BIS2010 MapReduce and Batch Processing

MapReduce and Batch Processing VGU BIS2010, Group 13 Son Pham: [email protected] | Phong Le: [email protected] | Lam Pham: [email protected] | Chuong Nguyen: [email protected] | Chapter 4

Transcript of Vgu bis2010 Mapreduce and Batch processing

Page 1: Vgu bis2010 Mapreduce and Batch processing

MapReduce and Batch Processing

VGU BIS2010, Group 13

Son Pham: [email protected]

Phong Le: [email protected]

Lam Pham: [email protected]

Chuong Nguyen: [email protected]

Chapter 4

Page 2: Vgu bis2010 Mapreduce and Batch processing

Content

Part 1: Son Pham

Batch Layer

Part 2: Phong Le

MapReduce

Part 3: Lam Pham

MapReduce

Part 4: Chuong Nguyen

Demo

Page 3: Vgu bis2010 Mapreduce and Batch processing

Batch Layer

Lambda Architecture

Page 4: Vgu bis2010 Mapreduce and Batch processing

Batch Layer

• Precomputation

• High latency

• Linearly scalable

Page 5: Vgu bis2010 Mapreduce and Batch processing

Batch Layer

On-the-fly computation:

Precomputation:
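The contrast between the two approaches can be sketched in a few lines. This is a minimal illustration, not the presenters' demo: the `PAGEVIEWS` dataset and all function names are hypothetical. On-the-fly computation scans the whole dataset for every query, while precomputation builds a batch view once and answers each query with a lookup.

```python
from collections import Counter

# Hypothetical master dataset: one (url, day) record per page view.
PAGEVIEWS = [
    ("/home", "2010-05-01"),
    ("/about", "2010-05-01"),
    ("/home", "2010-05-02"),
]

def count_on_the_fly(url):
    """Scan every record at query time; cost grows with the dataset."""
    return sum(1 for page, _day in PAGEVIEWS if page == url)

def build_batch_view(records):
    """Precompute the counts once; a batch job would rebuild this view."""
    return Counter(page for page, _day in records)

BATCH_VIEW = build_batch_view(PAGEVIEWS)

def count_precomputed(url):
    """Answer from the precomputed view with a single lookup."""
    return BATCH_VIEW[url]
```

Both functions return the same answer; the difference is where the work happens. The batch view is slow to build (high latency) but cheap to query.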

Page 6: Vgu bis2010 Mapreduce and Batch processing

Batch Layer – Linear Scalability

“Scalability is the ability of a system to maintain performance under increased load by adding more resources”

Page 7: Vgu bis2010 Mapreduce and Batch processing

Linear vs. Non-Linear Scalability

Linear Scalability | Non-Linear Scalability

“A linearly scalable system can maintain performance under increased load by adding resources in proportion to the increased load”
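The definition can be made concrete with a small calculation. This is a sketch under stated assumptions: the per-machine `capacity` and the coordination `overhead` model are invented for illustration and do not come from the slides. In the linear case, doubling the load doubles the machines needed. In the hypothetical non-linear case, each added machine contributes less, so the machine count grows faster than the load.

```python
import math

def machines_linear(load, capacity):
    """Linear scaling: machines needed grow in proportion to the load."""
    return math.ceil(load / capacity)

def machines_nonlinear(load, capacity, overhead):
    """Hypothetical non-linear model (an assumption, not from the slides):
    n machines deliver n * capacity / (1 + overhead * (n - 1))."""
    # Throughput in this model can never reach capacity / overhead.
    if load * overhead >= capacity:
        raise ValueError("load is beyond the ceiling of this model")
    n = 1
    while n * capacity / (1 + overhead * (n - 1)) < load:
        n += 1
    return n
```

For example, `machines_linear(1000, 100)` and `machines_linear(2000, 100)` differ by exactly a factor of two, while the non-linear model needs far more than twice the machines when its load doubles.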

Page 8: Vgu bis2010 Mapreduce and Batch processing

MapReduce

A distributed computing paradigm originally pioneered by Google

Inspired by the “Map” and “Reduce” functions commonly used in functional programming (LISP)

Operating on data stored in a distributed filesystem (HDFS…)

A popular free implementation is Apache Hadoop.
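The functional-programming roots are easy to see in a language with built-in `map` and `reduce`. The snippet below uses Python rather than LISP, and the sample list is made up for illustration. `map` applies a function to every element independently, and `reduce` folds the results into a single value.

```python
from functools import reduce

words = ["map", "reduce", "batch"]

# map: apply a function to each element independently.
lengths = list(map(len, words))

# reduce: combine the mapped values into one result.
total = reduce(lambda acc, n: acc + n, lengths, 0)
```

Because each `map` call depends only on its own element, the calls can run on different machines, which is the property MapReduce exploits.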

Page 9: Vgu bis2010 Mapreduce and Batch processing

MapReduce

Page 10: Vgu bis2010 Mapreduce and Batch processing

MapReduce - “Word count” Example
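The word count example can be sketched as a single-process simulation of the three stages: map, shuffle, and reduce. This is not Hadoop code. The function names are illustrative, and a real framework would run the map and reduce calls on different machines and perform the shuffle over the network.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return (word, sum(counts))

def word_count(lines):
    """Run the three stages in order on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

Calling `word_count(["the quick fox", "the lazy dog"])` yields a dictionary mapping each word to its number of occurrences.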

Page 11: Vgu bis2010 Mapreduce and Batch processing

MapReduce

Scalability

Automatically parallelize the computation across the cluster of machines

Fault-Tolerance

Reassign failed tasks
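Both properties can be imitated on one machine. The sketch below is a simulation under stated assumptions, not how Hadoop is implemented: a thread pool stands in for the cluster, and the hypothetical `flaky_count` task deliberately fails on its first attempt for some inputs. The runner processes chunks in parallel and re-runs a failed task, which is the idea behind reassigning failed tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def flaky_count(chunk, attempt):
    """Hypothetical task: count words, simulating a worker failure on
    the first attempt for any chunk that contains the word 'crash'."""
    if attempt == 1 and "crash" in chunk:
        raise RuntimeError("simulated worker failure")
    return len(chunk.split())

def run_with_retries(task, chunk, max_attempts=3):
    """Re-run a failed task, giving up after max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk, attempt)
        except RuntimeError:
            if attempt == max_attempts:
                raise

def run_job(chunks, workers=4):
    """Process chunks in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: run_with_retries(flaky_count, c), chunks))
```

The caller of `run_job` never sees the simulated failure. In a real cluster the framework would reschedule the task on another machine rather than retry it in the same thread.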

Page 12: Vgu bis2010 Mapreduce and Batch processing

Q&A

THANK YOU