A Comparison of Join Algorithms for Log Processing in MapReduce

Spyros Blanas, Jignesh M. Patel (University of Wisconsin-Madison)
Eugene J. Shekita, Yuanyuan Tian (IBM Almaden Research Center)
SIGMOD 2010
August 1, 2010
Presented by Hyojin Song



Contents

Introduction

Join Algorithms In MapReduce

Experimental Evaluation

Discussion

Conclusion


Introduction (1/3): Log Processing

– An important type of data analysis commonly done with MapReduce

– A log is a sequence of events, such as
  • a click-stream log
  • a log of phone call records
  • a sequence of transactions

– Logs are filtered, aggregated, or mined for patterns to compute various statistics for business insight

– Log data often needs to be joined with reference data (e.g., user information)


Log Table

Call record (timestamp)   Number
2010.09.24.14:20.30       01191655603
2010.09.24.14:30.45       01046841397
2010.09.25.19:11.118      01926540846
2010.09.28.06:40.97       01098446512
2010.09.29.08:44.08       01013461655
……                        ……

Reference Table

Number         Name
01191655603    송효진
01046841397    안철수
01926540846    한효주
01098446512    안인석
01013461655    마음이
……             ……


Introduction (2/3): MapReduce Framework

– Used to analyze large volumes of data

– Why MapReduce succeeded: a simple programming framework that manages parallelization, fault tolerance, and load balancing

– Criticisms of MapReduce: lack of a schema, lack of a declarative query language, and lack of indexes

– Joins are difficult: MapReduce was not originally designed to combine information from several data sources, so joins are often performed with simple but inefficient algorithms


Introduction (3/3): The Benefits of MapReduce for Log Processing

– Scalability: China Mobile gathers 5-8TB of phone call records per day, and Facebook collects almost 6TB of new log data every day, with 1.7PB of log data in total

– Schema-free flexibility: the format of a log record may change over time

– Simple scans are preferable (as opposed to index scans)

– Log processing is time-consuming work, so graceful fault-tolerance support matters (as opposed to a parallel RDBMS)


The goal of this paper
– Implement several well-known join strategies in MapReduce
– Run comprehensive experiments to compare these join techniques


Join Algorithms in MapReduce: outline
Problem Statement
1. Repartition Join
2. Improved Repartition Join
3. Directed Join
4. Broadcast Join
5. Semi-Join
6. Per-split Semi-Join


Join Algorithms in MR: Problem Statement

An equi-join between a log table L and a reference table R on a single column, with |L| >> |R|
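Written out as a set, the problem statement reads as follows; k is a name assumed here for the single join column, which the slide does not name.

```latex
L \bowtie_{L.k = R.k} R \;=\; \{\, (l, r) \in L \times R \;:\; l.k = r.k \,\}, \qquad |L| \gg |R|
```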


Approach
– Join algorithms that are well known in the RDBMS literature are adapted to MapReduce; adapting them is not always straightforward
– The crucial implementation details of these join algorithms are described
– Preprocessing techniques are proposed to further improve their performance

The algorithms use two additional functions, init() and close(), which are called before and after each map or reduce task, respectively
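To make the init()/close() lifecycle concrete, here is a minimal single-process sketch. The Task class, run_task() and CountRecords are hypothetical names, not Hadoop APIs; in Hadoop the same role is played roughly by configure()/close() in the old Mapper API and setup()/cleanup() in the new one.

```python
from typing import Iterable, Iterator, List, Tuple


class Task:
    """Hypothetical per-task object with the init()/map()/close() lifecycle."""

    def init(self) -> None:
        """Called once before the task's first record (e.g., load a hash table)."""

    def map(self, record) -> Iterator[Tuple]:
        """Called once per input record; may emit zero or more pairs."""
        return iter(())

    def close(self) -> Iterator[Tuple]:
        """Called once after the task's last record; may emit final pairs."""
        return iter(())


def run_task(task: Task, split: Iterable) -> List[Tuple]:
    """Drive one task over one input split, calling the hooks in order."""
    task.init()
    output: List[Tuple] = []
    for record in split:
        output.extend(task.map(record))
    output.extend(task.close())
    return output


class CountRecords(Task):
    """Example task: state is set up in init() and emitted only in close()."""

    def init(self) -> None:
        self.count = 0

    def map(self, record) -> Iterator[Tuple]:
        self.count += 1
        return iter(())

    def close(self) -> Iterator[Tuple]:
        yield ("records", self.count)
```

For example, run_task(CountRecords(), ["a", "b", "c"]) returns [("records", 3)]. The join algorithms rely on exactly this pattern: per-task state (such as a hash table) is built in init() and can be used or flushed in close().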


Join Algorithms in MR: 1. Repartition Join

The most commonly used join strategy in the MapReduce framework
– L and R are dynamically partitioned on the join key
– The corresponding pairs of partitions are joined
– Similar to the partitioned sort-merge join in parallel RDBMSs


Log Table

Log (lecture, grade)   Student ID
DB B+                  2008-2424
KRR A                  2010-8281
Opt A-                 2005-3682
ML C0                  2009-0078
OS A+                  2010-1004
NL D-                  2008-0909
…                      …

User Table

Student ID   Name
2008-0909    Ahn Jaemin
2010-1004    Kim Somin
2009-0078    Song Hyo-jin
2005-3682    Lee taewhi
2010-8281    An Inseok
…            …

Example tables (log table and user table)
– Log table: 500,000 records; each log record has a lecture name and a grade
– User table: 10,000 records
– The join key is the student ID


[Figure: Repartition join, map phase. Map tasks read splits of R (e.g., Song 2009-0078, An 2010-8281) and of L (e.g., DB B 2008-2424, KRR A 2010-8281) from the distributed file system and emit each record keyed by its join key and tagged with its table, e.g. 2010-8281 → L: KRR A and 2010-8281 → R: An. The intermediate results are written to local disk and then fetched into the reducer's buffer.]


[Figure: Repartition join, reduce phase. For each join key, the reducer buffers the L records in B_L and the R records in B_R, then writes their cross product to the output file on the distributed file system. Output (Student ID, Name, Log): 2010-8281, An In Seok, KRR A; 2009-0078, Song Hyo Jin, ML C.]


Standard repartition join
– Potential problem: for each join key, all records from both L and R have to be buffered in the reducer

– The buffered records may not fit in memory when the data is highly skewed or the key cardinality is small

– Variants of the standard repartition join are used in Pig, Hive, and Jaql today; they all suffer from this buffering problem
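The sketch below simulates the standard repartition join in one process so that the buffering problem is visible in reduce_phase(). The function names are hypothetical; records are assumed to be tuples whose first field is the join key, and a dictionary grouping stands in for the framework's partition-and-sort step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed record layout: a tuple whose first field is the join key (student ID).
Record = Tuple[str, ...]


def map_phase(tag: str, split: List[Record]) -> List[Tuple[str, Tuple[str, Record]]]:
    # Emit (join key, (table tag, record)) for every record in the split.
    return [(rec[0], (tag, rec)) for rec in split]


def shuffle(pairs) -> Dict[str, List[Tuple[str, Record]]]:
    # Stand-in for the framework's partition/sort: group all values by join key.
    groups: Dict[str, List[Tuple[str, Record]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values) -> List[Tuple[str, Record, Record]]:
    # L and R records for a key arrive in arbitrary order, so the records of
    # BOTH tables must be buffered (B_L and B_R) before any output is possible.
    buf_l: List[Record] = []
    buf_r: List[Record] = []
    for tag, rec in values:
        (buf_l if tag == "L" else buf_r).append(rec)
    # Output the cross product as (key, L payload, R payload).
    return [(key, l[1:], r[1:]) for l in buf_l for r in buf_r]


def repartition_join(L: List[Record], R: List[Record]):
    pairs = map_phase("L", L) + map_phase("R", R)
    out = []
    for key, values in shuffle(pairs).items():
        out.extend(reduce_phase(key, values))
    return out
```

With the five log records and two user records from the figures, sorted(repartition_join(L, R)) yields the two joined rows for 2009-0078 (ML C, Song) and 2010-8281 (KRR A, An). The memory held by buf_l grows with the number of L records per key, which is exactly what breaks under skew.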


Improved repartition join
– The map output key is changed to a composite of the join key and the table tag
– The partitioning and grouping functions are customized so that records are still partitioned and grouped by the join key alone
– Records from the smaller table R are buffered, and L records are streamed to generate the join output
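A single-process sketch of the improved repartition join follows. The tag values (0 for R, 1 for L, so R sorts first within a key), the zlib.crc32 partitioner and the function name are assumptions made for illustration; in Hadoop these roles would be filled by a custom partitioner and grouping comparator.

```python
import zlib
from itertools import groupby

R_TAG, L_TAG = 0, 1  # assumed tag values: R sorts before L within a join key


def improved_repartition_join(L, R, num_reducers=2):
    # Map: the output key is the composite (join key, table tag).
    pairs = [((r[0], R_TAG), r) for r in R] + [((l[0], L_TAG), l) for l in L]

    # Custom partitioning: hash only the join-key part of the composite key,
    # so records of both tables with the same key reach the same reducer.
    partitions = [[] for _ in range(num_reducers)]
    for ckey, rec in pairs:
        partitions[zlib.crc32(ckey[0].encode()) % num_reducers].append((ckey, rec))

    out = []
    for part in partitions:
        # Sort on the full composite key: within a join key, R records come first.
        part.sort(key=lambda kv: kv[0])
        # Custom grouping: one reduce call per join key, not per composite key.
        for join_key, group in groupby(part, key=lambda kv: kv[0][0]):
            buffered_r = []
            for (_, tag), rec in group:
                if tag == R_TAG:
                    buffered_r.append(rec)  # only the smaller table R is buffered
                else:
                    # L records are streamed: joined on arrival, never stored.
                    out.extend((join_key, rec[1:], r[1:]) for r in buffered_r)
    return out
```

Because the sort guarantees that every R record for a key precedes its L records, the reducer's memory now depends only on the R records per key, not on the (possibly skewed) number of log records.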


Join Algorithms in MR: 2. Improved Repartition Join


[Figure: Improved repartition join, map phase. As in the standard repartition join, map tasks read splits of R or L from the distributed file system, but each map output key is the composite of the join key and the table tag, e.g. (2010-8281, L) → L: KRR A and (2010-8281, R) → R: An. The intermediate results are written to local disk and then fetched by the reducer.]


[Figure: Improved repartition join, reduce phase. Only the R records (R: An for 2010-8281, R: Song for 2009-0078) are held in the buffer B_R; the L records are streamed past them to produce the output file (Student ID, Name, Log): 2010-8281, An In Seok, KRR A; 2009-0078, Song Hyo Jin, ML C.]


Join Algorithms in MR: 3. Directed Join

Preprocessing for repartition join (directed join)
– Suppose both L and R have already been partitioned on the join key, e.g., by pre-partitioning L on the join key in advance
– Then at query time, matching partitions L_i and R_i can be directly joined
– This is a map-only MapReduce job: during the init phase, R_i is retrieved from the DFS if it is not already in local storage, and a main-memory hash table is built on it; each map task then probes that hash table with the records of L_i
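A minimal sketch of the directed join, assuming records are tuples with the join key first and that both tables are hash-partitioned with the same hypothetical function (zlib.crc32 of the key, modulo the partition count). In practice partition_on_key() would run once as preprocessing, while directed_join() is the per-query map-only job.

```python
import zlib
from collections import defaultdict


def partition_on_key(table, num_parts):
    # Preprocessing: hash-partition a table on its join key (field 0).
    # L and R must use the same hash function and partition count,
    # so that L_i and R_i cover exactly the same join keys.
    parts = [[] for _ in range(num_parts)]
    for rec in table:
        parts[zlib.crc32(rec[0].encode()) % num_parts].append(rec)
    return parts


def directed_join_task(l_i, r_i):
    # init(): build a main-memory hash table on R_i.
    table = defaultdict(list)
    for r in r_i:
        table[r[0]].append(r)
    # map(): probe the hash table with each record of L_i.
    return [(l[0], l[1:], r[1:]) for l in l_i for r in table.get(l[0], [])]


def directed_join(l_parts, r_parts):
    # Query time: a map-only job with one task per pair of matching partitions.
    out = []
    for l_i, r_i in zip(l_parts, r_parts):
        out.extend(directed_join_task(l_i, r_i))
    return out
```

No shuffle is needed at query time: the co-partitioning done in advance already guarantees that any L record can only match records in the R partition with the same index.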


Join Algorithms in MR: 4. Broadcast Join

Broadcast join
– In most applications, |R| << |L|
– Instead of moving both R and L across the network, broadcast the smaller table R to avoid the network overhead
– A map-only job
– Each map task uses a main-memory hash table built on either L or R


Broadcast join: choosing the hash table side
– If R is smaller than a split of L, the hash table is built on R
– If R is larger than a split of L, the hash table is built on the split of L
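The build-side rule above can be sketched as follows. The function names are hypothetical, records are tuples with the join key first, and "size" is measured by record count, which is a simplifying assumption (a real implementation would more likely compare bytes).

```python
from collections import defaultdict


def build_hash_table(records):
    # In-memory hash table: join key -> list of records with that key.
    table = defaultdict(list)
    for rec in records:
        table[rec[0]].append(rec)
    return table


def broadcast_join_task(r, l_split):
    # One map task: it receives all of R (the broadcast) plus one split of L.
    # Assumption: sizes are compared by record count.
    if len(r) <= len(l_split):
        # R is smaller than the split: build on R, probe with each L record.
        table = build_hash_table(r)
        return [(l[0], l[1:], m[1:]) for l in l_split for m in table.get(l[0], [])]
    # R is larger than the split: build on the split of L, probe with each R record.
    table = build_hash_table(l_split)
    return [(m[0], l[1:], m[1:]) for m in r for l in table.get(m[0], [])]


def broadcast_join(L, R, split_size=2):
    # Map-only job: no shuffle; every task sees the whole of R.
    splits = [L[i:i + split_size] for i in range(0, len(L), split_size)]
    out = []
    for split in splits:
        out.extend(broadcast_join_task(R, split))
    return out
```

Both branches produce the same (key, L payload, R payload) rows; only the side held in memory changes, which keeps each task's hash table as small as possible.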


Preprocessing for broadcast join
– Most nodes in the cluster keep a local copy of R in advance
– A map task can then avoid retrieving R from the DFS in its init() function


Join Algorithms in MR: 5. Semi-Join

Semi-join
– In some applications, R is large even though |R| << |L|, and most of R is never referenced by L
  • At Facebook, the user table has hundreds of millions of records, but only a few million unique users are active per hour

– Goal: avoid sending records of R that will not join with L over the network

Preprocessing for semi-join
– The first two phases of the semi-join can be moved to a preprocessing step
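The slide refers to the semi-join's phases without listing them. The sketch below assumes the usual three-phase structure: (1) extract the unique join keys of L, (2) use them to filter R down to R', and (3) broadcast-join R' with L. All function names are hypothetical, and each phase is simulated in one process rather than as a MapReduce job.

```python
from collections import defaultdict


def extract_unique_keys(L):
    # Phase 1: the distinct join keys that actually occur in L ("L.uk").
    return {l[0] for l in L}


def filter_reference(R, l_uk):
    # Phase 2: each task loads L.uk and keeps only the R records it references,
    # producing the reduced table R'.
    return [r for r in R if r[0] in l_uk]


def broadcast_join(L, r_prime):
    # Phase 3: broadcast R', build an in-memory hash table on it, probe with L.
    table = defaultdict(list)
    for r in r_prime:
        table[r[0]].append(r)
    return [(l[0], l[1:], r[1:]) for l in L for r in table.get(l[0], [])]


def semi_join(L, R):
    l_uk = extract_unique_keys(L)        # may be moved to preprocessing
    r_prime = filter_reference(R, l_uk)  # may be moved to preprocessing
    return broadcast_join(L, r_prime), r_prime
```

With the five log records from the figures and the five users from the example user table, R' keeps four users: Kim Somin (2010-1004) is never referenced and is therefore never broadcast.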


Join Algorithms in MR: 6. Per-Split Semi-Join

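Per-split semi-join
– The problem with the semi-join: not every record of the extracted R will join with a particular split L_i
– Instead, each split L_i gets its own subset R_i containing only the R records that join with L_i, so L_i can be joined with R_i directly

Preprocessing for per-split semi-join
– It also benefits from moving its first two phases to a preprocessing step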
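A single-process sketch of the per-split semi-join, assuming the same three-phase shape as the semi-join but applied per split of L. The function name and split_size are hypothetical; phase 2 naively scans R once per split purely for clarity.

```python
from collections import defaultdict


def per_split_semi_join(L, R, split_size=2):
    splits = [L[i:i + split_size] for i in range(0, len(L), split_size)]
    # Phase 1: the unique join keys of each split ("L_i.uk").
    split_keys = [{l[0] for l in split} for split in splits]
    # Phase 2: for each split, keep only the R records that join with it ("R_i").
    # (Naive: one scan of R per split, chosen here only for readability.)
    r_parts = [[r for r in R if r[0] in keys] for keys in split_keys]
    # Phase 3: join each L_i directly with its own R_i via an in-memory hash table.
    out = []
    for l_i, r_i in zip(splits, r_parts):
        table = defaultdict(list)
        for r in r_i:
            table[r[0]].append(r)
        out.extend((l[0], l[1:], r[1:]) for l in l_i for r in table.get(l[0], []))
    return out, r_parts
```

Compared with the plain semi-join, each map task now loads only the R records its own split can use (here 1, 2 and 1 users for the three splits of the example log), instead of all of R'.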


Experimental Evaluation: outline
1. Environment
2. Datasets
3. MapReduce Time Breakdown
4. Experimental Results


Experimental Evaluation: 1. Environment

System specification
– All experiments run on a 100-node cluster
– Each node has a single 2.4GHz Intel Core 2 Duo processor, 4GB of DRAM, and two SATA disks
– Red Hat Enterprise Server 5.2 running Linux 2.6.18


Network specification
– The 100 nodes were spread across two racks
– Each node can execute two map and two reduce tasks concurrently
– Each rack had its own gigabit Ethernet switch
– The rack-level bandwidth is 32Gb/s
– Under full load, the cross-rack node-to-node bandwidth is 35MB/s

Software: Hadoop version 0.19.0, HDFS (128MB block size)


Experimental Evaluation: 2. Datasets


                   Event Log (L)          User Info (R)
Join column size   10 bytes               5 bytes
Record size        100 bytes (average)    100 bytes (exactly)
Total size         500GB                  10MB to 100GB

• The join result is a 10-byte join key
• n-to-1 join (many log records can reference the same user)
• Many users are inactive
• Every record in L always appears in the join result
• The fraction of R referenced by L was fixed at 0.1%, 1%, or 10%
• A Zipf distribution was used to simulate a few very active users
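The slides do not say exactly how the Zipf distribution was applied, so the generator below is only one plausible reading: a random subset of users (the referenced fraction of R) is given Zipf ranks, and every log record references one of them. The function name, the exponent s = 1.0 and the fixed seed are assumptions.

```python
import random


def generate_log_keys(user_ids, referenced_fraction, num_log_records, s=1.0, seed=0):
    rng = random.Random(seed)
    # Pick the users the log will reference (e.g., 0.1%, 1%, or 10% of R).
    n_ref = max(1, round(len(user_ids) * referenced_fraction))
    active = rng.sample(user_ids, n_ref)
    # Assumption: Zipf weights over ranks, i.e. the user of rank k gets weight
    # 1 / k**s, so a few users account for most of the log records.
    weights = [1.0 / (k ** s) for k in range(1, n_ref + 1)]
    # Every log record references some referenced user, so every L record joins.
    keys = rng.choices(active, weights=weights, k=num_log_records)
    return keys, active
```

For 10,000 users, a 1% referenced fraction and 20,000 log records, this picks 100 active users, and the rank-1 user appears far more often than the rank-100 user.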


Experimental Evaluation: 3. MapReduce Time Breakdown


MapReduce time breakdown
– Goal: understand what transpires during the execution of a MapReduce job and the overhead of its various execution components

– Setup:
  • the standard repartition join algorithm
  • a 500GB log table and a 30MB reference table, 1% of which is actually referenced by the log records
  • 4000 map tasks and 200 reduce tasks
  • each node was assigned 40 map tasks and 2 reduce tasks
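The per-node task counts follow directly from the setup. The worked arithmetic below assumes tasks are spread evenly over the 100 nodes, binary units (1 GB = 1024 MB), and the two concurrent map slots per node from the environment slide; the "waves" figure is an inference, not a number from the slides.

```latex
\frac{4000\ \text{map tasks}}{100\ \text{nodes}} = 40\ \text{per node}, \qquad
\frac{200\ \text{reduce tasks}}{100\ \text{nodes}} = 2\ \text{per node}

\frac{500\ \text{GB}}{4000\ \text{map tasks}} = \frac{512{,}000\ \text{MB}}{4000} = 128\ \text{MB per map task}

\frac{40\ \text{map tasks per node}}{2\ \text{map slots per node}} = 20\ \text{waves of map tasks}
```

Under these assumptions, each map task reads about one 128MB HDFS block.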


Interesting observations on MapReduce
– The map phase was clearly CPU-bound
– The reduce phase was limited by the network bandwidth, because three copies of the join result are written to HDFS

– During the map phase, disk and network activity was moderate and periodic; the peaks were related to output generation in the map tasks and to the shuffle phase of the reduce tasks

– The cluster was almost idle for about 30 seconds between the 9-minute and 10-minute marks, waiting for the slowest map task

– Because map tasks are independent and run concurrently, almost all CPU, disk, and network activity can be overlapped


Experimental Evaluation: 4. Experimental Results

[Figure: Experimental results for the join algorithms, without preprocessing and with preprocessing]


Discussion: Choosing the Right Strategy

– Determine the right join strategy for a given circumstance

– This provides an important first step toward query optimization


Conclusion

Joining log data with reference data in MapReduce has emerged as an important part of analytic operations for:
– enterprise customers
– Web 2.0 companies


– Designed a series of join algorithms on top of MapReduce, without requiring any modification to the actual framework
– Proposed many details for efficient implementation: two additional functions, init() and close(), and practical preprocessing techniques

Future work
– Multi-way joins
– Indexing methods to speed up join queries
– An optimization module that selects the appropriate join algorithm
– New programming models to extend the MapReduce framework