Meeting Service Level Objectives of Pig Programs Zhuoyao Zhang, Ludmila Cherkasova, Abhishek Verma, Boon Thau Loo University of Pennsylvania Hewlett-Packard Labs

Page 1

Meeting Service Level Objectives of Pig Programs

Zhuoyao Zhang, Ludmila Cherkasova,

Abhishek Verma, Boon Thau Loo

University of Pennsylvania, Hewlett-Packard Labs

Page 2

Cloud Environment

•Advantages
▫Large amount of resources
▫Elasticity
▫Pay-as-you-go pricing model
•Challenges
▫Distributed resources
▫Error-prone

Page 3

MapReduce and Pig

•MapReduce: a simple, fault-tolerant framework for data processing in the cloud
•Pig
▫Advanced MapReduce-based platform
▫Widely used: Yahoo!, Twitter, LinkedIn
▫Pig Latin: a high-level declarative language for expressing data analysis tasks as Pig programs

[Figure: workflow of MapReduce jobs j1 through j7]

Page 4

Motivation
•Latency-sensitive applications
▫Personalized advertising
▫Spam and fraud detection
▫Real-time log analysis
•How many resources does an application need to meet its deadline?

Page 5

Contributions
•Performance modeling for Pig programs
▫Given a Pig program, estimate its completion time as a function of the assigned resources
•Deadline-driven resource allocation estimates for Pig programs
▫Given a completion time target, determine the amount of resources a Pig program needs to achieve it

Page 6

Outline
•Introduction
•Building block
▫Performance model for single MapReduce jobs
•Resource allocation for Pig programs
•Evaluation
•Conclusion and ongoing work

Page 7

Theoretical Makespan Bounds
•Bounds-based makespan estimates
▫n tasks, k servers
▫avg: average duration of the n tasks
▫max: maximum duration of the n tasks
•Lower bound
$$T^{low} = \frac{n \cdot \mathit{avg}}{k}$$
•Upper bound
$$T^{up} = \frac{(n-1) \cdot \mathit{avg}}{k} + \mathit{max}$$

Page 8

Illustration
Schedule 1: 1 4 3 2 3 1 2
Schedule 2: 3 1 2 3 2 1 4
Makespan = 4, lower bound = 4
Makespan = 7, upper bound = 8
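The bounds can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes a greedy scheduler that hands each task, in the listed order, to whichever server becomes free first, and it assumes k = 4 servers (a value inferred from the lower bound of 4, not stated on the slide). The function names `makespan_bounds` and `greedy_makespan` are illustrative.

```python
import heapq

def makespan_bounds(durations, k):
    """Return (lower, upper) makespan bounds for n tasks on k servers."""
    n = len(durations)
    total = sum(durations)
    lower = total / k                                   # n * avg / k
    upper = (n - 1) * (total / n) / k + max(durations)  # (n-1) * avg / k + max
    return lower, upper

def greedy_makespan(durations, k):
    """Simulate greedy assignment: each task goes to the earliest-free server."""
    free_at = [0.0] * k            # time at which each server becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # server that frees up first
        heapq.heappush(free_at, start + d)
    return max(free_at)

schedule_2 = [3, 1, 2, 3, 2, 1, 4]          # task order from the illustration
low, up = makespan_bounds(schedule_2, k=4)  # k = 4 is an assumption
span = greedy_makespan(schedule_2, k=4)
print(f"lower={low:.2f} makespan={span:.2f} upper={up:.2f}")
```

Under these assumptions the Schedule 2 ordering gives a makespan of 7, which lies between the two bounds. The upper-bound formula evaluates to roughly 7.43 here, which is consistent with the slide's value of 8 only if that figure was rounded up.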

Page 9

Estimate Completion Time for Single MR Job
•Estimate the bounds of the job completion time based on the job profile
▫Most production jobs are executed routinely on new data sets
▫Job profile built from previous runs
Map stage: $M_{avg}$, $M_{max}$, AvgInputSize, Selectivity
Reduce stage: $Sh_{avg}$, $Sh_{max}$, $R_{avg}$, $R_{max}$, Selectivity
▫Predict the completion time of future runs from the profile

Page 10

Estimate Completion Time for Single MR Job
•Estimating bounds on the duration of map and reduce stages
•Map stage duration depends on:
▫$N_M$: the number of map tasks
▫$S_M$: the number of map slots
$$T_M^{low} = \frac{N_M \cdot M_{avg}}{S_M}$$
$$T_M^{up} = \frac{(N_M - 1) \cdot M_{avg}}{S_M} + M_{max}$$
•Reduce stage duration depends on:
▫$N_R$: the number of reduce tasks
▫$S_R$: the number of reduce slots
•Job duration $T_J^{low}$, $T_J^{up}$, $T_J^{avg}$
▫Sum of the map and reduce stage durations
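The stage bounds combine into a job-level estimate as sketched below. This is a simplified illustration under stated assumptions, not the authors' model: the shuffle and reduce phases are each bounded with the same formula as the map stage and then added, and $T_J^{avg}$ is taken as the midpoint of the two bounds. The `JobProfile` class and all profile numbers are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class JobProfile:
    """Hypothetical container for the profile statistics (seconds)."""
    m_avg: float    # average map task duration
    m_max: float    # maximum map task duration
    sh_avg: float   # average shuffle duration
    sh_max: float   # maximum shuffle duration
    r_avg: float    # average reduce task duration
    r_max: float    # maximum reduce task duration

def stage_bounds(n_tasks, n_slots, avg, mx):
    """Lower and upper bound on one stage's duration."""
    low = n_tasks * avg / n_slots
    up = (n_tasks - 1) * avg / n_slots + mx
    return low, up

def job_bounds(p, n_m, n_r, s_m, s_r):
    """Sum per-stage bounds into (low, up, avg) for the whole job."""
    m_low, m_up = stage_bounds(n_m, s_m, p.m_avg, p.m_max)
    sh_low, sh_up = stage_bounds(n_r, s_r, p.sh_avg, p.sh_max)
    r_low, r_up = stage_bounds(n_r, s_r, p.r_avg, p.r_max)
    low = m_low + sh_low + r_low
    up = m_up + sh_up + r_up
    return low, up, (low + up) / 2   # midpoint used as the average estimate

profile = JobProfile(m_avg=10, m_max=20, sh_avg=5, sh_max=8, r_avg=4, r_max=6)
low, up, avg = job_bounds(profile, n_m=128, n_r=64, s_m=64, s_r=32)
print(low, up, avg)
```

Because the profile is independent of the slot counts, the same profile can be re-evaluated for any candidate ($S_M$, $S_R$) pair.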

Page 11

Resource Allocation for Single MR Job
•Given a deadline D and the job profile, find the minimal resources needed to complete the job within D
▫Inputs: the given number of map/reduce tasks and the statistics from the job profile
▫Find the values of $S_M^J$ and $S_R^J$ that minimize $S_M^J + S_R^J$, using Lagrange multipliers
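The minimization has a closed form if the deadline constraint is assumed to be $A/S_M^J + B/S_R^J + C = D$, the same shape as the per-job equations on the Pig-program allocation slide. Here A, B, and C stand for coefficients derived from the job profile and the task counts; their exact definitions are not given in this transcript. Setting the gradient of the Lagrangian to zero yields $S_M^J = \sqrt{A}(\sqrt{A}+\sqrt{B})/(D-C)$ and $S_R^J = \sqrt{B}(\sqrt{A}+\sqrt{B})/(D-C)$. A minimal sketch with made-up coefficients:

```python
import math

def min_slots(a, b, c, deadline):
    """Minimize s_m + s_r subject to a/s_m + b/s_r + c = deadline.

    Closed form obtained with a Lagrange multiplier; a, b, c are
    hypothetical profile-derived coefficients.
    """
    budget = deadline - c
    if budget <= 0:
        raise ValueError("deadline is not achievable: D must exceed C")
    root_sum = math.sqrt(a) + math.sqrt(b)
    s_m = math.sqrt(a) * root_sum / budget
    s_r = math.sqrt(b) * root_sum / budget
    return s_m, s_r

s_m, s_r = min_slots(a=1600.0, b=400.0, c=20.0, deadline=120.0)
# Slots are whole units, so a real allocation would round up.
print(math.ceil(s_m), math.ceil(s_r))
```

Rounding up keeps the estimated completion time at or below the deadline, at the cost of at most one extra slot of each type.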

Page 12

Outline
•Introduction
•Building block
▫Performance model for single MapReduce jobs
•Resource allocation for Pig programs
•Evaluation
•Conclusion and ongoing work

Page 13

Performance Model for Pig Programs

•Let $P = \{J_1, J_2, \dots, J_N\}$; extract the job profile of each job contained in P
▫Assign a unique name to each job within a program
•The program completion time is the sum of the completion times of all the jobs contained in P
$$T_P = \sum_{1 \le i \le N} T_i$$

Page 14

Resource Allocation for Pig Programs
•Possible strategy: find an appropriate pair of map and reduce slots for each job in the program
$$\frac{A_1}{S_M^{1}} + \frac{B_1}{S_R^{1}} + C_1 = d_1$$
$$\frac{A_2}{S_M^{2}} + \frac{B_2}{S_R^{2}} + C_2 = d_2$$
$$\dots$$
$$\frac{A_N}{S_M^{N}} + \frac{B_N}{S_R^{N}} + C_N = d_N$$
with
$$\sum_{1 \le i \le N} d_i = D$$
•Problem: difficult to implement and manage by the scheduler

Page 15

Resource Allocation for Pig Programs

•A simpler and more elegant solution
▫Allocate the same set of resources to the entire program instead of to each job
•Rewrite the previous equations into
$$T_P = \frac{\sum_{1 \le i \le N} A_i}{S_M^P} + \frac{\sum_{1 \le i \le N} B_i}{S_R^P} + \sum_{1 \le i \le N} C_i = D$$
Find the minimum set of map and reduce slots ($S_M^P$, $S_R^P$) for the entire Pig program
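Because the program-level equation has the same shape as the per-job one, with the coefficients summed over all jobs, a single closed-form solve suffices. The sketch below illustrates this under assumptions: the per-job triples $(A_i, B_i, C_i)$ are hypothetical numbers, the constraint is treated as an equality, and the function names are illustrative rather than taken from the authors' implementation.

```python
import math

def program_slots(job_coeffs, deadline):
    """Return integer (s_m, s_r) for a whole Pig program.

    job_coeffs is a list of hypothetical (A_i, B_i, C_i) triples, one per job.
    """
    a = sum(j[0] for j in job_coeffs)
    b = sum(j[1] for j in job_coeffs)
    c = sum(j[2] for j in job_coeffs)
    budget = deadline - c
    if budget <= 0:
        raise ValueError("deadline is not achievable: D must exceed sum of C_i")
    root_sum = math.sqrt(a) + math.sqrt(b)
    s_m = math.sqrt(a) * root_sum / budget
    s_r = math.sqrt(b) * root_sum / budget
    return math.ceil(s_m), math.ceil(s_r)   # round up to whole slots

def program_time(job_coeffs, s_m, s_r):
    """Estimated completion time when every job runs with the same slots."""
    return sum(a / s_m + b / s_r + c for a, b, c in job_coeffs)

jobs = [(900.0, 100.0, 10.0), (500.0, 200.0, 15.0), (200.0, 100.0, 5.0)]
deadline = 130.0
s_m, s_r = program_slots(jobs, deadline)
estimate = program_time(jobs, s_m, s_r)
print(s_m, s_r, round(estimate, 2))
```

One allocation for the whole program means the scheduler manages a single (map, reduce) slot pair instead of N pairs, which is the simplification this slide argues for.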

Page 16

Experiment Setup
•66-node cluster in 2 racks
▫4 AMD 2.39GHz cores
▫8 GB RAM
▫Two 160GB hard disks
•Configuration
▫1 jobtracker, 1 namenode, 64 worker nodes
▫2 map slots and 1 reduce slot for each node

Page 17

Benchmark
•PigMix benchmark
▫17 programs
▫8 tables as the input data
•Dataset
▫Test dataset: generated with the PigMix data generator, total size around 1TB
▫Experimental dataset: same layout as the test dataset, 20% larger in size

Page 18

Model Accuracy
•How well does our performance model capture Pig program completion time?
[Figure: Normalized results for predicted and measured completion time]

Page 19

Meeting Deadlines
•Are we meeting deadlines with our resource allocation model?
[Figure: PigMix executed on the experimental data set: do we meet deadlines?]

Page 20

Conclusion
•Conclusion
▫The performance model can accurately estimate the completion time of MapReduce workflows
▫Enables automatic resource provisioning for MapReduce workflows with deadlines
•Ongoing work
▫Refine the performance model for workflows with concurrent jobs
▫Incorporate failure scenarios in the current model

Page 21

Thank you