Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers (I), UC BERKELEY, Anthony D. Joseph, LASER Summer School, September 2013


Page 1: Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers (I), UC BERKELEY (…laser.inf.ethz.ch/2013/material/joseph/LASER-Joseph-3.pdf)

Mesos: A Platform for Fine-Grained Resource Sharing in Data Centers (I)

UC  BERKELEY  

Anthony D. Joseph

LASER Summer School September 2013

Page 2

My Talks at LASER 2013

1.  AMP Lab introduction

2.  The Datacenter Needs an Operating System

3.  Mesos, part one

4.  Dominant Resource Fairness

5.  Mesos, part two

6.  Spark

Page 3

Collaborators

•  Matei Zaharia

•  Benjamin Hindman

•  Andy Konwinski

•  Ali Ghodsi

•  Randy Katz

•  Scott Shenker

•  Ion Stoica

Page 4

Modern Data Center Paradigm

Commodity machines (100s to 10,000s of machines)
» Attached storage devices

Data distributed and replicated across nodes
» Data locality to computation matters

Solution: Use a datacenter computing framework
» Divide jobs into smaller tasks, so that jobs can take turns accessing each node and ideally locally accessing data
» Tasks are fine-grained in both time (short) and space (use a fraction of a machine)

Page 5

Datacenter Scheduling Problem

Rapid innovation in datacenter computing frameworks

[Figure: framework names shown on the slide: Pig, Dryad, Pregel, Percolator, CIEL]

No single framework optimal for all applications

Want to run multiple frameworks in a single datacenter
» …to maximize utilization
» …to share data between frameworks

Page 6

Where We Want to Go

[Figure: Hadoop, Pregel, and MPI clusters. Panels: "Today: static partitioning" and "Dynamic sharing" (shared cluster)]

Page 7

Solution: Apache Mesos

[Figure: Hadoop and Pregel each running on their own nodes, next to Hadoop, Pregel, … running on Mesos over a shared set of nodes]

Mesos is a common resource sharing layer over which diverse frameworks can run

Run multiple instances of the same framework
» Isolate production and experimental jobs
» Run multiple versions of the framework concurrently

Build specialized frameworks targeting particular problem domains
» Better performance than general-purpose abstractions

http://mesos.apache.org/

Page 8

Mesos Goals

High utilization of resources

Support diverse frameworks (current & future)

Scalability to 10,000’s of nodes

Reliability in face of failures

Resulting design: Small microkernel-like core that pushes scheduling logic to frameworks

Page 9

Previous Approaches

Locality less important (HPC & grid computing)
» Expensive, dedicated storage (SANs, parallel FS)
» Expensive, high-speed networks (InfiniBand)

Fine-grained task model infeasible
» Ad-hoc programs (many barriers, tight message passing)
» Legacy programs (Fortran77)

Approach taken: coarse-grained sharing
» Job specifies number of machines and amount of time needed
» Scheduler queues job and allocates all machines at the same time

Page 10

Mesos Design Elements

Fine-grained sharing:
» Allocation at the level of tasks within a job
» Improves utilization, latency, and data locality

Resource offers:
» Simple, scalable application-controlled scheduling mechanism

Page 11

Element 1: Fine-Grained Sharing

[Figure: Framework 1, Framework 2, and Framework 3 running over a storage system (e.g. HDFS). Panels: "Coarse-Grained Sharing (HPC)" and "Fine-Grained Sharing (Mesos)", the latter with tasks of Fw. 1, Fw. 2, and Fw. 3 interleaved across nodes]

+ Improved utilization, responsiveness, data locality

Page 12

Element 2: Resource Offers

Option: Global scheduler
» Frameworks express needs in a specification language, global scheduler matches them to resources

+ Can make optimal decisions
– Complex: language must support all framework needs
– Difficult to scale and to make robust
– Future frameworks may have unanticipated needs

Page 13

Element 2: Resource Offers

Mesos: Resource offers
» Offer available resources to frameworks, let them pick which resources to use and which tasks to launch

+ Keeps Mesos simple, lets it support future frameworks
– Decentralized decisions might not be optimal
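A minimal sketch of the resource-offer idea, assuming a toy in-process master: free nodes are offered to frameworks in turn, and each framework decides for itself which tasks to launch. The names `Offer`, `Task`, `Framework`, `on_offer`, and `offer_round` are hypothetical and are not the Mesos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    node: str
    cpus: int
    mem_gb: int


@dataclass(frozen=True)
class Task:
    name: str
    cpus: int
    mem_gb: int


class Framework:
    """Hypothetical framework scheduler: it, not the master, picks tasks."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)

    def on_offer(self, offer):
        """Return the tasks to launch on this offer; an empty list declines it."""
        launched, cpus, mem = [], offer.cpus, offer.mem_gb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_gb <= mem:
                launched.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem -= task.mem_gb
        return launched


def offer_round(offers, frameworks):
    """Toy master loop: offer each node to frameworks in order until one accepts."""
    placements = {}
    for offer in offers:
        for fw in frameworks:
            tasks = fw.on_offer(offer)
            if tasks:
                placements[offer.node] = (fw.name, [t.name for t in tasks])
                break
    return placements


offers = [Offer("m1", 1, 1), Offer("m2", 4, 16)]
hadoop = Framework("hadoop", [Task("map-0", 1, 1)])
mpi = Framework("mpi", [Task("rank-0", 4, 8)])
placements = offer_round(offers, [hadoop, mpi])
print(placements)
```

The master here never inspects task requirements, which is the point of the design: adding a new framework means writing a new `on_offer`, not changing the master. The sketch does not re-offer leftover capacity on a node, one way in which decentralized decisions can fall short of optimal.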

Page 14

Make datacenter a real computer!

[Figure: software stack. Machines each run a node OS (e.g. Linux, Windows), with a datacenter "OS" (e.g., Apache Mesos) layered on top. Existing stack: Hadoop, MPI, Hypertable, Cassandra, Hive. AMP stack: Spark (support interactive and iterative data analysis, e.g., ML algorithms), SCADS (consistency adjustable data store), PIQL (predictive & insightful query language)]

Page 15

Allocation Policies

Mesos controls how many resources each framework can get, but not which resources

Allocation policies are pluggable to suit organization needs
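One way to picture a pluggable policy, as a sketch only: the master holds a function that, given current per-framework usage, names the framework that should receive the next offer. The `Policy` signature, `fair_share`, `make_priority`, and the `Master` class are assumptions made for illustration, not the Mesos allocation module interface.

```python
from typing import Callable, Dict

# Assumed signature: current usage per framework -> framework to offer to next.
Policy = Callable[[Dict[str, float]], str]


def fair_share(usage):
    """Offer to the framework currently holding the least."""
    return min(usage, key=usage.get)


def make_priority(order):
    """Strict priority: always offer to the first registered framework in `order`."""
    def policy(usage):
        return next(name for name in order if name in usage)
    return policy


class Master:
    def __init__(self, policy: Policy):
        self.policy = policy  # chosen per organization
        self.usage: Dict[str, float] = {}

    def register(self, name):
        self.usage[name] = 0.0

    def offer(self, amount):
        """Pick a framework via the policy; assume it accepts the whole offer."""
        chosen = self.policy(self.usage)
        self.usage[chosen] += amount
        return chosen


fair = Master(fair_share)
strict = Master(make_priority(["mpi", "hadoop"]))
for master in (fair, strict):
    master.register("hadoop")
    master.register("mpi")

fair_sequence = [fair.offer(1.0) for _ in range(4)]
strict_sequence = [strict.offer(1.0) for _ in range(4)]
print(fair_sequence, strict_sequence)
```

The policy only decides how much each framework is offered; which of the offered resources a framework actually uses remains the framework's decision.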

Page 16

Example: Hierarchical Fair Sharing

[Figure: share tree rooted at Facebook.com with Spam and Ads departments, User 1 and User 2, and Jobs 1 to 4. Cluster share policy: 20% / 80% for Spam Dept. / Ads Dept., and a 70% / 30% split between users. Plot: cluster utilization (0% to 100%) versus time (0 to 3), with the shares shown at successive "Curr Time" markers]
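As a worked illustration of how a hierarchical policy turns into cluster shares, the sketch below multiplies weights down a tree. The tree layout and the assignment of the 70/30 split to users under the ads department are assumptions for the example, not values read off the slide. It computes static shares only and does not model redistributing capacity that a node leaves unused.

```python
def leaf_shares(tree, share=1.0, prefix=""):
    """Multiply weights down the tree to get each leaf's share of the cluster.

    `tree` maps a name to (weight, children), where children is another tree
    or an empty dict for a leaf.
    """
    shares = {}
    total = sum(weight for weight, _ in tree.values())
    for name, (weight, children) in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        node_share = share * weight / total
        if children:
            shares.update(leaf_shares(children, node_share, path))
        else:
            shares[path] = node_share
    return shares


# Hypothetical policy: departments split 20/80, users within ads split 70/30.
policy = {
    "spam": (20, {}),
    "ads": (80, {"user1": (70, {}), "user2": (30, {})}),
}

shares = leaf_shares(policy)
for path, value in shares.items():
    print(f"{path}: {value:.0%}")
```

Under these assumed weights, a user's share is the product of the weights on the path from the root, for example 0.8 × 0.7 for `ads/user1`.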

Page 17

Mesos Architecture

[Figure: Mesos master (with allocation module) sits between the framework schedulers (Hadoop JobTracker, MPI scheduler) and slaves 1 to 3, which run Hadoop and MPI executors with their tasks. Labelled messages: status, resource offer, "Launch Hadoop task 2"]

Slaves send status updates about available resources

Pluggable policy picks which framework to offer resources to

Framework scheduler selects resources and provides tasks

Framework executors run tasks and may persist across tasks

Page 18

Resource Offer Details

A resource offer is a set of machine-resource tuples
» { [m1, 1 CPU, 1GB], [m2, 4 CPU, 16GB] }

Resource offers count towards a framework's share
» Rescinded after a timeout (incentive to reply fast)

Optimizations
» Frameworks indicate interest to get offers
» Frameworks can set filters to automatically filter out certain nodes or nodes with too few resources
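The tuple form of an offer and the filter optimization can be sketched as follows. `Resource` and `apply_filter`, along with their parameters, are hypothetical names chosen for this example; only the two example tuples come from the slide.

```python
from typing import NamedTuple


class Resource(NamedTuple):
    machine: str
    cpus: int
    mem_gb: int


# The example offer from the slide: { [m1, 1 CPU, 1GB], [m2, 4 CPU, 16GB] }
offer = [Resource("m1", 1, 1), Resource("m2", 4, 16)]


def apply_filter(offer, min_cpus=0, min_mem_gb=0, excluded=frozenset()):
    """Drop tuples a framework has said it never wants to be offered."""
    return [
        r for r in offer
        if r.machine not in excluded
        and r.cpus >= min_cpus
        and r.mem_gb >= min_mem_gb
    ]


big_only = apply_filter(offer, min_cpus=2, min_mem_gb=4)
without_m2 = apply_filter(offer, excluded={"m2"})
print(big_only)
print(without_m2)
```

Applying such a filter before an offer is sent saves a round trip in which the framework would only decline.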

Page 19

Dynamic Resource Sharing

[Figure: cluster utilization (0 to 1) versus time (1 to 1001 seconds) for Torque, Hadoop Instance 1, Hadoop Instance 2, and Hadoop Instance 3]

Page 20

Which Offers to Accept?

Delay scheduling
» Initially only accept preferred (e.g., local) resources
» Accept any resource after timeout (1-5 seconds)

Can achieve near optimal locality
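The accept rule behind delay scheduling can be sketched in a few lines. `should_accept` and `place` are hypothetical helpers, and offers are modelled as (arrival time in seconds, node) pairs, which is an assumption of this example rather than anything stated in the talk.

```python
def should_accept(offer_node, preferred_nodes, waited_s, timeout_s=5.0):
    """Accept a preferred (local) node at once; accept any node after the timeout."""
    if offer_node in preferred_nodes:
        return True
    return waited_s >= timeout_s


def place(preferred_nodes, offers, timeout_s=5.0):
    """Scan offers in arrival order; return (time_s, node, is_local) or None.

    `offers` is a list of (time_s, node) pairs, with time measured from the
    moment the task became ready to run.
    """
    for time_s, node in offers:
        if should_accept(node, preferred_nodes, time_s, timeout_s):
            return time_s, node, node in preferred_nodes
    return None


# The task's input data lives on m3.
local = place({"m3"}, [(0.5, "m1"), (2.0, "m3")])
fallback = place({"m3"}, [(0.5, "m1"), (6.0, "m2")])
no_delay = place({"m3"}, [(0.5, "m1"), (2.0, "m3")], timeout_s=0.0)
print(local, fallback, no_delay)
```

With a zero timeout the task takes the first offer and loses locality; with a few seconds of patience it waits for the local node, which is the trade-off the timeout controls.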

Page 21

Multiple Hadoops Experiment

[Figure: Hadoop 1, Hadoop 2, and Hadoop 3 instances running over a storage system (e.g. HDFS), shown with the same two-panel node layout as the Element 1 sharing diagram]

Page 22

Data Locality on Mesos

16 Hadoop MapReduce instances over shared file system

[Figure: two bar charts comparing static partitioning, Mesos with no delay scheduling, Mesos with 1s delay scheduling, and Mesos with 5s delay scheduling. First chart: local map tasks (%), axis 0% to 100%. Second chart: job running time (s), axis 0 to 600]

Page 23

Some Related Datacenter Resource Managers

Hadoop YARN
» Open-source follow-on to Hadoop with pluggable allocation policies
» Primary focus is Hadoop jobs

Google’s Omega resource manager
» Closed-source follow-on to original resource manager
» Framework-specific schedulers use optimistic concurrency model: all compete simultaneously to select resources

Page 24

In Mesos Part II Lecture

Implementation Details and Supported Frameworks

Isolation

Handling Mesos Master Failure

Resource Revocation

Scalability

Results and Macrobenchmarks

Page 25

Summary (Part One)

Mesos is a platform for sharing data centers among diverse cluster computing frameworks
» Enables efficient fine-grained sharing
» Gives frameworks control over scheduling
» Supports current and future frameworks
» Achieves high utilization

Page 26

My Talks at LASER 2013

1.  AMP Lab introduction

2.  The Datacenter Needs an Operating System

3.  Mesos, part one

4.  Dominant Resource Fairness

5.  Mesos, part two

6.  Spark