The Datacenter Needs an Operating System UC BERKELEY Anthony D. Joseph LASER Summer School September 2013


Page 1: The Datacenter Needs an Operating Systemlaser.inf.ethz.ch/2013/material/joseph/LASER-Joseph-2.pdf · Apache Mesos – Cluster Operating System! Efficiently shares resources among

The Datacenter Needs an Operating System

UC  BERKELEY  

Anthony D. Joseph

LASER Summer School September 2013


My Talks at LASER 2013

1.  AMP Lab introduction

2.  The Datacenter Needs an Operating System

3.  Mesos, part one

4.  Dominant Resource Fairness

5.  Mesos, part two

6.  Spark

Collaborators

•  Matei Zaharia

•  Benjamin Hindman

•  Andy Konwinski

•  Ali Ghodsi

•  Randy Katz

•  Scott Shenker

•  Ion Stoica

Machines: Background

Clusters of commodity servers have become a major computing platform in industry and academia (100’s – 10,000’s of machines)

Driven by data volumes outpacing the processing capabilities of single machines – big data and science

Democratized by cloud computing


Machines: Background

Some have declared that “the datacenter is the new computer”

Our claim: this new computer increasingly needs an operating system

Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host



Why Datacenters need an OS

Growing diversity of applications
» Computing frameworks: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online, Spark
» Storage systems: GFS, BigTable, Dynamo, SCADS
» Web apps and supporting services

Growing diversity of users
» 200+ Hive users at Facebook, running near-interactive ad hoc queries

Same reasons computers needed one!



What Operating Systems Provide

Resource Sharing: time-sharing, virtual memory, …
Data Sharing: files, pipes, IPC, …
Programming Abstractions: libraries, languages
Debugging & Monitoring: ptrace, DTrace, top, …

Most importantly: enables a highly interoperable software ecosystem that we now take for granted


Example

A scientist analyzing data on one machine can pipe it through a variety of tools, write new tools that interface with these through standard APIs, and trace across the stack

In the future, the scientist should be able to launch a cluster on EC2 and do the same things:
» Mix and combine a variety of apps & programming models
» Write new parallel programs that talk to these
» Get a unified interface for managing the cluster
» Debug and trace across all these components


Today’s Datacenter OS

Hadoop MapReduce as common execution and resource sharing platform
» Means jobs have to compile to MapReduce
» Inter-user resource sharing, but at the level of MR jobs

Hadoop InputFormat API for data sharing – what happens with the next hot platform after Hadoop?



Today’s Datacenter OS

Abstractions for productivity programmers, but not for system builders

Difficult to debug, especially across layers

Other examples:
» Amazon/Azure services
» Google internal stack and Google Compute Engine
» Hadoop YARN

The problems motivating a datacenter OS are well recognized, but solutions are narrowly targeted

Can researchers take a longer-term view?


Tomorrow’s Datacenter OS

Resource Sharing: time-sharing, virtual memory, …
Data Sharing: files, pipes, IPC, …
Programming Abstractions: libraries, languages
Debugging & Monitoring: ptrace, DTrace, top, …


Resource Sharing

“To solve these interaction problems we would like to have a computer made simultaneously available to many users in a manner somewhat like a telephone exchange. Each user would be able to use a console at his own pace and without concern for the activity of others using the system.”

– Fernando J. Corbató, 1962


Today’s Resource Sharing

Today, cluster apps are built to run independently and assume they own a fixed set of nodes

Result: inefficient static partitioning

What’s the right interface for dynamic sharing?

[Figure: utilization bar charts for App 1, App 2, and App 3, each on a 0%–33% axis, next to a 0%–100% axis]
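To make the cost of static partitioning concrete, here is a minimal Python sketch. The demand numbers, the 99-node cluster size, and the function names are hypothetical and chosen only for illustration; they are not taken from the figure on this slide.

```python
# Toy comparison of static partitioning and dynamic sharing.
# All numbers and names are illustrative assumptions.

def static_utilization(demands, total_nodes):
    """Each app owns a fixed, equal slice and cannot borrow idle nodes."""
    slice_size = total_nodes // len(demands[0])
    used = 0
    for step in demands:
        used += sum(min(d, slice_size) for d in step)
    return used / (total_nodes * len(demands))


def dynamic_utilization(demands, total_nodes):
    """Apps draw from one shared pool, capped only by the cluster size."""
    used = sum(min(sum(step), total_nodes) for step in demands)
    return used / (total_nodes * len(demands))


# Nodes wanted by (App 1, App 2, App 3) at three points in time.
demands = [
    (60, 10, 5),
    (5, 70, 10),
    (10, 5, 80),
]
TOTAL_NODES = 99

static = static_utilization(demands, TOTAL_NODES)
dynamic = dynamic_utilization(demands, TOTAL_NODES)
print(f"static: {static:.0%}, dynamic: {dynamic:.0%}")
```

In this toy workload each app bursts at a different time, so a fixed one-third slice caps the busy app while the other two slices sit mostly idle. A shared pool serves the same demand with far fewer wasted nodes.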


Tomorrow’s Datacenter OS

Resource sharing:
» Lower-level interfaces for fine-grained sharing – Mesos and Hadoop YARN are first steps in this direction
» Optimization for a variety of metrics (e.g., energy)
» Integration with network scheduling mechanisms (e.g., Seawall [NSDI ’11], NOX, Orchestra)
» Others: Azure Fabric Controller


Tomorrow’s Datacenter OS

Persistent data sharing – many design issues addressed
» Placement/Locality
» Reliability
» Availability
» Consistency
» Bandwidth/Latency
» Software versioning


Tomorrow’s Datacenter OS

Persistent data sharing:
» Standard interfaces for cluster file systems, key-value stores, etc.
» Lineage instead of replication for reliability (Spark RDDs)
» Application frameworks self-manage versioning

Many possibilities:
» Amazon Elastic Block Store and S3
» HDFS
» Azure storage services
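The "lineage instead of replication" bullet can be illustrated with a toy Python sketch. This is not Spark's API: the `Dataset` class, its `map` and `filter` methods, and `lose_cache` are hypothetical names. The only point is that a derived dataset remembers how it was computed from its parent, so a lost in-memory copy can be rebuilt by re-running that computation rather than restored from a replica.

```python
class Dataset:
    """Toy dataset that records its lineage (parent plus transformation)."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # base data, assumed durable
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # function applied to the parent's rows
        self._cache = None           # in-memory copy; may be lost

    def map(self, fn):
        return Dataset(parent=self,
                       transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return Dataset(parent=self,
                       transform=lambda rows: [r for r in rows if pred(r)])

    def lose_cache(self):
        """Simulate a node failure that drops the in-memory copy."""
        self._cache = None

    def collect(self):
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                # Recompute from lineage instead of reading a replica.
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)


base = Dataset(source=range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

before = evens_squared.collect()
evens_squared.lose_cache()        # the cached result is gone
after = evens_squared.collect()   # rebuilt by replaying the map step
```

The trade-off this sketch hints at: recovery costs recomputation time instead of the storage and network cost of keeping extra copies.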


Tomorrow’s Datacenter OS

Transient data sharing – many design issues addressed
» Failures on either side
» Consistency
» Timeliness


Tomorrow’s Datacenter OS

Transient data sharing:
» In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory
  – DFS cache for MapReduce cluster could serve 90% of jobs at Facebook (HotOS ’11)
» Streaming data abstractions (analogous to pipes)

Many possibilities:
» Amazon/Azure message queues
» Percolator
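As a single-machine analogy for "streaming data abstractions (analogous to pipes)", the Python sketch below chains generator stages so that records flow through one at a time, much as they would through a Unix pipe. The stage names and the log format are hypothetical; a datacenter version would also have to handle the failure, consistency, and timeliness issues listed earlier.

```python
def read_events(lines):
    """Source stage: yield one cleaned-up record at a time."""
    for line in lines:
        yield line.strip()


def only_errors(events):
    """Filter stage: pass through records that mention ERROR."""
    for event in events:
        if "ERROR" in event:
            yield event


def count_by_host(events):
    """Sink stage: count records per host (the first field)."""
    counts = {}
    for event in events:
        host = event.split()[0]
        counts[host] = counts.get(host, 0) + 1
    return counts


log = [
    "web1 ERROR timeout\n",
    "web2 INFO ok\n",
    "web1 ERROR disk\n",
    "web3 ERROR oom\n",
]

# Stages compose like `read | filter | count`; no stage holds the full stream.
result = count_by_host(only_errors(read_events(log)))
```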


Tomorrow’s Datacenter OS

Programming abstractions:
» Many new distributed application programming models, abstractions, and languages
» Tools for programming for distributed coordination and fault-tolerance (e.g., Apache Zookeeper)
» New tools that can be used to build the next MapReduce / BigTable in a week (e.g., BOOM)
» Efficient implementations of communication primitives (e.g. shuffle, broadcast)
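For readers unfamiliar with the shuffle primitive named above, here is a minimal in-process Python sketch. It shows only the routing logic: every (key, value) pair produced by any mapper is sent to the reducer chosen by hashing the key. The function names and the use of CRC32 as the hash are assumptions for illustration; a real implementation moves the data over the network and is where the efficiency work lies.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Deterministic hash partitioning (crc32 avoids Python's salted hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from all mappers to one reducer per key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for mapper_output in map_outputs:
        for key, value in mapper_output:
            reducers[partition_for(key, num_reducers)][key].append(value)
    return reducers


# Output of three hypothetical mappers doing a word count.
map_outputs = [
    [("spark", 1), ("mesos", 1)],
    [("spark", 1), ("hadoop", 1)],
    [("mesos", 1), ("spark", 1)],
]
reducers = shuffle(map_outputs, num_reducers=2)

# Each reducer can now aggregate its keys independently.
word_counts = {}
for groups in reducers:
    for key, values in groups.items():
        word_counts[key] = sum(values)
```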


Tomorrow’s Datacenter OS

Debugging and Monitoring facilities:
» Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper, Magpie, Hystrix)
» Replay debugging that takes advantage of limited languages / computational models
» Unified monitoring infrastructure and APIs (e.g., Hystrix)
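One common way to trace across layers is to pass a trace identifier along with each request and have every layer record a span that points at its parent. The Python sketch below illustrates that idea only; it is not the API of X-Trace, Dapper, Magpie, or Hystrix, and every name in it (`record`, `TRACE_LOG`, the three layer functions) is hypothetical.

```python
import itertools

_span_ids = itertools.count(1)
TRACE_LOG = []  # stand-in for a cluster-wide trace collector


def record(trace_id, parent_span, layer, operation):
    """Log one span and return its id so callees can point back to it."""
    span_id = next(_span_ids)
    TRACE_LOG.append({
        "trace": trace_id,
        "span": span_id,
        "parent": parent_span,
        "layer": layer,
        "op": operation,
    })
    return span_id


def storage_read(trace_id, parent_span, block):
    record(trace_id, parent_span, "storage", f"read {block}")
    return f"data:{block}"


def run_task(trace_id, parent_span, block):
    span = record(trace_id, parent_span, "framework", f"task on {block}")
    return storage_read(trace_id, span, block)


def handle_query(trace_id, blocks):
    root = record(trace_id, None, "app", "query")
    return [run_task(trace_id, root, b) for b in blocks]


results = handle_query("trace-42", ["b0", "b1"])
layers_seen = [entry["layer"] for entry in TRACE_LOG]
```

Because every span carries the same trace id and a parent pointer, the collector can rebuild the full call tree for one query even though the work crossed the application, framework, and storage layers.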


Putting it Together

A successful datacenter OS might let users:
» Build a Hadoop-like software stack in a week using the OS’s APIs, while gaining other benefits (e.g. cross-stack replay debugging)
» Share data efficiently between independently written apps and programming frameworks
» Understand cluster behavior without having to log into individual nodes
» Dynamically share the cluster with other users


How Researchers can Help

Focus on paradigms, not performance
» Industry is tackling performance but lacks the luxury to take a long-term view towards abstractions

Explore clean-slate approaches
» Likelier to have greater impact here than in a “real” OS because datacenter software changes quickly!

Bring cluster computing to non-experts
» Most impactful (datacenter as the new workstation)
» Much harder and more rewarding than serving big users


Berkeley Data Analytics Stack

[Figure: stack diagram. Components shown: Spark Streaming, Shark (SQL), BlinkDB, GraphX, MLBase; Apache Spark; HDFS / Hadoop Storage / Tachyon; Apache Mesos / YARN Resource Manager]


Apache Mesos – Cluster Operating System

Efficiently shares resources among diverse parallel applications

[Figure: Mesos architecture with a Mesos master, three Mesos slaves, framework schedulers (Hadoop, Dryad, MPI), and framework executors (Hadoop, Dryad, MPI) running tasks on the slaves]

[Figure: share of cluster (0%–100%) over time (1–331 s) for MPI, Hadoop, and Spark]
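The roles named on this slide (a master, slaves, and per-framework schedulers) can be tied together with a toy Python sketch of resource offers, the mechanism Mesos uses to share a cluster: the master offers free resources to each framework's scheduler, which decides how much to accept. The sketch is a heavy simplification and does not use the Mesos API. The class names, the CPU-only resource model, and the fixed offer order are assumptions made for illustration.

```python
class Framework:
    """Toy framework scheduler: takes offered CPUs until its tasks are placed."""

    def __init__(self, name, pending_tasks, cpus_per_task):
        self.name = name
        self.pending_tasks = pending_tasks
        self.cpus_per_task = cpus_per_task

    def resource_offer(self, slave, free_cpus):
        """Return how many tasks to launch on this slave (0 declines)."""
        launch = min(self.pending_tasks, free_cpus // self.cpus_per_task)
        self.pending_tasks -= launch
        return launch


class Master:
    """Toy master: tracks free CPUs per slave and offers them in turn."""

    def __init__(self, slaves):
        self.free = dict(slaves)   # slave name -> free CPUs
        self.placements = []       # (framework, slave, tasks launched)

    def offer_round(self, frameworks):
        for slave in self.free:
            for fw in frameworks:
                launched = fw.resource_offer(slave, self.free[slave])
                if launched:
                    self.free[slave] -= launched * fw.cpus_per_task
                    self.placements.append((fw.name, slave, launched))


master = Master({"slave1": 4, "slave2": 4, "slave3": 4})
frameworks = [
    Framework("Hadoop", pending_tasks=3, cpus_per_task=1),
    Framework("MPI", pending_tasks=2, cpus_per_task=2),
    Framework("Dryad", pending_tasks=2, cpus_per_task=3),
]
master.offer_round(frameworks)
```

The design point worth noticing is that the master never needs to understand a framework's scheduling policy. It only decides who gets offered what, and each framework decides what to do with the offer.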


Machines: Make datacenter a real computer!

[Figure: stack diagram. Node OSes (e.g. Linux, Windows, Linux) at the bottom; Datacenter “OS” (e.g., Apache Mesos) above them; legend: AMP stack, Existing stack]

•  Share datacenter between multiple cluster computing apps
•  Provide new abstractions and services


Machines: Make datacenter a real computer!

[Figure: stack diagram. Node OSes (e.g. Linux, Windows, Linux) at the bottom; Datacenter “OS” (e.g., Apache Mesos) above them; Hadoop, MPI, Hypertable, Cassandra, and Hive on top; legend: AMP stack, Existing stack]

Support existing cluster computing apps


Machines: Make datacenter a real computer!

[Figure: stack diagram. Node OSes (e.g. Linux, Windows, Linux) at the bottom; Datacenter “OS” (e.g., Apache Mesos) above them; Hadoop, MPI, Hypertable, Cassandra, Hive, Spark, SCADS, and PIQL on top; legend: AMP stack, Existing stack]

Spark: Support interactive and iterative data analysis (e.g., ML algorithms)
SCADS: Consistency adjustable data store
PIQL: Predictive & insightful query language


Machines: Make datacenter a real computer!

[Figure: stack diagram. Node OSes (e.g. Linux, Windows, Linux) at the bottom; Datacenter “OS” (e.g., Apache Mesos) above them; Hadoop, MPI, Hypertable, Cassandra, Hive, Spark, SCADS, and PIQL next; Applications, tools on top; legend: AMP stack, Existing stack]

Applications, tools:
•  Advanced ML algorithms
•  Interactive data mining
•  Collaborative visualization


Milestones

2010: Mesos in Apache incubator

2010: Spark open sourced

2012: Shark (SQL) open sourced

Feb 2013: Spark Streaming alpha open sourced

Mar 2013: Tachyon alpha open sourced

Jun 2013: Spark entered Apache Incubator

Aug 2013: Machine Learning library for Spark

BDAS Users (partial list)


BDAS Buzz


Big Data Landscape – Our Corner


MLbase Meet Up at Twitter (13 Aug 2013)


BDAS Contributors

70+ public contributors on GitHub
» US, China, India, UK, Canada, Vietnam
» Startups and large multinationals: Intel, Yahoo, Ooyala, Quantifind, ClearStory, Palantir, Foursquare, Groupon …


Researchers Using BDAS

UC Berkeley

IBM Almaden

Cornell

Duke

Tsinghua

Purdue


What is fueling the traction?

Superior technologies :)
» Fast and expressive
» It works!

Integration with existing Hadoop ecosystem
» HDFS
» HBase
» Hive


BDAS Future Directions

Future data analytics need to support
» Fast SQL
» Approximate queries
» Machine learning
» GraphX
» Streaming
» Crowdsourcing!!!

Mix and match all of the above

http://ampcamp.berkeley.edu/3/

Conclusion

Datacenters need an OS-like software stack for the same reasons as single computers: manageability, efficiency, programmability, and a thriving software ecosystem

Multiple datacenter OSes are already emerging in ad hoc ways

Researchers can help by taking a long-term systems view towards these problems


My Talks at LASER 2013

1.  AMP Lab introduction

2.  The Datacenter Needs an Operating System

3.  Mesos, part one

4.  Dominant Resource Fairness

5.  Mesos, part two

6.  Spark