Download - Datacenter Management with Apache Mesos

Transcript
Page 1: Datacenter Management with Apache  Mesos

Benjamin Hindman – @benh

Datacenter Management with Apache Mesosmesos.apache.org@ApacheMesos

Page 2: Datacenter Management with Apache  Mesos

I’ve got tons of data ...

Page 3: Datacenter Management with Apache  Mesos

… more everyday!

Page 4: Datacenter Management with Apache  Mesos

That must be why they call it a datacenter.

Page 5: Datacenter Management with Apache  Mesos

I’d love to answer some questions with the help

of my data!

Page 6: Datacenter Management with Apache  Mesos

I think I’ll try Hadoop.

Page 7: Datacenter Management with Apache  Mesos

your datacenter

Page 8: Datacenter Management with Apache  Mesos

+ Hadoop

Page 9: Datacenter Management with Apache  Mesos

happy?

Page 10: Datacenter Management with Apache  Mesos

Not exactly …

Page 11: Datacenter Management with Apache  Mesos

… Hadoop is a big hammer, but not

everything is a nail!

Page 12: Datacenter Management with Apache  Mesos

I’ve got some iterative algorithms, I want to try

Spark!

Page 13: Datacenter Management with Apache  Mesos

datacenter management

Page 14: Datacenter Management with Apache  Mesos

datacenter management

Page 15: Datacenter Management with Apache  Mesos

datacenter management

Page 16: Datacenter Management with Apache  Mesos

static partitioning

Page 17: Datacenter Management with Apache  Mesos

Oh noes! Spark wants to read and write data to

HDFS!

Page 18: Datacenter Management with Apache  Mesos

Hadoop …

(map/reduce)

(distributed file system)

Page 19: Datacenter Management with Apache  Mesos

HDFS

Page 20: Datacenter Management with Apache  Mesos

HDFS

Page 21: Datacenter Management with Apache  Mesos

Could we just give Spark it’s own HDFS cluster

too?

Page 22: Datacenter Management with Apache  Mesos

HDFS

Page 23: Datacenter Management with Apache  Mesos

HDFS

Page 24: Datacenter Management with Apache  Mesos

HDFS

Page 25: Datacenter Management with Apache  Mesos

HDFS tee incoming data(2 copies)

Page 26: Datacenter Management with Apache  Mesos

HDFS tee incoming data(2 copies)

periodic copy/sync

Page 27: Datacenter Management with Apache  Mesos

That sounds annoying … let’s not do that. Can we do any better though?

Page 28: Datacenter Management with Apache  Mesos

HDFS

Page 29: Datacenter Management with Apache  Mesos

HDFS

Page 30: Datacenter Management with Apache  Mesos

HDFS

Page 31: Datacenter Management with Apache  Mesos

happy now?

Page 32: Datacenter Management with Apache  Mesos

No! We’ve decided to start doing real time

computation with Storm …

Page 33: Datacenter Management with Apache  Mesos

datacenter management

Page 34: Datacenter Management with Apache  Mesos

datacenter management

Page 35: Datacenter Management with Apache  Mesos

happy now!?

Page 36: Datacenter Management with Apache  Mesos

Not really … during the day I’d rather give more machines to Spark but at

night I’d rather give more machines to

Hadoop!

Page 37: Datacenter Management with Apache  Mesos

datacenter management

Page 38: Datacenter Management with Apache  Mesos

datacenter management

Page 39: Datacenter Management with Apache  Mesos

datacenter management

Page 40: Datacenter Management with Apache  Mesos

datacenter management

Page 41: Datacenter Management with Apache  Mesos
Page 42: Datacenter Management with Apache  Mesos

And failures require more datacenter management!

Page 43: Datacenter Management with Apache  Mesos

datacenter management

Page 44: Datacenter Management with Apache  Mesos

datacenter management

Page 45: Datacenter Management with Apache  Mesos

datacenter management

Page 46: Datacenter Management with Apache  Mesos
Page 47: Datacenter Management with Apache  Mesos
Page 48: Datacenter Management with Apache  Mesos
Page 49: Datacenter Management with Apache  Mesos
Page 50: Datacenter Management with Apache  Mesos

I don’t want to deal with this!

Page 51: Datacenter Management with Apache  Mesos

the datacenter …rather than think about the datacenter like this …

Page 52: Datacenter Management with Apache  Mesos

… is a computerthink about it like this …

Page 53: Datacenter Management with Apache  Mesos

datacenter computer

applications

resources

filesystem

Page 54: Datacenter Management with Apache  Mesos

mesosapplications

resources

filesystem

kernel

Page 55: Datacenter Management with Apache  Mesos

Okay, so how does it work?

Page 56: Datacenter Management with Apache  Mesos

Step 1: HDFS

Page 57: Datacenter Management with Apache  Mesos

Step 2: Mesosrun a “master” (or multiple for high availability)

Page 58: Datacenter Management with Apache  Mesos

Step 2: Mesosrun “slaves” on the rest of the machines

Page 59: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 60: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 61: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 62: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 63: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 64: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 65: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 66: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 67: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 68: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 69: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 70: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 71: Datacenter Management with Apache  Mesos

Step 3: Frameworks

Page 72: Datacenter Management with Apache  Mesos

tep 4: Profit$

Page 73: Datacenter Management with Apache  Mesos

tep 4: Profit (utilize)$

just one big pool of resources,utilize single machines more fully!

Page 74: Datacenter Management with Apache  Mesos

tep 4: Profit (utilize)$

Page 75: Datacenter Management with Apache  Mesos

tep 4: Profit (utilize)$

Page 76: Datacenter Management with Apache  Mesos

tep 4: Profit (utilize)$

Page 77: Datacenter Management with Apache  Mesos

tep 4: Profit (utilize)$

Page 78: Datacenter Management with Apache  Mesos

tep 4: Profit (utilize)$

Page 79: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

Page 80: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

Page 81: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

Page 82: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

Page 83: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

Page 84: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

reduces CapEx and OpEx!

Page 85: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

reduces latency!

Page 86: Datacenter Management with Apache  Mesos

tep 4: Profit(statistical multiplexing)$

Page 87: Datacenter Management with Apache  Mesos

tep 4: Profit (failures)$

Page 88: Datacenter Management with Apache  Mesos

tep 4: Profit (failures)$

Page 89: Datacenter Management with Apache  Mesos

tep 4: Profit (failures)$

Page 90: Datacenter Management with Apache  Mesos

This sounds pretty good!

Page 91: Datacenter Management with Apache  Mesos

Other than Hadoop, Spark, and Storm, what

else can I run on Mesos?

Page 92: Datacenter Management with Apache  Mesos

frameworks• Hadoop (github.com/mesos/hadoop)• Spark (github.com/mesos/spark)• DPark (github.com/douban/dpark)• Storm (github.com/nathanmarz/storm)• Chronos (github.com/airbnb/chronos)• MPICH2 (in mesos git repository)• Aurora (proposed for Apache incubator)

Page 93: Datacenter Management with Apache  Mesos

What about XYZ?

Page 94: Datacenter Management with Apache  Mesos

port an existing frameworkstrategy: write a “wrapper” which launches existing components on mesos~100 lines of code to write a wrapper (the more lines, the more you can take advantage of elasticity or other mesos features)see src/examples/ in mesos repository

Page 95: Datacenter Management with Apache  Mesos

write a new framework!as a “kernel”, mesos provides a lot of primitives that make writing a new framework relatively easyprimitives: extracted commonality across existing distributed systems/frameworks (launching tasks, doing failure detection, etc) … why re-implement them each time!?

Page 96: Datacenter Management with Apache  Mesos

case study: chronosdistributed cron with dependencies

developed at airbnb~3k lines of Scala!distributed, highly available, and fault tolerant without any network programming!http://github.com/airbnb/chronos

Page 97: Datacenter Management with Apache  Mesos

Hmm … if Mesos gives me a datacenter

computer … can I run stuff other than

analytics?

Page 98: Datacenter Management with Apache  Mesos

case study: aurorarun N instances of my server, somewhere, forever

(where server == arbitrary command line)developed at Twitterruns hundreds of production services, including ads!recently proposed for Apache Incubator!

Page 99: Datacenter Management with Apache  Mesos

aurora

Page 100: Datacenter Management with Apache  Mesos

aurora

Page 101: Datacenter Management with Apache  Mesos

aurora

Page 102: Datacenter Management with Apache  Mesos

aurora

Page 103: Datacenter Management with Apache  Mesos

aurora

Page 104: Datacenter Management with Apache  Mesos

But what about resource isolation!? I don’t want

my end users to have to wait for our website to

load because of resource contention!

Page 105: Datacenter Management with Apache  Mesos

resource isolationLinux control groups (cgroups)

CPU (upper and lower bounds)memorynetwork I/O (traffic controller)filesystem (lvm, in progress)

Page 106: Datacenter Management with Apache  Mesos

conclusionsdatacenter management is a pain

Page 107: Datacenter Management with Apache  Mesos

conclusionsmesos makes running frameworks on your datacenter easier as well as increasing utilization and performance while reducing CapEx and OpEx!

Page 108: Datacenter Management with Apache  Mesos

conclusionsrather than build your next distributed system from scratch, consider using mesos

Page 109: Datacenter Management with Apache  Mesos

conclusionsyou can share your datacenter between analytics and online services!

Page 110: Datacenter Management with Apache  Mesos

Questions?mesos.apache.org@ApacheMesos

Page 111: Datacenter Management with Apache  Mesos

framework commonalityrun processes simultaneously (distributed)handle process failures (fault-tolerance)optimize execution (elasticity, scheduling)

Page 112: Datacenter Management with Apache  Mesos

primitivesscheduler – distributed system “master” or “coordinator”(executor – lower-level control of task execution, optional)requests/offers – resource allocationstasks – “threads” of the distributed system…

Page 113: Datacenter Management with Apache  Mesos

scheduler

ApacheHadoop

Chronos

Page 114: Datacenter Management with Apache  Mesos

scheduler(1) brokers for resources(2) launches tasks(3) handles task termination

Page 115: Datacenter Management with Apache  Mesos

brokering for resources(1) make resource requests 2 CPUs 1 GB RAM slave *(2) respond to resource offers 4 CPUs 4 GB RAM slave foo.bar.com

Page 116: Datacenter Management with Apache  Mesos

offers: non-blocking resource allocationexist to answer the question:“what should mesos do if it can’t satisfy a request?”

(1) wait until it can(2) offer the best allocation it can immediately

Page 117: Datacenter Management with Apache  Mesos

offers: non-blocking resource allocationexist to answer the question:“what should mesos do if it can’t satisfy a request?”

(1) wait until it can(2) offer the best allocation it can immediately

Page 118: Datacenter Management with Apache  Mesos

resource allocation

ApacheHadoop

Chronos

request

Page 119: Datacenter Management with Apache  Mesos

resource allocation

ApacheHadoop

Chronos

request

allocatordominant resource fairnessresource reservations

Page 120: Datacenter Management with Apache  Mesos

resource allocation

ApacheHadoop

Chronos

request

allocatordominant resource fairnessresource reservations

optimisticpessimistic

Page 121: Datacenter Management with Apache  Mesos

resource allocation

ApacheHadoop

Chronos

request

allocatordominant resource fairnessresource reservations

optimisticpessimisticno overlapping offers all overlapping offers

Page 122: Datacenter Management with Apache  Mesos

resource allocation

ApacheHadoop

Chronos

offer

allocatordominant resource fairnessresource reservations

Page 123: Datacenter Management with Apache  Mesos

“two-level scheduling”mesos: controls resource allocations to framework schedulersschedulers: make decisions about what to run given allocated resources

Page 124: Datacenter Management with Apache  Mesos

end-to-end principle“application-specific functions ought to reside in the end hosts of a network rather than intermediary nodes”

Page 125: Datacenter Management with Apache  Mesos

taskseither a concrete command line or an opaque description (which requires a framework executor to execute)

a consumer of resources

Page 126: Datacenter Management with Apache  Mesos

task operationslaunching/killinghealth monitoring/reporting (failure detection)resource usage monitoring (statistics)

Page 127: Datacenter Management with Apache  Mesos

resource isolationcgroup per executor or task (if no executor)

resource controls adjusted dynamically as tasks come and go!

Page 128: Datacenter Management with Apache  Mesos

case study: chronosdistributed cron with dependencies

built at airbnb by @flo

Page 129: Datacenter Management with Apache  Mesos

before chronos

Page 130: Datacenter Management with Apache  Mesos

before chronos

single point of failure (and AWS was unreliable)resource starved (not scalable)

Page 131: Datacenter Management with Apache  Mesos

chronos requirementsfault tolerancedistributed (elastically take advantage of resources)retries (make sure a command eventually finishes)dependencies

Page 132: Datacenter Management with Apache  Mesos

chronosleverages the primitives of mesos

~3k lines of scalahighly available (uses Mesos state)distributed / elasticno actual network programming!

Page 133: Datacenter Management with Apache  Mesos

after chronos

Page 134: Datacenter Management with Apache  Mesos

after chronos + hadoop

Page 135: Datacenter Management with Apache  Mesos

case study: aurora“run 200 of these, somewhere, forever”

built at Twitter

Page 136: Datacenter Management with Apache  Mesos

before aurorastatic partitioning of machines to serviceshardware outages caused site outagespuppet + monitops couldn’t scale as fast as engineers

Page 137: Datacenter Management with Apache  Mesos

aurorahighly available (uses mesos replicated log)uses a python DSL to describe servicesleverages service discovery and proxying (see Twitter commons)

Page 138: Datacenter Management with Apache  Mesos

after aurorapower loss to 19 racks, no lost services!more than 400 engineers running serviceslargest cluster has >2500 machines

Page 139: Datacenter Management with Apache  Mesos

Mesos

MesosNode Node Nod

e Node

Hadoop

Node Node Node Node

Spark

Node Node

MPI Storm

Node

Chronos

Page 140: Datacenter Management with Apache  Mesos

Mesos

MesosNode Node Nod

e Node

Hadoop

Node Node Node Node

Spark

Node Node

MPI

Node

Page 141: Datacenter Management with Apache  Mesos

Mesos

MesosNode Node Nod

e Node

Hadoop

Node Node Node Node

Spark

Node Node

MPI Storm

Node

Page 142: Datacenter Management with Apache  Mesos

Mesos

MesosNode Node Nod

e Node

Hadoop

Node Node Node Node

Spark

Node Node

MPI Storm

Node

Chronos …