Benjamin Hindman – @benh: Apache Mesos Design Decisions (mesos.apache.org, @ApacheMesos)


Transcript of the slides:

  • Slide 1
  • Benjamin Hindman @benh Apache Mesos Design Decisions mesos.apache.org @ApacheMesos
  • Slide 2
  • this is not a talk about YARN
  • Slide 3
  • at least not explicitly!
  • Slide 4
  • this talk is about Mesos!
  • Slide 5
  • a little history: Mesos started as a research project at Berkeley in early 2009 by Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica
  • Slide 6
  • our motivation increase performance and utilization of clusters
  • Slide 7
  • our intuition static partitioning considered harmful
  • Slide 8
  • static partitioning considered harmful (diagram: a datacenter)
  • Slide 9
  • static partitioning considered harmful
  • Slide 10
  • Slide 11
  • Slide 12
  • faster!
  • Slide 13
  • higher utilization! static partitioning considered harmful
  • Slide 14
  • our intuition build new frameworks
  • Slide 15
  • Map/Reduce is a big hammer, but not everything is a nail!
  • Slide 16
  • Apache Mesos is a distributed system for running and building other distributed systems
  • Slide 17
  • Mesos is a cluster manager
  • Slide 18
  • Mesos is a resource manager
  • Slide 19
  • Mesos is a resource negotiator
  • Slide 20
  • Mesos replaces static partitioning of resources to frameworks with dynamic resource allocation
  • Slide 21
  • Mesos is a distributed system with a master/slave architecture
  • Slide 22
  • frameworks register with the Mesos master in order to run jobs/tasks
  • Slide 23
  • frameworks can be required to authenticate as a principal, using SASL's CRAM-MD5 secret mechanism (Kerberos in development); the masters are initialized with the secrets (sketch below)
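
A minimal sketch of the CRAM-MD5 exchange that SASL authentication rests on: the framework proves knowledge of a shared secret by returning an HMAC-MD5 digest of the master's challenge. The function and variable names below are illustrative, not Mesos's actual implementation.

```python
import hmac
import hashlib

# Hypothetical illustration of SASL CRAM-MD5; not actual Mesos code.
def cram_md5_response(principal: str, secret: bytes, challenge: bytes) -> str:
    # The client replies "<principal> <hex HMAC-MD5(secret, challenge)>".
    digest = hmac.new(secret, challenge, hashlib.md5).hexdigest()
    return f"{principal} {digest}"

def verify(principal: str, secrets: dict, challenge: bytes, response: str) -> bool:
    # The master, initialized with the same secrets, recomputes and compares.
    expected = cram_md5_response(principal, secrets[principal], challenge)
    return hmac.compare_digest(expected, response)

secrets = {"aurora": b"s3cret"}            # master initialized with secrets
challenge = b"<1896.697170952@master>"     # nonce issued by the master
response = cram_md5_response("aurora", secrets["aurora"], challenge)
assert verify("aurora", secrets, challenge, response)
```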
  • Slide 24
  • Mesos @Twitter in early 2010 goal: run long-running services elastically on Mesos
  • Slide 25
  • Apache Aurora (incubating) is a Mesos framework that makes it easy to launch services written in Ruby, Java, Scala, Python, Go, etc.!
  • Slide 26
  • on the same masters: Storm, Jenkins, …
  • Slide 27
  • a lot of interesting design decisions along the way
  • Slide 28
  • many appear (IMHO) in YARN too
  • Slide 29
  • design decisions: two-level scheduling and resource offers; fair-sharing and revocable resources; high-availability and fault-tolerance; execution and isolation; C++
  • Slide 30
  • design decisions: two-level scheduling and resource offers; fair-sharing and revocable resources; high-availability and fault-tolerance; execution and isolation; C++
  • Slide 31
  • frameworks get allocated resources from the masters; resources are allocated via resource offers; a resource offer represents a snapshot of available resources (one offer per host) that a framework can use to run tasks [offer: hostname, 4 CPUs, 4 GB RAM]
  • Slide 32
  • frameworks use these resources to decide what tasks to run; a task can use a subset of an offer [task: 3 CPUs, 2 GB RAM] (sketch below)
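
A minimal sketch of the offer/task relationship in plain Python: an offer is a per-host snapshot of available resources, and a task may claim any subset of it. The Offer and Task types and the fits helper are illustrative stand-ins, not the Mesos API.

```python
from dataclasses import dataclass

@dataclass
class Offer:       # snapshot of available resources on one host
    hostname: str
    cpus: float
    mem_gb: float

@dataclass
class Task:        # a task may use any subset of an offer
    name: str
    cpus: float
    mem_gb: float

def fits(task: Task, offer: Offer) -> bool:
    # A task is launchable on an offer iff it asks for no more than is offered.
    return task.cpus <= offer.cpus and task.mem_gb <= offer.mem_gb

offer = Offer("host1", cpus=4, mem_gb=4)   # the offer from the slide
task = Task("worker", cpus=3, mem_gb=2)    # uses a subset of the offer
assert fits(task, offer)
```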
  • Slide 33
  • Mesos challenged the status quo of cluster managers
  • Slide 34
  • cluster manager status quo: the application hands the cluster manager a specification; the specification includes as much information as possible to assist the cluster manager in scheduling and execution
  • Slide 35
  • cluster manager status quo: the application then waits for the task to be executed
  • Slide 36
  • cluster manager status quo: the application eventually gets back the result
  • Slide 37
  • problems with specifications: hard to specify certain desires or constraints; hard to update specifications dynamically as tasks execute and finish/fail
  • Slide 38
  • an alternative model: the framework sends the masters a request [3 CPUs, 2 GB RAM]; a request is a purposely simplified subset of a specification, mainly including the required resources
  • Slide 39
  • question: what should Mesos do if it can't satisfy a request?
  • Slide 40
  • wait until it can
  • Slide 41
  • question: what should Mesos do if it can't satisfy a request? (a) wait until it can; (b) offer the best it can immediately
  • Slide 42
  • question: what should Mesos do if it can't satisfy a request? (a) wait until it can; (b) offer the best it can immediately
  • Slide 43
  • an alternative model: the masters send the framework an offer [hostname, 4 CPUs, 4 GB RAM]
  • Slide 44
  • an alternative model: the masters send the framework several offers, each [hostname, 4 CPUs, 4 GB RAM]
  • Slide 45
  • an alternative model: the masters send the framework several offers, each [hostname, 4 CPUs, 4 GB RAM]; the framework uses the offers to perform its own scheduling (sketch below)
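
A sketch of the framework half of two-level scheduling, reusing the illustrative Offer and Task types from the earlier sketch: the master decides which resources to offer, and the framework alone decides which tasks to place on them. The greedy first-fit policy here is just one possible choice, not how any particular framework schedules.

```python
# Assumes the Offer and Task dataclasses from the sketch above.
def schedule(pending: list[Task], offers: list[Offer]) -> dict[str, list[Task]]:
    # Framework-side scheduling: greedily pack pending tasks onto the
    # offered hosts (first fit); any resources left over are declined.
    placements: dict[str, list[Task]] = {}
    for offer in offers:
        cpus, mem = offer.cpus, offer.mem_gb
        for task in list(pending):
            if task.cpus <= cpus and task.mem_gb <= mem:
                placements.setdefault(offer.hostname, []).append(task)
                cpus -= task.cpus
                mem -= task.mem_gb
                pending.remove(task)
    return placements

offers = [Offer("host1", 4, 4), Offer("host2", 4, 4)]
tasks = [Task("a", 3, 2), Task("b", 3, 2)]
print(schedule(tasks, offers))   # task "a" lands on host1, "b" on host2
```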
  • Slide 46
  • an analogue: non-blocking sockets; the application asks the kernel to write(s, buffer, size);
  • Slide 47
  • an analogue: non-blocking sockets; the kernel answers: 42 of 100 bytes written! (sketch below)
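
The analogy in runnable form, using Python's standard socket module: a non-blocking send() returns how many bytes the kernel actually accepted, just as an offer tells a framework what the master can actually give right now.

```python
import socket

# A connected pair of sockets; the writer is made non-blocking.
reader, writer = socket.socketpair()
writer.setblocking(False)

data = b"x" * 1_000_000
try:
    sent = writer.send(data)   # the kernel takes only what fits its buffer
    print(f"{sent} of {len(data)} bytes written!")
except BlockingIOError:
    print("0 bytes written; buffer full, try again later")
```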
  • Slide 48
  • resource offers address asynchrony in resource allocation
  • Slide 49
  • IIUC, even YARN allocates the best it can to an application when it can't satisfy a request
  • Slide 50
  • requests are complementary (but not necessary)
  • Slide 51
  • offers represent the currently available resources a framework can use
  • Slide 52
  • question: should resources within offers be disjoint?
  • Slide 53
  • the masters make framework 1 and framework 2 each an offer [hostname, 4 CPUs, 4 GB RAM]
  • Slide 54
  • concurrency control: optimistic vs. pessimistic
  • Slide 55
  • concurrency control, optimistic: all offers overlap with one another, causing frameworks to compete first-come-first-served
  • Slide 56
  • concurrency control, pessimistic: offers made to different frameworks are disjoint
  • Slide 57
  • Mesos semantics: assume overlapping offers (sketch below)
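
A toy illustration of overlapping (optimistic) offers, building on the earlier Offer/Task/fits sketch: both frameworks are shown the same resources, and the master resolves the race first-come-first-served. The ToyMaster class is made up for this example, not Mesos's interface.

```python
# Assumes Offer, Task, and fits from the earlier sketch.
class ToyMaster:
    def __init__(self, offer: Offer):
        self.offer = offer       # the same offer is shown to every framework
        self.claimed = False

    def launch(self, framework: str, task: Task) -> bool:
        # First-come-first-served: the first framework to act wins;
        # later launches against the now-stale offer are refused.
        if self.claimed or not fits(task, self.offer):
            return False
        self.claimed = True
        return True

master = ToyMaster(Offer("host1", 4, 4))
print(master.launch("framework1", Task("a", 3, 2)))   # True: wins the race
print(master.launch("framework2", Task("b", 3, 2)))   # False: offer was stale
```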
  • Slide 58
  • design comparison: Google's Omega
  • Slide 59
  • the Omega model: a framework gets a snapshot of the cluster state from a database (note: it does not make a request!)
  • Slide 60
  • the Omega model: a framework submits a transaction to the database to acquire resources (which it can then use to run tasks); failed transactions occur when another framework has already acquired the sought resources
  • Slide 61
  • isomorphism?
  • Slide 62
  • observation: snapshots are optimistic offers
  • Slide 63
  • Omega and Mesos: Omega's database gives the framework a snapshot; Mesos's masters give the framework an offer [hostname, 4 CPUs, 4 GB RAM]
  • Slide 64
  • Omega and Mesos: the framework submits a transaction to Omega's database; the framework launches a task [3 CPUs, 2 GB RAM] via Mesos's masters (sketch below)
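
A toy rendering of the claimed isomorphism, again reusing Offer, Task, and fits from the earlier sketches: an Omega snapshot behaves like a fully overlapping offer, and a transaction commits only if nothing changed since the snapshot was taken. ToyCellState is hypothetical, loosely following the talk's description of Omega.

```python
# Assumes Offer, Task, and fits from the earlier sketches.
class ToyCellState:
    """Shared cluster state; every framework snapshots the same thing."""
    def __init__(self, offer: Offer):
        self.free = offer
        self.version = 0                   # bumped on each successful commit

    def snapshot(self):
        return self.free, self.version     # behaves like an optimistic offer

    def commit(self, task: Task, seen_version: int) -> bool:
        # Optimistic concurrency: fail if the state changed after the snapshot.
        if seen_version != self.version or not fits(task, self.free):
            return False
        self.free = Offer(self.free.hostname,
                          self.free.cpus - task.cpus,
                          self.free.mem_gb - task.mem_gb)
        self.version += 1
        return True

cell = ToyCellState(Offer("host1", 4, 4))
snap, v = cell.snapshot()                    # two frameworks take snapshots
assert cell.commit(Task("a", 3, 2), v)       # first transaction commits
assert not cell.commit(Task("b", 1, 1), v)   # second fails: stale snapshot
```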
  • Slide 65
  • thought experiment: what's gained by exploiting the continuous spectrum from pessimistic to optimistic?
  • Slide 66
  • design decisions: two-level scheduling and resource offers; fair-sharing and revocable resources; high-availability and fault-tolerance; execution and isolation; C++
  • Slide 67
  • Mesos allocates resources to frameworks using a fair-sharing algorithm we created called Dominant Resource Fairness (DRF)
  • Slide 68
  • DRF, born of static partitioning
  • Slide 69
  • static partitioning across teams: promotions, trends, recommendations
  • Slide 70
  • static partitioning across teams: promotions, trends, recommendations; fairly shared!
  • Slide 71
  • goal: fairly share the resources without static partitioning
  • Slide 72
  • partition utilizations: promotions 45% CPU, 100% RAM; trends 75% CPU, 100% RAM; recommendations 100% CPU, 50% RAM
  • Slide 73
  • observation: a dominant resource bottlenecks each team from running any more jobs/tasks
  • Slide 74
  • dominant resource bottlenecks: promotions 45% CPU, 100% RAM (bottleneck: RAM); trends 75% CPU, 100% RAM (bottleneck: RAM); recommendations 100% CPU, 50% RAM (bottleneck: CPU)
  • Slide 75
  • insight: allocating a fair share of each team's dominant resource guarantees they can run at least as many jobs/tasks as with static partitioning! (sketch below)
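
A compact, self-contained sketch of DRF's core loop, following the worked example in the DRF paper (Ghodsi et al., NSDI 2011): a 9-CPU/18-GB cluster shared by framework A, whose tasks need <1 CPU, 4 GB>, and framework B, whose tasks need <3 CPUs, 1 GB>. Each round, the framework with the smallest dominant share launches its next task. This is an illustrative simplification, not the Mesos allocator.

```python
total = {"cpus": 9.0, "mem_gb": 18.0}              # cluster capacity
free = dict(total)
used = {"A": {"cpus": 0.0, "mem_gb": 0.0},
        "B": {"cpus": 0.0, "mem_gb": 0.0}}
demand = {"A": {"cpus": 1.0, "mem_gb": 4.0},       # per-task requirements
          "B": {"cpus": 3.0, "mem_gb": 1.0}}

def dominant_share(f: str) -> float:
    # A framework's dominant share is its largest share of any one resource.
    return max(used[f][r] / total[r] for r in total)

while True:
    f = min(used, key=dominant_share)              # lowest dominant share next
    task = demand[f]
    if any(task[r] > free[r] for r in total):
        break             # simplified: stop once the neediest no longer fits
    for r in total:
        used[f][r] += task[r]
        free[r] -= task[r]

for f in used:
    print(f, used[f], f"dominant share = {dominant_share(f):.2f}")
# A ends with 3 tasks, B with 2; both dominant shares equalize at 2/3.
```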
  • Slide 76
  • if my team gets at least 1/N of my dominant resource I will do no worse than if I had my own cluster, but I