Introduction to Apache Mesos

Click here to load reader

  • date post

  • Category


  • view

  • download


Embed Size (px)


Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.

Transcript of Introduction to Apache Mesos

  • 1.Apache Mesos An Introduction

2. Joe Stein Developer, Architect & Technologist Founder & Principal Consultant => Big Data Open Source Security LLC - Big Data Open Source Security LLC provides professional services and product solutions for the collection, storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data Infrastructure Components to use but also how to change their existing (or build new) systems to work with them. Apache Kafka Committer & PMC member Blog & Podcast - Twitter @allthingshadoop 3. Overview Where Why What How Configurations Tools Frameworks 4. Origins Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center 2011 GAFS Omega John Wilkes Omega: exible, scalable schedulers for large compute clusters content/uploads/2013/paper/Schwarzkopf.pdf 5. Static Partitioning 6. Static Partitioning IS Bad hard to utilize machines (i.e., X GB RAM and Y CPUs) 7. Static Partitioning does NOT scale hard to scale elastically (to take advantage of statistical multiplexing) 8. Failures === Downtime hard to deal with failures 9. It doesnt have to be that way 10. Operating System === Datacenter 11. Mesos => data center kernel 12. Mesos is node abstraction 13. Containers 14. Master and Slave Configuration --ip=VALUE IP address to listen on --[no-]help Prints this help message (default: false) --log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified; does not affect logging to stderr) --logbufsecs=VALUE How many seconds to buffer log messages for (default: 0) --logging_level=VALUE Log message at or above this level; possible values: 'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this will affect just the logs from log_dir (if specified) (default: INFO) --port=VALUE Port to listen on (default: 5051) --[no-]quiet Disable logging to stderr (default: false) --[no-]version Show version and exit. (default: false) 15. Master Configurations (required) --quorum=VALUE The size of the quorum of replicas when using 'replicated_log' based registry. It is imperative to set this value to be a majority of masters i.e., quorum > (number of masters)/2. --work_dir=VALUE Where to store the persistent information stored in the Registry. --zk=VALUE ZooKeeper URL (used for leader election amongst masters) May be one of: zk://host1 :port1,host2:port2,../path or zk://username:password @host1:port1,host2:port2,.../path file://path/to/file (where file contains one of the above) 16. Master Configurations (optional) Way too many for a slide 17. Slave Configuration (required) --master=VALUE May be one of: zk://host1:port1,host2:port2,.../path zk://username:password@host1:port1,host2:po rt2,.../path file://path/to/file (where file contains one of the above) 18. Slave Configuration (optional) Way too many for a slide 19. Organizations using Mesos (Public) Airbnb Conviva Ignidata MediaCrossing Pinkbike UCSF Virdata Atigeo eBay iQIYI Mesosphere Sharethrough UC Berkeley Yieldbot Atlassian Devicescape Magine TV Netflix Sigmoid URX Xogito Categorize DueDil Medidata OpenTable SiQueries Viadeo CloudPhysics HubSpot Meemo PayPal Twitter Vimeo 20. Tools to setup and run Mesos collectd plugin to collect Mesos cluster metrics. Deploy scripts for launching a Mesos cluster on a set of machines. EC2 scripts for launching a Mesos cluster on Amazon EC2. Chef cookbook by Everpeace Install Mesos and configure master and slave. This cookbook supports installation from source or the Mesosphere packages. Chef cookbook by Mdsol Application cookbook for installing the Apache Mesos cluster manager. This cookbook installs Mesos via packages provided by Mesosphere. Puppet Module by Deric This is a Puppet module for managing Mesos nodes in a cluster. Vagrant setup by Everpeace Spin up your Mesos Cluster with Vagrant! Vagrant setup by Mesosphere Quickly build Mesos sandbox environments using Vagrant. 21. Sample Frameworks C++ - Java - Python - Scala - Go - 22. Scheduler /** * Invoked when the scheduler successfully registers with a Mesos * master. A unique ID (generated by the master) used for * distinguishing this framework from others and MasterInfo * with the ip and port of the current master are provided as arguments. */ def registered( driver: SchedulerDriver, frameworkId: FrameworkID, masterInfo: MasterInfo): Unit = {"Scheduler.registered")"FrameworkID:n%s" format frameworkId)"MasterInfo:n%s" format masterInfo) } 23. Scheduler /** * Invoked when the scheduler re-registers with a newly elected Mesos master. * This is only called when the scheduler has previously been registered. * MasterInfo containing the updated information about the elected master * is provided as an argument. */ def reregistered( driver: SchedulerDriver, masterInfo: MasterInfo): Unit = {"Scheduler.reregistered")"MasterInfo:n%s" format masterInfo) } 24. Scheduler /**Invoked when resources have been offered to this framework. A single offer will only contain resources from a single slave. Resources associated with an offer will not be re-offered to _this_ framework until either (a) this framework has rejected those resources or (b) those resources have been rescinded.Note that resources may be concurrently offered to more than one framework at a time (depending on the allocator being used). In that case, the first framework to launch tasks using those resources will be able to use them while the other frameworks will have those resources rescinded (or if a framework has already launched tasks with those resources then those tasks will fail with a TASK_LOST status and a message saying as much).*/ def resourceOffers( driver: SchedulerDriver, offers: JList[Offer]): Unit = {"Scheduler.resourceOffers") // print and decline all received offers offers foreach { offer => driver declineOffer offer.getId } } 25. Scheduler /** * Invoked when an offer is no longer valid (e.g., the slave was * lost or another framework used resources in the offer). If for * whatever reason an offer is never rescinded (e.g., dropped * message, failing over framework, etc.), a framwork that attempts * to launch tasks using an invalid offer will receive TASK_LOST * status updats for those tasks. */ def offerRescinded( driver: SchedulerDriver, offerId: OfferID): Unit = {"Scheduler.offerRescinded [%s]" format offerId.getValue) } 26. Scheduler /** * Invoked when the status of a task has changed (e.g., a slave is * lost and so the task is lost, a task finishes and an executor * sends a status update saying so, etc). Note that returning from * this callback _acknowledges_ receipt of this status update! If * for whatever reason the scheduler aborts during this callback (or * the process exits) another status update will be delivered (note, * however, that this is currently not true if the slave sending the * status update is lost/fails during that time). */ def statusUpdate( driver: SchedulerDriver, status: TaskStatus): Unit = {"Scheduler.statusUpdate:n%s" format status) } 27. Scheduler /** * Invoked when an executor sends a message. These messages are best * effort; do not expect a framework message to be retransmitted in * any reliable fashion. */ def frameworkMessage( driver: SchedulerDriver, executorId: ExecutorID, slaveId: SlaveID, data: Array[Byte]): Unit = {"Scheduler.frameworkMessage") } 28. Scheduler ** * Invoked when the scheduler becomes "disconnected" from the master * (e.g., the master fails and another is taking over). */ def disconnected(driver: SchedulerDriver): Unit = {"Scheduler.disconnected") } /** * Invoked when a slave has been determined unreachable (e.g., * machine failure, network partition). Most frameworks will need to * reschedule any tasks launched on this slave on a new slave. */ def slaveLost( driver: SchedulerDriver, slaveId: SlaveID): Unit = {"Scheduler.slaveLost: [%s]" format slaveId.getValue) } 29. Scheduler /** * Invoked when an executor has exited/terminated. Note that any * tasks running will have TASK_LOST status updates automagically * generated. */ def executorLost( driver: SchedulerDriver,executorId: ExecutorID, slaveId: SlaveID, status: Int): Unit = {"Scheduler.executorLost: [%s]" format executorId.getValue) } /** * Invoked when there is an unrecoverable error in the scheduler or * scheduler driver. The driver will be aborted BEFORE invoking this * callback. */ def error(driver: SchedulerDriver, message: String): Unit = {"Scheduler.error: [%s]" format message) } 30. Executor /** * Invoked once the executor driver has been able to successfully * connect with Mesos. In particular, a scheduler can pass some * data to it's executors through the * field. */ def registered( driver: ExecutorDriver, executorInfo: ExecutorInfo, frameworkInfo: FrameworkInfo, slaveInfo: SlaveInfo): Unit = {"Executor.registered") } 31. Executor /** * Invoked when the executor re-registers w