Introduction to Apache Mesos


Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.

Transcript of Introduction to Apache Mesos

Page 1: Introduction to Apache Mesos

Apache Mesos: An Introduction

http://mesos.apache.org

Page 2: Introduction to Apache Mesos

Joe Stein

• Developer, Architect & Technologist

• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly

Big Data Open Source Security LLC provides professional services and product solutions for the collection, storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data Infrastructure Components to use but also how to change their existing (or build new) systems to work with them.

• Apache Kafka Committer & PMC member

• Blog & Podcast - http://allthingshadoop.com

• Twitter @allthingshadoop

Page 3: Introduction to Apache Mesos

Overview

• Where

• Why

• What

• How

• Configurations

• Tools

• Frameworks

Page 4: Introduction to Apache Mesos

Origins

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf

2011 GAFS Omega John Wilkes https://www.youtube.com/watch?v=0ZFMlO98Jkc&feature=youtu.be

Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf

Page 5: Introduction to Apache Mesos

Static Partitioning

Page 6: Introduction to Apache Mesos

Static Partitioning IS Bad

hard to utilize machines (i.e., X GB RAM and Y CPUs)

Page 7: Introduction to Apache Mesos

Static Partitioning does NOT scale

hard to scale elastically (to take advantage of statistical multiplexing)

Page 8: Introduction to Apache Mesos

Failures === Downtime

hard to deal with failures

Page 9: Introduction to Apache Mesos

It doesn’t have to be that way

Page 10: Introduction to Apache Mesos

Operating System === Datacenter

Page 11: Introduction to Apache Mesos

Mesos => data center “kernel”

Page 12: Introduction to Apache Mesos

Mesos is node abstraction

Page 13: Introduction to Apache Mesos
Page 14: Introduction to Apache Mesos
Page 15: Introduction to Apache Mesos

Containers

Page 16: Introduction to Apache Mesos

Master and Slave Configuration

--ip=VALUE IP address to listen on

--[no-]help Prints this help message (default: false)

--log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified; does not affect logging to stderr)

--logbufsecs=VALUE How many seconds to buffer log messages for (default: 0)

--logging_level=VALUE Log message at or above this level; possible values: 'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this will affect just the logs from log_dir (if specified) (default: INFO)

--port=VALUE Port to listen on (master default: 5050; slave default: 5051)

--[no-]quiet Disable logging to stderr (default: false)

--[no-]version Show version and exit. (default: false)

Page 17: Introduction to Apache Mesos

Master Configurations (required)

--quorum=VALUE The size of the quorum of replicas when using 'replicated_log' based registry. It is imperative to set this value to be a majority of masters i.e., quorum > (number of masters)/2.

--work_dir=VALUE Where to store the persistent information stored in the Registry.

--zk=VALUE ZooKeeper URL (used for leader election amongst masters). May be one of:
    zk://host1:port1,host2:port2,.../path
    zk://username:password@host1:port1,host2:port2,.../path
    file://path/to/file (where file contains one of the above)
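The majority rule for --quorum can be made concrete with a small sketch. The helper names (quorumFor, masterFlags) and the example work directory and ZooKeeper hosts are illustrative assumptions, not part of Mesos; only the three flag names come from the slide.

```scala
object QuorumSketch {
  // Smallest value satisfying quorum > masters / 2 (a strict majority).
  def quorumFor(masters: Int): Int = {
    require(masters > 0, "need at least one master")
    masters / 2 + 1
  }

  // Assemble the three required master flags listed on the slide.
  // workDir and zkUrl are caller-supplied; the values used in main are made up.
  def masterFlags(masters: Int, workDir: String, zkUrl: String): Seq[String] =
    Seq(
      s"--quorum=${quorumFor(masters)}",
      s"--work_dir=$workDir",
      s"--zk=$zkUrl"
    )

  def main(args: Array[String]): Unit = {
    Seq(1, 3, 5).foreach { n =>
      println(s"$n masters -> quorum ${quorumFor(n)}")
    }
    masterFlags(3, "/var/lib/mesos", "zk://host1:2181,host2:2181,host3:2181/mesos")
      .foreach(println)
  }
}
```

An even number of masters buys no extra fault tolerance under this rule: four masters need a quorum of three, so they tolerate one failure, the same as three masters.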

Page 18: Introduction to Apache Mesos

Master Configurations (optional)

Way too many for a slide

http://mesos.apache.org/documentation/latest/configuration/

Page 19: Introduction to Apache Mesos

Slave Configuration (required)

--master=VALUE May be one of:
    zk://host1:port1,host2:port2,.../path
    zk://username:password@host1:port1,host2:port2,.../path
    file://path/to/file (where file contains one of the above)
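To make the zk:// URL shapes easier to read, here is a rough sketch that splits one into its parts. The ZkUrl case class and parse function are hypothetical helpers written for this illustration. They are not how Mesos itself parses the flag, and the sketch assumes credentials contain no '@' or '/' characters.

```scala
object ZkUrlSketch {
  // Parsed pieces of a zk:// URL; this case class is illustrative only.
  final case class ZkUrl(credentials: Option[String], hosts: Seq[String], path: String)

  // zk://[username:password@]host1:port1,host2:port2,.../path
  private val Pattern = """^zk://(?:([^@/]+)@)?([^/]+)(/.*)?$""".r

  // Returns None for anything that is not a zk:// URL (for example file://...).
  def parse(url: String): Option[ZkUrl] = url match {
    case Pattern(creds, hosts, path) =>
      // Unmatched optional groups come back as null, hence Option(...).
      Some(ZkUrl(Option(creds), hosts.split(",").toSeq, Option(path).getOrElse("/")))
    case _ => None
  }
}
```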

Page 20: Introduction to Apache Mesos

Slave Configuration (optional)

Way too many for a slide

http://mesos.apache.org/documentation/latest/configuration/

Page 21: Introduction to Apache Mesos

Organizations using Mesos (Public)

Airbnb, Conviva, Ignidata, MediaCrossing, Pinkbike, UCSF, Virdata

Atigeo, eBay, iQIYI, Mesosphere, Sharethrough, UC Berkeley, Yieldbot

Atlassian, Devicescape, Magine TV, Netflix, Sigmoid, URX, Xogito

Categorize, DueDil, Medidata, OpenTable, SiQueries, Viadeo

CloudPhysics, HubSpot, Meemo, PayPal, Twitter, Vimeo

http://mesos.apache.org/documentation/latest/powered-by-mesos/

Page 22: Introduction to Apache Mesos

Tools to set up and run Mesos

http://mesos.apache.org/documentation/latest/tools/

• collectd plugin to collect Mesos cluster metrics.

• Deploy scripts for launching a Mesos cluster on a set of machines.

• EC2 scripts for launching a Mesos cluster on Amazon EC2.

• Chef cookbook by Everpeace: Install Mesos and configure master and slave. This cookbook supports installation from source or the Mesosphere packages.

• Chef cookbook by Mdsol: Application cookbook for installing the Apache Mesos cluster manager. This cookbook installs Mesos via packages provided by Mesosphere.

• Puppet Module by Deric: This is a Puppet module for managing Mesos nodes in a cluster.

• Vagrant setup by Everpeace: Spin up your Mesos Cluster with Vagrant!

• Vagrant setup by Mesosphere: Quickly build Mesos sandbox environments using Vagrant.

Page 24: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when the scheduler successfully registers with a Mesos
 * master. A unique ID (generated by the master) used for
 * distinguishing this framework from others and MasterInfo
 * with the ip and port of the current master are provided as arguments.
 */
def registered(
    driver: SchedulerDriver,
    frameworkId: FrameworkID,
    masterInfo: MasterInfo): Unit = {
  log.info("Scheduler.registered")
  log.info("FrameworkID:\n%s" format frameworkId)
  log.info("MasterInfo:\n%s" format masterInfo)
}

Page 25: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when the scheduler re-registers with a newly elected Mesos master.
 * This is only called when the scheduler has previously been registered.
 * MasterInfo containing the updated information about the elected master
 * is provided as an argument.
 */
def reregistered(
    driver: SchedulerDriver,
    masterInfo: MasterInfo): Unit = {
  log.info("Scheduler.reregistered")
  log.info("MasterInfo:\n%s" format masterInfo)
}

Page 26: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when resources have been offered to this framework. A single
 * offer will only contain resources from a single slave. Resources
 * associated with an offer will not be re-offered to _this_ framework
 * until either (a) this framework has rejected those resources or
 * (b) those resources have been rescinded.
 *
 * Note that resources may be concurrently offered to more than one
 * framework at a time (depending on the allocator being used). In that
 * case, the first framework to launch tasks using those resources will
 * be able to use them while the other frameworks will have those
 * resources rescinded (or if a framework has already launched tasks
 * with those resources then those tasks will fail with a TASK_LOST
 * status and a message saying as much).
 */
def resourceOffers(
    driver: SchedulerDriver,
    offers: JList[Offer]): Unit = {
  log.info("Scheduler.resourceOffers")
  // print and decline all received offers
  // (offers is a java.util.List; asScala needs scala.collection.JavaConverters._)
  offers.asScala foreach { offer =>
    log.info(offer.toString)
    driver declineOffer offer.getId
  }
}
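The slide's callback declines every offer. A framework that actually launches work has to decide, per offer, whether the offered resources fit a pending task. The following is a minimal sketch of that decision under simplifying assumptions: Offer, TaskRequest, Launch and Decline are plain stand-in case classes invented for this example (not the Mesos protobuf types), each offer hosts at most one task, and only CPUs and memory are considered.

```scala
object OfferMatchingSketch {
  // Simplified stand-ins for a Mesos offer and a pending task; not the real protobuf types.
  final case class Offer(id: String, cpus: Double, memMb: Double)
  final case class TaskRequest(name: String, cpus: Double, memMb: Double)

  sealed trait Decision
  final case class Launch(offerId: String, task: TaskRequest) extends Decision
  final case class Decline(offerId: String) extends Decision

  // Give each pending task the first unused offer large enough to hold it,
  // then decline whatever offers are left over.
  def decide(offers: Seq[Offer], pending: Seq[TaskRequest]): Seq[Decision] = {
    val (launches, unused) = pending.foldLeft((Vector.empty[Launch], offers)) {
      case ((acc, free), task) =>
        free.find(o => o.cpus >= task.cpus && o.memMb >= task.memMb) match {
          case Some(o) => (acc :+ Launch(o.id, task), free.filterNot(_.id == o.id))
          case None    => (acc, free)
        }
    }
    launches ++ unused.map(o => Decline(o.id))
  }
}
```

In a real scheduler each Launch would become a driver.launchTasks call and each Decline a driver.declineOffer call. Declining promptly matters because, as the comment above notes, resources held in an outstanding offer are not re-offered to this framework.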

Page 27: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when an offer is no longer valid (e.g., the slave was
 * lost or another framework used resources in the offer). If for
 * whatever reason an offer is never rescinded (e.g., dropped
 * message, failing over framework, etc.), a framework that attempts
 * to launch tasks using an invalid offer will receive TASK_LOST
 * status updates for those tasks.
 */
def offerRescinded(
    driver: SchedulerDriver,
    offerId: OfferID): Unit = {
  log.info("Scheduler.offerRescinded [%s]" format offerId.getValue)
}

Page 28: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when the status of a task has changed (e.g., a slave is
 * lost and so the task is lost, a task finishes and an executor
 * sends a status update saying so, etc). Note that returning from
 * this callback _acknowledges_ receipt of this status update! If
 * for whatever reason the scheduler aborts during this callback (or
 * the process exits) another status update will be delivered (note,
 * however, that this is currently not true if the slave sending the
 * status update is lost/fails during that time).
 */
def statusUpdate(
    driver: SchedulerDriver,
    status: TaskStatus): Unit = {
  log.info("Scheduler.statusUpdate:\n%s" format status)
}
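Because returning from statusUpdate acknowledges the update, a scheduler usually records the new state before returning. Here is one possible sketch of that bookkeeping, using plain strings for task IDs and states. The Tracker class is a hypothetical helper, not a Mesos type; the state names are the ones that appear on these slides.

```scala
object TaskTrackerSketch {
  // Terminal task states named on the slides.
  val Terminal: Set[String] =
    Set("TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST")

  // Immutable view of the framework's live tasks: taskId -> last reported state.
  final case class Tracker(states: Map[String, String] = Map.empty) {
    // Record a status update; tasks that reach a terminal state are forgotten.
    def update(taskId: String, state: String): Tracker =
      if (Terminal(state)) copy(states = states - taskId)
      else copy(states = states.updated(taskId, state))

    // IDs of tasks whose last reported state is TASK_RUNNING.
    def running: Set[String] =
      states.collect { case (id, "TASK_RUNNING") => id }.toSet
  }
}
```

Since an update can be redelivered if the scheduler dies before the callback returns, it helps that update is idempotent: applying the same (taskId, state) pair twice yields the same Tracker.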

Page 29: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when an executor sends a message. These messages are best
 * effort; do not expect a framework message to be retransmitted in
 * any reliable fashion.
 */
def frameworkMessage(
    driver: SchedulerDriver,
    executorId: ExecutorID,
    slaveId: SlaveID,
    data: Array[Byte]): Unit = {
  log.info("Scheduler.frameworkMessage")
}

Page 30: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when the scheduler becomes "disconnected" from the master
 * (e.g., the master fails and another is taking over).
 */
def disconnected(driver: SchedulerDriver): Unit = {
  log.info("Scheduler.disconnected")
}

/**
 * Invoked when a slave has been determined unreachable (e.g.,
 * machine failure, network partition). Most frameworks will need to
 * reschedule any tasks launched on this slave on a new slave.
 */
def slaveLost(
    driver: SchedulerDriver,
    slaveId: SlaveID): Unit = {
  log.info("Scheduler.slaveLost: [%s]" format slaveId.getValue)
}
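The rescheduling that the slaveLost comment mentions could look roughly like the sketch below. It assumes the framework keeps its own map of task placements. The State class and onSlaveLost function are illustrative names, and IDs are plain strings rather than Mesos TaskID/SlaveID values.

```scala
object SlaveLostSketch {
  // placements: taskId -> slaveId the task was launched on.
  // toReschedule: tasks waiting for a new offer.
  final case class State(placements: Map[String, String], toReschedule: Vector[String])

  // Move every task that was placed on the lost slave into the reschedule queue.
  def onSlaveLost(state: State, lostSlave: String): State = {
    val (lost, kept) = state.placements.partition { case (_, slave) => slave == lostSlave }
    // Sort so the queue order is deterministic regardless of map iteration order.
    State(kept, state.toReschedule ++ lost.keys.toVector.sorted)
  }
}
```

The queued tasks would then be matched against future resource offers in resourceOffers.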

Page 31: Introduction to Apache Mesos

Scheduler

/**
 * Invoked when an executor has exited/terminated. Note that any
 * tasks running will have TASK_LOST status updates automagically
 * generated.
 */
def executorLost(
    driver: SchedulerDriver,
    executorId: ExecutorID,
    slaveId: SlaveID,
    status: Int): Unit = {
  log.info("Scheduler.executorLost: [%s]" format executorId.getValue)
}

/**
 * Invoked when there is an unrecoverable error in the scheduler or
 * scheduler driver. The driver will be aborted BEFORE invoking this
 * callback.
 */
def error(driver: SchedulerDriver, message: String): Unit = {
  log.info("Scheduler.error: [%s]" format message)
}

Page 32: Introduction to Apache Mesos

Executor

/**
 * Invoked once the executor driver has been able to successfully
 * connect with Mesos. In particular, a scheduler can pass some
 * data to its executors through the ExecutorInfo.data
 * field.
 */
def registered(
    driver: ExecutorDriver,
    executorInfo: ExecutorInfo,
    frameworkInfo: FrameworkInfo,
    slaveInfo: SlaveInfo): Unit = {
  log.info("Executor.registered")
}

Page 33: Introduction to Apache Mesos

Executor

/**
 * Invoked when the executor re-registers with a restarted slave.
 */
def reregistered(
    driver: ExecutorDriver,
    slaveInfo: SlaveInfo): Unit = {
  log.info("Executor.reregistered")
}

/**
 * Invoked when the executor becomes "disconnected" from the slave
 * (e.g., the slave is being restarted due to an upgrade).
 */
def disconnected(driver: ExecutorDriver): Unit = {
  log.info("Executor.disconnected")
}

Page 34: Introduction to Apache Mesos

Executor

/**
 * Invoked when a task has been launched on this executor (initiated
 * via SchedulerDriver.launchTasks). Note that this task can be
 * realized with a thread, a process, or some simple computation;
 * however, no other callbacks will be invoked on this executor
 * until this callback has returned.
 */
def launchTask(driver: ExecutorDriver, task: TaskInfo): Unit = {
  log.info("Executor.launchTask")
}
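Since no other callback runs until launchTask returns, long-running work is usually handed to another thread so the callback can return quickly. Below is a minimal sketch of that pattern using only java.util.concurrent. The report function is a hypothetical stand-in for ExecutorDriver.sendStatusUpdate, and task IDs and states are plain strings.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

object LaunchTaskSketch {
  private val pool = Executors.newCachedThreadPool()

  // Last status "sent"; stands in for updates delivered to the scheduler.
  val lastStatus = new AtomicReference[String]("")

  // Hypothetical stand-in for ExecutorDriver.sendStatusUpdate.
  private def report(taskId: String, state: String): Unit =
    lastStatus.set(s"$taskId:$state")

  // Returns immediately; the task body runs on the thread pool.
  def launchTask(taskId: String)(work: => Unit): Unit = {
    pool.submit(new Runnable {
      def run(): Unit = {
        report(taskId, "TASK_RUNNING")
        try {
          work
          report(taskId, "TASK_FINISHED")
        } catch {
          case _: Exception => report(taskId, "TASK_FAILED")
        }
      }
    })
    ()
  }

  // Stop accepting work and wait briefly for running tasks to finish.
  def shutdown(): Boolean = {
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```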

Page 35: Introduction to Apache Mesos

Executor

/**
 * Invoked when a task running within this executor has been killed
 * (via SchedulerDriver.killTask). Note that no status
 * update will be sent on behalf of the executor, the executor is
 * responsible for creating a new TaskStatus (i.e., with
 * TASK_KILLED) and invoking ExecutorDriver.sendStatusUpdate.
 */
def killTask(driver: ExecutorDriver, taskId: TaskID): Unit = {
  log.info("Executor.killTask")
}
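The key point in the comment above is that the executor itself must report TASK_KILLED. One way to sketch that responsibility without a Mesos dependency is shown below. The sent buffer is a hypothetical stand-in for calls to ExecutorDriver.sendStatusUpdate, and actually stopping the task is elided.

```scala
import scala.collection.mutable

object KillTaskSketch {
  // Status updates the executor would pass to ExecutorDriver.sendStatusUpdate;
  // collected in a buffer here so the sketch stays self-contained.
  val sent = mutable.ArrayBuffer.empty[(String, String)]

  // IDs of tasks this executor currently considers running.
  private val running = mutable.Set.empty[String]

  def started(taskId: String): Unit = {
    running += taskId
    ()
  }

  // Stop the task (elided) and report TASK_KILLED ourselves;
  // nothing is reported for task IDs this executor does not know about.
  def killTask(taskId: String): Unit = {
    if (running.remove(taskId)) {
      sent += ((taskId, "TASK_KILLED"))
    }
    ()
  }
}
```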

Page 36: Introduction to Apache Mesos

Executor

/**
 * Invoked when a framework message has arrived for this
 * executor. These messages are best effort; do not expect a
 * framework message to be retransmitted in any reliable fashion.
 */
def frameworkMessage(driver: ExecutorDriver, data: Array[Byte]): Unit = {
  log.info("Executor.frameworkMessage")
}

Page 37: Introduction to Apache Mesos

Executor

/**
 * Invoked when the executor should terminate all of its currently
 * running tasks. Note that after Mesos has determined that an
 * executor has terminated, any tasks that the executor did not send
 * terminal status updates for (e.g., TASK_KILLED, TASK_FINISHED,
 * TASK_FAILED, etc) will have a TASK_LOST status update created.
 */
def shutdown(driver: ExecutorDriver): Unit = {
  log.info("Executor.shutdown")
}

Page 38: Introduction to Apache Mesos

Executor

/**
 * Invoked when a fatal error has occurred with the executor and/or
 * executor driver. The driver will be aborted BEFORE invoking this
 * callback.
 */
def error(driver: ExecutorDriver, message: String): Unit = {
  log.info("Executor.error")
}

Page 39: Introduction to Apache Mesos

Apache Aurora

http://aurora.incubator.apache.org/

Apache Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation. Apache Aurora is currently part of the Apache Incubator.

Page 40: Introduction to Apache Mesos

Marathon

https://github.com/mesosphere/marathon

Cluster-wide init and control system for services in cgroups or Docker, based on Apache Mesos.

Page 41: Introduction to Apache Mesos

Chronos

https://github.com/mesosphere/Chronos

Fault-tolerant job scheduler that handles dependencies and ISO 8601 based schedules.
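As an illustration of what an ISO 8601 based schedule encodes, the sketch below expands a repeating-interval string of the form R<n>/<start>/<duration> into concrete start times using java.time. The occurrences helper is hypothetical and not taken from Chronos, and it assumes a bounded repeat count and a time-based duration such as PT2H.

```scala
import java.time.{Duration, Instant}

object Iso8601ScheduleSketch {
  // Expand "R<n>/<start>/<duration>" into its first n start times.
  // Returns an empty sequence if the string does not have three parts
  // or has no repeat count after the leading "R".
  def occurrences(schedule: String): Seq[Instant] = schedule.split("/") match {
    case Array(rep, start, period) if rep.startsWith("R") && rep.length > 1 =>
      val n = rep.drop(1).toInt
      val first = Instant.parse(start)
      val step = Duration.parse(period)
      (0 until n).map(i => first.plus(step.multipliedBy(i.toLong)))
    case _ => Seq.empty
  }
}
```

For example, "R3/2014-03-08T20:00:00Z/PT2H" describes three runs, two hours apart, starting at 20:00 UTC on 8 March 2014.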

Page 42: Introduction to Apache Mesos

Apache Spark

http://spark.apache.org/docs/latest/running-on-mesos.html

Spark can run on hardware clusters managed by Apache Mesos. The advantages of deploying Spark with Mesos include:

• dynamic partitioning between Spark and other frameworks

• scalable partitioning between multiple instances of Spark

Page 43: Introduction to Apache Mesos

Apache Storm

http://mesosphere.io/learn/run-storm-on-mesos/

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation on a stream. This tutorial shows you how to install Storm on a Mesos cluster and run a sample topology on it.

Page 44: Introduction to Apache Mesos

Apache Kafka (and Zookeeper)

Using Apache Aurora https://github.com/pegasussolutions/borealis

Page 45: Introduction to Apache Mesos

Apache Cassandra

https://github.com/mesosphere/cassandra-mesos

This project allows you to utilize your Mesos cluster to run Cassandra. The scheduler (aka the bin/cassandra-mesos process) will do all the heavy lifting like downloading Cassandra to the worker nodes, distributing the configuration and monitoring the instances. It will automatically modify the cassandra.yaml file to include the selected nodes running Cassandra as seed nodes through a template variable.

Page 46: Introduction to Apache Mesos

Docker

Mesos containerizer hooks for Docker https://github.com/mesosphere/deimos

… native support coming soon … https://issues.apache.org/jira/browse/MESOS-1524

Page 47: Introduction to Apache Mesos

Google Kubernetes

https://github.com/mesosphere/kubernetes-mesos

Kubernetes and Mesos are a match made in heaven. Kubernetes enables the Pod (group of containers) abstraction, along with Pod labels for service discovery, load-balancing, and replication control. Mesos provides the fine-grained resource allocations for pods across nodes in a cluster, and can make Kubernetes play nicely with other frameworks running on the same cluster resources. With the Kubernetes framework for Mesos, the framework scheduler will register Kubernetes with Mesos, and then Mesos will begin offering Kubernetes sets of available resources from the cluster nodes (slaves/minions). The framework scheduler will match Mesos' resource offers to Kubernetes pods to run, and then send a launchTasks message to the Mesos master, which will forward the request onto the appropriate slaves. The slave will then fetch the kubelet/executor and the pod/task to run, start running the pod, and send a TASK_RUNNING update back to the Kubernetes framework scheduler. Once the pods are running on the slaves/minions, they can make use of Kubernetes' pod labels to register themselves in etcd for service discovery, load-balancing, and replication control.

Page 49: Introduction to Apache Mesos

First Ever #MesosCon

http://mesos.apache.org/

Page 50: Introduction to Apache Mesos

Questions?

/*******************************************

Joe Stein

Founder, Principal Consultant

Big Data Open Source Security LLC

http://www.stealth.ly

Twitter: @allthingshadoop

********************************************/