Mesos at OpenTable
Pablo Delgado, Senior Data Engineer, OpenTable (@pablete)
MesosCon 2015, Seattle, WA
OpenTable, the world's leading provider of online restaurant reservations:
Over 32,000 restaurants worldwide
More than 760 million diners seated since 1998, representing more than $30 billion spent at partner restaurants
Over 16 million diners seated every month
Over 190 million diners seated via a mobile device; almost 50% of our reservations are made on mobile
Currently present in the US, Canada, Mexico, the UK, Germany, and Japan
Nearly 600 partners, including Facebook, Google, TripAdvisor, Urbanspoon, Yahoo, and Zagat
At OpenTable we aim to power the best dining experiences!
Service-Oriented Architecture
From monolith to microservices
Apache Mesos
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Paper: http://mesos.berkeley.edu/mesos_tech_report.pdf
Omega: flexible, scalable schedulers for large compute clusters
Paper: http://research.google.com/pubs/pub41684.html
Apache Mesos
Mesos slaves connect to masters and offer resources such as CPU, disk, and memory. Masters pass those offers to frameworks like Singularity, which decide whether to use them. Frameworks that accept an offer run tasks on the corresponding slaves.
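To make the offer cycle concrete, here is a minimal sketch using the classic (pre-1.0) mesos.interface Python bindings; it registers a framework and simply declines every offer. The framework name and ZooKeeper address are placeholders, not values from the talk.

```python
# Minimal sketch of the Mesos offer cycle (classic pre-1.0 Python
# bindings). The scheduler registers and declines every offer; a
# real framework would inspect offers and launch tasks instead.
from mesos.interface import Scheduler, mesos_pb2
from mesos.native import MesosSchedulerDriver

class NoopScheduler(Scheduler):
    def registered(self, driver, framework_id, master_info):
        print("Registered with framework id %s" % framework_id.value)

    def resourceOffers(self, driver, offers):
        # Each offer bundles CPU, memory, and disk from one slave.
        for offer in offers:
            driver.declineOffer(offer.id)

framework = mesos_pb2.FrameworkInfo()
framework.user = ""              # let Mesos fill in the current user
framework.name = "noop-example"  # placeholder name

# The zk:// address is a placeholder for your ZooKeeper ensemble.
driver = MesosSchedulerDriver(NoopScheduler(), framework,
                              "zk://localhost:2181/mesos")
driver.run()
```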
[Diagram: one active Mesos master and two standby masters, each colocated with ZooKeeper managed by Netflix's Exhibitor, plus Docker-enabled Mesos slaves, spread across availability zones 2a, 2b, and 2c]
Apache Mesos
HubSpot's Singularity Scheduler
Singularity Features
Native Docker support
JSON REST API and Java client
Fully featured web application (replaces and improves the Mesos Master UI)
Deployments, automatic rollbacks, and healthchecks
Configurable email alerts to service owners
HubSpot's Singularity
Process types: Web Services, Workers, Scheduled (cron-type) Jobs, On-Demand Processes
Slave placement: GREEDY, SEPARATE_BY_DEPLOY, SEPARATE_BY_REQUEST, OPTIMISTIC
Executors: Mesos executor, Singularity executor, Docker executor
Linux Containers
Docker
Immutability
Portability
Isolation
Service Discovery
Services no longer live at a well-known address/port, so we needed a registry, a dynamic way to find them; it also had to be Mesos-agnostic.
Services announce their presence to the Discovery Server.
Services subscribe to changes in their dependencies' announcements.
Services un-announce on termination, or time out after a crash. (A sketch of this cycle follows.)
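The talk does not show the Discovery Server's API, so the endpoints, payload fields, and heartbeat interval in this Python sketch are assumptions made purely to illustrate the announce/subscribe/un-announce cycle described above.

```python
# Hypothetical announce/discover cycle against a discovery server.
# DISCOVERY, /announce, /instances, and the payload fields are all
# illustrative assumptions, not OpenTable's actual API.
import atexit
import time
import requests

DISCOVERY = "http://discovery.example.com:8080"

def announce(service, host, port):
    body = {"serviceType": service,
            "serviceUri": "http://%s:%d" % (host, port)}
    r = requests.post(DISCOVERY + "/announce", json=body)
    r.raise_for_status()
    lease_id = r.json()["leaseId"]
    # Un-announce on clean shutdown; a missed heartbeat covers crashes.
    atexit.register(requests.delete,
                    "%s/announce/%s" % (DISCOVERY, lease_id))
    return lease_id

def discover(service):
    # Look up live instances of a dependency by service name.
    r = requests.get(DISCOVERY + "/instances/" + service)
    r.raise_for_status()
    return [inst["serviceUri"] for inst in r.json()]

lease = announce("timezone", "10.0.0.5", 8080)
while True:
    requests.put("%s/announce/%s" % (DISCOVERY, lease))  # heartbeat
    time.sleep(10)
```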
[Diagram: a Discovery Server backed by ZooKeeper in each of availability zones 2a, 2b, and 2c; instances of service A announce themselves, while service B discovers them and subscribes to changes]
Service Discovery API
FrontDoor
Route external traffic to internal services
Simple Discovery-aware proxy
Dynamic configuration
Developer-friendly configuration via a Git repo, e.g.:
REQUEST_URI=/api/timezone* passthru timezone
Monitoring
https://github.com/opentable/mesos_stats
Finds your service name by parsing task names.
Includes a Grafana dashboard.
Runs inside Mesos itself.
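In the same spirit as mesos_stats, here is a sketch that polls a slave's stock /monitor/statistics.json endpoint, derives a service name from the executor id, and ships gauges to Graphite's plaintext port. The hostnames and the task-naming convention are assumptions, not taken from the repo.

```python
# Sketch in the spirit of mesos_stats: poll one slave's statistics
# endpoint, derive a service name from the executor id, and send
# gauges to Graphite's plaintext protocol on port 2003. Hostnames
# and the naming convention are illustrative assumptions.
import socket
import time
import requests

SLAVE = "http://mesos-slave1:5051"
CARBON = ("graphite.example.com", 2003)

def poll_once():
    stats = requests.get(SLAVE + "/monitor/statistics.json").json()
    now = int(time.time())
    lines = []
    for entry in stats:
        # Assume task ids start with the service name, e.g.
        # "timezone-<deploy>-<instance>" (illustrative convention).
        service = entry["executor_id"].split("-")[0]
        s = entry["statistics"]
        lines.append("mesos.%s.mem_rss %d %d"
                     % (service, s["mem_rss_bytes"], now))
        lines.append("mesos.%s.cpu_user %f %d"
                     % (service, s["cpus_user_time_secs"], now))
    sock = socket.create_connection(CARBON)
    sock.sendall(("\n".join(lines) + "\n").encode())
    sock.close()

while True:
    poll_once()
    time.sleep(60)
```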
All together
[Diagram: GitHub feeds Continuous Integration, which pushes images to the Docker Registry; Singularity deploys them onto the Mesos cluster (three master/ZooKeeper nodes, six Docker-enabled slaves); Discovery wires services together and FrontDoor routes external traffic in]
Overview
[Diagram: the developer-facing pieces highlighted: GitHub, Continuous Integration, Singularity, Docker Registry]
Developers' Concerns
Initialize projects with a Continuous Integration template
Enable monitoring/logging of application-level errors
Build the project as an immutable Docker image
Deploy to Mesos through Singularity using its REST API (see the sketch below)
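As a sketch of that last step, the following POSTs a request and an immutable Docker deploy to Singularity's JSON REST API. The host, ids, image name, and resource numbers are placeholders, and the field names should be checked against your Singularity version's documentation.

```python
# Hedged sketch: deploying through Singularity's REST API. The
# host, ids, image, and resources are placeholders.
import requests

SINGULARITY = "http://singularity.example.com:7099/singularity"

# 1. Ensure the long-lived request (service definition) exists.
requests.post(SINGULARITY + "/api/requests", json={
    "id": "timezone-service",
    "requestType": "SERVICE",
    "instances": 2,
}).raise_for_status()

# 2. Post an immutable deploy pointing at a Docker image.
requests.post(SINGULARITY + "/api/deploys", json={
    "deploy": {
        "requestId": "timezone-service",
        "id": "build-142",
        "containerInfo": {
            "type": "DOCKER",
            "docker": {"image": "registry.example.com/timezone:142"},
        },
        "resources": {"cpus": 1, "memoryMb": 512, "numPorts": 1},
    },
}).raise_for_status()
```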
[Diagram: the operations-facing pieces highlighted: the Mesos masters and slaves with ZooKeeper, Singularity, Discovery, FrontDoor, and the Docker Registry]
Operational Concerns
Provide Mesos with resources
Monitor and maintain external traffic routing
Monitor and replace failing resources
Stateless Mesos Cluster
Stateless simplicity: everything stateful lives outside the cluster.
Datastores: MySQL, PostgreSQL, MongoDB
Caches: Redis, Memcached
Other: ZooKeeper, Amazon S3
[Diagram: PROD clusters in the US data center (AWS us-west-2) and the EU data center (AWS eu-west-1), plus QA and data-processing clusters in AWS us-west-2]
[Diagram: the same layout with Kafka clusters in each region feeding the data-processing cluster]
Data Processing
Distributed Multitenant Data Processing
Spark's Approach
Generalize MapReduce to support new applications within the same engine
General DAGs and data sharing
Unification makes the engine more efficient and keeps things simple for the user
Handles batch, interactive, and online processing
APIs available for Java, Scala, Python, SQL, and R (see the sketch below)
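A small PySpark illustration of "general DAGs and data sharing": one cached dataset feeds both a batch aggregation and an ad-hoc interactive count in the same engine. The paths and field positions are placeholders.

```python
# One cached RDD shared by a batch job and an interactive query,
# all in one engine. Paths and field positions are placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="dag-example")
logs = sc.textFile("hdfs:///logs/2015/08/*")              # placeholder path
parsed = logs.map(lambda line: line.split("\t")).cache()  # shared dataset

# Batch: count records per restaurant (one shuffle).
per_restaurant = (parsed.map(lambda f: (f[1], 1))
                        .reduceByKey(lambda a, b: a + b))
per_restaurant.saveAsTextFile("hdfs:///out/per_restaurant")

# Interactive: reuse the same cached RDD for a quick ad-hoc count.
print(parsed.filter(lambda f: f[0] == "ERROR").count())
```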
Spark RDDs
Resilient Distributed Datasets (RDDs) are fault-tolerant distributed collections.
They exist in two forms (both shown in the sketch below):
Parallelized collections
External datasets: distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc.
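A minimal PySpark sketch of both forms; the file paths and bucket name are placeholders.

```python
# The two ways RDDs come into existence, per the slide above.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-sources")

# Parallelized collection: distribute a local list over 4 partitions.
nums = sc.parallelize([1, 2, 3, 4, 5], numSlices=4)
print(nums.sum())

# External datasets: local file system, HDFS, S3, etc.
local = sc.textFile("file:///tmp/sample.txt")     # placeholder path
s3 = sc.textFile("s3n://my-bucket/data/*.log")    # placeholder bucket
print(local.count() + s3.count())
```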
RDD Graph
Dataset-level view: a file RDD (HadoopRDD, path = hdfs://...) feeds an errors RDD (FilteredRDD, func = _.contains(...), shouldCache = true)
Partition-level view: one task per partition (Task 1, Task 2, ..., Task n)
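Written out as code, the lineage from this slide looks like the following PySpark sketch; the HDFS path is a placeholder.

```python
# The lineage behind the RDD graph: textFile builds a HadoopRDD,
# filter wraps it in a FilteredRDD, and cache() marks it to keep
# (shouldCache = true). The action then runs one task per partition.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-graph")
file = sc.textFile("hdfs://namenode/logs")        # file RDD (HadoopRDD)
errors = file.filter(lambda l: "ERROR" in l)      # errors RDD (FilteredRDD)
errors.cache()                                    # shouldCache = true
print(errors.count())   # action: Task 1, Task 2, ..., Task n
```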
Scheduling Process
Lifetime of a job: rdd1.join(rdd2).groupBy(...).filter(...)
RDD Objects: build the operator DAG
DAGScheduler: splits the graph into stages of tasks and submits each stage as it becomes ready; agnostic to operators
TaskScheduler: launches tasks (as TaskSets) via the cluster manager and retries failed or straggling tasks; doesn't know about stages
Worker: executes tasks in threads; its Block Manager stores and serves blocks
A failed stage is resubmitted to the DAGScheduler.
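Here is the slide's job as runnable PySpark (using groupByKey, the pair-RDD form of groupBy): each shuffle ends a stage, the DAGScheduler submits the stages, and the TaskScheduler launches one task per partition.

```python
# The job whose lifetime the slide traces. Each shuffle (join,
# groupByKey) is a stage boundary; the action triggers the DAG.
from pyspark import SparkContext

sc = SparkContext(appName="scheduling")
rdd1 = sc.parallelize([(1, "a"), (2, "b")])
rdd2 = sc.parallelize([(1, "x"), (2, "y")])

result = (rdd1.join(rdd2)        # shuffle -> stage boundary
              .groupByKey()      # shuffle -> stage boundary
              .filter(lambda kv: kv[0] % 2 == 0))
print(result.collect())          # action: stages run as they become ready
```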
Alternating Least Squares (ALS) in MLlib
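A minimal run of MLlib's RDD-based ALS (the 2015-era API): train on a few (user, item, rating) triples and predict one score. The data is toy data, not OpenTable's.

```python
# Minimal MLlib ALS example using the RDD-based API of the time.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="als-example")
ratings = sc.parallelize([
    Rating(0, 10, 4.0), Rating(0, 11, 1.0),
    Rating(1, 10, 5.0), Rating(1, 12, 2.0),
])
model = ALS.train(ratings, rank=8, iterations=10)
print(model.predict(1, 11))  # predicted rating for user 1, item 11
```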
Running Spark
[Diagram: the Driver Program's SparkContext talks to a Cluster Manager, which schedules Executors on Worker Nodes; each Executor runs Tasks and keeps a Cache]
[Diagram: the same picture on Mesos: the SparkContext talks to the Mesos Master, and a Mesos Executor on each Worker Node runs the Spark tasks and cache]
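Wiring the two pictures together: pointing a SparkContext at Mesos instead of a standalone master. The mesos:// ZooKeeper URL and the executor URI are placeholders; spark.executor.uri and spark.mesos.coarse are the standard Spark-on-Mesos settings of that era.

```python
# Running Spark on Mesos: the master URL names the ZooKeeper
# ensemble that tracks the Mesos master, and spark.executor.uri
# tells each Mesos executor where to fetch the Spark distribution.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("spark-on-mesos")
        .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
        .set("spark.executor.uri",
             "https://example.com/spark-1.4.1-bin-hadoop2.4.tgz")
        .set("spark.mesos.coarse", "false"))  # fine-grained mode

sc = SparkContext(conf=conf)
print(sc.parallelize(range(1000)).sum())
```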