Optimizing Mesos Utilization at Opentable · JAY CHIN...
1.4 Billion Online Reservations
MesosCon Europe 2017
2.3 Million Diners per Month
58 Million verified reviews
Search
Opentable Codebase
Reviews Emails
Reservations Photo Service
Restaurant profiles
Availability Service Menu API White Label
External API Person API Feedback API
Etc Etc Etc
Around 2013
Virtualization
Search
Codebase
Reviews Emails
Reservations Photo Service
Restaurant profiles
Availability Service Menu API White Label
External API Person API Feedback API
Etc Etc Etc
Search
DATACENTRE
Reviews Emails
Reservations Photo Service
Restaurant profiles
Availability Service Menu API White Label
External API Person API Feedback API
[Diagram: the services above run on VMs in the datacentre]
Let’s Scale!
Search
Codebase
Reviews Emails
Reservations Photo Service
Restaurant profiles
Availability Service Menu API White Label
External API Person API Feedback API
Etc Etc Etc
Search
DATACENTRE
Reviews Emails
Reservations
Photo Service
Restaurant profiles
Availability Service Menu API White Label
External API Person API Feedback API
[Diagram: the same VMs, plus additional VMs added for Emails, Search, Restaurant profiles and Menu API]
Write Puppet Code
Local Vagrant Build
Test and Version Control
Code
Provision VMs
Provision More VMs in different Regions/Envs
Wait for Provisioned host puppet run
Infrastructure Team pushes Puppet Code
Local Build
Provision
Metrics
Write Puppet Code
Infrastructure Team pushes Puppet code
Build Grafana Dashboards
Code integration with Statsd/Graphite
Monitoring
Runbooks and escalation policies
Write Puppet Code
Infrastructure Team pushes Puppet code
Identify Metrics or emit metrics
DATACENTRE
Mesos Cluster
Search
Reviews Emails Reservations
Photo Service
Availability Service
Menu API
White Label
Restaurant profiles External API
Person API Feedback API
Hubspot Singularity
Around 2014 Explore Mesos
Local Docker Testing
Push to Docker Repo
Code
Deploy service to other Mesos Cluster
Deploy Service to Mesos Cluster
Local Build
Provision
Metrics
Write Puppet Code
Infrastructure Team pushes Puppet code
Build Grafana Dashboards
Code integration with Statsd/Graphite
Monitoring
Runbooks and escalation policies
Write Puppet Code
Infrastructure Team pushes Puppet code
Identify Metrics or emit metrics
Mesos Task
Singularity API
Mesos API
Carbon Format
Publisher
Kafka
Carbon Format
Consumer
Carbon-c relay
Graphite Cluster
Grafana
https://github.com/opentable/mesos_stats
https://github.com/weaveworks/grafanalib
Metrics Pipeline
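As an illustration of the "Carbon Format" step in the pipeline above, here is a minimal sketch of turning one task's resource statistics into Carbon plaintext lines (`<metric.path> <value> <unix_timestamp>`). It is not the mesos_stats implementation: the `mesos.tasks` prefix, the function name and the shape of the sample record are assumptions made for this example, and the Kafka publishing step is left out.

```python
import time


def to_carbon_lines(task_stats, prefix="mesos.tasks", timestamp=None):
    """Render one task's statistics as Carbon plaintext lines.

    Each line has the form "<metric.path> <value> <unix_timestamp>".
    The prefix and the input layout are assumptions for this sketch.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Dots separate path components in Graphite, so replace them in the task id.
    task_id = task_stats["executor_id"].replace(".", "_")
    lines = []
    for name, value in sorted(task_stats["statistics"].items()):
        # Skip anything that is not a plain number.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f"{prefix}.{task_id}.{name} {value} {ts}")
    return lines


# Hypothetical sample record, loosely modelled on per-task resource statistics.
sample = {
    "executor_id": "search-service.1",
    "statistics": {
        "cpus_limit": 1.1,
        "mem_limit_bytes": 536870912,
        "mem_rss_bytes": 201326592,
    },
}

for line in to_carbon_lines(sample, timestamp=1500000000):
    print(line)
```

A consumer at the other end of the queue could forward these lines unchanged to a Carbon relay, since they are already in the format Graphite ingests.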
https://github.com/weaveworks/grafanalib
Auto-generated Grafana Dashboard
Help text explaining the graphs and what they mean
Every Service running in Mesos will have an auto-generated dashboard
Shows cluster-wide Usage and Instance Usage
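The idea of generating one dashboard per service can be sketched as follows. The talk points to grafanalib for this; to stay dependency-free, the sketch below builds a Grafana-style dashboard as a plain dictionary instead. The metric prefix, panel titles and help text are hypothetical, and the service name is assumed to contain no dots.

```python
import json


def build_dashboard(service):
    """Return a Grafana-style dashboard dict for one Mesos service.

    Plain dicts stand in for grafanalib objects here; the metric paths
    are assumptions for this sketch.
    """
    def panel(panel_id, title, description, target):
        return {
            "id": panel_id,
            "type": "graph",
            "title": title,
            "description": description,  # help text explaining the graph
            "datasource": "graphite",
            "targets": [{"refId": "A", "target": target}],
        }

    base = f"mesos.tasks.{service}"  # hypothetical metric prefix
    return {
        "title": f"{service} (auto-generated)",
        "tags": ["mesos", "auto-generated"],
        "panels": [
            panel(
                1,
                "Cluster-wide memory usage",
                "Resident memory summed across all instances of the service.",
                f"sumSeries({base}.*.mem_rss_bytes)",
            ),
            panel(
                2,
                "Memory usage per instance",
                "Resident memory of each instance, one line per task.",
                f"aliasByNode({base}.*.mem_rss_bytes, 3)",
            ),
        ],
    }


# One dashboard per service: loop over whatever services are running.
for name in ["search-service", "menu-api"]:
    dashboard = build_dashboard(name)
    print(json.dumps(dashboard)[:60])
```

Because the dashboard is derived only from the service name, adding a new service to the cluster requires no hand-built dashboard.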
Right-sizing Resource Usage = $$$ Saved
Singularity Task
Mesos Cluster
Mesos stats and Metrics
Shows that memory is over-provisioned for this service
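A right-sizing check of the kind described above can be sketched as a comparison between the memory a task was allocated and the peak it actually used. The 25% headroom and 64 MB rounding step below are arbitrary assumptions chosen for the example, not values from the talk.

```python
import math


def recommend_memory_mb(allocated_mb, peak_used_mb, headroom=0.25, step_mb=64):
    """Suggest a memory allocation from observed peak usage.

    headroom and step_mb are assumptions for this sketch: keep 25% spare
    above the observed peak and round up to a multiple of 64 MB.
    """
    target = peak_used_mb * (1 + headroom)
    recommended = math.ceil(target / step_mb) * step_mb
    return {
        "allocated_mb": allocated_mb,
        "recommended_mb": recommended,
        "over_provisioned_mb": max(0, allocated_mb - recommended),
    }


# Hypothetical service: 2048 MB allocated, 600 MB observed peak.
report = recommend_memory_mb(allocated_mb=2048, peak_used_mb=600)
print(report)
```

Multiplying the over-provisioned figure by the number of instances gives a rough view of how much cluster memory could be reclaimed.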
Local Docker Testing
Push to Docker Repo
Code
Deploy service to other Mesos Cluster
Deploy Service to Mesos Cluster
Local Build
Provision
Metrics
Monitoring
Runbooks and escalation policies
Write Puppet Code
Infrastructure Team pushes Puppet code
Identify Metrics or emit metrics
Only application specific metrics
Create application specific dashboards
Optional
Sous
https://github.com/opentable/sous
Sous Service
Global Deployment Manifest
Mesos Cluster QA
Container Repository
Mesos Cluster Prod (London)
Mesos Cluster Prod (US-West2)
Code
Sous Build
Sous Deploy
Manifest Change
Local Docker Testing
Sous Deploy
Code
Updated Global Deployment Manifest
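The role of a global deployment manifest can be illustrated with a small sketch: the manifest records the desired state of each service in each cluster, and a deploy is needed wherever the running state differs. The dictionary layout, field names and cluster names below are hypothetical and do not reproduce Sous's actual manifest schema.

```python
def plan_deploys(manifest, running):
    """Return (cluster, service, version) for every place the desired
    state in the manifest differs from what is currently running."""
    actions = []
    for service, clusters in sorted(manifest.items()):
        for cluster, desired in sorted(clusters.items()):
            if running.get(service, {}).get(cluster) != desired:
                actions.append((cluster, service, desired["version"]))
    return actions


# Hypothetical desired state: one service across three clusters.
manifest = {
    "search": {
        "qa": {"version": "1.4.0", "instances": 2},
        "prod-london": {"version": "1.3.2", "instances": 6},
        "prod-us-west2": {"version": "1.3.2", "instances": 6},
    }
}

# Hypothetical running state: QA is still on the previous version.
running = {
    "search": {
        "qa": {"version": "1.3.2", "instances": 2},
        "prod-london": {"version": "1.3.2", "instances": 6},
        "prod-us-west2": {"version": "1.3.2", "instances": 6},
    }
}

print(plan_deploys(manifest, running))
```

With this shape, a manifest change is the only input a deploy needs, which is what lets the same pipeline serve QA and both production clusters.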
Local Build
Provision
Metrics
Monitoring
Runbooks and escalation policies
Identify Metrics and Thresholds
Updated Global Deployment Manifest
Only application specific metrics
Create application specific dashboards
Optional
Key Takeaways
Map out developer workflow and constantly look for opportunities to standardise, automate and enhance.
Make metrics and monitoring part and parcel of the Mesos service.
Engineers don’t always make the best choice when deciding resource usage - help them make an informed choice.
Have a common deployment pipeline across the organisation that facilitates production readiness.
Having a global data model for logging allows us to make more sense of logging data across the various Mesos tasks.
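The last takeaway, a global data model for logging, can be sketched as a shared set of fields that every task emits with each log line. The field names below are hypothetical and are not taken from the talk; the point is only that every service uses the same ones.

```python
import json
from datetime import datetime, timezone


def log_record(service, task_id, level, message, **extra):
    """Build one JSON log line with a common set of fields.

    The common fields are assumptions for this sketch; service-specific
    data goes in **extra and may not overwrite a common field.
    """
    record = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "task_id": task_id,
        "level": level.upper(),
        "message": message,
    }
    clash = set(extra) & set(record)
    if clash:
        raise ValueError(f"extra fields clash with common fields: {sorted(clash)}")
    record.update(extra)
    return json.dumps(record, sort_keys=True)


print(log_record("search", "search.1", "info", "service started", port=8080))
```

With shared field names, logs from different Mesos tasks can be filtered and joined on `service` or `task_id` without per-service parsing rules.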