Optimizing Mesos Utilization at Opentable JAY CHIN INFRASTRUCTURE ENGINEERING MesosCon Europe 2017

Page 1: Optimizing Mesos Utilization at Opentable (events17.linuxfoundation.org/sites/events/files/slides/mesoscon-europe-2017-Jay-Chin 2.pdf)

Optimizing Mesos Utilization at Opentable

JAY CHIN, INFRASTRUCTURE ENGINEERING

MesosCon Europe 2017

Page 2:

1.4 Billion Online Reservations

2.3 Million Diners per Month

58 Million verified reviews

Page 3:

Before 2013

Photo Credit: https://flic.kr/p/9F6Khk

Page 4:

Every 2 Months

Photo Credit: NASA

Page 5:
Page 6:

Opentable Codebase

Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API, Etc

Around 2013

Page 7:
Page 8:

Virtualization

Codebase: Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API, Etc

DATACENTRE (12 VMs, one per service): Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API

Page 9:

Let’s Scale!

Codebase: Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API, Etc

DATACENTRE (12 VMs, one per service): Search, Reviews, Emails, Reservations, Photo Service, Restaurant profiles, Availability Service, Menu API, White Label, External API, Person API, Feedback API

Additional VMs: Emails x 2, Search x 1, Restaurant profiles x 3, Menu API x 3

Page 10:

Infrastructure Team / SRE

Page 11:

Local Build: Code -> Write Puppet Code -> Local Vagrant Build -> Test and Version Control

Provision: Infrastructure Team pushes Puppet Code -> Provision VMs -> Wait for Provisioned host puppet run -> Provision More VMs in different Regions/Envs

Metrics: Code integration with Statsd/Graphite -> Write Puppet Code -> Infrastructure Team pushes Puppet code -> Build Grafana Dashboards

Monitoring: Identify Metrics or emit metrics -> Write Puppet Code -> Infrastructure Team pushes Puppet code -> Runbooks and escalation policies

Page 12:
Page 13:

Around 2014: Explore Mesos

DATACENTRE

Mesos Cluster (Hubspot Singularity): Search, Reviews, Emails, Reservations, Photo Service, Availability Service, Menu API, White Label, Restaurant profiles, External API, Person API, Feedback API

Page 14:

Local Build: Code -> Local Docker Testing -> Push to Docker Repo

Provision: Deploy Service to Mesos Cluster -> Deploy service to other Mesos Cluster

Metrics: Code integration with Statsd/Graphite -> Write Puppet Code -> Infrastructure Team pushes Puppet code -> Build Grafana Dashboards

Monitoring: Identify Metrics or emit metrics -> Write Puppet Code -> Infrastructure Team pushes Puppet code -> Runbooks and escalation policies

Page 15:

Metrics Pipeline

Mesos Task, Singularity API, Mesos API -> Publisher (Carbon Format) -> Kafka -> Consumer (Carbon Format) -> Carbon-c relay -> Graphite Cluster -> Grafana

https://github.com/opentable/mesos_stats

https://github.com/weaveworks/grafanalib
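
The "Carbon Format" named in the pipeline is Graphite's plaintext line format: a metric path, a value and a Unix timestamp separated by spaces. A minimal sketch of how per-task stats could be rendered into such lines before publishing follows. The function name, the `mesos.tasks` prefix and the stat names passed in are assumptions for illustration, not taken from mesos_stats.

```python
import time


def to_carbon_lines(prefix, task_id, stats, timestamp=None):
    """Render numeric stats as Carbon plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Graphite treats dots as path separators, so dots in the task id are replaced.
    safe_task = task_id.replace(".", "_")
    lines = []
    for name in sorted(stats):
        value = stats[name]
        # Only numeric values can be stored; skip strings, booleans and so on.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f"{prefix}.{safe_task}.{name} {value} {ts}")
    return lines
```

Each returned string is one message body; in a pipeline like the one on this slide it would be handed to the Kafka publisher rather than written to Graphite directly.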

Page 16:

Auto-generated Grafana Dashboard

Help text explaining the graphs and what they mean

Every Service running in Mesos will have an auto-generated dashboard

Shows cluster-wide Usage and Instance Usage
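
A rough sketch of what "auto-generated" can mean here: one function builds the same dashboard layout for every service name, with a help panel, a cluster-wide graph and a per-instance graph. The deck points to grafanalib for this; to stay dependency-free the sketch builds plain dictionaries instead and does not follow the full Grafana dashboard schema. The metric path layout and all field values are assumptions.

```python
def build_dashboard(service, prefix="mesos.tasks"):
    """Return a simplified, hypothetical dashboard description for one service."""

    def graph(title, target):
        # One graph panel backed by a single Graphite query.
        return {"type": "graph", "title": title, "targets": [{"target": target}]}

    per_instance = f"{prefix}.{service}.*.mem_rss_bytes"
    return {
        "title": f"{service} (auto-generated)",
        "panels": [
            {
                "type": "text",
                "title": "Help",
                "content": "Cluster-wide graphs sum all instances; "
                           "instance graphs show each task separately.",
            },
            # sumSeries is a Graphite function that adds matching series together.
            graph("Cluster-wide memory usage", f"sumSeries({per_instance})"),
            graph("Memory usage per instance", per_instance),
        ],
    }


def build_all(services):
    """Build one dashboard per service so no service is left without one."""
    return {name: build_dashboard(name) for name in services}
```

Calling `build_all` with the list of services known to the scheduler would give every service a dashboard without any per-team work.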

Page 17:

Right-sizing Resource Usage = $$$ Saved

Singularity Task

Mesos Cluster

Mesos stats and Metrics

Shows that memory is over-provisioned for this service
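
The right-sizing idea can be sketched as simple arithmetic: compare the memory a task was allocated with the peak it actually used, then suggest a smaller allocation with some headroom. The inputs mirror the `mem_limit_bytes` and `mem_rss_bytes` fields reported by Mesos agent statistics. The 25% headroom and 64 MB rounding step are illustrative assumptions, not values from the talk.

```python
import math

MIB = 1024 * 1024


def right_size_memory(mem_limit_bytes, peak_rss_bytes, headroom=0.25, step_mb=64):
    """Suggest a memory allocation: observed peak plus headroom, rounded up to step_mb."""
    if mem_limit_bytes <= 0:
        raise ValueError("mem_limit_bytes must be positive")
    limit_mb = mem_limit_bytes / MIB
    peak_mb = peak_rss_bytes / MIB
    suggested_mb = math.ceil(peak_mb * (1 + headroom) / step_mb) * step_mb
    return {
        "utilization": peak_mb / limit_mb,           # fraction of the allocation used
        "suggested_mb": suggested_mb,                # proposed new allocation
        "over_provisioned": suggested_mb < limit_mb  # True when memory can be reclaimed
    }
```

For a task allocated 2048 MB that peaks at 300 MB, this suggests 384 MB and flags the task as over-provisioned.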

Page 18:
Page 19:

Local Build: Code -> Local Docker Testing -> Push to Docker Repo

Provision: Deploy Service to Mesos Cluster -> Deploy service to other Mesos Cluster

Metrics: Only application specific metrics -> Create application specific dashboards (Optional)

Monitoring: Identify Metrics or emit metrics -> Write Puppet Code -> Infrastructure Team pushes Puppet code -> Runbooks and escalation policies

Page 20:

Sous: https://github.com/opentable/sous

Sous Service

Global Deployment Manifest

Container Repository

Mesos Cluster QA

Mesos Cluster Prod (London)

Mesos Cluster Prod (US-West2)

Code, Sous Build

Sous Deploy, Manifest Change
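
One way to picture a global deployment manifest is as a single declarative mapping from cluster to desired state, where a manifest change triggers deploys only for the clusters whose entry differs. The sketch below illustrates that idea only. The dictionary layout, cluster keys and field names are hypothetical and are not Sous's actual manifest format or API.

```python
def plan_deployments(current, desired):
    """Return the clusters whose desired spec differs from what is currently deployed."""
    plan = {}
    for cluster, spec in desired.items():
        # A cluster missing from `current` has never been deployed to.
        if current.get(cluster) != spec:
            plan[cluster] = spec
    return plan


# Hypothetical manifests keyed by cluster name.
current = {
    "qa": {"version": "1.2.0", "instances": 1},
    "prod-london": {"version": "1.1.0", "instances": 3},
}
desired = {
    "qa": {"version": "1.2.0", "instances": 1},
    "prod-london": {"version": "1.2.0", "instances": 3},
    "prod-us-west2": {"version": "1.2.0", "instances": 3},
}
```

With these inputs, `plan_deployments(current, desired)` selects `prod-london` and `prod-us-west2` and leaves `qa` untouched.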

Page 21:
Page 22:

Local Build: Code -> Local Docker Testing -> Sous Deploy

Provision: Updated Global Deployment Manifest

Metrics: Only application specific metrics -> Create application specific dashboards (Optional)

Monitoring: Identify Metrics and Thresholds -> Updated Global Deployment Manifest -> Runbooks and escalation policies

Page 23:

Logging

Restaurant_id == RID == ResID == Res_ID

Global RequestID
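
The slide's point is that the same field appears under several spellings across services. A minimal sketch of a shared logging model maps each spelling to one canonical key and makes sure every event carries a request ID. Choosing `restaurant_id` and `request_id` as the canonical names is an assumption for illustration.

```python
import uuid

# Spellings from the slide, lowercased, all mapped to one canonical field name.
ALIASES = {
    "restaurant_id": "restaurant_id",
    "rid": "restaurant_id",
    "resid": "restaurant_id",
    "res_id": "restaurant_id",
}


def normalize_event(event, request_id=None):
    """Return a copy of a log event with canonical field names and a request ID."""
    out = {}
    for key, value in event.items():
        # Unknown keys pass through unchanged; known aliases are renamed.
        out[ALIASES.get(key.lower(), key)] = value
    # Keep an existing request ID; otherwise use the given one or generate one.
    out.setdefault("request_id", request_id or uuid.uuid4().hex)
    return out
```

If one event carries two aliases for the same field, the later one wins in this sketch; a real pipeline would probably want to flag that case.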

Page 24:

https://github.com/opentable/request-timeline

Page 25:

Timeline Demo
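
The idea behind a request timeline can be sketched in a few lines: once every log event carries the same global request ID, events from different services can be grouped by that ID and ordered by time. This only illustrates the concept and is not how opentable/request-timeline is implemented. The `request_id` and `timestamp` field names are assumptions.

```python
from collections import defaultdict


def build_timelines(events):
    """Group log events by request_id and sort each group by timestamp."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["request_id"]].append(event)
    return {
        request_id: sorted(group, key=lambda e: e["timestamp"])
        for request_id, group in grouped.items()
    }
```

Each resulting list reads as the path one request took across services, in the order the events were logged.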

Page 26:

Key Takeaways

Map out developer workflow and constantly look for opportunities to standardise, automate and enhance.

Make metrics and monitoring part and parcel of the Mesos service.

Engineers don’t always make the best choice when deciding resource usage - help them make an informed choice.

Have a common deployment pipeline across the organisation that facilitates production readiness.

Having a global data model for logging allows us to make more sense of logging data across the various Mesos tasks.

Page 27:

Thank You

[email protected]

@jaychin

https://www.linkedin.com/in/jayschin