Apache Mesos at Twitter (Texas LinuxFest 2014)

52
@andypiper Chris Aniszczyk Head of Open Source @cra Apache Mesos at Twitter #TXLF 2014

description

http://2014.texaslinuxfest.org/content/apache-mesos-twitter

Transcript of Apache Mesos at Twitter (Texas LinuxFest 2014)

Page 1: Apache Mesos at Twitter (Texas LinuxFest 2014)

@andypiper

Chris Aniszczyk Head of Open Source @cra

Apache Mesos at Twitter #TXLF 2014

Page 2: Apache Mesos at Twitter (Texas LinuxFest 2014)

Hi, I’m @cra & run the @TwitterOSS office!

2

Page 3: Apache Mesos at Twitter (Texas LinuxFest 2014)

Twitter is Built on Open Source…

3

Page 4: Apache Mesos at Twitter (Texas LinuxFest 2014)

Agenda!

• Introduction • How does Mesos work?• Mesos Ecosystem• Conclusion• Q&A

Page 5: Apache Mesos at Twitter (Texas LinuxFest 2014)

Twitter Scale…

5

255M+

500M+

77%

Active users

Tweets per day

of users are outside the US

2006 2014

100TB+compressed data per day

Page 6: Apache Mesos at Twitter (Texas LinuxFest 2014)

6

Growth challenges… sad times… remember the fail whale?

Page 7: Apache Mesos at Twitter (Texas LinuxFest 2014)

7

Ups and Downs… remember World Cup 2010?

http://gigaom.com/2010/06/11/is-the-world-cup-bringing-down-twitter/

Page 8: Apache Mesos at Twitter (Texas LinuxFest 2014)

Easy solution!? Lets add machines… but…

!

• Can get expensive… even with commodity hardware…

• Hard to fully utilize machines (e.g., 72 GB RAM and 24 CPUs)

• Hard to deal with failures…• What else could we do…?

Page 9: Apache Mesos at Twitter (Texas LinuxFest 2014)

Evaluate industry…!

• Google was ahead of the game of managing warehouse scale computing: http://research.google.com/pubs/pub35290.html!

• Google hit a lot of these problems before many other companies and came up with interesting solutions: http://youtube.com/watch?v=0ZFMlO98Jkc

Page 10: Apache Mesos at Twitter (Texas LinuxFest 2014)

Evaluate research at universities…!

• Universities (wooooo PhDs) were doing research in this area, we decided to partner and hire researchers: https://amplab.cs.berkeley.edu/tag/mesos/!

• “Return of the Borg: How Twitter Rebuilt Google’s Secret Weapon: http://www.wired.com/2013/03/google-borg-twitter-mesos

Page 11: Apache Mesos at Twitter (Texas LinuxFest 2014)

Enter Apache Mesos!

• We took university research and spun into an open source project at the Apache Foundation: https://blog.twitter.com/2012/incubating-apache-mesos

• https://twitter.com/ApacheMesos/statuses/360039441500340224

Page 12: Apache Mesos at Twitter (Texas LinuxFest 2014)

What is exactly is Mesos?

• Mesos is an open source project with a healthy independent community: http://mesos.apache.org

• Mesos is a distributed system to build and run distributed systems

• Mesos provides fine-grained resource sharing and isolation

• Mesos enables high-availability and fault-tolerance for your cluster

Page 13: Apache Mesos at Twitter (Texas LinuxFest 2014)

This is your typical data center

1 2 3

4 5 6

7 8 9

Page 14: Apache Mesos at Twitter (Texas LinuxFest 2014)

This is your typical data center with static partitioned apps

1 2 3

4 5 6

7 8 9

Page 15: Apache Mesos at Twitter (Texas LinuxFest 2014)

Not sharing wastes resources

0%

11%

22%

33%

0%

11%

22%

33%

0%

11%

22%

33%

Page 16: Apache Mesos at Twitter (Texas LinuxFest 2014)

Resource sharing increases throughput and utilization

0%

11%

22%

33%

0%

11%

22%

33%

0%

11%

22%

33%

0%

33.333%

66.667%

100%

Page 17: Apache Mesos at Twitter (Texas LinuxFest 2014)

Running at the container level improves performance…

Tim

e to

pro

visi

on (s

econ

ds)

1

100

10000

Bare metal VM Container

Inspired by Tomas Barton’s Mesos talk at InstallFest in Prague

Page 18: Apache Mesos at Twitter (Texas LinuxFest 2014)

Agenda!

• Introduction• How does Mesos work? • Mesos Ecosystem• Conclusion• Q&A

Page 19: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos Slave

Hadoop task-tracker Mesos Executor

Task #1 Task #2 ./ruby XYZ

Mesos Slave

Docker Executor Docker Executor

java -jar XYZ.jar ./xyz

Mesos Master Mesos Master Mesos Master

Hadoop scheduler

Marathon scheduler

Zookeeperquorum

*Thank you to Niklas Nielsen and Adam Borlen for the following diagrams explaining Mesos https://www.youtube.com/watch?v=EI0ROkf0vks

Mesos consists of master/slave nodes

Page 20: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos Slave

Hadoop task-tracker Mesos Executor

Task #1 Task #2 ./ruby XYZ

Mesos Slave

Docker Executor Docker Executor

java -jar XYZ.jar ./xyz

Mesos Master Mesos Master Mesos Master

Hadoop scheduler

Marathon scheduler

Zookeeperquorum

applications are known as frameworks in Mesos, they interact with master

Page 21: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos Slave

Hadoop task-tracker Mesos Executor

Task #1 Task #2 ./ruby XYZ

Mesos Slave

Docker Executor Docker Executor

java -jar XYZ.jar ./xyz

Mesos Master Mesos Master Mesos Master

Hadoop scheduler

Marathon scheduler

Zookeeperquorum

Multiple masters can be in place for HA; coordinate leader election with ZK

Page 22: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos Slave

Hadoop task-tracker Mesos Executor

Task #1 Task #2 ./ruby XYZ

Mesos Slave

Docker Executor Docker Executor

java -jar XYZ.jar ./xyz

Mesos Master Mesos Master Mesos Master

Hadoop scheduler

Marathon scheduler

Zookeeperquorum

Master schedules tasks to run on slaves’ available resources; slaves use executors to coordinate execution of tasks

Tasks are the unit of execution

Page 23: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos provides fine-grained resource isolation (via cgroups)

Compute Node

Mesos Slave Process

Hadoop task-tracker Mesos Executor

Task #1 Task #2 ruby XYZ

Container(Cgroups)

Executor

Slaves isolate executors and tasks via containers (dotted line)

Page 24: Apache Mesos at Twitter (Texas LinuxFest 2014)

Compute Node

Mesos Slave Process

Hadoop task-tracker

Task #1 Task #2

Container(Cgroups)

Task #3

Mesos provides fine-grained resource isolation (via cgroups)

Containers can GROW AND SRHINK as tasks run and complete

Page 25: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos provides componentized resource isolation

Mesos Slave Process

Mesos Containerizer

CGroups CPU isolator

CGroups Memory isolator

Launcher

Container foo

Task baz

Containerizer API

Executor bar

When a slave starts, you can specify a “containerizer” to launch the container and set of isolators to enforce resource constraints (CPU/memory)

Mesos can track and allocate more resource types, allowing you to manage resources like ip-addresses, ports, disk space and even GPUs!

Page 26: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos provides pluggable resource isolation (e.g., Docker)

External Containerizer

External Containerizer API

Mesos Slave Process

External Containerizer Program

Container foo

MySQL

Containerizer API

Ubuntu 13.10

Container bar

Ruby

Centos 6.4

github.com/mesosphere/deimos

Page 27: Apache Mesos at Twitter (Texas LinuxFest 2014)

Everything fails all the timeWerner Vogels (Amazon CTO)

Page 28: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos has no single point of failure (master keeps monitoring tasks and waits for a node to reconnect, master will update the framework with any tasks that were completed while it was gone)

Tasks keep running!

Framework

Masters

Page 29: Apache Mesos at Twitter (Texas LinuxFest 2014)

Master node can fail-over (ZK quorum will elect a new leader)

Tasks keep running!

Framework

Masters

Page 30: Apache Mesos at Twitter (Texas LinuxFest 2014)

Slave processes can fail over (loads check pointed state to learn what pods to reconnect for reach task and re-registeres with the master)

Tasks keep running!

Compute Node

Mesos Slave Process

Mesos Executor Mesos Executor

Page 31: Apache Mesos at Twitter (Texas LinuxFest 2014)

The Mesos ecosystem is growing, frameworks everywhere)

http://mesos.apache.org/documentation/latest/mesos-frameworks/

Page 32: Apache Mesos at Twitter (Texas LinuxFest 2014)

Chronos: Distributed cron with dependencies https://github.com/airbnb/chronos

Page 33: Apache Mesos at Twitter (Texas LinuxFest 2014)

Marathon: init.d for your data center https://github.com/mesosphere/marathon

Page 34: Apache Mesos at Twitter (Texas LinuxFest 2014)

Aurora: Advanced scheduler used by Twitter in production http://aurora.incubator.apache.org

Page 35: Apache Mesos at Twitter (Texas LinuxFest 2014)

You can also build your own framework…

Page 36: Apache Mesos at Twitter (Texas LinuxFest 2014)

Agenda!

• Introduction• How does Mesos work?• Mesos Ecosystem • Conclusion• Q&A

Page 37: Apache Mesos at Twitter (Texas LinuxFest 2014)

#PoweredByMesos (public)

http://mesos.apache.org/documentation/latest/powered-by-mesos/

Page 38: Apache Mesos at Twitter (Texas LinuxFest 2014)
Page 39: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos allow services to scale

Engineers think about resources, not machines

Page 40: Apache Mesos at Twitter (Texas LinuxFest 2014)

Storage

MySQL

Tweet store

Flock

User Store

Cache

Memcached

Redis

Logic

Tweet Service

User Service

Timeline Service

SocialGraph Service

DM Service

Presentation

API

Web

Search

Feature X

Feature Y

Presentation

TFE(netty)

Reverse Proxy

HTTP Thrift Thrift

Aurora

Mesos

Monorail

Page 41: Apache Mesos at Twitter (Texas LinuxFest 2014)
Page 42: Apache Mesos at Twitter (Texas LinuxFest 2014)

Mesos enables multi-tenant clusters

Small teams can move fast

AWS-based infrastructure beyond just Hadoop

Page 43: Apache Mesos at Twitter (Texas LinuxFest 2014)

Marathon

Mesos

Chronos

Batch/Streaming

Hadoop

Spark

Kafka

Query/Analysis

Cascading

Presto

Hive

Shark

Pig

Services

Rails

Redis

Cassandra

KairosDB

RDS

Hadoop A Hadoop B

Page 44: Apache Mesos at Twitter (Texas LinuxFest 2014)

Agenda!

• Introduction• How does Mesos work?• Mesos Ecosystem• Conclusion • Q&A

Page 45: Apache Mesos at Twitter (Texas LinuxFest 2014)

Conclusion

• Mesos is a distributed system to build and run distributed systems (think datacenter OS)

• Mesos enables resource sharing, high-availability and fault-tolerance for your data centers

• Mesos is an open source project with a healthy independent community: http://mesos.apache.org

• So please check it out, use it or contribute back if you can to make it better!

Page 46: Apache Mesos at Twitter (Texas LinuxFest 2014)

https://elastic.mesosphere.io

Page 47: Apache Mesos at Twitter (Texas LinuxFest 2014)

http://mesos.apache.org

Open Source Support from the Mesos Community

Page 48: Apache Mesos at Twitter (Texas LinuxFest 2014)

http://mesos.apache.org/community/user-groups/

Learn more via Mesos User Groups

Page 49: Apache Mesos at Twitter (Texas LinuxFest 2014)

http://mesosphere.io/learn

Commercial Support from Mesosphere

Page 50: Apache Mesos at Twitter (Texas LinuxFest 2014)

http://mesoscon.org

First #MesosCon to coincide with LinuxCon 2014!

Page 51: Apache Mesos at Twitter (Texas LinuxFest 2014)

Thank you for listening!Chris Aniszczyk (@cra) [email protected]://opensource.twitter.com!

http://mesos.apache.orgemail: {user,dev}@mesos.apache.org

51Also thanks to Niklas Nielsen and Adam Borlen for their slides explaining Mesos from ApacheCon 2014 https://www.youtube.com/watch?v=EI0ROkf0vks

Page 52: Apache Mesos at Twitter (Texas LinuxFest 2014)

Resources!

http://mesos.apache.org http://mesosphere.io/learn/

http://wired.com/wiredenterprise/2013/03/google-borg-twitter-mesos http://mesosphere.io/2013/09/26/docker-on-mesos/

http://typesafe.com/blog/play-framework-grid-deployment-with-mesos http://research.google.com/pubs/pub35290.html

http://nerds.airbnb.com/hadoop-on-mesos/ https://blog.twitter.com/2013/mesos-graduates-from-apache-incubation

http://www.ebaytechblog.com/2014/04/04/delivering-ebays-ci-solution-with-apache-mesos-part-i/ https://www.youtube.com/watch?v=EI0ROkf0vks

!