© 2014 MapR Technologies

Project Myriad: Mesos and Yarn together

Ted Dunning

February 20, 2015

Contact Information

Ted Dunning

Chief Applications Architect, MapR Technologies

Committer & PMC for Apache’s Drill, Zookeeper & Mahout

Mentor for Myriad & Apache’s Storm, Flink, Datafu, Optiq, Drill

Email [email protected] [email protected]

Twitter @ted_dunning @MapR @ApacheMyriad

Hashtag today: #StrataHadoop

Myriad Project

• Very new open source / open community project

• Started as collaboration between Mesosphere, MapR & eBay

• Proposal to be an incubator project of the Apache Foundation submitted 12 February 2015

• Goal: global resource management for multiple data centers

Agenda

• The need
• Recap
• How it works
• Use Cases
• Lessons Learned
• The Future

What We Need

• Tight integration of resources and programming models
• User specified resources and allocation models
• Lightweight executive
• Strong isolation
• Fast task launch

What We Need

• Very fast scheduling
• Very careful (slow) scheduling
• Long-lived system tasks
• Short-lived tasks
• Long-lived ephemeral tasks
• Pre-emption

What We Need

• Very good support of entire Hadoop eco-system
  – Tight integration of MapReduce2
  – Tez
  – Impala
  – Drill
  – Spark

• Very good support of everything else
  – Arbitrary containers
  – Web servers
  – Systems processes without containers
  – User defined containers
  – Licensing constraints

This is a problem

And an opportunity

What We Have - Yarn

• Resource Manager, NodeManager, heartbeat
  – Direct lineage from JobTracker, TaskTracker

• Application Master, Task containers
  – The other half of the JobTracker and TaskTracker

• Monolithic scheduling
• Pre-emption
• Hadoop standard
• Pre-defined resources
• Good Hadoop eco support
  – MapReduce2, Tez, Impala, Drill, Spark

What We Have - Mesos

• Two level scheduling
  – Bottom level is application specific
  – Frameworks to ease complexity
  – Offers, Returns

• Actor-based, bidirectional RPC
  – Super fast process launch

• Marathon, Chronos
  – ISO8601, jboss, jetty, sinatra, rails

• User defined resources, attributes
• Some Hadoop (Spark native!)
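
The two-level "Offers, Returns" idea can be sketched in a few lines. This is a toy model only, not the Mesos API: the names `Offer`, `Framework`, `on_offer` and `offer_round` are hypothetical, and it assumes for simplicity that accepting an offer consumes the whole node.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Resources on one node, as offered by the top-level scheduler."""
    node: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """Bottom level: an application-specific scheduler (hypothetical)."""
    name: str
    cpus_per_task: float
    mem_per_task_mb: int
    tasks_wanted: int
    launched: list = field(default_factory=list)

    def on_offer(self, offer):
        """Return True to accept the offer, False to return it unused."""
        fits = (offer.cpus >= self.cpus_per_task
                and offer.mem_mb >= self.mem_per_task_mb)
        if self.tasks_wanted > 0 and fits:
            self.tasks_wanted -= 1
            self.launched.append(offer.node)
            return True
        return False


def offer_round(free_nodes, frameworks):
    """Top level: offer each free node to the frameworks in order.

    Assumption: an accepted offer uses the whole node. Returns the list
    of nodes that every framework declined.
    """
    still_free = []
    for node, (cpus, mem_mb) in free_nodes.items():
        offer = Offer(node, cpus, mem_mb)
        if not any(fw.on_offer(offer) for fw in frameworks):
            still_free.append(node)
    return still_free


if __name__ == "__main__":
    nodes = {"n1": (4.0, 8192), "n2": (1.0, 1024), "n3": (4.0, 8192)}
    web = Framework("web", cpus_per_task=1.0, mem_per_task_mb=512, tasks_wanted=1)
    hadoop = Framework("hadoop", cpus_per_task=2.0, mem_per_task_mb=4096, tasks_wanted=5)
    print("declined by all:", offer_round(nodes, [web, hadoop]))
    print("web on:", web.launched, "hadoop on:", hadoop.launched)
```

The point of the sketch is the division of labor: the top level only decides who sees which resources, while each framework decides for itself whether an offer is worth taking.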

Sound the same

Very much not

Myriad integrates Mesos and Yarn

How It Works

• Mesos creates virtual clusters

• YARN uses resources provided by Mesos

• Myriad can ask YARN to release some resources

• Or give it more

[Diagram: Mesos, two YARN clusters, web servers]
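
The bullets on this slide (YARN uses resources provided by Mesos, and Myriad can ask it to release some or give it more) can be illustrated with a small bookkeeping sketch. Everything here is an assumption made for illustration: `MesosPool`, `MyriadSketch`, `flex_up` and `flex_down` are hypothetical names, nodes are counted as whole units, and no real Mesos, YARN or Myriad call is involved.

```python
class MesosPool:
    """Toy stand-in for Mesos: tracks how many nodes each tenant holds."""

    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.allocated = {}  # tenant name -> node count

    def free_nodes(self):
        return self.total_nodes - sum(self.allocated.values())

    def grant(self, tenant, nodes):
        """Give up to `nodes` free nodes to a tenant; return how many were granted."""
        granted = min(nodes, self.free_nodes())
        self.allocated[tenant] = self.allocated.get(tenant, 0) + granted
        return granted

    def release(self, tenant, nodes):
        """Take up to `nodes` nodes back from a tenant; return how many came back."""
        released = min(nodes, self.allocated.get(tenant, 0))
        self.allocated[tenant] = self.allocated.get(tenant, 0) - released
        return released


class MyriadSketch:
    """Hypothetical broker that grows or shrinks one YARN cluster on the pool."""

    def __init__(self, pool, yarn_name="yarn"):
        self.pool = pool
        self.yarn_name = yarn_name

    def flex_up(self, nodes):
        """Give the YARN cluster more nodes, limited by what the pool has free."""
        return self.pool.grant(self.yarn_name, nodes)

    def flex_down(self, nodes):
        """Ask the YARN cluster to hand nodes back to the pool."""
        return self.pool.release(self.yarn_name, nodes)


if __name__ == "__main__":
    pool = MesosPool(total_nodes=10)
    myriad = MyriadSketch(pool)
    myriad.flex_up(6)          # YARN grows to 6 nodes
    pool.grant("web", 6)       # web servers only get the 4 that are left
    myriad.flex_down(3)        # YARN releases 3 nodes
    pool.grant("web", 3)       # web servers can now take them
    print(pool.allocated)
```

Running several `MyriadSketch` instances with different `yarn_name` values against one pool would model the "virtual clusters" bullet in the same way.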

How Myriad Works

• Mesos runs Yarn
  – Yarn runs Yarn programs
  – Multiple Yarns supported
  – Multiple Yarn versions easy

• Mesos runs program + Yarn fakeout
  – Gets resources back from Yarn quickly
  – High priority “Yarn” program
  – As Yarn executes “tasks”, resources given back to Mesos
  – Allows fast spinup/spindown of Yarn resources

How Myriad Works

[Diagram: Mesos, Persistence Layer]

Let’s see some examples

#1 – I wanna cluster

I Want a Cluster

• Very common need
  – Ephemeral clusters for multi-tenancy
  – Quick dev or QA clusters
  – Compatibility testing

• Yarn doesn’t run Yarn well
  – Especially across incompatible versions
  – Encapsulation can’t be unrolled

• Myriad does this trivially, but
  – Must have data localization, universal name space

#2 – Version upgrade

YARN Version Upgrade

• Another very common need
  – Need to test first
  – Applications roll over to new cluster
  – Resources follow applications
  – Data layer must remain inter-operable

• Yarn doesn’t run Yarn well (again)
  – Especially across incompatible versions
  – Encapsulation can’t be unrolled

• Myriad does this trivially, but
  – Must have data localization, universal name space

#3 – Resource slosh

Resource Slosh

• Resource slosh
  – Data ingestion pulse requires many web-servers
  – After ingestion, analytics pulse requires many Hadoop nodes
  – Data layer must remain inter-operable

• Conflict between Sysop/Hadoop viewpoints

• Myriad does this trivially, but
  – Must have data localization, universal name space
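
As a rough illustration of resource slosh, the sketch below splits a fixed pool of nodes between web servers and Hadoop as demand shifts from an ingestion pulse to an analytics pulse. The proportional-split policy, the function names `slosh` and `plan`, and all of the numbers are assumptions chosen for the example; the slides do not specify how the split is decided.

```python
def slosh(total_nodes, web_demand, hadoop_demand):
    """Split a fixed pool between web servers and Hadoop.

    Assumed policy: if total demand fits, both sides get what they ask
    for; otherwise the pool is divided in proportion to demand.
    Returns (web_nodes, hadoop_nodes).
    """
    demand = web_demand + hadoop_demand
    if demand <= total_nodes:
        return web_demand, hadoop_demand
    web = round(total_nodes * web_demand / demand)
    return web, total_nodes - web


def plan(total_nodes, phases):
    """Compute the split for each named phase of the workload."""
    return {name: slosh(total_nodes, web, hadoop)
            for name, web, hadoop in phases}


if __name__ == "__main__":
    # Hypothetical demand figures: (phase, web-server nodes, Hadoop nodes).
    phases = [("ingest", 80, 20), ("analytics", 10, 120), ("idle", 5, 5)]
    for name, (web, hadoop) in plan(100, phases).items():
        print(f"{name}: web={web} hadoop={hadoop}")
```

The same 100 nodes serve both pulses, which is only workable if, as the slide notes, the data layer stays reachable from whichever side currently holds the nodes.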

Some Lessons Learned

• Omega paper
  – Not news
  – Single scheduler framework not viable

• Multi-cultural software is actually pretty cool
  – But you have to value both cultures

• One incubator project (Slider) doesn’t change that

The Future

• Incubator
  – Proposal at http://wiki.apache.org/incubator/MyriadProposal
  – Initial team from Mesosphere, eBay, MapR

• Community building
  – Diversity is good already
  – Starting with very lean team

• Older whisky, faster horses, more features
  – Apologies to the cowboy and the poet
  – And Tom T Hall

World domination

Peaceful coexistence via specialization

Myriad Project

• Blog “Project Myriad: No Hadoop is an Island” http://bit.ly/myriad-mapr-blog

• Proposal to be an incubator project of the Apache Foundation submitted 12 February 2015 http://bit.ly/myriad-asf-proposal

• Initial code on github: http://bit.ly/github-myriad

• Join us! Twitter for Myriad community @ApacheMyriad

[no, it’s not an official project logo]
