Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Page 1
Apache Hadoop YARN - 2015
June 9, 2015
Past, Present & Future
Page 2
We are
Vinod Kumar Vavilapalli
• Long time Hadooper since 2007
• Apache Hadoop Committer / PMC
• Apache Member
• Yahoo! -> Hortonworks
• MapReduce -> YARN from day one
Jian He
• Hadoop contributor since 2012
• Apache Hadoop Committer / PMC
• Hortonworks
• All things YARN
Page 3
Overview: The Why and the What
Page 4
Data architectures
• Traditional architectures
– Specialized silos
– Per-silo security, management, governance etc.
– Limited scalability
– Limited cost efficiencies
• For the present and the future
– Hadoop repository
– Commodity storage
– Centralized but distributed system
– Scalable
– Uniform org-policy enforcement
– Innovation across silos!
[Diagram: cluster resources with data centralized in HDFS]
Page 5
Resource Management
• Extracting value out of centralized data architecture
• A messy problem
– Multiple apps, frameworks, their life-cycles and evolution
• Tenancy
– “I am running this system for one user”
– It almost never stops there
– Groups, teams, users
• Sharing / isolation needed (see the queue sketch below)
• Ad hoc structures get unusable real fast
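To make the sharing/isolation point concrete, here is a minimal sketch of a multi-tenant queue layout for the CapacityScheduler, expressed as a Java Configuration for brevity. The queue names and percentages are made up; in a real cluster these keys live in capacity-scheduler.xml.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Minimal sketch of multi-tenant sharing via CapacityScheduler queues.
 * Queue names (engineering, analytics) are hypothetical.
 */
public class QueueSetupSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Two top-level queues under root, one per team.
    conf.set("yarn.scheduler.capacity.root.queues", "engineering,analytics");
    // Guaranteed shares of cluster capacity, in percent.
    conf.set("yarn.scheduler.capacity.root.engineering.capacity", "60");
    conf.set("yarn.scheduler.capacity.root.analytics.capacity", "40");
    // Allow queues to borrow idle capacity up to a cap (elasticity).
    conf.set("yarn.scheduler.capacity.root.engineering.maximum-capacity", "80");
    conf.set("yarn.scheduler.capacity.root.analytics.maximum-capacity", "60");
    System.out.println("Queues: " + conf.get("yarn.scheduler.capacity.root.queues"));
  }
}
```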
Page 6
Varied goals & expectations
• On isolation, capacity allocations, scheduling
• Different tenants want different things: “Faster!”, “More!”, “Best for my cluster”, throughput, utilization, elasticity, service uptime, security, ROI, SLAs ... and usually “Everything! Right now!”
Page 7
Enter Hadoop YARN
HDFS (Scalable, Reliable Storage)
YARN (Cluster Resource Management)
Applications (Running Natively in Hadoop)
• Store all your data in one place … (HDFS)
• Interact with that data in multiple ways … (YARN Platform + Apps): Data centric
• Scale as you go, shared, multi-tenant, secure … (The Hadoop Stack)
[Diagram: queues, admins/users and pipelines sharing cluster resources]
Page 8
Hadoop YARN
• Distributed system
• Host of frameworks, meta-frameworks, applications
• Varied workloads
– Batch
– Interactive
– Stream processing
– NoSQL databases
– ….
• Large scale
– Linear scalability
– Tens of thousands of nodes
– More coming
Page 9
Past: A quick history
Page 10
A brief Timeline
• Sub-project of Apache Hadoop
• Releases tied to Hadoop releases
• Alphas and betas
– In production at several large sites for MapReduce already by that time
• 1st line of code: June-July 2010
• Open sourced: August 2011
• First 2.0 alpha: May 2012
• First 2.0 beta: August 2013
Page 11
GA Releases
• 2.2 (15 October 2013)
– 1st GA
– MR binary compatibility
– YARN API cleanup
– Testing!
• 2.3 (24 February 2014)
– 1st post-GA release
– Bug fixes
– Alpha features
• 2.4 (07 April 2014)
– RM fail-over
– CS preemption
– Timeline Service V1
• 2.5 (11 August 2014)
– Writable REST APIs
– Timeline Service V1 security
Page 12
Present
Page 13
Last few Hadoop releases
• Hadoop 2.6
– 18 November 2014
– Rolling upgrades
– Services
– Node labels
• Hadoop 2.7
– 21 April 2015
– Moving to JDK 7+
• Focus on some features next!
Page 14
Rolling Upgrades
Page 15
YARN Rolling Upgrades
• Why? No more losing work during upgrades!
• Workflow
– Servers first: Masters followed by per-node agents
• Upgrade of Applications/Frameworks is decoupled!
• Work preserving RM restart: RM recovers state from NMs and apps
• Work preserving NM restart: NM recovers state from local disk
• RM fail-over is optional
(A configuration sketch for the work-preserving restarts follows below.)
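As a concrete illustration of the two work-preserving restarts, here is a minimal sketch of the yarn-site.xml keys involved, written as a Java Configuration for brevity. The ZooKeeper quorum, recovery directory and NodeManager port are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of the configuration behind work-preserving RM and NM restarts.
 * Values (ZK quorum, recovery dir, NM port) are placeholders.
 */
public class WorkPreservingRestartSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // ResourceManager: persist app/attempt state and recover running work.
    conf.set("yarn.resourcemanager.recovery.enabled", "true");
    conf.set("yarn.resourcemanager.work-preserving-recovery.enabled", "true");
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    // NodeManager: keep container state on local disk and reacquire running containers.
    conf.set("yarn.nodemanager.recovery.enabled", "true");
    conf.set("yarn.nodemanager.recovery.dir", "/var/lib/hadoop-yarn/nm-recovery");
    // A fixed NM port is needed so a restarted NM keeps the same address.
    conf.set("yarn.nodemanager.address", "0.0.0.0:45454");
  }
}
```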
Page 16
YARN Rolling Upgrades: A Cluster Snapshot
Page 17
Stack Rolling Upgrades
Enterprise grade rolling upgrade of a Live Hadoop Cluster
Jun 10, 3:25PM - 4:05PM; Sanjay Radia & Vinod K V from Hortonworks
Page 18
Services on YARN
Page 19
Long running services
• You could run them already before 2.6!
• Enhancements needed
– Logs
– Security
– Management/monitoring
– Sharing and Placement
– Discovery
• Resource sharing across workload types
• Fault tolerance of long running services
– Work preserving AM restart
– AM forgetting faults
• Service registry
Page 20
Project Slider
• Bring your existing services unmodified to YARN: slider.incubator.apache.org/
• HBase, Storm, Kafka already!
[Diagram: MapReduce, Tez, Pig, Hive, Cascading and Spark run directly on YARN; HBase, Storm, Kafka and more services run through Apache Slider]
DeathStar: Easy, Dynamic, Multi-tenant HBase via YARN
June 11, 1:30-2:10PM; Ishan Chhabra & Nitin Aggarwal from Rocket Fuel
Authoring and hosting applications on YARN using Slider
Jun 11, 11:00AM - 11:40AM Sumit Mohanty & Jonathan Maron from Hortonworks
Page 21
Operational and Developer tooling
Page 22
Node Labels
• Today: Partitions
– Admin: “I have machines of different types”
– Impact on capacity planning: “Hey, we bought those GPU machines”
• Types
– Exclusive: “This is my Precious!”
– Non-exclusive: “I get binding preference. Use it for others when idle”
• Future: Constraints
– “Take me to a machine running JDK version 9”
– No impact on capacity planning
(A sketch of requesting containers on a labeled partition follows below.)
[Diagram: default partition with JDK 7/JDK 8 nodes, Partition B with GPU nodes, Partition C with Windows nodes]
Node Labels in YARN
Jun 11, 11:00AM - 11:40AM; Mayank Bansal (ebay) & Wangda Tan (Hortonworks)
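To show how an application targets a partition, here is a rough sketch of an ApplicationMaster requesting containers with a node-label expression through the Hadoop 2.6+ AMRMClient API. The "gpu" label is hypothetical; the admin must already have created it and mapped it to nodes and queues.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/**
 * Sketch: an ApplicationMaster asking for containers on a labeled partition.
 * The "gpu" label is hypothetical. A real AM would also register with the RM
 * and call allocate(); this only shows how the request is expressed.
 */
public class GpuContainerRequestSketch {
  public static void main(String[] args) {
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(new YarnConfiguration());
    amrmClient.start();

    Resource capability = Resource.newInstance(4096 /* MB */, 2 /* vcores */);
    // The last argument is the node-label expression (available since Hadoop 2.6).
    ContainerRequest request = new ContainerRequest(
        capability, null /* nodes */, null /* racks */,
        Priority.newInstance(1), true /* relaxLocality */, "gpu");
    amrmClient.addContainerRequest(request);

    amrmClient.stop();
  }
}
```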
Page 23
Pluggable ACLs
• Pluggable YARN authorization model
• YARN Apache Ranger integration
(A configuration sketch follows below.)
[Diagram: 1. Admin manages ACLs in Apache Ranger; 2. User submits app; Ranger’s queue-ACL management plugin enforces the policy inside YARN]
Securing Hadoop with Apache Ranger: Strategies & Best Practices
Jun 11, 3:10PM - 3:50PM; Selvamohan Neethiraj & Velmurugan Periasamy from Hortonworks
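A hedged configuration sketch of the two pieces mentioned above: built-in queue ACLs (CapacityScheduler keys) and a pluggable authorizer. The authorizer class name is the one the Ranger YARN plugin registers; the queue and group names are illustrative only.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of queue ACLs plus a pluggable authorization provider.
 * Queue and group names are hypothetical; ACL keys live in capacity-scheduler.xml.
 */
public class YarnAclSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Delegate YARN authorization decisions to an external provider (e.g. Ranger).
    conf.set("yarn.authorization-provider",
        "org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer");
    // Built-in queue ACLs: who may submit to and administer a queue.
    // Format is "users groups"; a leading space means "no users, these groups".
    conf.set("yarn.scheduler.capacity.root.engineering.acl_submit_applications",
        " eng-users,eng-admins");
    conf.set("yarn.scheduler.capacity.root.engineering.acl_administer_queue",
        " eng-admins");
  }
}
```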
Page 24
Usability
• Why is my application stuck?
• “How many rack local containers did I get”
• Lots more..
– “Why is my application stuck? What limits did it hit?”
– “What is the number of running containers of my app?”
– “How healthy is the scheduler?”
Page 25
Future
Page 26
Per-queue Policy-driven scheduling
Previously
• Coarse policies
• One scheduling algorithm in the cluster
• Rigid
• Difficult to experiment
[Diagram: root queue with Ingestion, Adhoc and Batch child queues, all using FIFO]
Now
• Fine grained policies
• One scheduling algorithm per queue
• Flexible
• Very easy to experiment!
[Diagram: root queue with Ingestion (FIFO), Adhoc (user-fairness) and Batch (FIFO) child queues]
(A configuration sketch of per-queue policies follows below.)
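A sketch of what per-queue policies look like as CapacityScheduler configuration. The ordering-policy knob shipped in Hadoop releases after this talk, so treat the key names as illustrative; the queue names are made up.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of per-queue scheduling policies in the CapacityScheduler.
 * Queue names (ingestion, adhoc, batch) are hypothetical.
 */
public class PerQueuePolicySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "ingestion,adhoc,batch");
    // Ingestion and batch keep FIFO ordering within the queue.
    conf.set("yarn.scheduler.capacity.root.ingestion.ordering-policy", "fifo");
    conf.set("yarn.scheduler.capacity.root.batch.ordering-policy", "fifo");
    // Ad hoc users get fair sharing among their applications.
    conf.set("yarn.scheduler.capacity.root.adhoc.ordering-policy", "fair");
  }
}
```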
Page 27
Reservations
• “Run my workload tomorrow at 6AM”
• Next: Persistence of the plans
(A client-side sketch of submitting a reservation follows below.)
[Diagram: resources-over-time plan with Block #1 reserved starting at 6:00AM; a second plan adds Block #2]
Reservation-based Scheduling: If You’re Late Don’t Blame Us!
June 10, 12:05PM – 12:45PM; Carlo Curino & Subru Venkatraman Krishnan (Microsoft)
Page 28
Containerized Applications
• Running containerized applications on YARN
– As a packaging mechanism
– As a resource-isolation mechanism
• Docker
• Adding the notion of Container Runtimes
• Multiple use-cases
– “Run my existing service on YARN via Slider + Docker”
– “Run my existing MapReduce application on YARN via a docker image”
(A configuration sketch follows below.)
Apache Hadoop YARN and the Docker Ecosystem
June 9, 1:45PM – 2:25PM; Sidharta Seethana (Hortonworks) & Abin Shahab (Altiscale)
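A heavily hedged sketch of the 2.6-era experimental DockerContainerExecutor setup (later superseded by Docker support inside the LinuxContainerExecutor runtimes). The key names, Docker binary path and image are assumptions based on that era's documentation; check the docs for your release.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of the experimental DockerContainerExecutor configuration (Hadoop 2.6 era).
 * All values are placeholders for illustration.
 */
public class DockerOnYarnSketch {
  public static void main(String[] args) {
    Configuration nmConf = new Configuration();
    // NodeManager side: launch containers through Docker.
    nmConf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor");
    nmConf.set("yarn.nodemanager.docker-container-executor.exec-name", "/usr/bin/docker");

    Configuration jobConf = new Configuration();
    // Per-application side: the image the containers should run in (placeholder image).
    jobConf.set("yarn.nodemanager.docker-container-executor.image-name",
        "sequenceiq/hadoop-docker:2.6.0");
  }
}
```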
Page 29
Disk Isolation
• Isolation and scheduling dimensions
– Disk capacity
– IOPs
– Bandwidth
[Diagram: the disks on a node are shared by the DataNode (reads/writes), the NodeManager (localization, logs, shuffle), map tasks (read spills, write shuffled data), reduce tasks (read spills, write output), an HBase RegionServer (reads/writes), plus remote IO]
• Today: Equal allocation to all containers along all dimensions
• Next: Scheduling
Page 30
Network Isolation
• Isolation and scheduling dimensions
– Incoming bandwidth
– Outgoing bandwidth
[Diagram: the node’s network is shared by the DataNode (write pipeline), the NodeManager (localization, logs, shuffle), map tasks (read input), reduce tasks (read shuffled data, write outputs), Storm spouts and bolts (reads/writes), plus remote IO]
• Today: Equi-share outbound bandwidth
• Next: Scheduling
Page 31
Timeline Service
• Application History
– “Where did my containers run?”
– MapReduce-specific Job History Server
– Need a generic solution beyond ResourceManager restart
• Cluster History
– Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
• Running Application’s Timeline
– Framework-specific event collection and UIs
– “Show me the Counters for my running MapReduce task”
– “Show me the slowest Storm stream processing bolt while it is running”
• What exists today
– A LevelDB based implementation
– Integrated into MapReduce, Apache Tez, Apache Hive
(A client-side sketch of publishing timeline data follows below.)
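A small sketch of how a framework publishes its own events to Timeline Service v1 through the TimelineClient API. The entity type, id, filter and event name are hypothetical placeholders; real frameworks such as MapReduce, Tez and Hive define their own schemas.

```java
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/**
 * Sketch: publishing framework-specific events to Timeline Service v1.
 * Entity type, id and event name are made up for illustration.
 */
public class TimelinePublishSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();

    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("MY_FRAMEWORK_TASK");   // hypothetical entity type
    entity.setEntityId("task_0001");
    entity.addPrimaryFilter("user", "vinod");

    TimelineEvent started = new TimelineEvent();
    started.setEventType("TASK_STARTED");        // hypothetical event
    started.setTimestamp(System.currentTimeMillis());
    entity.addEvent(started);

    TimelinePutResponse response = client.putEntities(entity);
    System.out.println("Put errors: " + response.getErrors().size());
    client.stop();
  }
}
```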
Page 32
Timeline Service 2.0
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• “Analyzing Hadoop clusters is becoming a big-data problem”
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.”
• Timeline data stored in HBase and accessible to queries
Page 33
Improved Usability
• With Timeline Service
– “Why is my application slow?”
– “Is it really slow?”
– “Why is my application failing?”
– “What happened with my application? Succeeded?”
– “Why is my cluster slow?”
– “Why is my cluster down?”
– “What happened in my clusters?”
• Collect and use past data
– To schedule “my application” better
– To do better capacity planning
Page 34
More..
• Application priorities within a queue
• YARN Federation – 100K+ nodes
• Node anti-affinity
– “Do not run two copies of my service daemon on the same machine”
• Gang scheduling
– “Run all of my app at once”
• Dynamic scheduling based on actual containers’ utilization
• Time based policies
– “10% cluster capacity for queue A from 6-9AM, but 20% from 9-12AM”
• Prioritized queues
– Admin’s queue takes precedence over everything else
• Lot more..
– HDFS on YARN
– Global scheduling
– User level preemption
– Container resizing
Page 35
Community
• Started with just 5 of us!
• 104 and counting
• Few ‘big’ contributors
• And a long tail
[Chart: contributions per contributor, showing a few big contributors and a long tail]
Page 36
Thank you!
Page 37
Addendum
Page 38
Work preserving ResourceManager restart
• ResourceManager remembers some state
• Reconstructs the remaining from nodes and apps
Page 39
Work preserving NodeManager restart
• NodeManager remembers state on each machine
• Reconnects to running containers
Page 40
ResourceManager Fail-over
• Active/Standby based fail-over
• Depends on fast-recovery
(A configuration sketch follows below.)
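A minimal sketch of Active/Standby fail-over configuration, again as a Java Configuration standing in for yarn-site.xml. Host names and the ZooKeeper quorum are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of ResourceManager HA (Active/Standby fail-over) configuration.
 * Host names and ZooKeeper quorum are placeholders.
 */
public class RmHaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.ha.enabled", "true");
    conf.set("yarn.resourcemanager.cluster-id", "yarn-cluster");
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.hostname.rm1", "master1.example.com");
    conf.set("yarn.resourcemanager.hostname.rm2", "master2.example.com");
    // Leader election and RM state both live in ZooKeeper, enabling fast recovery.
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    conf.set("yarn.resourcemanager.recovery.enabled", "true");
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
  }
}
```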