Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Page 1
Apache Hadoop YARN - 2015
June 9, 2015
Past, Present & Future
Page 2
We are
Vinod Kumar Vavilapalli
• Long time Hadooper since 2007
• Apache Hadoop Committer / PMC
• Apache Member
• Yahoo! -> Hortonworks
• MapReduce -> YARN from day one
Jian He
• Hadoop contributor since 2012
• Apache Hadoop Committer / PMC
• Hortonworks
• All things YARN
Page 3
Overview: The Why and the What
Page 4
Data architectures
• Traditional architectures
– Specialized silos
– Per-silo security, management, governance etc.
– Limited scalability
– Limited cost efficiencies
• For the present and the future
– Hadoop repository
– Commodity storage
– Centralized but distributed system
– Scalable
– Uniform org-policy enforcement
– Innovation across silos!
[Diagram: cluster resources with data centralized in HDFS]
Page 5
Resource Management
• Extracting value out of centralized data architecture
• A messy problem
– Multiple apps, frameworks, their life-cycles and evolution
• Tenancy
– “I am running this system for one user”
– It almost never stops there
– Groups, teams, users
• Sharing / isolation needed (see the queue sketch below)
• Ad hoc structures get unusable real fast
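To make the sharing/isolation point concrete, here is a minimal sketch of a multi-tenant queue layout for the CapacityScheduler, expressed as a Java Configuration for brevity. The queue names and percentages are made up; in a real cluster these keys live in capacity-scheduler.xml.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Minimal sketch of multi-tenant sharing via CapacityScheduler queues.
 * Queue names (engineering, analytics) are hypothetical.
 */
public class QueueSetupSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Two top-level queues under root, one per team.
    conf.set("yarn.scheduler.capacity.root.queues", "engineering,analytics");
    // Guaranteed shares of cluster capacity, in percent.
    conf.set("yarn.scheduler.capacity.root.engineering.capacity", "60");
    conf.set("yarn.scheduler.capacity.root.analytics.capacity", "40");
    // Allow queues to borrow idle capacity up to a cap (elasticity).
    conf.set("yarn.scheduler.capacity.root.engineering.maximum-capacity", "80");
    conf.set("yarn.scheduler.capacity.root.analytics.maximum-capacity", "60");
    System.out.println("Queues: " + conf.get("yarn.scheduler.capacity.root.queues"));
  }
}
```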
Page 6
Varied goals & expectations
• On isolation, capacity allocations, scheduling
• Different tenants want different things: “Faster!”, “More!”, “Best for my cluster”, throughput, utilization, elasticity, service uptime, security, ROI, SLAs ... and usually “Everything! Right now!”
Page 7
Enter Hadoop YARN
HDFS (Scalable, Reliable Storage)
YARN (Cluster Resource Management)
Applications (Running Natively in Hadoop)
• Store all your data in one place … (HDFS)
• Interact with that data in multiple ways … (YARN Platform + Apps): Data centric
• Scale as you go, shared, multi-tenant, secure … (The Hadoop Stack)
[Diagram: queues, admins/users and pipelines sharing cluster resources]
Page 8
Hadoop YARN
• Distributed system
• Host of frameworks, meta-frameworks, applications
• Varied workloads
– Batch
– Interactive
– Stream processing
– NoSQL databases
– ….
• Large scale
– Linear scalability
– Tens of thousands of nodes
– More coming
Page 9
Past: A quick history
Page 10
A brief Timeline
• Sub-project of Apache Hadoop
• Releases tied to Hadoop releases
• Alphas and betas
– In production at several large sites for MapReduce already by that time
• 1st line of code: June-July 2010
• Open sourced: August 2011
• First 2.0 alpha: May 2012
• First 2.0 beta: August 2013
Page 11
GA Releases
• 2.2 (15 October 2013)
– 1st GA
– MR binary compatibility
– YARN API cleanup
– Testing!
• 2.3 (24 February 2014)
– 1st post-GA release
– Bug fixes
– Alpha features
• 2.4 (07 April 2014)
– RM fail-over
– CS preemption
– Timeline Service V1
• 2.5 (11 August 2014)
– Writable REST APIs
– Timeline Service V1 security
Page 12
Present
Page 13
Last few Hadoop releases
• Hadoop 2.6
– 18 November 2014
– Rolling upgrades
– Services
– Node labels
• Hadoop 2.7
– 21 April 2015
– Moving to JDK 7+
• Focus on some features next!
Page 14
Rolling Upgrades
Page 15
YARN Rolling Upgrades
• Why? No more losing work during upgrades!
• Workflow
– Servers first: Masters followed by per-node agents
• Upgrade of Applications/Frameworks is decoupled!
• Work preserving RM restart: RM recovers state from NMs and apps
• Work preserving NM restart: NM recovers state from local disk
• RM fail-over is optional
(A configuration sketch for the work-preserving restarts follows below.)
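As a concrete illustration of the two work-preserving restarts, here is a minimal sketch of the yarn-site.xml keys involved, written as a Java Configuration for brevity. The ZooKeeper quorum, recovery directory and NodeManager port are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of the configuration behind work-preserving RM and NM restarts.
 * Values (ZK quorum, recovery dir, NM port) are placeholders.
 */
public class WorkPreservingRestartSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // ResourceManager: persist app/attempt state and recover running work.
    conf.set("yarn.resourcemanager.recovery.enabled", "true");
    conf.set("yarn.resourcemanager.work-preserving-recovery.enabled", "true");
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    // NodeManager: keep container state on local disk and reacquire running containers.
    conf.set("yarn.nodemanager.recovery.enabled", "true");
    conf.set("yarn.nodemanager.recovery.dir", "/var/lib/hadoop-yarn/nm-recovery");
    // A fixed NM port is needed so a restarted NM keeps the same address.
    conf.set("yarn.nodemanager.address", "0.0.0.0:45454");
  }
}
```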
Page 16
YARN Rolling Upgrades: A Cluster Snapshot
Page 17
Stack Rolling Upgrades
Enterprise grade rolling upgrade of a Live Hadoop Cluster
Jun 10, 3:25PM - 4:05PM; Sanjay Radia & Vinod K V from Hortonworks
Page 18
Services on YARN
Page 19
Long running services
• You could run them already before 2.6!
• Enhancements needed
– Logs
– Security
– Management/monitoring
– Sharing and Placement
– Discovery
• Resource sharing across workload types
• Fault tolerance of long running services
– Work preserving AM restart
– AM forgetting faults
• Service registry
Page 20
Project Slider
• Bring your existing services unmodified to YARN: slider.incubator.apache.org/
• HBase, Storm, Kafka already!
[Diagram: MapReduce, Tez, Pig, Hive, Cascading and Spark run directly on YARN; HBase, Storm, Kafka and more services run through Apache Slider]
DeathStar: Easy, Dynamic, Multi-tenant HBase via YARN
June 11, 1:30-2:10PM; Ishan Chhabra & Nitin Aggarwal from Rocket Fuel
Authoring and hosting applications on YARN using Slider
Jun 11, 11:00AM - 11:40AM Sumit Mohanty & Jonathan Maron from Hortonworks
Page 21
Operational and Developer tooling
Page 22
Node Labels
• Today: Partitions
– Admin: “I have machines of different types”
– Impact on capacity planning: “Hey, we bought those GPU machines”
• Types
– Exclusive: “This is my Precious!”
– Non-exclusive: “I get binding preference. Use it for others when idle”
• Future: Constraints
– “Take me to a machine running JDK version 9”
– No impact on capacity planning
(A sketch of requesting containers on a labeled partition follows below.)
[Diagram: default partition with JDK 7/JDK 8 nodes, Partition B with GPU nodes, Partition C with Windows nodes]
Node Labels in YARN
Jun 11, 11:00AM - 11:40AM; Mayank Bansal (ebay) & Wangda Tan (Hortonworks)
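To show how an application targets a partition, here is a rough sketch of an ApplicationMaster requesting containers with a node-label expression through the Hadoop 2.6+ AMRMClient API. The "gpu" label is hypothetical; the admin must already have created it and mapped it to nodes and queues.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/**
 * Sketch: an ApplicationMaster asking for containers on a labeled partition.
 * The "gpu" label is hypothetical. A real AM would also register with the RM
 * and call allocate(); this only shows how the request is expressed.
 */
public class GpuContainerRequestSketch {
  public static void main(String[] args) {
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(new YarnConfiguration());
    amrmClient.start();

    Resource capability = Resource.newInstance(4096 /* MB */, 2 /* vcores */);
    // The last argument is the node-label expression (available since Hadoop 2.6).
    ContainerRequest request = new ContainerRequest(
        capability, null /* nodes */, null /* racks */,
        Priority.newInstance(1), true /* relaxLocality */, "gpu");
    amrmClient.addContainerRequest(request);

    amrmClient.stop();
  }
}
```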
Page 23
Pluggable ACLs
• Pluggable YARN authorization model
• YARN Apache Ranger integration
(A configuration sketch follows below.)
[Diagram: 1. Admin manages ACLs in Apache Ranger; 2. User submits app; Ranger’s queue-ACL management plugin enforces the policy inside YARN]
Securing Hadoop with Apache Ranger: Strategies & Best Practices
Jun 11, 3:10PM - 3:50PM; Selvamohan Neethiraj & Velmurugan Periasamy from Hortonworks
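A hedged configuration sketch of the two pieces mentioned above: built-in queue ACLs (CapacityScheduler keys) and a pluggable authorizer. The authorizer class name is the one the Ranger YARN plugin registers; the queue and group names are illustrative only.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of queue ACLs plus a pluggable authorization provider.
 * Queue and group names are hypothetical; ACL keys live in capacity-scheduler.xml.
 */
public class YarnAclSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Delegate YARN authorization decisions to an external provider (e.g. Ranger).
    conf.set("yarn.authorization-provider",
        "org.apache.ranger.authorization.yarn.authorizer.RangerYarnAuthorizer");
    // Built-in queue ACLs: who may submit to and administer a queue.
    // Format is "users groups"; a leading space means "no users, these groups".
    conf.set("yarn.scheduler.capacity.root.engineering.acl_submit_applications",
        " eng-users,eng-admins");
    conf.set("yarn.scheduler.capacity.root.engineering.acl_administer_queue",
        " eng-admins");
  }
}
```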
Page 24
Usability
• Why is my application stuck?
• “How many rack local containers did I get”
• Lots more..
– “Why is my application stuck? What limits did it hit?”
– “What is the number of running containers of my app?”
– “How healthy is the scheduler?”
Page 25
Future
Page 26
Per-queue Policy-driven scheduling
Previously
• Coarse policies
• One scheduling algorithm in the cluster
• Rigid
• Difficult to experiment
[Diagram: root queue with Ingestion, Adhoc and Batch child queues, all using FIFO]
Now
• Fine grained policies
• One scheduling algorithm per queue
• Flexible
• Very easy to experiment!
[Diagram: root queue with Ingestion (FIFO), Adhoc (user-fairness) and Batch (FIFO) child queues]
(A configuration sketch of per-queue policies follows below.)
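A sketch of what per-queue policies look like as CapacityScheduler configuration. The ordering-policy knob shipped in Hadoop releases after this talk, so treat the key names as illustrative; the queue names are made up.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of per-queue scheduling policies in the CapacityScheduler.
 * Queue names (ingestion, adhoc, batch) are hypothetical.
 */
public class PerQueuePolicySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "ingestion,adhoc,batch");
    // Ingestion and batch keep FIFO ordering within the queue.
    conf.set("yarn.scheduler.capacity.root.ingestion.ordering-policy", "fifo");
    conf.set("yarn.scheduler.capacity.root.batch.ordering-policy", "fifo");
    // Ad hoc users get fair sharing among their applications.
    conf.set("yarn.scheduler.capacity.root.adhoc.ordering-policy", "fair");
  }
}
```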
Page 27
Reservations
• “Run my workload tomorrow at 6AM”
• Next: Persistence of the plans
(A client-side sketch of submitting a reservation follows below.)
[Diagram: resources-over-time plan with Block #1 reserved starting at 6:00AM; a second plan adds Block #2]
Reservation-based Scheduling: If You’re Late Don’t Blame Us!
June 10, 12:05PM – 12:45PM; Carlo Curino & Subru Venkatraman Krishnan (Microsoft)
Page 28
Containerized Applications
• Running containerized applications on YARN
– As a packaging mechanism
– As a resource-isolation mechanism
• Docker
• Adding the notion of Container Runtimes
• Multiple use-cases
– “Run my existing service on YARN via Slider + Docker”
– “Run my existing MapReduce application on YARN via a docker image”
(A configuration sketch follows below.)
Apache Hadoop YARN and the Docker Ecosystem
June 9, 1:45PM – 2:25PM; Sidharta Seethana (Hortonworks) & Abin Shahab (Altiscale)
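A heavily hedged sketch of the 2.6-era experimental DockerContainerExecutor setup (later superseded by Docker support inside the LinuxContainerExecutor runtimes). The key names, Docker binary path and image are assumptions based on that era's documentation; check the docs for your release.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of the experimental DockerContainerExecutor configuration (Hadoop 2.6 era).
 * All values are placeholders for illustration.
 */
public class DockerOnYarnSketch {
  public static void main(String[] args) {
    Configuration nmConf = new Configuration();
    // NodeManager side: launch containers through Docker.
    nmConf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor");
    nmConf.set("yarn.nodemanager.docker-container-executor.exec-name", "/usr/bin/docker");

    Configuration jobConf = new Configuration();
    // Per-application side: the image the containers should run in (placeholder image).
    jobConf.set("yarn.nodemanager.docker-container-executor.image-name",
        "sequenceiq/hadoop-docker:2.6.0");
  }
}
```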
Page 29
Disk Isolation
• Isolation and scheduling dimensions
– Disk capacity
– IOPs
– Bandwidth
[Diagram: the disks on a node are shared by the DataNode (reads/writes), the NodeManager (localization, logs, shuffle), map tasks (read spills, write shuffled data), reduce tasks (read spills, write output), an HBase RegionServer (reads/writes), plus remote IO]
• Today: Equal allocation to all containers along all dimensions
• Next: Scheduling
Page 30
Network Isolation
• Isolation and scheduling dimensions
– Incoming bandwidth
– Outgoing bandwidth
[Diagram: the node’s network is shared by the DataNode (write pipeline), the NodeManager (localization, logs, shuffle), map tasks (read input), reduce tasks (read shuffled data, write outputs), Storm spouts and bolts (reads/writes), plus remote IO]
• Today: Equi-share outbound bandwidth
• Next: Scheduling
Page 31
Timeline Service
• Application History
– “Where did my containers run?”
– MapReduce-specific Job History Server
– Need a generic solution beyond ResourceManager restart
• Cluster History
– Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
• Running Application’s Timeline
– Framework-specific event collection and UIs
– “Show me the Counters for my running MapReduce task”
– “Show me the slowest Storm stream processing bolt while it is running”
• What exists today
– A LevelDB based implementation
– Integrated into MapReduce, Apache Tez, Apache Hive
(A client-side sketch of publishing timeline data follows below.)
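A small sketch of how a framework publishes its own events to Timeline Service v1 through the TimelineClient API. The entity type, id, filter and event name are hypothetical placeholders; real frameworks such as MapReduce, Tez and Hive define their own schemas.

```java
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/**
 * Sketch: publishing framework-specific events to Timeline Service v1.
 * Entity type, id and event name are made up for illustration.
 */
public class TimelinePublishSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();

    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("MY_FRAMEWORK_TASK");   // hypothetical entity type
    entity.setEntityId("task_0001");
    entity.addPrimaryFilter("user", "vinod");

    TimelineEvent started = new TimelineEvent();
    started.setEventType("TASK_STARTED");        // hypothetical event
    started.setTimestamp(System.currentTimeMillis());
    entity.addEvent(started);

    TimelinePutResponse response = client.putEntities(entity);
    System.out.println("Put errors: " + response.getErrors().size());
    client.stop();
  }
}
```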
Page 32
Timeline Service 2.0
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• “Analyzing Hadoop clusters is becoming a big-data problem”
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.”
• Timeline data stored in HBase and accessible to queries
Page 33
Improved Usability
• With Timeline Service
– “Why is my application slow?”
– “Is it really slow?”
– “Why is my application failing?”
– “What happened with my application? Succeeded?”
– “Why is my cluster slow?”
– “Why is my cluster down?”
– “What happened in my clusters?”
• Collect and use past data
– To schedule “my application” better
– To do better capacity planning
Page 34
More..
• Application priorities within a queue
• YARN Federation – 100K+ nodes
• Node anti-affinity
– “Do not run two copies of my service daemon on the same machine”
• Gang scheduling
– “Run all of my app at once”
• Dynamic scheduling based on actual containers’ utilization
• Time based policies
– “10% cluster capacity for queue A from 6-9AM, but 20% from 9-12AM”
• Prioritized queues
– Admin’s queue takes precedence over everything else
• Lot more..
– HDFS on YARN
– Global scheduling
– User level preemption
– Container resizing
Page 35
Community
• Started with just 5 of us!
• 104 and counting
• Few ‘big’ contributors
• And a long tail
[Chart: contributions per contributor, showing a few big contributors and a long tail]
Page 36
Thank you!
Page 37
Addendum
Page 38
Work preserving ResourceManager restart
• ResourceManager remembers some state
• Reconstructs the remaining from nodes and apps
Page 39
Work preserving NodeManager restart
• NodeManager remembers state on each machine
• Reconnects to running containers
Page 40
ResourceManager Fail-over
• Active/Standby based fail-over
• Depends on fast-recovery
(A configuration sketch follows below.)
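A minimal sketch of Active/Standby fail-over configuration, again as a Java Configuration standing in for yarn-site.xml. Host names and the ZooKeeper quorum are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of ResourceManager HA (Active/Standby fail-over) configuration.
 * Host names and ZooKeeper quorum are placeholders.
 */
public class RmHaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.resourcemanager.ha.enabled", "true");
    conf.set("yarn.resourcemanager.cluster-id", "yarn-cluster");
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.hostname.rm1", "master1.example.com");
    conf.set("yarn.resourcemanager.hostname.rm2", "master2.example.com");
    // Leader election and RM state both live in ZooKeeper, enabling fast recovery.
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    conf.set("yarn.resourcemanager.recovery.enabled", "true");
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
  }
}
```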