© Hortonworks Inc. 2011
Hadoop YARN
SF Hadoop Users Meetup
Vinod Kumar Vavilapalli
vinodkv [at] { apache dot org | hortonworks dot com }
@tshooter
Myself
• 6.25 Hadoop-years old
• Previously at Yahoo!, now @Hortonworks
• Last thing at college: a two-node Tomcat cluster. Three months later, first thing on the job: brought down an 800-node cluster ;)
• Hadoop YARN lead. Apache Hadoop PMC, Apache Member
• MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
• Ambari / Stinger / random troubleshooting
YARN: A new abstraction layer
• HADOOP 1.0 – Single-use system: batch apps
–HDFS (redundant, reliable storage)
–MapReduce (cluster resource management & data processing)
• HADOOP 2.0 – Multi-purpose platform: batch, interactive, online, streaming, …
–HDFS2 (redundant, reliable storage)
–YARN (cluster resource management)
–MapReduce and others (data processing)
Concepts
[Diagram: a layered stack. Platform layer: HDFS and YARN. Applications & frameworks layer: MRv2 and Tez. Jobs layer: Job #1, Job #2.]
Concepts
• Platform
• Framework
• Application
–Application is a job submitted to the framework
–Example: a MapReduce job
• Container
–Basic unit of allocation
–Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.)
–container_0 = 2GB, 1 CPU
–container_1 = 1GB, 6 CPUs
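To make those container sizes concrete, here is a minimal sketch (not from the slides) of how such capabilities are expressed with the YARN records API in Java; the memory and vcore numbers simply mirror the two example containers above:

```java
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerSizes {
    public static void main(String[] args) {
        // container_0 = 2GB of memory, 1 virtual core
        Resource container0 = Resource.newInstance(2048, 1);
        // container_1 = 1GB of memory, 6 virtual cores
        Resource container1 = Resource.newInstance(1024, 6);

        System.out.println("container_0 capability: " + container0);
        System.out.println("container_1 capability: " + container1);
    }
}
```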
Architecture
Hadoop MapReduce Classic
• JobTracker
–Manages cluster resources and job scheduling
• TaskTracker
–Per-node agent
–Manages tasks
Current Limitations
• Scalability
–Maximum cluster size: 4,000 nodes
–Maximum concurrent tasks: 40,000
–Coarse synchronization in JobTracker
• Single point of failure
–Failure kills all queued and running jobs
–Jobs need to be re-submitted by users
• Restart is very tricky due to complex state
Current Limitations contd.
• Hard partition of resources into map and reduce slots
–Low resource utilization
• Lacks support for alternate paradigms
–Iterative applications implemented using MapReduce are 10x slower
–Hacks for the likes of MPI / graph processing
• Lack of wire-compatible protocols
–Client and cluster must be of the same version
–Applications and workflows cannot migrate to different clusters
Requirements
• Reliability
• Availability
• Utilization
• Wire Compatibility
• Agility & Evolution – Ability for customers to control upgrades to the grid software stack
• Scalability – Clusters of 6,000-10,000 machines
–Each machine with 16 cores, 48GB/96GB RAM, 24TB/36TB disks
–100,000+ concurrent tasks
–10,000 concurrent jobs
Architecture: Philosophy
• General-purpose, distributed application framework
–Cannot scale monolithic masters. Or monsters?
–Distribute responsibilities
• ResourceManager – central scheduler
–Only resource arbitration
–No failure handling
–Provides necessary information to AMs
• Push every possible responsibility to the ApplicationMaster(s)
–Don't trust ApplicationMaster(s)
–User-land library!
Architecture
• ResourceManager
–Global resource scheduler
–Hierarchical queues
• NodeManager
–Per-machine agent
–Manages container life-cycle
–Container resource monitoring
• ApplicationMaster
–Per-application
–Manages application scheduling and task execution
–E.g. the MapReduce ApplicationMaster
YARN Architecture
Apache Hadoop MapReduce on YARN
[Diagram: the ResourceManager (with its Scheduler) coordinates a grid of NodeManagers. Two MapReduce ApplicationMasters (MR AM 1 and MR AM 2) run in containers on NodeManagers; their tasks (map 1.1, map 1.2, reduce 1.1 for job 1; map 2.1, map 2.2, reduce 2.1, reduce 2.2 for job 2) run in containers spread across the other NodeManagers.]
Global Scheduler (ResourceManager)
• Resource arbitration
• Multiple resource dimensions
–<priority, data-locality, memory, cpu, …>
• In-built support for data-locality
–Node, rack etc.
–Unique to YARN
Scheduler Concepts
• Input from AM(s) is a dynamic list of ResourceRequests
–<resource-name, resource-capability>
–Resource name: hostname / rackname / any
–Resource capability: (memory, cpu, …)
–Essentially an inverted <name, capability> request map from AM to RM
–No notion of tasks!
• Output – Container
–Resource(s) granted on a specific machine
–Verifiable allocation: via Container Tokens
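A hedged sketch (not from the slides) of what such a request looks like with the YARN records API: the request names a location (a specific host, a rack, or ResourceRequest.ANY for anywhere), a capability, and a count, and never mentions tasks. The hostname and numbers below are illustrative placeholders:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResourceRequestSketch {
    public static void main(String[] args) {
        Priority priority = Priority.newInstance(1);
        // Capability per container: 1GB of memory, 1 virtual core
        Resource capability = Resource.newInstance(1024, 1);

        // Three containers anywhere in the cluster ("*" = any host/rack)
        ResourceRequest anywhere =
            ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, 3);

        // Two containers on a specific (hypothetical) host, for data-locality
        ResourceRequest onHost =
            ResourceRequest.newInstance(priority, "node017.example.com", capability, 2);

        System.out.println(anywhere);
        System.out.println(onHost);
    }
}
```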
Fault tolerance
• Task/container failures
–ApplicationMasters should take care of these; it's their business
• Node failures
–ResourceManager marks the nodes as failed and informs all the apps / ApplicationMasters. AMs can choose to ignore the failure or rerun the work, depending on what they want.
• ApplicationMaster failures
–ResourceManager restarts AMs that have failed
–One Application can have multiple ApplicationAttempts
–Every ApplicationAttempt should store state, so that the next ApplicationAttempt can recover from failure
• ResourceManager failures
–ResourceManager saves state; can do host/IP failover today
–Recovers state, but kills all current work as of now
–Work-preserving restart
–HA
Writing your own apps
Application Master
• Dynamically allocated per-application on startup
• Responsible for individual application scheduling and life-cycle management
• Requests and obtains containers for its tasks
–Does a second-level schedule, i.e. containers to component tasks
–Starts/stops containers on NodeManagers
• Handles all task/container errors
• Obtains resource hints/meta-information from the RM for better scheduling
–Peek-ahead into resource availability
–Faulty resources (node, rack etc.)
Writing Custom Applications
• Grand total of 3 protocols
• ApplicationClientProtocol
–Application launching program
–submitApplication
• ApplicationMasterProtocol
–Protocol between AM & RM for resource allocation
–registerApplication / allocate / finishApplication
• ContainerManagementProtocol
–Protocol between AM & NM for container start/stop
–startContainer / stopContainer
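To illustrate the client side of this (ApplicationClientProtocol), here is a minimal, hedged sketch that submits an application through the YarnClient library; the application name, AM command line, and container size are made-up placeholders, and a real client would also ship its jars as local resources:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the RM for a new application id and an empty submission context
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("my-yarn-app");  // placeholder name

        // Launch context for the ApplicationMaster container (placeholder command)
        Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
        Map<String, String> env = new HashMap<String, String>();
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
            localResources, env,
            Collections.singletonList("java MyAppMaster 1>stdout 2>stderr"),
            null, null, null);
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(1024, 1));  // AM container size

        // submitApplication: hand the application over to the ResourceManager
        ApplicationId appId = yarnClient.submitApplication(ctx);
        System.out.println("Submitted " + appId);
    }
}
```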
Other things to take care of
• Container/tasks
• Client
• UI
• Recovery
• Container -> AM communication
• Application history
Libraries for app/framework writers
• YarnClient, AMRMClient, NMClient
• More projects:
–Higher-level APIs
–Weave, REEF
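As a complement to the client-side sketch earlier, here is a hedged sketch of the ApplicationMaster side using these libraries (AMRMClient to talk to the RM, NMClient to talk to NodeManagers); the container size, priority, and launch command are illustrative placeholders:

```java
import java.util.Collections;
import java.util.HashMap;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSketch {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();

        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Register with the ResourceManager (host, port, tracking URL left empty here)
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for one 1GB / 1-core container anywhere in the cluster
        rmClient.addContainerRequest(new ContainerRequest(
            Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

        // Heartbeat (allocate) until the RM grants the container, then launch it
        int launched = 0;
        while (launched < 1) {
            AllocateResponse response = rmClient.allocate(0.1f);
            for (Container container : response.getAllocatedContainers()) {
                ContainerLaunchContext launchCtx = ContainerLaunchContext.newInstance(
                    new HashMap<String, LocalResource>(), new HashMap<String, String>(),
                    Collections.singletonList("sleep 30"), null, null, null);
                nmClient.startContainer(container, launchCtx);  // talk to that node's NM
                launched++;
            }
            Thread.sleep(1000);
        }

        // Tell the RM we finished successfully and unregister
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```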
Other goodies
• Rolling upgrades
• Multiple versions of MR at the same time
• Same scheduling algorithms – capacity, fairness
• Secure from the start
• Locality for generic apps
• Log aggregation
• Everything on the same cluster
Existing applications
Compatibility with Apache Hadoop 1.x
• org.apache.hadoop.mapred
–Add 1 property to your existing mapred-site.xml (see the snippet after this list)
–mapreduce.framework.name = yarn
–Continue submitting using bin/hadoop
–Nothing else: just run your MapReduce jobs!
• org.apache.hadoop.mapreduce
–Apps generally run without changes; some need recompilation or minor updates
–If your existing apps fail, recompile against the new MRv2 jars
• Pig
–Scripts built on Pig 0.10.1+ run without changes
• Hive
–Queries built on Hive 0.10.0+ run without changes
• Streaming, Pipes, Oozie, Sqoop, …
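For reference, the single property from the org.apache.hadoop.mapred bullet would look like this inside mapred-site.xml (a minimal sketch; a real file typically carries other properties too):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```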
Any Performance Gains?
• Significant gains across the board!
• MapReduce
–Lots of runtime improvements
–Map side, reduce side
–Better shuffle
• Much better throughput
• Yahoo! can run a lot more jobs on fewer nodes in less time
More details: http://hortonworks.com/delivering-on-hadoop-next-benchmarking-performance/
Testing?
• Testing, *lots* of it
• Benchmarks: blog post soon
• Integration testing / full-stack
–HBase
–Pig
–Hive
–Oozie
–…
• Functional tests
Deployment
• Beta last month
–Misnomer: 10s of PB of storage already on 0.23, a previous state of YARN before 2.0
–Significantly wide variety of applications and load
• GA
–Very soon, less than a month away
–Bugs and blockers only now
How do I get it?
YARN beta releases
• Apache Hadoop Core 2.1.0-beta
–Official beta release from Apache
–YARN APIs are stable
–Backwards compatible with MapReduce 1 jobs
–Blocker bugs have been resolved
• Features in HDP 2.0 Beta
–Apache Ambari deploys YARN and MapReduce 2
–Capacity Scheduler for YARN
–Full stack tested
Future
Looking ahead
• YARN improvements
• Alternate programming models: Apache Tez, Storm
• Long(er)-running services (e.g. HBase): Hoya
• ResourceManager HA
• Work-preserving restart of the ResourceManager
• Reconnect running containers to AMs
• Gang scheduling
• Multi-dimensional resources: CPU is in; disk (capacity, IOPS) and network next?
Ecosystem
• Spark (UCB) on YARN
• Real-time data processing
–Storm (Twitter) on YARN
• Graph processing – Apache Giraph on YARN
• OpenMPI on YARN?
• PaaS on YARN?
• Yarnify: * on YARN
Questions & Answers
TRY: download at hortonworks.com
LEARN: Hortonworks University
FOLLOW: twitter @hortonworks, Facebook facebook.com/hortonworks
MORE EVENTS: hortonworks.com/events
Further questions & comments: [email protected]