Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Uploader: datatorrent
Category: Technology
![Page 1: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/1.jpg)
Introduction to YARN and Apex as YARN Application
Priyanka Gugale ([email protected])
September 30th 2016
![Page 2: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/2.jpg)
Apache Apex - Stream Processing
Easily Operable - Exposes an easy API for developing Operators (part of an application) and Applications
Highly Scalable - Scales statically as well as dynamically
Highly Performant - Can reach single digit millisecond end-to-end latency
Fault Tolerant - Automatically recovers from failures - without manual intervention
Stateful - Guarantees that no state will be lost
Apex Malhar library
YARN Native - Uses the Hadoop YARN framework for resource negotiation
![Page 3: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/3.jpg)
Apex Platform Overview
![Page 4: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/4.jpg)
An Apex Application is a DAG (Directed Acyclic Graph)
A DAG is composed of vertices (Operators) and edges (Streams).
A Stream is a sequence of data tuples that connects operators at end-points called Ports.
An Operator takes one or more input streams, performs computations and emits one or more output streams.
● Each operator is either the user's business logic or a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
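The DAG model above can be illustrated with a small sketch. This is plain Python, not the Apex Java API: the `Dag` class, its method names, and the three example operators are all hypothetical and exist only to show operators as vertices, streams as edges, and tuples flowing between them.

```python
from collections import defaultdict


class Dag:
    """Toy model of an application DAG: operators are vertices, streams are edges."""

    def __init__(self):
        self.operators = {}               # operator name -> per-tuple function
        self.streams = defaultdict(list)  # upstream name -> downstream names

    def add_operator(self, name, func):
        # func takes one input tuple and returns a list of emitted tuples.
        self.operators[name] = func

    def add_stream(self, source, sink):
        self.streams[source].append(sink)

    def run(self, source, tuples):
        """Push each tuple into `source`; collect what the terminal operators emit."""
        results = defaultdict(list)
        for t in tuples:
            self._emit(source, t, results)
        return dict(results)

    def _emit(self, name, value, results):
        for out in self.operators[name](value):
            sinks = self.streams.get(name, [])
            if not sinks:
                # No outgoing stream: this operator is an end of the DAG.
                results[name].append(out)
            for sink in sinks:
                self._emit(sink, out, results)


dag = Dag()
dag.add_operator("reader", lambda line: line.split())
dag.add_operator("upper", lambda word: [word.upper()])
dag.add_operator("long_only", lambda word: [word] if len(word) > 3 else [])
dag.add_stream("reader", "upper")
dag.add_stream("upper", "long_only")

result = dag.run("reader", ["apex runs on yarn"])
```

Here `result` maps the terminal operator `long_only` to the words that survived the filter. A real engine would run operator instances in parallel and move tuples over the network, which this single-threaded sketch does not attempt.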
![Page 5: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/5.jpg)
DAG Components
• Tuple
  ● Atomic data that flows over a stream
• Operator
  ● Basic compute unit per tuple
• Stream
  ● Connector abstraction between operators
  ● Tuples flow over this
[Diagram: Operator1 connected to Operator2 by a Stream carrying tuple 1, tuple 2 and tuple 3]
![Page 6: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/6.jpg)
How is Apex YARN Native?
![Page 7: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/7.jpg)
Introducing YARN
● YARN - Yet Another Resource Negotiator
● A framework that facilitates writing arbitrary distributed processing frameworks and applications
● YARN applications/frameworks: e.g. MapReduce2, Apache Spark, Apache Giraph, Apache Apex, etc.
![Page 8: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/8.jpg)
Introducing YARN
MapReduce 1 components and their approximate YARN counterparts:
● Job Tracker ≈ Resource Manager, Application Master, Timeline Server
● Task Tracker ≈ Node Manager
● Map Slot, Reduce Slot ≈ Container
![Page 9: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/9.jpg)
Hadoop beyond Batch
YARN for better resource utilization
More applications than MapReduce
![Page 10: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/10.jpg)
Hadoop v2 (YARN) Architecture
• Resource Manager
  ● Manages and allocates cluster resources
  ● Application scheduling
  ● Applications Manager
• Node Manager
  ● Per-machine agent
  ● Manages life-cycle of container
  ● Monitors resources
• Application Master
  ● Per-application
  ● Manages application scheduling and task execution
[Diagram: Clients, the ResourceManager, and NodeManagers hosting App Masters and containers. Arrow legend: MapReduce Status, Job Submission, Node Status, Resource Request]
![Page 11: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/11.jpg)
Application Submission workflow
[Diagram: a YarnClient, a node running the RM (ApplicationsManager + Scheduler), and nodes running NMs that host the Application Master and Containers]
1) Submit application
2) Launch Application Master
3) AM registers with RM
4) AM negotiates for containers
5) Launch Container (shown once for each container)
Legend: RM = Resource Manager; NM = Node Manager; AM = Application Master. Heartbeats are drawn with their own arrow style.
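The five steps above can be walked through with a toy simulation. This is a sketch in plain Python, not the Hadoop client API: the class and method names, the first-fit placement rule, and the memory sizes are all assumptions chosen for illustration.

```python
class NodeManager:
    def __init__(self, name, free_mb):
        self.name = name
        self.free_mb = free_mb
        self.containers = []

    def start_container(self, label):
        self.containers.append(label)


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes
        self.log = []

    def _reserve(self, mb):
        # First node with enough free memory; a real scheduler is far richer.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                return node
        raise RuntimeError("no node has %d MB free" % mb)

    def submit(self, app_name, am_mb):
        self.log.append("1) submit application")
        node = self._reserve(am_mb)
        node.start_container("AM")
        self.log.append("2) launch AM on " + node.name)
        return ApplicationMaster(app_name, self)

    def register(self, am):
        self.log.append("3) AM registers with RM")

    def allocate(self, count, mb):
        self.log.append("4) AM negotiates for %d containers" % count)
        return [self._reserve(mb) for _ in range(count)]


class ApplicationMaster:
    def __init__(self, app_name, rm):
        self.app_name = app_name
        self.rm = rm

    def run(self, tasks, mb):
        self.rm.register(self)
        granted = self.rm.allocate(len(tasks), mb)
        for task, node in zip(tasks, granted):
            # The AM asks the node that owns the grant to start the container.
            node.start_container(task)
            self.rm.log.append("5) launch container for %s on %s" % (task, node.name))


nodes = [NodeManager("node1", 2048), NodeManager("node2", 4096)]
rm = ResourceManager(nodes)
am = rm.submit("demo-app", am_mb=1024)
am.run(["task-a", "task-b"], mb=1024)
```

After the run, `rm.log` holds one entry per step in the order of the slide, with step 5 appearing once per container. Heartbeats, failures and container release are left out.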
![Page 12: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/12.jpg)
Apex as YARN application
[Diagram: the generic YARN roles and their Apex counterparts. The Apex cli (StrAMClient) acts as the YarnClient and talks to the ResourceManager (AsM + Scheduler) over ClientRMProtocol. StrAM (AppMaster) runs in a YarnContainer, talks to the ResourceManager over AMRMProtocol and to the NodeManagers (NM) over ContainerManagerProtocol. Two StrAMChild YarnContainers host operators O1, O2 and O3.]
Apache Apex Meetup
![Page 13: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/13.jpg)
Application Components of Apex - StrAMClient
• Part of the Apex client interface
• Invoked by the “launch” command of Apex
• Tasks:
  ● Copy the required application package files into HDFS
  ● Validate the logical plan
  ● Serialize the logical plan to HDFS
  ● Launch the Application Master, i.e. StrAM
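The validate and serialize tasks can be sketched as follows. The slide does not say what validation covers or which serialization format is used, so both are assumptions here: the sketch checks only that the streams form an acyclic graph (using Kahn's algorithm) and serializes to a JSON string rather than to a file in HDFS. All function names are hypothetical.

```python
import json
from collections import deque


def validate_acyclic(operators, streams):
    """Return a topological order, or raise ValueError if the plan is not a DAG."""
    indegree = {op: 0 for op in operators}
    downstream = {op: [] for op in operators}
    for src, dst in streams:
        if src not in indegree or dst not in indegree:
            raise ValueError("stream references unknown operator: %s -> %s" % (src, dst))
        downstream[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly remove operators with no unprocessed inputs.
    ready = deque(op for op in operators if indegree[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for nxt in downstream[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(operators):
        raise ValueError("logical plan contains a cycle")
    return order


def serialize_plan(operators, streams):
    """Validate first, then serialize; an invalid plan is never written out."""
    validate_acyclic(operators, streams)
    return json.dumps({"operators": operators, "streams": streams}, sort_keys=True)


operators = ["input", "parse", "output"]
streams = [["input", "parse"], ["parse", "output"]]
order = validate_acyclic(operators, streams)
plan_json = serialize_plan(operators, streams)
```

Validating on the client before anything is launched means a malformed plan fails fast, without consuming cluster resources.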
![Page 14: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/14.jpg)
Application Components of Apex - StrAM
• Streaming Application Master
• Started by StrAMClient on a YarnContainer
• Tasks:
  ● Convert the logical plan to a physical plan
  ● Serialize operators to HDFS
  ● Request resources from the ResourceManager
  ● Start StrAMChild in YarnContainer(s)
  ● Monitor StrAMChild using the ContainerManager protocol
  ● Generate application statistics
  ● Host results on a WebService (dtManage)
  ● Checkpoint and commit application state
  ● Fault tolerance
  ● Support security
  ● Shut down the application
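The first task, turning a logical plan into a physical plan, can be pictured with a short sketch. It assumes that a logical operator carries a count of parallel instances and that instances are packed into containers in declaration order, a fixed number per container. Both the `to_physical_plan` function and that packing rule are hypothetical; the slides do not describe how StrAM actually places operators.

```python
def to_physical_plan(logical_ops, per_container=1):
    """Expand logical operators into instances and group them into containers.

    logical_ops maps an operator name to its number of parallel instances.
    Returns a list of containers, each a list of instance labels.
    """
    instances = []
    for name, partitions in logical_ops.items():
        for i in range(partitions):
            instances.append("%s#%d" % (name, i))
    # Assumed packing rule: fill containers in order, per_container at a time.
    return [instances[i:i + per_container]
            for i in range(0, len(instances), per_container)]


physical = to_physical_plan({"O1": 1, "O2": 1, "O3": 1}, per_container=2)
partitioned = to_physical_plan({"O1": 1, "O3": 2}, per_container=1)
```

With three single-instance operators and two slots per container, the result mirrors the deck's diagrams: O1 and O2 share one container and O3 gets its own. The length of the returned list is the number of containers the master would then request from the ResourceManager.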
![Page 15: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/15.jpg)
Application Components of Apex - StrAMChild
• Deployed on a YarnContainer
• Started by the NodeManager as instructed by StrAM
• Instance of StreamingContainer
• Contains Operators (compute-related)
• Contains BufferServer (stream-related)
• Tasks:
  ● Regularly send a heartbeat to StrAM
  ● Execute commands from StrAM
  ● Shut down or kill itself if instructed
  ● Manage the lifecycle of an Operator
  ● Network communication using BufferServer
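The heartbeat task implies some bookkeeping on the master side. The sketch below shows one simple way to do it, assuming the master records the time of each child's last heartbeat and treats a child as silent once a timeout has passed. The `HeartbeatMonitor` class, the 30-second timeout, and the container ids are hypothetical; the slides do not say how StrAM reacts to missed heartbeats.

```python
class HeartbeatMonitor:
    """Master-side bookkeeping: which children have gone silent for too long?"""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # container id -> time of last heartbeat, in seconds

    def heartbeat(self, container_id, now_s):
        self.last_seen[container_id] = now_s

    def expired(self, now_s):
        """Return the ids whose last heartbeat is older than the timeout."""
        return sorted(cid for cid, seen in self.last_seen.items()
                      if now_s - seen > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=30)
monitor.heartbeat("container_1", now_s=0)
monitor.heartbeat("container_2", now_s=0)
monitor.heartbeat("container_1", now_s=25)
silent = monitor.expired(now_s=40)
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic. At `now_s=40` only `container_2` has exceeded the timeout, so a master built on this could choose to redeploy that container's operators elsewhere.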
![Page 16: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/16.jpg)
Apex as YARN application
[Diagram: the Apex cli (StrAMClient) acts as the YarnClient and reaches the ResourceManager (AsM + Scheduler) over ClientRMProtocol. StrAM (AppMaster) uses AMRMProtocol towards the ResourceManager and ContainerManagerProtocol towards the NodeManagers (NM). StrAMChild YarnContainers host operators O1, O2 and O3.]
![Page 17: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/17.jpg)
Summary - Apex platform
• Enables YARN to be used for streaming applications
• Takes care of YARN-specific work
• Users can focus on the business logic defined in Operators
![Page 18: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/18.jpg)
Q&A
![Page 19: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/19.jpg)
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
  ᵒ https://www.datatorrent.com/product/startup-accelerator/
![Page 20: Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)](https://reader035.fdocuments.net/reader035/viewer/2022070520/58f10fa61a28abd0438b4613/html5/thumbnails/20.jpg)
We Are Hiring
• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders