Hadoop Summit San Jose 2014 - Analyzing Historical Data of Applications on Hadoop YARN: for Fun and...
Analyzing Historical Data of Applications on Hadoop YARN: for Fun and Profit
Mayank Bansal, Zhijie Shen
Agenda
• Who we are?
• Why we need New History Server?
• Application History Server
• Timeline Server
• Future Work
Who we are
Mayank Bansal
• Hadoop Architect @ eBay
• Apache Hadoop Committer
• Apache Oozie PMC and Committer
• Current
– Leading Hadoop Core Development for YARN and MapReduce @ eBay
• Past
– Working on Scheduler / Resource Managers
– Working on Distributed Systems
– Data Pipeline frameworks
Who we are
Zhijie Shen
• Software Engineer @ Hortonworks
• Apache Hadoop Committer
• Apache Samza PPMC and Committer
MR JobHistory Server
• We already have Job History Server
• It is customized for MapReduce only
• Storage is HDFS only
• Storage is very MR specific
– Counters
– Mappers and Reducers
• If you have only MapReduce, you are good.
Hadoop-2
• Single Use System: Batch Apps
• Multi Purpose Platform: Batch, Interactive, Streaming
• YARN
Issues with current Job History
• What if I have other applications?
• RM crashes
• Hard Limit on # Apps
• Upgrades / Updates
Application History Server
• Separate Process
• Pluggable Storage
– HDFS
– In-Memory
• Resource Manager directly writes to Storage
• Aggregated Logs
• Separate UI, CLI and REST endpoint
Application History Server
Storage:
• It stores generic Data
• Application level data (queue, user etc…)
• List of ApplicationAttempts
• Information about each ApplicationAttempt
• List of containers for ApplicationAttempt
• Generic information about each container.
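The nesting described above (application, then attempts, then containers) can be pictured with a small sketch. The class and field names below are hypothetical and chosen only for illustration; they are not the record types the history server actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerRecord:
    container_id: str
    exit_status: int = 0  # hypothetical example of "generic" container info


@dataclass
class AttemptRecord:
    attempt_id: str
    containers: List[ContainerRecord] = field(default_factory=list)


@dataclass
class ApplicationRecord:
    app_id: str
    user: str   # application-level data
    queue: str  # application-level data
    attempts: List[AttemptRecord] = field(default_factory=list)

    def container_count(self) -> int:
        # Total containers across all attempts of this application.
        return sum(len(a.containers) for a in self.attempts)


app = ApplicationRecord("application_0001", user="alice", queue="default")
attempt = AttemptRecord("appattempt_0001_000001")
attempt.containers.append(ContainerRecord("container_0001_01_000001"))
attempt.containers.append(ContainerRecord("container_0001_01_000002"))
app.attempts.append(attempt)
print(app.container_count())
```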
Application History Server
• CLI Interface
  $ yarn application -status <Application ID>
  $ yarn applicationattempt -list <Application ID>
• REST APIs
– http://localhost:8188/ws/v1/applicationhistory/apps/{appid}
Application History Server
• Scalability for storage
• One file per application
• File format is protobuf
• Size of HDFS files
• Multiple RM threads writing to History Storage
# of Containers     100      1K       10K      100K
Size of the File    19 KB    184 KB   1.8 MB   19 MB
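A quick back-of-the-envelope check of the figures above suggests the file grows roughly linearly with the number of containers. The sketch below assumes decimal units (1 KB = 1,000 bytes), which the slide does not state.

```python
# File sizes from the slide, converted to bytes under the assumption
# that 1 KB = 1,000 bytes and 1 MB = 1,000,000 bytes.
sizes_bytes = {
    100: 19_000,
    1_000: 184_000,
    10_000: 1_800_000,
    100_000: 19_000_000,
}

# Approximate bytes of history written per container.
per_container = {n: size // n for n, size in sizes_bytes.items()}

for n, b in per_container.items():
    print(f"{n:>7} containers: ~{b} bytes per container")
```

Under that assumption, every row works out to somewhere between 180 and 190 bytes per container.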
Timeline Service - Motivation
• YARN takes care of it
– Relieving the application from monitoring service
• Application diversity
– Framework specific metadata/metrics
Timeline Service – Data Model
• Entity Type
– An abstract concept of anything
• Entity
– One specific instance of an entity type
– Defining the relationship between entities
• Event
– Something that happens to an entity
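The three concepts can be sketched as plain data classes. This is only an illustration of the model as described on the slide; the class names, field names, and the "JOB"/"TASK" entity types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Event:
    timestamp: int   # milliseconds since the epoch
    event_type: str  # e.g. "JOB_STARTED" (hypothetical)
    info: Dict[str, object] = field(default_factory=dict)


@dataclass
class Entity:
    entity_type: str  # the abstract concept, e.g. "JOB"
    entity_id: str    # one specific instance of that type
    events: List[Event] = field(default_factory=list)
    # Relationships to other entities: entity type -> set of entity ids.
    related: Dict[str, Set[str]] = field(default_factory=dict)

    def relate(self, other: "Entity") -> None:
        self.related.setdefault(other.entity_type, set()).add(other.entity_id)


job = Entity("JOB", "job_1")
task = Entity("TASK", "task_1")
job.relate(task)
job.events.append(Event(1400000000000, "JOB_STARTED"))
print(job.related)
```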
Timeline Service – Architecture
• LevelDB Store
• Client Library
• REST Interfaces
Timeline Service – Store
• LevelDB based store
– Key-value store
– Lightweight
– License compatible
• Implementing reader/writer interfaces
• Support data retention
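To make the reader/writer split and data retention concrete, here is a minimal sketch that swaps LevelDB for an in-memory dictionary. The class name, method names, and the time-to-live policy are assumptions made for illustration and do not mirror the actual store implementation.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (entity type, entity id)


class InMemoryTimelineStore:
    """Hypothetical stand-in for a key-value timeline store (not LevelDB)."""

    def __init__(self, ttl_ms: int) -> None:
        self.ttl_ms = ttl_ms
        self._rows: Dict[Key, Tuple[int, dict]] = {}

    # Writer interface
    def put(self, entity_type: str, entity_id: str, start_ms: int, data: dict) -> None:
        self._rows[(entity_type, entity_id)] = (start_ms, data)

    # Reader interface
    def get(self, entity_type: str, entity_id: str) -> Optional[dict]:
        row = self._rows.get((entity_type, entity_id))
        return row[1] if row else None

    # Data retention: drop entities whose start time is older than the TTL.
    def discard_older_than(self, now_ms: int) -> int:
        cutoff = now_ms - self.ttl_ms
        stale = [k for k, (start_ms, _) in self._rows.items() if start_ms < cutoff]
        for k in stale:
            del self._rows[k]
        return len(stale)


store = InMemoryTimelineStore(ttl_ms=1000)
store.put("JOB", "job_1", start_ms=0, data={"state": "done"})
store.put("JOB", "job_2", start_ms=5000, data={"state": "running"})
removed = store.discard_older_than(now_ms=5500)
print(removed, store.get("JOB", "job_2"))
```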
Timeline Service – Client
• TimelineClient
– Wraps the REST POST method
– POJOs
  • TimelineEntity
  • TimelineEvent
– Usable in Client/AM/Container
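Since TimelineClient wraps a REST POST, the same request can be sketched without the Java library. The snippet below only builds the request and does not send it. The JSON field names ("entities", "entitytype", "entity", "starttime", "events", "eventtype", "eventinfo") and the POST URL are assumptions based on the timeline server's JSON format as I recall it, and should be checked against the Hadoop documentation for your version.

```python
import json
import urllib.request


def build_put_request(base_url, entity_type, entity_id, start_ms, events):
    """Build (but do not send) a POST carrying one timeline entity.

    `events` is a list of (timestamp_ms, event_type) pairs.
    Field names are assumed, not confirmed by the slides.
    """
    payload = {
        "entities": [{
            "entitytype": entity_type,
            "entity": entity_id,
            "starttime": start_ms,
            "events": [
                {"timestamp": ts, "eventtype": name, "eventinfo": {}}
                for ts, name in events
            ],
        }]
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_put_request(
    "http://localhost:8188/ws/v1/timeline",  # assumed POST endpoint
    "MY_JOB", "job_1", 1400000000000,
    [(1400000000000, "JOB_STARTED")],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running timeline server.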
Timeline Service – APIs
• REST APIs, JSON as the media type
• Get timeline entities
– http://localhost:8188/ws/v1/timeline/{entityType}
• Get timeline entity
– http://localhost:8188/ws/v1/timeline/{entityType}/{entityId}
• Get timeline events
– http://localhost:8188/ws/v1/timeline/{entityType}/events
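The three URL templates above can be filled in with a few small helpers. This is a sketch only: the function names and the "MY_JOB" entity type are hypothetical, and the host and port are simply the ones shown on the slide.

```python
from urllib.parse import quote

BASE = "http://localhost:8188/ws/v1/timeline"


def entities_url(entity_type: str) -> str:
    # All entities of one type.
    return f"{BASE}/{quote(entity_type, safe='')}"


def entity_url(entity_type: str, entity_id: str) -> str:
    # One specific entity.
    return f"{entities_url(entity_type)}/{quote(entity_id, safe='')}"


def events_url(entity_type: str) -> str:
    # Events for entities of one type.
    return f"{entities_url(entity_type)}/events"


print(entities_url("MY_JOB"))
print(entity_url("MY_JOB", "job_1"))
print(events_url("MY_JOB"))
```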
Timeline Service – Security
• HTTP SPNEGO
• Kerberos Authentication
• Delegation Token
– Performance
– AM/Container have no Kerberos credentials
• Access Control
– Admin/owner
– Timeline entity-level
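The admin/owner rule at the entity level can be pictured as a single check per entity. This is a simplified sketch of the idea, not the server's actual access-control code; the function name and arguments are hypothetical.

```python
from typing import Set


def can_access(user: str, entity_owner: str, admins: Set[str]) -> bool:
    # An entity is visible to its owner and to any configured admin.
    return user == entity_owner or user in admins


admins = {"yarn-admin"}
print(can_access("alice", "alice", admins))       # owner
print(can_access("yarn-admin", "alice", admins))  # admin
print(can_access("bob", "alice", admins))         # neither
```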
Timeline Service – Use Case (1)
Timeline Service – Early Adopter (2)
Timeline Service – Early Adopter (3)
To Be Continued…
• Integrating the generic history and timeline data
• Rebasing the MR Job History Server on the timeline server
• Making the timeline server render the timeline data
To Be Continued…
Scale
• LevelDB does not handle eBay scale
• We need something which can scale horizontally
• HBase