Tuning up with Apache Tez
Gal Vinograd @ Crosswise - 2016/03/09
Agenda
The Pipeline
The Problem
Why we chose Tez
Lessons Learned
Demo
The Batch
Internet
Labels
Data
~200 Scripts
250 c3.2xlarge x 30 hours
10TB per Batch
“Tez aims to be a general purpose execution runtime that enhances various scenarios that are not well served by classic Map-Reduce. In the short term the major focus is to support Hive and Pig ...”
Tez Design v1.1
Hortonworks
The Batch
Internet
Labels
Data
~200 Scripts
Tez Atomic Components
Vertex: Tokenizer
Edge: Tokenizer to Aggregator
Vertex: Aggregator
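A minimal sketch of the two building blocks named on this slide, with a Tokenizer vertex feeding an Aggregator vertex over one edge. This is a toy model in plain Python, not the Tez API: the `Vertex`, `Edge`, and `Dag` classes and their methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """A processing step, e.g. a tokenizer or an aggregator (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Edge:
    """A data movement from one vertex to another (hypothetical model)."""
    source: Vertex
    target: Vertex


@dataclass
class Dag:
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_vertex(self, vertex):
        self.vertices.append(vertex)
        return self

    def add_edge(self, edge):
        # Both endpoints must already be part of the graph.
        if edge.source not in self.vertices or edge.target not in self.vertices:
            raise ValueError("edge endpoints must be added as vertices first")
        self.edges.append(edge)
        return self

    def downstream(self, vertex):
        """Return the vertices that consume this vertex's output."""
        return [e.target for e in self.edges if e.source == vertex]


tokenizer = Vertex("Tokenizer")
aggregator = Vertex("Aggregator")
dag = Dag().add_vertex(tokenizer).add_vertex(aggregator)
dag.add_edge(Edge(tokenizer, aggregator))
```

Calling `dag.downstream(tokenizer)` returns the Aggregator vertex, which is all the graph needs to know to route the tokenizer's output.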
Logical and Physical Graphs
Logical
Physical
Hortonworks
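One way to picture the step from a logical graph to a physical one is to expand every logical vertex into a number of parallel tasks and every logical edge into task-to-task links. The sketch below assumes an all-to-all (shuffle-like) connection on every edge. The `expand` function and the parallelism values 3 and 2 are made up for illustration and are not taken from the talk.

```python
def expand(logical_vertices, logical_edges):
    """Expand a logical graph into physical tasks and task-to-task links.

    logical_vertices: dict mapping vertex name to its parallelism.
    logical_edges: list of (source, target) vertex-name pairs.
    Assumes every edge connects each source task to each target task.
    """
    tasks = {
        name: [f"{name}[{i}]" for i in range(parallelism)]
        for name, parallelism in logical_vertices.items()
    }
    links = [
        (src_task, dst_task)
        for src, dst in logical_edges
        for src_task in tasks[src]
        for dst_task in tasks[dst]
    ]
    return tasks, links


# Hypothetical parallelism: 3 tokenizer tasks feeding 2 aggregator tasks.
tasks, links = expand(
    {"Tokenizer": 3, "Aggregator": 2},
    [("Tokenizer", "Aggregator")],
)
```

With these numbers the single logical edge becomes 3 x 2 = 6 physical links, while the logical graph the user wrote still has only two vertices and one edge.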
Optimizations: No “NOP” Map
Tez: Project, Distinct, GroupBy
MR (with an added NOP map): Project, Distinct, GroupBy
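The NOP map on the MR side comes from the fixed map-then-reduce shape of each job: a second shuffle needs a second job, and that job needs a map phase even when there is nothing to do in it. The toy planner below illustrates this under the assumption that Distinct and GroupBy each need a shuffle and Project does not. `plan_mapreduce` and `plan_dag` are hypothetical helpers, not how Pig actually compiles scripts.

```python
SHUFFLE_OPS = {"Distinct", "GroupBy"}  # assumption: these operators need a shuffle


def plan_mapreduce(operators):
    """Pack operators into (map_ops, reduce_ops) jobs, padding with a NOP map."""
    jobs = []
    map_ops, reduce_ops = [], []
    for op in operators:
        if op in SHUFFLE_OPS:
            if reduce_ops:
                # The current job already has its reduce: close it, start a new job.
                jobs.append((map_ops or ["NOP"], reduce_ops))
                map_ops, reduce_ops = [], []
            reduce_ops.append(op)
        elif reduce_ops:
            # Non-shuffle work after a reduce runs inside that reduce phase.
            reduce_ops.append(op)
        else:
            map_ops.append(op)
    if map_ops or reduce_ops:
        jobs.append((map_ops or ["NOP"], reduce_ops))
    return jobs


def plan_dag(operators):
    """In a DAG runtime each operator can be its own vertex; no padding is needed."""
    return list(operators)


pipeline = ["Project", "Distinct", "GroupBy"]
mr_jobs = plan_mapreduce(pipeline)
dag_vertices = plan_dag(pipeline)
```

For this pipeline the MR plan is two jobs, the second of which starts with a NOP map, while the DAG plan keeps the three operators as three chained vertices.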
Optimizations: No Barrier Between Jobs
Tez: Project, Project, Distinct, GroupBy
MR: Project, Project, Distinct, GroupBy
Optimizations: No Redundant Resource Allocation
Tez (Pig Process): Project, Project, Distinct, GroupBy
MR (Pig Process): Project, Project, Distinct, GroupBy
Optimizations: Sessions
Client: Allocate, Submit 1, Submit 2, Cleanup
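The session idea on this slide can be sketched as a client that allocates resources once, submits several jobs against them, and cleans up once at the end. The `Session` class below is a toy model with hypothetical names, not the Tez client API.

```python
class Session:
    """Toy client-side session: allocate once, submit many times, clean up once."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Resources are requested a single time, when the session opens.
        self.events.append("allocate")
        return self

    def submit(self, dag_name):
        # Every submission reuses the resources allocated in __enter__.
        self.events.append(f"submit {dag_name}")

    def __exit__(self, exc_type, exc, tb):
        self.events.append("cleanup")
        return False  # do not swallow exceptions


with Session() as session:
    session.submit("1")
    session.submit("2")

events = session.events
```

Without a session, each submission would carry its own allocate and cleanup steps. Here both submissions share one of each.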
Lessons Learned
Some Pig Tasks Did Not Compile / Occasionally Froze
No DistributedCache Support For S3
Poor Amazon Support
No Pre-Built Releases
Additional Deployment for Tez UI
What is it good for?
Early Adopters
Pig / Hive Bound
Thanks for Listening!