The Fifth Elephant 2016: Self-Serve Performance Tuning for Hadoop and Spark
Self-Serve Performance Tuning for Hadoop & Spark
The Fifth Elephant 2016
Akshay Rai
Engineer, Hadoop Development Team, LinkedIn
Dr. Elephant
© 2016 LinkedIn Corporation. All Rights Reserved.
Hadoop @ LinkedIn c. 2008
● 1 cluster
● 20 nodes
● 10 users
● 10 workflows in production
● MapReduce, Pig
Hadoop @ LinkedIn c. 2016
● > 10 clusters
● > 10000 nodes
● > 1000 users
● Thousands of queries and flows in development
● Hundreds running in production
● MapReduce, Pig, Hive, Spark, Scalding, Gobblin, Cubert
Scaling Hadoop Infrastructure
• Add extra machines to the cluster
• Hadoop scales out, but not always efficiently!
• We cannot keep adding machines forever
• Tune the existing resources and minimize the need for new machines
Measuring performance
• Highlights hardware failures and poorly performing components
• Reveals scope for environment upgrades
Cluster-Level Performance Tuning
Job-Level Performance Tuning
How difficult is it to tune a job?
• Production gatekeeper: let jobs go into production only after verifying that they are tuned.
• The restriction leads to more questions on how to tune, so more resources are spent helping people.
Here’s what we tried in order to get jobs tuned:
Challenges in tuning a job
• Hadoop is designed to let users tune their jobs, BUT:
• One cannot optimize if one doesn’t understand the internals of the framework
• Critical information is scattered
• Hadoop has a huge set of parameters, and tuning one may affect others
You cannot tune what you do not know & you cannot improve what you cannot measure
Training Sessions
Training - Doesn’t Scale
• More people require more frequent sessions
• Hadoop experience varies from person to person
• Each framework (Pig, Hive, etc.) needs its own training
Expert Review
Expert Review - Also Doesn’t Work
• Again, not scalable
• Cannot verify that a job is performing optimally; no easy comparison
• Different people, different perspectives, no consensus
• Error-prone; reviewers may overlook certain aspects
Scaling Hadoop Infrastructure is HARD
Scaling User Productivity is much HARDER
Birth of Dr. Elephant
What does Dr. Elephant do?
• Helps every user get the best performance from their jobs
• Analyses and compares historical executions
• Provides a platform for other performance related tools
Architecture
Rule #1: Mapper Data Skew
Mapper Skew Problem
• Varying split sizes can skew the mapper input: a few mappers process far more data than the rest and delay the whole job
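A rule for this problem needs a way to decide when split sizes are uneven enough to matter. The sketch below is one way to do it, under stated assumptions: it splits the per-mapper input sizes into a "small" and a "large" group, compares the two group averages, and maps that ratio onto severity levels. The grouping method, the function names, and the threshold values are all hypothetical; this is not Dr. Elephant's actual heuristic code.

```python
from statistics import mean

# Hypothetical ratio thresholds for LOW, MODERATE, SEVERE and CRITICAL.
DEVIATION_THRESHOLDS = (2.0, 4.0, 8.0, 16.0)
LEVELS = ("NONE", "LOW", "MODERATE", "SEVERE", "CRITICAL")


def split_into_two_groups(sizes):
    """Sort the sizes and cut where the within-group squared error is smallest."""
    ordered = sorted(sizes)
    best_cut, best_cost = 1, float("inf")
    for cut in range(1, len(ordered)):
        left, right = ordered[:cut], ordered[cut:]
        left_avg, right_avg = mean(left), mean(right)
        cost = sum((x - left_avg) ** 2 for x in left) + sum(
            (x - right_avg) ** 2 for x in right
        )
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return ordered[:best_cut], ordered[best_cut:]


def mapper_skew_severity(input_bytes_per_mapper):
    """Return (severity, ratio of large-group average to small-group average)."""
    if len(input_bytes_per_mapper) < 2:
        return "NONE", 1.0
    small, large = split_into_two_groups(input_bytes_per_mapper)
    # max(..., 1) guards against mappers that read no data at all.
    ratio = mean(large) / max(mean(small), 1)
    # Each threshold the ratio reaches raises the severity by one level.
    level = sum(ratio >= t for t in DEVIATION_THRESHOLDS)
    return LEVELS[level], ratio
```

For example, nine mappers reading 100 MB each plus three reading 1,000 MB each give a ratio of 10, which these made-up thresholds rate as SEVERE.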
Solution to Mapper Skew
• Each mapper should process the same amount of data
• Combine the small chunks and feed them to a single mapper
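Combining small chunks is essentially bin packing. Hadoop ships `CombineFileInputFormat` for this purpose; the sketch below is not its algorithm but a simplified stand-in that packs split sizes first-fit-decreasing into groups of at most a hypothetical 128 MB target. It ignores data locality, which the real input format also takes into account.

```python
TARGET_BYTES = 128 * 1024 * 1024  # hypothetical per-mapper target


def combine_splits(split_sizes, target=TARGET_BYTES):
    """First-fit decreasing: put each split into the first group that still has room."""
    groups = []  # each inner list holds the split sizes one mapper will read
    totals = []  # running byte total per group
    for size in sorted(split_sizes, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target:
                groups[i].append(size)
                totals[i] += size
                break
        else:
            # No group has room (or the split alone exceeds the target): open a new one.
            groups.append([size])
            totals.append(size)
    return groups
```

With these assumptions, twenty 10 MB splits plus one 100 MB split end up in three mappers instead of twenty-one, each reading at most 128 MB.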
Rule #2: Mapper Memory
Mapper Memory Problem & Solution
• Requested Container Memory >> Task’s Consumed Memory
• A job requests 4 GB containers
• but its tasks actually use only 512 MB
• It waits longer to get 4 GB containers and then blocks 4 GB of resources for each one!
• Request lower container memory by setting
• mapreduce.map.memory.mb (or mapreduce.reduce.memory.mb for reducers)
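Picking the lower value comes down to simple arithmetic over the observed peak memory of the tasks. A minimal sketch, assuming per-task peak usage is already available, that the scheduler grants memory in 512 MB steps, and that 25% headroom above the peak is enough; the headroom, the step size, and the function name are hypothetical choices.

```python
import math

# Hypothetical assumptions: memory is granted in 512 MB steps and we keep
# 25% headroom above the largest observed task.
ALLOCATION_STEP_MB = 512
HEADROOM = 1.25


def suggest_container_mb(requested_mb, peak_used_mb_per_task):
    """Return (requested/used ratio, suggested container size in MB)."""
    peak = max(max(peak_used_mb_per_task), 1)
    waste_ratio = requested_mb / peak
    needed_mb = peak * HEADROOM
    # Round up to the next allocation step, but never suggest more than today.
    suggested = math.ceil(needed_mb / ALLOCATION_STEP_MB) * ALLOCATION_STEP_MB
    return waste_ratio, min(suggested, requested_mb)
```

With the slide's numbers (4 GB requested, 512 MB used) the ratio is 8 and the sketch suggests 1024 MB. The JVM heap set through mapreduce.map.java.opts (-Xmx) has to stay below whatever container size is chosen, so the two settings are usually lowered together.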
Search
MapReduce Report
Job History
How do you define a rule?
How does a Rule work?
INPUT: counters and task data
LOGIC: some logic that computes a value from them
OUTPUT: the value compared against threshold levels
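The three stages can be sketched as a small interface: a rule receives counters and task data, reduces them to one number, and maps that number onto a severity by comparing it with ascending thresholds. The `Rule` class, the example rule, and its thresholds are hypothetical; the severity names are an assumption modelled on the levels Dr. Elephant's reports show. `SPILLED_RECORDS` and `MAP_OUTPUT_RECORDS` are standard MapReduce counters, but a job-level `SPILLED_RECORDS` total also includes reduce-side spills, so a real rule would look at map tasks only.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Mapping, Sequence, Tuple


class Severity(IntEnum):
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


Counters = Mapping[str, int]
TaskData = Sequence[Mapping[str, int]]


@dataclass(frozen=True)
class Rule:
    name: str
    # LOGIC: reduce job counters and per-task data to one number.
    compute: Callable[[Counters, TaskData], float]
    # OUTPUT: ascending thresholds for LOW, MODERATE, SEVERE, CRITICAL.
    thresholds: Tuple[float, float, float, float]

    def evaluate(self, counters: Counters, tasks: TaskData) -> Tuple[Severity, float]:
        value = self.compute(counters, tasks)
        # Each threshold the value reaches raises the severity by one level.
        level = sum(value >= t for t in self.thresholds)
        return Severity(level), value


# Hypothetical rule: how many times each map output record was spilled to disk.
spill_rule = Rule(
    name="Mapper Spill (illustrative)",
    compute=lambda counters, tasks: counters["SPILLED_RECORDS"]
    / max(counters["MAP_OUTPUT_RECORDS"], 1),
    thresholds=(1.5, 2.0, 2.5, 3.0),  # made-up values
)
```

Keeping the thresholds as data rather than code is what makes them easy to tune per cluster.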
Customising Dr. Elephant
Adding a Custom Rule
1. Create a new rule and test it.
2. Create a help page describing the rule, the parameters to tune, etc.
3. Add the details of the rule to the HeuristicConf.xml file: `<heuristic> <applicationtype>Mapreduce</applicationtype> <heuristicname>Rule Name</heuristicname> <classname>path.to.rule.class</classname> <viewname>path.to.rule.help.page</viewname> </heuristic>`
4. Run Dr. Elephant. It should now include the new rule.
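Before running Dr. Elephant again, a quick check that the new entry is well formed can save a restart. The Python sketch below is a hypothetical helper, not part of Dr. Elephant: it assumes the entries sit under a `<heuristics>` root element and only checks the four fields shown in step 3.

```python
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = ("applicationtype", "heuristicname", "classname", "viewname")

# The entry from step 3, wrapped in an assumed <heuristics> root element.
SAMPLE_CONF = """
<heuristics>
  <heuristic>
    <applicationtype>Mapreduce</applicationtype>
    <heuristicname>Rule Name</heuristicname>
    <classname>path.to.rule.class</classname>
    <viewname>path.to.rule.help.page</viewname>
  </heuristic>
</heuristics>
"""


def load_heuristics(xml_text):
    """Group heuristic entries by application type; fail on missing fields."""
    root = ET.fromstring(xml_text.strip())
    by_type = {}
    for entry in root.iter("heuristic"):
        fields = {name: (entry.findtext(name) or "").strip() for name in REQUIRED_FIELDS}
        missing = [name for name, value in fields.items() if not value]
        if missing:
            raise ValueError(f"heuristic entry is missing: {', '.join(missing)}")
        by_type.setdefault(fields["applicationtype"].lower(), []).append(fields)
    return by_type
```

Calling `load_heuristics(SAMPLE_CONF)` groups the entry under `"mapreduce"`; an entry without a `<viewname>` raises a `ValueError` naming the missing field.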
What else can you customize?
● Rules and their threshold levels
● Easy integration with new schedulers (Azkaban, Airflow, Oozie, etc.)
● Enable/disable existing Fetchers or add new ones
● Extend to newer application types and job types
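Threshold levels are the simplest customisation. Assuming each rule's thresholds are supplied as a comma-separated string of four ascending values, one per severity above NONE, validating an override could look like the sketch below. The string format, the rule names, and both functions are illustrative assumptions, not Dr. Elephant's configuration API.

```python
def parse_thresholds(raw, expected=4):
    """Parse comma-separated thresholds for LOW, MODERATE, SEVERE, CRITICAL."""
    values = [float(part) for part in raw.split(",") if part.strip()]
    if len(values) != expected:
        raise ValueError(f"expected {expected} thresholds, got {len(values)}")
    # Severity only makes sense if each level needs a strictly larger value.
    if any(low >= high for low, high in zip(values, values[1:])):
        raise ValueError(f"thresholds must be strictly ascending: {raw!r}")
    return tuple(values)


def with_overrides(defaults, overrides):
    """Merge per-rule threshold overrides (raw strings) into the default tuples."""
    merged = dict(defaults)
    for rule_name, raw in overrides.items():
        merged[rule_name] = parse_thresholds(raw)
    return merged
```

Rejecting non-ascending values up front avoids a silently inverted rule that would flag healthy jobs as CRITICAL.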
Production Gatekeeper
Automated Production Reviews | JIRA Bot
• Cluster for critical workloads
• Audit before deployment
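An automated audit of this kind boils down to a policy over rule results. The sketch below assumes each job in a flow has already been analysed into per-rule severities and blocks deployment if any result reaches SEVERE; the data shape, the cut-off, and the function name are assumptions about how such a bot could decide, not a description of LinkedIn's JIRA bot.

```python
from enum import IntEnum


class Severity(IntEnum):
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


def audit_flow(job_results, block_at=Severity.SEVERE):
    """Return (approved, findings); job_results maps job -> {rule: Severity}."""
    findings = [
        (job, rule, severity.name)
        for job, rules in sorted(job_results.items())
        for rule, severity in sorted(rules.items())
        if severity >= block_at
    ]
    # Approve only when no rule reached the blocking severity.
    return not findings, findings
```

The sorted findings list is what a bot would post back for review, so the owner sees exactly which job and rule blocked the deployment.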
Workflow monitoring and reports
• Monitor performance on each execution
• Compare behaviour across revisions
• Cost-to-serve analysis
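Cost-to-serve comparisons need a common unit. One plausible choice, assumed here, is GB-hours: each container's memory multiplied by how long it was held, summed over all tasks. The `Task` record and the metric are illustrative assumptions; comparing two runs of the same flow then shows whether a revision, such as the lower container memory from Rule #2, made it cheaper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    container_mb: int   # memory requested for the task's container
    runtime_sec: float  # how long the container was held


def gb_hours(tasks):
    """Memory reserved by one execution: GB requested multiplied by hours held."""
    return sum(t.container_mb / 1024 * t.runtime_sec / 3600 for t in tasks)


def relative_change(before, after):
    """Fractional change in GB-hours between two executions of the same flow."""
    old, new = gb_hours(before), gb_hours(after)
    if old == 0:
        raise ValueError("baseline execution used no resources")
    return (new - old) / old
```

Keeping the runtime fixed for simplicity, shrinking each of ten one-hour tasks from 4 GB to 1 GB containers cuts the reservation from 40 to 10 GB-hours, a 75% reduction.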
Open Source, April 2016
github.com/linkedin/dr-elephant
● Watchers: 60
● Stars: 262
● Forks: 109
Let’s collectively contribute!
● Pull requests: 60+
● Contributors: 10+
● User topics: 50+
Dr. Elephant Community
Coming Soon
● Real-time analysis of jobs
● Analytics for failed jobs
● Visualizing workflows as DAGs
● Support for other schedulers and frameworks
References
Engineering Blog: engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark
Open Source GitHub Link: github.com/linkedin/dr-elephant
Mailing List & Gitter: dr-elephant-users, linkedin/dr-elephant
Hadoop Summit 2015: https://www.youtube.com/watch?v=aL3OJ4YoxPA (Mark Wagner)
github.com/linkedin/dr-elephant
Thank You
Akshay Rai: https://in.linkedin.com/in/akshayrai09