Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Spark Tuning for Enterprise System Administrators, by Anya T. Bida, PhD, and Rachel B. Warren


Page 1: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Spark Tuning for Enterprise System Administrators

Anya T. Bida, PhD, and Rachel B. Warren

Page 2: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Don't worry about missing something...

Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren

Page 3: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

About Anya: Operations Engineer

About Rachel: Spark & Scala Enthusiast / Data Engineer

About Alpine Data (alpinenow.com): Alpine deploys Spark in production for our enterprise customers.

Page 4: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

About You: Enterprise System Administrators

mySparkApp Success: Intermittent, Reliable, Optimal

Page 5: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

mySparkApp Success: Intermittent, Reliable, Optimal

Page 6: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Default != Recommended

Example: By default, spark.executor.memory = 1g. 1g allows small jobs to finish out of the box. Spark assumes you'll increase this parameter.
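As a hedged illustration of overriding that default, the sketch below assembles a spark-submit command line that raises spark.executor.memory through the --conf flag. The 4g value, the helper name build_submit_command, and the application file my_spark_app.py are hypothetical placeholders, not recommendations from the talk.

```python
def build_submit_command(app, conf):
    """Return a spark-submit argument list with one --conf flag per setting."""
    command = ["spark-submit"]
    for key, value in sorted(conf.items()):
        command += ["--conf", f"{key}={value}"]
    command.append(app)
    return command


# Hypothetical override: the right value depends on the job and the cluster.
overrides = {"spark.executor.memory": "4g"}
command = build_submit_command("my_spark_app.py", overrides)
print(" ".join(command))
```

Building the argument list in one place keeps every override visible, which makes it easier to see which defaults an application no longer relies on.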


Page 7: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Default != Recommended

Which parameters are important?

How do I configure them?

Page 8: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

The Spark Tuning Cheat-Sheet

• Filter data before an expensive reduce or aggregation.
• Consider coalesce(
• Use data structures that require less memory.
• Serialize:
  PySpark: serializing is built-in.
  Scala/Java? persist(storageLevel.[*]_SER)
  Recommended: kryoserializer
• tuning.html#tuning-data-structures
• See "Optimize partitions."
• See "GC investigation."
• See "Checkpointing."

Page 9: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

mySparkApp Success: Intermittent, Reliable, Optimal

mySparkApp memory issues

Shared Cluster

Page 10: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016


Page 11: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016


Page 12: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Fair Schedulers

YARN

<allocations>
  <queue name="sample_queue">
    <minResources>4000 mb,0vcores</minResources>
    <maxResources>8000 mb,8vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>

SPARK

<allocations>
  <pool name="sample_queue">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
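Before handing a pool file like the SPARK example to the scheduler, it can help to confirm that it parses and contains the pools you expect. The sketch below is one way to do that with Python's standard library; the helper name read_pools is hypothetical, and the XML is copied from the slide. In Spark itself, such a file is typically referenced through spark.scheduler.allocation.file with spark.scheduler.mode set to FAIR.

```python
import xml.etree.ElementTree as ET

# Pool definition copied from the SPARK example on the slide.
SPARK_POOLS = """
<allocations>
  <pool name="sample_queue">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
"""


def read_pools(xml_text):
    """Map each pool name to a dict of its settings (all values as strings)."""
    root = ET.fromstring(xml_text.strip())
    pools = {}
    for pool in root.findall("pool"):
        pools[pool.get("name")] = {child.tag: child.text.strip() for child in pool}
    return pools


print(read_pools(SPARK_POOLS))
```

A malformed file raises xml.etree.ElementTree.ParseError here, which surfaces the problem before the application is submitted.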

Page 13: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Fair Schedulers


Configure these parameters too!

Page 14: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Fair Schedulers

YARN

<allocations>
  <user name="sample_user">
    <maxRunningApps>6</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
</allocations>

Page 15: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

What is the memory limit for mySparkApp?


Page 16: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

What is the memory limit for mySparkApp?

Max Memory in "pool" x 3/4 = mySparkApp_mem_limit

Limitation: <maxResources>8000 mb</maxResources>

Reserve 25% for overhead.
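The rule above can be worked through with the slide's own figure. This is a minimal sketch: the helper names pool_memory_mb and app_memory_limit_mb are hypothetical, and the parsing assumes the memory figure is written as a number followed by "mb", as in the YARN example.

```python
import re


def pool_memory_mb(max_resources):
    """Extract the memory figure, in MB, from a YARN <maxResources> value."""
    match = re.search(r"(\d+)\s*mb", max_resources, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no memory figure in {max_resources!r}")
    return int(match.group(1))


def app_memory_limit_mb(pool_max_mb, overhead_fraction=0.25):
    """Apply the slide's rule: reserve 25% of the pool for overhead."""
    return pool_max_mb * (1 - overhead_fraction)


pool_max = pool_memory_mb("8000 mb,8vcores")
limit = app_memory_limit_mb(pool_max)  # 8000 mb x 3/4 = 6000 mb
print(pool_max, limit)
```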

Page 17: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016


Page 18: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

What is the memory limit for mySparkApp?

Max Memory in "pool" x 3/4 = mySparkApp_mem_limit

mySparkApp_mem_limit = driver.memory + (executor.memory x dynamicAllocation.maxExecutors)
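One way to use the formula above is to solve it for dynamicAllocation.maxExecutors once the limit and the per-process memory sizes are known. The sketch below does that; the function name max_executors and the example sizes are hypothetical, and all values are assumed to be in MB.

```python
def max_executors(app_limit_mb, driver_memory_mb, executor_memory_mb):
    """Largest maxExecutors that keeps
    driver.memory + executor.memory x maxExecutors within the limit."""
    if executor_memory_mb <= 0:
        raise ValueError("executor memory must be positive")
    remaining = app_limit_mb - driver_memory_mb
    if remaining < 0:
        raise ValueError("driver memory alone exceeds the limit")
    return int(remaining // executor_memory_mb)


# Hypothetical sizes: 6000 MB limit, 2048 MB driver, 1024 MB executors.
# (6000 - 2048) // 1024 = 3
print(max_executors(6000, 2048, 1024))
```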

Page 19: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

What is the memory limit for mySparkApp?

Max Memory in "pool" x 3/4 = mySparkApp_mem_limit

mySparkApp_mem_limit = driver.memory + (executor.memory x dynamicAllocation.maxExecutors)

Limitation: Each driver and executor must not be larger than a single node.

(yarn.nodemanager.resource.memory-mb - 1Gb) / executor.memory ~ # executors per node
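The single-node limitation can be checked with a little arithmetic. The sketch below assumes the slide intends the node's YARN memory (yarn.nodemanager.resource.memory-mb) minus 1 GB to be divided by executor.memory to estimate executors per node; that reading, the function name executors_per_node, and the example sizes are all assumptions, and 1 GB is taken as 1024 MB.

```python
def executors_per_node(node_memory_mb, executor_memory_mb, reserved_mb=1024):
    """Estimate how many executors of a given size fit on one node.

    node_memory_mb stands in for yarn.nodemanager.resource.memory-mb.
    """
    usable = node_memory_mb - reserved_mb
    if executor_memory_mb <= 0:
        raise ValueError("executor memory must be positive")
    if executor_memory_mb > usable:
        raise ValueError("executor is larger than a single node can hold")
    return usable // executor_memory_mb


# Hypothetical node: 16384 MB for YARN, 4096 MB executors.
# (16384 - 1024) // 4096 = 3
print(executors_per_node(16384, 4096))
```

Raising an error when an executor cannot fit on any node mirrors the limitation on the slide: such a request could never be scheduled.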

Page 20: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

What is the memory limit for mySparkApp?

Max Memory in "pool" x 3/4 = mySparkApp_mem_limit

mySparkApp_mem_limit = driver.memory + (executor.memory x dynamicAllocation.maxExecutors)

Limitation: maxExecutors should not exceed pool allocation.

YARN: <maxResources>8vcores</maxResources>

Page 21: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

I want a little more information...

Top 5 Mistakes When Writing Spark Applications, by Mark Grover and Ted Malaska of Cloudera
http://www.slideshare.net/hadooparchbook/top-5-mistakes-when-writing-spark-applications

How-to: Tune Your Apache Spark Jobs (Part 2), by Sandy Ryza of Cloudera
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

I want lots more...

Page 22: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016


Page 23: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

mySparkApp Success: Intermittent, Reliable, Optimal

mySparkApp memory issues

Shared Cluster

Page 24: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Reduce the memory needed for mySparkApp. How?

Gracefully handle memory limitations. How?

mySparkApp memory issues

Page 25: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Reduce the memory needed for mySparkApp. How?

mySparkApp memory issues

Here, let's talk about one scenario.

Page 26: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016
Page 27: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Reduce the memory needed for mySparkApp. How?

mySparkApp memory issues

persist(storageLevel.[*]_SER)

Recommended: kryoserializer
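For the Kryo recommendation, the relevant Spark property is spark.serializer, set to the class org.apache.spark.serializer.KryoSerializer. The sketch below renders that setting in the whitespace-separated key/value form used by spark-defaults.conf; the helper name render_spark_defaults is hypothetical, and whether Kryo helps a given job is something to measure rather than assume.

```python
def render_spark_defaults(settings):
    """Render settings as spark-defaults.conf lines: key, whitespace, value."""
    width = max(len(key) for key in settings)
    return "\n".join(
        f"{key.ljust(width)}  {value}" for key, value in settings.items()
    )


settings = {"spark.serializer": "org.apache.spark.serializer.KryoSerializer"}
rendered = render_spark_defaults(settings)
print(rendered)
```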

Page 28: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Gracefully handle memory limitations. How?

mySparkApp memory issues

Reduce the memory needed for mySparkApp. How?

Page 29: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Gracefully handle memory limitations. How?

mySparkApp memory issues

Here, let's talk about one scenario.

Page 30: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Symptoms:

• mySparkApp has been running for several hours when this appears: Container is lost.

• I notice one container fails, then the rest fail one by one.

• The first container to fail was the driver.

• The driver is a single point of failure (SPOF).

Page 31: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Investigate:

• Driver failures are often caused by collecting unbounded data to the driver.

• I verified that only bounded data is brought to the driver, but the driver still fails intermittently.

Page 32: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Potential Solution: RDD.checkpoint()

Use in these cases:
• high-traffic cluster
• network blips
• preemption
• disk space nearly full

Function:
• saves the RDD to stable storage (e.g. HDFS or S3)

How-to:
SparkContext.setCheckpointDir(directory: String)
RDD.checkpoint()
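The two calls above can be wrapped in a small helper. This is a sketch under stated assumptions: sc is expected to be a PySpark SparkContext and rdd a PySpark RDD, the helper name checkpoint_rdd and the directory in the usage comment are hypothetical, and the count() call is added here only because checkpoint() marks the RDD and the data is written when an action runs.

```python
def checkpoint_rdd(sc, rdd, directory):
    """Checkpoint rdd to stable storage using the two calls named on the slide.

    Nothing is imported here, so the helper works with any objects that
    expose the same methods as a PySpark SparkContext and RDD.
    """
    sc.setCheckpointDir(directory)  # must be set before checkpoint()
    rdd.checkpoint()                # only marks the RDD for checkpointing
    rdd.count()                     # an action triggers the actual write
    return rdd.isCheckpointed()


# Usage on a real cluster (assumes pyspark is installed; path is hypothetical):
#   from pyspark import SparkContext
#   sc = SparkContext(appName="mySparkApp")
#   rdd = sc.parallelize(range(1000))
#   checkpoint_rdd(sc, rdd, "hdfs:///tmp/mySparkApp/checkpoints")
```

Call checkpoint() before running other jobs on the RDD; persisting the RDD first is a common companion step so the checkpoint write does not recompute it.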

Page 33: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

mySparkApp Success: Intermittent, Reliable, Optimal

mySparkApp memory issues

Shared Cluster

Instead of 2.5 hours, myApp completes in 1 hour.

Page 34: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Cheat-sheet techsuppdiva.github.io/

Page 35: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

mySparkApp Success: Intermittent, Reliable, Optimal

mySparkApp memory issues

Shared Cluster

HighPerformanceSpark.com

Page 36: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Further Reading:

• Learning Spark, by H. Karau, A. Konwinski, P. Wendell, M. Zaharia, 2015, O'Reilly
  https://databricks.com/blog/2015/02/09/learning-spark-book-available-from-oreilly.html

• Scheduling: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

• Tuning the Spark Conf:
  Mark Grover and Ted Malaska from Cloudera http://www.slideshare.net/hadooparchbook/top-5-mistakes-when-writing-spark-applications
  Sandy Ryza (Cloudera) http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

• Checkpointing: http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing

• Troubleshooting: Miklos Christine from Databricks https://spark-summit.org/east-2016/events/operational-tips-for-deploying-spark/

• High Performance Spark, by R. Warren, H. Karau, coming in 2016, O'Reilly
  http://highperformancespark.com/

Page 37: Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

More Questions?

Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren