2013 12 02 Sparrow - EECS at UC Berkeleykeo/talks/sparrow-spark-summit... · Sparrow Distributed...
Transcript of 2013 12 02 Sparrow - EECS at UC Berkeleykeo/talks/sparrow-spark-summit... · Sparrow Distributed...
Sparrow Distributed Low-Latency Spark Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Outline
The Spark scheduling bottleneck Sparrow’s fully distributed, fault-tolerant technique Sparrow’s near-optimal performance
Spark Today
Worker
Worker
Worker
Worker
Worker
Worker
Spark Context
User 1 User 2 User 3
Query Compilation
Storage
Scheduling
Spark Today
Worker
Worker
Worker
Worker
Worker
Worker
Spark Context
User 1 User 2 User 3
Query Compilation
Storage
Scheduling
Job Latencies Rapidly Decreasing
10 min. 10 sec. 100 ms 1 ms
2004: MapReduce batch job
2009: Hive query
2010: Dremel Query
2012: Impala query 2010:
In-memory Spark query
2013: Spark
streaming
Job latencies rapidly decreasing
Job latencies rapidly decreasing +
Spark deployments growing in size
Scheduling bottleneck!
Spark scheduler throughput:
1500 tasks / second
1 second 100
100 ms 10
10 second 1000
Task Duration Cluster size
(# 16-core machines)
Optimizing the Spark Scheduler
0.8: Monitoring code moved off critical path 0.8.1: Result deserialization moved off critical path Future improvements may yield 2-3x higher throughput
Is the scheduler the bottleneck in my cluster?
tinyurl.com/sparkdemo
Worker
Worker
Worker
Worker
Worker
Worker
Cluster Scheduler
Task launch
Task completion
tinyurl.com/sparkdemo
Worker
Worker
Worker
Worker
Worker
Worker
Cluster Scheduler
Task launch
Task completion
tinyurl.com/sparkdemo
Worker
Worker
Worker
Worker
Worker
Worker
Cluster Scheduler
Task launch
Task completion
Scheduler delay
tinyurl.com/sparkdemo
Spark Today
Worker
Worker
Worker
Worker
Worker
Worker
Spark Context
User 1 User 2 User 3
Query Compilation
Storage
Scheduling
Future Spark
Worker
Worker
Worker
Worker
Worker
Worker
User 1 User 2 User 3
Scheduler Query compilation
Scheduler Query compilation
Scheduler Query compilation
Benefits: High throughput Fault tolerance
Future Spark
Worker
Worker
Worker
Worker
Worker
Worker
User 1 User 2 User 3
Scheduler Query compilation
Scheduler Query compilation
Scheduler Query compilation
Storage:
Tachyon
Scheduling with Sparrow
Worker
Worker
Worker
Worker
Worker Scheduler
Scheduler
Scheduler
Scheduler Stage
Worker
Stage
Batch Sampling
Worker
Worker
Worker
Worker
Worker Scheduler
Scheduler
Scheduler
Scheduler
Worker
Place m tasks on the least loaded of 2m workers
4 probes (d = 2)
Queue length poor predictor of wait time
Worker
Worker
80 ms 155 ms
530 ms
Poor performance on heterogeneous workloads
Stage
Late Binding
Worker
Worker
Worker
Worker
Worker Scheduler
Scheduler
Scheduler
Scheduler
Worker
Place m tasks on the least loaded of d�m workers
4 probes (d = 2)
Late Binding
Scheduler
Scheduler
Scheduler
Scheduler
Place m tasks on the least loaded of d�m workers
4 probes (d = 2)
Worker
Worker
Worker
Worker
Worker
Worker
Stage
Late Binding
Scheduler
Scheduler
Scheduler
Scheduler
Place m tasks on the least loaded of d�m workers
Worker requests
task Worker
Worker
Worker
Worker
Worker
Worker
Stage
What about constraints?
Stage
Per-Task Constraints
Scheduler
Scheduler
Scheduler
Scheduler
Worker
Worker
Worker
Worker
Worker
Worker
Probe separately for each task
Technique Recap
Scheduler
Scheduler
Scheduler
Scheduler
Batch sampling +
Late binding +
Constraints
Worker
Worker
Worker
Worker
Worker
Worker
How well does Sparrow perform?
How does Sparrow compare to Spark’s native scheduler?
!"!#"""!$"""!%"""!&"""!'"""!("""
!"!#"""!$"""!%"""!&"""!'"""!("""
)*+,-.+*!/01*!21+3
/4+5!678490-.!21+3
:,485!.490;*!+<=*>7?*8:,488-@A>*4?
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: Common benchmark for analytics workloads
Sparrow
Spark
Shark: SQL execution engine
!"!#""!$"""!$#""!%"""!%#""!&"""!&#""!'"""
(& (' () ($%
*+,-./,+!012+!32,4
'%$5!32+674 #&8)!32+674 599$!32+674
*:/6.2 ;-:<<.= >6+:?
TPC-H Queries
100 16-core EC2 nodes, 10 schedulers, 80% load
95
75
25
50
Percentiles
5
Within 12% of ideal Median queuing delay of 9ms
Policy Enforcement
Worker High Priority
Low Priority Worker User A (75%)
User B (25%)
Fair Shares Serve queues using weighted fair
queuing
Priorities Serve queues based on strict
priorities
Weighted Fair Sharing
!"!#"!$""!$#"!%""!%#"!&""!&#"!'""
!" !$" !%" !&" !'" !#"
()**+*,!-./0/
-+12!3/4
5/26!"5/26!$
Fault Tolerance
Scheduler 1
Scheduler 2
Spark Client 1 ✗ Spark Client 2
Timeout: 100ms Failover: 5ms
Re-launch queries: 15ms
!"!#"""!$"""!%"""!&"""
!" !#" !$" !%" !&" !'" !(")*+,!-./
!"!#"""!$"""!%"""!&"""
01,23!2,.456.,!7*+,!-+./
89*:12,;492<!=:*,67!#
;492<!=:*,67!$
Making Sparrow feature-complete
Interfacing with UI
Delay scheduling
Speculation
(2) Distributed, fault-tolerant scheduling
with Sparrow
www.github.com/radlab/sparrow
Scheduler
Scheduler
Scheduler
Scheduler
Worker
Worker
Worker
Worker
Worker
Worker
(1) Diagnosing a Spark scheduling
bottleneck