2013 12 02 Sparrow - EECS at UC Berkeleykeo/talks/sparrow-spark-summit... · Sparrow Distributed...

Sparrow Distributed Low-Latency Spark Scheduling

Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica

Outline

The Spark scheduling bottleneck Sparrow’s fully distributed, fault-tolerant technique Sparrow’s near-optimal performance

Spark Today

Worker

Worker

Worker

Worker

Worker

Worker

Spark Context

User 1 User 2 User 3

Query Compilation

Storage

Scheduling

Job Latencies Rapidly Decreasing

10 min. 10 sec. 100 ms 1 ms

2004: MapReduce batch job

2009: Hive query

2010: Dremel Query

2012: Impala query 2010:

In-memory Spark query

2013: Spark

streaming

Job latencies rapidly decreasing

Job latencies rapidly decreasing +

Spark deployments growing in size

Scheduling bottleneck!

Spark scheduler throughput:

1500 tasks / second

1 second 100

100 ms 10

10 second 1000

Task Duration Cluster size

(# 16-core machines)

Optimizing the Spark Scheduler

0.8: Monitoring code moved off critical path 0.8.1: Result deserialization moved off critical path Future improvements may yield 2-3x higher throughput

Is the scheduler the bottleneck in my cluster?

tinyurl.com/sparkdemo

Worker

Worker

Worker

Worker

Worker

Worker

Cluster Scheduler

Task launch

Task completion


Worker

Worker

Worker

Worker

Worker

Worker

Cluster Scheduler

Task launch

Task completion

Scheduler delay


Spark Today

Worker

Worker

Worker

Worker

Worker

Worker

Spark Context


Query Compilation

Storage

Scheduling

Future Spark

Worker

Worker

Worker

Worker

Worker

Worker


Scheduler Query compilation



Benefits: High throughput Fault tolerance

Future Spark

Worker

Worker

Worker

Worker

Worker

Worker





Storage:

Tachyon

Scheduling with Sparrow

Worker

Worker

Worker

Worker

Worker Scheduler

Scheduler

Scheduler

Scheduler Stage

Worker

Stage

Batch Sampling

Worker

Worker

Worker

Worker

Worker Scheduler

Scheduler

Scheduler

Scheduler

Worker

Place m tasks on the least loaded of 2m workers

4 probes (d = 2)

Queue length poor predictor of wait time

Worker

Worker

80 ms 155 ms

530 ms

Poor performance on heterogeneous workloads

Stage

Late Binding

Worker

Worker

Worker

Worker

Worker Scheduler

Scheduler

Scheduler

Scheduler

Worker

Place m tasks on the least loaded of d�m workers

4 probes (d = 2)

Late Binding

Scheduler

Scheduler

Scheduler

Scheduler


4 probes (d = 2)

Worker

Worker

Worker

Worker

Worker

Worker

Stage

Late Binding

Scheduler

Scheduler

Scheduler

Scheduler


Worker requests

task Worker

Worker

Worker

Worker

Worker

Worker

Stage

What about constraints?

Stage

Per-Task Constraints

Scheduler

Scheduler

Scheduler

Scheduler

Worker

Worker

Worker

Worker

Worker

Worker

Probe separately for each task

Technique Recap

Scheduler

Scheduler

Scheduler

Scheduler

Batch sampling +

Late binding +

Constraints

Worker

Worker

Worker

Worker

Worker

Worker

How well does Sparrow perform?

How does Sparrow compare to Spark’s native scheduler?

!"!#"""!$"""!%"""!&"""!'"""!("""

!"!#"""!$"""!%"""!&"""!'"""!("""

)*+,-.+*!/01*!21+3

/4+5!678490-.!21+3

:,485!.490;*!+<=*>7?*8:,488-@A>*4?

100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load

TPC-H Queries: Background

TPC-H: Common benchmark for analytics workloads

Sparrow

Spark

Shark: SQL execution engine

!"!#""!$"""!$#""!%"""!%#""!&"""!&#""!'"""

(& (' () ($%

*+,-./,+!012+!32,4

'%$5!32+674 #&8)!32+674 599$!32+674

*:/6.2 ;-:<<.= >6+:?

TPC-H Queries

100 16-core EC2 nodes, 10 schedulers, 80% load

95

75

25

50

Percentiles

5

Within 12% of ideal Median queuing delay of 9ms

Policy Enforcement

Worker High Priority

Low Priority Worker User A (75%)

User B (25%)

Fair Shares Serve queues using weighted fair

queuing

Priorities Serve queues based on strict

priorities

Weighted Fair Sharing

!"!#"!$""!$#"!%""!%#"!&""!&#"!'""

!" !$" !%" !&" !'" !#"

()**+*,!-./0/

-+12!3/4

5/26!"5/26!$

Fault Tolerance

Scheduler 1

Scheduler 2

Spark Client 1 ✗ Spark Client 2

Timeout: 100ms Failover: 5ms

Re-launch queries: 15ms

!"!#"""!$"""!%"""!&"""

!" !#" !$" !%" !&" !'" !(")*+,!-./

!"!#"""!$"""!%"""!&"""

01,23!2,.456.,!7*+,!-+./

89*:12,;492<!=:*,67!#

;492<!=:*,67!$

Making Sparrow feature-complete

Interfacing with UI

Delay scheduling

Speculation

(2) Distributed, fault-tolerant scheduling

with Sparrow

www.github.com/radlab/sparrow

Scheduler

Scheduler

Scheduler

Scheduler

Worker

Worker

Worker

Worker

Worker

Worker

(1) Diagnosing a Spark scheduling

bottleneck

2013 12 02 Sparrow - EECS at UC Berkeleykeo/talks/sparrow-spark-summit... · Sparrow Distributed...

Documents

Transcript of 2013 12 02 Sparrow - EECS at UC Berkeleykeo/talks/sparrow-spark-summit... · Sparrow Distributed...