Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads


Page 1: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

1

Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

Aleksandar Prokopec, Martin Odersky

Page 2: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

2

Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

Aleksandar Prokopec, Martin Odersky

Irregular Data-Parallel

Page 3: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

3

Uniform workload

(0 until 10000000) reduce (_ + _)

Page 4: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

4

Uniform workload

(0 until 10000000) reduce (_ + _)

sum = sum + x

Page 5: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

5

Uniform workload

(0 until 10000000) reduce (_ + _)

sum = sum + x

[Figure: plot with axes N and cycles]

Page 6: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

6

Baseline workload

for (i <- 0 until 10000000) {}

[Figure: plot with axes N and cycles]

Page 7: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

7

Irregular workload

Page 8: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

8

Irregular workload

[Figure: plot with axes N and cycles]

Page 9: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

9

Irregular workload

for {
  x <- 0 until width
  y <- 0 until height
} image(x, y) = compute(x, y)

[Figure: plot with axes N and cycles]

Page 10: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

10

Irregular workload

for {
  x <- 0 until width
  y <- 0 until height
} image(x, y) = compute(x, y)

[Figure: plot with axes N and cycles]

Page 11: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

11

Workload function

workload(n) – work spent on element n after the data-parallel operation completed

Page 12: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

12

Workload function

Could be…

Runtime value dependent

for {
  x <- 0 until width
  y <- 0 until height
} img(x, y) = compute(x, y)

workload(n) – work spent on element n after the data-parallel operation completed
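To make "runtime value dependent" concrete, here is a minimal sketch. The slides do not say what compute does; this example assumes, purely for illustration, that it is a Mandelbrot-style escape-time kernel. The extra parameters (width, height, maxIter) and the coordinate mapping are hypothetical.

```scala
object IrregularWorkload {
  // Hypothetical stand-in for compute(x, y): returns the number of
  // iterations performed, which is also the work spent on this pixel.
  def compute(x: Int, y: Int, width: Int, height: Int, maxIter: Int): Int = {
    // Map the pixel to a point c = cr + ci*i in [-2, 1) x [-1.5, 1.5).
    val cr = -2.0 + 3.0 * x / width
    val ci = -1.5 + 3.0 * y / height
    var zr = 0.0
    var zi = 0.0
    var i = 0
    while (i < maxIter && zr * zr + zi * zi <= 4.0) {
      val t = zr * zr - zi * zi + cr
      zi = 2.0 * zr * zi + ci
      zr = t
      i += 1
    }
    i
  }

  def main(args: Array[String]): Unit = {
    val (width, height, maxIter) = (64, 64, 1000)
    val inside = compute(42, 32, width, height, maxIter) // point inside the set
    val corner = compute(0, 0, width, height, maxIter)   // point far outside
    println(s"iterations inside: $inside, at the corner: $corner")
  }
}
```

Two elements of the same loop can differ in cost by three orders of magnitude, and the scheduler cannot tell which is which before running them.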

Page 13: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

13

Workload function

Could be…

Execution-schedule dependent

for (n <- nodes) n.neighbours += new Node

workload(n) – work spent on element n after the data-parallel operation completed

Page 14: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

14

Workload function

Could be…

Totally random

for ((x, y) <- img.indices) img(x, y) = sample(x + random(), y + random())

workload(n) – work spent on element n after the data-parallel operation completed

Page 15: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

15

Data-parallel scheduler

Assign loop elements to workers without knowledge about the workload function.

Page 16: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

16

Data-parallel scheduler

1. Linear speedup for the baseline workload

Assign loop elements to workers without knowledge about the workload function.

Page 17: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

17

Data-parallel scheduler

1. Linear speedup for the baseline workload
2. Optimal speedup for irregular workloads

Assign loop elements to workers without knowledge about the workload function.

Page 18: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

18

Static batching

Decides on the worker-element assignment before the data-parallel operation begins.

[Figure: plot with axes N and cycles]

Page 19: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

19

Static batching

Decides on the worker-element assignment before the data-parallel operation begins.

No knowledge → divide uniformly.

Not optimal for even mildly irregular workloads.

[Figure: plot with axes N and cycles]
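A minimal sketch of why uniform division suffers under irregularity. The names (batches, makespan) and the skewed workload function are hypothetical, chosen only for illustration; completion time is modelled as the total work of the slowest worker.

```scala
object StaticBatching {
  // Split [0, n) into p contiguous batches whose sizes differ by at most one.
  def batches(n: Int, p: Int): Seq[Range] =
    (0 until p).map { i =>
      val start = (i.toLong * n / p).toInt
      val end = ((i + 1).toLong * n / p).toInt
      start until end
    }

  // The operation finishes when the most heavily loaded worker finishes.
  def makespan(n: Int, p: Int, workload: Int => Long): Long =
    batches(n, p).map(r => r.map(workload).sum).max

  def main(args: Array[String]): Unit = {
    val n = 1000
    val uniform: Int => Long = _ => 1L
    // Hypothetical irregular workload: the last 10% of elements cost 100x more.
    val skewed: Int => Long = i => if (i >= 900) 100L else 1L
    println(s"uniform: ${makespan(n, 4, uniform)} of ${n} total")
    println(s"skewed:  ${makespan(n, 4, skewed)} of ${(0 until n).map(skewed).sum} total")
  }
}
```

With four workers the skewed case leaves one worker with 10150 of the 10900 units of work, so the other three sit idle for most of the run.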

Page 20: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

20

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; label: progress]

Page 21: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

21

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 0]

Page 22: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

22

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 2, T0: CAS, T0]

Page 23: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

23

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 4, T1: CAS, T0, T1]

Page 24: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

24

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 6, T0: CAS, T0, T1]

Page 25: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

25

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 8, T0: CAS, T0, T1]

Page 26: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

26

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 10, T0: CAS, T0, T1]

Page 27: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

27

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; labels: 12, T0: CAS, T0, T1]

Page 28: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

28

Fixed-size batching

Workload-driven – decides during execution.

[Figure: plot with axes N and cycles; label: progress]

Pros: lightweight
Cons: minimum batch size, contention
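The CAS sequence on the preceding slides can be sketched roughly as follows. This is an illustrative reading of the slides, not the authors' code: the name parallelFor and its parameters are hypothetical, and the shared counter is a plain java.util.concurrent.atomic.AtomicInteger.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

object FixedSizeBatching {
  // Runs body(i) once for every i in [0, n) on p threads. Workers claim
  // batches of batchSize elements by CAS on one shared progress counter.
  def parallelFor(n: Int, p: Int, batchSize: Int)(body: Int => Unit): Unit = {
    val progress = new AtomicInteger(0)
    def worker(): Unit = {
      var done = false
      while (!done) {
        val start = progress.get()
        if (start >= n) done = true
        else {
          val end = math.min(start + batchSize, n)
          // Only the worker whose CAS succeeds processes [start, end).
          if (progress.compareAndSet(start, end)) {
            var i = start
            while (i < end) { body(i); i += 1 }
          }
          // On failure another worker claimed the batch: re-read and retry.
        }
      }
    }
    val threads = Seq.fill(p)(new Thread(() => worker()))
    threads.foreach(_.start())
    threads.foreach(_.join())
  }

  def main(args: Array[String]): Unit = {
    val sum = new AtomicLong(0L)
    parallelFor(n = 1000, p = 4, batchSize = 2) { i => sum.addAndGet(i); () }
    println(sum.get())
  }
}
```

Every batch costs one successful CAS on the same memory location, which is where both listed drawbacks come from: small batches mean more CAS traffic and more failed attempts, so the batch size cannot be made arbitrarily small.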

Page 29: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

29

Fixed-size batching - contention

Page 30: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

30

Factoring, GSS, TS

Batch size varies.

[Figure: plot with axes N and cycles; label: progress]

Pros: lightweight
Cons: contention
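The slide only names these schemes. As one example of a varying batch size, guided self-scheduling (GSS) is usually described as giving each request ceil(R / P) elements, where R is the number of remaining elements and P the number of workers. The sketch below assumes that formulation; the function name batchSizes is hypothetical.

```scala
object GuidedSelfScheduling {
  // Sequence of batch sizes handed out for n elements and p workers,
  // assuming each request takes ceil(remaining / p) elements.
  def batchSizes(n: Int, p: Int): List[Int] = {
    @annotation.tailrec
    def loop(remaining: Int, acc: List[Int]): List[Int] =
      if (remaining <= 0) acc.reverse
      else {
        val size = (remaining + p - 1) / p // integer ceil(remaining / p)
        loop(remaining - size, size :: acc)
      }
    loop(n, Nil)
  }

  def main(args: Array[String]): Unit =
    println(batchSizes(16, 2).mkString(", "))
}
```

Batches start large and shrink toward 1, which reduces the number of synchronisation points early on, but every batch is still claimed from a single shared counter.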

Page 31: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

31

Task-based work-stealing

[Figure: plot with axes N and cycles; tasks 0..2, 2..4, 4..8, 8..16]

Page 32: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

32

Task-based work-stealing

[Figure: plot with axes N and cycles; tasks 0..2, 2..4, 4..8, 8..16; labels: 2..4, 4..8, 8..16, T0, T1, 0..2]

Page 33: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

33

Task-based work-stealing

[Figure: plot with axes N and cycles; tasks 0..2, 2..4, 4..8, 8..16; labels: 2..4, 4..8, 8..16, T0, T1, 0..2]

steal – a rare event

Page 34: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

34

Task-based work-stealing

[Figure: plot with axes N and cycles; tasks 0..2, 2..4, 4..8, 8..16; labels: 2..4, 4..8, 8..16, T0, T1, 10..12, 12..16, 8..10, 0..2]

Page 35: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

35

Task-based work-stealing

Pros: can be adaptive - uses stealing information
Cons: heavyweight - minimum batch size much larger

[Figure: plot with axes N and cycles; tasks 0..2, 2..4, 4..8, 8..16; labels: 2..4, 4..8, 8..16, T0, T1, 10..12, 12..16, 0..2, 8..10]

Page 36: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

36

Task-based work-stealing

[Figure: plot with axes N and cycles; tasks 0..2, 2..4, 4..8, 8..16]

Cannot be stolen after T0 starts processing it
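The task ranges shown on these slides (0..2, 2..4, 4..8, 8..16) are what repeated halving of 0..16 produces when the worker keeps the left half and queues the right half. A minimal sketch under that assumption follows; the name split and the threshold parameter are hypothetical, and a real scheduler would use a concurrent deque rather than a list.

```scala
object TaskSplitting {
  // Repeatedly halve [start, end): keep the left half, queue the right half
  // as a separate task. Returns the range the worker starts processing and
  // the queued tasks, most recently queued first.
  def split(start: Int, end: Int, threshold: Int): (Range, List[Range]) = {
    var hi = end
    var queued = List.empty[Range]
    while (hi - start > threshold) {
      val mid = start + (hi - start) / 2
      queued = (mid until hi) :: queued
      hi = mid
    }
    (start until hi, queued)
  }

  def main(args: Array[String]): Unit = {
    val (current, queued) = split(0, 16, threshold = 2)
    println(s"processing $current, queued: ${queued.mkString(", ")}")
  }
}
```

Only the queued ranges are visible to other workers. The range being processed is private to its worker, which is the limitation the last slide points out: once T0 starts on a task, the rest of that task cannot be handed to anyone else.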

Page 37: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

37

Work-stealing tree

[0 0 T0 N]

owned

Page 38: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

38

Work-stealing tree

[0 0 T0 N] [0 50 T0 N]

owned owned

T0: CAS

Page 39: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

39

Work-stealing tree

[0 0 T0 N] [0 50 T0 N] [0 N T0 N] …

owned owned completed

T0: CAS T0: CAS

What about stealing?

Page 40: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

40

Work-stealing tree

[0 0 T0 N] [0 50 T0 N] [0 N T0 N] …

owned owned completed

[0 -51 T0 N]

T0: CAS

T1: CAS

stolen

T0: CAS

Page 41: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

41

Work-stealing tree

[0 50 T0 N] [0 N T0 N] …

owned completed

[0 -51 T0 N]

T0: CAS

stolen

T0: CAS

[0 0 T0 N]

owned

T1: CAS

Page 42: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

42

Work-stealing tree

[0 50 T0 N] [0 N T0 N] …

owned completed

[0 -51 T0 N]

T0: CAS

stolen

[0 -51 T0 N]

expanded

[50 50 T0 M] [M M T1 N]

T0: CAS

[0 0 T0 N]

owned

M = (50 + N) / 2

Page 43: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

43

Work-stealing tree

[0 50 T0 N] [0 N T0 N] …

owned completed

[0 -51 T0 N]

T0: CAS

stolen

[0 -51 T0 N]

expanded

[50 50 T0 M] [M M T1 N]

T0: CAS

[0 0 T0 N]

owned

M = (50 + N) / 2

T0 or T1: CAS

Page 44: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

44

Work-stealing tree

[0 50 T0 N] [0 N T0 N] …

owned completed

[0 -51 T0 N]

T0: CAS

stolen

[0 -51 T0 N]

expanded

[50 50 T0 M] [M M T1 N]

T0 or T1: CAS

T0: CAS

[0 0 T0 N]

owned

M = (50 + N) / 2
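The node states on these slides can be sketched as below. This is a reading of the diagrams, not the authors' implementation: it assumes each node holds a start, a progress counter, an owner, and an end; that a stolen node stores its frozen progress p as -p - 1 (which matches 50 becoming -51); and that expansion installs two children split at M = (p + N) / 2. All class and method names are hypothetical.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

object WorkStealingTreeSketch {
  final class Node(val start: Int, val end: Int, val owner: String) {
    // progress >= 0: owned (progress == end means completed)
    // progress <  0: stolen, frozen at position -progress - 1
    val progress = new AtomicInteger(start)
    val children = new AtomicReference[Option[(Node, Node)]](None)

    // Owner: claim up to `step` elements with one CAS.
    // Returns None once the node is completed or stolen.
    def advance(step: Int): Option[Range] = {
      val p = progress.get()
      if (p < 0 || p >= end) None
      else {
        val next = math.min(p + step, end)
        if (progress.compareAndSet(p, next)) Some(p until next) else advance(step)
      }
    }

    // Thief: freeze the progress counter. Returns true if the node is in
    // the stolen state afterwards, false if it had already completed.
    def steal(): Boolean = {
      val p = progress.get()
      if (p < 0) true
      else if (p >= end) false
      else if (progress.compareAndSet(p, -p - 1)) true
      else steal()
    }

    // Owner or thief: split the unprocessed part of a stolen node in two.
    // Whoever loses the CAS simply uses the children installed by the winner.
    def expand(thief: String): (Node, Node) = {
      require(progress.get() < 0, "only a stolen node can be expanded")
      val p = -progress.get() - 1
      val m = (p + end) / 2
      children.compareAndSet(None, Some((new Node(p, m, owner), new Node(m, end, thief))))
      children.get().get
    }
  }

  def main(args: Array[String]): Unit = {
    val root = new Node(0, 100, "T0")
    root.advance(50)
    root.steal()
    val (left, right) = root.expand("T1")
    println(s"left: ${left.start}..${left.end} (${left.owner}), right: ${right.start}..${right.end} (${right.owner})")
  }
}
```

In this sketch the owner pays one uncontended CAS per advance, and a thief touches the counter only when it actually steals, so the node stays stealable while it is being processed.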

Page 45: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

45

Work-stealing tree - contention

Page 46: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

50

Work-stealing tree scheduling

1) find a non-expanded, non-completed node
2) if not found, terminate
3) if not owned, steal and/or expand, and descend
4) advance until node is completed or stolen
5) go to 1)

Page 47: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

51

Work-stealing tree scheduling

1) find a non-expanded, non-completed node
2) if not found, terminate
3) if not owned, steal and/or expand, and descend
4) advance until node is completed or stolen
5) go to 1)

1) find a non-expanded, non-completed node

Page 48: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

52

Choosing the node to steal

Find first, in-order traversal

[Figure: nodes labeled 2, 9, 5, 3]

Page 49: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

53

Choosing the node to steal

Find first, in-order traversal

[Figure: nodes labeled 2, 9, 5, 3]

Catastrophic – a lot of stealing, huge trees

Page 50: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

54

Choosing the node to steal

Find first, in-order traversal
[Figure: nodes labeled 2, 9, 5, 3]
Catastrophic: a lot of stealing, huge trees.

Find first, random order traversal
[Figure: nodes labeled 2, 9, 5, 3]

Page 51: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

55

Choosing the node to steal

Find first, in-order traversal
[Figure: nodes labeled 2, 9, 5, 3]
Catastrophic: a lot of stealing, huge trees.

Find first, random order traversal
[Figure: nodes labeled 2, 9, 5, 3]
Works reasonably well.

Page 52: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

56

Choosing the node to steal

Find first, in-order traversal
[Figure: nodes labeled 2, 9, 5, 3]
Catastrophic: a lot of stealing, huge trees.

Find first, random order traversal
[Figure: nodes labeled 2, 9, 5, 3]
Works reasonably well.

Find most elements
[Figure: nodes labeled 2, 9, 5, 3]
Generates the fewest nodes. Seems to be best.
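A minimal sketch of the "find most elements" strategy, assuming the numbers in the figure are per-node counts of remaining elements. The tree shape below is hypothetical (the transcript does not preserve it), as are the type and function names.

```scala
object StealTargetSelection {
  sealed trait Tree
  // A node that still holds unprocessed elements itself.
  final case class Leaf(name: String, remaining: Int) extends Tree
  // An expanded node: its remaining work lives in its children.
  final case class Expanded(left: Tree, right: Tree) extends Tree

  def leaves(t: Tree): List[Leaf] = t match {
    case l: Leaf        => List(l)
    case Expanded(l, r) => leaves(l) ++ leaves(r)
  }

  // Pick the non-completed node with the most remaining elements, if any.
  def findMost(t: Tree): Option[Leaf] = {
    val candidates = leaves(t).filter(_.remaining > 0)
    if (candidates.isEmpty) None else Some(candidates.maxBy(_.remaining))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical tree reusing the element counts from the slide.
    val tree = Expanded(Expanded(Leaf("a", 2), Leaf("b", 9)), Expanded(Leaf("c", 5), Leaf("d", 3)))
    println(findMost(tree))
  }
}
```

Stealing from the node with the most work left means each steal splits off the largest available share, which fits the slide's observation that this strategy generates the fewest nodes.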

Page 53: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

57

Comparison with fixed-size batching

Page 54: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

58

Comparison with fixed-size batching

Page 55: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

59

Comparison with task work-stealing

Page 56: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

60

Thank you!

Questions?

Page 57: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

61

Finding work

Page 58: Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads

62

Other workloads