An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Page 1: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

INSTITUTE OF COMPUTING TECHNOLOGY

An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Lei Wang, Huimin Cui, Yuelu Duan, Fang Lu, Xiaobing Feng, Pen-Chung Yew

ICT, Chinese Academy of Sciences, China

University of Minnesota, U.S.A.

Page 2: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Forecast

Adaptive task granularity

Fine-grained parallelism

Tasks

Multi-cores

An adaptive task creation strategy

Work-stealing

Page 3: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Outline

An adaptive task creation strategy

A new data attribute -- taskprivate

Evaluations

Conclusions

Page 4: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Background

Cilk, Cilk++, X10, OpenMP 3.0, TBB, TPL, …

Parallel programming languages and libraries that support task-level parallelism

Programmer: divides work into tasks instead of threads

Runtime system: maps and schedules tasks onto physical threads

Key technique: work-stealing scheduling

Page 5: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Granularity

Too fine: scheduling overhead dominates

Too coarse: potential parallelism is lost, which causes starvation

Figure labels: cut-off = 3, cut-off = 1
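The granularity trade-off above can be made concrete with a small counting sketch. It assumes a depth-based cut-off on a recursive Fibonacci computation. The names `spawn_stats` and `fib_cutoff` are hypothetical, and no real work-stealing runtime is involved: the code only counts how many spawns would become tasks and how many would run as ordinary function calls.

```c
#include <assert.h>

/* Counters standing in for a real work-stealing runtime (hypothetical). */
typedef struct {
    long tasks;       /* spawns that would become stealable tasks */
    long plain_calls; /* spawns executed as ordinary function calls */
} spawn_stats;

/* Fibonacci with a depth-based cut-off: above the cut-off depth each
 * recursive call is counted as a task; at or below it, as a plain call. */
long fib_cutoff(int n, int depth, int cutoff, spawn_stats *st)
{
    assert(n >= 0 && cutoff >= 0);
    if (n < 2)
        return n;
    if (depth < cutoff)
        st->tasks += 2;
    else
        st->plain_calls += 2;
    return fib_cutoff(n - 1, depth + 1, cutoff, st)
         + fib_cutoff(n - 2, depth + 1, cutoff, st);
}
```

For `fib_cutoff(10, 0, cutoff, &st)`, a cut-off of 1 yields 2 tasks and 174 plain calls, while a cut-off of 3 yields 14 tasks and 162 plain calls. A small cut-off keeps the task count low but leaves little for idle threads to steal.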

Page 6: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

An unbalanced computation tree

Legend: P0 red, P1 blue, P2 green, P3 yellow

Page 7: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

A cut-off strategy

Legend: P0 red, P1 blue, P2 green, P3 yellow

Load imbalance

Page 8: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

An adaptive task creation strategy -- AdaptiveTC

Figure label: a special task

Legend: P0 red, P1 blue, P2 green, P3 yellow

Page 9: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

AdaptiveTC

When executing a spawn statement, AdaptiveTC creates a task, a function call (a fake task), or a special task:

The task: keeps idle threads busy

The fake task: improves performance

The special task: gives good load balancing

AdaptiveTC switches adaptively between tasks and fake tasks to get better performance:

A task to a fake task: cut-off

A fake task to a task: a special task
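The switching described above can be pictured as a small state machine. The sketch below is only one plausible reading and not the authors' implementation. The slides do not say what triggers each transition, so the depth-based cut-off, the `steal_requested` flag, the `base_depth` reset, and every identifier here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SPAWN_TASK, SPAWN_FAKE_TASK, SPAWN_SPECIAL_TASK } spawn_kind;

/* Hypothetical per-worker state; none of these fields come from the slides. */
typedef struct {
    int  cutoff_depth;    /* assumed: tasks are created for this many levels */
    int  base_depth;      /* assumed: depth from which the cut-off is counted */
    bool in_fake_mode;    /* true while spawns run as plain function calls */
    bool steal_requested; /* assumed: set by an idle thread that found no work */
} worker_state;

/* Decide what a spawn statement turns into at the given recursion depth. */
spawn_kind on_spawn(worker_state *w, int depth)
{
    assert(w != 0 && depth >= 0);
    if (w->in_fake_mode) {
        if (w->steal_requested) {
            /* fake task -> task: a special task exposes work to thieves again */
            w->steal_requested = false;
            w->in_fake_mode = false;
            w->base_depth = depth;
            return SPAWN_SPECIAL_TASK;
        }
        return SPAWN_FAKE_TASK;
    }
    if (depth - w->base_depth >= w->cutoff_depth) {
        /* task -> fake task: the cut-off has been reached */
        w->in_fake_mode = true;
        return SPAWN_FAKE_TASK;
    }
    return SPAWN_TASK;
}
```

With `cutoff_depth = 2`, spawns at depths 0 and 1 become tasks and deeper spawns become fake tasks. Once `steal_requested` is set, the next spawn becomes a special task, and the following levels create tasks again until the cut-off is reached once more.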

Page 10: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

N-queen problem

Which Cilk programs are correct?

(1)

    cilk int nqueens(int depth, int n, char x[]) {
        ...
        tmpx = (char *)malloc(n * sizeof(char));
        memcpy(tmpx, x, n * sizeof(char));
        sn += spawn nqueens(depth + 1, n, tmpx);
        free(tmpx);
        ...
        sync;
        return sn;
    }

(2)

    cilk int nqueens(int depth, int n, char x[]) {
        ...
        tmpx = (char *)malloc(n * sizeof(char));
        memcpy(tmpx, x, n * sizeof(char));
        sn += spawn nqueens(depth + 1, n, tmpx);
        ...
        sync;
        free(x);
        return sn;
    }

(3)

    cilk int nqueens(int depth, int n, char x[]) {
        ...
        tmpx = Cilk_alloca(n * sizeof(char));
        memcpy(tmpx, x, n * sizeof(char));
        sn += spawn nqueens(depth + 1, n, tmpx);
        ...
        sync;
        return sn;
    }

Page 11: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

A new data attribute -- taskprivate

Workspace copying: not easy to program; overhead is high

taskprivate: introduced for workspace variables

An AdaptiveTC program for nqueens

    cilk int nqueens(int depth, int n, char x[])
    taskprivate: (x[]) (n * sizeof(char));
    {
        int sn = 0;
        if (depth >= n) {
            sn++;
            return sn;
        }
        for (j = 0; j < n; j++) {
            if (place(depth, j, x)) {
                x[depth] = j;
                sn += spawn nqueens(depth + 1, n, x);
            }
        }
        sync;
        return sn;
    }

In a fake task (a function call)

    x[depth] = j;
    sn += nqueens(depth + 1, n, x);

In a task

    x[depth] = j;
    tmpx = Cilk_alloca(n * sizeof(char));
    memcpy(tmpx, x, n * sizeof(char));
    sn += nqueens(depth + 1, n, tmpx);
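The two expansions above can be compared in plain sequential C. The sketch below is not AdaptiveTC code: it has no `spawn` or `sync`, the body of `place` is an assumed implementation (the slides only show the call), and `MAX_N`, `nqueens_shared`, and `nqueens_copied` are hypothetical names. It shows that the no-copy path and the copying path count the same solutions when run sequentially, so the copy is only needed when a child may run concurrently with its parent.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 16

/* Return 1 if a queen fits in row `depth`, column `col`, given rows 0..depth-1. */
int place(int depth, int col, const char x[])
{
    for (int r = 0; r < depth; r++) {
        int d = x[r] - col;
        if (d == 0 || d == depth - r || d == r - depth)
            return 0;
    }
    return 1;
}

/* Fake-task style: the child reuses the parent's workspace x (no copy). */
long nqueens_shared(int depth, int n, char x[])
{
    assert(n >= 1 && n <= MAX_N);
    if (depth >= n)
        return 1;
    long sn = 0;
    for (int j = 0; j < n; j++) {
        if (place(depth, j, x)) {
            x[depth] = (char)j;
            sn += nqueens_shared(depth + 1, n, x);
        }
    }
    return sn;
}

/* Task style: each child receives a private copy of the workspace. */
long nqueens_copied(int depth, int n, const char x[])
{
    assert(n >= 1 && n <= MAX_N);
    if (depth >= n)
        return 1;
    long sn = 0;
    for (int j = 0; j < n; j++) {
        if (place(depth, j, x)) {
            char tmpx[MAX_N];
            memcpy(tmpx, x, (size_t)n * sizeof(char));
            tmpx[depth] = (char)j;
            sn += nqueens_copied(depth + 1, n, tmpx);
        }
    }
    return sn;
}
```

Both functions return 92 for an 8x8 board. The shared version skips one `memcpy` per recursive call, which is the cost the fake-task path avoids.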

Page 12: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Test system, test cases

8 cores: 2-processor quad-core Intel Xeon E5520 (2.26 GHz, 8 GB memory)

8 test cases: 6 are backtracking search programs; 2 are divide-and-conquer programs.

Compared systems: Cilk-5.4.6, Tascell (PPoPP’09), AdaptiveTC

Compiled with gcc -O3

Page 13: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Test case 1 -- performance

Nqueen-array(16)

Figure: speedup versus number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC.

Execution time (seconds)

System         1 thread   8 threads
C              61         61
Cilk           198        24.57
Cilk-SYNCHED   184        22.41
Tascell        85         14.24
AdaptiveTC     66         8.27
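The timings above can be turned into ratios with a few lines of C. This is a minimal sketch: `speedup` and `overhead_fraction` are hypothetical helper names, and the overhead definition (the share of single-thread time beyond the sequential C time) is an assumption, not one stated on the slide.

```c
#include <assert.h>

/* How many times faster `time` is than `baseline` (both in seconds). */
double speedup(double baseline, double time)
{
    assert(baseline > 0.0 && time > 0.0);
    return baseline / time;
}

/* Assumed definition: share of single-thread time beyond the sequential time. */
double overhead_fraction(double sequential, double one_thread)
{
    assert(sequential > 0.0 && one_thread >= sequential);
    return 1.0 - sequential / one_thread;
}
```

With the Nqueen-array(16) numbers, `speedup(61.0, 8.27)` is about 7.4 (AdaptiveTC on 8 threads against sequential C) and `speedup(24.57, 8.27)` is about 3.0 (AdaptiveTC against Cilk, both on 8 threads). `overhead_fraction(61.0, 198.0)` is about 0.69 for Cilk on one thread, compared with about 0.08 for `overhead_fraction(61.0, 66.0)` with AdaptiveTC.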

Page 14: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Test case 1 -- analysis

Breakdown of overhead

Figure: stacked bars (0% to 100%) per system; legend: working, taskprivate variable.

System         Overhead
Tascell        28.7%
Cilk           69.2%
Cilk-SYNCHED   67%
AdaptiveTC     7.9%

Load balanced: the usage of cores with 8 threads

System       Busy     Idle
Tascell      83.3%    16.7%
Cilk         99.9%    0.1%
AdaptiveTC   99.0%    1.0%

Page 15: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Test case 2 -- performance

Nqueen-compute(16)

Figure: speedup versus number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC.

Execution time (seconds)

System         1 thread   8 threads
C              554        554
Cilk           669        85
Cilk-SYNCHED   661        88
Tascell        627        114
AdaptiveTC     612        77

Page 16: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Test case 2 -- analysis

Breakdown of overhead

Figure: stacked bars (0% to 100%) per system; legend: working, taskprivate variable, deque/nested function.

System         Overhead
Tascell        11.7%
Cilk           17.2%
Cilk-SYNCHED   16.2%
AdaptiveTC     9.5%

Load balanced: the usage of cores with 8 threads

System       Busy     Idle
Tascell      79.2%    20.8%
Cilk         99.9%    0.1%
AdaptiveTC   99.1%    0.9%

Page 17: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Experimental results

Figures: speedup versus number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC on four benchmarks: Sudoku (input_balance tree), Knight's tour(6*6), Strimko, and Pentomino(13).

Page 18: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Experimental results (cont’d)

Figures: speedup versus number of threads (1 to 8) for Cilk, Tascell, and AdaptiveTC on Comp(60000) and Fib(45).

Figure: Speedup with 8 threads, baseline is Cilk’s execution time. Benchmarks shown: Nqueen_array(16), Nqueen_compute(16), Strimko, Knight's Tour(6*6), Sudoku (balance_tree), Pentomino(13), Fib(45), Comp(60000), and Average.

System         Speedup
Cilk           1
Cilk-SYNCHED   1.07
Tascell        1.5
AdaptiveTC     2.24

Page 19: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Conclusions -- AdaptiveTC

An adaptive task creation strategy controls the task granularity: it reduces the system overhead and achieves good load balancing.

A new data attribute, taskprivate, is introduced for workspace variables: it improves programmability and, combined with the adaptive task creation strategy, reduces the cost of workspace copying.

Page 20: An Adaptive Task Creation Strategy for Work-Stealing Scheduling

Thanks!