Age Based Scheduling for Asymmetric Multiprocessors Nagesh B Lakshminarayana, Jaekyu Lee & Hyesoon Kim


Outline

• Background and Motivation
• Age Based Scheduling
• Evaluation
• Conclusion


Asymmetric (Chip) Multiprocessors

• Heterogeneous architectures in which all cores implement the same ISA but deliver different performance

[Figure: Heterogeneous architecture with one PEA core and four PEB cores]


Asymmetric (Chip) Multiprocessors

• Potential for better performance than SMPs that occupy the same area and consume the same power

[Figure: Symmetric Chip Multiprocessor (SMP/CMP) with Core0 to Core3 vs. Asymmetric Chip Multiprocessor (AMP/ACMP) with Core0 to Core3]


AMPs present new challenges

• Thread scheduling is one of them


Scheduling in Multiprocessor OSes

• Thread Assignment – assign a new thread to the least loaded core

• Load Balancing – make the load on all cores uniform

• Idle Balancing – move threads from busy cores to an idle core
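The three mechanisms above can be illustrated with a toy model. This is a hypothetical sketch, not the Linux implementation: `Core`, `assign_thread`, `load_balance`, and `idle_balance` are invented names, and load is assumed to be run-queue length. No decision consults core speed, which is the weakness on an AMP.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    cid: int
    runqueue: list = field(default_factory=list)

    @property
    def load(self) -> int:
        # Hypothetical load metric: run-queue length. Core speed is never consulted.
        return len(self.runqueue)


def assign_thread(cores, tid):
    """Thread assignment: place a new thread on the least loaded core."""
    target = min(cores, key=lambda c: c.load)
    target.runqueue.append(tid)
    return target.cid


def load_balance(cores):
    """Load balancing: migrate threads until run-queue lengths differ by at most one."""
    while True:
        busiest = max(cores, key=lambda c: c.load)
        idlest = min(cores, key=lambda c: c.load)
        if busiest.load - idlest.load <= 1:
            return
        idlest.runqueue.append(busiest.runqueue.pop())


def idle_balance(cores):
    """Idle balancing: each idle core pulls one thread from the busiest core."""
    for idle in [c for c in cores if c.load == 0]:
        busiest = max(cores, key=lambda c: c.load)
        if busiest.load > 1:
            idle.runqueue.append(busiest.runqueue.pop())


cores = [Core(i) for i in range(4)]
placed = [assign_thread(cores, t) for t in range(5)]
print(placed)  # [0, 1, 2, 3, 0]
```

On an AMP, the fifth thread lands on Core0 whether Core0 is the fast core or a slow one.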


Scheduling in Multiprocessor OSes

• Assume that all cores are identical
• On an AMP, this assumption results in poor and unstable application performance

[Figure: Parsec benchmarks on a (real) AMP using the Linux scheduler]

all-fast: 16 cores at 2 GHz
half-half: 8 cores at 2 GHz, 8 cores at 1 GHz
all-slow: 16 cores at 1 GHz


Problem with current Scheduling

The current scheduler does not take advantage of the fast core


Outline

• Background and Motivation
• Age Based Scheduling (ABS)
• Evaluation
• Conclusion


Motivation for Age Based Scheduling

• Many compute-intensive multithreaded applications follow the fork-join model
• Thread execution contains milestones (barriers)

[Figure: Application model. The main thread forks worker threads, which pass through a sequence of barriers before they join.]


Symmetry of Applications

• Threads created together are symmetric
  – Symmetry is measured by instruction count
  – Degree of Symmetry = Std Dev / Average (of the per-thread instruction counts)

[Figure: Degree of Symmetry of Parsec Benchmarks]

(Benchmarks with a degree of symmetry <= 0.1 are considered symmetric.)
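The symmetry metric can be computed directly from per-thread instruction counts. This is a minimal sketch, assuming the population standard deviation (the slide does not say which) and hypothetical instruction counts; a lower value means the threads are more alike.

```python
import statistics


def degree_of_symmetry(instr_counts):
    """Std Dev / Average of the instruction counts of threads created together."""
    mean = statistics.mean(instr_counts)
    # Assumption: population standard deviation over all threads in the group.
    return statistics.pstdev(instr_counts) / mean


def is_symmetric(instr_counts, threshold=0.1):
    # Benchmarks with degree of symmetry <= 0.1 are treated as symmetric.
    return degree_of_symmetry(instr_counts) <= threshold


# Hypothetical per-thread instruction counts (millions) for four worker threads.
balanced = [100, 102, 98, 100]
skewed = [40, 160, 100, 100]
print(is_symmetric(balanced), is_symmetric(skewed))  # True False
```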


Insight

exe_dur(T1) = exe_dur(T2) = exe_dur(T3) = exe_dur(T4)

• Absolute execution duration is difficult to predict, so predict relative execution duration instead

[Figure: Threads T1, T2, T3, and T4 execute between two barriers; their absolute execution duration is unknown ("execution duration = ?")]


Putting It Together

• Applications follow the fork-join model, with milestones in between
• Many applications are symmetric
• So the relative execution duration to the next milestone is easy to predict

⇒ Age Based Scheduling


What is Age?

Age is the progress made by a thread towards its next milestone


Age Calculation

• Threads created together have the same age

• As a thread executes, it ages
• Age is reset when a milestone is crossed

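[Figure: Age timeline. Thread A is created with tA = 0, has tA = 30 partway through execution, reaches tA = X at a milestone (barrier), and is reset to tA = 0. Thread B goes from tB = 0 to tB = 50 and is reset to tB = 0 at a milestone. Thread termination is also a milestone.]

tA – age of thread A
tB – age of thread B
X – unknown, assumed to be a large value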
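The age rules above map onto a small bookkeeping structure. This is a hypothetical sketch (`AgeTracker` and its methods are illustrative names), and the unit of progress is an assumption, for example retired instructions. The usage replays threads A and B from the figure.

```python
class AgeTracker:
    """Per-thread age: progress made since the thread's last milestone."""

    def __init__(self):
        self.age = {}

    def create_group(self, tids):
        # Threads created together start with the same age.
        for tid in tids:
            self.age[tid] = 0

    def run(self, tid, progress):
        # As a thread executes, it ages. The unit of progress is an
        # assumption here (e.g. retired instructions).
        self.age[tid] += progress

    def cross_milestone(self, tid):
        # Reset age when a milestone (barrier or termination) is crossed.
        self.age[tid] = 0


tracker = AgeTracker()
tracker.create_group(["A", "B"])
tracker.run("A", 30)
tracker.run("B", 50)
print(tracker.age)  # {'A': 30, 'B': 50}
tracker.cross_milestone("B")
print(tracker.age)  # {'A': 30, 'B': 0}
```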


Age Based Scheduling Algorithm

To make a scheduling decision:

• Calculate each thread's remaining execution duration to its next milestone, based on its age

• Assign threads with longer remaining execution durations to fast cores – Longest Job to Fast Core First (LJFCF)
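Given the remaining durations, LJFCF reduces to a sort. This is a minimal sketch, assuming the remaining execution durations have already been computed and that `n_fast` fast cores are available; `ljfcf` is a hypothetical name, and ties keep their input order.

```python
def ljfcf(remaining, n_fast):
    """Longest Job to Fast Core First.

    remaining: dict mapping thread id -> predicted remaining execution
    duration to the next milestone. Returns (threads for fast cores,
    threads for slow cores), each ordered by decreasing remaining duration.
    """
    # sorted() is stable even with reverse=True, so ties keep input order.
    order = sorted(remaining, key=remaining.get, reverse=True)
    return order[:n_fast], order[n_fast:]


fast, slow = ljfcf({"T1": 40, "T2": 75, "T3": 10, "T4": 60}, n_fast=1)
print(fast, slow)  # ['T2'] ['T4', 'T1', 'T3']
```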


Application of LJFCF

• Apply LJFCF whenever
  – A thread is created
  – A core becomes idle
  – The reassignment timer expires (for load balancing)
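The three triggers can be wired to one rescheduling routine. This is a hypothetical event-driven sketch: the class and method names are assumptions, and for brevity it re-runs LJFCF over all tracked threads at every trigger (a real scheduler would also weigh migration cost).

```python
class LJFCFScheduler:
    """Hypothetical scheduler that re-applies LJFCF on each trigger event."""

    def __init__(self, n_fast):
        self.n_fast = n_fast
        self.remaining = {}  # tid -> predicted remaining duration to next milestone
        self.fast, self.slow = [], []

    def _apply_ljfcf(self):
        # Longest remaining duration first; stable sort keeps ties in arrival order.
        order = sorted(self.remaining, key=self.remaining.get, reverse=True)
        self.fast, self.slow = order[:self.n_fast], order[self.n_fast:]

    def on_thread_created(self, tid, remaining):
        self.remaining[tid] = remaining
        self._apply_ljfcf()

    def on_core_idle(self, tid):
        # The core's thread reached a milestone (barrier or termination).
        self.remaining.pop(tid, None)
        self._apply_ljfcf()

    def on_timer_expired(self, refreshed):
        # Periodic reassignment: ages have advanced, so refresh the estimates.
        self.remaining.update(refreshed)
        self._apply_ljfcf()


s = LJFCFScheduler(n_fast=1)
s.on_thread_created("A", 100)
s.on_thread_created("B", 100)
print(s.fast)  # ['A']
s.on_timer_expired({"A": 40, "B": 70})
print(s.fast)  # ['B']
s.on_core_idle("B")
print(s.fast)  # ['A']
```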


Working of the Algorithm

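[Figure: Thread T1 is created with age tA = 0 and has tA = 30 partway through execution. Its age at the next milestone (barrier) is X, so its remaining execution duration is rem_exe = X – 30. Thread termination is a later milestone.]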


Remaining Execution Duration (I)

• Track the progress of threads
• Using prediction [AGE]
  – Predict that all threads have the same inter-milestone distance

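[Figure: Both threads start at age 0 at creation and are predicted to reach age X at their next milestone (barrier or termination). tA goes from 0 to X, is reset to 0, and reaches X again; tB goes from 0 to X.]

tA – age of thread A
tB – age of thread B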
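Under [AGE], every thread shares the same predicted inter-milestone distance X, so the remaining duration is X minus the thread's age, and LJFCF favors the threads that have made the least progress. This is a sketch assuming a hypothetical X = 1000; the slides only say that X is unknown and assumed to be large.

```python
X = 1000  # hypothetical stand-in for the unknown, large inter-milestone distance


def remaining_age(ages, x=X):
    """[AGE] prediction: remaining = X - age, identical X for all threads."""
    # Clamp at 0 in case a thread has already run past the predicted distance.
    return {tid: max(x - age, 0) for tid, age in ages.items()}


ages = {"A": 300, "B": 50, "C": 700}
rem = remaining_age(ages)
print(rem)  # {'A': 700, 'B': 950, 'C': 300}
fast_first = sorted(rem, key=rem.get, reverse=True)
print(fast_first)  # ['B', 'A', 'C'] (youngest thread first)
```

Because X is common to all threads, its exact value does not change the ordering; only the ages matter.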


Remaining Execution Duration (II)

• Using profiling [AGE(PROF)]
  – Threads have different inter-milestone distances, calculated from a metric obtained by profiling

[Figure: Thread A is predicted to reach age X at its next milestone (barrier or termination), while thread B is predicted to reach age rX, where r is obtained from the profiler.]

tA – age of thread A
tB – age of thread B

Only one r value is used for each thread
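With [AGE(PROF)], thread t's predicted inter-milestone distance becomes r_t * X, with one profiled r per thread, so its remaining duration is r_t * X - age_t. This is a sketch with hypothetical r values and a hypothetical X; it shows how profiling can reverse the choice plain [AGE] would make.

```python
X = 1000  # hypothetical base inter-milestone distance


def remaining_prof(ages, r, x=X):
    """[AGE(PROF)]: remaining = r[tid] * X - age, one profiled r per thread."""
    return {tid: max(r[tid] * x - age, 0) for tid, age in ages.items()}


ages = {"A": 300, "B": 50}
r = {"A": 1.0, "B": 0.5}  # hypothetical profiler output: B does half A's work
rem = remaining_prof(ages, r)
print(rem)  # {'A': 700.0, 'B': 450.0}
print(max(rem, key=rem.get))  # A (plain [AGE] would pick B: 950 vs. 700)
```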


Working of the Algorithm

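[Figure: Four cores (one fast, three slow) run threads A, B, C, and D with remaining execution durations rem_exeA = 50, rem_exeB = 70, rem_exeC = 90, and rem_exeD = 30. Threads A and C are swapped so that C (rem_exeC = 90), the thread with the longest remaining execution duration, runs on the fast core and A (rem_exeA = 50) runs on a slow core.]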
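The swap in the figure follows from applying LJFCF to the four remaining durations. This sketch reproduces it with one fast core and three slow cores; the initial placement (A on the fast core) is an assumption read from the swap shown in the figure.

```python
rem_exe = {"A": 50, "B": 70, "C": 90, "D": 30}
n_fast = 1

# LJFCF: order threads by decreasing remaining execution duration.
order = sorted(rem_exe, key=rem_exe.get, reverse=True)  # C, B, A, D
fast, slow = order[:n_fast], order[n_fast:]
print("fast:", fast, "slow:", slow)  # fast: ['C'] slow: ['B', 'A', 'D']

# Assumed initial placement: A on the fast core, B, C, D on the slow cores.
before_fast = ["A"]
moved_to_fast = [t for t in fast if t not in before_fast]
moved_to_slow = [t for t in before_fast if t not in fast]
print(moved_to_fast, moved_to_slow)  # ['C'] ['A']
```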


Benefit of Age Based Scheduling

• Asymmetry aware
• Utilizes all cores
• Gives all threads opportunities to run on fast cores


Implementation

• OS
  – Track thread progress using performance counters
  – Disable the counters on interrupts

• Compiler (AGE(PROF))
  – Pass profiled information: one value for each thread


Outline

• Background and Motivation
• Age Based Scheduling
• Evaluation
• Conclusion


Evaluation

• Simulation-based experiments
  – Trace + execution hybrid simulator
  – Locks and barriers are modeled
  – Context-switch and migration overheads are simulated
  – 10 ms time slice for each thread

• Machine configuration
  – 1 fast core, 7 slow cores, 8:1 speed ratio (other configurations are in the paper)

• Benchmarks
  – Symmetric: Parsec (simmedium input)
  – Asymmetric: Splash-2, OMPSCR, SuperLU


Comparisons with Other Policies

Policy – Description
Linux – Linux O(1) scheduler
RR – Threads are assigned to fast cores in a round-robin fashion
SCALEDLD [Li'07] – Fast Core First assignment with asymmetry-aware load balancing (baseline)
FCA-AGE – Fast Core First assignment with age-based periodic reassignment
AGE – Age-based assignment and reassignment using prediction
AGE(PROF) – Age-based assignment and reassignment using profiling
AGE(ORACLE) – Age-based assignment and reassignment using an oracle


LJFCF vs. Other Policies (I)

• Parsec (baseline: SCALEDLD)

[Figure: % reduction in execution time over SCALEDLD for RR, FCA-AGE, AGE, AGE(PROF), and AGE(ORACLE)]

Policy: Avg. % reduction over SCALEDLD
RR: -36.64
FCA-AGE: 9.8
AGE: 10.4
AGE(PROF): 13.2
AGE(ORACLE): 15.4


LJFCF vs. Other Policies (II)

• Asymmetric benchmarks (baseline: SCALEDLD)

[Figure: % reduction in execution time over SCALEDLD for FCA-AGE, AGE, AGE(PROF), and AGE(ORACLE)]

Policy: Avg. % reduction over SCALEDLD
FCA-AGE: 8.2
AGE: 7.7
AGE(PROF): 9.4
AGE(ORACLE): 13.1


Idle Cycles

[Figure: Idle cycles (%) on the fast core and on the slow cores for blackscholes, bodytrack, fluidanimate, and swaptions under Linux, SCALEDLD, and AGE]

• Linux scheduler – most of the idle cycles come from the fast core

• SCALEDLD – keeps the same thread(s) on the fast core
• AGE – assigns different threads to the fast core


Different AMP Configurations

• The need for asymmetry-aware scheduling increases as cores become more asymmetric

• AGE-based policies show greater improvement over SCALEDLD as asymmetry increases

[Figure: Normalized execution time of Linux, SCALEDLD, AGE, and AGE(PROF) on Parsec for the 2/1, 4/1, 6/1, and 8/1 configurations]

X/1: the ratio of fast-core speed to slow-core speed is X:1


Outline

• Background and Motivation
• Age Based Scheduling
• Evaluation
• Conclusion


Conclusion

• Age based scheduling (ABS) for Asymmetric Multiprocessors
  – ABS assumes that threads created at the same time are symmetric
  – ABS assigns threads to cores based on their predicted remaining execution durations
  – Predictions are made based on the age of threads

• Improvement of 10.4% (Pred) and 13.2% (Prof) for Parsec, and 7.6% (Pred) and 9.4% (Prof) for asymmetric benchmarks, over Li's mechanism (SCALEDLD)


THANK YOU