David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

14
Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Transcript of David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Page 1: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

•Juergen Ributzka•David Stephenson•Timothy Kong•Dee Lee•Fred Chow•Guang R. Gao

Page 2: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Overview

• SiCortex Multiprocessor

• Software Pipelining Framework

• Results

• Future Work

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 2

Page 3: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

SiCortex Multiprocessor

• RISC

• 6 cores (MIPS 5KF)

• 500 – 700 MHz

• In-Order Execution

• Limited-Dual Issue

• Low Power (12 – 16 Watt)

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 3

Page 4: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Software Pipelining Framework

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 4

DDG

MII

MS

MVE

RA

CG

Front-End

Middle-End

Back-End

FORTRAN C/C++ Java

Loop NestOptimizer

GlobalOptimizer

CodeGenerator

Page 5: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Data Dependence Graph (DDG)

• Cyclic Data Dependence Graph

• No register anti- and output-dependencies

• Only single BBs

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 5

DDG

MII

MS

MVE

RA

CG

S1

S2

S1

S2

Page 6: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Minimum Initiation Interval

• Limited by:

– Resource Requirements

– Recurrence Requirements

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 6

DDG

MII

MS

MVE

RA

CG

Page 7: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Modulo Scheduling (MS)

• Huff Modulo Scheduling

– Lifetime sensitive

– Uses backtracking

• Hyper Node Reduction Scheduling

– Lifetime sensitive

– No backtracking

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 7

DDG

MII

MS

MVE

RA

CG

Page 8: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Modulo Variable Expansion (MVE)

• Separate TN set for every iteration

• # of unrollings depends on longest lifetime

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 8

DDG

MII

MS

MVE

RA

CG

TN10 op1 TNXX…TN20 op2 TN10 TNYY

TN11 op1 TNXX…TN21 op2 TN11 TNYY

TN10 op1 TNXX…TN20 op2 TN10 TNYY

Iteration

C

y

c

l

e

Page 9: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Register Allocation (RA)

• Only kernels are fully register allocated

• No regions

– SWP kernels are not a black box for GRA and LRA anymore

• Currently no register spilling support

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 9

DDG

MII

MS

MVE

RA

CG

Page 10: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Code Generation (CG)

• Prologues and epilogues are partially register allocated

• Need several epilogues due to different register sets

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 10

DDG

MII

MS

MVE

RA

CG

P

P0

K0

K1 E0

E1

E2

E

Page 11: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Benchmark Results

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 11

0.9

0.95

1

1.05

1.1

1.15

1.2

BT CG EP FT IS LU MG SP UA

spe

ed

up

NAS Parallel Benchmark

ABI n32

ABI 64

Page 12: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Benchmark Results

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 12

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05

41

0.b

wav

es

43

3.m

ilc

43

4.z

eusm

p

43

5.g

rom

acs

43

6.c

actu

sAD

M

43

7.le

slie

3d

44

4.n

amd

44

7.d

ealII

45

0.s

op

lex

45

3.p

ovr

ay

45

4.c

alcu

lix

45

9.G

emsF

DTD

46

5.t

on

to

47

0.lb

m

48

1.w

rf

48

2.s

ph

inx3

40

0.p

erlb

ench

40

1.b

zip

2

40

3.g

cc

42

9.m

cf

44

5.g

ob

mk

45

6.h

mm

er

45

8.s

jen

g

46

2.li

bq

uan

tum

46

4.h

26

ref

47

1.o

mn

etp

p

47

3.a

star

spe

ed

up

SPEC2006

ABI n32

ABI 64

Page 13: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Conclusion and Future Work

• Spilling support needed to enable more loops for SWP

• Screen out loops with small trip count during runtime to reduce SWP overhead

• Hit-under-miss support to overcome current hardware limitations

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 13

Page 14: David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Questions ?

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved. 14