DARK2 in a nutshell: ISA design options
user.it.uu.se/~eh/courses/dark2/new-slides/4_per_page/...


28 November 2002

ISA design options

Erik Hagersten

Uppsala University

Dept of Information Technology | www.it.uu.se | Erik Hagersten | www.docs.uu.se/~eh/

DARK2, 2002

DARK2 in a nutshell

1. Memory Systems (caches, VM, DRAM, microbenchmarks, …)
2. CPUs (pipelines, ILP, scheduling, Superscalars, VLIWs, embedded, …)
3. Multiprocessors (TLP, coherence, interconnects, scalability, clusters, …)
4. Future: (physical limitations, TLP+ILP in the CPU, …)

ISA design options

• How it all started
• Instruction set overview
• The professor joke
• Options for ISA design
• Specifying HW

How it all started … the fossils

• ENIAC: J.P. Eckert and J. Mauchly, Univ. of Pennsylvania, WW2
  - Electro Numeric Integrator And Calculator, 18,000 vacuum tubes
• EDVAC: J. von Neumann, operational [year illegible]
  - Electric Discrete Variable Automatic Computer (stored programs)
• EDSAC: M. Wilkes, Cambridge University, 1949
  - Electric Delay Storage Automatic Calculator
• Mark-I: H. Aiken, Harvard, WW2, electro-mechanic
• K. Zuse, Germany, electromech. computer, special purpose, WW2
• BESK: KTH, E. Stemme (now at Chalmers), early 50s
• SMIL: LTH, mid 50s

How do you tell a good idea from a bad

The Book: The performance-centric approach
• CPI = #execution-cycles / #instructions executed (~ISA goodness)
• CPI * cycle time => performance
• CPI = CPI_CPU + CPI_Mem

The book rarely covers other design tradeoffs
• The feature-centric approach...
• The cost-centric approach...
• Energy-centric approach…
• Verification-centric approach...

The Book: Quantitative methodology

Make design decisions based on execution statistics. Select workloads (programs representative of usage)

Instruction mix measurements: statistics of relative usage of different components in an ISA

Experimental methodologies
• Profiling through tracing
• ISA simulators

Two guiding stars -- the RISC approach:

Make the common case fast
• Simulate and profile anticipated execution
• Make cost-functions for features
• Optimize for overall end result (end performance)

Watch out for Amdahl's law:
Speedup = Execution_time_OLD / Execution_time_NEW
        = 1 / [(1 - Fraction_ENHANCED) + Fraction_ENHANCED / Speedup_ENHANCED]
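Amdahl's law as stated on this slide can be evaluated directly. The following is a minimal Python sketch; the function name `amdahl_speedup` and the example numbers (40% of the execution time enhanced 10x) are illustrative assumptions, not values from the slides.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of the execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # 1 / [(1 - Fraction_ENHANCED) + Fraction_ENHANCED / Speedup_ENHANCED]
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the time is made 10x faster.
print(round(amdahl_speedup(0.4, 10.0), 4))
```

Even a 10x enhancement yields well under 2x overall here, because the untouched 60% dominates; this is why the common case is the one to make fast.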

Instruction Set Architecture (ISA)

-- the interface between software and hardware.

Tradeoffs between many options:
• functionality for OS and compiler
• wish for many addressing modes
• compact instruction representation
• format compatible with the memory system of choice
• desire to last for many generations
• bridging the semantic gap (old desire...)
• Note: the biggest “customer” is the compiler

ISA trends today

• CPU families built around “Instruction Set Architectures” (ISAs)
• Many incarnations of the same ISA
• ISAs lasting longer (~10 years)
• Consolidation in the market: fewer ISAs (not for embedded…)
• 15 years ago ISAs were driven by academia
• Today ISAs technically do not matter all that much (market-driven)
• How many of you will ever design an ISA?
• How many ISAs will be designed in Sweden?

Compiler Organization

[Figure: compiler structure, top to bottom, with the work done at each stage]

Front-ends (Fortran, C, C++, ...)   Machine-independent translation
Intermediate Representation
High-level Optimization             Procedure in-lining, loop transformation
Global & Local Optimization         Register allocation, common sub-expressions
Code Generation                     Instruction selection, constant folding
Code

Three different memory allocation models

• Stack: local variables in activation record
  - addressing relative to stack pointer
  - stack pointer modified on call/return
• Global data area: large constants
  - global static structures
• Heap: dynamic objects
  - often accessed through pointers

[Figure: segments of a context's address space (Context B), starting at address 0: text, heap, data, stack]

Classification of ISAs

• ISAs are classified with respect to:
  - Register model
  - Number of operands for each instruction
  - Addressing modes permitted
  - Operations provided in the instruction set
  - Type and size of operands

Execution in a CPU

[Figure: a CPU operating on ”Machine Code” and ”Data”]

Operand models
Example: C := A + B

Stack         Accumulator    Register
PUSH [A]      LOAD [A]       LOAD R1,[A]
PUSH [B]      ADD [B]        ADD R1,[B]
ADD           STORE [C]      STORE [C],R1
POP [C]

[Figure: Mem <-> Stack (implicit), Mem <-> Accumulator (implicit), Mem <-> Register (explicitly)]

Stack-based machine
Example: C := A + B

Mem: A:12  B:14  C:10

Instruction   Stack after   Mem after
PUSH [A]      12            A:12  B:14  C:10
PUSH [B]      12, 14        A:12  B:14  C:10
ADD           26            A:12  B:14  C:10
POP [C]       (empty)       A:12  B:14  C:26
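The stack-machine example can be replayed with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `run_stack_machine` and the dictionary used for memory are illustrative, and only the three instructions from the slide are supported.

```python
def run_stack_machine(program, memory):
    """Execute PUSH/ADD/POP; the operands of ADD are implicit (top of stack)."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":
            stack.append(memory[args[0].strip("[]")])  # copy Mem[x] onto the stack
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)                 # result replaces both operands
        elif op == "POP":
            memory[args[0].strip("[]")] = stack.pop()  # store top of stack to Mem[x]
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory


memory = run_stack_machine(
    ["PUSH [A]", "PUSH [B]", "ADD", "POP [C]"],
    {"A": 12, "B": 14, "C": 10},
)
print(memory["C"])  # 26
```

Note that ADD names no operands at all, which is what keeps the instruction encoding compact.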

Stack-based:

• Implicit operands
• Compact code format (1 instr. = 1 byte)
• Simple to implement
• Not optimal for speed...

Register-based machine
Example: C := A + B

Data: A:12  B:14  C:10

”Machine Code”:
LD  R1, [A]
LD  R7, [B]
ADD R2, R1, R7
ST  R2, [C]

[Figure: register and memory contents during execution; the values are illegible in the source]

Register-based

• X86
• RISCs (Alpha, SPARC, HP-PA, …)
• VLIW (IA64, …)
• Explicit operands (i.e., ”registers”)
• Wasteful instr. format (1 instr. = 4 bytes)
• Suits optimizing compilers
• Optimal for speed...

Properties of operand models

              Compiler        Implementation   Code
              Construction    Efficiency       Size
Stack         +               --               ++
Accumulator   --              -                +
Register      ++              ++               --

General-purpose register model dominates today
Reason: general model for compilers and efficient implementation-wise

Traditional Addressing Modes (VAX)

Addressing mode                      Example instruction   Meaning
Register                             Add R4,R3             Regs[R4] ← Regs[R4]+Regs[R3]
Immediate (30%)                      Add R4,#3             Regs[R4] ← Regs[R4]+3
Displacement (40%)                   Add R4,100(R1)        Regs[R4] ← Regs[R4]+Mem[100+Regs[R1]]
Register deferred or indirect (20%)  Add R4,(R1)           Regs[R4] ← Regs[R4]+Mem[Regs[R1]]
Indexed                              Add R3,(R1+R2)        Regs[R3] ← Regs[R3]+Mem[Regs[R1]+Regs[R2]]
Direct or absolute                   Add R1,(1001)         Regs[R1] ← Regs[R1]+Mem[1001]
Memory indirect or memory deferred   Add R1,@(R3)          Regs[R1] ← Regs[R1]+Mem[Mem[Regs[R3]]]
Autoincrement                        Add R1,(R2)+          Regs[R1] ← Regs[R1]+Mem[Regs[R2]]; Regs[R2] ← Regs[R2]+d
Autodecrement                        Add R1,-(R2)          Regs[R2] ← Regs[R2]-d; Regs[R1] ← Regs[R1]+Mem[Regs[R2]]
Scaled (10%)                         Add R1,100(R2)[R3]    Regs[R1] ← Regs[R1]+Mem[100+Regs[R2]+Regs[R3]*d]

Are all of these addressing modes needed?
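To make the "Meaning" column concrete, here is a minimal Python sketch of how the source operand is located for several of the modes. The function name `fetch_operand`, its parameters and the sample register and memory contents are illustrative assumptions. The autoincrement and autodecrement modes are left out because they also modify a register.

```python
def fetch_operand(mode, regs, mem, base=None, index=None, const=0, d=4):
    """Return the source operand selected by an addressing mode.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes used by the scaled mode.
    """
    if mode == "register":
        return regs[base]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[base]]
    if mode == "register_deferred":
        return mem[regs[base]]
    if mode == "indexed":
        return mem[regs[base] + regs[index]]
    if mode == "direct":
        return mem[const]
    if mode == "memory_indirect":
        return mem[mem[regs[base]]]
    if mode == "scaled":
        return mem[const + regs[base] + regs[index] * d]
    raise ValueError(f"unsupported addressing mode: {mode}")


# Hypothetical machine state for illustration only.
regs = {"R1": 200, "R2": 8, "R3": 3}
mem = {5: 42, 120: 13, 200: 5, 208: 9, 300: 7, 1001: 11}

print(fetch_operand("displacement", regs, mem, base="R1", const=100))
print(fetch_operand("scaled", regs, mem, base="R2", index="R3", const=100))
```

Every mode beyond immediate and displacement adds another address computation or memory access to the instruction, which is the cost to weigh against how rarely it is used.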

Actual use of addr. modes

• What addressing modes dominate usage?
• Immediate and Displacement are the most common addressing modes in SPEC89 on a VAX

Important Addressing Modes

Addressing mode                Example instruction   Meaning                                           When used
Immediate                      Add R4,#3             Regs[R4] ← Regs[R4]+3                             For constants.
Displacement                   Add R4,100(R1)        Regs[R4] ← Regs[R4]+Mem[100+Regs[R1]]             Accessing local variables.
Register deferred or indirect  Add R4,(R1)           Regs[R4] ← Regs[R4]+Mem[Regs[R1]]                 Accessing using a pointer or a computed address.
Scaled                         Add R1,100(R2)[R3]    Regs[R1] ← Regs[R1]+Mem[100+Regs[R2]+Regs[R3]*d]  Used to index arrays. May be applied to any indexed addressing mode in some machines.

Are all of these addressing modes needed?

Size of displacement

• How big a range should the displacement value cover?
• 12-16 bits cover 75%-99% of the displacements

[Figure: distribution of displacement sizes, i.e., #bits]

Size of immediates

• How important are immediates and how big are they?
• Immediate operands are very important for ALU and compare operations
• 16-bit immediates seem sufficient (75%-80%)

Operation types in the ISA

Operator type             Examples
Arithmetical and logical  Integer arithmetic and logical operations: add, and, subtract, or
Data transfer             Loads/stores (move instructions on machines with memory addressing)
Control                   Branch, jump, procedure call and return
System                    Operating system call, virtual memory management instructions
Floating point            Floating-point operations: add, multiply, ...
Decimal                   Decimal add, decimal multiply, decimal-to-character conversions
String                    String move, string compare, string search

Control instructions

• Conditional branches
• Unconditional branches (jumps)
• Procedure call/returns

Conditional branches dominate by far

Intuition: program loops are common!

Size of branch displacement

• How big a branch displacement is needed?
• 8-bit branch displacements cover most cases

Intuition: most program loops are tight!

Branch condition evaluation

Name                 How?                            Advantages                  Disadvantages
Condition Code (CC)  Special bits are manipulated    CC set for free             Extra state
Condition register   Test general purpose register   Simple                      Uses up registers
Compare and branch   Compare is part of branch       One instr. instead of two   Extra work per instr.

Instruction formats

• A variable instruction format yields compact code, but instruction decoding is more complex

Summary: Classification of ISAs

• ISAs are classified with respect to:
  - Register model
  - Number of operands for each instruction
  - Addressing modes permitted
  - Operations provided in the instruction set
  - Type and size of operands

ISAs versus compilers

Rules of thumb when designing an ISA:
• Regularity (operations, data types and addressing modes should be orthogonal)
• Provide primitives, not high-level constructs. Complex instructions are often too specialized.

The impact of compiler optimizations

• Compiler optimizations affect the number of instructions as well as the distribution of executed instructions (the instruction mix)

A generic architecture
Load/store architecture (32 bits)

• Many (32) general-purpose integer registers (GPR) and single-precision floating-point registers (GPR0 = 0)
• Fixed instruction width and format
• Addressing modes: immediate and displacement
• Supported data types: bytes, half words (16 bits), words (32 bits), single- and double-precision IEEE floating point

Generic instructions

Instruction type  Example          Meaning
Load              LW R1,30(R2)     Regs[R1] ← Mem[30+Regs[R2]]
Store             SW 30(R2),R1     Mem[30+Regs[R2]] ← Regs[R1]
ALU               ADD R1,R2,R3     Regs[R1] ← Regs[R2]+Regs[R3]
Control           BEQZ R1,KALLE    if (Regs[R1]==0) PC ← KALLE

Specifying hardware

• <--n : Transfer n bits
• R1_n : Bit n of register R1
• R1_0..7 : The most significant byte of R1 (Big endian!)
• 0^8 : A byte of all zeroes (repeat the field n times)
• M[xx] : byte xx in memory (the two-digit address xx is illegible in the source)
• 0^8 ## M[xx] : Concatenate zero-byte (MSB) with the byte @mem(xx)

Generic Move Instructions

• Load and Store
  - LB, LBU, SB -- byte chunks
  - LH, LHU, SH -- half word chunks
  - LW, SW -- word chunks
  - LF, SF -- word chunks to floating point regs
  - LD, SD -- double precision to FP regs (2 regs per OP)
• Register to Register moves

Examples Hardware Descriptions

LBU R1, 41(R3) -- load byte unsigned
R1 <--32 0^24 ## M[41+R3]
R1 (bits 0..31): | 000000000000000000000000 (24 bits) | M[41+R3] (8 bits) |

LH R1, 40(R3) -- load halfword (signed)
R1 <--32 (M[40+R3]_0)^16 ## M[40+R3] ## M[41+R3]
R1 (bits 0..31): | SSSSSSSSSSSSSSSS (16 bits) | M[40+R3] (8 bits) | M[41+R3] (8 bits) |
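The two hardware descriptions can be mirrored in Python to check the zero extension and sign extension. This is a sketch under stated assumptions: memory is modelled as a dictionary of byte values, the function names are made up for illustration, and the sample bytes are hypothetical.

```python
def load_byte_unsigned(mem, addr):
    """LBU: 0^24 ## M[addr], i.e. 24 zero bits concatenated with one byte."""
    return mem[addr] & 0xFF


def load_halfword_signed(mem, addr):
    """LH: (M[addr]_0)^16 ## M[addr] ## M[addr+1], big-endian byte order."""
    half = ((mem[addr] & 0xFF) << 8) | (mem[addr + 1] & 0xFF)
    sign = half >> 15               # bit 0 in the slide's numbering is the MSB
    upper = 0xFFFF if sign else 0   # the sign bit repeated 16 times
    return (upper << 16) | half


# Hypothetical memory contents; R3 holds 0 in this example.
mem = {40: 0xFF, 41: 0xFE}
r3 = 0
print(hex(load_byte_unsigned(mem, 41 + r3)))
print(hex(load_halfword_signed(mem, 40 + r3)))
```

The same two bytes give a small positive value through LBU and a negative 32-bit pattern through LH, which is the difference between the signed and unsigned load variants.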

Generic ALU Instructions

• Integer arithmetic
  - [add, sub] x [signed, unsigned] x [register, immediate]
  - e.g., ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI
• Logical
  - [and, or, xor] x [register, immediate]
  - e.g., AND, ANDI, OR, ORI, XOR, XORI
• Load upper half immediate load
  - It takes two instructions to load a 32-bit immediate

Examples Hardware Descriptions ...

ADDI R1, R2, #6 -- add immediate
R1 <--32 R2 + “6”
R1 (bits 0..31): | R2 + “6” (32 bits) |

LHI R1, #42 -- load high immediate
R1 <--32 “42” ## 0^16
R1 (bits 0..31): | “42” (16 bits) | 0000000000000000 (16 bits) |
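Since immediates are 16 bits wide, a full 32-bit constant needs two instructions. The sketch below models one common pairing, LHI followed by ORI. It assumes that ORI zero-extends its immediate, which the slides do not state, and the function names are illustrative.

```python
def lhi(imm16):
    """LHI: imm16 ## 0^16, the immediate lands in the upper half of the register."""
    return (imm16 & 0xFFFF) << 16


def ori(value, imm16):
    """ORI with a zero-extended 16-bit immediate (an assumption of this sketch)."""
    return value | (imm16 & 0xFFFF)


def load_imm32(constant):
    """Build a 32-bit constant in two steps: LHI for the upper half, ORI for the lower."""
    r1 = lhi(constant >> 16)
    return ori(r1, constant)


print(hex(lhi(42)))
print(hex(load_imm32(0x12345678)))
```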

More Generic ALU Ops

• Shifts
  - [left, right] x [logical, arithmetic] x [immediate, reg]
  - e.g., SLL, SRAI, …
• Set conditional
  - [lt, gt, le, ge, eq, ne] x [immediate, reg]
  - e.g., SLT, SGEI, …
  - Puts a 0 or a 1 in the destination register

Generic Instruction Formats

Bits are numbered 0..31; field widths in bits are given in parentheses.

R-type: | Opcode (6) | Rs1 (5) | Rs2 (5) | Rd (5) | Func (11) |
I-type: | Opcode (6) | Rs (5) | Rd (5) | Immediate (16) |
J-type: | Opcode (6) | Offset added to PC (26) |
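The fixed 32-bit formats are easy to pack and unpack with shifts and masks. This is a minimal Python sketch that assumes the opcode occupies the most significant six bits (bit 0 in the slide's big-endian numbering) and that the R-type fields are ordered Rs1, Rs2, Rd, Func. The opcode and function values used are hypothetical placeholders, not the real encodings.

```python
def encode_r(opcode, rs1, rs2, rd, func):
    """R-type: Opcode(6) | Rs1(5) | Rs2(5) | Rd(5) | Func(11)."""
    return ((opcode & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | ((rs2 & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | (func & 0x7FF)


def encode_i(opcode, rs, rd, imm):
    """I-type: Opcode(6) | Rs(5) | Rd(5) | Immediate(16)."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) | ((rd & 0x1F) << 16) \
        | (imm & 0xFFFF)


def encode_j(opcode, offset):
    """J-type: Opcode(6) | Offset added to PC(26)."""
    return ((opcode & 0x3F) << 26) | (offset & 0x3FFFFFF)


def decode_i(word):
    """Split an I-type word back into (opcode, rs, rd, immediate)."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


# Hypothetical opcode 0x23 standing in for a load such as LW R1,30(R2).
word = encode_i(0x23, rs=2, rd=1, imm=30)
print(hex(word), decode_i(word))
```

Because every field sits at a fixed position, decoding is a handful of shifts and masks that can run in parallel with register fetch, which is the benefit of a fixed format over a variable one.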

Generic FP Instructions

• Floating Point arithmetic: [add, sub, mult, div] x [double, single]
  - e.g., ADDD, ADDF, SUBD, SUBF, …
• Compares (sets ”compare bit”): [lt, gt, le, ge, eq, ne] x [double, single]
  - e.g., LTD, GEF, …
• Convert from/to integer, FP regs: CVTF2I, CVTF2D, CVTI2D, …

Simple Control

• Branches if equal or if not equal
  - BEQZ, BNEZ: cmp to register, PC := PC+4+immediate_16
  - BFPT, BFPF: cmp to ”FP compare bit”, PC := PC+4+immediate_16
• Jumps
  - J: Jump -- PC := PC + immediate_26
  - JAL: Jump And Link -- R31 := PC+4; PC := PC + immediate_26
  - JALR: Jump And Link Register -- R31 := PC+4; PC := PC + Reg
  - JR: Jump Register -- PC := PC + Reg (”return from JAL or JALR”)
• TRAP: go to OS via a vectored address (more later)
• RTF: ReTurn From user code (more later)

Implementing ISAs -- pipelines

Erik Hagersten

Uppsala University

EXAMPLE: pipeline implementation

Add Rx, Ry, Rz   (register numbers illegible in the source)

[Figure: instruction A is fetched from Mem (Ifetch) and flows through the pipeline stages I, R, X, W, using Regs; OP: +]

Load Operation:

LD Rx, mem[cnst+Ry]   (register numbers illegible in the source)

[Figure: instruction A in the pipeline stages I, R, X, W, with Mem, Regs and Ifetch]

Store Operation:

ST mem[cnst+Rx], Ry   (register numbers illegible in the source)

[Figure: instruction A in the pipeline stages I, R, X, W, with Mem, Regs and Ifetch]

Example: 5-stage pipeline

IF  ID  EX  M  WB

[Figure, built up over four slides: the pipeline datapath with labels (d), s1, s2, st data, pc, dest data and early reg write]

Initially, then Cycle 1, Cycle 2, ...

Program in Mem (numeric constants are illegible in the source):
A: LD RegA, (<const> + RegC)
B: RegB := RegA + <const>
C: RegC := RegC + <const>
D: IF RegC < <const> GOTO A

[Figure sequence, one slide per cycle: instructions A, B, C, D move from Mem through the stages I, R, X, W, with Regs and PC. In the later cycles A is fetched again once the branch has produced the next PC (Branch → Next PC).]

Pipeline Challenges

• Balance the pipeline stages
• Handle the feed-back cases
• Minimize pipeline stalls
• Predict and perform speculative work
• Undo speculative work

Fundamental limitations

Hazards prevent instructions from executing in parallel:

Structural hazards: Simultaneous use of same resource
  If unified I+D$: LW will conflict with later I-fetch

Data hazards: Data dependencies between instructions
  LW R1, 100(R2) /* result avail in 2 - 100 cycles */
  ADD R5, R1, R7

Control hazards: Change in program flow
  BNEQ R1, #OFFSET
  ADD R5, R2, R3

Serialization of the execution by stalling the pipeline is one, although inefficient, way to avoid hazards

Fundamental types of data hazards

Code sequence: Op_i A; Op_i+1 A

RAW (Read-After-Write): Op_i+1 reads A before Op_i modifies A. Op_i+1 reads old A.
WAR (Write-After-Read): Op_i+1 modifies A before Op_i reads A. Op_i reads new A.
WAW (Write-After-Write): Op_i+1 modifies A before Op_i. The value in A is the one written by Op_i, i.e., an old A.
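The three hazard types can be detected mechanically from the registers each instruction reads and writes. This is a minimal Python sketch; the dictionary layout and the function name `data_hazards` are illustrative assumptions. It reports which orderings must be preserved, not whether a particular pipeline would actually stall.

```python
def data_hazards(first, second):
    """Classify dependencies between two instructions in program order.

    Each instruction is a dict with 'reads' and 'writes' sets of register names.
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")   # second must see the value first produces
    if first["reads"] & second["writes"]:
        hazards.add("WAR")   # second must not overwrite before first has read
    if first["writes"] & second["writes"]:
        hazards.add("WAW")   # the write of second must be the one that survives
    return hazards


# The data-hazard example from the slides: LW R1, 100(R2) followed by ADD R5, R1, R7.
lw = {"reads": {"R2"}, "writes": {"R1"}}
add = {"reads": {"R1", "R7"}, "writes": {"R5"}}
print(data_hazards(lw, add))
```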

Hazard avoidance techniques:

Static techniques (compiler): code scheduling to avoid hazards

Dynamic techniques: hardware mechanisms to eliminate or reduce impact of hazards (e.g., out-of-order stuff)

Hybrid techniques: rely on compiler as well as hardware techniques to resolve hazards (e.g., VLIW support, later)

Fix: Bypass hardware

IF  ID  EX  M  WB

• Forwarding (or bypassing): provide a direct path from M and WB to EX
• Only helps for ALU ops. What about load operations?

A more optimized DLX pipeline...

IF  ID  EX  M  WB

[Figure: pipeline datapath with labels (d), s1, s2, st data, pc, dest data; instruction fetch is backed by Instr$, ITLB, …, L2$, …, Mem and the memory stage by Data$, DTLB, …, L2$, …, Mem]

Data dependency :-(

[Figure: one cycle of the four-instruction loop (LD RegA ...; RegB := RegA ...; RegC := RegC ...; IF RegC ... GOTO A) in the pipeline stages I, R, X, W, with Mem, Regs and PC; the numeric constants are illegible in the source]

Fix: code scheduling

[Figure: the same loop in the pipeline, with the instructions RegB := RegA ... and RegC := RegC ... swapped (”Swap!!”)]

Branch delays

[Figure: the same loop in the pipeline; the branch produces the next PC late (Branch → Next PC), leaving ”Stall” slots in the pipeline]

[N] cycles per iteration of 4 instructions :-( (the cycle count N is illegible in the source)
Need longer basic blocks with independent instr.

Avoiding control hazards

IF  ID  EX  M  WB

[Figure: markers on the pipeline for ”Branch condition and target addr. needed here” and ”Branch condition and target addr. available here”]

• Duplicate resources in ALU to compute branch condition and branch target address earlier
• Branch delay cannot be completely eliminated
• Branch prediction and code scheduling can reduce the branch penalty

Taking a Branch

IF  ID  EX  M  WB

[Figure: pipeline datapath with labels (d), s1, s2, st data, pc, dest data]

PC := PC + Imm

Fix: Minimizing Branch Delay Effects

IF  ID  EX  M  WB

[Figure: pipeline datapath with labels (d), s1, s2, st data, pc, dest data]

Fix: Static tricks

Predict Branch not taken (a fairly rare case)
• Execute successor instructions in sequence
• “Squash” instructions in pipeline if the branch is actually taken
• Works well if state is updated late in the pipeline
• 30%-38% of conditional branches are not taken on average

Predict Branch taken (a fairly common case)
• 62%-70% of conditional branches are taken on average
• Does not make sense for the generic arch. but may do for other pipeline organizations

Delayed branch (schedule useful instr. in delay slot)
• Define branch to take place after a following instruction
• CONS: this is visible to SW, i.e., forces compatibility between generations

Static scheduling to avoid stalls

• Scheduling an instruction from before is always safe
• Scheduling from target or from the not-taken path is not always safe; it must be guaranteed that speculative instr. do no harm.

Evaluating branch hazard avoidance techniques

Pipeline speedup = #stages / (1 + Branch frequency x Branch penalty)

Scheduling scheme   Branch penalty for integ. pgm   CPI    Speedup vs. unpipelined
Stall pipeline      1                               1.17   4.3
Predict taken       1                               1.17   4.3
Predict not taken   0.69                            1.12   4.5
Delayed branch      0.21                            1.04   4.8
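The table can be reproduced from the speedup formula. The sketch below assumes a 5-stage pipeline and a branch frequency of 0.17; the frequency is not stated on the slide but is inferred here from the CPI of 1.17 at a penalty of 1, so treat both constants as assumptions.

```python
STAGES = 5               # assumed pipeline depth (IF ID EX M WB)
BRANCH_FREQUENCY = 0.17  # inferred from CPI 1.17 at penalty 1, not given explicitly


def cpi(branch_penalty):
    """CPI = 1 + branch frequency x branch penalty (no other stalls modelled)."""
    return 1 + BRANCH_FREQUENCY * branch_penalty


def pipeline_speedup(branch_penalty):
    """Speedup over an unpipelined machine: #stages / CPI."""
    return STAGES / cpi(branch_penalty)


schemes = [
    ("Stall pipeline", 1.0),
    ("Predict taken", 1.0),
    ("Predict not taken", 0.69),
    ("Delayed branch", 0.21),
]
for name, penalty in schemes:
    print(f"{name:18s} {penalty:5.2f} {cpi(penalty):5.2f} {pipeline_speedup(penalty):4.1f}")
```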

Fix: Predict next PC

[Figure: the same loop in the pipeline; while the branch produces the next PC (Branch → Next PC), three bubbles enter the pipeline behind the I stage]

Branch Predictor Based on History

[Figure: the same loop in the pipeline, extended with a Branch Target Buffer (i.e., a cache) that is looked up with the PC. Each entry holds an Address Tag, the Next PC and the Next Few Instructions. ”Guess the next PC here!!”]

Dynamic branch prediction

Branches limit performance because:
• Branch penalties are high
• Prevent a lot of ILP from being exploited

Solution: Dynamic branch prediction to predict the outcome of conditional branches.

Benefits:
• Reduce time to determine branch condition
• Reduce time to calculate the branch target address

Page 22: ’$5. LQ D QXWVKHOO ,6$GHVLJQRSWLRQVuser.it.uu.se/~eh/courses/dark2/new-slides/4_per_page/...28 November 2002 'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ LW XX VH (ULN +DJHUVWHQ_ ZZZ GRFV

28 November 2002

More on Dynamic Branch Prediction

Erik Hagersten
Uppsala University
Sweden

Branch history table: a simple branch prediction scheme

[Figure: five-stage pipeline (IF ID EX MEM WB). The PC indexes a table of one-bit entries (1 0 0 1 0 1 0), annotated "branches predicted as taken".]

• The branch-prediction buffer is indexed by bits from branch-instruction PC values
• If prediction is wrong, then invert prediction

Problem: can cause two mispredictions in a row
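The double misprediction is easy to reproduce. The sketch below is a minimal model, assuming a single one-bit entry and a loop branch that is taken nine times and then falls through; the function name and the trace are illustrative and not taken from the slides.

```python
def one_bit_mispredictions(outcomes, prediction=False):
    """Count mispredictions of a single one-bit history entry.

    `outcomes` is a sequence of booleans (True = branch taken).
    """
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
            prediction = taken   # wrong guess: invert the stored prediction
    return misses

# A loop branch: taken nine times, then not taken on exit. The loop runs three times.
trace = ([True] * 9 + [False]) * 3
print(one_bit_mispredictions(trace))
```

Every run of the loop costs two misses in this model: one on the exit, and one on the first iteration of the next run, because the exit flipped the bit.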

A two-bit prediction scheme

[Figure: state diagram with two "Predict taken" states and two "Predict not taken" states; the arcs between them are labelled "Taken" and "Not taken".]

• Requires prediction to miss twice in order to change prediction => better performance
• Performance: […] miss prediction frequency for SPEC[…]
• Integer programs have higher miss frequency than FP programs
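A common way to realise a two-bit scheme is a saturating counter. The sketch below assumes that realisation (the transitions in the slide's state diagram may differ in detail) and uses an illustrative loop-branch trace.

```python
def two_bit_mispredictions(outcomes, counter=3):
    """Count mispredictions of one two-bit saturating counter.

    Counter values 0 and 1 predict not taken, 2 and 3 predict taken.
    `outcomes` is a sequence of booleans (True = branch taken).
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step towards the observed outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# A loop branch: taken nine times, then not taken on exit. The loop runs three times.
trace = ([True] * 9 + [False]) * 3
print(two_bit_mispredictions(trace))
```

Starting from the strongly-taken state, only the loop exit is mispredicted in each run, since a single wrong guess does not flip the prediction.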

Dynamic Scheduling Past Branches

[Figure: four consecutive copies of the instruction sequence LD, ADD, SUB, ST, each followed by a conditional branch test; the "Y" outcome of a test leads on to the next copy.]


N-level history

• Not only the PC of the BR instruction matters, also how you've got there is important
• Approach:
  - Record the outcome of the last N branches in a vector of N bits
  - Include the bits in the indexing of the branch table
• Pros/Cons: Same BR instruction may have multiple entries in the branch table
• (N,M) prediction = N levels of M-bit prediction
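As a rough illustration of indexing with history bits, the sketch below models an (N,2) predictor: N global history bits are concatenated with a few PC bits to select a two-bit saturating counter. The class name, the table size and the alternating test trace are assumptions made for this example.

```python
class CorrelatingPredictor:
    """(N,2) predictor sketch: N history bits plus PC bits index 2-bit counters."""

    def __init__(self, history_bits=2, pc_bits=4):
        self.history_bits = history_bits
        self.pc_bits = pc_bits
        self.history = 0                                  # outcomes of the last N branches
        self.counters = [1] * (1 << (history_bits + pc_bits))  # start weakly not taken

    def _index(self, pc):
        pc_part = pc & ((1 << self.pc_bits) - 1)
        return (pc_part << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.counters[i] = min(3, self.counters[i] + 1) if taken else max(0, self.counters[i] - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def mispredictions(predictor, trace, pc=0):
    """Run one branch at address `pc` through the predictor and count misses."""
    misses = 0
    for taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


alternating = [True, False] * 50   # a branch that alternates taken / not taken
print(mispredictions(CorrelatingPredictor(history_bits=0), alternating))
print(mispredictions(CorrelatingPredictor(history_bits=2), alternating))
```

With no history bits the single counter never settles on the alternating pattern; with two history bits the same branch occupies several table entries, one per history pattern, and each entry quickly becomes stable.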

Tournament prediction

• Issues: No one predictor suits all applications
• Approach: Implement several predictors and dynamically select the most appropriate one
• Performance example SPEC[…]:
  - […]-bit prediction: […] miss prediction
  - ([…],[…]) […]-level, […]-bit: […] miss prediction
  - Tournaments: […] miss prediction

Return address stack

• Popular subroutines are called from many places in the code.
• Branch prediction may be confused!!
• May hurt other predictions
• New approach:
  - Push the return address on a [small] stack at the time of the call
  - Pop addresses on return
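The push/pop behaviour can be sketched in a few lines. The class below is a hypothetical software model; the depth of 8 and the policy of dropping the oldest entry on overflow are assumptions, not details from the slide.

```python
class ReturnAddressStack:
    """Small bounded stack that predicts the target of subroutine returns."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        """Push the return address at the time of the call."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)          # stack full: the oldest entry is lost
        self.stack.append(return_address)

    def predict_return(self):
        """Pop the predicted return target, or None if nothing is recorded."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=2)
for address in (0x100, 0x200, 0x300):   # three nested calls, one more than fits
    ras.on_call(address)
print([ras.predict_return() for _ in range(3)])
```

The two innermost returns are predicted correctly; the outermost one has been pushed out of the small stack, so no prediction is available for it.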

Branch target buffer

• Predicts branch target address in the IF stage
• Can be combined with 2-bit branch prediction


Putting it together

• BTB stores info about taken instructions
• Combined with a separate branch history table
• Instruction fetch stage highly integrated for branch optimizations

Folding branches

• BTB often contains the next few instructions at the destination address
• Unconditional branches (and some conditional ones as well) execute in zero cycles
  - Execute the dest instruction instead of the branch (if there is a hit in the BTB at the IF stage)
  - "Branch folding"

Branch prediction penalties for a Branch Target Buffer scheme

Instruction in buffer   Prediction   Actual branch   Penalty cycles
Yes                     Taken        Taken           0
Yes                     Taken        Not taken       2
Yes                     Not taken    Not taken       0
Yes                     Not taken    Taken           2
No                      -            Taken           2
No                      -            Not taken       1

Itanium: ranging from […] to […] cycles…
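The penalty table translates directly into an average cost per branch once a mix of cases is known. The sketch below uses a made-up mix of ten branches purely for illustration; only the penalty values come from the table.

```python
# (instruction in buffer, prediction, actual outcome) -> penalty cycles, from the table.
PENALTY = {
    (True, "taken", "taken"): 0,
    (True, "taken", "not taken"): 2,
    (True, "not taken", "not taken"): 0,
    (True, "not taken", "taken"): 2,
    (False, None, "taken"): 2,
    (False, None, "not taken"): 1,
}

def average_penalty(events):
    """Mean penalty in cycles over a list of (in_buffer, prediction, actual) events."""
    return sum(PENALTY[event] for event in events) / len(events)

# Hypothetical mix: eight correct BTB hits, one mispredicted hit, one miss on a taken branch.
events = ([(True, "taken", "taken")] * 8
          + [(True, "taken", "not taken")]
          + [(False, None, "taken")])
print(average_penalty(events))
```

For this mix the average is 0.4 penalty cycles per branch.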

Exception handling in pipelines

Must restart an instruction that causes an exception (interrupt, trap, fault): "precise interrupts"
(…as well as all instructions following it.)

A solution:
1. Force a trap instruction into the pipeline
2. Turn off all writes for the faulting instruction
3. Save the PC for the faulting instruction
   - to be used in return from exception
   - may need to save multiple PC values


Guaranteeing the execution order

Exceptions may be generated in another order than the instruction execution order

Pipeline stage   Problem causing exception
IF               Page fault on instruction fetch; misaligned memory access; memory protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data access; misaligned memory access; memory protection violation
WB               none

Example sequence:
lw    (e.g., page fault in MEM)
add   (e.g., page fault in IF)

Summary Pipelining

Pipelining:
• Speeds up throughput, not latency
• Speedup ≤ #stages
• Hazards are fundamental limits:
  - Structural: need more HW
  - Data (RAW, WAR, WAW): need forwarding and compiler scheduling
  - Control: delayed branch
• Precise exceptions a major headache

Overlapping Execution

Erik Hagersten
Uppsala University
Sweden

Multicycle operations in the pipeline (floating point)

• Integer unit: Handles integer instructions, branches, and loads/stores
• Other units: May take several cycles each. Some units are pipelined (mult, add), others are not (div)


Parallelism between integer and FP instructions

MULTD F2,F4,F6     IF  ID  M1  M2  M3  M4  M5  M6  M7  MEM WB
ADDD  F8,F10,F12       IF  ID  A1  A2  A3  A4  MEM WB
SUBI  R2,R3,#8             IF  ID  EX  MEM WB
LD    F14,0(R2)                IF  ID  EX  MEM WB

How to avoid structural and RAW hazards:

Stall in ID stage when:
• The functional unit can be occupied
• Many instructions can reach the WB stage at the same time

RAW hazards:
• Normal bypassing from MEM and WB stages
• Stall in ID stage if any of the source operands is a destination operand of an instruction in any of the FP functional units

WAR and WAW hazards for multicycle operations

• WAR hazards are a non-issue because operands are read in program order

Example of a WAW hazard:
DIVF F0,F2,F4     FP divide, 24 cycles
...
SUBF F0,F8,F10    FP sub, 3 cycles

SUB finishes before DIV; out-of-order completion

WAW hazards are avoided by:
• stalling the SUBF until DIVF reaches the MEM stage, or
• disabling the write to register F0 for the DIVF instruction

FP Exceptions

Example:
DIVF F…,F…,F…    … cycles
ADDF F…,F…,F…    … cycles
SUBF F…,F…,F…    … cycles

SUBF may generate a trap before DIVF has completed!!

Dynamic Instruction Scheduling (will only be touched briefly in this course)

Key idea: allow subsequent independent instructions to proceed

DIVD F…,F…,F…    ; takes long time
ADDD F…,F…,F…    ; stalls waiting for F…
SUBD F…,F…,F…    ; Let this instr. bypass the ADDD

• Enables out-of-order execution (and out-of-order completion)

[Figure: pipeline stages IF, ID, EX, M, WB]

Two historical schemes used in "recent" machines:
• Tomasulo in IBM 360/91 in 1967 (also in Power-2)
• Scoreboard dates back to CDC 6600 in 1963


Simple Scoreboard Pipeline

• Issue: Decode and check for structural hazards
• Read operands: wait until no data hazard, then read operands (RAW)
• All data hazards are handled by the scoreboard mechanism

[Figure: scoreboard pipeline. IF is followed by the ID stage, split into Issue and Read operands (RdReg); then come parallel functional units (Int, Mem, FP Add, two FP Mul, FP Div) and a Write stage (WrReg). The Scoreboard is drawn alongside the stages.]

Extended Scoreboard

Issue: Instruction is issued when:
• No structural hazard for a functional unit
• No WAW with an instruction in execution

Read: Instruction reads operands when they become available (RAW)

EX: Normal execution

Write: Instruction writes when all previous instructions have read or written this operand (WAW, WAR)

The scoreboard is updated when an instruction proceeds to a new stage

Limitations with scoreboards

The scoreboard technique is limited by:
• Number of scoreboard entries (window size)
• Number and types of functional units
• Number of ports to the register bank
• Hazards caused by name dependencies

Tomasulo's algorithm addresses the last two limitations

Another example

DIV  F0,F2,F4
ADDD F6,F0,F8        ; RAW on F0 (written by DIV): delayed a long time
SUBD F8,F10,F14      ; WAR on F8 (read by ADDD)
MULD F6,F10,F8       ; RAW on F8 (written by SUBD), WAW on F6 (written by ADDD)

WAR and WAW avoided through "register renaming"

DIV  F0,F2,F4
ADDD F6,F0,F8
SUBD tmp1,F10,F14    ; can be executed right away
MULD tmp2,F10,tmp1   ; delayed a few cycles
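The effect of renaming on this four-instruction example can be checked mechanically. The sketch below is a simplified model: `hazards` compares every pair of instructions without tracking intervening writes, and `rename` gives every destination a fresh name (the slide renames only the two registers that cause trouble). Both function names are illustrative.

```python
def hazards(program):
    """List (kind, earlier index, later index, register) for each dependent pair."""
    found = []
    for j, (_, dst_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dst_i, srcs_i = program[i]
            if dst_i in srcs_j:
                found.append(("RAW", i, j, dst_i))   # later reads what earlier writes
            if dst_j in srcs_i:
                found.append(("WAR", i, j, dst_j))   # later overwrites what earlier reads
            if dst_j == dst_i:
                found.append(("WAW", i, j, dst_j))   # both write the same register
    return found


def rename(program):
    """Give each destination a fresh temporary and redirect later readers to it."""
    current = {}                                      # architectural name -> latest temporary
    renamed = []
    for number, (op, dst, srcs) in enumerate(program, start=1):
        new_srcs = tuple(current.get(s, s) for s in srcs)
        current[dst] = f"tmp{number}"
        renamed.append((op, current[dst], new_srcs))
    return renamed


program = [
    ("DIV", "F0", ("F2", "F4")),
    ("ADDD", "F6", ("F0", "F8")),
    ("SUBD", "F8", ("F10", "F14")),
    ("MULD", "F6", ("F10", "F8")),
]
print(sorted(kind for kind, *_ in hazards(program)))
print(sorted(kind for kind, *_ in hazards(rename(program))))
```

Before renaming the example shows two RAW hazards, one WAR and one WAW; after renaming only the two true (RAW) dependences remain.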


Tomasulo's Algorithm (touched briefly in this course)

• IBM 360/91 mid 60's
• High performance without compiler support
• Extended for modern architectures
• Many implementations (PowerPC, Pentium…)

Simple Tomasulo's Algorithm

[Figure sequence, eight animation steps of one diagram: IF and Issue feed reservation stations (Res. Station) in front of the functional units (Int, Mem, FP Add, two FP Mul, FP Div); results travel on the Common Data Bus (CDB) to the ReOrder Buffer (ROB), and a Write stage drives the register write path. Entries carry Op, D (destination) and the sources S1 and S2, each held either as a value or as a pointer (v/ptr); results are labelled "answ". The steps follow this sequence through the machine:
#3 DIV  F0,F2,F4
#4 ADDD F6,F0,F8
#5 SUBD F8,F10,F14
#6 MULD F6,F10,F8]

Summing up Tomasulo's

• Out-of-order (O-O-O) execution
• In order commit
  - Allows for speculative execution (beyond branches)
  - Allows for precise exceptions
• Distributed implementation
  - Reservation stations: wait for RAW resolution
  - Reorder Buffer (ROB)
  - Common Data Bus "snoops" (CDB)
• "Register renaming" avoids WAW, WAR
• Costly to implement (complexity and power)
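The combination of out-of-order execution and in-order commit can be illustrated with a toy reorder buffer. This is a deliberately small sketch with hypothetical names; it models only completion and commit, not reservation stations or the CDB.

```python
from collections import deque

class ReorderBuffer:
    """Instructions may complete in any order but leave the buffer in program order."""

    def __init__(self):
        self.entries = deque()     # program order, oldest entry at the left
        self.committed = []

    def issue(self, tag):
        self.entries.append({"tag": tag, "done": False})

    def complete(self, tag):
        """Mark one instruction as finished, then commit from the head while possible."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["done"] = True
        while self.entries and self.entries[0]["done"]:
            self.committed.append(self.entries.popleft()["tag"])


rob = ReorderBuffer()
for tag in ("DIV", "ADDD", "SUBD", "MULD"):
    rob.issue(tag)
for tag in ("SUBD", "DIV", "MULD", "ADDD"):    # an out-of-order completion sequence
    rob.complete(tag)
    print(tag, "completed; committed so far:", rob.committed)
```

SUBD finishes first but stays in the buffer until DIV and ADDD have committed, which is what keeps exceptions precise and lets mispredicted work be discarded before it changes state.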

Dynamic Scheduling Past Branches

[Figure: four consecutive copies of the instruction sequence LD, ADD, SUB, ST separated by conditional branch tests; each of the three branches between copies is annotated "Predict taken".]

Schedule speculative instructions past branches

Dynamic Scheduling Past Branches

[Figure: four consecutive copies of the instruction sequence LD, ADD, SUB, ST separated by conditional branch tests.]

Wrong Prediction!!! Do not commit!


Revisiting Exceptions:

A pipeline implements precise interrupts iff:
• All instructions before the faulting instruction can complete
• All instructions after (and including) the faulting instruction must not change the system state and must be restartable

Static Scheduling of Instructions

Erik Hagersten
Uppsala University
Sweden

Architectural assumptions

From      To        Latency
FP ALU    FP ALU    […]
FP ALU    SD        […]
LD        FP ALU    […]

Latency = number of cycles between the two instructions

Scheduling example

for (i=1; i<=1000; i=i+1)
    x[i] = x[i] + 10;

Iterations are independent => parallel execution

loop: LD   F0, 0(R1)      ; F0 = array element
      ADDD F4, F0, F2     ; Add scalar constant
      SD   0(R1), F4      ; Save result
      SUBI R1, R1, #8     ; decrement array ptr.
      BNEZ R1, loop       ; reiterate if R1 != 0

Can we eliminate all penalties in each iteration?

How about moving SD down?


Scheduling in each loop iteration

Original loop:
loop: LD   F0, 0(R1)
      stall
      ADDD F4, F0, F2
      stall
      stall
      SD   0(R1), F4
      SUBI R1, R1, #8
      BNEZ R1, loop
      stall

5 instructions + 4 bubbles = 9 cycles / iteration
(~one cycle per iteration on a vector architecture)

Can we do better by scheduling across iterations?

Vector architectures (a footnote)

CRAY, NEC, Fujitsu, …
• […] vector registers contain […] vector entries each
• A single LD/ST instr loads/stores entire vectors
• A single ALU instr: V[…] ← V[…] op V[…]
• […]-bit mask vectors make execution conditional
• Overlaps Mem and ALU ops
• One form of "SIMD": Single Instruction Multiple Data

Scheduling in each loop iteration

Original loop:
loop: LD   F0, 0(R1)
      stall
      ADDD F4, F0, F2
      stall
      stall
      SD   0(R1), F4
      SUBI R1, R1, #8
      BNEZ R1, loop
      stall

5 instructions + 4 bubbles = 9c / iteration

Statically scheduled loop:
loop: LD   F0, 0(R1)
      stall
      ADDD F4, F0, F2
      SUBI R1, R1, #8
      BNEZ R1, loop
      SD   8(R1), F4      ; fills the branch delay slot; displacement adjusted because SD now follows SUBI

5 instructions + 1 bubble = 6c / iteration

Can we do even better by scheduling across iterations?

Unoptimized loop unrolling 4x

loop: LD   F0, 0(R1)
      stall
      ADDD F4, F0, F2
      stall                 ; drop SUBI & BNEZ
      stall
      SD   0(R1), F4
      LD   F6, -8(R1)
      stall
      ADDD F8, F6, F2
      stall                 ; drop SUBI & BNEZ
      stall
      SD   -8(R1), F8
      LD   F10, -16(R1)
      stall
      ADDD F12, F10, F2
      stall                 ; drop SUBI & BNEZ
      stall
      SD   -16(R1), F12
      LD   F14, -24(R1)
      stall
      ADDD F16, F14, F2
      SUBI R1, R1, #32      ; alter to 4*8
      BNEZ R1, loop
      SD   8(R1), F16       ; 8 - 32 = -24: SD now follows SUBI

• 24c / 4 iterations = 6c / iteration


Optimized scheduled unrolled loop

loop: LD   F0, 0(R1)
      LD   F6, -8(R1)
      LD   F10, -16(R1)
      LD   F14, -24(R1)
      ADDD F4, F0, F2
      ADDD F8, F6, F2
      ADDD F12, F10, F2
      ADDD F16, F14, F2
      SD   0(R1), F4
      SD   -8(R1), F8
      SD   -16(R1), F12
      SUBI R1, R1, #32
      BNEZ R1, loop
      SD   8(R1), F16

Important steps:
• Push loads up
• Push stores down
• Note: the displacement of the last store must be changed

All penalties are eliminated. CPI=1

14 cycles / 4 iterations ==> 3.5 cycles / iteration

From 9c to 3.5c per iteration ==> speedup 2.6

Benefits of loop unrolling:
• Provides a larger seq. instr. window (larger basic block)
• Simplifies for static and dynamic methods to extract ILP
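The cycle counts quoted for the original, the scheduled and the unrolled loop can be re-derived with a small timing model. The sketch assumes an in-order, single-issue pipeline in which an FP ALU operation must wait one stall cycle after the load that feeds it, a store must wait two stall cycles after the FP ALU operation that feeds it, and an unfilled branch delay slot costs one bubble. Those latencies are read off the stall pattern of the original loop; the register names `x0`, `y0`, ... are placeholders.

```python
# Stall cycles needed between a producer and a dependent consumer (assumed, see lead-in).
LATENCY = {("LD", "FP"): 1, ("FP", "SD"): 2}

def cycles(schedule):
    """Cycles for one pass over `schedule`; each entry is (kind, destination, sources)."""
    produced = {}                       # register -> (producer kind, issue cycle)
    cycle = 0
    for kind, dst, srcs in schedule:
        cycle += 1
        for reg in srcs:
            if reg in produced:
                producer, issued = produced[reg]
                cycle = max(cycle, issued + 1 + LATENCY.get((producer, kind), 0))
        if dst is not None:
            produced[dst] = (kind, cycle)
    if schedule[-1][0] == "BR":         # nothing scheduled into the branch delay slot
        cycle += 1
    return cycle

def body(k):
    """LD / ADDD / SD for unrolled copy k, each copy using its own registers."""
    return (("LD", f"x{k}", ("R1",)),
            ("FP", f"y{k}", (f"x{k}", "F2")),
            ("SD", None, (f"y{k}", "R1")))

SUBI = ("INT", "R1", ("R1",))
BNEZ = ("BR", None, ("R1",))

ld, add, sd = body(0)
original = [ld, add, sd, SUBI, BNEZ]
scheduled = [ld, add, SUBI, BNEZ, sd]              # SD moved into the delay slot
copies = [body(k) for k in range(4)]
unrolled = ([c[0] for c in copies] + [c[1] for c in copies]
            + [c[2] for c in copies[:3]] + [SUBI, BNEZ, copies[3][2]])

print(cycles(original), cycles(scheduled), cycles(unrolled) / 4)
```

Under these assumptions the model gives 9 and 6 cycles per iteration for the two single-iteration schedules and 14 cycles for the four unrolled iterations, i.e. 3.5 cycles per iteration.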

Software pipelining: Symbolic loop unrolling

• The instructions in a loop are taken from different iterations in the original loop

[Figure: iterations 0 to 4 of the original loop (LD, ADD, ST, SUB, BNEQ) drawn staggered in time; each software-pipelined iteration picks its instructions from several of the original iterations.]

Software pipelining

Example:
loop: LD   F0,0(R1)
      ADDD F4,F0,F2
      SD   0(R1),F4
      SUBI R1,R1,#8
      BNEZ R1,loop

Look at three iterations of the loop body:
      LD   F0,0(R1)     ; Iteration i
      ADDD F4,F0,F2
      SD   0(R1),F4
      LD   F0,0(R1)     ; Iteration i+1
      ADDD F4,F0,F2
      SD   0(R1),F4
      LD   F0,0(R1)     ; Iteration i+2
      ADDD F4,F0,F2
      SD   0(R1),F4

Software pipelining

Instructions from three consecutive iterations form the loop body:

      < prologue code >
loop: SD   0(R1),F4      ; from iteration i
      ADDD F4,F0,F2      ; from iteration i+1
      LD   F0,-16(R1)    ; from iteration i+2
      SUBI R1,R1,#8
      BNEZ R1,loop
      < epilogue code >

• No data dependencies within a loop iteration
• The dependence distance is 1 iteration
• WAR hazard elimination is needed (register renaming)
• 5c / iteration, but only uses 2 FP regs (instead of 8)
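To see that the reordered loop body still computes the same result, the sketch below emulates the software-pipelined loop in Python. It walks the array upwards rather than downwards via R1, and the prologue and epilogue are written out explicitly; `f0` and `f4` stand in for the two FP registers, and the function names are illustrative.

```python
def original(x, c):
    """Reference loop: load, add, store within each iteration."""
    x = list(x)
    for i in range(len(x)):
        f0 = x[i]          # LD
        f4 = f0 + c        # ADDD
        x[i] = f4          # SD
    return x


def software_pipelined(x, c):
    """Same computation with SD, ADDD and LD taken from iterations i, i+1 and i+2."""
    x = list(x)
    n = len(x)
    assert n >= 2, "the sketch needs at least two elements"
    # Prologue: start the first two iterations.
    f0 = x[0]              # LD   (iteration 0)
    f4 = f0 + c            # ADDD (iteration 0)
    f0 = x[1]              # LD   (iteration 1)
    for i in range(n - 2):
        x[i] = f4          # SD   from iteration i
        f4 = f0 + c        # ADDD from iteration i+1
        f0 = x[i + 2]      # LD   from iteration i+2
    # Epilogue: drain the two iterations still in flight.
    x[n - 2] = f4          # SD   (iteration n-2)
    f4 = f0 + c            # ADDD (iteration n-1)
    x[n - 1] = f4          # SD   (iteration n-1)
    return x


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(software_pipelined(data, 10.0) == original(data, 10.0))
```

Inside the steady-state loop each statement reads a value produced in an earlier pass, so no statement has to wait for another one in the same pass.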


Software pipelining

• "Symbolic Loop Unrolling"
• Very tricky for complicated loops
• Less code expansion than outlining
• Register-lean if "rotating" is used
• Needed to hide large latencies (see IA-64)

Detecting data dependencies

• Finding dependencies is fundamental to:
  - perform instruction scheduling,
  - determine the degree of parallelism in loops, and
  - eliminate name dependencies

for (i = 1; i <= 100; i = i+1) {
    A[i] = B[i] + C[i];
    D[i] = A[i] + E[i];
}

The absence of loop-carried dependencies increases the amount of exploitable parallelism

Loop-carried dependencies

A loop iteration is often dependent on results calculated in an earlier iteration.

Example:
for (i = 6; i <= 100; i = i+1)
    Y[i] = Y[i-5] + Y[i];

• This loop has a dependence distance of 5 and we can extract ILP in 5 consecutive iterations

What dependencies can the compiler detect?
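A dependence distance of 5 means that any five consecutive iterations touch disjoint data. The sketch below checks this for the example loop by computing each group of five iterations from values read before any of the group's writes; the function names and the test data are illustrative.

```python
def sequential(y):
    """The loop as written: Y[i] = Y[i-5] + Y[i] for i = 6..100."""
    y = list(y)
    for i in range(6, 101):
        y[i] = y[i - 5] + y[i]
    return y


def in_groups_of_five(y):
    """Five consecutive iterations at a time, all reads done before any write."""
    y = list(y)
    for start in range(6, 101, 5):
        group = range(start, min(start + 5, 101))
        # Every Y[i-5] read here lies outside the group, so the order within it is free.
        results = [y[i - 5] + y[i] for i in group]
        for i, value in zip(group, results):
            y[i] = value
    return y


values = list(range(101))
print(in_groups_of_five(values) == sequential(values))
```

A group of six would break this: its last iteration would read an element written by its first.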

The GCD test

• A simple test to decide whether there are any dependencies between loop iterations. For a loop that stores to X[a*i + b] and loads from X[c*i + d]:
  - If loop carried dependencies exist then GCD(c,a) divides (d - b)

Example:
for (i = 1; i <= 100; i = i + 1)
    X[2 * i + 3] = X[2 * i] * 5.0;

We have a=2, b=3, c=2 and d=0; GCD(2,2) = 2, which does not divide (0-3) = -3 =>
• There are no loop carried dependencies in this loop
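The test is a one-liner once the index expressions are in the form a*i + b (store) and c*i + d (load). The function name below is hypothetical; note that the test is conservative, so a `True` result only means that a dependence cannot be ruled out.

```python
from math import gcd

def may_have_loop_carried_dependence(a, b, c, d):
    """GCD test for a store to X[a*i + b] and a load from X[c*i + d].

    Returns False when a dependence is impossible, True when it cannot be excluded.
    """
    g = gcd(a, c)
    if g == 0:                       # both indices are loop-invariant constants
        return b == d
    return (d - b) % g == 0

# The slide's example: X[2*i + 3] = X[2*i] * 5.0
print(may_have_loop_carried_dependence(2, 3, 2, 0))
# Y[i] = Y[i-5] + Y[i]: store index i, load index i - 5
print(may_have_loop_carried_dependence(1, 0, 1, -5))
```

The first call reports that no dependence is possible, the second that one may exist, which matches the dependence distance of 5 in the Y[i] example.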


Issuing more than one instruction per cycle

Erik Hagersten
Uppsala University
Sweden

Multiple instruction issue per clock

Goal: Extracting ILP so that CPI < 1, i.e., IPC > 1

Superscalar:
• Combine static and dynamic scheduling to issue multiple instructions per clock
• HW finds independent instructions in "sequential" code
• Predominant: (PowerPC, SPARC, Alpha, HP-PA)

Very Long Instruction Words (VLIW):
• Static scheduling used to form packages of independent instructions that can be issued together
• Relies on compiler to find independent instructions (IA-64)

Superscalars

[Figure: four parallel pipelines (stages I, R, B, M, M, W) fed from a PC through shared issue logic and a common register file, backed by a memory hierarchy whose levels are annotated with sizes (kB, MB, GB) and access times in cycles.]

Example: A Superscalar DLX

• Issue 2 instructions simultaneously: 1 FP & 1 integer
• Fetch 64-bits/clock cycle; Integer instr. on left, FP on right
• Can only issue 2nd instruction if 1st instruction issues
• Need more ports to the register file

Type   Pipe stages
Int.   IF  ID  EX  MEM WB
FP     IF  ID  EX  MEM WB
Int.       IF  ID  EX  MEM WB
FP         IF  ID  EX  MEM WB
Int.           IF  ID  EX  MEM WB
FP             IF  ID  EX  MEM WB

• EX stage should be fully pipelined
• 1 load delay slot corresponds to three instructions!


Statically Scheduled Superscalar DLX

Issue: Difficult to find a sufficient number of instr. to issue

Can be scheduled dynamically with Tomasulo's alg.

Dependencies Revisited

Two instructions must be independent in order to execute in parallel

Three classes of dependencies that limit parallelism:
• Data dependencies
• Name dependencies
• Control dependencies

• Dependencies are properties of the program
• Can lead to hazards, which are properties of the implementation

Limits to superscalar execution

• Difficulties in scheduling within the constraints on number of functional units and the ILP in the code chunk
• Instruction decode complexity increases with the number of issued instructions
• Data and control dependencies are in general more costly in a superscalar processor than in a single-issue processor

Simple superscalar relying on compiler instead of HW complexity → VLIW

Techniques to enlarge the instruction window to extract more ILP are important

VLIW: Very Long Instruction Word

[Figure: four parallel pipelines (stages I, R, B, M, M, W) fed from a single PC, with a common register file and a memory hierarchy annotated with sizes (kB, MB, GB); no issue logic is drawn.]


Very Long Instruction Word (VLIW)

• Independent functional units with no hazard detection
• Compiler is responsible for instruction scheduling

Mem ref 1         Mem ref 2         FP op 1            FP op 2            Int op / branch    Clock
LD F0,0(R1)       LD F6,-8(R1)      -                  -                  -                  1
LD F10,-16(R1)    LD F14,-24(R1)    -                  -                  -                  2
LD F18,-32(R1)    LD F22,-40(R1)    ADDD F4,F0,F2      ADDD F8,F6,F2      -                  3
LD F26,-48(R1)    -                 ADDD F12,F10,F2    ADDD F16,F14,F2    -                  4
-                 -                 ADDD F20,F18,F2    ADDD F24,F22,F2    -                  5
SD 0(R1),F4       SD -8(R1),F8      ADDD F28,F26,F2    -                  -                  6
SD -16(R1),F12    SD -24(R1),F16    -                  -                  -                  7
SD -32(R1),F20    SD -40(R1),F24    -                  -                  SUBI R1,R1,#56     8
SD 8(R1),F28      -                 -                  -                  BNEZ R1,LOOP       9

Limits to VLIW

Difficult to exploit parallelism
• N functional units and K "dependent" pipeline stages implies N x K independent instructions to avoid stalls

Memory and register bandwidth
• Complexity increases with number of functional units
• Code size

No binary code compatibility

But, …
• simpler hardware
• short schedule
• high frequency

HW support for speculation and improved ILP

Erik Hagersten
Uppsala University
Sweden

HW support for speculation

A combination of three main ideas we've covered:
• Dynamic Instruction scheduling; take advantage of ILP
• Dynamic branch prediction; allows instruction scheduling across branches
• Speculative execution; execute instructions before all control dependencies are resolved

Hardware based speculation uses a data-flow approach: instructions execute when their operands are available


HW support for SW speculation

• Move LD up and ST down. How far?
• These techniques will allow larger moves and increase the effective size of a basic block
  - Regrouping instructions: trace scheduling, superblocks
  - Removing branches: predicate execution
  - Move LD above ST: hazard detection
  - Move LD above branch

Trace scheduling

Creates a sequence of instructions that are likely to be executed -- a trace.

Two steps:
• Trace selection: Find a likely sequence of basic blocks (trace) across statically predicted branches (e.g. if-then-else)
• Trace compaction: Schedule the trace to be as efficient as possible while preserving correctness in the case the prediction is wrong

• Yields more instruction level parallelism
• Efficient static branch prediction key to success

Trace scheduling

[Figure: control-flow graph. The assignment A[i] := B[i] + C[i] is followed by the test "A[i] = 0?" with outcomes T and F; one path assigns B[i], the other is a block X, and the paths join at an assignment to C[i].]

• The leftmost sequence is chosen as the most likely trace
• The assignment to B is control dependent on the if statement.
• Trace compaction has to respect data dependencies
• The rightmost (less likely) trace has to be augmented with fix up code

'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���

'$5.�

����

Compiler speculation

The compiler moves instructions before a branch so that they can be executed before the branch condition is known

Advantage: creates longer schedulable code sequences => more ILP can be exploited

Example: if (A == 0) then A = B; else A = A+4;

Non speculative code         Speculative code
    LW   R1,0(R3)                LW   R1,0(R3)
    BNEZ R1,L1                   LW   R14,0(R2)
    LW   R1,0(R2)                BEQZ R1,L3
    J    L2                      ADD  R14,R1,4
L1: ADD  R1,R1,4             L3: SW   0(R3),R14
L2: SW   0(R3),R1

• What about exceptions?
• Register renaming
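
To see why the slide asks about exceptions, the two code sequences can be mirrored in Python. This is a rough sketch: memory is modelled as a dict, an invalid address shows up as a `KeyError`, and the addresses `0x10` and `0x20` are made up for the example.

```python
def non_speculative(mem, r2, r3):
    r1 = mem[r3]          # LW   R1,0(R3)
    if r1 != 0:           # BNEZ R1,L1
        r1 = r1 + 4       # L1: ADD R1,R1,4
    else:
        r1 = mem[r2]      # LW   R1,0(R2)
    mem[r3] = r1          # L2: SW 0(R3),R1


def speculative(mem, r2, r3):
    r1 = mem[r3]          # LW   R1,0(R3)
    r14 = mem[r2]         # LW   R14,0(R2), hoisted above the branch
    if r1 != 0:           # BEQZ R1,L3 falls through when A != 0
        r14 = r1 + 4      # ADD  R14,R1,4
    mem[r3] = r14         # L3: SW 0(R3),R14


def run(version, a, b):
    mem = {0x10: a, 0x20: b}   # hypothetical addresses of A and B
    version(mem, r2=0x20, r3=0x10)
    return mem[0x10]


# Both versions agree whenever the load of B is legal.
results = [(run(non_speculative, a, 7), run(speculative, a, 7)) for a in (0, 5)]

# If B's address is invalid and A != 0, only the speculative version faults.
mem = {0x10: 5}
non_speculative(mem, r2=0x20, r3=0x10)     # fine: B is never loaded
try:
    speculative({0x10: 5}, r2=0x20, r3=0x10)
    spurious_fault = False
except KeyError:
    spurious_fault = True
```

The hoisted load raises an exception that the original program would never have seen, which is the problem the hardware support on the following slides addresses.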

Speculative instructions

Moving a LD up may make it speculative
• Moving past a branch
• Moving past a ST (that may be to the same address)

Issues:
• Non-intrusive
• Correct exception handling (again)
• Low overhead
• Good prediction

Example: Moving LD above a branch

    LD.s   Rd, offset(Rb)   ;Speculative LD to Rd
    …                       ;set "poison bit" in Rd if exception
    BRNZ   Rc, target
    …
    LD.chk Rd               ;Get exception if poison bit of Rd is set

Good performance if the branch is not taken
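
A toy model of the poison-bit mechanism, as a sketch only: the `Register` class, the dict-based memory and the function names are assumptions made for this illustration, not an actual instruction-set definition.

```python
class Register:
    def __init__(self):
        self.value = 0
        self.poison = False


def ld_s(reg, mem, addr):
    """Speculative load: a faulting access sets the poison bit instead of trapping."""
    if addr in mem:
        reg.value, reg.poison = mem[addr], False
    else:
        reg.poison = True


def ld_chk(reg):
    """Check at the load's original position: raise the deferred exception."""
    if reg.poison:
        raise MemoryError("deferred exception from speculative load")
    return reg.value


def guarded_load(mem, addr, branch_taken):
    reg = Register()
    ld_s(reg, mem, addr)        # hoisted above the branch
    if branch_taken:            # the load was not needed on this path
        return None
    return ld_chk(reg)          # original position of the load


mem = {100: 42}                  # hypothetical memory contents
```

A faulting speculative load is harmless when the branch is taken, since the check is never reached; on the not-taken path the exception is raised at the point where the original load would have raised it.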

Example: Moving LD above a ST

    LD.a  Rd, offset(Rb)    ;advanced load
                            ;create entry in the ALAT <addr,reg>
    …
    ST    Rs, offset(Ra)    ;invalidate entry if ALAT addr match
    …
    LD.c  Rd                ;Redo LD if entry in ALAT invalid
                            ;remove entry in ALAT

ALAT (advanced load address table) is an associative data structure storing tuples of: <addr, dest-reg>
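
The ALAT protocol can be mimicked with a small dictionary. This is a hedged sketch: the class, its method names and the explicit address argument to the check load are assumptions chosen to keep the example self-contained, and a real ALAT has finite capacity.

```python
class ALAT:
    """Toy advanced load address table: maps dest register -> load address."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, regs, mem, dest, addr):
        """Advanced load: do the load early and record <addr, dest-reg>."""
        regs[dest] = mem[addr]
        self.entries[dest] = addr

    def st(self, mem, addr, value):
        """Store: invalidate every entry whose address matches."""
        mem[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def ld_c(self, regs, mem, dest, addr):
        """Check load: redo the load if the entry is gone, then drop the entry."""
        reloaded = dest not in self.entries
        if reloaded:
            regs[dest] = mem[addr]
        self.entries.pop(dest, None)
        return reloaded


def run(store_addr):
    mem, regs, alat = {100: 1, 200: 2}, {}, ALAT()
    alat.ld_a(regs, mem, "R1", 100)     # load moved above the store
    alat.st(mem, store_addr, 99)        # may or may not alias address 100
    reloaded = alat.ld_c(regs, mem, "R1", 100)
    return regs["R1"], reloaded
```

Calling `run(200)` models a store to an unrelated address, so the early load stands; `run(100)` models an aliasing store, so the check load repeats the load and picks up the stored value.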

Conditional execution

• Removes the need for some branches :-)
• Conditional instructions
  - Conditional register move
      CMOVZ R1, R2, R3 ;move R2 to R1 if (R3 == 0)
  - Compare-and-swap (atomics covered later)
• Predicate execution
  - A more generalized technique
  - Each instruction is executed if the associated 1-bit predicate REG is 1

Predicate example

    IF R1 > R2 then
        LD  R7, 100(R1)
        ADD R1, R1, #1
    else
        LD  R7, 100(R2)
        ADD R2, R2, #1
    end

Standard Technique

          CGT  R3,R1,R2
          BRNZ R3, else
          LD   R7, 100(R1)
          ADD  R1, R1, #1
          BR   end
    else: LD   R7, 100(R2)
          ADD  R2, R2, #1
    end:

5 instr executed in "then path", 2 branches

Using Predicates

    …
    {IF R1 > R2 then P6=1;P7=0
     else P6=0;P7=1}           ;one instr!
    P6: LD  R7, 100(R1)
    P6: ADD R1, R1, #1
    P7: LD  R7, 100(R2)
    P7: ADD R2, R2, #1

One instruction sets the two predicate Regs
Each instr. in the "then" guarded by P6
Each instr. in the "else" guarded by P7
=> One basic block
=> Fewer total instr
5 instr executed in "then path" (the two guarded by P7 are nullified), 0 branches
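
As a sanity check, the branching and the predicated versions can be written side by side in Python. This is a sketch under assumptions: registers live in a dict, each guarded instruction is a `(predicate, operation)` pair, and the compare sets P6 and P7 to complementary values so that exactly one of the two paths takes effect.

```python
def with_branches(r1, r2, mem):
    if r1 > r2:
        r7 = mem[100 + r1]      # LD  R7, 100(R1)
        r1 += 1                 # ADD R1, R1, #1
    else:
        r7 = mem[100 + r2]      # LD  R7, 100(R2)
        r2 += 1                 # ADD R2, R2, #1
    return r1, r2, r7


def with_predicates(r1, r2, mem):
    regs = {"R1": r1, "R2": r2, "R7": None}
    p6 = regs["R1"] > regs["R2"]   # one compare sets both predicates
    p7 = not p6
    program = [
        (p6, lambda: regs.update(R7=mem[100 + regs["R1"]])),
        (p6, lambda: regs.update(R1=regs["R1"] + 1)),
        (p7, lambda: regs.update(R7=mem[100 + regs["R2"]])),
        (p7, lambda: regs.update(R2=regs["R2"] + 1)),
    ]
    for predicate, operation in program:   # straight-line code, no branches
        if predicate:                      # a false predicate nullifies the instr
            operation()
    return regs["R1"], regs["R2"], regs["R7"]


mem = {100 + i: 10 * i for i in range(10)}   # hypothetical memory contents
```

Both functions return the same register state for every input, but the predicated one walks through all four guarded instructions in a single basic block.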

HW vs. SW speculation

Advantages:
• Dynamic runtime disambiguation of memory addresses
• Dynamic branch prediction is often better than static, which limits the performance of SW speculation.
• HW speculation can maintain a precise exception model

Main disadvantage:
• Complex implementation and extensive need of hardware resources (conforms with technology trends)

Summary

• Software (compiler) tricks:
  - Loop unrolling
  - Software pipelining
  - Static instruction scheduling (with register renaming)
  - Trace scheduling (implies static branch prediction)
  - Speculative execution

• Hardware tricks:
  - Dynamic instruction scheduling
  - Dynamic branch prediction
  - Multiple issue
    - Superscalar
    - VLIW
  - Conditional instructions
  - Speculative execution

Example: IA-64 and Itanium

Erik Hagersten
Uppsala University
Sweden

Little of everything

• VLIW
• Advanced loads supported by ALAT
• Load speculation supported by predication
• Dynamic branch prediction
• "All the tricks in the book"

Itanium instructions

• Instruction bundle (128 bits)
  - (5 bits) template (identifies I types and dependencies)
  - 3 x (41 bits) instruction
• Can issue up to two bundles per cycle (6 instr)
• Latencies:
  [Table: Instruction vs. Latency, with rows I-LD, FP-LD, Pred branch, Mispred branch, I-ALU, FP-ALU]
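
The bundle arithmetic (a 5-bit template plus three 41-bit instruction slots gives 128 bits) can be checked with a small packing routine. This is only a sketch: placing the template in the lowest bits and the slots above it is an assumption made for the illustration, and the example values are arbitrary.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS = 3
BUNDLE_BITS = TEMPLATE_BITS + SLOTS * SLOT_BITS   # 5 + 3 * 41


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != SLOTS:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("instruction does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a bundle back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask
             for i in range(SLOTS)]
    return template, slots


bundle = pack_bundle(0b10000, [1, 2, (1 << 41) - 1])   # arbitrary example values
```

Two such bundles per cycle give the six instructions per cycle mentioned above.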

Itanium Registers

• 128 64-bit GPR (w/ poison bit)
• 128 82-bit FP REGS
• 64 1-bit predicate REGS
• 8 64-bit branch registers
• A bunch of CSRs (control/status registers)

Dynamic register window

[Figure: Physical Regs alongside the Explicit Regs (seen by the instructions)]

Dynamic register window for GPRs

[Figure: Physical Regs with regions Global, Dyn main and Unused; Explicit Regs (seen by main) with a Global region]

Calling procedure A

[Figure: Physical Regs with regions Global, Dyn main, Input, Output and Unused; Explicit Regs (seen by main) and Explicit Regs (seen by proc A), each with a Global region; proc A also has Input and Output regions]

Calling Procedure B (automatic passing of parameters)

[Figure: as for procedure A, plus Explicit Regs (Proc B) with a Global region; a region marked "shared" links the Output of Proc A to Proc B]

Register Stack Engine (RSE)

• Saves and restores registers to memory on register spills
• Implemented in hardware
  - Works in the background
• Gives the illusion of an unlimited register stack
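
A toy model can show how overlapping register windows pass parameters and when a register stack engine has to spill. Everything here is an assumption made for the sketch: the class and method names, the frame sizes and the eight physical registers are hypothetical, and the global registers are left out.

```python
class RegisterStackEngine:
    """Toy register stack: each call gets a frame of physical registers,
    and the callee's inputs overlap the caller's outputs."""

    def __init__(self, num_physical):
        self.num_physical = num_physical
        self.frames = []     # (base, size, num_outputs) per active procedure
        self.bottom = 0      # registers below this index live in memory
        self.spills = 0      # registers written to memory so far
        self.fills = 0       # registers read back from memory so far

    def call(self, size, num_outputs):
        if self.frames:
            base, caller_size, caller_out = self.frames[-1]
            new_base = base + caller_size - caller_out  # inputs = caller's outputs
        else:
            new_base = 0
        self.frames.append((new_base, size, num_outputs))
        top = new_base + size
        if top - self.bottom > self.num_physical:       # out of physical registers
            need = top - self.bottom - self.num_physical
            self.spills += need                         # oldest registers go to memory
            self.bottom += need
        return new_base

    def ret(self):
        self.frames.pop()
        if self.frames and self.frames[-1][0] < self.bottom:
            base = self.frames[-1][0]                   # caller's registers were spilled
            self.fills += self.bottom - base
            self.bottom = base


rse = RegisterStackEngine(num_physical=8)
main_base = rse.call(size=5, num_outputs=2)   # main: registers 0..4, outputs 3..4
a_base = rse.call(size=5, num_outputs=2)      # proc A: inputs are main's outputs
b_base = rse.call(size=4, num_outputs=0)      # proc B does not fit: spill
spills_after_calls = rse.spills
rse.ret()                                     # back in proc A
rse.ret()                                     # back in main: fill its registers
```

The programs never see the spills and fills, which is what gives the illusion of an unlimited register stack.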

Register rotation: FP and GPRs

• Used in software pipelining
• Register renaming for each iteration
• Removes the need for prologue/epilogue
• RSE (register stack engine)
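
Register rotation can be sketched as a register file whose logical-to-physical mapping shifts by one on every loop iteration. This is an illustration under assumptions: the class, the four-register file and the two-stage loop are made up, and the `if` guards stand in for the stage predicates that let one loop body also cover the ramp-up and drain phases.

```python
class RotatingRegs:
    """Logical register r maps to physical (r + base) mod n; rotating
    makes the value written to r0 visible as r1 in the next iteration."""

    def __init__(self, n):
        self.phys = [None] * n
        self.base = 0

    def _index(self, r):
        return (r + self.base) % len(self.phys)

    def read(self, r):
        return self.phys[self._index(r)]

    def write(self, r, value):
        self.phys[self._index(r)] = value

    def rotate(self):
        self.base = (self.base - 1) % len(self.phys)


def pipelined_increment(xs):
    """Two-stage software pipeline: stage 1 loads xs[i] into r0,
    stage 2 adds one to the value loaded in the previous iteration (now r1)."""
    regs = RotatingRegs(4)
    out = []
    for i in range(len(xs) + 1):          # one extra pass drains the pipeline
        if i >= 1:                        # stage 2 is active from the second pass
            out.append(regs.read(1) + 1)
        if i < len(xs):                   # stage 1 is active until the data runs out
            regs.write(0, xs[i])
        regs.rotate()                     # rename registers for the next iteration
    return out
```

`pipelined_increment([1, 2, 3])` returns `[2, 3, 4]`: each iteration overlaps the load of one element with the add of the previous one, without separate prologue or epilogue code.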

What is the alternative?

• VLIW was meant to simplify HW
• Is this the best you can do with this transistor budget (Mtransistors) and power budget (W)?
  - Will it scale with technology?
• Other alternatives:
  - Increase cache size,
  - Increase the frequency, or,
  - Run more than one thread/chip

Where is technology pushing us?

Technology Questions for the Future

Can we continue this trend?

• Huge monolithic single-thread monster CPUs
• Diminishing return on:
  - pipeline stages
  - parallel pipelines
• The memory wall is the limit
• Should we continue to share memory?
• Will we need parallel programs?
• What is in the technology crystal ball?

Technology "Improvements"

[Figure: time vs. year (2000, 2005, 2010, 2015). Wire delay, 5 mm trace (RC model): 390 ps, 170 ps. Clock period: 830 ps (1.2 GHz), 70 ps (14 GHz).]
Quantitative data and trends according to V. Agarwal et al., ISCA 2000.
Based on SIA (Semiconductor Industry Association) prediction, 1999.

Technology "Improvements"

[Figure: clock frequency vs. year (2000, 2005, 2010, 2015), by number of gate delays per clock period. 16 gate delays: 0.7 GHz, 5 GHz. 8 gate delays: 1.9 GHz, 10 GHz. SIA prediction, 1999: 1.2 GHz, 14 GHz.]

Technology "Improvements"

[Figure: span (bits reachable in one cycle) in M bits, log scale, vs. year (2000, 2005, 2010, 2015); curves for clock = 16 gate delays, clock = 8 gate delays, and SIA.]

Technology "Improvements"

[Figure: span (fraction of the chip area reachable in one cycle), axis from 0.25 to 1.0, vs. year (2000, 2005, 2010, 2015) and feature size (.25, .18, .13, .10, .070, .050, .035 micron); curves for clock = 16 gate delays, clock = 8 gate delays, and SIA; annotated values 100% and 0.4 - 1.4 %.]

So, what will the performance be?

[Figure: a single thread (Thread 1, PC, Regs) feeding the issue logic and four I R B M M W pipelines, backed by a memory hierarchy ending in Mem; the four levels are sized in kB, MB, MB and GB and annotated with access times in cycles.]

Technology "Improvements"
see the ISCA paper for details…

[Figure: relative CPU performance (log scale, 1 to 1000) vs. year (2000, 2005, 2010, 2015). Historical rate: 55% /y, reaching 1720 x. Business as usual (same arch.): ~12% /y, reaching 6.0 - 7.4 x.]
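
The gap between the two curves is just compound growth, which a few lines of Python make concrete. The 55% and 12% yearly rates come from the slide; the 17-year horizon is an assumption, picked here because it reproduces the 1720 x figure.

```python
def growth(rate_per_year, years):
    """Total improvement factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


YEARS = 17   # assumed horizon; not stated on the slide

historical = growth(0.55, YEARS)          # historical rate: 55% per year
business_as_usual = growth(0.12, YEARS)   # same architecture: ~12% per year

print(f"historical: {historical:.0f}x")
print(f"business as usual: {business_as_usual:.1f}x")
print(f"shortfall: {historical / business_as_usual:.0f}x")
```

Under that assumption the historical rate compounds to roughly 1720 x, while 12% per year lands inside the 6.0 - 7.4 x range, a shortfall of more than two orders of magnitude.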

Can we continue to make large and complicated CPUs?…

• Huge monolithic single-thread monsters
• Diminishing return on:
  - pipeline stages
  - parallel pipelines
• Smaller reach =>
  - hard to add complexity, smaller caches
• Memory is often the bottleneck anyhow

Not according to the technology forecast

CPU Future: there is hope

• Have you ever heard of parallel threads?
• Thread-level parallelism (TLP)
  -> Multiple "threads" per CPU chip
     … SMT (Simultaneous Multi Threading)
     … CMP (Chip Multi Processor)

SMT: Simultaneous Multithreading
"Combine TLP & ILP to find independent instr."

[Figure: threads 1 … N, each with its own PC and Regs, sharing one issue logic, four I R B M M W pipelines and Mem.]

• Single-thread performance
• Multi-thread performance
• Still one monolithic pipe

CMP: Chip Multiprocessor
more TLP & geographical locality

[Figure: threads 1 … N, each with its own PC, Regs, issue logic and I R B M M W pipeline, connected to Mem.]

• Simple
• Maximizes geographical locality
• Single-thread performance

CMP: Chip Multiprocessor

[Figure: four CPUs, each with its own $1 cache, sharing an L2$, a Mem I/F to Mem and an External I/F; t threads. Annotations: simple fast CPU core (w/ SMT?) + local caches; "Shared Memory".]

The future is here today

• Intel XEON: 2x SMT "hyperthreading"
• IBM Power4: 2x CMP
• Sun: Rumor has it…
• IA64: Maybe …

The Computer Shrunk Again

[Figure: successive generations of computers: Old Mainframes; Super Minis; Microprocessor (with Mem); Chip Multiprocessor (CMP) (with Mem)]

… and so did the large server

[Figure: C chips, each with its own Mem, joined by an Interconnect; T threads/CMP]