’$5. LQ D QXWVKHOO ,6$GHVLJQRSWLRQVuser.it.uu.se/~eh/courses/dark2/new-slides/4_per_page/...28...
Transcript of ’$5. LQ D QXWVKHOO ,6$GHVLJQRSWLRQVuser.it.uu.se/~eh/courses/dark2/new-slides/4_per_page/...28...
28 November 2002
,6$�GHVLJQ�RSWLRQV�
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
'$5.��LQ�D�QXWVKHOO
�� 0HPRU\ 6\VWHPV �FDFKHV� 90� '5$0�PLFUREHQFKPDUNV� «�
�� &38V �SLSHOLQHV� ,/3� VFKHGXOLQJ�6XSHUVFDODUV� 9/,:V� HPEHGGHG� «�
�� 0XOWLSURFHVVRUV �7/3� FRKHUHQFH�LQWHUFRQQHFWV� VFDODELOLW\� FOXVWHUV� «�
�� )XWXUH� �SK\VLFDO OLPLWDWLRQV� 7/3�,/3LQ WKH &38�«�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
,6$�GHVLJQ�RSWLRQV
� +RZ LW DOO VWDUWHG
� ,QVWUXFWLRQ VHW RYHUYLHZ
� 7KH SURIHVVRU MRNH
� 2SWLRQV IRU ,6$ GHVLJQ
� 6SHFLI\LQJ +:
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
+RZ�LW�DOO�VWDUWHG«WKH�IRVVLOV
� (1,$& -�3� (FNHUW DQG -� 0DXFKO\� 8QLY� RI 3HQQV\OYDQLD� ::�
� (OHFWUR 1XPHULF ,QWHJUDWRU $QG &DOFXODWRU� ������ YDFXXP WXEHV
� ('9$&� -� 9 1HXPDQQ� RSHUDWLRQDO ����
� (OHFWULF 'LVFUHWH 9DULDEOH $XWRPDWLF &RPSXWHU �VWRUHG SURJUDPV�
� ('6$&� 0�:LONHV� &DPEULGJH 8QLYHUVLW\� ����
� (OHFWULF 'HOD\ 6WRUDJH $XWRPDWLF &DOFXODWRU
� 0DUN�,��� +� $LNHQ� +DUYDUG� ::�� (OHFWUR �PHFKDQLF
� .� =XVH� *HUPDQ\� HOHFWURPHFK� FRPSXWHU� VSHFLDO SXUSRVH�::�
� %(6.� .7+� (� 6WHPPH �QRZ DW &KDOPHUV� HDUO\ ��V
� 60,/� /7+ PLG ��V
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
+RZ GR \RX WHOO D JRRG LGHD IURP D EDG
7KH %RRN� 7KH SHUIRUPDQFH�FHQWULF DSSURDFK
� &3, �H[HFXWLRQ�F\FOHV � �LQVWUXFWLRQV H[HFXWHG �a,6$JRRGQHVV�
� &3, F\FOH WLPH ! SHUIRUPDQFH
� &3, &3,&38
� &3,0HP
7KH ERRN UDUHO\ FRYHUV RWKHU GHVLJQ WUDGHRIIV
� 7KH IHDWXUH FHQWULF DSSURDFK���
� 7KH FRVW�FHQWULF DSSURDFK���
� (QHUJ\�FHQWULF DSSURDFK«
� 9HULILFDWLRQ �FHQWULF DSSURDFK���
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
7KH %RRN� 4XDQWLWDWLYH PHWKRGRORJ\
Make design decisions based on execution statistics. Select workloads (programs representative for usage)
Instruction mix measurements: statistics of relative usage of different components in an ISA
Experimental methodologies
� Profiling through tracing
� ISA simulators
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
7ZR�JXLGLQJ�VWDUV��� WKH�5,6&�DSSURDFK�
0DNH WKH FRPPRQ FDVH IDVW
�6LPXODWH DQG SURILOH DQWLFLSDWHG H[HFXWLRQ
�0DNH FRVW�IXQFWLRQV IRU IHDWXUHV
�2SWLPL]H IRU RYHUDOO HQG UHVXOW �HQGSHUIRUPDQFH�
:DWFK RXW IRU $PGDKOV ODZ� Speedup = Execution_time OLD / Execution_time NEW
� >(1-Fraction ENHANCED) + Fraction ENHANCED / /Speedup ENHANCED) @
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
,QVWUXFWLRQ 6HW $UFKLWHFWXUH �,6$�
�� WKH LQWHUIDFH EHWZHHQ VRIWZDUH DQG KDUGZDUH�
Tradeoffs between many options:
•functionality for OS and compiler
•wish for many addressing modes
•compact instruction representation
•format compatible with the memory system of choice
•desire to last for many generations
•bridging the semantic gap (old desire...)
•Note: the biggest “customer” is the compiler
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK�
'$5.�
����
,6$�WUHQGV�WRGD\
� CPU families built around “Instruction Set Architectures” ISA
� Many incarnations of the same ISA
� ISAs lasting longer (~10 years)
� Consolidation in the market - fewer ISAs (not for embedded…)
� 15 years ago ISAs were driven by academia
� Today ISAs technically do not matter all that much (market-driven)
� How many of you will ever design an ISA?
� How many ISAs will be designed in Sweden?
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&RPSLOHU�2UJDQL]DWLRQ
FortranFront-end
C Front-end
C++Front-end...
IntermediateRepresentation
High-levelOptimization
Global & LocalOptimization
Code Generation
Code
Machine-independentTranslation
Procedure in-liningLoop transformation
Register AllocationCommon sub-expressions
Instruction selectionconstant folding
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
7KUHH�GLIIHUHQW�PHPRU\�DOORFDWLRQV�PRGHOV
� 6WDFN� ORFDO YDULDEOHV LQ DFWLYDWLRQ UHFRUG
� DGGUHVVLQJ UHODWLYH WR VWDFN SRLQWHU
� VWDFN SRLQWHU PRGLILHG RQ FDOO�UHWXUQ
� *OREDO GDWD DUHD� ODUJH FRQVWDQWV
� JOREDO VWDWLF VWUXFWXUHV
� +HDS� G\QDPLF REMHFWV
� RIWHQ DFFHVVHG WKURXJK SRLQWHUV
0
text
heap
data
stack
Context B
Segments
…
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&ODVVLILFDWLRQ�RI�,6$V
� ,6$V DUH FODVVLILHG ZLWK UHVSHFW WR��5HJLVWHU PRGHO
�1XPEHU RI RSHUDQGV IRU HDFK LQVWUXFWLRQ
�$GGUHVVLQJ PRGHV SHUPLWWHG
�2SHUDWLRQV SURYLGHG LQ WKH LQVWUXFWLRQ VHW
�7\SH DQG VL]H RI RSHUDQGV
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([HFXWLRQ�LQ�D�&38
”Machine Code”
”Data”
&38
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
2SHUDQG�PRGHOV�Example: C := A + B
S tack A ccum u la to r R eg is te r
P U S H [A ] P U S H [B ] A D D P O P [C ]
LO A D [A ] A D D [B ] S T O R E [C ]
LO A D R 1 ,[A ] A D D R 1 ,[B ] S T O R E [C ],R 1
MemAccumulatorimplicit
MemStackimplicit
MemRegisterexplicitly
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�PDFKLQH�Example: C := A + B
A:12B:14C:10PUSH [A]
PUSH [B]ADDPOP [C]
0HP�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�PDFKLQH�Example: C := A + B
A:12B:14C:10PUSH [A]
PUSH [B]ADDPOP [C]
0HP�
��
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�PDFKLQH�Example: C := A + B
A:12B:14C:10PUSH [A]
PUSH [B]ADDPOP [C]
0HP�
��
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�PDFKLQH�Example: C := A + B
�
A:12B:14C:10PUSH [A]
PUSH [B]ADDPOP [C]
0HP�
��
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�PDFKLQH�Example: C := A + B
�
A:12B:14C:10PUSH [A]
PUSH [B]ADDPOP [C]
0HP�
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�PDFKLQH�Example: C := A + B
A:12B:14C:26PUSH [A]
PUSH [B]ADDPOP [C]
0HP�
��
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDFN�EDVHG�
� ,PSOLFLW RSHUDQGV
� &RPSDFW FRGH IRUPDW ��LQVWU� �E\WH�
� 6LPSOH WR LPSOHPHQW
� 1RW RSWLPDO IRU VSHHG���
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
5HJLVWHU�EDVHG�PDFKLQH�Example: C := A + B
�
�
�
�
�
�
A:12B:14C:10
LD R1, [A]LD R7, [B]ADD R2, R1, R7ST R2, [C]
'DWD�
�
�
�
��
��
"
" "
”Machine Code”����
��
�
��
��
��
��
��
��
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
5HJLVWHU�EDVHG
� ;���
� 5,6&V �$OSKD� 63$5&� +3�3$«�
� 9/,: �,$��� «�
� ([SOLFLW RSHUDQGV �L�H�� ´UHJLVWHUV´�
�:DVWHIXO LQVWU� IRUPDW ��LQVWU� �E\WHV�
� 6XLWV RSWLPL]LQJ FRPSLOHUV
� 2SWLPDO IRU VSHHG���
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
General-purpose register model dominates today
Reason: general model for compilers and efficient implementation wise
3URSHUWLHV�RI�RSHUDQG�PRGHOV
CompilerConstruction
ImplementationEfficiency
Code Size
Stack + -- ++
Accumulator -- - +
Register ++ ++ --
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
7UDGLWLRQDO $GGUHVVLQJ 0RGHV �9$;�
Addressing mode
Example instruction
Meaning
Register Add R4,R3 Regs[R4] ← Regs[R4]+Regs[R3]
Immediate
(30%) Add R4,#3 Regs[R4] ← Regs[R4]+ 3
Displacement
(40%) Add R4,100(R1) Regs[R4] ← Regs[R4]+Mem[100+Regs[R1]]
Register deferred or indirect
(20%)
Add R4,(R1) Regs[R4] ← Regs[R4]+Mem[Regs[R1]]
Indexed Add R3,(R1+R2) Regs[R3] ← Regs[R3]+Mem[Regs[R1]+ Regs[R2]]
Direct or absolute Add R1,(1001) Regs[R1] ← Regs[R2]+Mem[1001]
Memory indirect or memory deferred
Add R1,@(R3) Regs[R1] ← Regs[R1]+Mem[Mem[Regs[R3]]]
Autoincrement Add R1,(R2)+ Regs[R1] ← Regs[R1]+Mem[Regs[R2]]; Regs[R2] ← Regs[R2]+d
Autodecrement Add R1,-(R2) Regs[R2] ← Regs[R2]-d; Regs[R1] ← Regs[R1]+Mem[Regs[R2]]
Scaled
(10%) Add R1,100(R2)[R3] Regs[R1]
← Regs[R1]+Mem[100+Regs[R2]+Regs[R3]*d]
Are all of these addressing modes needed?
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
$FWXDO�XVH�RI�DGGU��PRGHV
� :KDW DGGUHVVLQJ PRGHV GRPLQDWHXVDJH"
� Immediate and Displacement are the most common addressing modes in SPEC89 on a VAX
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
,PSRUWDQW�$GGUHVVLQJ�0RGHV
Addressingmode
Exampleinstruction
Meaning When used
Immediate Add R4,#3 Regs[R4] ← Regs[R4]+ 3 For constants.
Displacement Add R4,100(R1) Regs[R4] ← Regs[R4]+
Mem[100+Regs[R1]]
Accessing local variables.
Registerdeferred orindirect
Add R4,(R1) Regs[R4] ← Regs[R4]+
Mem[Regs[R1]]
Accessing using a pointer or acomputed address.
Scaled Add R1,100(R2)[R3] Regs[R1] ← Regs[R1]+Mem[100+Regs[R2]+Regs[R3]*d]
Used to index arrays. May beapplied to any indexedaddressing mode in somemachines.
Are all of these addressing modes needed?
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6L]H�RI�GLVSODFHPHQW
� +RZ ELJ D UDQJH VKRXOG WKHGLVSODFHPHQW YDOXH FRYHU"
� 12-16 bits cover 75% - 99% of the displacements
i.e., #bits
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6L]H�RI�LPPHGLDWHV
� +RZ LPSRUWDQW DUH LPPHGLDWHV DQGKRZ ELJ DUH WKH\"
� Immediate operands are very important for ALU and compare operations
� 16-bit immediates seem sufficient (75%-80%)
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
2SHUDWLRQ�W\SHV�LQ�WKH�,6$
Operator type Examples
Arithmetical and logical Integer arithmetic and logical operations: add, and, subtract, or
Data transfer Loads/stores (move instructions on machines with memoryaddressing)
Control Branch, jump, procedure call and return
System Operating system call, virtual memory management instructions
Floating point Floating-point operations: add, multiply,...
Decimal Decimal add, decimal multiply, decimal-to-character conversions
String String move, string compare, string search
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&RQWURO�LQVWUXFWLRQV
� &RQGLWLRQDO EUDQFKHV
� 8QFRQGLWLRQDO EUDQFKHV �MXPSV�
� 3URFHGXUH FDOO�UHWXUQV
Conditional branches dominate by far
Intuition: program loops are common!
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6L]H�RI�EUDQFK�GLVSODFHPHQW
� +RZ ELJ D EUDQFK GLVSODFHPHQW LV QHHGHG"
� 8-bit branch displacements cover most cases
Intuition: most program loops are tight!
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
%UDQFK�FRQGLWLRQ�HYDOXDWLRQ
Na m e How? Adva nta ge s Dis a dva n t-a ge s
Cond itionCode (C C)
S pe cia l b itsa rem a n ipu la te d
CC s e t fo rfre e
Extra s ta te
Cond itionre g is te r
Te s t ge ne ra lpu rpos ere g is te r
s im p le Us e s upre g is te rs
Com pa rea ndbra nch
Com pa re ispa rt o fbra nch
One ins tr.Ins te a d o ftwo
Extra workpe r ins tr.
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
,QVWUXFWLRQ�IRUPDWV
� $ YDULDEOH LQVWUXFWLRQ IRUPDW \LHOGV FRPSDFW FRGHEXW LQVWUXFWLRQ GHFRGLQJ LV PRUH FRPSOH[
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6XPPDU\��&ODVVLILFDWLRQ RI�,6$V
� ,6$V DUH FODVVLILHG ZLWK UHVSHFW WR��5HJLVWHU PRGHO
�1XPEHU RI RSHUDQGV IRU HDFK LQVWUXFWLRQ
�$GGUHVVLQJ PRGHV SHUPLWWHG
�2SHUDWLRQV SURYLGHG LQ WKH LQVWUXFWLRQ VHW
�7\SH DQG VL]H RI RSHUDQGV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
,6$V�YHUVXV�FRPSLOHUV
Rules of thumb when designing an ISA:
� Regularity (operations, data types and addressing modes should be orthogonal)
� Provide primitives, not high-level constructs. Complex instructions are often too specialized.
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
7KH LPSDFW RI FRPSLOHU RSWLPL]DWLRQV
� Compiler optimizations affect the number of instructions as well as the distribution of executed instructions (the instruction mix)
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
$�JHQHULF�DUFKLWHFWXUH/RDG�VWRUH DUFKLWHFWXUH ��� ELWV�
� Many (32) gp integer registers (GPR) and single precision floating point registers (GPR0 = 0)
� Fixed instruction width and format
� Addressing modes: immediate and displacement
� Supported data types: bytes, half word (16 bits), word (32 bits), single and double precision IEEE floating points
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
*HQHULF�LQVWUXFWLRQV
Ins tructiontype
Example Meaning
Load LW R1,30(R2) Regs [R1] ←Mem[30+Regs[R2]]
Store SW 30(R2),R1 Mem[30+Regs[R2]] ←Regs[R1]
ALU ADD R1,R2,R3 Regs[R1] ← Regs[R2] +Regs[R3]
Control BEQZ R1,KALLE if (Regs [R1]==0) PC ← KALLE
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6SHFLI\LQJ�KDUGZDUH
� ���Q � 7UDQVIHU Q ELWV
� 5�Q � %LW Q RI UHJLVWHU 5�
� 5����� � 7KH PRVW VLJQLILFDQW E\WH RI 5� �%LJ HQGLDQ��
� �� � $ E\WH RI DOO ]HURHV �UHSHDW WKH ILHOG Q WLPHV�
� 0>��@ � E\WH �� LQ PHPRU\
� ����0>��@ � &RQFDWHQDWH ]HUR�E\WH �06%� ZLWK WKHE\WH #PHP����
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
*HQHULF�0RYH�,QVWUXFWLRQV
� /RDG DQG 6WRUH
�/%� /%8� 6% �� E\WH FKXQNV
�/+� /+8� 6+ �� KDOI ZRUG FKXQNV
�/:� 6: �� ZRUG FKXQNV
�/)� 6) �� ZRUG FKXQNV WR IORDWLQJ SRLQW UHJV
�/'� 6' GRXEOH SUHFLVLRQ WR )3 UHJV �� UHJVSHU 23�
� 5HJLVWHU WR 5HJLVWHU PRYHV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([DPSOHV�+DUGZDUH�'HVFULSWLRQV
/%8 5�� ���5�� �� ORDG E\WH XQVLJQHG
5� ����� ��� �� 0>���5�@
0000000000000000000000002 M(“41”+R3)24 8
R1:
0 31
SSSSSSSSSSSSSSSS2 M(“40”+ R3) M(“41”+R3)16 8 8
R1:0 31
LH R1, 40(R3) -- load halfword (signed)R1 <--32 (M[40+R3]0)16 ## M[40+R3] ## M[41+R3]
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
*HQHULF�$/8�,QVWUXFWLRQV
� ,QWHJHU DULWKPHWLF
� >DGG� VXE@ [ >VLJQHG� XQVLJQHG@ [>UHJLVWHU�LPPHGLDWH@
� H�J�� $''� $'',� $''8� $''8,� 68%� 68%,�68%8� 68%8,
� /RJLFDO
� >DQG� RU� [RU@ [ >UHJLVWHU� LPPHGLDWH@
� H�J�� $1'� $1',� 25� 25,� ;25� ;25,
� /RDG XSSHU KDOI LPPHGLDWH ORDG
� ,W WDNHV WZR LQVWUXFWLRQ WR ORDG D �� ELW LPPHGLDWH
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([DPSOHV +DUGZDUH 'HVFULSWLRQV ���
$'', 5�� 5�� �� �� DGG LPPHGLDWH
5� �����5� � ³�´
R2 + “6”32
R1:0 31
“42” 0000000000000000216 16
R1:0 31
LHI R1, #42 -- load high immediateR1 <--32 “42” ## 016
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
0RUH�*HQHULF�$/8�2SV
� 6KLIWV
� >OHIW� ULJKW@ [ >ORJLFDO� DULWKPHWLF@ [>LPPHGLDWH� UHJ@
�H�J�� 6//� 65$,� «
� 6HW FRQGLWLRQDO
� >OW� JW� OH� JH� HT� QH@ [ >LPPHGLDWH� UHJ@
�H�J�� 6/7� 6*(,� «
�3XWV D � RU D � LQ WKH GHVWLQDWLRQ UHJLVWHU
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
*HQHULF�,QVWUXFWLRQ�)RUPDWV
Opcode Func6 11
R-type0 31
Rs15
Rs25
Opcode Offset added to PC6 26
J-type0 31
Opcode Immediate6 16
I-type0 31
Rs5
Rd5
Rd5
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
*HQHULF�)3�,QVWUXFWLRQV
� )ORDWLQJ 3RLQW DULWKPHWLF� >DGG� VXE� PXOW� GLY@ [ >GRXEOH� VLQJOH@
�H�J�� $'''� $'')� 68%'� 68%'� «
� &RPSDUHV �VHWV ³FRPSDUH ELW´�� >OW� JW� OH� JH� HT� QH@ [ >GRXEOH�LPPHGLDWH@
�H�J�� /7'� *()� «
� &RQYHUW IURP�WR LQWHJHU� )SUHJV�&97)�,� &97)�'� &97,�'� «
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6LPSOH�&RQWURO
� %UDQFKHV LI HTXDO RU LI QRW HTXDO� %(4=� %1(=� FPS WR UHJLVWHU� 3& � 3&���LPPHGLDWH
��
� %)37� %)3)� FPS WR ³)3 FRPSDUH ELW´� 3& � 3&���LPPHGLDWH
��
� -XPSV� -� -XPS �� 3& � 3& � LPPHGLDWH
��
� -$/� -XPS $QG /LQN �� 5�� � 3&��� 3& � 3& �LPPHGLDWH
��
� -$/5� -XPS $QG /LQN 5HJLVWHU �� 5�� � 3&��� 3& � 3& �5HJ
� -5� -XPS 5HJLVWHU �� 3& � 3& � 5HJ �³UHWXUQ IURP -$/ RU-$/5�
� 75$3� JR WR 26 YLD D YHFWRUHG DGGUHVV �PRUH ODWHU�
� 57)� 5H7XUQ )URP XVHU FRGH �PRUH ODWHU�
28 November 2002
,PSOHPHQWLQJ�,6$V���SLSHOLQHV�
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
(;$03/(� SLSHOLQH LPSOHPHQWDWLRQ
$GG 5�� 5�� 5�
$
0HP
, 5 ; :
5HJV�
� �
23� �,IHWFK
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
/RDG 2SHUDWLRQ�
/' 5�� PHP>FQVW�5�@
$
0HP
, 5 ; :
5HJV�
,IHWFK
�
�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WRUH 2SHUDWLRQ�
67 PHP>FQVW�5�@� 5�
$
0HP
, 5 ; :
5HJV�
�
,IHWFK�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([DPSOH� ��VWDJH SLSHOLQH
IF ID EX M WB
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([DPSOH����VWDJH�SLSHOLQH
IF ID EX M WB
(d)
s1
s2
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([DPSOH����VWDJH�SLSHOLQH
IF ID EX M WB
(d)
s1
s2
st data
pc
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([DPSOH� ��VWDJH SLSHOLQH
IF ID EX M WB
(d)
s1
s2
st data
pc
dest dataearlyreg write
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
,QLWLDOO\
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
�
3&
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
�
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
�
3&
$
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
�
3&
$
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
$
%UDQFK Î 1H[W 3&
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
3LSHOLQH�&KDOOHQJHV
� %DODQFH WKH SLSHOLQH VWDJHV
� +DQGOH WKH IHHG�EDFN FDVHV
� 0LQLPL]H SLSHOLQH VWDOOV
� 3UHGLFW DQG SHUIRUP VSHFXODWLYH ZRUN
� 8QGR VSHFXODWLYH ZRUN
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)XQGDPHQWDO�OLPLWDWLRQV+D]DUGV SUHYHQW LQVWUXFWLRQV IURP H[HFXWLQJ LQ SDUDOOHO�
Structural hazards: Simultaneous use of same resourceIf unified I+D$: LW will conflict with later I-fetch
Data hazards: Data dependencies between instructionsLW R1, 100(R2) /* result avail in 2 - 100 cycles */ADD R5, R1, R7
Control hazards: Change in program flowBNEQ R1, #OFFSETADD R5, R2, R3
Serialization of the execution by stalling the pipelineis one, although inefficient, way to avoid hazards
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)XQGDPHQWDO W\SHV RI GDWD KD]DUGV
Code sequence Opi A
Opi+1A
5$: �5HDG�$IWHU�:ULWH�
2SL�� UHDGV $ EHIRUH 2SL PRGLILHV $� 2SL�� UHDGV ROG $�
WAR (Write-After-Read) Opi+1 modifies A before Opi reads A. Opi reads new A
WAW (Write-After-Write) Op i+1 modifies A before Opi. The value in A is the one written by Opi, i.e., an old A.
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
+D]DUG�DYRLGDQFH�WHFKQLTXHV�
6WDWLF WHFKQLTXHV �FRPSLOHU�� FRGHVFKHGXOLQJ WR DYRLG KD]DUGV
'\QDPLF WHFKQLTXHV� KDUGZDUH PHFKDQLVPVWR HOLPLQDWH RU UHGXFH LPSDFW RI KD]DUGV �H�J��RXW�RI�RUGHU VWXII�
+\EULG WHFKQLTXHV� UHO\ RQ FRPSLOHU DV ZHOO DVKDUGZDUH WHFKQLTXHV WR UHVROYH KD]DUGV �H�J�9/,: VXSSRUW ± ODWHU�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)L[��%\SDVV�KDUGZDUH
IF ID EX M WB
� )RUZDUGLQJ �RU E\SDVVLQJ��SURYLGH D GLUHFW SDWK IURP 0 DQG:% WR (;
� 2QO\ KHOSV IRU $/8 RSV� :KDWDERXW ORDG RSHUDWLRQV"
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
$�PRUH�RSWLPL]HG�'/;�SLSHOLQH���
IF ID EX M WB
(d)
s1
s2
st data
pc
dest data
Data$DTLB…L2$…Mem
Instr$ITLB…L2$…Mem
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
'DWD GHSHQGHQF\ //
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
&\FOH �
'&%$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
�
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)L[� FRGH VFKHGXOLQJ
'
&%
$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �6ZDS��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
%UDQFK GHOD\V
'
&
%
$
0HP
, 5 ; :
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
´6WDOO´
$
� F\FOHV SHU LWHUDWLRQ RI � LQVWUXFWLRQV /1HHG ORQJHU EDVLF EORFNV ZLWK LQGHSHQGHQW LQVWU�
%UDQFK Î 1H[W 3&
´6WDOO´
´6WDOO´
´6WDOO´
1H[W 3&
3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
$YRLGLQJ�FRQWURO�KD]DUGV
IF ID EX M WB
Branch conditionand target addr. needed here
Branch conditionand target addr. available here
'XSOLFDWH UHVRXUFHV LQ $/8 WR FRPSXWH EUDQFK FRQGLWLRQDQG EUDQFK WDUJHW DGGUHVV HDUOLHU
%UDQFK GHOD\ FDQQRW EH FRPSOHWHO\ HOLPLQDWHG
%UDQFK SUHGLFWLRQ DQG FRGH VFKHGXOLQJ FDQUHGXFH WKH EUDQFK SHQDOW\
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
7DNLQJ�D�%UDQFK
IF ID EX M WB
(d)
s1
s2
st data
pc
dest data
PC := PC + Imm
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)L[�� 0LQLPL]LQJ %UDQFK 'HOD\ (IIHFWV
IF ID EX M WB
(d)
s1
s2
st data
pc
dest data
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)L[���6WDWLF�WULFNV
Predict Branch not taken (a fairly rare case)zExecute successor instructions in sequence
z“Squash” instructions in pipeline if the branch is actually taken
zWorks well if state is updated late in the pipeline
z30%-38% of conditional branches are not taken on average
Predict Branch taken (a fairly common case)z62%-70% of conditional branches are taken on average
zDoes not make sense for the generic arch. but may do for other pipeline organizations
Delayed branch (schedule useful instr. in delay slot)zDefine branch to take place aftera following instruction
zCONS: this is visible to SW, i.e., forces compatibility between generations
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6WDWLF�VFKHGXOLQJ�WR�DYRLG�VWDOOV
� 6FKHGXOLQJ DQ LQVWUXFWLRQ IURP EHIRUH LV DOZD\V VDIH
� 6FKHGXOLQJ IURP WDUJHW RU IURP WKH QRW�WDNHQ SDWK LV QRW DOZD\VVDIH� PXVW EH JXDUDQWHHG WKDW VSHFXODWLYH LQVWU� GR QR KDUP�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
(YDOXDWLQJ EUDQFK KD]DUG DYRLGDQFH WHFKQLTXHV
Pipeline speedup = #stages1 + Branch frequency x Branch penalty
S c h e d u lin gs c h e m e
B ra n c h p e n a lt y f o rin t e g . P g m
C P I S p e e d u p v s.U n p ip e l in e d
S t a l l p ip e l in e 1 1 . 1 7 4 . 3
P re d ic t t a k e n 1 1 . 1 7 4 . 3
P re d ic t n o tt a k e n
0 . 6 9 1 . 1 2 4 . 5
D e la y e d b ra n c h 0 . 2 1 1 . 0 4 4 . 8
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
),;�� 3UHGLFW QH[W 3&
'&%$
0HP
5 ;:
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
$
%UDQFK Î 1H[W 3&
EXEEOH
EXEEOH
EXEEOH
,
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
%UDQFK 3UHGLFWRU %DVHG RQ +LVWRU\
'&%$
0HP
5 ;:
5HJV
/' 5HJ$� ���� � 5HJ&�
,) 5HJ& � ��� *272 $
5HJ% � 5HJ$ � �
5HJ& � 5HJ& � �
3&
$
%UDQFK Î 1H[W 3&
3&
,3&
$GGUHVV 7DJ
1H[W3&
1H[W )HZ ,QVWUXFWLRQ
%UDQFK7DUJHW
%XIIHU �L�H�� &DFKH�
*XHVV WKH QH[W3& KHUH��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
'\QDPLF�EUDQFK�SUHGLFWLRQ
%UDQFKHV OLPLW SHUIRUPDQFH EHFDXVH�
� %UDQFK SHQDOWLHV DUH KLJK
� 3UHYHQW D ORW RI ,/3 IURP EHLQJ H[SORLWHG
Solution: Dynamic branch predictionto predict the outcome of conditional branches.
Benefits:PReduce time to determine branch conditionPReduce time to calculate the branch target address
28 November 2002
0RUH�RQ�'\QDPLF�%UDQFK�3UHGLFWLRQ
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
6ZHGHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
%UDQFK�KLVWRU\�WDEOH$ VLPSOH EUDQFK SUHGLFWLRQ VFKHPH
IF ID EX MEM WB
PC
1
0
0
1
0
1
0
branches predicted as taken
� 7KH EUDQFK�SUHGLFWLRQ EXIIHU LV LQGH[HG E\ ELWV IURPEUDQFK�LQVWUXFWLRQ 3& YDOXHV
� ,I SUHGLFWLRQ LV ZURQJ� WKHQ LQYHUW SUHGLFWLRQProblem: can cause two mispredictions in a row
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
$�WZR�ELW�SUHGLFWLRQ�VFKHPH
Predict Not taken
Predict taken Predict taken
Predict Not taken
Not taken
Taken
Not taken
Taken
Taken Not taken
Taken
Not taken
� 5HTXLUHV SUHGLFWLRQ WR PLVV WZLFH LQ RUGHU WR FKDQJHSUHGLFWLRQ ! EHWWHU SHUIRUPDQFH
� 3HUIRUPDQFH�� ������ PLVV SUHGLFWLRQ IUHTXHQF\ IRU 63(&��
� ,QWHJHU SURJUDPV KDYH KLJKHU PLVV IUHTXHQF\ WKDQ )3SURJUDPV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
'\QDPLF 6FKHGXOLQJ 3DVW %UDQFKHV
/'
$''
68%
67
! �"
/'
$''
68%
67
!�"
/'
$''
68%
67
!�"
/'
$''
68%
67
�"
<
<
<
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
1�OHYHO�KLVWRU\
� 1RW RQO\ WKH 3& RI WKH %5 LQVWUXFWLRQ PDWWHUV�DOVR KRZ \RX¶YH JRW WKHUH LV LPSRUWDQW
� $SSURDFK�� 5HFRUG WKH RXWFRPH RI WKH ODVW 1 EUDQFKHV LQ DYHFWRU RI 1 ELWV
� ,QFOXGH WKH ELWV LQ WKH LQGH[LQJ RI WKH EUDQFK WDEOH
� 3URV�&RQV� 6DPH %5 LQVWUXFWLRQ PD\ KDYHPXOWLSOH HQWULHV LQ WKH EUDQFK WDEOH
�1�0� SUHGLFWLRQ 1 OHYHOV RI 0�ELW SUHGLFWLRQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
7RXUQDPHQW�SUHGLFWLRQ�
� ,VVXHV�� 1R RQH SUHGLFWRU VXLWV DOO DSSOLFDWLRQV
� $SSURDFK�� ,PSOHPHQW VHYHUDO SUHGLFWRUV DQG G\QDPLFDOO\VHOHFW WKH PRVW DSSURSULDWH RQH
� 3HUIRUPDQFH H[DPSOH 63(&���� ��ELW SUHGLFWLRQ� �� PLVV SUHGLFWLRQ
� ����� ��OHYHO� ��ELW� �� PLVV SUHGLFWLRQ
� 7RXUQDPHQWV� �� PLVV SUHGLFWLRQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
5HWXUQ�DGGUHVV�VWDFN
� 3RSXODU VXEURXWLQHV DUH FDOOHG IURPPDQ\ SODFHV LQ WKH FRGH�
� %UDQFK SUHGLFWLRQ PD\ EH FRQIXVHG��
� 0D\ KXUW RWKHU SUHGLFWLRQV
� 1HZ DSSURDFK�
�3XVK WKH UHWXUQ DGGUHVV RQ D >VPDOO@ VWDFNDW WKH WLPH RI WKH FDOO
�3RS DGGUHVVHV RQ UHWXUQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
%UDQFK�WDUJHW�EXIIHU
P Predicts branch target address in the IF stage
P Can be combined with 2-bit branch prediction
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
3XWWLQJ�LW�WRJHWKHU
� %7% VWRUHV LQIR DERXW WDNHQ LQVWUXFWLRQV
� &RPELQHG ZLWK D VHSDUDWH EUDQFK KLVWRU\WDEOH
� ,QVWUXFWLRQ IHWFK VWDJH KLJKO\ LQWHJUDWHG IRUEUDQFK RSWLPL]DWLRQV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
)ROGLQJ�EUDQFKHV
� %7% RIWHQ FRQWDLQV WKH QH[W IHZLQVWUXFWLRQV DW WKH GHVWLQDWLRQ DGGUHVV
� 8QFRQGLWLRQDO EUDQFKHV �DQG VRPH FRQGDV ZHOO� EUDQFKHV H[HFXWH LQ ]HUR F\FOHV
�([HFXWH WKH GHVW LQVWUXFWLRQ LQVWHDG RI WKHEUDQFK �LI WKHUH LV D KLW LQ WKH %7% DW WKH ,) VWDJH�
� ´%UDQFK IROGLQJ´
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
%UDQFK�SUHGLFWLRQ�SHQDOWLHVIRU D %UDQFK 7DUJHW %XIIHU VFKHPH
Ins tructionin buffe r
Pred ic tion Actualbranch
Penaltycyc les
Yes Taken Taken 0
Yes Taken Not taken 2
Yes Not taken Not taken 0
Yes Not taken Taken 2
No Taken 2
No Not taken 1
,WDQLXP� UDQJLQJ IURP � WR � F\FOHV«
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
([FHSWLRQ�KDQGOLQJ�LQ�SLSHOLQHV
0XVW UHVWDUW DQ LQVWUXFWLRQ WKDW FDXVHV DQ H[FHSWLRQ�LQWHUUXSW� WUDS� IDXOW� ³SUHFLVH LQWHUUXSWV´
�«DV ZHOO DV DOO LQVWUXFWLRQV IROORZLQJ LW��
A solution:
1. Force a trap instruction into the pipeline
2. Turn off all writes for the faulting instruction
3. Save the PC for the faulting instruction- to be used in return from exception
- may need to save multiple PC values
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
*XDUDQWHHLQJ WKH H[HFXWLRQ RUGHU
Exceptions may be generated in another order than the instruction execution order
Pipeline stage Problem causing exception
IF Page fault on instruction fetch;misaligned memory access; memoryprotection violation
ID Undefined or illegal opcode
EX Arithmetic exception
MEM Page fault on data access;misaligned memory access; memoryprotection violation
WB none
([DPSOH VHTXHQFH�
OZ �H�J�� SDJH IDXOW LQ 0(0�
DGG �H�J�� SDJH IDXOW LQ ,)�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK��
'$5.�
����
6XPPDU\�3LSHOLQLQJ
3LSHOLQLQJ�
�6SHHGV XS WKURXJKSXW� QRW ODWHQF\
�6SHHGXS ≤ �VWDJHV
�+D]DUGV DUH IXQGDPHQWDO OLPLWV�
� 6WUXFWXUDO� QHHG PRUH +:
� 'DWD �5$:�:$5�:$:�� QHHG IRUZDUGLQJ DQGFRPSLOHU VFKHGXOLQJ
� &RQWURO� GHOD\HG EUDQFK
�3UHFLVH H[FHSWLRQV D PDMRU KHDGDFKH
2YHUODSSLQJ�([HFXWLRQ
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
6ZHGHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
0XOWLF\FOH�RSHUDWLRQV�LQ�WKH�SLSHOLQH��IORDWLQJ�SRLQW�
� ,QWHJHU XQLW� +DQGOHV LQWHJHU LQVWUXFWLRQV� EUDQFKHV� DQG ORDGV�VWRUHV
� 2WKHU XQLWV� 0D\ WDNH VHYHUDO F\FOHV HDFK� 6RPH XQLWV DUH SLSHOLQHG
�PXOW�DGG� RWKHUV DUH QRW �GLY�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
3DUDOOHOLVP EHWZHHQ LQWHJHU DQG )3 LQVWUXFWLRQV
MU LTD F 2 ,F 4 ,F 6 IF ID M 1 M2 M3 M4 M5 M6 M7 ME M W B
AD D D F 8 ,F 1 0 ,F 1 2 IF ID A 1 A2 A3 A 4 ME M W B
S U B I R 2 ,R 3 ,# 8 IF ID E X ME M W B
LD F 1 4 ,0 (R 2 ) IF ID E X ME M W B
+RZ WR DYRLG VWUXFWXUDO DQG 5$: KD]DUGV�
6WDOO LQ ,' VWDJH ZKHQ� 7KH IXQFWLRQDO XQLW FDQ EH RFFXSLHG
� 0DQ\ LQVWUXFWLRQV FDQ UHDFK WKH :% VWDJH DW WKH VDPHWLPH
5$: KD]DUGV�� 1RUPDO E\SDVVLQJ IURP 0(0 DQG :% VWDJHV
� 6WDOO LQ ,' VWDJH LI DQ\ RI WKH VRXUFH RSHUDQGV LV DGHVWLQDWLRQ RSHUDQG RI DQ LQVWUXFWLRQ LQ DQ\ RI WKH )3
IXQFWLRQDO XQLWV'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
:$5�DQG�:$:�KD]DUGV�IRU�PXOWLF\FOH�RSHUDWLRQV
�:$5 KD]DUGV DUH D QRQ�LVVXH EHFDXVHRSHUDQGV DUH UHDG LQ SURJUDP RUGHU
Example of a WAW hazard:DIVF F0,F2,F4 FP divide 24 cycles...SUBF F0,F8,F10 FP sub 3 cycles
SUB finishes before DIV ; out-of-order completion
WAW hazards are avoided by:zstalling the SUBF until DIVF reaches the MEM stage, or
zdisabling the write to register F0 for the DIVF instruction
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
)3�([FHSWLRQV
([DPSOH� ',9) )��)��)� �� F\FOHV
$'') )���)���)� � F\FOHV
68%) )���)���)�� � F\FOHV
SUBF may generate a trap before DIVF has completed!!
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'\QDPLF�,QVWUXFWLRQ 6FKHGXOLQJ�ZLOO RQO\ EH WRXFKHG EULHIO\ LQ WKLV FRXUVH�
.H\ LGHD� DOORZ VXEVHTXHQW LQGHSHQGHQW LQVWUXFWLRQV WR SURFHHG
',9' )��)��)� � WDNHV ORQJ WLPH
$''' )���)��)� � VWDOOV ZDLWLQJ IRU )�
68%' )���)��)�� � /HW WKLV LQVWU� E\SDVV WKH $'''
� (QDEOHV RXW�RI�RUGHU H[HFXWLRQ � RXW�RI�RUGHU FRPSOHWLRQ�
IF ID M WBEX
Two historical schemes used in “recent” machines:
Tomasuloin IBM 360/91 in 1967 (also in Power-2)
Scoreboarddates back to CDC 6600 in 1963
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�6FRUHERDUG�3LSHOLQH
IF
Rea
d op
eran
ds
WriteIssue
MemID stage
� ,VVXH� 'HFRGH DQG FKHFN IRU VWUXFWXUDO KD]DUGV
� 5HDG RSHUDQGV� ZDLW XQWLO QR GDWD KD]DUG� WKHQ UHDG RSHUDQGV�5$:�
� $OO GDWD KD]DUGV DUH KDQGOHG E\ WKH VFRUHERDUG PHFKDQLVP
:U5HJ
6FRUERDUG
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
5G5HJ
,)
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
([WHQGHG�6FRUHERDUG�
,VVXH� ,QVWUXFWLRQ LV LVVXHG ZKHQ�
1R VWUXFWXUDO KD]DUG IRU D IXQFWLRQDO XQLW
1R :$: ZLWK DQ LQVWUXFWLRQ LQ H[HFXWLRQ
5HDG� ,QVWUXFWLRQ UHDGV RSHUDQGV ZKHQ
WKH\ EHFRPH DYDLODEOH �5$:�
(;� 1RUPDO H[HFXWLRQ
:ULWH� ,QVWUXFWLRQ ZULWHV ZKHQ DOO SUHYLRXV LQVWUXFWLRQV
KDYH UHDG RU ZULWWHQ WKLV RSHUDQG �:$:� :$5�
The scoreboard is updated when an instruction proceedsto a new stage
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
/LPLWDWLRQV�ZLWK�VFRUHERDUGV�
7KH VFRUHERDUG WHFKQLTXH LV OLPLWHG E\�
� 1XPEHU RI VFRUHERDUG HQWULHV �ZLQGRZ VL]H�
� 1XPEHU DQG W\SHV RI IXQFWLRQDO XQLWV
� 1XPEHU RI SRUWV WR WKH UHJLVWHU EDQN
� +D]DUGV FDXVHG E\ QDPH GHSHQGHQFLHV
Tomasulo’s algorithm addresses the last two limitations
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
$QRWKHU�H[DPSOHDIV F0,F2,F4
ADDD F6, F0, F8
SUBD F8,F10,F14
MULD F6,F10, F8
5$:
:$5
5$:
:$:
�GHOD\HG D ORQJ WLPH
�GHOD\HG D IHZ F\FOHV
�H[HFXWHG ULJKW DZD\
:$5 DQG :$: DYRLGHG WKURXJK ´UHJLVWHU UHQDPLQJ´
DIV F0,F2,F4
ADDD F6, F0, F8
SUBD tmp1 ,F10,F14 ;can be executed right away
MULD tmp2 ,F10, tmp1 ;delayed a few cycles
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7RPDVXOR¶V�$OJRULWKP�WRXFKHG�EULHIO\ LQ�WKLV�FRXUVH�
� ,%0 ������ PLG ��¶V
� +LJK SHUIRUPDQFH ZLWKRXW FRPSLOHU VXSSRUW
� ([WHQGHG IRU PRGHUQ DUFKLWHFWXUHV
� 0DQ\ LPSOHPHQWDWLRQV �3RZHU3&� 3HQWLXP«�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D����E����F����G����H��
&RPPRQ
'DWD %XV �&'%�
2S�GLY
'�)�
6��)�
6��)�
��
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
2S�GLY
'�)�
6��F
6��I
��
2S�GLY
'�)�
6��F
6��I
��
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6, F0, F8#5 SUBD F8,F10,F14#6 MULD F6,F10, F8
'�)�
9�F�I
��
)�
'�)�
9�F�I
��
��F�I
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
2S
'
6�
6�
�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6, F0, F8#5 SUBD F8,F10,F14#6 MULD F6,F10, F8
'
DQVZ
�
'
'
DQVZ
�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6,F0,F8#5 SUBD F8,F10,F14#6 MULD F6,F10,F8
�
'
2S
'
6�
6�
�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
'
DQVZ
'
DQVZ
�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6,F0,F8#5 SUBD F8,F10,F14#6 MULD F6,F10,F8
�
'
��
2S
'
6�
6�
�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
'
DQVZ
'
DQVZ
�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6,F0,F8#5 SUBD F8,F10,F14#6 MULD F6,F10,F8
�
'
��
2S
'
6�
6�
�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
'
DQVZ
'
DQVZ
�
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6,F0,F8#5 SUBD F8,F10,F14#6 MULD F6,F10,F8
�
'
��
2S
'
6�
6�
�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
'
DQVZ
'
DQVZ
�
��
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6,F0,F8#5 SUBD F8,F10,F14#6 MULD F6,F10,F8
�
'
��
2S
'
6�
6�
�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
'
DQVZ
'
DQVZ
�
��
E�F
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6LPSOH�7RPDVXOR¶V�$OJRULWKP
IF
Rea
d op
eran
ds
Issue
Mem
,VVXH
,QW
0HP
)3
$GG
)3
0XO�
)3
0XO�
)3
'LY
0HP
,)
5HV�
6WDWLRQ
��D
��
��E
��
��F
��
��G
��
��H
��
&RPPRQ
'DWD %XV �&'%�
� � � � � � � � �
5H2UGHU
%XIIHU �52%�
:ULWH
6WDJH
5HJ� :ULWH
3DWK
#3 DIV F0,F2,F4#4 ADDD F6,F0,F8#5 SUBD F8,F10,F14#6 MULD F6,F10,F8
�
'
��
2S
'
6�
6�
�
2S
'�)�
6��Y
6��Y
�
2S
'
6��Y�SWU
6��Y�SWU
�
'
DQVZ
'
DQVZ
�
��
E�F
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6XPPLQJ�XS�7RPDVXOR¶V
� 2XW�RI�RUGHU �2�2�2� H[HFXWLRQ
� ,Q RUGHU FRPPLW� $OORZV IRU VSHFXODWLYH H[HFXWLRQ �EH\RQG EUDQFKHV�
� $OORZV IRU SUHFLVH H[FHSWLRQV
� 'LVWULEXWHG LPSOHPHQWDWLRQ� 5HVHUYDWLRQ VWDWLRQV ± ZDLW IRU 5$: UHVROXWLRQ
� 5HRUGHU %XIIHU �52%�
� &RPPRQ 'DWD %XV ´VQRRSV´ �&'%�
� ´5HJLVWHU UHQDPLQJ´ DYRLGV :$:� :$5
� &RVWO\ WR LPSOHPHQW �FRPSOH[LW\ DQG SRZHU�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'\QDPLF 6FKHGXOLQJ 3DVW %UDQFKHV
/'
$''
68%
67
! �"
/'
$''
68%
67
!�"
/'
$''
68%
67
��"
/'
$''
68%
67
�"
<
<
<
´3UHGLFW WDNHQ´
´3UHGLFW WDNHQ´
´3UHGLFW WDNHQ´
6FKHGXOH VSHFXODWLYHLQVWUXFWLRQV SDVW EUDQFKHV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'\QDPLF 6FKHGXOLQJ 3DVW %UDQFKHV
/'
$''
68%
67
! �"
/'
$''
68%
67
!�"
/'
$''
68%
67
��"
/'
$''
68%
67
�"
<
<
<
:URQJ 3UHGLFWLRQ���
'R QRWFRPPLW�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
5HYLVLWLQJ�([FHSWLRQV�
$ SLSHOLQH LPSOHPHQWV SUHFLVH LQWHUUXSWV LII�
$OO LQVWUXFWLRQV EHIRUH WKH IDXOWLQJ LQVWUXFWLRQ FDQFRPSOHWH
$OO LQVWUXFWLRQV DIWHU �DQG LQFOXGLQJ� WKHIDXOWLQJ LQVWUXFWLRQ PXVW QRW FKDQJH WKHV\VWHP VWDWH DQG PXVW EH UHVWDUWDEOH
6WDWLF�6FKHGXOLQJRI�,QVWUXFWLRQV�
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
6ZHGHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
$UFKLWHFWXUDO�DVVXPSWLRQV
)URP 7R /DWHQF\
)3 $/8 )3 $/8 �
)3 $/8 6' �
/' )3 $/8 �
/DWHQF\ QXPEHU RI F\FOHV EHWZHHQ WKHWZR LQVWUXFWLRQV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6FKHGXOLQJ�H[DPSOHfor (i=1; i<=1000; i=i+1)
x[i] = x[i] + 10;
Iterations are independent => parallel execution
ORRS� /' )�� ��5�� � )� DUUD\ HOHPHQW
$''' )�� )�� )� � $GG VFDODU FRQVWDQW
6' ��5��� )� � 6DYH UHVXOW
68%, 5�� 5�� �� � GHFUHPHQW DUUD\ SWU�
%1(= 5�� ORRS � UHLWHUDWH LI 5� � �
Can we eliminate all penalties in each iteration?
How about moving SD down?
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6FKHGXOLQJ�LQ�HDFK�ORRS�LWHUDWLRQ
ORRS� /' )�� ��5��
VWDOO
$''' )�� )�� )�
VWDOO
VWDOO
6' ��5��� )�
68%, 5�� 5�� ��
%1(= 5�� ORRS
VWDOO
Original loop
Can we do better by scheduling across iterations?
5 instruction + 4 bubbles = 9 cycles / iteration(~one cycle per iteration on a vector architecture)
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
9HFWRU�DUFKLWHFWXUHV��D IRRWQRWH�
&5$<� 1(&� )XMLWVX� «
� � YHFWRU UHJLVWHU FRQWDLQV �� YHFWRU HQWULHV HDFK
� $ VLQJOH /'�67 LQVWU ORDGV�VWRUHV HQWLUH YHFWRUV
� $ VLQJOH $/8 LQVWU 9� Í 9� RS 9�
� �� ELW PDVN YHFWRUV PDNH H[HFXWLRQ FRQGLWLRQDO
� 2YHUODSV 0HP DQG $/8 RSV
� 2QH IRUP RI ´6,0'´ �� 6LQJOH ,QVWUXFWLRQ 0XOWLSOH 'DWD
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6FKHGXOLQJ�LQ�HDFK�ORRS�LWHUDWLRQ
ORRS� /' )�� ��5��
VWDOO
$''' )�� )�� )�
VWDOO
VWDOO
6' ��5��� )�
68%, 5�� 5�� ��
%1(= 5�� ORRS
VWDOO
Original loop
5 instruction + 4 bubbles = 9c / iteration
ORRS� /' )�� ��5��
VWDOO
$''' )�� )�� )�
68%, 5�� 5�� ��
%1(= 5�� ORRS
6' ��5��� )�
Statically scheduled loop
Can we do even better by scheduling across iterations?
5 instruction + 1 bubble = 6c / iteration
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
8QRSWLPL]HG�ORRS�XQUROOLQJ��[ORRS� /' )�� ��5��
VWDOO
$''' )�� )�� )�
VWDOO � GURS 68%, %1(=
VWDOO
6' ��5��� )�
/' )�� ���5��
VWDOO
$''' )�� )�� )�
VWDOO � GURS 68%, %1(=
VWDOO
6' ���5��� )�
/' )��� ����5��
VWDOO
$''' )��� )��� )�
VWDOO � GURS 68%, %1(=
VWDOO
6' ����5��� )��
/' )��� ����5��
VWDOO
$''' )��� )��� )�
68%, 5�� 5�� ��� � DOWHU WR � �
%1(= 5�� ORRS
6' ����5��� )�� z 24c/ 4 iterations = 6 c / iteration
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
2SWLPL]HG VFKHGXOHG XQUROOHG ORRS
ORRS� /' )�� ��5��
/' )�� ���5��
/' )��� ����5��
/' )��� ����5��
$''' )�� )�� )�
$''' )�� )�� )�
$''' )��� )��� )�
$''' )��� )��� )�
6' ��5��� )�
6' ���5��� )�
6' ����5��� )��
68%, 5�� 5�� ���
%1(= 5�� ORRS
6' ��5��� )��
Important steps:Push loads up
Push stores down
Note: the displacement of the last store must be changed
All penalties are eliminated. CPI=1
14 cycles / 4 iterations ==> 3.5 cycles / iteration
From 9c to 3.5c per iteration ==> speedup 2.6
Benefits of loop unrolling:Provides a larger seq. instr. window (larger basic block)
Simplifies for static and dynamic methods to extract ILP
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
/'
$''
67
68%
%1(4
Iteration 0
Iteration 1
Iteration 2Iteration 3
Iteration 4
6RIWZDUH�SLSHOLQLQJ�����6\PEROLF ORRS XQUROOLQJ
� 7KH LQVWUXFWLRQV LQ D ORRS DUH WDNHQ IURP GLIIHUHQWLWHUDWLRQV LQ WKH RULJLQDO ORRS
SoftwarePipelinedIteration 1 BNEQ SUB ST ADD LD
BNEQ SUB ST ADD LD SoftwarePipelined
Iteration 2
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
� ([DPSOH�ORRS� /' )����5��
$''' )��)��)�
6' ��5���)�
68%, 5��5����
%1(= 5��ORRS
Look at three iterations of the loop body:LD F0,0(R1) ; Iteration iADDD F4,F0,F2SD 0(R1),F4
LD F0,0(R1) ; Iteration i+1ADDD F4,F0,F2SD 0(R1),F4LD F0,0(R1) ; Iteration i+2ADDD F4,F0,F2
SD 0(R1),F4
l
6RIWZDUH�SLSHOLQLQJ�����
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
,QVWUXFWLRQV IURP WKUHH FRQVHFXWLYH LWHUDWLRQV IRUP WKH ORRSERG\�
� SURORJXH FRGH !ORRS� 6' ��5���)� � IURP LWHUDWLRQ L
$''' )��)��)� � IURP LWHUDWLRQ L��
/' )������5�� � IURP LWHUDWLRQ L��
68%, 5��5����
%1(= 5��ORRS
� SURORJXH FRGH !
6RIWZDUH�SLSHOLQLQJ�����
z No data dependencies within a loop iteration
z The dependence distance is 1 iterations
z WAR hazard elimination is needed (register renaming)
z 5c / iteration, but only uses 2 FP regs (instead og 8)
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6RIWZDUH�SLSHOLQLQJ
� ´6\PEROLF /RRS 8QUROOLQJ´
� 9HU\ WULFN\ IRU FRPSOLFDWHG ORRSV
� /HVV FRGH H[SDQVLRQ WKDQ RXWOLQLQJ
� 5HJLVWHU�OHDQ LI ´URWDWLQJ´ LV XVHG
� 1HHGHG WR KLGH ODUJH ODWHQFLHV �VHH ,$����
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'HWHFWLQJ�GDWD�GHSHQGHQFLHV
� )LQGLQJ GHSHQGHQFLHV LV IXQGDPHQWDO WR� SHUIRUP LQVWUXFWLRQ VFKHGXOLQJ�
� GHWHUPLQH WKH GHJUHH RI SDUDOOHOLVP LQ ORRSV� DQG
� HOLPLQDWH QDPH GHSHQGHQFLHV
for (i = 1; i <= 100; i = i+1) {A[i] = B[i] + C[i];
D[i] = A[i] + E[i];}
The absence of loop-carried dependencies increases the amount of exploitableparallelism
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
A loop iteration is often dependent on results calculated in an earlier iteration.
Example:for (i = 6; i <= 100; i = i+1)
Y[i] = Y[i-5] + Y[i];
� 7KLV ORRS KDV D GHSHQGHQFH GLVWDQFH RI � DQG ZHFDQ H[WUDFW ,/3 LQ � FRQVHFXWLYH LWHUDWLRQVWhat dependencies can the compiler detect?
/RRS�FDUULHG�GHSHQGHQFLHV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7KH�*&'�WHVW
� $ VLPSOH WHVW WR GHFLGH ZKHWKHUWKHUH LV DQ\ GHSHQGHQFLHV EHWZHHQORRS LWHUDWLRQV�� ,I ORRS FDUULHG GHSHQGHQFLHV H[LVW WKHQ�*&'�F�D� GLYLGHV �G � E�
Example:for (i = 1; i <= 100; i = i + 1)
X[2 * i + 3] = X[2 * i] * 5.0;
We have a=2, b=3, c=2 and d=0; GCD(2,2) = 2 which does not divide (0-3) = -3 =>
z There are no loop carried dependencies in this loop
28 November 2002
,VVXLQJ�PRUH�WKDQ�RQH�LQVWUXFWLRQ�SHU�F\FOH
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
6ZHGHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
0XOWLSOH LQVWUXFWLRQ LVVXH SHU FORFN
*RDO� ([WUDFWLQJ ,/3 VR WKDW &3, � � � L�H�� ,3& ! �
Superscalar:z Combine static and dynamic scheduling to issue multiple instructions per clock
z HW finds independent instructions in “sequential” code
zPredominant: (PowerPC, SPARC, Alpha, HP-PA)
Very Long Instruction Words (VLIW):z Static scheduling used to form packages of independent instructions that can be issued together
zRelies on compiler to find independent instructions (IA-64)
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6XSHUVFDODUV
0HP
, 5
5HJV
% 00:
, 5 % 00:
, 5 % 00:
, 5 % 00:
,
,
,
,
,VVXHORJLF
�
¼
6(.
���F\FOHV
�� F\FOHV
�� F\FOHV
� F\FOHV
�*%
�0%
��N%
�N%
7KUHDG �
«3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
([DPSOH��$�6XSHUVFDODU�'/;� ,VVXH � LQVWUXFWLRQV VLPXOWDQHRXVO\� � )3 � LQWHJHU
� )HWFK ���ELWV�FORFN F\FOH� ,QWHJHU LQVWU� RQ OHIW� )3 RQULJKW
� &DQ RQO\ LVVXH �QG LQVWUXFWLRQ LI �VW LQVWUXFWLRQ LVVXHV
� 1HHG PRUH SRUWV WR WKH UHJLVWHU ILOH
7\SH 3LSH VWDJHV
,QW� ,) ,' (; 0(0 :%
)3 ,) ,' (; 0(0 :%
,QW� ,) ,' (; 0(0 :%
)3 ,) ,' (; 0(0 :%
,QW� ,) ,' (; 0(0 :%
)3 ,) ,' (; 0(0 :%
zEX stage should be fully pipelined
z1 load delay slot corresponds to three instructions!
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6WDWLFDOO\ 6FKHGXOHG 6XSHUVFDODU '/;
Issue: Difficult to find a sufficient number of instr. to issue
&DQ EH VFKHGXOHG G\QDPLFDOO\ ZLWK 7RPDVXOR¶V DOJ�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'HSHQGHQFLHV� 5HYLVLWHG
7ZR LQVWUXFWLRQV PXVW EH LQGHSHQGHQW LQ RUGHU WRH[HFXWH LQ SDUDOOHO
Three classes of dependencies that limit parallelism:
zData dependencies
zName dependencies
zControl dependencies
zDependencies are properties of the program
zCan lead to hazardswhich are properties of the implementation
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
/LPLWV�WR�VXSHUVFDODU�H[HFXWLRQ
� 'LIILFXOWLHV LQ VFKHGXOLQJ ZLWKLQ WKH FRQVWUDLQWV RQ QXPEHU RIIXQFWLRQDO XQLWV DQG WKH ,/3 LQ WKH FRGH FKXQN
z Instruction decode complexity increaseswith the number of issued instructions
z Data and control dependencies are in general more costly in a superscalar processor than in a single-issue processor
Simple superscalar relying on compiler instead of HW complexity Î VLIW
Techniques to enlarge the instruction window to extract more ILPare important
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
9/,:� 9HU\ /RQJ ,QVWUXFWLRQ :RUG
0HP
, 5
5HJV
% 00:
, 5 % 00:
, 5 % 00:
, 5 % 00:
,
,
,
,
�
¼
6(.
�*%
�0%
��N%
�N%
«
3&
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
9HU\ /RQJ ,QVWUXFWLRQ :RUG �9/,:�
z Independent funtional units with no hazard detection
&RPSLOHU LV UHVSRQVLEOH IRU LQVWUXFWLRQ VFKHGXOLQJ
Me m re f 1 Me m re f 2 FP o p 1 FP o p 2 In t o p / b ran c h C lo c k
LD F 0 ,0 (R 1) LD F 6 ,-8 (R 1 ) 1
LD F 1 0 ,-1 6 (R 1 ) LD F 1 4 ,-2 4 (R 1 ) 2
LD F 1 8 ,-3 2 (R 1 ) LD F 2 2 ,-4 0 (R 1 ) ADD D F 4 ,F 0 ,F 2 ADD D F 8 ,F 6 ,F 2 3
LD F 2 6 ,-4 8 (R 1 ) ADD D F 1 2 ,F 1 0 ,F 2 ADD D F 1 6 ,F 1 4 ,F 2 4
ADD D F 2 0 ,F 1 8 ,F 2 ADD D F 2 4 ,F 2 2 ,F 2 5
S D 0 (R 1 ), F 4 S D -8 (R 1 ), F 8 ADD D F 2 8 ,F 2 6 ,F 2 6
S D -1 6 (R 1 ), F 1 2 S D -2 4 (R 1 ), F 8 7
S D -3 2 (R 1 ),F 2 0 S D -4 0 (R 1 ),F 2 4 S UBI R 1 ,R 1 ,#4 8 8
S D 0 (R 1 ),F 2 8 BNE Z R 1 ,LO O P 9
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
/LPLWV�WR�9/,:
'LIILFXOW WR H[SORLW SDUDOOHOLVP
� 1 IXQFWLRQDO XQLWV DQG . ´GHSHQGHQW´ SLSHOLQH VWDJHVLPSOLHV 1[ . LQGHSHQGHQW LQVWUXFWLRQV WR DYRLG VWDOOV
0HPRU\ DQG UHJLVWHU EDQGZLGWK
z &RPSOH[LW\ LQFUHDVHV ZLWK QXPEHU RI IXQFWLRQDO XQLWV
z &RGH VL]H
1R ELQDU\ FRGH FRPSDWLELOLW\
%XW� «� VLPSOHU KDUGZDUH
� VKRUW VFKHGXOH
� KLJK IUHTXHQ]\
+:�VXSSRUW�IRU�VSHFXODWLRQ�DQG�LPSURYHG�,/3
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
6ZHGHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
+:�VXSSRUW�IRU�VSHFXODWLRQ
A combination of three main ideas we’ve coveredDynamic Instruction scheduling; take advantage of ILP
Dynamic branch prediction; allows instruction scheduling across branches
6SHFXODWLYH H[HFXWLRQ H[HFXWH LQVWUXFWLRQVEHIRUH DOO FRQWURO GHSHQGHQFLHV DUH UHVROYHG
Hardware based speculation uses a data-flow approach:instructions execute when their operands are available
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
+:�VXSSRUW�IRU�6:�VSHFXODWLRQ
� 0RYH /' XS DQG 67 GRZQ� +RZ IDU"
� 7KHVH WHFKQLTXHV ZLOO DOORZ ODUJHU PRYHV DQGLQFUHDVH WKH HIIHFWLYH VL]H RI D EDVLF EORFN
� 5HJURXSLQJ LQVWUXFWLRQV� WUDFH VFKHGXOLQJ�VXSHUEORFNV
� 5HPRYLQJ EUDQFKHV� SUHGLFDWH H[HFXWLRQ
� 0RYH /' DERYH 67� KD]DUG GHWHFWLRQ
� 0RYH /' DERYH EUDQFK
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7UDFH�VFKHGXOLQJ�����Creates a sequence of instructions that are likely to be executed -- a trace.
Two steps:
z Trace selection:Find a likely sequence of basic blocks (trace) across statically predicted branches (e.g. if-then-else)
z Trace compaction: Schedule the trace to be as efficient as possible while preserving correctness in the case the prediction is wrong
� <LHOGV PRUH LQVWUXFWLRQ OHYHO SDUDOOHOLVP
� (IILFLHQW VWDWLF EUDQFK SUHGLFWLRQ NH\ WRVXFFHVV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7UDFH�VFKHGXOLQJ�����
A[i]:=B[i]+C[i]
A[i] =0?
XB[i]:=
C[i]:=
T F
z The leftmost sequence is chosen as the most likely trace
z The assignment to B is control dependent on the if statement.
z Trace compaction has to respect data dependencies
z The rightmost (less likely) trace has to be augmented with fix up code
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&RPSLOHU�VSHFXODWLRQ7KH FRPSLOHU PRYHV LQVWUXFWLRQV EHIRUH D EUDQFKVR WKDW WKH\ FDQ EH H[HFXWHG EHIRUH WKHEUDQFK FRQGLWLRQ LV NQRZQ
Advantage: creates longer schedulable code sequences => more ILP can exploited
Example: if (A == 0) then A = B; else A = A+4;
Non speculative code Speculative codeLW R1,0(R3) LW R1,0(R3)BNEZ R1,L1 LW R14,0(R2)LW R1,0(R2) BEQZ R1,L3J L2 ADD R14,R1,4
L1: ADD R1,R1,4 L3: SW 0(R3),R14L2: SW 0(R3),R1
z What about exceptions?
� UHJ UHQDPH
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6SHFXODWLYH�LQVWUXFWLRQV
0RYLQJ D /' XS� PD\ PDNH LW VSHFXODWLYH
� 0RYLQJ SDVW D EUDQFK
� 0RYLQJ SDVW D 67 �WKDW PD\ EH WR WKH VDPHDGGUHVV�
,VVXHV�
� 1RQ�LQWUXVLYH
� &RUUHFW H[FHSWLRQ KDQGOLQJ �DJDLQ�
� /RZ RYHUKHDG
� *RRG SUHGLFWLRQ'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
([DPSOH� 0RYLQJ /' DERYH D EUDQFK
/'�V 5�� ����5�� �6SHFXODWLYH /' WR 5�
«� �VHW ´SRLVRQ ELW´ LQ 5� LI H[FHSWLRQ
%51= 5�� ����
«
/'�FKN 5� �*HW H[FHSWLRQ LI SRLVRQ ELW RI 5� LV VHW
*RRG SHUIRUPDQFH LI WKH EUDQFK LV QRW WDNHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
([DPSOH� 0RYLQJ /' DERYH D 67
/'�D 5�� ����5�� �DGYDQFHG ORDG
�FUHDWH HQWU\ LQ WKH $/$7 �DGGU�UHJ!
«�
67 5�� ���5�� �LQYDOLGDWH HQWU\ LI $/$7 DGGU PDWFK
«
/'�F 5� �5HGR /' LI HQWU\ LQ $/$7 LQYDOLG
� UHPRYH HQWU\ LQ $/$7
$/$7 �DGYDQFHG ORDG DGGUHVV WDEOH� LV DQ DVVRFLDWLYH GDWDVWUXFWXUH VWRULQJ WXSOHV RI� �DGGU� GHVW�UHJ!
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&RQGLWLRQDO�H[HFXWLRQ
� 5HPRYHV WKH QHHG IRU VRPH EUDQFKHV -
� &RQGLWLRQDO ,QVWUXFWLRQV
�&RQGLWLRQDO UHJLVWHU PRYHCMOVZ R1, R2, R3 ;move R2 to R1 if (R3 == 0)
�&RPSDUH�DQG�VZDS �DWRPLFV FRYHUHG ODWHU�
� 3UHGLFDWH H[HFXWLRQ
�$ PRUH JHQHUDOL]HG WHFKQLTXH
�(DFK LQVWUXFWLRQ H[HFXWHG LI WKH DVVRFLDWHG��ELW SUHGLFDWH 5(* LV ��
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
3UHGLFDWH�H[DPSOH
CGT R3,R1,R2BRNZ R3, elseLD R7, 100(R1)ADD R1, R1, #1BR end
else: LD R7, 100(R2)ADD R2, R2, #1
end:� LQVWU H[HFXWHG LQ ´WKHQ SDWK´� EUDQFKHV
IF R1 > R2 thenLD R7, 100(R1)ADD R1, R1, #1
elseLD R7, 100(R2)ADD R2, R2, #1
end
6WDQGDUG7HFKQLTXH
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
3UHGLFDWH�H[DPSOH
CGT R3,R1,R2BRNZ R3, elseLD R7, 100(R1)ADD R1, R1, #1BR end
else: LD R7, 100(R2)ADD R2, R2, #1
end:� LQVWU H[HFXWHG LQ ´WKHQ SDWK´� EUDQFKHV
IF R1 > R2 thenLD R7, 100(R1)ADD R1, R1, #1
elseLD R7, 100(R2)ADD R2, R2, #1
end
6WDQGDUG7HFKQLTXH
8VLQJ3UHGLFDWHV
…{IF R1 > R2 then P6=1;P7=0
else P6=0;P7=0} ;one instr!P6: LD R7, 100(R1)P6: ADD R1, R1, #1P7: LD R7, 100(R2)P7: ADD R2, R2, #1
2QH LQVWUXFWLRQ VHWV WKH WZR SUHGLFDWH 5HJV(DFK LQVWU� LQ WKH ´WKHQ´ JXDUGHG E\ 3�(DFK LQVWU� LQ WKH ´HOVH´ JXDUGHG E\ 3�Î 2QH EDVLF EORFNÎ )HZHU WRWDO LQVWU� LQVWU H[HFXWHG LQ ´WKHQ SDWK´
� EUDQFKHV
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
+:�YV��6:�VSHFXODWLRQAdvantages:
z Dynamic runtime disambiguation of memory addresses
z Dynamic branch prediction is often better than static which limits the performance of SW speculation.
z HW speculation can maintain a precise exception model
Main disadvantage:z Complex implementation and extensive need of hardware resources (conforms with technology trends)
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6XPPDU\
� 6RIWZDUH�FRPSLOHU� WULFNV�� /RRS XQUROOLQJ
� 6RIWZDUH SLSHOLQLQJ
� 6WDWLF LQVWUXFWLRQVFKHGXOLQJ �ZLWKUHJLVWHU UHQDPLQJ�
� 7UDFH VFKHGXOLQJ�LPSOLHV VWDWLFEUDQFK SUHGLFWLRQ�
� 6SHFXODWLYHH[HFXWLRQ
� +DUGZDUH WULFNV�
� '\QDPLF LQVWUXFWLRQVFKHGXOLQJ
� '\QDPLF EUDQFKSUHGLFWLRQ
� 0XOWLSOH LVVXH
� 6XSHUVFDODU
� 9/,:
� &RQGLWLRQDOLQVWUXFWLRQV
� 6SHFXODWLYH H[HFXWLRQ
28 November 2002
([DPSOH��,$���DQG�,WDQLXP
(ULN +DJHUVWHQ
8SSVDOD 8QLYHUVLW\
6ZHGHQ
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
/LWWOH�RI�HYHU\WKLQJ
� 9/,:
� $GYDQFHG ORDGV VXSSRUWHG E\ $/$7
� /RDG VSHFXODWLRQ VXSSRUWHG E\SUHGLFDWLRQ
� '\QDPLF EUDQFK SUHGLFWLRQ
� ´$OO WKH WULFNV LQ WKH ERRN´
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
,WDQLXP�LQVWUXFWLRQV
� ,QVWUXFWLRQ EXQGOH ���� ELWV�
� ��ELWV� WHPSODWH �LGHQWLILHV , W\SHV DQG GHSHQGHQFLHV�
� � [ ���ELWV� LQVWUXFWLRQ
� &DQ LVVXH XS WR WZR EXQGOHV SHU F\FOH �� LQVWU�
� /DWHQFLHV�,QVWUXFWLRQ /DWHQF\
,�/' �
)3�/' �
3UHG EUDQFK ���
0LVVSUHG EUDQFK �
,�$/8 �
)3�$/8 �
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
,WDQLXP�5HJLVWHUV
� ��� ���ELW *35 �Z� SRLVRQ ELW�
� ��� ���ELW )3 5(*6
� �� ��ELW SUHGLFDWH 5(*6
� � ���ELW EUDQFK UHJLVWHUV
� $ EXQFK RI &65V �FRQWURO�VWDWXVUHJLVWHUV�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'\QDPLF�UHJLVWHU�ZLQGRZ
�
���
3K\VLFDO 5HJV
�
��
([SOLFLW 5HJV�VHHQ E\ WKH LQVWUXFWLQV�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
'\QDPLF�UHJLVWHU�ZLQGRZIRU�*35V
8QXVHG
�
���
3K\VLFDO 5HJV
�
��
([SOLFLW 5HJV�VHHQ E\ PDLQ�
*OREDO
�� ��
'\Q�PDLQ
*OREDO
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
8QXVHG
&DOOLQJ�SURFHGXUH�$
�
���
3K\VLFDO 5HJV
�
([SOLFLW 5HJV�VHHQ E\ PDLQ�
*OREDO
�� ��
'\Q�PDLQ
*OREDO
�
��
([SOLFLW 5HJV�VHHQ E\ SURF $�
*OREDO
��
2XWSXW
,QSXW
,QSXW��
���� ��
��
��
��
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
8QXVHG
&DOOLQJ�3URFHGXUH�%�DXWRPDWLF SDVVLQJ RI SDUDPHWHUV�
�
���
3K\VLFDO 5HJV
�
([SOLFLW 5HJV�VHHQ E\ PDLQ�
*OREDO
�� ��
'\Q�PDLQ
*OREDO
�
��
([SOLFLW 5HJV�3URF $�
*OREDO
��
2XWSXW
,QSXW
VKDUHG��
���� ��
��
��
�
�
��
*OREDO
���
�
��
([SOLFLW 5HJV�3URF %�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
5HJLVWHU�6WDFN�(QJLQH��56(�
� 6DYHV DQG UHVWRUHV UHJLVWHUV WR PHPRU\RQ UHJLVWHU VSLOOV
� ,PSOHPHQWHG LQ KDUGZDUH
�:RUNV LQ WKH EDFNJURXQG
� *LYHV WKH LOOXVLRQ RI DQ XQOLPLWHGUHJLVWHU VWDFN
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
5HJLVWHU�URWDWLRQ�)3�DQG�*35V
� 8VHG LQ VRIWZDUH SLSHOLQLQJ
� 5HJLVWHU UHQDPLQJ IRU HDFK LWHUDWLRQ
� 5HPRYHV WKH QHHG IRUSURORJXH�HSLORJXH
� 56( �UHJLVWHU VWDFN HQJLQH�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
:KDW�LV�WKH�DOWHUQDWLYH"
� 9/,: ZDV PHDQW WR VLPSOLI\ +:
� ,V WKLV WKH EHVW \RX FDQ GR ZLWK ��� 0WUDQVLVWRUV DQG ���:"
�:LOO LW VFDOH ZLWK WHFKQRORJ\"
� 2WKHU DOWHUQDWLYHV�
� ,QFUHDVH FDFKH VL]H�
� ,QFUHDVH WKH IUHTXHQF\� RU�
�5XQ PRUH WKDQ RQH WKUHDG�FKLS
:KHUH�LV�WHFKQRORJ\�SXVKLQJ�XV"
7HFKQRORJ\�4XHVWLRQV IRU�WKH�)XWXUH�
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&DQ ZH FRQWLQXH WKLV WUHQG"
� +XJH PRQROLWKLF VLQJOH�WKUHDG PRQVWHU &38V
� 'LPLQLVKLQJ UHWXUQ RQ��
�SLSHOLQH VWDJHV
�SDUDOOHO SLSHOLQHV
� 7KH PHPRU\ ZDOO LV WKH OLPLW
� VKRXOG ZH FRQWLQXH VKDUH PHPRU\"
� :LOO ZH QHHG SDUDOOHO SURJUDPV"
� :KDW LV LQ WKH WHFKQRORJ\ FU\VWDO EDOO"
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7HFKQRORJ\ ³,PSURYHPHQWV´
2000 2005 2010 2015 YearQuantitative data and trends according to V. Agarwal et al., ISCA 2000Based on SIA (Semiconductor Industry Association) prediction, 1999
Wire delay 5mm trace (RC model)
390ps
170ps
830 ps(1.2GHz)
70 ps (14 GHz)
time
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7HFKQRORJ\ ³,PSURYHPHQWV´
2000 2005 2010 2015 Year
16 0.7GHz 5GHz
8 1.9GHz 10GHz
# Gate Delays/Clock Period
SIA prediction, 1999
1.2GHz
14GHz
Clock
N
10
5
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7HFKQRORJ\ ³,PSURYHPHQWV´
2000 2005 2010 2015 Year
Span (bits reachable in one cycle)[M bits, log scale]
clock=16 gate delays
clock=8 gate delays
SIA10
100
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7HFKQRORJ\ ³,PSURYHPHQWV´
2000 2005 2010 2015.25 .18 .13 .10 .070 .050 .0035 Feature size(micron)
Year
Span (Fraction of the chip area reachable in one cycle)
SIA
1.0
0.75
0.50
0.25
Clock = 16 gate delays
Clock = 8 gate delays
100%
0.4 - 1.4 %
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
6R� ZKDW ZLOO WKH SHUIRUPDQFH EH"
0HP
, 5
5HJV
% 00:
, 5 % 00:
, 5 % 00:
, 5 % 00:
,
,
,
,
,VVXHORJLF
�
¼
6(.
���F\FOHV
��� F\FOHV
�� F\FOHV
� F\FOHV
�*%
��0%
� 0%
��N%
7KUHDG �
«3&
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7HFKQRORJ\ ³,PSURYHPHQWV´VHH WKH ,6&$ SDSHU IRU GHWDLOV«
2000 2005 2010 2015 Year
Relative CPU Performance[log scale]
Historical ra
te: 55% /y
1000
100
10
1
Business as usual (Same arch.): ~12% /y6.0 - 7.4 x
1720 x
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&DQ ZH FRQWLQXH WR PDNH ODUJH DQGFRPSOLFDWHG &38"«
� +XJH PRQROLWKLF VLQJOH�WKUHDG PRQVWHUV
� 'LPLQLVKLQJ UHWXUQ RQ��
�SLSHOLQH VWDJHV
�SDUDOOHO SLSHOLQHV
� 6PDOOHU UHDFK Î
�KDUG WR DGG FRPSOH[LW\� VPDOOHU FDFKHV
� 0HPRU\ LV RIWHQ WKH ERWWOHQHFN DQ\KRZ
1RW�DFFRUGLQJ�WR�WKH�WHFKQRORJ\�IRUHFDVW
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&38�)XWXUH��WKHUH�LV�KRSH
� +DYH \RX HYHU KHDUG RI SDUDOOHO WKUHDGV"
� 7KUHDG�OHYHO SDUDOOHOLVP �7/3�
Æ 0XOWLSOH ³WKUHDGV´ SHU &38 FKLS« 607 �6LPXOWDQHRXV 0XOWL 7KUHDGLQJ�
« &03 �&KLS 0XOWL 3URFHVVRU�
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
607� 6LPXOWDQHRXV 0XOWLWKUHDGLQJ´&RPELQH 7/3,/3 WR ILQG LQGHSHQGHQW LQVWU�´
0HP
, 5
5HJV � «
% 00:
, 5 % 00:
, 5 % 00:
, 5 % 00:
,
,
,
,
,VVXHORJLF
�
¼
6(.
7KUHDG �
7KUHDG 1
«
3&
3&
5HJV 1
��6LQJOH�WKUHDG�SHUIRUPDQFH
� 0XOWL�WKUHDG�SHUIRUPDQFH� 6WLOO�RQH�PRQRO\WKLF�SLSH
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&03� &KLS 0XOWLSURFHVVRUPRUH 7/3 JHRJUDSKLFDO ORFDOLW\
0HP
, 5
5HJV
% 00:
, 5 % 00:
, 5 % 00:
, 5 % 00:
,
,
,
,,VVXHORJLF
�
¼
6(.
7KUHDG �
7KUHDG 1
«
3&
3& 5HJV
6(.
,VVXHORJLF
� 6LPSOH��� 0D[LPL]HV JHRJUDSKLFDO ORFDOLW\
� 6LQJOH�WKUHDG SHUIRUPDQFH'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
&03��&KLS�0XOWLSURFHVVRU
Mem
CPU
$1
CPU
$1
CPU
$1
CPU
$1
L2$
Mem I/F
ExternalI/F
t treads
Simple fast CPUcore (w/ SMT?)+ local caches
”Shared Memory”
28 November 2002
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7KH�IXWXUH�LV�KHUH�WRGD\�
� ,QWHO ;(21� �[ 607 ´K\SHUWKUHDGLQJ´
� ,%0 3RZHU�� �[ &03
� 6XQ� 5XPHUV KDV LW«
� ,$��� 0D\EH ����
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
7KH &RPSXWHU 6KUXQN $JDLQ
Old Mainframes
Super Minis:
Microprocessor: Mem
Chip Multiprocessor (CMP): Mem
'HSW RI ,QIRUPDWLRQ 7HFKQRORJ\_ ZZZ�LW�XX�VH � (ULN +DJHUVWHQ_ ZZZ�GRFV�XX�VH�aHK���
'$5.�
����
« DQG VR GLG WKH ODUJH VHUYHU
Mem
Cch
ips
Mem
Mem
Mem
Mem
Mem
Mem
Mem
Inte
rcon
nect
7 WKUHDGV�&03