September 20, 2000 Prof. John Kubiatowicz
CS252/Kubiatowicz Lec 6.1
9/20/00
CS252 Graduate Computer Architecture
Lecture 6
Tomasulo Hardware Scheduling for Out-Of-Order Execution
September 20, 2000
Prof. John Kubiatowicz
Another Dynamic Scheduling Algorithm: Tomasulo’s Algorithm
• For IBM 360/91 about 3 years after CDC 6600 (1966)
• Goal: High performance without special compilers
• Differences between IBM 360 & CDC 6600 ISA
  – IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
  – IBM has 4 FP registers vs. 8 in CDC 6600
  – IBM has memory-register ops
• Small number of floating point registers prevented smart compiler scheduling of operations
  – This led Tomasulo to try to figure out how to get more effective registers: renaming in hardware!
• Why Study? The descendants of this have flourished!
  – Alpha 21264, HP 8000, MIPS 10000, Pentium II…, PowerPC 604…
Tomasulo Organization
[Figure: block diagram. Components shown: FP Op Queue; FP Registers; Load Buffers (Load1 to Load6, from memory); Store Buffers (to memory); reservation stations Add1, Add2, Add3 for the FP adders and Mult1, Mult2 for the FP multipliers; Common Data Bus (CDB).]
Tomasulo Algorithm vs. Scoreboard
• Control & buffers distributed with Function Units (FU) vs. centralized in scoreboard
  – FU buffers called “reservation stations”; have pending operands
• Registers in instructions replaced by values or pointers to reservation stations (RS); called register renaming
  – Avoids WAR, WAW hazards
  – More reservation stations than registers, so hardware can do optimizations compilers can’t
• Results go to FUs from RSs, not through registers, but over the Common Data Bus, which broadcasts results to all FUs
• Loads and stores treated as FUs with RSs as well
• Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue
Reservation Station Components
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Values of source operands
  – Store buffers have a V field: the result to be stored
Qj, Qk: Reservation stations producing the source registers (value to be written)
  – Note: No ready flags as in Scoreboard; Qj, Qk = 0 => ready
  – Store buffers only have Qi for the RS producing the result
Busy: Indicates reservation station or FU is busy
Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
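The fields listed above can be pictured as a small record per reservation station plus a table indexed by register. The following is only a minimal sketch in Python: the class name, the use of `None` in place of the "0 => ready" encoding, and the operand value 1.5 are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    """One reservation station with the fields named on the slide."""
    name: str
    busy: bool = False
    op: Optional[str] = None     # operation to perform, e.g. "SUBD"
    vj: Optional[float] = None   # value of first source operand
    vk: Optional[float] = None   # value of second source operand
    qj: Optional[str] = None     # station producing Vj (None stands in for 0 => ready)
    qk: Optional[str] = None     # station producing Vk (None stands in for 0 => ready)

    def ready(self) -> bool:
        # No separate ready flags: an operand is ready when its Q field is empty.
        return self.busy and self.qj is None and self.qk is None


# Register result status: which unit will write each FP register (None = blank).
register_status: Dict[str, Optional[str]] = {f"F{i}": None for i in range(0, 32, 2)}

# Hypothetical snapshot: SUBD F8,F6,F2 sits in Add1, still waiting on Load2 for F2.
add1 = ReservationStation("Add1", busy=True, op="SUBD", vj=1.5, qk="Load2")
register_status["F8"] = "Add1"
```

In this sketch `add1.ready()` stays false until `qk` is cleared, which is the condition the Execute stage waits for.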
Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station is free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station as available.
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – This resource does the broadcast
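The "come from" behaviour of the Common Data Bus can be sketched in a few lines: a result is broadcast with the name of its producer, and every waiting unit that expects that producer captures the value. This is a hedged illustration only; the dictionary layout, the function name `broadcast`, and the numeric values are assumptions made for the example.

```python
def broadcast(stations, register_status, registers, source, value):
    """Model one CDB write: data (value) tagged with its producer (source)."""
    for rs in stations:
        # Each station compares the tag on the bus with the producers it waits for.
        if rs["Qj"] == source:
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == source:
            rs["Vk"], rs["Qk"] = value, None
    for reg, producer in register_status.items():
        # A register is written only if this unit is still its expected producer.
        if producer == source:
            registers[reg] = value
            register_status[reg] = None


# Hypothetical state: Add1 and Mult1 are both waiting on Load2.
stations = [
    {"name": "Add1", "Op": "SUBD", "Vj": 10.0, "Vk": None, "Qj": None, "Qk": "Load2"},
    {"name": "Mult1", "Op": "MULTD", "Vj": None, "Vk": 3.0, "Qj": "Load2", "Qk": None},
]
register_status = {"F0": "Mult1", "F2": "Load2", "F8": "Add1"}
registers = {"F0": 0.0, "F2": 0.0, "F8": 0.0}

broadcast(stations, register_status, registers, "Load2", 7.0)
```

After the call both stations hold the value 7.0 in the operand slot that was waiting, and F2 is no longer marked as pending.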
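Tomasulo Example
Instruction status:

| Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | | | |
| LD F2 | 45+ | R3 | | | |
| MULTD F0 | F2 | F4 | | | |
| SUBD F8 | F6 | F2 | | | |
| DIVD F10 | F0 | F6 | | | |
| ADDD F6 | F8 | F2 | | | |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FU | | | | | | | | | |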
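Tomasulo Example Cycle 4
Instruction status:

| Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 2 | 4 | |
| MULTD F0 | F2 | F4 | 3 | | |
| SUBD F8 | F6 | F2 | 4 | | |
| DIVD F10 | F0 | F6 | | | |
| ADDD F6 | F8 | F2 | | | |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | Yes | 45+R3 |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | Yes | SUBD | M(A1) | | | Load2 |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F4) | Load2 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | FU | Mult1 | Load2 | | M(A1) | Add1 | | | | |

• Load2 completing; what is waiting for Load2?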
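Tomasulo Example Cycle 57
Instruction status:

| Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD F0 | F2 | F4 | 3 | 15 | 16 |
| SUBD F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD F10 | F0 | F6 | 5 | 56 | 57 |
| ADDD F6 | F8 | F2 | 6 | 10 | 11 |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | Yes | DIVD | M*F4 | M(A1) | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 57 | FU | M*F4 | M(A2) | | (M-M+M) | (M-M) | Result | | | |

• Once again: In-order issue, out-of-order execution and completion.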
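Compare to Scoreboard Cycle 62
Instruction status (scoreboard columns first, then Tomasulo):

| Instruction | j | k | Scoreboard Issue | Scoreboard Read Oper | Scoreboard Exec Comp | Scoreboard Write Result | Tomasulo Issue | Tomasulo Exec Compl | Tomasulo Write Result |
|---|---|---|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 | 2 | 4 | 5 |
| MULTD F0 | F2 | F4 | 6 | 9 | 19 | 20 | 3 | 15 | 16 |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 | 4 | 7 | 8 |
| DIVD F10 | F0 | F6 | 8 | 21 | 61 | 62 | 5 | 56 | 57 |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | 22 | 6 | 10 | 11 |

• Why take longer on scoreboard/6600?
  – Structural hazards
  – Lack of forwarding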
Tomasulo Loop Example
Loop: LD    F0 0 R1
      MULTD F4 F0 F2
      SD    F4 0 R1
      SUBI  R1 R1 #8
      BNEZ  R1 Loop
• Assume multiply takes 4 clocks
• Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
• To be clear, will show clocks for SUBI, BNEZ, in a single-issue machine
• Reality: integer instructions ahead
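Loop Example
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | | | |
| 1 | MULTD F4 | F0 | F2 | | | |
| 1 | SD F4 | 0 | R1 | | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80 | Fu | | | | | | | | | |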
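Loop Example Cycle 1
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | | | |
| 1 | SD F4 | 0 | R1 | | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 80 | Fu | Load1 | | | | | | | | |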
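Loop Example Cycle 2
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 80 | Fu | Load1 | | Mult1 | | | | | | |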
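Loop Example Cycle 3
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 80 | Fu | Load1 | | Mult1 | | | | | | |

• Implicit renaming sets up “DataFlow” graph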
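Loop Example Cycle 4
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 80 | Fu | Load1 | | Mult1 | | | | | | |

• Just dispatching SUBI instruction, since single-issue machine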
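Loop Example Cycle 5
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 72 | Fu | Load1 | | Mult1 | | | | | | |

• And, BNEZ instruction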
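Loop Example Cycle 6
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 72 | Fu | Load2 | | Mult1 | | | | | | |

• Notice that Load1 is still busy (cache miss, 8 cycles) and Load2 hits in 1 cycle: F0 never sees the load from location 80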
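Loop Example Cycle 7
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 72 | Fu | Load2 | | Mult2 | | | | | | |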
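Loop Example Cycle 8
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 72 | Fu | Load2 | | Mult2 | | | | | | |

• Register file completely detached from computation
• First and second iterations completely overlapped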
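Loop Example Cycle 9
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 72 | Fu | Load2 | | Mult2 | | | | | | |

• Load1 completing: who is waiting?
• Note: Dispatching SUBI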
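Loop Example Cycle 10
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 4 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 64 | Fu | Load2 | | Mult2 | | | | | | |

• Load1 completed in 9, Load2 completing now: who is waiting?
• And note: Dispatching BNEZ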
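Loop Example Cycle 11
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 3 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 4 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Next load in sequence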
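Loop Example Cycle 12
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 2 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 3 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Why not issue third multiply?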
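Loop Example Cycle 13
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 1 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 2 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 64 | Fu | Load3 | | Mult2 | | | | | | |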
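Loop Example Cycle 14
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 0 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 1 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Mult1 completing. Who is waiting?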
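Loop Example Cycle 15
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| 0 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Mult2 completing. Who is waiting?
• R2 is shorthand for R(F2)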
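Loop Example Cycle 16
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 64 | Fu | Load3 | | Mult1 | | | | | | |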
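Loop Example Cycle 17
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 64 | Fu | Load3 | | Mult1 | | | | | | |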
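Loop Example Cycle 18
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 64 | Fu | Load3 | | Mult1 | | | | | | |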
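Loop Example Cycle 19
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | 19 |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | 19 | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | No | | |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | 64 | Fu | Load3 | | Mult1 | | | | | | |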
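Loop Example Cycle 20
Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | 19 |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | 19 | 20 |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | Yes | 64 | Mult1 |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 64 | Fu | Load3 | | Mult1 | | | | | | |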
Why can Tomasulo overlap iterations of loops?
• Because of register renaming
  – Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
• Because of reservation stations
  – Permit instruction issue to advance past integer control flow operations
  – Also buffer old values of registers, totally avoiding the WAR stall that we saw in the scoreboard.
• Even better with multi-issue and wide-issue
• Other idea: Tomasulo building “DataFlow” graph on the fly.
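The renaming effect can be illustrated with a short sketch of what issue does to the register result status for two iterations of the loop body. This is an assumption-laden toy: the `issue` function, the tuple encoding of instructions, and the `R(...)` notation for "value already in the register file" are chosen for the example, and the station names are assigned by hand rather than allocated.

```python
def issue(instr, station, register_status):
    """Rename at issue: sources become producer tags, the destination is retagged."""
    op, dest, srcs = instr
    # A source with a pending producer is replaced by that producer's name;
    # otherwise the value is read from the register file, written R(reg).
    tags = [register_status.get(s) or f"R({s})" for s in srcs]
    if dest is not None:
        register_status[dest] = station  # later readers now wait on this station
    return station, op, tags


register_status = {}
trace = [
    issue(("LD", "F0", []), "Load1", register_status),
    issue(("MULTD", "F4", ["F0", "F2"]), "Mult1", register_status),
    issue(("SD", None, ["F4"]), "Store1", register_status),
    # Second iteration: same architectural registers, different producers.
    issue(("LD", "F0", []), "Load2", register_status),
    issue(("MULTD", "F4", ["F0", "F2"]), "Mult2", register_status),
    issue(("SD", None, ["F4"]), "Store2", register_status),
]
```

The two MULTD instructions name the same registers, yet the first waits on Load1 and the second on Load2, so the iterations no longer conflict on F0 or F4.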
What about Precise Interrupts?
• Both Scoreboard and Tomasulo have: in-order issue, out-of-order execution, and out-of-order completion
• Need to “fix” the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.
Relationship between precise interrupts and speculation:
• Speculation is a form of guessing.
• Important for branch prediction:
  – Need to “take our best shot” at predicting branch direction.
  – If we issue multiple instructions per cycle, lose lots of potential instructions otherwise:
    » Consider 4 instructions per cycle
    » If it takes a single cycle to decide on a branch, waste from 4 to 7 instruction slots!
• If we speculate and are wrong, need to back up and restart execution at the point where we predicted incorrectly:
  – This is exactly the same as precise exceptions!
• Technique for both precise interrupts/exceptions and speculation: in-order completion or commit
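One way to arrive at the "4 to 7" figure is to count the slots after the branch in its own issue group plus one full lost cycle. The sketch below follows that reading; the function name and the assumption that every slot behind the branch in the same cycle is unusable are illustrative choices, not something the slide spells out.

```python
def wasted_slots(issue_width, branch_position, resolve_cycles=1):
    """Issue slots lost while an unpredicted branch is being decided.

    branch_position is the 0-based slot of the branch within its issue group.
    """
    rest_of_group = issue_width - 1 - branch_position  # slots behind the branch
    return rest_of_group + resolve_cycles * issue_width


# 4 instructions per cycle, one cycle to decide the branch.
waste = [wasted_slots(4, position) for position in range(4)]
```

Under these assumptions a branch in the last slot of its group costs 4 slots and a branch in the first slot costs 7.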
HW support for precise interrupts
• Need HW buffer for results of uncommitted instructions: reorder buffer
  – 3 fields: instr, destination, value
  – Reorder buffer can be operand source => more registers like RS
  – Use reorder buffer number instead of reservation station when execution completes
  – Supplies operands between execution complete & commit
  – Once an instruction commits, its result is put into the register
  – Instructions commit
  – As a result, easy to undo speculated instructions on mispredicted branches or on exceptions
[Figure: block diagram showing the Reorder Buffer, FP Op Queue, FP Regs, and two FP Adders, each with its Res Stations.]
Four Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station and reorder buffer slot are free, issue instr & send operands & reorder buffer no. for destination (this stage is sometimes called “dispatch”)
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch CDB for result; when both are in the reservation station, execute; checks RAW (sometimes called “issue”)
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
   When instr. is at head of reorder buffer & result is present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”, or retirement)
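The interaction between out-of-order write result and in-order commit can be sketched with a queue. This is a simplified illustration, assuming a hypothetical `ReorderBuffer` class with only the three fields named earlier (instr, destination, value) plus a done flag; stores, exceptions, and operand forwarding are left out.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: entries are allocated and committed in program order."""

    def __init__(self):
        self.entries = deque()

    def allocate(self, instr, dest):
        # Step 1 (issue): reserve a slot at the tail.
        entry = {"instr": instr, "dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        # Step 3 (write result): may happen in any order.
        entry["value"], entry["done"] = value, True

    def commit(self, registers):
        # Step 4 (commit): only the head may update the register file.
        committed = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            registers[head["dest"]] = head["value"]
            committed.append(head["instr"])
        return committed

    def flush(self):
        # Mispredicted branch: discard every uncommitted entry.
        self.entries.clear()


rob = ReorderBuffer()
registers = {"F0": 0.0, "F8": 0.0}
mul = rob.allocate("MULTD F0,F2,F4", "F0")
sub = rob.allocate("SUBD F8,F6,F2", "F8")

rob.write_result(sub, 4.0)       # the younger SUBD finishes first
first = rob.commit(registers)    # nothing commits: MULTD at the head is not done
f8_after_first = registers["F8"]

rob.write_result(mul, 6.0)
second = rob.commit(registers)   # both commit, in program order
```

Until the MULTD at the head completes, the SUBD result stays in the buffer and the register file is unchanged, which is what makes the state precise at any commit boundary.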
What are the hardware complexities with reorder buffer (ROB)?
[Figure: block diagram showing the Reorder Buffer, FP Op Queue, FP Regs, two FP Adders with their Res Stations, and a comparison network (“Compar network”). The Reorder Table has the fields Dest Reg, Result, Exceptions?, Valid, and Program Counter.]
• How do you find the latest version of a register?
  – As specified by Smith paper, need associative comparison network
  – Could use future file or just use the register result status buffer to track which specific reorder buffer has received the value
• Need as many ports on ROB as register file
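The "latest version of a register" question can be made concrete with a small search over reorder buffer entries. The sketch below is sequential software standing in for what the slide describes as an associative comparison network, which would compare all entries in parallel; the function name and entry layout are assumptions made for illustration.

```python
def latest_version(rob_entries, reg):
    """Return the index of the newest ROB entry that writes reg, or None.

    rob_entries is ordered oldest to newest.
    """
    for index in range(len(rob_entries) - 1, -1, -1):
        if rob_entries[index]["dest"] == reg:
            return index
    return None  # no pending writer: read the register file instead


# Hypothetical contents: two in-flight writers of F0, one of F4.
rob_entries = [
    {"dest": "F0", "value": 1.0, "valid": True},
    {"dest": "F4", "value": None, "valid": False},
    {"dest": "F0", "value": None, "valid": False},
]

newest_f0 = latest_version(rob_entries, "F0")
newest_f2 = latest_version(rob_entries, "F2")
```

Here the newest writer of F0 is the third entry even though an older entry already holds a valid value, which is why a simple "first match" lookup is not enough.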