Computer Architecture Lecture 6 Overview of Branch Prediction.
Date posted: 21-Dec-2015
![Page 1: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/1.jpg)
Computer Architecture
Lecture 6
Overview of Branch Prediction
![Page 2: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/2.jpg)
Prediction accuracy of a 4096-entry 2-bit prediction buffer vs. an infinite buffer
[Figure: bar chart of misprediction frequency (axis 0% to 18%) for the benchmarks li, eqntott, espresso, gcc, fpppp, spice, and matrix300. Series: 4096 entries, 2 bits per entry; unlimited entries, 2 bits per entry. Bar labels as extracted, in order: 10%, 10%, 5%, 5%, 12%, 11%, 9%, 9%, 9%, 9%, 0%, 0%.]
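To make the 2-bit prediction buffer concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the hardware measured in the figure: the class name, the initial counter value, and the indexing by low-order PC bits with 4-byte instructions are all assumptions made for this example.

```python
class TwoBitPredictionBuffer:
    """Table of 2-bit saturating counters indexed by low-order PC bits (hypothetical sketch)."""

    def __init__(self, entries=4096):
        self.entries = entries
        # Counter values 0..3: 0 and 1 predict not taken, 2 and 3 predict taken.
        # Starting at 1 (weakly not taken) is an arbitrary choice for this sketch.
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions, so the two low-order PC bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def misprediction_rate(buffer, trace):
    """trace is an iterable of (pc, taken) pairs; returns the fraction mispredicted."""
    wrong = 0
    total = 0
    for pc, taken in trace:
        if buffer.predict(pc) != taken:
            wrong += 1
        buffer.update(pc, taken)
        total += 1
    return wrong / total if total else 0.0


if __name__ == "__main__":
    # Synthetic loop branch: taken nine times, then not taken, repeated ten times.
    trace = [(0x400, i % 10 != 9) for i in range(100)]
    print(misprediction_rate(TwoBitPredictionBuffer(), trace))
```

With a 2-bit counter, a loop-closing branch is mispredicted only once per loop exit after warm-up, which is why the scheme does well on loop-heavy code.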
![Page 3: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/3.jpg)
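Comparison of 2-bit predictors
[Figure: bar chart of misprediction frequency (axis 0% to 18%) for the benchmarks li, eqntott, espresso, gcc, fpppp, spice, and matrix300. Series: local predictor with 4096 entries, 2 bits per entry; unlimited entries, 2 bits per entry; 1024 entries, (2,2) predictor. Bar labels as extracted, in order: 10%, 10%, 5%, 5%, 12%, 11%, 9%, 9%, 9%, 9%, 0%, 0% for the first two series, and 5%, 5%, 11%, 4%, 6%, 5% for the (2,2) predictor.]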
![Page 4: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/4.jpg)
Tournament Predictor
[Figure: state diagram of a 2-bit selector. States 11 and 10: use predictor P1. States 01 and 00: use predictor P2. Transitions between states are labeled "P1 Correct" and "P2 Correct".]
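The selector in the state diagram can be sketched as a 2-bit saturating counter. The following Python sketch is illustrative only. It assumes the counter moves toward P1 when only P1 was correct, toward P2 when only P2 was correct, and stays put otherwise; the class and method names are hypothetical.

```python
class TournamentSelector:
    """2-bit selector: states 3 (11) and 2 (10) use P1, states 1 (01) and 0 (00) use P2."""

    def __init__(self, state=3):
        self.state = state

    def choose(self):
        return "P1" if self.state >= 2 else "P2"

    def update(self, p1_correct, p2_correct):
        # Assumption: the state changes only when exactly one predictor was right.
        if p1_correct and not p2_correct:
            self.state = min(3, self.state + 1)
        elif p2_correct and not p1_correct:
            self.state = max(0, self.state - 1)


if __name__ == "__main__":
    selector = TournamentSelector()
    for outcome in [(False, True), (False, True), (True, True)]:
        selector.update(*outcome)
        print(selector.state, selector.choose())
```

Because two consecutive "wrong" votes are needed to cross from the P1 half to the P2 half, one unlucky branch does not flip the choice of predictor.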
![Page 5: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/5.jpg)
Misprediction rate of three predictors
• Note that predictors of equal capacity must be compared. The size of each level has to be selected to optimize prediction accuracy. Influencing factors: the degree of interference between branches, and whether the program is likely to benefit from local or global history.
[Figure: line chart of conditional branch misprediction rate (y-axis, 0% to 8%) against total predictor size in Kbits (x-axis, 0 to 512) for three predictors: local 2-bit predictor, correlating predictor, and tournament predictor.]
![Page 6: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/6.jpg)
Why Prediction
Prediction reduces branch hazards in pipelined processors.
Used in almost all pipelined processors.
[Figure: a 0/1 mux driven by the branch prediction (T/NT) selects the next PC. Labeled blocks and signals: Branch Prediction Buffer, Branch Target Address Cache, PC+4, Actual Next PC.]
![Page 7: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/7.jpg)
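A Branch Target Buffer
[Figure: the PC of the instruction to fetch is looked up in the branch target buffer, which holds a number of entries, each with a branch PC and a predicted PC, plus a field recording whether the branch is predicted taken or untaken. The fetch PC is compared (=) with the stored PCs. No match: not a branch instruction; proceed normally. Match: the instruction is a branch; use the predicted PC as the new PC. Prediction hardware (counters, etc.) accompanies the buffer.]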
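The lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not the slide's hardware: it assumes a direct-mapped table, 4-byte instructions, and a full-PC tag, and the names `BranchTargetBuffer`, `next_pc`, `insert`, and `remove` are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped branch target buffer keyed by branch PC (hypothetical sketch)."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Each slot holds (branch_pc, predicted_pc) or None.
        self.table = [None] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions.
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        """Return (next fetch PC, hit flag) for the instruction at pc."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1], True   # match: treat as a branch, use the predicted PC
        return pc + 4, False       # no match: proceed normally

    def insert(self, pc, target):
        self.table[self._index(pc)] = (pc, target)

    def remove(self, pc):
        i = self._index(pc)
        if self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None


if __name__ == "__main__":
    btb = BranchTargetBuffer(entries=16)
    print(btb.next_pc(0x100))   # miss: falls through to pc + 4
    btb.insert(0x100, 0x80)
    print(btb.next_pc(0x100))   # hit: predicted target
```

Comparing the stored branch PC against the fetch PC is what lets the buffer answer "is this a branch?" during fetch, before the instruction has been decoded.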
![Page 8: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/8.jpg)
Handling an instruction with a branch-target buffer
IF: Send PC to memory and branch-target buffer. Entry found in the branch-target buffer?
• No (entry not found). ID: Is the instruction a taken branch?
  No: normal instruction execution.
  Yes: in EX, enter the branch instruction address and next PC into the branch-target buffer.
• Yes (entry found). ID: send out the predicted PC. EX: taken branch?
  No: mispredicted branch; kill the fetched instruction.
  Yes: branch correctly predicted; continue execution with no stalls.
![Page 9: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/9.jpg)
Penalties for possible combinations of whether the branch is in the buffer

| Instruction in buffer | Prediction | Actual branch | Penalty cycles |
|---|---|---|---|
| Yes | Taken | Taken | 0 |
| Yes | Taken | Not taken | 2 |
| No | | Taken | 2 |
| No | | Not taken | 0 |
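As a worked example of how these penalties combine into an average cost per branch, here is a small Python sketch. The penalty values come from the table above; the function name and the input rates in the usage line are hypothetical, illustrative numbers rather than measurements from the lecture.

```python
# Penalty cycles keyed by (entry in buffer, branch actually taken), from the table above.
PENALTY_CYCLES = {
    (True, True): 0,    # in buffer (predicted taken), actually taken
    (True, False): 2,   # in buffer (predicted taken), actually not taken
    (False, True): 2,   # not in buffer, actually taken
    (False, False): 0,  # not in buffer, actually not taken
}


def average_branch_penalty(hit_rate, wrong_given_hit, taken_given_miss):
    """Expected penalty cycles per branch.

    hit_rate:          fraction of branches found in the buffer
    wrong_given_hit:   fraction of buffer hits that turn out not taken
    taken_given_miss:  fraction of buffer misses that turn out taken
    """
    hit_cost = hit_rate * wrong_given_hit * PENALTY_CYCLES[(True, False)]
    miss_cost = (1 - hit_rate) * taken_given_miss * PENALTY_CYCLES[(False, True)]
    return hit_cost + miss_cost


if __name__ == "__main__":
    # Illustrative inputs only: 90% hit rate, 10% of hits wrong, 60% of misses taken.
    print(average_branch_penalty(0.9, 0.1, 0.6))
```

Only two of the four rows contribute, so the average penalty is driven by the buffer's misprediction rate on hits and by taken branches that miss in the buffer.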
![Page 10: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/10.jpg)
![Page 11: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/11.jpg)
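Static Super Scalar pipeline in operation
• Fetch 64 bits per clock cycle; Int on left, FP on right
• Can only issue the 2nd instruction if the 1st instruction issues
• More ports for FP registers to do an FP load and an FP op in a pair

| Type | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 |
|---|---|---|---|---|---|---|---|
| Int. instruction | IF | ID | EX | MEM | WB | | |
| FP instruction | IF | ID | EX | MEM | WB | | |
| Int. instruction | | IF | ID | EX | MEM | WB | |
| FP instruction | | IF | ID | EX | MEM | WB | |
| Int. instruction | | | IF | ID | EX | MEM | WB |
| FP instruction | | | IF | ID | EX | MEM | WB |

• A 1-cycle load delay delays 3 instructions in the superscalar pipeline: the instruction in the right half cannot use the loaded value, nor can the instructions in the next slot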
![Page 12: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/12.jpg)
Dynamic Super Scalar pipeline in operation
[Figure: pipeline diagram. The instruction cache feeds two ISSUE/Rename to RS stages over a wider bus, with Check for RS, Check for RAW, and Read Reg steps. Each functional unit has Wait for Operands stages in front of it: Integer (EX); LD/ST (EXTAC, MemAccess); FP add (A1 to A4); FP multiply (M1 to M7); Divide. Results return over CDB #1 and CDB #2 to Write Reg.]
![Page 13: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/13.jpg)
Example 1
Loop: L.D    F0,0(R1)     ; F0 = array element
      ADD.D  F4,F0,F2
      S.D    F4,0(R1)     ; store result
      DADDIU R1,R1,#-8    ; 8 bytes (per DW)
      BNE    R1,R2,Loop   ; branch if R1 != R2
![Page 30: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/30.jpg)
Dual issue, 1 Integer Unit, FP add latency = 3 cc

| Iteration | Instruction | Issues | Executes | Mem access | Write CDB | Comment |
|---|---|---|---|---|---|---|
| 1 | L.D F0,0(R1) | 1 | 2 | 3 | 4 | First issue |
| 1 | ADD.D F4,F0,F2 | 1 | 5-7 | | 8 | Wait for L.D |
| 1 | S.D F4,0(R1) | 2 | 3 | 9 | | Wait for ADD.D |
| 1 | DADDIU R1,R1,#-8 | 2 | 4 | | 5 | Wait for ALU |
| 1 | BNE R1,R2,Loop | 3 | 6 | | | Wait for DADDIU |
| 2 | L.D F0,0(R1) | 4 | 7 | 8 | 9 | Wait for BNE |
| 2 | ADD.D F4,F0,F2 | 4 | 10-12 | | 13 | Wait for L.D |
| 2 | S.D F4,0(R1) | 5 | 8 | 14 | | Wait for ADD.D |
| 2 | DADDIU R1,R1,#-8 | 5 | 9 | | 10 | Wait for ALU |
| 2 | BNE R1,R2,Loop | 6 | 11 | | | Wait for DADDIU |
| 3 | L.D F0,0(R1) | 7 | 12 | 13 | 14 | Wait for BNE |
| 3 | ADD.D F4,F0,F2 | 7 | 15-17 | | 18 | Wait for L.D |
| 3 | S.D F4,0(R1) | 8 | 13 | 19 | | Wait for ADD.D |
| 3 | DADDIU R1,R1,#-8 | 8 | 14 | | 15 | Wait for ALU |
| 3 | BNE R1,R2,Loop | 9 | 16 | | | Wait for DADDIU |
![Page 31: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/31.jpg)
![Page 45: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/45.jpg)
Dual issue, 2 Integer Units

| Iteration | Instruction | Issues | Executes | Mem access | Write CDB | Comment |
|---|---|---|---|---|---|---|
| 1 | L.D F0,0(R1) | 1 | 2 | 3 | 4 | First issue |
| 1 | ADD.D F4,F0,F2 | 1 | 5-7 | | 8 | Wait for L.D |
| 1 | S.D F4,0(R1) | 2 | 3 | 9 | | Wait for ADD.D |
| 1 | DADDIU R1,R1,#-8 | 2 | 3 | | 4 | Executes earlier |
| 1 | BNE R1,R2,Loop | 3 | 5 | | | Wait for DADDIU |
| 2 | L.D F0,0(R1) | 4 | 6 | 7 | 8 | Wait for BNE |
| 2 | ADD.D F4,F0,F2 | 4 | 9-11 | | 12 | Wait for L.D |
| 2 | S.D F4,0(R1) | 5 | 7 | 13 | | Wait for ADD.D |
| 2 | DADDIU R1,R1,#-8 | 5 | 6 | | 7 | Executes earlier |
| 2 | BNE R1,R2,Loop | 6 | 8 | | | Wait for DADDIU |
| 3 | L.D F0,0(R1) | 7 | 9 | 10 | 11 | Wait for BNE |
| 3 | ADD.D F4,F0,F2 | 7 | 12-14 | | 15 | Wait for L.D |
| 3 | S.D F4,0(R1) | 8 | 10 | 16 | | Wait for ADD.D |
| 3 | DADDIU R1,R1,#-8 | 8 | 9 | | 10 | Executes earlier |
| 3 | BNE R1,R2,Loop | 9 | 11 | | | Wait for DADDIU |
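A short calculation shows what the second integer unit buys. The Python sketch below uses the cycles in which BNE executes in the two completed tables (6, 11, 16 with one integer unit; 5, 8, 11 with two) to estimate the steady-state cycles per iteration. The function name is hypothetical, and treating three iterations as steady state is an assumption.

```python
def cycles_per_iteration(event_cycles):
    """Average gap between the same event in successive loop iterations."""
    gaps = [later - earlier for earlier, later in zip(event_cycles, event_cycles[1:])]
    return sum(gaps) / len(gaps)


INSTRUCTIONS_PER_ITERATION = 5  # L.D, ADD.D, S.D, DADDIU, BNE

# Cycle in which BNE executes in iterations 1 to 3, read from the two tables.
bne_execute_one_int_unit = [6, 11, 16]
bne_execute_two_int_units = [5, 8, 11]

if __name__ == "__main__":
    for label, cycles in (("1 integer unit", bne_execute_one_int_unit),
                          ("2 integer units", bne_execute_two_int_units)):
        per_iter = cycles_per_iteration(cycles)
        ipc = INSTRUCTIONS_PER_ITERATION / per_iter
        print(f"{label}: {per_iter:.1f} cycles/iteration, {ipc:.2f} instructions/cycle")
```

Under these assumptions the loop takes 5 cycles per iteration with one integer unit and 3 with two, because DADDIU no longer has to wait for the ALU that the load and store use for effective-address calculation.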
![Page 46: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/46.jpg)
Speculative Execution
Needed to overcome branch hazards and to support precise exceptions.
![Page 47: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/47.jpg)
Speculative Pipeline
[Figure: pipeline diagram. ISSUE/Rename to RS with Check for RS, Check for RAW, and Read Reg steps. Each functional unit has Wait for Operands stages in front of it: Integer (EX); LD/ST (EXTAC, MemAccess); FP add (A1 to A4); FP multiply (M1 to M7); Divide. Results go over the CDB to the ROB and then to Write Reg.]
![Page 48: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/48.jpg)
The Hardware: Reorder Buffer
• If instructions write their results in program order, registers and memory always get the correct values
• Reorder buffer (ROB): puts out-of-order instructions back into program order at the time of writing registers/memory (commit)
• If some instruction goes wrong, handle it at the time of commit: just flush the instructions after it
• Instructions cannot write registers/memory immediately after execution, so the ROB also buffers the results
• The original Tomasulo algorithm has no such place
[Figure: block diagram with Fetch Unit, IM, Decode, Rename, Reorder Buffer, Regfile, two RS blocks feeding FU1 and FU2, L-buf, S-buf, and DM.]
![Page 49: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/49.jpg)
Speculative Tomasulo Algorithm
• Issue: get instruction from FP Op Queue
  Condition: a free RS at the required FU
  Actions: (1) decode the instruction; (2) allocate an RS and ROB entry; (3) do source register renaming; (4) do destination register renaming; (5) read register file; (6) dispatch the decoded and renamed instruction to the RS and ROB
• Execution: operate on operands (EX)
  Condition: at a given FU, at least one instruction is ready
  Action: select a ready instruction and send it to the FU
• Write result: finish execution (WB)
  Condition: at a given FU, some instruction finishes FU execution
  Actions: (1) FU writes to CDB, broadcast to all RSs and to the ROB; (2) FU broadcasts tag (ROB index) to all RSs; (3) de-allocate the RS. Note: no register status update at this time
![Page 50: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/50.jpg)
Speculative Tomasulo Algorithm
• Commit: update register with reorder result
  Condition: ROB is not empty and the ROB head instruction has finished execution
  Actions if no misprediction/exception: (1) write result to register/memory; (2) update register status; (3) de-allocate the ROB entry
  Actions on misprediction/exception: flush the pipeline, e.g. (1) flush IFQ; (2) clear register status; (3) flush all RS and reset FU; (4) reset ROB
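The commit rule can be illustrated with a small Python model of the reorder buffer. This is a deliberately simplified sketch: it models only registers (no memory, reservation stations, or register status table), commits one instruction per call, and every name in it is hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: results may arrive out of order, but commit is in order."""

    def __init__(self):
        self.entries = deque()   # program order; the head is the oldest instruction
        self.registers = {}      # architectural registers, written only at commit

    def issue(self, dest):
        """Allocate an entry at the tail; dest is a register name, or None (e.g. a branch)."""
        entry = {"dest": dest, "value": None, "done": False, "mispredicted": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        """Write-result stage: buffer the value in the ROB, not in the register file."""
        entry["value"] = value
        entry["done"] = True
        entry["mispredicted"] = mispredicted

    def commit(self):
        """Try to commit the head entry; returns 'stall', 'commit', or 'flush'."""
        if not self.entries or not self.entries[0]["done"]:
            return "stall"
        head = self.entries.popleft()
        if head["mispredicted"]:
            self.entries.clear()   # discard every younger, speculative instruction
            return "flush"
        if head["dest"] is not None:
            self.registers[head["dest"]] = head["value"]
        return "commit"


if __name__ == "__main__":
    rob = ReorderBuffer()
    load = rob.issue("R2")
    branch = rob.issue(None)
    younger = rob.issue("R1")
    rob.write_result(younger, 99)            # finishes first, but cannot commit yet
    print(rob.commit())                      # head (the load) is not done
    rob.write_result(load, 7)
    print(rob.commit(), rob.registers)
    rob.write_result(branch, mispredicted=True)
    print(rob.commit(), rob.registers)       # the younger result is discarded
```

Because the younger instruction's result sits only in the ROB, flushing it on a misprediction leaves the register file untouched, which is what makes speculation past the branch safe.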
![Page 51: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/51.jpg)
Loop: LD     R2,0(R1)
      DADDIU R2,R2,#1
      SD     R2,0(R1)     ; store result
      DADDIU R1,R1,#4     ; increment pointer
      BNE    R2,R3,Loop   ; branch if not last element
![Page 52: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/52.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 First issue
1 ADDIU R2,R2,#1 1
1 S.D R2,0(R1)
1 DADDIU R1,R1,#4
1 BNE R2,R3,Loop
2 L.D R2,0(R1)
2 ADDIU R2,R2,#1
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 53: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/53.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 First issue
1 ADDIU R2,R2,#1 1 Wait for LW
1 S.D R2,0(R1) 2
1 DADDIU R1,R1,#4 2
1 BNE R2,R3,Loop
2 L.D R2,0(R1)
2 ADDIU R2,R2,#1
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 54: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/54.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 First issue
1 ADDIU R2,R2,#1 1 Wait for LW
1 S.D R2,0(R1) 2 3 Wait for ADDIU
1 DADDIU R1,R1,#4 2 3
1 BNE R2,R3,Loop 3
2 L.D R2,0(R1)
2 ADDIU R2,R2,#1
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 55: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/55.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 Wait for LW
1 S.D R2,0(R1) 2 3 Wait for ADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3
2 L.D R2,0(R1) 4
2 ADDIU R2,R2,#1 4
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 56: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/56.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 Wait for LW
1 S.D R2,0(R1) 2 3 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 Wait for DADDIU
2 L.D R2,0(R1) 4 Wait for BNE
2 ADDIU R2,R2,#1 4 Wait for LW
2 S.D R2,0(R1) 5
2 DADDIU R1,R1,#4 5
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 57: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/57.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 Wait for DADDIU
2 L.D R2,0(R1) 4 Wait for BNE
2 ADDIU R2,R2,#1 4 Wait for LW
2 S.D R2,0(R1) 5 Wait for DADDIU
2 DADDIU R1,R1,#4 5 Wait for BNE
2 BNE R2,R3,Loop 6
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 58: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/58.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 Wait for BNE
2 ADDIU R2,R2,#1 4 Wait for LW
2 S.D R2,0(R1) 5 Wait for DADDIU
2 DADDIU R1,R1,#4 5 Wait for BNE
2 BNE R2,R3,Loop 6 Wait for DADDIU
3 L.D R2,0(R1) 7
3 ADDIU R2,R2,#1 7
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 59: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/59.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 Wait for BNE
2 ADDIU R2,R2,#1 4 Wait for LW
2 S.D R2,0(R1) 5 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 Wait for BNE
2 BNE R2,R3,Loop 6 Wait for DADDIU
3 L.D R2,0(R1) 7 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8
3 DADDIU R1,R1,#4 8
3 BNE R2,R3,Loop
![Page 60: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/60.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 Wait for BNE
2 ADDIU R2,R2,#1 4 Wait for LW
2 S.D R2,0(R1) 5 9 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 Wait for DADDIU
3 L.D R2,0(R1) 7 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 Wait for DADDIU
3 DADDIU R1,R1,#4 8 Wait for BNE
3 BNE R2,R3,Loop 9
![Page 61: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/61.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 Wait for LW
2 S.D R2,0(R1) 5 9 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 Wait for DADDIU
3 L.D R2,0(R1) 7 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 Wait for DADDIU
3 DADDIU R1,R1,#4 8 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 62: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/62.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 Wait for LW
2 S.D R2,0(R1) 5 9 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 Wait for DADDIU
3 L.D R2,0(R1) 7 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 Wait for DADDIU
3 DADDIU R1,R1,#4 8 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 63: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/63.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 Wait for DADDIU
3 L.D R2,0(R1) 7 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 Wait for DADDIU
3 DADDIU R1,R1,#4 8 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 64: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/64.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 13 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 13 Wait for DADDIU
3 L.D R2,0(R1) 7 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 Wait for DADDIU
3 DADDIU R1,R1,#4 8 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 65: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/65.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 13 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 13 Wait for DADDIU
3 L.D R2,0(R1) 7 14 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 Wait for DADDIU
3 DADDIU R1,R1,#4 8 14 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 66: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/66.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 13 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 13 Wait for DADDIU
3 L.D R2,0(R1) 7 14 15 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 15 Wait for DADDIU
3 DADDIU R1,R1,#4 8 14 15 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 67: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/67.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 13 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 13 Wait for DADDIU
3 L.D R2,0(R1) 7 14 15 16 Wait for BNE
3 ADDIU R2,R2,#1 7 Wait for LW
3 S.D R2,0(R1) 8 15 Wait for DADDIU
3 DADDIU R1,R1,#4 8 14 15 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 68: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/68.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 13 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 13 Wait for DADDIU
3 L.D R2,0(R1) 7 14 15 16 Wait for BNE
3 ADDIU R2,R2,#1 7 17 Wait for LW
3 S.D R2,0(R1) 8 15 Wait for DADDIU
3 DADDIU R1,R1,#4 8 14 15 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 69: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/69.jpg)
Non-Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Comment
1 L.D R2,0(R1) 1 2 3 4 First issue
1 ADDIU R2,R2,#1 1 5 6 Wait for LW
1 S.D R2,0(R1) 2 3 7 Wait for DADDIU
1 DADDIU R1,R1,#4 2 3 4 Execute directly
1 BNE R2,R3,Loop 3 7 Wait for DADDIU
2 L.D R2,0(R1) 4 8 9 10 Wait for BNE
2 ADDIU R2,R2,#1 4 11 12 Wait for LW
2 S.D R2,0(R1) 5 9 13 Wait for DADDIU
2 DADDIU R1,R1,#4 5 8 9 Wait for BNE
2 BNE R2,R3,Loop 6 13 Wait for DADDIU
3 L.D R2,0(R1) 7 14 15 16 Wait for BNE
3 ADDIU R2,R2,#1 7 17 18 Wait for LW
3 S.D R2,0(R1) 8 15 Wait for DADDIU
3 DADDIU R1,R1,#4 8 14 15 Wait for BNE
3 BNE R2,R3,Loop 9 Wait for DADDIU
![Page 70: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/70.jpg)
Non-Speculative execution: Dual issue, 2 CDB
| Iteration | Instruction | Issues | Executes | Mem access | Write CDB | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | L.D R2,0(R1) | 1 | 2 | 3 | 4 | First issue |
| 1 | ADDIU R2,R2,#1 | 1 | 5 | | 6 | Wait for LW |
| 1 | S.D R2,0(R1) | 2 | 3 | 7 | | Wait for DADDIU |
| 1 | DADDIU R1,R1,#4 | 2 | 3 | | 4 | Execute directly |
| 1 | BNE R2,R3,Loop | 3 | 7 | | | Wait for DADDIU |
| 2 | L.D R2,0(R1) | 4 | 8 | 9 | 10 | Wait for BNE |
| 2 | ADDIU R2,R2,#1 | 4 | 11 | | 12 | Wait for LW |
| 2 | S.D R2,0(R1) | 5 | 9 | 13 | | Wait for DADDIU |
| 2 | DADDIU R1,R1,#4 | 5 | 8 | | 9 | Wait for BNE |
| 2 | BNE R2,R3,Loop | 6 | 13 | | | Wait for DADDIU |
| 3 | L.D R2,0(R1) | 7 | 14 | 15 | 16 | Wait for BNE |
| 3 | ADDIU R2,R2,#1 | 7 | 17 | | 18 | Wait for LW |
| 3 | S.D R2,0(R1) | 8 | 15 | 19 | | Wait for DADDIU |
| 3 | DADDIU R1,R1,#4 | 8 | 14 | | 15 | Wait for BNE |
| 3 | BNE R2,R3,Loop | 9 | 19 | | | Wait for DADDIU |
![Page 71: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/71.jpg)
Speculative execution: Dual issue, 2 CDB
![Page 72: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/72.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1
1 ADDIU R2,R2,#1 1
1 S.D R2,0(R1)
1 DADDIU R1,R1,#4
1 BNE R2,R3,Loop
2 L.D R2,0(R1)
2 ADDIU R2,R2,#1
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 73: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/73.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2
1 ADDIU R2,R2,#1 1
1 S.D R2,0(R1) 2
1 DADDIU R1,R1,#4 2
1 BNE R2,R3,Loop
2 L.D R2,0(R1)
2 ADDIU R2,R2,#1
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 74: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/74.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3
1 ADDIU R2,R2,#1 1
1 S.D R2,0(R1) 2 3
1 DADDIU R1,R1,#4 2 3
1 BNE R2,R3,Loop 3
2 L.D R2,0(R1)
2 ADDIU R2,R2,#1
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 75: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/75.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4
1 ADDIU R2,R2,#1 1
1 S.D R2,0(R1) 2 3
1 DADDIU R1,R1,#4 2 3 4
1 BNE R2,R3,Loop 3
2 L.D R2,0(R1) 4
2 ADDIU R2,R2,#1 4
2 S.D R2,0(R1)
2 DADDIU R1,R1,#4
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 76: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/76.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5
1 S.D R2,0(R1) 2 3
1 DADDIU R1,R1,#4 2 3 4
1 BNE R2,R3,Loop 3
2 L.D R2,0(R1) 4 5
2 ADDIU R2,R2,#1 4
2 S.D R2,0(R1) 5
2 DADDIU R1,R1,#4 5
2 BNE R2,R3,Loop
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 77: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/77.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6
1 S.D R2,0(R1) 2 3
1 DADDIU R1,R1,#4 2 3 4
1 BNE R2,R3,Loop 3
2 L.D R2,0(R1) 4 5 6
2 ADDIU R2,R2,#1 4
2 S.D R2,0(R1) 5 6
2 DADDIU R1,R1,#4 5 6
2 BNE R2,R3,Loop 6
3 L.D R2,0(R1)
3 ADDIU R2,R2,#1
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 78: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/78.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7
1 DADDIU R1,R1,#4 2 3 4
1 BNE R2,R3,Loop 3 7
2 L.D R2,0(R1) 4 5 6 7
2 ADDIU R2,R2,#1 4
2 S.D R2,0(R1) 5 6
2 DADDIU R1,R1,#4 5 6 7
2 BNE R2,R3,Loop 6
3 L.D R2,0(R1) 7
3 ADDIU R2,R2,#1 7
3 S.D R2,0(R1)
3 DADDIU R1,R1,#4
3 BNE R2,R3,Loop
![Page 79: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/79.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7 7
1 DADDIU R1,R1,#4 2 3 4 8
1 BNE R2,R3,Loop 3 7 8
2 L.D R2,0(R1) 4 5 6 7
2 ADDIU R2,R2,#1 4 8
2 S.D R2,0(R1) 5 6
2 DADDIU R1,R1,#4 5 6 7
2 BNE R2,R3,Loop 6
3 L.D R2,0(R1) 7 8
3 ADDIU R2,R2,#1 7
3 S.D R2,0(R1) 8
3 DADDIU R1,R1,#4 8
3 BNE R2,R3,Loop
![Page 80: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/80.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7 7
1 DADDIU R1,R1,#4 2 3 4 8
1 BNE R2,R3,Loop 3 7 8
2 L.D R2,0(R1) 4 5 6 7 9
2 ADDIU R2,R2,#1 4 8 9
2 S.D R2,0(R1) 5 6
2 DADDIU R1,R1,#4 5 6 7
2 BNE R2,R3,Loop 6
3 L.D R2,0(R1) 7 8 9
3 ADDIU R2,R2,#1 7
3 S.D R2,0(R1) 8 9
3 DADDIU R1,R1,#4 8 9
3 BNE R2,R3,Loop 9
![Page 81: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/81.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7 7
1 DADDIU R1,R1,#4 2 3 4 8
1 BNE R2,R3,Loop 3 7 8
2 L.D R2,0(R1) 4 5 6 7 9
2 ADDIU R2,R2,#1 4 8 9 10
2 S.D R2,0(R1) 5 6 10
2 DADDIU R1,R1,#4 5 6 7
2 BNE R2,R3,Loop 6 10
3 L.D R2,0(R1) 7 8 9 10
3 ADDIU R2,R2,#1 7
3 S.D R2,0(R1) 8 9
3 DADDIU R1,R1,#4 8 9 10
3 BNE R2,R3,Loop 9
![Page 82: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/82.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7 7
1 DADDIU R1,R1,#4 2 3 4 8
1 BNE R2,R3,Loop 3 7 8
2 L.D R2,0(R1) 4 5 6 7 9
2 ADDIU R2,R2,#1 4 8 9 10
2 S.D R2,0(R1) 5 6 10
2 DADDIU R1,R1,#4 5 6 7 11
2 BNE R2,R3,Loop 6 10 11
3 L.D R2,0(R1) 7 8 9 10
3 ADDIU R2,R2,#1 7 11
3 S.D R2,0(R1) 8 9
3 DADDIU R1,R1,#4 8 9 10
3 BNE R2,R3,Loop 9
![Page 83: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/83.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7 7
1 DADDIU R1,R1,#4 2 3 4 8
1 BNE R2,R3,Loop 3 7 8
2 L.D R2,0(R1) 4 5 6 7 9
2 ADDIU R2,R2,#1 4 8 9 10
2 S.D R2,0(R1) 5 6 10
2 DADDIU R1,R1,#4 5 6 7 11
2 BNE R2,R3,Loop 6 10 11
3 L.D R2,0(R1) 7 8 9 10 12
3 ADDIU R2,R2,#1 7 11 12
3 S.D R2,0(R1) 8 9
3 DADDIU R1,R1,#4 8 9 10
3 BNE R2,R3,Loop 9
![Page 84: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/84.jpg)
Speculative execution: Dual issue, 2 CDB
Iteration Instructions Issues Executes Mem access Write CDB Commit
1 L.D R2,0(R1) 1 2 3 4 5
1 ADDIU R2,R2,#1 1 5 6 7
1 S.D R2,0(R1) 2 3 7 7
1 DADDIU R1,R1,#4 2 3 4 8
1 BNE R2,R3,Loop 3 7 8
2 L.D R2,0(R1) 4 5 6 7 9
2 ADDIU R2,R2,#1 4 8 9 10
2 S.D R2,0(R1) 5 6 10
2 DADDIU R1,R1,#4 5 6 7 11
2 BNE R2,R3,Loop 6 10 11
3 L.D R2,0(R1) 7 8 9 10 12
3 ADDIU R2,R2,#1 7 11 12 13
3 S.D R2,0(R1) 8 9 13
3 DADDIU R1,R1,#4 8 9 10
3 BNE R2,R3,Loop 9 13
![Page 85: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/85.jpg)
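Speculative execution: Dual issue, 2 CDB
| Iteration | Instruction | Issues | Executes | Mem access | Write CDB | Commit |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | L.D R2,0(R1) | 1 | 2 | 3 | 4 | 5 |
| 1 | ADDIU R2,R2,#1 | 1 | 5 | | 6 | 7 |
| 1 | S.D R2,0(R1) | 2 | 3 | 7 | | 7 |
| 1 | DADDIU R1,R1,#4 | 2 | 3 | | 4 | 8 |
| 1 | BNE R2,R3,Loop | 3 | 7 | | | 8 |
| 2 | L.D R2,0(R1) | 4 | 5 | 6 | 7 | 9 |
| 2 | ADDIU R2,R2,#1 | 4 | 8 | | 9 | 10 |
| 2 | S.D R2,0(R1) | 5 | 6 | | | 10 |
| 2 | DADDIU R1,R1,#4 | 5 | 6 | | 7 | 11 |
| 2 | BNE R2,R3,Loop | 6 | 10 | | | 11 |
| 3 | L.D R2,0(R1) | 7 | 8 | 9 | 10 | 12 |
| 3 | ADDIU R2,R2,#1 | 7 | 11 | | 12 | 13 |
| 3 | S.D R2,0(R1) | 8 | 9 | | | 13 |
| 3 | DADDIU R1,R1,#4 | 8 | 9 | | 10 | 14 |
| 3 | BNE R2,R3,Loop | 9 | 13 | | | 14 |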
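The payoff of speculation can be read off the two completed tables by comparing when the BNE of each iteration executes: cycles 7, 13, 19 without speculation and cycles 7, 10, 13 with it. The Python sketch below just does that arithmetic. `cycles_per_iteration` is a hypothetical helper, and three iterations are too few for more than a rough steady-state estimate.

```python
def cycles_per_iteration(times):
    """Average gap between the same event in consecutive loop iterations."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


# Cycle in which BNE executes in iterations 1..3, taken from the tables above.
bne_exec_nonspec = [7, 13, 19]
bne_exec_spec = [7, 10, 13]

nonspec = cycles_per_iteration(bne_exec_nonspec)   # cycles per iteration without speculation
spec = cycles_per_iteration(bne_exec_spec)         # cycles per iteration with speculation
speedup = nonspec / spec
```

Without speculation each iteration's load must wait for the previous branch to resolve, so the iterations are spaced 6 cycles apart. With speculation the spacing drops to 3 cycles, which matches the issue rate (5 instructions on a dual-issue machine take 3 issue cycles).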
![Page 86: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/86.jpg)
IDEAL/Perfect Processor
Register renaming: infinite virtual registers available
Branch prediction: all conditional branches are predicted exactly
Jump prediction: all jumps are perfectly predicted
Memory address alias analysis: all memory addresses are known exactly
![Page 87: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/87.jpg)
ILP perfect processor for six SPEC92 programs
[Bar chart: instruction issues per cycle, by program]
gcc 54.8; espresso 62.6; li 17.9; fpppp 75.2; doduc 118.7; tomcatv 150.1
![Page 88: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/88.jpg)
Effects of reducing the size of the window
[Line chart: instruction issues per cycle (0 to 160) vs. window size (Infinite, 2k, 512, 128, 32, 8, 4); curves for tomcatv, doduc, fpppp, li; annotation: "Practical possibilities"]
![Page 89: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/89.jpg)
Another View of Last Slide
[Bar chart: instruction issues per cycle (IPC) by program and window size]
| Program | Infinite | 2K | 512 | 128 | 32 |
| --- | --- | --- | --- | --- | --- |
| gcc | 55 | 36 | 10 | 10 | 8 |
| espresso | 63 | 41 | 15 | 13 | 8 |
| li | 18 | 15 | 12 | 11 | 9 |
| fpppp | 75 | 61 | 49 | 35 | 14 |
| doduc | 119 | 59 | 16 | 15 | 9 |
| tomcatv | 150 | 60 | 45 | 34 | 14 |
![Page 90: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/90.jpg)
Effect of branch-prediction schemes (1)
[Line chart: instruction issues per cycle (0 to 60) vs. branch-prediction scheme (Perfect, Tournament predictor, Standard 2-bit, Static, None); curves for fpppp, doduc, tomcatv, li; annotation: "Practical possibilities"]
![Page 91: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/91.jpg)
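Effect of branch-prediction schemes (2)
[Bar chart: instruction issues per cycle by program and branch-prediction scheme]
| Program | Perfect | Selective predictor | Standard 2-bit | Static | None |
| --- | --- | --- | --- | --- | --- |
| gcc | 35 | 9 | 6 | 6 | 2 |
| espresso | 41 | 12 | 7 | 6 | 2 |
| li | 16 | 10 | 6 | 7 | 2 |
| fpppp | 61 | 48 | 46 | 45 | 29 |
| doduc | 58 | 15 | 13 | 14 | 4 |
| tomcatv | 60 | 46 | 45 | 45 | 19 |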
![Page 92: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/92.jpg)
Branch-prediction accuracy for conditional branches in SPEC92
[Bar chart: branch-prediction accuracy (0% to 100%) by program and predictor]
| Program | Profile-based | 2-bit counter | Tournament |
| --- | --- | --- | --- |
| li | 88% | 77% | 98% |
| espresso | 86% | 82% | 96% |
| fpppp | 86% | 82% | 98% |
| tomcatv | 99% | 99% | 100% |
![Page 93: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/93.jpg)
Intel processors based on the P6 microarchitecture
| Processor | First ship date | Clock rate range | L1 cache | L2 cache |
| --- | --- | --- | --- | --- |
| Pentium Pro | 1995 | 100-200 MHz | 8 KB instr. + 8 KB data | 256 KB-1024 KB |
| Pentium II | 1998 | 233-450 MHz | 16 KB instr. + 16 KB data | 256 KB-512 KB |
| Pentium II Xeon | 1999 | 400-450 MHz | 16 KB instr. + 16 KB data | 512 KB-2 MB |
| Celeron | 1999 | 500-900 MHz | 16 KB instr. + 16 KB data | 128 KB |
| Pentium III | 1999 | 450-1100 MHz | 16 KB instr. + 16 KB data | 256 KB-512 KB |
| Pentium III Xeon | 2000 | 700-900 MHz | 16 KB instr. + 16 KB data | 1 MB-2 MB |
![Page 94: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/94.jpg)
P6 Architecture (P-II Onwards…)
| Instruction name | Pipeline stages | Repeat rate |
| --- | --- | --- |
| Integer ALU | 1 | 1 |
| Integer load | 3 | 1 |
| Integer multiply | 4 | 1 |
| FP add | 3 | 1 |
| FP multiply | 5 | 2 |
| FP divide (64 bits) | 32 | 32 |
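One way to read the two columns is sketched below in Python, under an assumed interpretation: "pipeline stages" is taken as the latency in cycles of a single operation, and "repeat rate" as the minimum number of cycles between starting two independent operations on the same unit. The helper name `cycles_for` and the dictionary `P6_UNITS` are hypothetical.

```python
def cycles_for(n_ops, latency, repeat):
    """Cycles to finish n independent operations on one functional unit.

    The first result appears after `latency` cycles; each further
    operation can start `repeat` cycles after the previous one.
    """
    if n_ops <= 0:
        return 0
    return latency + (n_ops - 1) * repeat


# (pipeline stages, repeat rate) pairs copied from the table above.
P6_UNITS = {
    "Integer ALU": (1, 1),
    "Integer load": (3, 1),
    "Integer multiply": (4, 1),
    "FP add": (3, 1),
    "FP multiply": (5, 2),
    "FP divide (64 bits)": (32, 32),
}

for name, (latency, repeat) in P6_UNITS.items():
    print(f"{name}: 10 independent ops take {cycles_for(10, latency, repeat)} cycles")
```

Under this reading, a repeat rate equal to the latency (FP divide) means the unit is not pipelined at all, while a repeat rate of 1 means a new operation can start every cycle.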
![Page 95: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/95.jpg)
P6 processor pipeline
Instruction fetch: 16 bytes per cycle (16 bytes passed to decode)
Instruction decode: 3 instructions per cycle (6 uops passed to renaming)
Renaming: 3 uops per cycle
Reservation stations: 20
Execution units: 5 total
Graduation unit: 3 uops per cycle
Reorder buffer: 40 entries
The P6 processor pipeline, showing the throughput of each stage and the total buffering provided between stages.
![Page 96: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/96.jpg)
Speculation factor
Percentage of instructions that do not commit in Pentium 3
[Bar chart: percentage (0 to 60) by benchmark: gcc, tomcatv, perl, compress, go, li, vortex, apsi, fpppp, hydro2d]
![Page 97: Computer Architecture Lecture 6 Overview of Branch Prediction.](https://reader036.fdocuments.net/reader036/viewer/2022062714/56649d5a5503460f94a3a4d2/html5/thumbnails/97.jpg)
Performance: Pentium 4 vs III
[Bar chart: SPEC ratio (0 to 1000) for SPEC2000 benchmarks: gcc, mgrid, vortex, applu]