CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction
description
Transcript of CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction
![Page 1: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/1.jpg)
CPE 432 Computer Design
6 – ILP: Part II – Branch Prediction
Dr. Gheith Abandah
Adapted from the slides of Prof. David Patterson, University of California, Berkeley
![Page 2: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/2.jpg)
04/22/23 CPE 432, 6-ILP2 2
Outline• Static Branch Prediction• Dynamic Branch Prediction• Branch History Table• Correlated Branch Prediction• Tournament Predictors• Branch Target Buffer
![Page 3: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/3.jpg)
04/22/23 CPE 432, 6-ILP2 3
12%
22%
18%
11% 12%
4%6%
9% 10%
15%
0%
5%
10%
15%
20%
25%
compres
seqntott
espre
sso gc
c li
doduc
ear
hydro2d
mdljdp
su2cor
Mis
pred
ictio
n R
ate
Static Branch Prediction• We learned how to schedule code around delayed
branch• To reorder code around branches, need to predict
branch statically when compile • Simplest scheme is to predict a branch as taken
– Average misprediction = untaken branch frequency = 34% SPEC
• More accurate scheme predicts branches using profile information collected from earlier runs, and modify prediction based on last run:
Integer Floating Point
![Page 4: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/4.jpg)
04/22/23 CPE 432, 6-ILP2 4
Dynamic Branch Prediction• Why does prediction work?
– Underlying algorithm has regularities– Data that is being operated on has regularities– Instruction sequence has redundancies that are artifacts of
way that humans/compilers think about problems
• Is dynamic branch prediction better than static branch prediction?– Seems to be – There are a small number of important branches in programs
which have dynamic behavior
• Performance = ƒ(accuracy, cost of misprediction)
![Page 5: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/5.jpg)
04/22/23 CPE 432, 6-ILP2 5
Branch History Table• Use the lower k bits of the PC to address index
the table
Branch Address
Branch History Table (BHT)
2k entries
Predict Taken or not Taken
k <n>
![Page 6: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/6.jpg)
04/22/23 CPE 432, 6-ILP2 6
BHT, n=1
• Problem: in a loop, 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit):– End of loop case, when it exits instead of looping as before– First time through loop on next time through code, when it
predicts exit instead of looping
0 1
Not taken
TakenNot taken
Taken
Predict not taken
Predict taken
![Page 7: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/7.jpg)
04/22/23 CPE 432, 6-ILP2 7
• Solution: 2-bit scheme where change prediction only if get misprediction twice
• Red: stop, not taken• Green: go, taken• Adds hysteresis to decision making process
BHT, n=2
T
T NT
NT
Predict Taken
Predict Not Taken
Predict Taken
Predict Not TakenT
NTT
NT
![Page 8: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/8.jpg)
04/22/23 CPE 432, 6-ILP2 8
18%
5%
12%10% 9%
5%
9% 9%
0% 1%
0%2%4%6%8%
10%12%14%16%18%20%
eqnto
tt
espre
sso gc
c lispice
doduc
spice
fpppp
matrix300
nasa7
Mis
pred
ictio
n R
ate
BHT Accuracy
• Mispredict because either:– Wrong guess for that branch– Got branch history of wrong branch when index the table
• 4096 entry table:
Integer Floating Point
![Page 9: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/9.jpg)
04/22/23 CPE 432, 6-ILP2 9
Outline• Static Branch Prediction• Dynamic Branch Prediction• Branch History Table• Correlated Branch Prediction• Tournament Predictors• Branch Target Buffer
![Page 10: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/10.jpg)
04/22/23 CPE 432, 6-ILP2 10
Correlated Branch Prediction• Aka Two-Level Predictors• Motivation
1. if (a==2) a=0;2. if (b==2) b=0;3. if (a != b) {…}
Can predict that Condition (3) is not true if Conditions (1) and (2) are true.
![Page 11: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/11.jpg)
04/22/23 CPE 432, 6-ILP2 11
Correlated Branch Prediction• Idea: record m most recently executed branches
as taken or not taken, and use that pattern to select the proper n-bit branch history table
• In general, (m,n) predictor means record last m branches to select between 2m history tables, each with n-bit counters– Thus, old 2-bit BHT is a (0,2) predictor
• Global Branch History: m-bit shift register keeping T/NT status of last m branches.
• Each entry in table has m n-bit predictors.
![Page 12: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/12.jpg)
04/22/23 CPE 432, 6-ILP2 12
Correlating Branch Prediction
(2,2) predictor
– Behavior of recent branches selects between four predictions of next branch, updating just that prediction
Branch address
2-bits per branch predictor
Prediction
2-bit global branch history
4
![Page 13: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/13.jpg)
04/22/23 CPE 432, 6-ILP2 13
Correlated Branch PredictionExample:Find the size of (2, 2) predictor with k=10
Size = 2m * n * 2k
= 4 * 2 * 1K = 8 Kbits
![Page 14: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/14.jpg)
04/22/23 CPE 432, 6-ILP2 14
0%
Freq
uenc
y of
Mis
pred
ictio
ns
0%1%
5%6% 6%
11%
4%
6%5%
1%2%
4%
6%
8%
10%
12%
14%
16%
18%
20%
4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)
Accuracy of Different Schemes
4096 Entries 2-bit BHTUnlimited Entries 2-bit BHT1024 Entries (2,2) BHT
nasa
7
mat
rix30
0
dodu
cd
spic
e
fppp
p
gcc
expr
esso
eqnt
ott li
tom
catv
![Page 15: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/15.jpg)
04/22/23 CPE 432, 6-ILP2 15
Outline• Static Branch Prediction• Dynamic Branch Prediction• Branch History Table• Correlated Branch Prediction• Tournament Predictors• Branch Target Buffer
![Page 16: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/16.jpg)
04/22/23 CPE 432, 6-ILP2 16
Tournament Predictors
• Multilevel branch predictor
• Use n-bit saturating counter to choose between predictors
• Usual choice between global and local predictors
![Page 17: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/17.jpg)
04/22/23 CPE 432, 6-ILP2 17
Tournament Predictors
Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:
• Global predictor– 4K entries index by history of last 12 branches (212 = 4K)
– Each entry is a standard 2-bit predictor
• Local predictor– Local history table: 1024 10-bit entries recording last 10
branches, index by branch address
– The pattern of the last 10 occurrences of that particular branch used to index table of 1K entries with 3-bit saturating counters
![Page 18: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/18.jpg)
04/22/23 CPE 432, 6-ILP2 18
Alpha 21264 Branch Predictor
Selector
Global
Local
BHR 4K
<2>
addr 4K
<2>
<12>
k=12PredictionSelect
addr 1K10 1K
<10> <3>
Size = 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29 Kbits
![Page 19: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/19.jpg)
04/22/23 CPE 432, 6-ILP2 19
Comparing Predictors (Fig. 2.8)• Advantage of tournament predictor is ability to
select the right predictor for a particular branch– Particularly crucial for integer benchmarks. – A typical tournament predictor will select the global predictor
almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks
![Page 20: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/20.jpg)
04/22/23 CPE 432, 6-ILP2 20
Outline• Static Branch Prediction• Dynamic Branch Prediction• Branch History Table• Correlated Branch Prediction• Tournament Predictors• Branch Target Buffer
![Page 21: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/21.jpg)
• Branch target calculation is costly and stalls the instruction fetch.
• BTB stores PCs the same way as caches• The PC of a branch is sent to the BTB• When a match is found the corresponding
Predicted PC is returned• If the branch was predicted taken, instruction
fetch continues at the returned predicted PC
Branch Target Buffers (BTB)
![Page 22: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/22.jpg)
Branch Target Buffers
![Page 23: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/23.jpg)
04/22/23 CPE 432, 6-ILP2 23
Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)
11
13
7
12
9
10 0 0
5
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
164.gzip
175.vpr
176.gcc
181.m
cf
186.crafty
168.w
upwise
171.swim
172.m
grid
173.applu
177.m
esa
Bran
ch m
ispr
edic
tion
s pe
r 10
00 I
nstr
uctio
ns
SPECint2000 SPECfp2000
6% misprediction rate per branch SPECint (19% of INT instructions are branch)
2% misprediction rate per branch SPECfp(5% of FP instructions are branch)
![Page 24: CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816008550346895dcf0a1f/html5/thumbnails/24.jpg)
04/22/23 CPE 432, 6-ILP2 24
Dynamic Branch Prediction Summary• Prediction becoming important part of execution• Branch History Table: 2 bits for loop accuracy• Correlation: Recently executed branches correlated
with next branch– Either different branches (GA)– Or different executions of same branches (PA)
• Tournament predictors take insight to next level, by using multiple predictors – usually one based on global information and one based on local
information, and combining them with a selector– In 2006, tournament predictors using 30K bits are in processors
like the Power5 and Pentium 4
• Branch Target Buffer: include branch address & prediction