Advanced Microarchitecture
Lecture 4: Branch Predictors
Direction vs. Target
• Direction: 0 or 1
• Target: a 32- or 64-bit value
• Targets turn out to be generally easier to predict
  – No need to predict the not-taken (NT) target: it is just the next sequential PC
  – The taken (T) target usually doesn't change
    • or it follows a "nice" pattern, like subroutine returns
Branches Have Locality
• If a branch was previously taken, there's a good chance it'll be taken again in the future

    for(i=0; i < 100000; i++) {
        /* do stuff */
    }

This branch will be taken 99,999 times in a row.
Simple Predictor
• Always predict NT
  – no fetch bubbles (always just fetch the next line)
  – does horribly on the previous for-loop example
• Always predict T
  – does pretty well on the previous example
  – but what if you have other control besides loops?

    p = calloc(num, sizeof(*p));
    if(p == NULL)
        error_handler();

This branch is practically never taken.
Last-Outcome Predictor
• Do what you did last time

    0xDC08: for(i=0; i < 100000; i++) {
    0xDC44:     if( (i % 100) == 0 )
                    tick();
    0xDC50:     if( (i & 1) == 1 )
                    odd();
            }
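A minimal C sketch of such a last-outcome predictor (the table size, indexing, and function names are illustrative assumptions, not from the slides):

    /* Last-outcome (1-bit) predictor: predict whatever the branch did last time.
       Table size and indexing are illustrative. */
    #include <stdint.h>
    #include <stdbool.h>

    #define ENTRIES 4096
    static uint8_t last_outcome[ENTRIES];      /* 1 bit per entry: 1 = taken */

    bool predict(uint64_t pc)              { return last_outcome[(pc >> 2) % ENTRIES]; }
    void update (uint64_t pc, bool taken)  { last_outcome[(pc >> 2) % ENTRIES] = taken; }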
Misprediction Rates?

How often is the branch outcome != the previous outcome?

    DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT ...   (100,000 iterations)
          mispredicts: 2 / 100,000    prediction rate: 99.998%
    DC44: TTTTT ... TNTTTTT ... TNTTTTT ...
          mispredicts: 2 / 100        prediction rate: 98.0%
    DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT ...
          mispredicts: 2 / 2          prediction rate: 0.0%
Saturating Two-Bit Counter

FSM for last-outcome prediction: two states, 0 (predict NT) and 1 (predict T).

FSM for the 2-bit counter (2bC): four states 0, 1, 2, 3. States 0 and 1 predict NT,
states 2 and 3 predict T. Transition up on a taken outcome, down on a not-taken outcome.
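A minimal C sketch of a table of 2-bit saturating counters (table size and hash are illustrative assumptions):

    /* 2-bit saturating counter (2bC): 0-1 predict NT, 2-3 predict T.
       Increment on taken, decrement on not-taken, saturating at 0 and 3. */
    #include <stdint.h>
    #include <stdbool.h>

    #define ENTRIES 4096
    static uint8_t ctr[ENTRIES];                    /* each entry holds 0..3 */

    bool predict(uint64_t pc) { return ctr[(pc >> 2) % ENTRIES] >= 2; }

    void update(uint64_t pc, bool taken)
    {
        uint8_t *c = &ctr[(pc >> 2) % ENTRIES];
        if (taken  && *c < 3) (*c)++;               /* saturate at 3 */
        if (!taken && *c > 0) (*c)--;               /* saturate at 0 */
    }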
Example

(Figure: counter-state traces showing the 1bC and the 2bC being trained on the loop
branch. After the initial training/warm-up, the 2bC sits at state 3, dips to 2 on the
single not-taken loop exit, and still predicts T on the next iteration, so the exit
costs only one misprediction instead of two.)

Only 1 mispredict per N branches now!   DC08: 99.999%   DC44: 99.0%
Importance of Branches
• 98% → 99%
  – Whoop-dee-do!
  – Actually, it's a 2% misprediction rate → 1%
  – That's a halving of the number of mispredictions
• So what?
  – If the misprediction rate is 50%, and 1 in 5 instructions is a branch, then the
    number of useful instructions that we can fetch is:
        5 × (1 + ½ + (½)² + (½)³ + …) = 10
  – If we halve the miss rate down to 25%:
        5 × (1 + ¾ + (¾)² + (¾)³ + …) = 20
  – Halving the miss rate doubles the number of useful instructions that we can try to
    extract ILP from
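Both sums are geometric series, so the closed form 5 × 1/(1 − r) gives the same totals: with r = ½, 5 × 1/(1 − ½) = 10 useful instructions, and with r = ¾, 5 × 1/(1 − ¾) = 20.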
Typical Organization of a 2bC Predictor

(Figure: the 32- or 64-bit PC is hashed down to log2(n) bits, which index a table of
n 2-bit counters. The selected counter's FSM provides the prediction; once the actual
outcome is known, the FSM update logic writes the new counter state back to the table.)
… back to predictors
Typical Hash
• Just take the log2(n) least significant bits of the PC
• May need to ignore a few bits
  – In a 32-bit RISC ISA, all instructions are 4 bytes wide and all instruction
    addresses are 4-byte aligned → the two least significant bits of the PC are always
    zero, so they are not included
    • equivalent to right-shifting the PC by two positions before hashing
  – In a variable-length CISC ISA (e.g. x86), instructions may start on arbitrary byte
    boundaries
    • probably don't want to shift
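A possible C sketch of this hash, assuming the number of counters n is a power of two (function and parameter names are illustrative):

    #include <stdint.h>

    /* Index = low log2(n) bits of the PC, skipping the alignment bits on a RISC ISA. */
    unsigned pht_index(uint64_t pc, unsigned n, int fixed_len_isa)
    {
        uint64_t x = fixed_len_isa ? (pc >> 2) : pc;   /* RISC: drop the two always-zero bits */
        return (unsigned)(x & (n - 1));                /* keep the log2(n) LSBs */
    }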
How About the Branch at 0xDC50?
• 1bC and 2bC don't do too well (50% at best)
• But it's still obviously predictable
• Why?
  – It has a repeating pattern: (NT)*
  – How about other patterns? (TTNTN)*
• Use branch correlation
  – The outcome of a branch is often related to previous outcome(s)
Idea: Track the History of a Branch

(Figure: the PC selects an entry holding the branch's previous outcome plus two 2-bit
counters, one used when prev = 0 and one used when prev = 1. For the alternating branch
at 0xDC50, the prev = 0 counter trains up to 3 → predict T, and the prev = 1 counter
trains down to 0 → predict N, so after warm-up every outcome is predicted correctly.)
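A rough C sketch of this organization: each entry stores the branch's previous outcome and two 2-bit counters, and the previous outcome selects which counter to use (sizes and names are illustrative assumptions):

    #include <stdint.h>
    #include <stdbool.h>

    #define ENTRIES 1024
    struct entry { uint8_t prev; uint8_t ctr[2]; };   /* ctr[prev] makes the prediction */
    static struct entry tbl[ENTRIES];

    bool predict(uint64_t pc)
    {
        struct entry *e = &tbl[(pc >> 2) % ENTRIES];
        return e->ctr[e->prev] >= 2;
    }

    void update(uint64_t pc, bool taken)
    {
        struct entry *e = &tbl[(pc >> 2) % ENTRIES];
        uint8_t *c = &e->ctr[e->prev];
        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        e->prev = taken;                              /* record the new "previous outcome" */
    }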
Deeper History Covers More Patterns
• What pattern has this branch predictor entry learned?

(Figure: the entry now keeps the last 3 outcomes and one 2-bit counter per 3-bit
history; the counters for histories 001, 011, 110, and 100 have been trained.)

    001 → 1;  011 → 0;  110 → 0;  100 → 1
    → 00110011001…  i.e. the pattern (0011)*
Predictor Organizations

(Figure: three ways to organize the pattern tables)
• A different pattern table for each branch PC
• A single shared set of patterns
• A mix of both: the PC hash selects a set of patterns shared by a group of branches
Example (1)
• 1024 counters (2^10)
  – 32 sets (2^5)
    • a 5-bit PC hash chooses a set
  – Each set has 32 counters
    • 32 × 32 = 1024
    • history length of 5 (log2(32) = 5)
• Branch collisions
  – 1000's of branches collapsed into only 32 sets
Example (2)
• 1024 counters (2^10)
  – 128 sets (2^7)
    • a 7-bit PC hash chooses a set
  – Each set has 8 counters
    • 128 × 8 = 1024
    • history length of 3 (log2(8) = 3)
• Limited patterns/correlation
  – Can now only handle a history length of three
Two-Level Predictor Organization
• Branch History Table (BHT)
  – 2^a entries
  – h-bit history per entry
• Pattern History Table (PHT)
  – 2^b sets
  – 2^h counters per set (each entry is a 2-bit counter)
• Total size in bits
  – h·2^a + 2·2^(b+h)

(Figure: an a-bit PC hash indexes the BHT; a b-bit PC hash selects a PHT set, and the
h-bit history selects the counter within the set.)
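A rough C sketch of this two-level lookup, with illustrative values for a, b, and h (the PC hashes here are simply the low PC bits, an assumption made for brevity):

    #include <stdint.h>
    #include <stdbool.h>

    #define A 10
    #define B 5
    #define H 5
    static uint16_t bht[1 << A];                      /* h-bit local histories */
    static uint8_t  pht[1 << B][1 << H];              /* 2-bit counters */

    bool predict(uint64_t pc)
    {
        unsigned hist = bht[(pc >> 2) & ((1 << A) - 1)] & ((1 << H) - 1);
        unsigned set  = (pc >> 2) & ((1 << B) - 1);
        return pht[set][hist] >= 2;
    }

    void update(uint64_t pc, bool taken)
    {
        unsigned bi   = (pc >> 2) & ((1 << A) - 1);
        unsigned hist = bht[bi] & ((1 << H) - 1);
        unsigned set  = (pc >> 2) & ((1 << B) - 1);
        uint8_t *c = &pht[set][hist];
        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        bht[bi] = ((bht[bi] << 1) | taken) & ((1 << H) - 1);   /* shift in the new outcome */
    }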
Classes of Two-Level Predictors
• h = 0 or a = 0 (degenerate case)
  – regular table of 2bC's (b = log2 of the number of counters)
• h > 0, a > 1
  – "Local History" 2-level predictor
• h > 0, a = 1
  – "Global History" 2-level predictor
Global vs. Local Branch History
• Local behavior
  – What is the predicted direction of branch A given the outcomes of previous
    instances of branch A?
• Global behavior
  – What is the predicted direction of branch Z given the outcomes of all* previous
    branches A, B, …, X and Y?

    * the number of previous branches tracked is limited by the history length
Why Global Correlations Exist
• Example: related branch conditions

        p = findNode(foo);
    A:  if ( p is parent )
            do something;

        do other stuff; /* may contain more branches */

    B:  if ( p is a child )
            do something else;

The outcome of the second branch is always the opposite of the first branch.
Other Global Correlations
• Testing same/similar conditions
  – code might test for NULL before a function call, and the function might test for
    NULL again
  – in some cases it may be faster to recompute a condition than to save a previous
    computation in memory and re-load it
  – partial correlations: one branch could test cond1, and another branch could test
    cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
  – multiple correlations: one branch tests cond1, a second tests cond2, and a third
    tests a combination of cond1 and cond2 (which can always be predicted if the first
    two branches are known)
A Global-History Predictor

(Figure: a single global branch history register (BHR) records the outcomes of the
last h branches. One organization concatenates a b-bit PC hash with the h-bit BHR to
index the counter table; another hashes the PC and the history together down to b+h
index bits.)
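A one-line sketch of how such a BHR might be maintained (the width H is an illustrative assumption):

    #include <stdint.h>
    #include <stdbool.h>

    #define H 12
    static uint32_t bhr;                               /* holds the last H branch outcomes */

    void bhr_update(bool taken)
    {
        bhr = ((bhr << 1) | taken) & ((1u << H) - 1);  /* newest outcome lands in bit 0 */
    }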
Similar Tradeoff Between b and h
• For a fixed number of counters
  – Larger h → smaller b
    • larger h → longer history
      – able to capture more patterns
      – longer warm-up/training time
    • smaller b → more branches map to the same set of counters
      – more interference
  – Larger b → smaller h
    • just the opposite…
Motivation for Combined Indexing
• Not all 2^h "states" are used
  – (TTNN)* only uses half of the states for a history length of 3, and only ¼ of the
    states for a history length of 4
  – (TN)* only uses two states no matter how long the history is
• Not all bits of the PC are uniformly distributed
• Not all bits of the history are equally likely to be correlated
  – more recent history is more likely to be strongly correlated
Combined Index Example: gshare
• S. McFarling (DEC-WRL TR, 1993)

(Figure: a k-bit PC hash is XORed with the k-bit global history to form the index,
where k = log2 of the number of counters.)
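A C sketch contrasting the two combined-index styles, gselect (concatenation) and gshare (XOR); the bit widths are illustrative assumptions:

    #include <stdint.h>

    #define K 14                                       /* k = log2(#counters) */

    /* gselect: half the index bits from the PC, half from the history */
    unsigned gselect_index(uint64_t pc, uint32_t bhr)
    {
        return (((unsigned)(pc >> 2) & ((1u << (K / 2)) - 1)) << (K / 2))
             | (bhr & ((1u << (K / 2)) - 1));
    }

    /* gshare: XOR lets all K bits of both the PC and the history contribute */
    unsigned gshare_index(uint64_t pc, uint32_t bhr)
    {
        return ((unsigned)(pc >> 2) ^ bhr) & ((1u << K) - 1);
    }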
Gshare Example

    Branch Address    Global History    gselect 4/4    gshare 8/8
    00000000          00000001          00000001       00000001
    00000000          00000000          00000000       00000000
    11111111          00000000          11110000       11111111
    11111111          10000000          11110000       01111111

For gselect, the insufficient history (only 4 bits) leads to a conflict between the
last two branches; gshare keeps them apart.
Some Interference May Be Tolerable
• Branch A: always not-taken
• Branch B: always taken
• Branch C: TNTNTN…
• Branch D: TTNNTTNN…

(Figure: the four branches share one table of 2-bit counters indexed by history bits
000–111. Because their histories steer them to counters that agree with the outcomes
they need, the sharing does not cause mispredictions.)
And Then It Might Not
• Branch X: TTTNTTTN…
• Branch Y: TNTNTN…
• Branch Z: TTTT…

(Figure: the same shared table of counters indexed by history bits 000–111, but now
some entries are pulled in opposite directions by different branches, leaving their
predictions uncertain ("?").)
Interference-Reducing Predictors
• There are patterns and asymmetries in branches
  – Not all patterns occur with the same frequency
  – Branches have biases
• This lecture:
  – Bi-Mode (Lee et al., MICRO '97)
  – gskewed (Michaud et al., ISCA '97)
• These are global-history predictors, but the ideas can be applied to other types of
  predictors
Gskewed Idea
• Interference occurs because two (or more) branches hash to the same index
• A different hash function can prevent this collision
  – but may cause other collisions
• Use multiple hash functions such that a collision can only occur in a few cases
  – use a majority vote to make the final decision
Gskewed Organization

(Figure: the PC and the global history are fed through three different hash functions,
each indexing its own table PHT1, PHT2, PHT3; the three counter predictions go through
a majority vote to produce the final prediction.)

The hash functions are chosen so that if hash1(x) = hash1(y), then:
    hash2(x) ≠ hash2(y)
    hash3(x) ≠ hash3(y)
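A rough C sketch of the majority-vote structure; the three hash functions below are simple placeholders, not the actual skewing functions from the gskewed paper:

    #include <stdint.h>
    #include <stdbool.h>

    #define ENTRIES 4096
    static uint8_t pht1[ENTRIES], pht2[ENTRIES], pht3[ENTRIES];   /* 2-bit counters */

    /* placeholder hashes of (PC, history) */
    static unsigned h1(uint64_t pc, uint32_t hist) { return (unsigned)((pc >> 2) ^ hist) % ENTRIES; }
    static unsigned h2(uint64_t pc, uint32_t hist) { return (unsigned)((pc >> 5) ^ (hist << 3)) % ENTRIES; }
    static unsigned h3(uint64_t pc, uint32_t hist) { return (unsigned)((pc >> 2) + hist * 7u) % ENTRIES; }

    bool predict(uint64_t pc, uint32_t hist)
    {
        int votes = (pht1[h1(pc, hist)] >= 2)
                  + (pht2[h2(pc, hist)] >= 2)
                  + (pht3[h3(pc, hist)] >= 2);
        return votes >= 2;                             /* majority of the three tables */
    }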
Gskewed Example

(Figure: two branches A and B collide in one of the three tables, but each still has
unshared counters in the other two tables, so the majority vote yields the right
prediction for both.)
Combining Predictors
• Some branches exhibit local-history correlations
  – ex. loop branches
• While others exhibit global-history correlations
  – "spaghetti logic", ex. if-elsif-elsif-elsif-else branches
• Using a global-history predictor prevents accurate prediction of branches exhibiting
  local-history correlations
• And vice versa
Tournament Hybrid Predictors

(Figure: Pred0 and Pred1 both make predictions; a meta-predictor, a table of 2-/3-bit
counters, chooses which one supplies the final prediction. If the meta-counter MSB = 0,
use Pred0, else use Pred1.)

Meta-counter update when the branch resolves:
    Pred0 wrong,   Pred1 wrong    →  ---
    Pred0 wrong,   Pred1 correct  →  Inc
    Pred0 correct, Pred1 wrong    →  Dec
    Pred0 correct, Pred1 correct  →  ---
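A C sketch of the selection and meta-update logic (table size and the use of a 2-bit meta-counter per entry are illustrative assumptions):

    #include <stdint.h>
    #include <stdbool.h>

    #define ENTRIES 4096
    static uint8_t meta[ENTRIES];        /* 2-bit: MSB=0 -> use pred0, MSB=1 -> use pred1 */

    bool choose(uint64_t pc, bool p0, bool p1)
    {
        return (meta[(pc >> 2) % ENTRIES] >= 2) ? p1 : p0;
    }

    void meta_update(uint64_t pc, bool p0, bool p1, bool outcome)
    {
        uint8_t *m = &meta[(pc >> 2) % ENTRIES];
        if (p0 == outcome && p1 != outcome && *m > 0) (*m)--;   /* only pred0 was right */
        if (p1 == outcome && p0 != outcome && *m < 3) (*m)++;   /* only pred1 was right */
        /* both right or both wrong: leave the meta-counter alone */
    }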
Common Combinations
• Global history + local history
• "Easy" branches + global history
  – 2bC and gshare
• Short history + long history
• Many types of behavior, many combinations
Multi-Hybrids
• Why only combine two predictors?

(Figure: four component predictors P0–P3 combined either by a tree of two-way
meta-predictors (M01, M23, then a final M) or by a single meta-predictor M that selects
among all four.)

• Tradeoff between making good individual predictions (P's) vs. making good
  meta-predictions (M's)
  – for a fixed hardware budget, improving one may hurt the other
Prediction Fusion

(Figure: with selection, the meta-predictor M picks one of P0–P3 and the other three
predictions are discarded; with fusion, M takes all of P0–P3's predictions as inputs
and synthesizes the final prediction.)

• Selection discards information from n-1 predictors
• Fusion attempts to synthesize all the information
  – more info to work with
  – possibly more junk to sort through
Using Long Branch Histories
• Long global history provides more context for branch prediction/pattern matching
  – more potential sources of correlation
• Costs
  – For a PHT-based approach, the hardware cost increases exponentially: O(2^h) counters
  – Training time increases, which may decrease overall accuracy
Predictor Training Time
• Ex: the prediction equals the opposite of the 2nd most recent outcome
• History length = 2 → 4 states to train:
    NN → T,  NT → T,  TN → N,  TT → N
• History length = 3 → 8 states to train:
    NNN → T,  NNT → T,  NTN → N,  NTT → N,  TNN → T,  …
Neural Branch Prediction
• Uses the "perceptron" from classical machine learning theory
  – simplest form of a neural net (single layer, single node)
• Inputs are past branch outcomes
• Compute a weighted sum of the inputs
  – the output is a linear function of the inputs
  – the sign of the output is used for the final prediction
Perceptron Predictor

(Figure: the global history bits x1…xn, e.g. 1 0 1 0 0 1 0 1 …, are remapped from
{0, 1} to {+1, -1}. Each xi is multiplied by its weight wi, a "bias" weight w0 (with a
constant +1 input x0) is included, and an adder sums all the products. If the sum ≥ 0,
predict taken.)
Perceptron Predictor (2)
• The magnitude of weight wi determines how correlated branch i is with the current
  branch
• The sign of the weight determines positive or negative correlation
• Ex. the outcome is usually the opposite of the 5th oldest branch
  – w5 has a large magnitude (L), but is negative
  – if x5 is taken, then w5·x5 = (-L)·(+1) = -L
    • tends to make the sum more negative (toward a NT prediction)
  – if x5 is not taken, then w5·x5 = (-L)·(-1) = +L
Perceptron Predictor (3)
• When the actual branch outcome is known:
  – if xi = outcome, then increment wi (positive correlation)
  – if xi ≠ outcome, then decrement wi (negative correlation)
  – for x0, increment if the branch was taken, decrement if NT
• "Done with training"
  – if |Σ wi·xi| > θ, then don't update the weights unless there was a misprediction
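A C sketch of perceptron prediction and training following these rules (history length, table size, and the threshold value are illustrative assumptions; weight saturation is omitted for brevity):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdlib.h>

    #define H       16                               /* history length */
    #define ENTRIES 512
    #define THETA   37                               /* training threshold */
    static int8_t w[ENTRIES][H + 1];                 /* w[.][0] is the bias weight */

    int output(uint64_t pc, const int8_t x[H])       /* x[i] in {-1, +1} */
    {
        int8_t *wp = w[(pc >> 2) % ENTRIES];
        int sum = wp[0];                             /* bias: its input is implicitly +1 */
        for (int i = 0; i < H; i++)
            sum += wp[i + 1] * x[i];
        return sum;                                  /* predict taken if sum >= 0 */
    }

    void train(uint64_t pc, const int8_t x[H], int sum, bool taken)
    {
        bool mispred = (sum >= 0) != taken;
        if (!mispred && abs(sum) > THETA)
            return;                                  /* confident and correct: done training */
        int8_t *wp = w[(pc >> 2) % ENTRIES];
        int t = taken ? 1 : -1;
        wp[0] += t;                                  /* bias tracks the branch's overall bias */
        for (int i = 0; i < H; i++)
            wp[i + 1] += t * x[i];                   /* agree -> increment, disagree -> decrement */
    }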
Perceptron Trains Quickly
• If no correlation exists with branch i, then wi will just get incremented and
  decremented back and forth, so wi ≈ 0
• If a correlation exists with branch j, then wj will be consistently incremented (or
  decremented) to have a large influence on the overall sum
Linearly Inseparable Functions
• A perceptron computes a linear combination of its inputs
• It can only learn linearly separable functions

(Figure: two truth tables over xi, xj ∈ {-1, +1}.)

    Separable example: T only when xi = -1 and xj = -1, N otherwise.
    Realized by f() = -3·xi - 4·xj - 5, i.e. wi = -3, wj = -4, w0 = -5.

    Inseparable example (XOR): T when exactly one of xi, xj is +1, N otherwise.
    • No values of wi, wj, w0 exist to satisfy these outputs
    • No straight line exists that separates the T's from the N's
Overall Hardware Organization

(Figure: a PC hash selects one set of weights from a table of n perceptrons; the BHR
supplies the inputs; multipliers and an adder compute the weighted sum; the prediction
is sign(sum).)

Size = (h+1)·k·n + h + Area(multipliers) + Area(adder)
    h = history length, k = counter (weight) width, n = number of perceptrons in the table
GEHL
• GEometric History Length predictor

(Figure: several tables of k-bit weights, each indexed by the PC hashed with a
different prefix of a very long global branch history, using history lengths
L1 < L2 < L3 < L4; an adder sums the selected weights and the prediction is sign(sum).)

The history lengths form a geometric progression:
    L(i) = a^(i-1) · L(1)
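A small C sketch showing how such a geometric progression of history lengths might be generated (the values of a, L(1), and the number of tables are illustrative assumptions):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double a = 2.0;                        /* common ratio of the progression */
        const int L1 = 2, tables = 6;
        for (int i = 1; i <= tables; i++)
            printf("L(%d) = %d\n", i, (int)(L1 * pow(a, i - 1) + 0.5));
        /* prints 2, 4, 8, 16, 32, 64 */
        return 0;
    }

Short lengths capture recent, strongly correlated history cheaply, while a few tables still reach very far back.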
PPM Predictors
• PPM = Partial Pattern Matching
  – Used in data compression
  – Idea: use the longest history necessary, but no longer

(Figure: a base 2bC table indexed by the PC, plus several partially tagged tables
indexed by the PC combined with progressively longer slices of the global history
h1 < h2 < h3 < h4, from most recent to oldest. Each tagged entry stores a partial tag
and a 2bC; tag-match comparators (=) drive a chain of 0/1 muxes so the final prediction
comes from the longest-history table whose tag matches, falling back to the base 2bC
table when nothing matches.)
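A rough C sketch of the longest-match selection (the table count, tag width, and the hash/tag functions are illustrative placeholders, not the actual PPM/TAGE functions):

    #include <stdint.h>
    #include <stdbool.h>

    #define TABLES  4
    #define ENTRIES 1024
    struct tagged { uint16_t tag; uint8_t ctr; };     /* partial tag + 2-bit counter */
    static struct tagged tt[TABLES][ENTRIES];
    static uint8_t base[ENTRIES];                     /* plain 2bC table indexed by PC */

    /* placeholder index/tag hashes folding the PC with table t's history slice */
    static unsigned idx_hash(uint64_t pc, uint64_t hist, int t)
    {
        return (unsigned)((pc >> 2) ^ hist ^ (hist >> (4 * (t + 1)))) % ENTRIES;
    }
    static uint16_t tag_hash(uint64_t pc, uint64_t hist, int t)
    {
        return (uint16_t)(((pc >> 2) ^ (hist >> (2 * (t + 1)))) & 0x3FF);
    }

    bool predict(uint64_t pc, uint64_t hist)
    {
        for (int t = TABLES - 1; t >= 0; t--) {       /* longest history first */
            struct tagged *e = &tt[t][idx_hash(pc, hist, t)];
            if (e->tag == tag_hash(pc, hist, t))
                return e->ctr >= 2;
        }
        return base[(pc >> 2) % ENTRIES] >= 2;        /* no tag matched: use the base table */
    }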
TAGE Predictor
• Similar to PPM, but uses geometric history lengths
  – Currently the most accurate type of branch prediction algorithm
• References (www.jilp.org):
  – PPM: Michaud (CBP-1)
  – O-GEHL: Seznec (CBP-1)
  – TAGE: Seznec & Michaud (JILP)
  – L-TAGE: Seznec (CBP-2)