Lec Jan22 2009
Anshul Kumar, CSE IITD
CSL718 : Pipelined Processors
Improving Branch Performance
22nd Jan, 2009
Anshul Kumar, CSE IITD slide 2
Improving Branch Performance
• Branch Elimination
  – replace branch with other instructions
• Branch Speed Up
  – reduce time for computing CC and TIF
• Branch Prediction
  – guess the outcome and proceed, undo if necessary
• Branch Target Capture
  – make use of history
Anshul Kumar, CSE IITD slide 3
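Branch Elimination
Use conditional instructions (predicated execution)
[Flowchart: test condition C with T and F exits; statement S is executed on one exit and skipped on the other. Written as C : S]
With a branch:
    OP1
    BC CC = Z, ∗ + 2
    ADD R3, R2, R1
    OP2
With a conditional instruction:
    OP1
    ADD R3, R2, R1, NZ
    OP2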
Anshul Kumar, CSE IITD slide 4
Branch Elimination - contd.
[Pipeline timing diagram; CC marks the condition code setting]
OP1       : IF IF IF D AG DF DF DF EX EX
BC        : IF IF IF D AG TIF TIF TIF
ADD/OP2   : IF IF IF D’ D AG
ADD(cond) : IF IF IF D AG DF DF DF EX EX
Anshul Kumar, CSE IITD slide 5
Improving Branch Performance
• Branch Elimination
  – replace branch with other instructions
• Branch Speed Up
  – reduce time for computing CC and TIF
• Branch Prediction
  – guess the outcome and proceed, undo if necessary
• Branch Target Capture
  – make use of history
Anshul Kumar, CSE IITD slide 6
Branch Speed Up : early target address generation
• Assume each instruction is a branch
• Generate target address while decoding
• If target is in the same page, omit translation
• After decoding, discard target address if not a branch
[Pipeline timing diagram] BC : IF IF IF D TIF TIF TIF, with AG overlapped with D
Anshul Kumar, CSE IITD slide 7
Branch Speed Up : increase CC - branch gap
Increase the gap between condition checking and branching
• Early CC setting
• Delayed branch
Anshul Kumar, CSE IITD slide 8
Early CC setting: insert n instructions (branch taken)
n = 0, delay = 6
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
(Delay can be reduced with larger target buffer)
Anshul Kumar, CSE IITD slide 9
Early CC setting: insert n instructions
n = 1, delay = 5
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
J   : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 10
Early CC setting: insert n instructions
n = 2, delay = 4
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 11
Early CC setting: insert n instructions
n = 3, delay = 4
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
L   : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 12
Early CC setting: insert n instructions (branch not taken)
n = 0, delay = 5
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 13
Early CC setting: insert n instructions
n = 1, delay = 4
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
J   : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 14
Early CC setting: insert n instructions
n = 2, delay = 3
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 15
Early CC setting: insert n instructions
n = 3, delay = 2
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
L   : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 16
Delayed Branch: insert n instructions (branch taken)
n = 0, delay = 6
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 17
Delayed Branch : insert n instructions
n = 1, delay = 5
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
J   : IF IF D AG AG DF DF EX EX
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 18
Delayed Branch : insert n instructions
n = 2, delay = 4
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 19
Delayed Branch : insert n instructions
n = 3, delay = 3
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
L   : IF IF D AG AG DF DF EX EX
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 20
Delayed Branch : insert n instructions (branch not taken)
n = 0, delay = 5
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 21
Delayed Branch : insert n instructions
n = 1, delay = 4
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
J   : IF IF D AG AG DF DF EX EX
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 22
Delayed Branch : insert n instructions
n = 2, delay = 3
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 23
Delayed Branch : insert n instructions
n = 3, delay = 2
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
J   : IF IF D AG AG DF DF EX EX
K   : IF IF D AG AG DF DF EX EX
L   : IF IF D AG AG DF DF EX EX
I+1 : IF IF D’ D AG
I+2 : IF IF’ D’ IF D
Anshul Kumar, CSE IITD slide 24
Summary - Branch Speed Up
                     n=0  n=1  n=2  n=3  n=4  n=5
early CC setting
  uncond              4    4    4    4    4    4
  cond (T)            6    5    4    4    4    4
  cond (I)            5    4    3    2    1    0
delayed branch
  uncond              4    3    2    1    0    0
  cond (T)            6    5    4    3    2    1
  cond (I)            5    4    3    2    1    0
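The summary table can be reproduced with two small functions. This is only a sketch: the closed-form expressions (`max(4, 6 - n)` and so on) are fitted to the table entries and are not stated on the slides, and the function and key names are made up for illustration.

```python
def early_cc_delay(kind, n):
    """Delay when n instructions are inserted for early CC setting.

    kind is "uncond", "cond_T" (goes to target) or "cond_I" (goes inline).
    The formulas are an assumption fitted to the summary table.
    """
    if kind == "uncond":
        return 4                      # unaffected by early CC setting
    if kind == "cond_T":
        return max(4, 6 - n)          # cannot drop below the unconditional delay
    if kind == "cond_I":
        return max(0, 5 - n)
    raise ValueError(kind)


def delayed_branch_delay(kind, n):
    """Delay when n delay-slot instructions are used (fitted to the table)."""
    base = {"uncond": 4, "cond_T": 6, "cond_I": 5}[kind]
    return max(0, base - n)           # each useful slot hides one cycle


for kind in ("uncond", "cond_T", "cond_I"):
    print(kind,
          [early_cc_delay(kind, n) for n in range(6)],
          [delayed_branch_delay(kind, n) for n in range(6)])
```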
Anshul Kumar, CSE IITD slide 25
Improving Branch Performance
• Branch Elimination
  – replace branch with other instructions
• Branch Speed Up
  – reduce time for computing CC and TIF
• Branch Prediction
  – guess the outcome and proceed, undo if necessary
• Branch Target Capture
  – make use of history
Anshul Kumar, CSE IITD slide 26
Branch Prediction
• Treat conditional branches as unconditional branches / NOP
• Undo if necessary
Strategies:
  – Fixed (always guess inline)
  – Static (guess on the basis of instruction type)
  – Dynamic (guess based on recent history)
Anshul Kumar, CSE IITD slide 27
Prediction based on statistics
Instr      %      Branch  |  Guess    Correct  |  Guess    Correct
uncond     14.5   100%    |  always   14.5%    |  always   14.5%
cond       58     54%     |  never    27%      |  always   31%
loop       9.8    91%     |  always   9%       |  always   9%
call/ret   17.7   100%    |  always   17.7%    |  always   17.7%
Total                     |           68.2%    |           72.2%
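The "Correct" columns follow from weighting each instruction type by its share and by how often the guess matches the outcome. A minimal sketch, assuming the "%" column is the share of each type among branch instructions and the "Branch" column is the fraction of times the branch goes to target; names are illustrative. The unrounded totals come out slightly different from the slide totals, which add up rounded rows.

```python
# (share of branch instructions in %, fraction that go to target), from the table
STATS = {
    "uncond":   (14.5, 1.00),
    "cond":     (58.0, 0.54),
    "loop":     (9.8,  0.91),
    "call/ret": (17.7, 1.00),
}


def percent_correct(guess_target):
    """Total % of correct guesses; guess_target maps type -> True (always) / False (never)."""
    total = 0.0
    for kind, (share, p_target) in STATS.items():
        hit_rate = p_target if guess_target[kind] else 1.0 - p_target
        total += share * hit_rate
    return total


never_for_cond = {"uncond": True, "cond": False, "loop": True, "call/ret": True}
always_for_all = {kind: True for kind in STATS}

correct_never = percent_correct(never_for_cond)    # slide total: 68.2%
correct_always = percent_correct(always_for_all)   # slide total: 72.2%
print(round(correct_never, 1), round(correct_always, 1))
```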
Anshul Kumar, CSE IITD slide 28
Branch Prediction (guess inline, go inline)
delay = 0
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
I+1 : IF IF D
I+2 : IF IF D
Anshul Kumar, CSE IITD slide 29
Branch Prediction (guess inline, goto target)
delay = 6
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Anshul Kumar, CSE IITD slide 30
Branch Prediction (guess target, go inline)
delay = 5
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : D
I+1 : D’ D
I+2 : D’ D
Anshul Kumar, CSE IITD slide 31
Branch Prediction (guess target, goto target)
delay = 4
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
T   : IF IF D’ D AG
T+1 : IF IF’ D’ IF IF D
Same as unconditional branch
Anshul Kumar, CSE IITD slide 32
Static prediction strategy
Let p = probability of taking branch
guess target: delay_t = 4 p + 5 (1 - p) = 5 - p
guess inline: delay_i = 6 p + 0 (1 - p) = 6 p
⇒ if (delay_t < delay_i) guess target else guess inline
(delay_t < delay_i) ⇒ 5 - p < 6 p ⇒ p > 5/7 = .71
Anshul Kumar, CSE IITD slide 33
Static prediction strategy - thresholds for different instructions
Delay by guess and actual outcome:
            actual T   actual I
  guess T      4          5
  guess I      6          0
guess target if 4 p + 5 (1 - p) < 6 p + 0 (1 - p), i.e. p > .71
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF
Anshul Kumar, CSE IITD slide 34
Static prediction strategy - thresholds for different instructions
Loop control
Delay by guess and actual outcome:
            actual T   actual I
  guess T      4          6
  guess I      7          1
guess target if 4 p + 6 (1 - p) < 7 p + 1 (1 - p), i.e. p > .62
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG AG TIF TIF EX EX
Anshul Kumar, CSE IITD slide 35
Static prediction strategy - thresholds for different instructions
Register address
Delay by guess and actual outcome:
            actual T   actual I
  guess T      3          5
  guess I      6          0
guess target if 3 p + 5 (1 - p) < 6 p + 0 (1 - p), i.e. p > .62
[Pipeline timing diagram; CC marks the condition code setting]
I-1 : IF IF D AG AG DF DF EX EX
I   : IF IF D AG TIF TIF
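The three thresholds come from the same inequality applied to different delay tables. Solving d_TT p + d_TI (1 - p) < d_IT p + d_II (1 - p) for p gives p > (d_TI - d_II) / ((d_IT - d_TT) + (d_TI - d_II)). The sketch below evaluates that expression exactly; the function name and argument order are illustrative, not from the slides.

```python
from fractions import Fraction


def target_threshold(d_tt, d_ti, d_it, d_ii):
    """Probability above which guessing target gives the lower expected delay.

    d_xy is the delay when the guess is x and the actual outcome is y
    (t = target, i = inline). Assumes the denominator is positive, which
    holds for all three delay tables used here.
    """
    return Fraction(d_ti - d_ii, (d_it - d_tt) + (d_ti - d_ii))


general = target_threshold(4, 5, 6, 0)        # ordinary conditional branch
loop_control = target_threshold(4, 6, 7, 1)   # loop control instruction
register = target_threshold(3, 5, 6, 0)       # register address
print(general, loop_control, register)
```

The loop control and register address cases both give 5/8 = .625, which the slides show as .62.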
Anshul Kumar, CSE IITD slide 36
Delayed Branch with Nullification
(Also called annulment)
• Delay slot is used optionally
• Branch instruction specifies the option
• Option may be exercised based on correctness of branch prediction
• Helps in better utilization of delay slots
Anshul Kumar, CSE IITD slide 37
Variants of Nullification
[Four diagrams, one per variant, each showing the branch bc and the delay slot D]
1. No annulment (branch-with-execute)
2. Annul if not taken (branch-or-skip)
3. Annul if taken (branch-with-skip)
4. Annul always
Examples
• SPARC: 1, 2
• MC88100: 1, 4
• i860: 2, 4
• HP PA: 1, 2, 3
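The four variants differ only in when the delay-slot instruction is allowed to complete. A small sketch of that rule, using the variant numbers from the list above; the function name is hypothetical.

```python
def delay_slot_executes(variant, taken):
    """Return True if the delay-slot instruction is executed (not annulled)."""
    if variant == 1:        # no annulment (branch-with-execute)
        return True
    if variant == 2:        # annul if not taken (branch-or-skip)
        return taken
    if variant == 3:        # annul if taken (branch-with-skip)
        return not taken
    if variant == 4:        # annul always
        return False
    raise ValueError("variant must be 1..4")


for variant in (1, 2, 3, 4):
    print(variant, delay_slot_executes(variant, True), delay_slot_executes(variant, False))
```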
Anshul Kumar, CSE IITD slide 38
Annulment illustration
[Two diagrams of a branch bc with delay slot D: one labeled "use branch-or-skip", the other "use branch-with-skip"]
Anshul Kumar, CSE IITD slide 39
Dynamic Branch Prediction - basic idea
Predict based on the history of previous branch
loop: xxx
      xxx
      xxx
      xxx
      BC loop
2 mispredictions for every occurrence
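The count of 2 mispredictions per occurrence of the loop can be checked with a short simulation. This sketch assumes a predictor that simply repeats the last outcome of the branch, starting with "taken"; the helper names and the loop sizes are illustrative.

```python
def loop_branch_outcomes(iterations, occurrences):
    """Outcomes of the closing BC: taken on every iteration except the last."""
    return ([True] * (iterations - 1) + [False]) * occurrences


def last_outcome_mispredictions(outcomes, initial=True):
    """Count mispredictions when predicting the previous outcome of the branch."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken          # remember only the most recent outcome
    return misses


misses = last_outcome_mispredictions(loop_branch_outcomes(iterations=5, occurrences=3))
print(misses)
```

Every occurrence after the first costs two mispredictions: one at loop exit and one on re-entering the loop. The first occurrence costs one because the initial guess is assumed to be "taken".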
Anshul Kumar, CSE IITD slide 40
Dynamic Branch Prediction - 2 bit prediction scheme
[State diagram: four states 0, 1, 2, 3 with transitions on T and N; states 0/1 predict taken, states 3/2 predict not taken]
Anshul Kumar, CSE IITD slide 41
Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters
[State diagram: counter states 0, 1, 2, 3 in a chain, with T and N transitions between neighbouring states]
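A saturating counter can be sketched in a few lines. The encoding here is an assumption, since the transition directions do not survive in the transcript: T increments, N decrements, and the upper half of the range predicts taken. Class and method names are illustrative.

```python
class SaturatingCounter:
    """n-bit saturating counter; assumed encoding: high values predict taken."""

    def __init__(self, bits=2, value=None):
        self.maximum = (1 << bits) - 1
        self.value = self.maximum if value is None else value

    def predict(self):
        return self.value > self.maximum // 2

    def update(self, taken):
        if taken:
            self.value = min(self.maximum, self.value + 1)   # saturate at the top
        else:
            self.value = max(0, self.value - 1)              # saturate at 0


def count_mispredictions(outcomes):
    counter = SaturatingCounter()
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Loop-closing branch: taken 4 times, then not taken, for 3 occurrences of the loop.
outcomes = ([True] * 4 + [False]) * 3
misses = count_mispredictions(outcomes)
print(misses)
```

With this encoding a single loop exit moves the counter from 3 to 2, so the branch is still predicted taken when the loop is entered again, and only the exit itself is mispredicted.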
Anshul Kumar, CSE IITD slide 42
Dynamic Branch Prediction - History of last n occurrences
outcome of last three occurrences of this branch (0 : not taken, 1 : taken)
current entry: 1 1 0
prediction using majority decision
actual outcome ‘taken’
updated entry: 1 1 1
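The entry update and majority decision can be sketched as follows. The shift direction is an assumption chosen so that the slide's example (1 1 0 becoming 1 1 1 after a taken branch) is reproduced: the newest outcome enters on the left and the oldest drops off the right.

```python
def predict_majority(history):
    """Predict taken if more than half of the recorded outcomes were taken."""
    return sum(history) * 2 > len(history)


def update_history(history, taken):
    """Shift the new outcome in on the left, dropping the oldest on the right."""
    return [int(taken)] + history[:-1]


entry = [1, 1, 0]
prediction = predict_majority(entry)        # two of three taken
updated = update_history(entry, taken=True)
print(prediction, updated)
```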
Anshul Kumar, CSE IITD slide 43
Dynamic Branch Prediction - storing prediction counters
store in separate buffer or in cache directory
[Diagram: CACHE with directory storage, cache line and counter]
One counter per branch or
One counter per cache line - merge results if multiple branches
Anshul Kumar, CSE IITD slide 44
Correct guesses vs. history length
n   Compiler   Business   Scientific   Supervisor
0   64.1       64.4       70.4         54.0
1   91.9       95.2       86.6         79.7
2   93.3       96.5       90.8         83.4
3   93.7       96.6       91.0         83.5
4   94.5       96.8       91.8         83.7
5   94.7       97.0       92.0         83.9
Anshul Kumar, CSE IITD slide 45
Two-Level Prediction
• Uses two levels of information to make a direction prediction
  – Branch History Table (BHT) - last n occurrences
  – Pattern History Table (PHT) - saturating 2 bit counters
• Captures patterned behavior of branches
  – Groups of branches are correlated
  – Particular branches have particular behavior
Anshul Kumar, CSE IITD slide 46
Correlation between branches
B1: if (x)
      ...
B2: if (y)
      ...
    z = x && y
B3: if (z)
      ...
• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
Anshul Kumar, CSE IITD slide 47
Some Two-level Predictors
[Diagram, Global Predictor: GBHR (1 0 1 1 0) indexes the PHT, which gives T/NT]
[Diagram, Local Predictor: PC selects a BHT entry (entries shown: 1 1 0 1 0, 1 1 1 0 0, 0 0 1 1 1, 0 1 1 1 1), which indexes the PHT, which gives T/NT]
bits from PC and BHT can be combined to index PHT
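A global two-level predictor can be sketched as a history register plus a table of 2-bit counters. The sketch below makes several assumptions not stated on the slides: the PHT is indexed by the branch identity together with a 2-bit global history, counters start at 1, and a branch counts as taken when its condition is true. It replays the B1/B2/B3 correlation example.

```python
from collections import defaultdict


class GlobalTwoLevelPredictor:
    """Global history register plus a PHT of 2-bit saturating counters."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                           # global branch history register
        self.pht = defaultdict(lambda: 1)      # counters start at 1 (assumed)

    def predict(self, pc):
        return self.pht[(pc, self.ghr)] >= 2

    def update(self, pc, taken):
        key = (pc, self.ghr)
        counter = self.pht[key]
        self.pht[key] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


def b3_mispredictions(inputs):
    """Run B1: if (x), B2: if (y), B3: if (x && y) and count B3 mispredictions."""
    predictor = GlobalTwoLevelPredictor()
    misses = 0
    for x, y in inputs:
        for pc, taken in (("B1", x), ("B2", y), ("B3", x and y)):
            guess = predictor.predict(pc)
            if pc == "B3" and guess != taken:
                misses += 1
            predictor.update(pc, taken)
    return misses


inputs = [(False, False), (False, True), (True, False), (True, True)] * 10
b3_misses = b3_mispredictions(inputs)
print(b3_misses)
```

When B3 is reached, the history register holds exactly the outcomes of B1 and B2, so after a single training miss for the "both taken" pattern every B3 prediction is correct.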
Anshul Kumar, CSE IITD slide 48
Two-level Predictor Classification
• Yeh and Patt 3-letter naming scheme
  – Type of history collected
    • G (global), P (per branch), S (per set)
  – PHT type
    • A (adaptive), S (static)
  – PHT organization
    • g (global), p (per branch), s (per set)
• Examples - GAs, PAp etc.
Anshul Kumar, CSE IITD slide 49
Improving Branch Performance
• Branch Elimination
  – replace branch with other instructions
• Branch Speed Up
  – reduce time for computing CC and TIF
• Branch Prediction
  – guess the outcome and proceed, undo if necessary
• Branch Target Capture
  – make use of history
Anshul Kumar, CSE IITD slide 50
Branch Target Capture
• Branch Target Buffer (BTB)
• Target Instruction Buffer (TIB)
[Buffer entry fields: instr addr | pred stats | target (target addr / target instr)]
prob of target change < 5%
Anshul Kumar, CSE IITD slide 51
BTB Performance
[Decision tree with probabilities]
decision                     result   probability   delay
BTB miss (.4), go inline     inline   .8            0
BTB miss (.4), go inline     target   .2            6
BTB hit (.6), go to target   inline   .2            5
BTB hit (.6), go to target   target   .8            0
Expected delay = .4*.8*0 + .4*.2*6 + .6*.2*5 + .6*.8*0 = 1.08
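The expected delay is a probability-weighted sum over the four paths of the decision tree. A minimal sketch with the slide's numbers; the tuple layout and names are illustrative.

```python
# (probability of BTB miss/hit, probability of the actual result, delay in cycles)
PATHS = [
    (0.4, 0.8, 0),   # BTB miss, go inline, branch goes inline
    (0.4, 0.2, 6),   # BTB miss, go inline, branch goes to target
    (0.6, 0.2, 5),   # BTB hit, go to target, branch goes inline
    (0.6, 0.8, 0),   # BTB hit, go to target, branch goes to target
]


def expected_delay(paths):
    """Sum of path probability times path delay."""
    return sum(p_btb * p_result * delay for p_btb, p_result, delay in paths)


delay = expected_delay(PATHS)
print(round(delay, 2))
```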
Anshul Kumar, CSE IITD slide 52
Dynamic information about branch
• Previous branch decisions
  • Explicit prediction
  • Stored in cache directory: Branch History Table (BHT)
• Previous target address / instruction
  • Implicit prediction
  • Stored in separate buffer: Branch Target Buffer (BTB), Br Target Addr Cache (BTAC), Target Instr Buffer (TIB), Br Target Instr Cache (BTIC)
These two can be combined
Anshul Kumar, CSE IITD slide 53
Storing prediction info
In cache: [Diagram: directory storage, cache line, counter]
In separate buffer: [Entry fields: instr addr | pred stats | target]
Anshul Kumar, CSE IITD slide 54
Combined prediction mechanism
• Explicit : use history bits
• Implicit : use BTB hit/miss
  – hit ⇒ go to target, miss ⇒ go inline
• Combined : BTB hit/miss followed by explicit prediction using history bits.
  – commonly used : hit ⇒ go to target, miss ⇒ explicit prediction
  – alternatively : miss ⇒ go inline, hit ⇒ explicit prediction
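The two combined schemes can be written as one small decision function. This is a sketch only: the scheme labels "common" and "alternative" and the function name are made up here, and `history_says_target` stands for whatever the explicit history-bit predictor would return.

```python
def combined_guess(btb_hit, history_says_target, scheme="common"):
    """Return True to go to target, False to go inline."""
    if scheme == "common":
        # hit => go to target, miss => explicit prediction
        return True if btb_hit else history_says_target
    if scheme == "alternative":
        # miss => go inline, hit => explicit prediction
        return history_says_target if btb_hit else False
    raise ValueError(scheme)


for scheme in ("common", "alternative"):
    for btb_hit in (False, True):
        for history in (False, True):
            print(scheme, btb_hit, history, combined_guess(btb_hit, history, scheme))
```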
Anshul Kumar, CSE IITD slide 55
Combined prediction
[Two decision trees. In one, BTB miss ⇒ predict I and BTB hit ⇒ explicit prediction (I or T); in the other, BTB hit ⇒ predict T and BTB miss ⇒ explicit prediction (I or T). Every prediction branches into the actual outcomes I and T.]
Prediction ⇒ T: Target, I: Inline
Actual outcome ⇒ T: Target, I: Inline
Anshul Kumar, CSE IITD slide 56
Structure of Tables
Instruction fetch path with
• BHT
• BTAC
• BTIC
Anshul Kumar, CSE IITD slide 57
Compute/fetch scheme
(no dynamic branch prediction)
[Block diagram. Labels: I - cache; IFAR; +; Instruction Fetch address; Compute BTA; BTA; IIFA; Next sequential address. Fetch sequences: I, I + 1, I + 2, I + 3 and BTI, BTI+1, BTI+2, BTI+3]
Anshul Kumar, CSE IITD slide 58
BHT (Branch History Table)
[Block diagram. Labels: Instruction Fetch address; I-cache, 16 K, 4-way set assoc, 128 x 4 lines, 8 instr/line, 4 instr/cycle; BHT, 128 x 4 entries, History bits 2 2 2 2; Prediction logic; decode queue, 4 x 1 instr; issue queue, 4 x 1 instr; Taken / not taken; BTA for a taken guess]
Anshul Kumar, CSE IITD slide 59
BTAC scheme
[Block diagram. Labels: I - cache; IFAR; +; Instruction Fetch address; BTA; IIFA; Next sequential address; BTAC with fields BA, BTA. Fetch sequences: I, I + 1, I + 2, I + 3 and BTI, BTI+1, BTI+2, BTI+3]
Anshul Kumar, CSE IITD slide 60
BTIC scheme - 1
[Block diagram. Labels: I - cache; IFAR; +; Instruction Fetch address; BTA; IIFA; Next sequential address; BTIC with fields BA, BTI, BTA+; To decoder. Fetch sequence: I]
Anshul Kumar, CSE IITD slide 61
BTIC scheme - 2
[Block diagram. Labels: I - cache; IFAR; +; Instruction Fetch address; BTA+ (computed); IIFA; Next sequential address; BTIC with fields BA, BTI, BTI+1; To decoder. Fetch sequence: I, I+1]
Anshul Kumar, CSE IITD slide 62
References
1. M.J. Flynn, "Computer Architecture : Pipelined and Parallel Processor Design", Narosa Publishing House/ Jones and Bartlett, 1996.
2. D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures : A Design Space Approach", Addison Wesley, 1997.
3. D.A. Patterson, J.L. Hennessy, "Computer Architecture : A Quantitative Approach", Morgan Kaufmann Publishers, 2006.