CA226 — Advanced Computer Architecture (ray/teaching/CA226/05b-hazards.pdf)
Types of Hazards
Structural hazards: resource conflicts; the hardware cannot support all combinations of instructions simultaneously
Data hazards: when one instruction depends upon the result (not yet available) of a previous instruction
Control hazards: when the address of the next instruction cannot be determined immediately (branch and jump instructions; today's topic)
Control Hazards
Control hazards:
• arise from pipelining of branch (and jump) instructions
As described thus far, branching decisions:
• are made during the Mem stage of the pipeline
A naive approach:
• stall until branch decision is known
Terminology
Whenever we encounter a branch:
• it is either taken or not taken
• the cost (in stall cycles) may be different in each case
Control Hazards
Naive Branching
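Branch not taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex     Mem**  WB
branch+4                 stall  stall  stall  **IF   ID     Ex
branch+8                        stall  stall  stall  IF     ID
Branch taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex     Mem**  WB
target                   stall  stall  stall  **IF   ID     Ex
target+4                        stall  stall  stall  IF     ID
(In these tables, X** marks the stage that produces a value, such as the branch decision or a result, and **Y marks the stage that depends on it.)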
Unfortunately …
This will result in:
• the pipeline being stalled for three cycles every time a branch is encountered
• and branch instructions are common
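To put a rough number on this, the usual back-of-the-envelope model is CPI = 1 + (fraction of instructions that are branches) × (stall cycles per branch). A minimal Python sketch follows. It assumes an otherwise ideal pipeline with no other hazards. The 20% branch frequency is a hypothetical figure chosen for illustration, not a measurement.

```python
def effective_cpi(branch_fraction: float, stalls_per_branch: float,
                  base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when each branch adds a fixed stall.

    Assumes an otherwise ideal pipeline (base_cpi = 1) and no other hazards.
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be between 0 and 1")
    return base_cpi + branch_fraction * stalls_per_branch


# Hypothetical workload: one instruction in five is a branch,
# and naive branching stalls for three cycles on every branch.
cpi = effective_cpi(branch_fraction=0.2, stalls_per_branch=3)
print(f"naive branching: CPI = {cpi:.2f}")
```

Under those assumptions the naive scheme raises the CPI from 1 to 1.6, so branches alone cost 60% more cycles.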
…What might help is:
• a prediction
Predict that a branch will either be:
• taken, or not taken
…Easiest thing to do:
• predict branch not taken
• simply allow subsequent instructions to continue to flow into the pipeline
Predict Not Taken
Table 1. And branch is indeed not taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex     Mem**  WB
branch+4                 IF     ID     Ex     **Mem  WB
branch+8                        IF     ID     Ex     Mem    WB
branch+12                              IF     ID     Ex     Mem
Perfect!
• But what if the branch is in fact taken?
Predict Not Taken
Table 2. But branch is in fact taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex     Mem**  WB
branch+4                 IF     ID     Ex     **Mem  WB
branch+8                        IF     ID     **Ex   Mem    WB
branch+12                              IF     **ID   Ex     Mem
target                                        **IF
Predict Not Taken
Table 3. But branch is in fact taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex     Mem**  WB
branch+4                 IF     ID     Ex     **nop  nop
branch+8                        IF     ID     **nop  nop    nop
branch+12                              IF     **nop  nop    nop
target                                        **IF
Observe:
• none of the subsequent instructions has yet changed memory or any registers
• that’s helpful! We can simply replace them with nop instructions
(Still a stall of three cycles when branch taken.)
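The observation generalises. Under predict not taken, suppose a taken branch is resolved in the k-th stage (IF = 1, ID = 2, Ex = 3, Mem = 4). Then k − 1 wrong-path instructions have already entered the pipeline and must become nops. The Python sketch below lists those instructions and the stage each has reached when the branch resolves. It assumes one fetch per cycle and no other stalls.

```python
STAGES = ["IF", "ID", "Ex", "Mem", "WB"]


def squashed_on_taken(resolve_stage: str) -> list[tuple[str, str]]:
    """Predict not taken: wrong-path instructions in flight when a taken
    branch is resolved in `resolve_stage`, with the stage each has reached.

    Assumes one instruction is fetched per cycle and no other stalls.
    """
    k = STAGES.index(resolve_stage)  # 0-based: IF = 0, ID = 1, Ex = 2, Mem = 3
    # While the branch occupies stage k, the instruction fetched j cycles
    # after it occupies stage k - j.
    return [(f"branch+{4 * j}", STAGES[k - j]) for j in range(1, k + 1)]


victims = squashed_on_taken("Mem")
print(victims)  # [('branch+4', 'Ex'), ('branch+8', 'ID'), ('branch+12', 'IF')]
# None has reached Mem or WB, so none has written memory or a register:
# replacing them with nops is safe, at a cost of len(victims) == 3 cycles.
assert all(stage not in ("Mem", "WB") for _, stage in victims)
```

Calling `squashed_on_taken("ID")` returns a single instruction. This shows why resolving branches earlier pays off.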
Slightly Better
When a branch instruction is detected:
• route the Branch Taken condition:
• from Ex (instead of from Mem)
• to ID (instead of to IF)
MIPS Pipeline
Example
Table 4. Branch not taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex**   Mem    WB
branch+4                 IF     stall  **ID   Ex     Mem    WB
branch+8                               IF     ID     Ex     Mem
branch+12                                     IF     ID     Ex
Note
We save two stalls: one because we learn the decision one cycle sooner, and one because we allow the subsequent instruction into IF.
Example — Branch Taken
Table 5. Branch taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID     Ex**   Mem    WB
branch+4                 IF     nop    **ID   nop    nop    nop
target                                 **IF   ID     Ex     Mem
target+4                                      IF     ID     Ex
Note
An effective stall of two cycles: one better than before, because we learn whether the branch is taken one cycle sooner.
Where do we stand?
If a branch is not taken:
• we have a stall of one cycle
If a branch is taken:
• we have a stall of two cycles
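Because the cost differs between the two cases, the average cost per branch depends on how often branches are taken. The sketch below assumes a taken fraction p, a hypothetical parameter rather than a measured value, and computes expected stalls = (1 − p) × (not-taken stall) + p × (taken stall). It compares the naive scheme (three cycles either way) with the Ex-resolved scheme above (one cycle and two cycles).

```python
def avg_branch_stalls(p_taken: float, not_taken_stall: int, taken_stall: int) -> float:
    """Expected stall cycles per branch, given the fraction of branches taken."""
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must be between 0 and 1")
    return (1 - p_taken) * not_taken_stall + p_taken * taken_stall


SCHEMES = {
    "naive (decide in Mem, stall)": (3, 3),
    "decide in Ex, route to ID": (1, 2),
}

for p in (0.0, 0.5, 1.0):
    for name, (not_taken, taken) in SCHEMES.items():
        print(f"p_taken={p:.1f}  {name:30s} {avg_branch_stalls(p, not_taken, taken):.2f}")
```

As the taken fraction grows, the Ex-resolved scheme approaches two stalls per branch.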
In Practice
Unfortunately:
• branches are common, and most branches are taken (which is indeed unfortunate)
In Practice
Add additional hardware in ID:
• detect branches
• decode the target address:
  target = IF/ID.nPC + (sign-extend(IF/ID.IR(0..15)) << 2)
  (so we need at least an adder)
• calculate whether the branch is taken; we need to:
• test for equality, and for zero (and perhaps a couple of other conditions)
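Below is a Python sketch of the two calculations the extra ID hardware performs: the target adder and the equality/zero tests. It rests on several assumptions. Instruction words are plain integers with the 16-bit offset in bits 0..15. Addresses wrap at 64 bits. The branch is identified by a mnemonic string. None of this models a real decoder.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(field: int) -> int:
    """Interpret the low 16 bits as a two's-complement signed integer."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def branch_target(npc: int, ir: int) -> int:
    """target = NPC + (sign-extend(IR[0..15]) << 2), wrapped to 64 bits."""
    return (npc + (sign_extend16(ir) << 2)) & MASK64


def branch_taken(mnemonic: str, rs: int, rt: int = 0) -> bool:
    """The comparator: equality and zero tests on the register values."""
    if mnemonic == "beq":
        return rs == rt
    if mnemonic == "bne":
        return rs != rt
    if mnemonic == "beqz":
        return rs == 0
    if mnemonic == "bnez":
        return rs != 0
    raise ValueError(f"not a branch handled here: {mnemonic}")


# Offset field 0xFFFE is -2 words, i.e. -8 bytes from NPC.
print(hex(branch_target(0x1004, 0x1000_FFFE)))  # 0xffc
print(branch_taken("bnez", 0))                  # False
```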
..So:
• branching is so common and the cost of stalls so great,
• that it is worth the cost and complexity of additional hardware in the ID pipeline stage
..So:
• we determine one stage earlier still whether a branch is taken or not (in ID, now, instead of in Ex)
So, we have:
• no stall if the branch is not taken, and
• a one-cycle stall if the branch is taken
Now…
Table 6. Branch not taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID**   Ex     Mem    WB
branch+4                 IF     **ID   Ex     Mem    WB
Now…
Table 7. Branch taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID**   Ex     Mem    WB
branch+4                 IF     **nop  nop    nop    nop
target                          **IF   ID     Ex     Mem    WB
target+4                               IF     ID     Ex     Mem
..Try these in the simulator…
bnez r0,target   ; no stall
daddi r1,r0,1

beqz r0,target   ; branch taken, stall of 1 cycle
daddi r1,r1,1
Note to self:
• see branch.s
Predict Not Taken
In effect:
• we’re guessing, here, that the branch will not be taken
• so this strategy is known as predict not taken
..So:
• no stall if the branch is not taken
• a stall of one cycle if the branch is taken
What might the average number of stall cycles for branch instructions be?
Unfortunately, …
The common case in practice is …
• that the branch is taken!
• so the average number of stalls per branch, in practice, approaches 1
Because …
for (i=0; i<N; i+=1) {
    // do stuff
}
Whenever we have such a loop:
• the branch is taken more often than not taken
Because …
        daddi r1,r0,0     ; i=0;
        beq   r1,r2,done  ; if (i==N) goto done;
loop:                     ; do stuff
        daddi r1,r1,1     ; i+=1;
        bne   r1,r2,loop  ; if (i!=N) goto loop;
done:
The bne instruction:
• is executed N times and taken N-1 of them, so the branch is usually taken
• so the stalls-per-branch approaches 1
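This can be checked by counting. The Python sketch below simulates the loop above for a given N. It assumes N ≥ 1, so the beq guard falls through. It counts how often the bottom bne is executed and taken, then applies the ID-resolved predict-not-taken costs from earlier: 0 stalls when not taken, 1 when taken.

```python
def loop_branch_stats(n: int) -> tuple[int, int, float]:
    """Return (executions, times taken, average stalls) for the bottom-tested
    bne of the loop, assuming n >= 1 and 1 stall per taken branch."""
    if n < 1:
        raise ValueError("n must be at least 1")
    executed = taken = 0
    i = 0                      # daddi r1,r0,0
    while True:
        # ... do stuff ...
        i += 1                 # daddi r1,r1,1
        executed += 1
        if i != n:             # bne r1,r2,loop
            taken += 1
            continue
        break                  # fall through to done:
    return executed, taken, taken / executed


for n in (1, 10, 1000):
    print(n, loop_branch_stats(n))
```

The bne is taken N − 1 times out of N, so the average stall per branch is (N − 1)/N. This tends to 1 as N grows.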
Might we do better?
A predict branch taken strategy:
• would be helpful
• unfortunately, this is not possible on MIPS:
• we only learn the target address after the ID stage
• so a cycle has already been wasted
Hmm:
• Wasted.
• Or is it?
..How might we:
• make good use of that "wasted" cycle?
The "Branch Delay Slot"
A branch delay slot is:
• the instruction following any branch (or jump) instruction
Approach:
• the instruction in the delay slot is always executed, whether the branch is taken or not
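The delay-slot rule is easy to express as an interpreter. A taken branch does not redirect the PC immediately, but only after the next instruction has executed. The Python sketch below uses a hypothetical toy instruction set (`addi`, `beqz`, `bnez`, `halt`) in which branch targets are instruction indices rather than MIPS offsets. It illustrates the semantics only, not how WinMIPS64 implements them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    op: str          # toy subset: "addi", "beqz", "bnez", "halt"
    rd: int = 0
    rs: int = 0
    imm: int = 0
    target: int = 0  # branch target as an instruction index (toy addressing)


def run(program: list[Instr], delay_slot: bool = True) -> list[int]:
    """Execute a toy program and return the indices of executed instructions.

    With delay_slot=True, a taken branch redirects the PC only after the next
    instruction (the delay slot) has executed. Assumes no branch sits in a
    delay slot.
    """
    regs = [0] * 32
    pc = 0
    pending: Optional[int] = None   # target of a taken branch awaiting its slot
    trace = []
    while 0 <= pc < len(program):
        ins = program[pc]
        trace.append(pc)
        if ins.op == "halt":
            break
        redirect, pending = pending, None
        next_pc = pc + 1
        if ins.op == "addi":
            if ins.rd != 0:                       # r0 is hard-wired to zero
                regs[ins.rd] = regs[ins.rs] + ins.imm
        elif ins.op in ("beqz", "bnez"):
            taken = (regs[ins.rs] == 0) == (ins.op == "beqz")
            if taken and delay_slot:
                pending = ins.target
            elif taken:
                next_pc = ins.target
        if redirect is not None:
            next_pc = redirect
        pc = next_pc
    return trace


prog = [
    Instr("beqz", rs=0, target=3),     # 0: always taken (r0 == 0)
    Instr("addi", rd=1, rs=0, imm=1),  # 1: delay slot
    Instr("addi", rd=2, rs=0, imm=2),  # 2: skipped when the branch is taken
    Instr("halt"),                     # 3: target
]
print(run(prog, delay_slot=True))   # [0, 1, 3]
print(run(prog, delay_slot=False))  # [0, 3]
```

Replacing the `beqz` with a `bnez` (never taken, since r0 is zero) gives the trace [0, 1, 2, 3] either way. The delay slot then simply behaves like the next sequential instruction.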
The "Delay Slot"
Table 8. Branch not taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID**   Ex     Mem    WB
branch+4 (BDS)           IF     **ID   Ex     Mem    WB
branch+8                        IF     ID     Ex     Mem    WB
The instruction after the branch:
• is always executed: good!
The "Delay Slot"
Table 9. Branch taken:
Cycle:            1      2      3      4      5      6      7
branch            IF     ID**   Ex     Mem    WB
branch+4 (BDS)           IF     ID     Ex     Mem    WB
target                          **IF   ID     Ex     Mem    WB
target+4                               IF     ID     Ex     Mem
The instruction after the branch:
• is always executed: "branch+4" is executed anyway, so there is no stall!
The "Delay Slot"
On such hardware, compilers:
• must insert a suitable instruction into the delay slot
• or, if that is not possible, then a nop (poor solution)
Some Cases — nop
This:
dadd r1,r2,r3
bnez r2,somewhere
Becomes:
dadd r1,r2,r3
bnez r2,somewhere
nop                 ; poor solution, effectively a stall
Note
Correct, but not great. The nop is in effect a stall.
Some Cases — Independent Instruction
This:
dadd r1,r2,r3
bnez r2,somewhere
Becomes:
bnez r2,somewhere
dadd r1,r2,r3       ; the branch does not depend on r1
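A compiler that fills the slot from before the branch must check that moving an instruction past the branch changes nothing. Below is a simplified Python sketch. It assumes straight-line code made of register-to-register ALU instructions only (no loads, stores or other branches); the names `Op` and `fill_delay_slot` are hypothetical. It picks the latest instruction that the branch does not read and that has no dependence on anything after it. Otherwise it emits a nop.

```python
from typing import NamedTuple


class Op(NamedTuple):
    text: str
    dest: str                 # register written (ALU ops only in this sketch)
    srcs: tuple[str, ...]     # registers read


def fill_delay_slot(block: list[Op], branch: str,
                    branch_srcs: tuple[str, ...]) -> list[str]:
    """Schedule one basic block that ends in `branch` on delay-slot hardware.

    Tries to move an earlier instruction of the block into the slot; falls back
    to a nop. Only register dependences are checked.
    """
    for i in range(len(block) - 1, -1, -1):
        cand, later = block[i], block[i + 1:]
        if cand.dest in branch_srcs:
            continue  # RAW: the branch needs this result
        if any(cand.dest in op.srcs or cand.dest == op.dest for op in later):
            continue  # a later instruction reads or overwrites the result
        if any(op.dest in cand.srcs for op in later):
            continue  # a later instruction overwrites one of its inputs (WAR)
        rest = block[:i] + later
        return [op.text for op in rest] + [branch, cand.text]
    return [op.text for op in block] + [branch, "nop"]


dadd = Op("dadd r1,r2,r3", "r1", ("r2", "r3"))
print(fill_delay_slot([dadd], "bnez r2,somewhere", ("r2",)))
# ['bnez r2,somewhere', 'dadd r1,r2,r3']
print(fill_delay_slot([dadd], "bnez r1,target", ("r1",)))
# ['dadd r1,r2,r3', 'bnez r1,target', 'nop']
```

The second call uses a variant in which the branch tests r1. Because the branch needs dadd's result, dadd cannot move and the nop remains.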
Some Cases — Temporary Registers
This:
        dadd r1,r2,r3
        or   r20,r2,r3    ; r20 is a temporary register within this loop
        bnez r1,target
        ...
target: dsub r4,r5,r6
Becomes:
        dadd r1,r2,r3
        bnez r1,target
        or   r20,r2,r3    ; doesn't matter if executed
        ...               ; again, the delay cycle is effectively lost,
target:                   ; but only if the branch is taken! (no nop)
        dsub r4,r5,r6
Loop — Far Better
This:
target: dsub  r4,r5,r6    ; assume r4 is a temporary register
        ...               ; do stuff
        daddi r1,r1,-1
        bnez  r1,target   ; branch depends on r1
        nop               ; BDS: we want to use this slot
Becomes:
        dsub  r4,r5,r6    ; moved up
target: ...               ; do stuff
        daddi r1,r1,-1
        bnez  r1,target
        dsub  r4,r5,r6    ; repeated, from above
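Counting what actually executes shows the trade-off in this transformation. The Python sketch below lists the dynamic instruction sequence before and after the change. It models the body ("... ; do stuff") as a single placeholder and assumes N ≥ 1 iterations on delay-slot hardware.

```python
from collections import Counter


def dynamic_trace(n: int, hoisted: bool) -> list[str]:
    """Instructions executed for n >= 1 iterations of the loop above.

    '... ; do stuff' is modelled as a single placeholder, "stuff".
    hoisted=False: original loop with a nop in the delay slot.
    hoisted=True:  dsub moved above the loop and copied into the slot.
    """
    trace = ["dsub"] if hoisted else []
    for _ in range(n):
        if not hoisted:
            trace.append("dsub")
        trace += ["stuff", "daddi", "bnez"]
        trace.append("dsub" if hoisted else "nop")  # branch delay slot
    return trace


for hoisted in (False, True):
    t = dynamic_trace(5, hoisted)
    print(hoisted, len(t), Counter(t))
```

For N = 5 the loop drops from 25 to 21 executed instructions and loses all its nops. The transformed code executes dsub once more than the original, in the delay slot of the final, not-taken bnez. This is why r4 has to be a temporary whose value is not needed after the loop.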
..Try these in the simulator, again…
bnez r0,target   ; no stall
daddi r1,r0,1

beqz r0,target   ; branch taken, no stall with branch delay slot
daddi r1,r1,1
Note
This time with the branch delay slot enabled.
More Insurmountable Stalls
Example:
dadd r1,r2,r3
bnez r1,target   ; stall one cycle

ld   r1,N(r0)
bnez r1,target   ; stall two cycles
The branch depends upon:
• an immediately preceding arithmetic instruction (stall one cycle), or
• an immediately preceding load (stall two cycles)
Another Insurmountable Stall
Table 10. If the branch condition is resolved in Ex:
Cycle:            1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB
bnez r1,target           IF     ID     **Ex   Mem    WB
delay slot                      IF     ID     Ex     Mem    WB
No problem:
• r1 can be forwarded, as before
Another Insurmountable Stall
Table 11. If the branch condition is resolved in ID:
Cycle:            1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB
bnez r1,target           IF     **ID   Ex     Mem    WB
delay slot                      IF     ID     Ex     Mem    WB
Oops:
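• forwarding can’t help here: bnez needs r1 at the start of ID in cycle 3, but dadd only produces it at the end of Ex in that same cycle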
Another Insurmountable Stall
Table 12. If the branch condition is resolved in ID:
Cycle:            1      2      3      4      5      6      7
dadd r1,r2,r3     IF     ID     Ex**   Mem    WB
bnez r1,target           IF     stall  **ID   Ex     Mem    WB
delay slot                             IF     ID     Ex     Mem
Such a RAW dependency:
• results in a stall of one cycle
(Try to find another instruction which can be inserted in between.)
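The stall counts on these slides follow from two facts: when the result becomes available for forwarding, and when the branch needs it. Below is a Python sketch of that arithmetic. It assumes full forwarding into the resolving stage and no other stalls. ALU results are taken to be forwardable after Ex, and load results after Mem.

```python
# Cycle (counting the producer's IF as cycle 1) at the end of which its result
# can be forwarded: end of Ex for ALU ops, end of Mem for loads.
READY_AT = {"alu": 3, "load": 4}
# Stage in which the branch needs its operand, as an offset from its own IF.
NEEDED_IN = {"ID": 1, "Ex": 2}


def branch_stalls(producer: str, distance: int = 1, resolve_in: str = "ID") -> int:
    """Stall cycles for a branch that reads the result of `producer`, issued
    `distance` instructions earlier (1 = immediately preceding).

    Assumes full forwarding into the stage named by `resolve_in` and no other stalls.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    branch_if = 1 + distance
    needed_cycle = branch_if + NEEDED_IN[resolve_in]
    return max(0, READY_AT[producer] + 1 - needed_cycle)


print(branch_stalls("alu"))                   # 1  (Table 12)
print(branch_stalls("load"))                  # 2
print(branch_stalls("alu", resolve_in="Ex"))  # 0  (Table 10: forwarding suffices)
print(branch_stalls("alu", distance=2))       # 0  (one independent instruction in between)
```

The last call corresponds to the suggestion above: one independent instruction placed between dadd and bnez hides the stall entirely.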
Jumps
Jumps are handled the same way:
• we learn the target address in ID
• the instruction in the delay slot is always executed
Jumps
Table 13. Jumps are always taken:
Cycle:            1      2      3      4      5      6      7
jump              IF     ID**   Ex     Mem    WB
delay slot               IF     ID     Ex     Mem    WB
target                          **IF   ID     Ex     Mem    WB
target+4                               IF     ID     Ex     Mem
Example
Note to self:
• take a look at ../winmips64/reverse-with-nops.s
Done