Performance Enhancement with Pipelining
![Page 1: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/1.jpg)
Aneesh Raveendran
Centre for Development of Advanced Computing, INDIA
![Page 2: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/2.jpg)
• What is pipelining?
• Pipeline Taxonomies
• Instruction Pipelines
• MIPS Instruction Pipeline
• Pipeline Hazards
• MIPS Pipelined Datapath
• Load Word Instruction Example
• Pipeline Datapath Example
• Pipeline Control
• Pipeline Instruction Example
![Page 3: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/3.jpg)
• Pipeline Hazards
• Control Hazards
• Data Hazards
• Detecting Data Hazards
• Resolving Data Hazards
• Forwarding Example
• Stalling Example
• Branch Hazards
• Branching Example
• Key terms
![Page 4: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/4.jpg)
• There are two main ways to increase the performance of a processor through high-level system architecture
  • Increasing the memory access speed
  • Increasing the number of supported concurrent operations
    • Pipelining!
    • Parallelism?
• Pipelining is the process by which instructions are parallelized over several overlapping stages of execution, in order to maximize datapath efficiency
![Page 5: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/5.jpg)
• Pipelining is analogous to many everyday scenarios
  • Car manufacturing process
  • Batch laundry jobs
  • Basically, any assembly-line operation applies
• Two important concepts:
  • New inputs are accepted at one end before previously accepted inputs appear as outputs at the other end
  • The number of operations performed per second is increased, even though the elapsed time needed to perform any one operation remains the same
![Page 6: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/6.jpg)
Looking at the textbook’s example, we have a 4-stage pipeline of laundry tasks:
1. Place one dirty load of clothes into washer
2. Place the washed clothes into a dryer
3. Place a dry load on a table and fold
4. Put the clothes away
Graphically speaking: sequential (top) vs. pipelined (bottom) execution
![Page 7: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/7.jpg)
• There are two types of pipelines used in computer systems
  • Arithmetic pipelines
    • Used to pipeline data intensive functionalities
  • Instruction pipelines
    • Used to pipeline the basic instruction fetch and execute sequence
• Other classifications include
  • Linear vs. nonlinear pipelines
    • Presence (or lack) of feedforward and feedback paths between stages
  • Static vs. dynamic pipelines
    • Dynamic pipelines are multifunctional, taking on a different form depending on the function being executed
  • Scalar vs. vector pipelines
    • Vector pipelines specifically target computations using vector data
![Page 8: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/8.jpg)
• Let us now introduce the pipeline we’re working with
• It’s a 5-stage, linear, static, scalar instruction pipeline, consisting of the following steps:
  • Fetch instruction from memory (IF)
  • Read registers while decoding the instruction (ID)
  • Execute the operation or calculate an address (EX)
  • Access an operand in data memory (MEM)
  • Write the result into a register (WB)
• Theoretically, pipeline speedup = number of stages in the pipeline, provided the stages are balanced and the pipeline stays full
![Page 9: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/9.jpg)
Inst. Fetch (2ns), Reg. read/write (1ns), ALU op. (2ns), Data access (2ns)
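The stage times above hint at why the theoretical speedup is rarely reached in practice. The following is a minimal sketch, assuming that register read (ID) and register write (WB) each take the quoted 1 ns and that a load word passes through all five stages; the names `STAGE_NS`, `single_cycle_latency`, and `pipelined_clock` are illustrative and not taken from the slides.

```python
# Stage latencies in nanoseconds, as quoted on the slide.
# Assumption: the 1 ns "Reg. read/write" figure applies to both ID and WB.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def single_cycle_latency(stages=STAGE_NS):
    """Time for one instruction that uses every stage back to back."""
    return sum(stages.values())


def pipelined_clock(stages=STAGE_NS):
    """The pipeline clock must be long enough for the slowest stage."""
    return max(stages.values())


def steady_state_speedup(stages=STAGE_NS):
    """Speedup once the pipeline is full (fill and drain time ignored)."""
    return single_cycle_latency(stages) / pipelined_clock(stages)


print(single_cycle_latency())   # 8 ns for a load word
print(pipelined_clock())        # 2 ns per pipeline cycle
print(steady_state_speedup())   # 4.0, below 5 because stages are unbalanced
```

Under these assumptions the 1 ns stages sit idle for half of each 2 ns pipeline cycle, which is why the steady-state figure falls short of the stage count.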
![Page 10: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/10.jpg)
[Figure: clock-cycle timing of the single cycle, multiple cycle, and pipeline implementations for a Load, Store, R-type sequence over Cycles 1 to 10. Each instruction passes through Ifetch, Reg, Exec, Mem, Wr; the single cycle implementation shows wasted time in its cycle.]
![Page 11: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/11.jpg)
• Suppose
  • 100 instructions are executed
  • The single cycle machine has a cycle time of 45 ns
  • The multicycle and pipeline machines have cycle times of 10 ns
  • The multicycle machine has a CPI of 4.6
• Single Cycle Machine
  • 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
  • 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns
• Ideal pipelined machine
  • 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
• Ideal pipelined vs. single cycle speedup
  • 4500 ns / 1040 ns = 4.33
• What has not yet been considered?
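The arithmetic above can be reproduced with a short sketch. The helper name `exec_time_ns` and its `drain_cycles` parameter are illustrative; the numbers are the ones given on the slide.

```python
def exec_time_ns(cycle_ns, cpi, n_inst, drain_cycles=0):
    """Execution time = cycle time x (CPI x instruction count + drain cycles)."""
    return cycle_ns * (cpi * n_inst + drain_cycles)


N_INST = 100

single_cycle = exec_time_ns(45, 1, N_INST)
multicycle = exec_time_ns(10, 4.6, N_INST)
# An ideal 5-stage pipeline needs 4 extra cycles to drain the last instruction.
pipelined = exec_time_ns(10, 1, N_INST, drain_cycles=4)

print(single_cycle)                        # 4500
print(round(multicycle))                   # 4600
print(pipelined)                           # 1040
print(round(single_cycle / pipelined, 2))  # 4.33
```

Changing `N_INST` shows that the drain cost matters less as the instruction count grows, while the hazards raised by the closing question are not modeled here at all.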
![Page 12: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/12.jpg)
• What makes it easy?
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores
• What makes it hard?
  • structural hazards: suppose we had only one memory
  • control hazards: need to worry about branch instructions
  • data hazards: an instruction depends on a previous instruction
• We’ll build a simple pipeline and look at these issues
![Page 13: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/13.jpg)
• structural hazards: attempt to use the same resource two different ways at the same time
  • E.g., two instructions try to read the same memory at the same time
• data hazards: attempt to use item before it is ready
  • instruction depends on result of prior instruction still in the pipeline
      add r1, r2, r3
      sub r4, r2, r1
• control hazards: attempt to make a decision before condition is evaluated
  • branch instructions
      beq r1, loop
      add r1, r2, r3
• Can always resolve hazards by waiting
  • pipeline control must detect the hazard
  • take action (or delay action) to resolve hazards
![Page 14: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/14.jpg)
What do we need to split the datapath into stages ?
![Page 15: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/15.jpg)
Pipeline registers (buffers) are similar to those used in the multicycle processor design
![Page 16: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/16.jpg)
Instruction fetch stage
![Page 17: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/17.jpg)
Instruction decode and register file read stage
![Page 18: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/18.jpg)
Execute or address calculation stage
![Page 19: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/19.jpg)
Memory access stage
![Page 20: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/20.jpg)
Write back stage
![Page 21: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/21.jpg)
Write register number comes from the MEM/WB pipeline register along with the data
![Page 22: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/22.jpg)
Multiple-clock cycle (vs. single-clock cycle) pipelined diagrams
![Page 23: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/23.jpg)
Single-cycle pipeline diagram with one instruction on the pipeline
![Page 24: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/24.jpg)
Single-cycle pipeline diagram with two instructions on the pipeline
![Page 25: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/25.jpg)
• What control signals are required?
• First, notice that the pipeline registers are written every clock cycle, hence do not require explicit control signals. For the remaining stages:
  • Instruction fetch and PC increment
    • Again, asserted at every clock cycle
  • Instruction decode and register file read
    • Again, asserted at every clock cycle
  • Execution and address calculation
    • Need to select the result register, the ALU operation, and either Read data 2 or the sign-extended immediate for the ALU
  • Memory access
    • Need to read from memory, write to memory or complete branch
  • Write back
    • Need to send back either ALU result or memory value to the register file
![Page 26: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/26.jpg)
![Page 27: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/27.jpg)
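Control lines grouped by stage: Execution/Address Calculation (RegDst, ALUOp1, ALUOp0, ALUSrc), Memory access (Branch, MemRead, MemWrite), Write-back (RegWrite, MemtoReg)

| Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg |
|-------------|--------|--------|--------|--------|--------|---------|----------|----------|----------|
| R-format    | 1      | 1      | 0      | 0      | 0      | 0       | 0        | 1        | 0        |
| lw          | 0      | 0      | 0      | 1      | 0      | 1       | 0        | 1        | 1        |
| sw          | X      | 0      | 0      | 1      | 0      | 0       | 1        | 0        | X        |
| beq         | X      | 0      | 1      | 0      | 1      | 0       | 0        | 0        | X        |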
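As a rough illustration of how these control lines travel with an instruction, the sketch below stores the table as a lookup and splits each row into the groups consumed by the EX, MEM, and WB stages. The function `control_for` and the use of `None` for the don't-care entries (X) are assumptions made for this example, not part of the slides.

```python
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")

X = None  # don't-care entry in the table

# One row per instruction class, in the column order of the table.
CONTROL_TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (X, 0, 0, 1, 0, 0, 1, 0, X),
    "beq":      (X, 0, 1, 0, 1, 0, 0, 0, X),
}


def control_for(instr):
    """Return the control lines for instr, grouped by the stage that uses them."""
    names = EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS
    signals = dict(zip(names, CONTROL_TABLE[instr]))
    return {
        "EX": {n: signals[n] for n in EX_SIGNALS},
        "MEM": {n: signals[n] for n in MEM_SIGNALS},
        "WB": {n: signals[n] for n in WB_SIGNALS},
    }


print(control_for("lw")["WB"])   # {'RegWrite': 1, 'MemtoReg': 1}
```

Grouping by stage mirrors the idea that each group only needs to be carried in the pipeline registers until the stage that uses it.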
![Page 28: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/28.jpg)
![Page 29: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/29.jpg)
![Page 30: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/30.jpg)
![Page 31: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/31.jpg)
![Page 32: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/32.jpg)
![Page 33: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/33.jpg)
![Page 34: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/34.jpg)
![Page 35: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/35.jpg)
![Page 36: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/36.jpg)
![Page 37: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/37.jpg)
![Page 38: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/38.jpg)
• Structural hazard
  • Occurs when a combination of instructions is not supported by the datapath
  • For example, a unified memory unit would need to be accessed in stages 1 (IF) and 4 (MEM), which would cause contention
  • Pipeline outright fails in the presence of structural hazards
• Control hazard
  • Occurs when a decision is made based on the results of one instruction, while others are executing
  • For example, a branch instruction is either taken or not
  • Solutions that exist are stalling and predicting
• Data hazard
  • Occurs when an instruction depends on the results of an instruction still resident on the pipeline
  • For example, adding two register contents and storing their result into a third register, then using that register’s contents for another operation
  • Solutions that exist are based on forwarding
![Page 39: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/39.jpg)
• Three major solutions
  • Stall
  • Predict
  • Delayed branch slot
• Stalling involves always waiting for the PC to be updated with the correct address before moving on
  • A pipeline stall (or bubble) allows us to perform this wait
  • Quite costly, as we have to stall even if the branch is not taken
![Page 40: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/40.jpg)
• Predicting involves guessing whether the branch is taken or not, and acting on that guess
  • If correct, then proceed with normal pipeline execution
  • If incorrect, then stall pipeline execution
![Page 41: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/41.jpg)
• Delayed branch involves executing the next sequential instruction, with the branch taking place after that delayed branch slot
  • The assembler automatically adjusts the instructions to make it transparent to the programmer
  • The instruction has to be safe, as in it shouldn’t affect the branch
  • Longer pipelines require the use of more branch delay slots
  • Actual MIPS architecture solution
![Page 42: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/42.jpg)
• Forwarding involves providing the inputs to a stage of one instruction before the completion of another instruction
• Valid if destination stage is later in time than the source stage
• Left diagram shows typical forwarding scenario (add then sub)
• Right diagram shows that we still need a stall in the case of a load-use data hazard (load then R-type)
![Page 43: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/43.jpg)
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $14, 100($2)
![Page 44: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/44.jpg)
• We could insert “no operation” (nop) instructions to delay the pipeline execution until the correct result is in the register file
    sub $2, $1, $3
    nop
    nop
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $14, 100($2)
• Too slow as it adds extra useless clock cycles
• In reality, we try to find useful instructions to execute between data-dependent instructions, but this happens too often to be efficient
![Page 45: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/45.jpg)
• Let us try to formalize detecting a data hazard
  1. EX/MEM.RegisterRd = ID/EX.RegisterRs
  2. EX/MEM.RegisterRd = ID/EX.RegisterRt
  3. MEM/WB.RegisterRd = ID/EX.RegisterRs
  4. MEM/WB.RegisterRd = ID/EX.RegisterRt
• Dependences on $2, which is written by sub:
    sub $2, $1, $3
    and $12, $2, $5      Data hazard of type #1
    or  $13, $6, $2      Data hazard of type #4
    add $14, $2, $2      No data hazard on $2 (register file is written and read in the same cycle)
    sw  $14, 100($2)     No data hazard on $2 (correct operation)
![Page 46: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/46.jpg)
• Two modifications are in order
  • Firstly, we don’t have to forward all the time!
    • Some instructions don’t write registers (e.g. beq)
    • Use RegWrite signal in WB control block to determine condition
  • Secondly, the $0 register must always return 0
    • We can’t prevent the programmer from using it as a destination register
    • Use RegisterRd to determine if $0 is being used
  1. If (EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
  2. If (EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
  3. If (MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
  4. If (MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
• Let us examine the hardware changes to our datapath
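Before turning to the hardware, the four forwarding conditions can be written as a small function. This is a sketch rather than the datapath itself: the function name `forwarding_unit` and the `(RegWrite, RegisterRd)` pair encoding are assumptions. It also applies one rule the four conditions do not state explicitly, namely that when both EX/MEM and MEM/WB match, the more recent EX/MEM result takes priority.

```python
def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Compute (ForwardA, ForwardB) as 2-bit mux selects.

    ex_mem and mem_wb are (RegWrite, RegisterRd) pairs taken from the
    corresponding pipeline registers.
    """
    forward_a = forward_b = 0b00      # default: operands come from ID/EX

    # Conditions 3 and 4: result sitting in MEM/WB.
    mw_write, mw_rd = mem_wb
    if mw_write and mw_rd != 0:
        if mw_rd == id_ex_rs:
            forward_a = 0b01
        if mw_rd == id_ex_rt:
            forward_b = 0b01

    # Conditions 1 and 2: result sitting in EX/MEM.
    # Assumption: evaluated last so the newer value wins on a double match.
    em_write, em_rd = ex_mem
    if em_write and em_rd != 0:
        if em_rd == id_ex_rs:
            forward_a = 0b10
        if em_rd == id_ex_rt:
            forward_b = 0b10

    return forward_a, forward_b


# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM
print(forwarding_unit((True, 2), (False, 0), 2, 5))   # (2, 0): ForwardA = 10
# or $13, $6, $2 in EX while and is in MEM and sub is in WB
print(forwarding_unit((True, 12), (True, 2), 6, 2))   # (0, 1): ForwardB = 01
```

The `RegisterRd != 0` test is what keeps a write to `$0` from ever being forwarded.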
![Page 47: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/47.jpg)
![Page 48: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/48.jpg)
• Remember that there is no hazard in the WB stage, because the register file can be written and read in the same clock cycle

| Mux control   | Source | Description |
|---------------|--------|-------------|
| ForwardA = 00 | ID/EX  | First ALU operand comes from RF |
| ForwardA = 10 | EX/MEM | First ALU operand forwarded from prior ALU result |
| ForwardA = 01 | MEM/WB | First ALU operand forwarded from data memory or prior ALU result |
| ForwardB = 00 | ID/EX  | Second ALU operand comes from RF |
| ForwardB = 10 | EX/MEM | Second ALU operand forwarded from prior ALU result |
| ForwardB = 01 | MEM/WB | Second ALU operand forwarded from data memory or prior ALU result |
![Page 49: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/49.jpg)
![Page 50: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/50.jpg)
![Page 51: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/51.jpg)
![Page 52: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/52.jpg)
![Page 53: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/53.jpg)
![Page 54: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/54.jpg)
![Page 55: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/55.jpg)
    lw  $2, 20($1)
    and $4, $2, $5
    or  $8, $2, $6
    add $9, $4, $2
    slt $1, $6, $7
![Page 56: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/56.jpg)
• Let us try to formalize detecting a stalling data hazard
  • If (ID/EX.MemRead & ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
• On the condition being true, we stall the pipeline!
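The stall condition is a direct boolean test on pipeline register fields. A minimal sketch follows; the function name `must_stall` and its flat argument list are illustrative.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard: the instruction in EX is a load (MemRead set) whose
    destination (Rt) is a source of the instruction currently in ID."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: the and needs $2 too early.
print(must_stall(True, 2, 2, 5))    # True
# Same registers, but the instruction in EX is not a load.
print(must_stall(False, 2, 2, 5))   # False
```

After one stall cycle the load has reached MEM, so the same test no longer fires and the loaded value can be forwarded from MEM/WB.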
![Page 57: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/57.jpg)
![Page 58: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/58.jpg)
![Page 59: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/59.jpg)
![Page 60: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/60.jpg)
![Page 61: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/61.jpg)
![Page 62: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/62.jpg)
![Page 63: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/63.jpg)
![Page 64: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/64.jpg)
• Other instructions are on the pipeline when we find out whether we take the branch or not!
![Page 65: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/65.jpg)
• Two solutions
  • Assume branch is not taken
  • Dynamic branch prediction
• We’ve already discussed the first solution
  • Note that three instruction stages have to be flushed when the branch is taken
  • Done similarly to a data hazard stall (control values set to 0s)
• We can increase branch performance by moving the branch decision to the ID stage (rather than the MEM stage)
  • Branch target address calculated by moving adder into ID stage
  • Branch decision done by comparing Rs and Rt
  • Flushing the IF stage instruction involves nop instructions
![Page 66: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/66.jpg)
![Page 67: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/67.jpg)
![Page 68: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/68.jpg)
![Page 69: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/69.jpg)
• Store, in a branch prediction buffer, the history of each branch instruction
  • A 1-bit scheme changes its prediction after one wrong prediction
  • A 2-bit scheme changes its prediction only after two wrong predictions
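The difference between the two schemes shows up on a loop branch. The sketch below is an illustration under stated assumptions: the 2-bit scheme is modeled as a saturating counter (one common realization, not necessarily the one pictured in the slides) that starts in the strongly-taken state, and both function names are hypothetical.

```python
def mispredictions_1bit(outcomes, predict_taken=True):
    """1-bit history: always predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if taken != predict_taken:
            misses += 1
            predict_taken = taken     # one wrong prediction flips the bit
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter (0..3): predict taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# A loop branch taken 9 times and then falling through, executed twice.
loop_branch = ([True] * 9 + [False]) * 2

print(mispredictions_1bit(loop_branch))   # 3
print(mispredictions_2bit(loop_branch))   # 2
```

The 1-bit scheme mispredicts both the loop exit and the first iteration of the next run, while the 2-bit counter absorbs the single not-taken outcome and only misses the exits.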
![Page 70: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/70.jpg)
• Pipelining vs. Parallelism
• Pipeline Stages
• Pipeline Taxonomies
• MIPS Instruction Pipeline
• Structural Hazards
• Control Hazards
• Data Hazards
• Pipeline Registers and Operation
• Pipeline Control
• Pipeline Throughput
• Pipeline Efficiency
![Page 71: Performance Enhancement with Pipelining](https://reader031.fdocuments.net/reader031/viewer/2022013003/5561b94ed8b42a9f2f8b4bbf/html5/thumbnails/71.jpg)
• Control Hazard Stalling
• Control Hazard Predicting
• Control Hazard Delayed Branch
• Data Hazard Forwarding
• Data Hazard Detection
• Forwarding Unit
• Data Hazard Stalling
• Branch Prediction Buffer