1
Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. Z3 used mechanical relays and the program was on a punched tape. It used a binary floating point format. The picture above is of a reproduction built in the 1960s.
2
COMP 206: Computer Architecture and Implementation
Montek Singh
Thu, Feb 5, 2009
Topic: Pipelining II (Intermediate Concepts)
3
Outline
- Control of the pipeline
- Performance improvement
- Problems: Hazards
    - Structural hazards
    - Data hazards
    - Hazard resolution
Reading: Appendix A (HP4)
4
Pipeline with Control Signals
Note that we want control to follow the instruction down the pipeline. For example, RegWrite is applied at the WB stage, not at ID.
5
Detail: ALU Control
ALU op in low order bits
[Figure: instruction format. Op = bits 31:26, Rs1 = bits 25:21, Rs2 = bits 20:16, Rd = bits 15:11, Opx in the remaining low-order bits.]
6
ALUSrc
Simple control: register for R-type, immediate for lw and sw
7
RegDst
Chooses which portion of the instruction to use as the destination register (R-type and lw encode the destination in different fields)
8
Not to Belabor the Point…
- Signals are decoded and carried as far as necessary
- “Main Control”: generates control signals during Reg/Dec
    - Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
    - Control signals for Mem (MemWr, Branch) are used 2 cycles later
    - Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later
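The idea of decoding once and carrying signals as far as necessary can be sketched in Python. This is a minimal model, not the course's datapath: the opcode values (lw = 35, sw = 43, R-type = 0), the dictionary layout, the function names, and the choice of RegWr as the write-back signal are assumptions made for illustration.

```python
# Sketch: control signals are generated once in Reg/Dec and then carried
# through the pipeline registers. Names and layout are illustrative only.

def main_control(opcode):
    """Decode every control signal during Reg/Dec, grouped by consuming stage."""
    is_lw, is_sw, is_rtype = opcode == 35, opcode == 43, opcode == 0
    return {
        "exec": {"ALUSrc": is_lw or is_sw, "RegDst": is_rtype},
        "mem": {"MemWr": is_sw},
        "wrb": {"MemtoReg": is_lw, "RegWr": is_lw or is_rtype},
    }

def step(pipe, opcode):
    """Advance one clock. Each pipeline register keeps only what later stages need.

    `opcode` is the instruction currently in Reg/Dec, or None for a bubble.
    """
    id_ex, ex_mem = pipe["ID/EX"], pipe["EX/MEM"]
    return {
        "ID/EX": main_control(opcode) if opcode is not None else None,
        "EX/MEM": {"mem": id_ex["mem"], "wrb": id_ex["wrb"]} if id_ex else None,
        "MEM/WB": {"wrb": ex_mem["wrb"]} if ex_mem else None,
    }

# Decode a lw, then let it drain through the pipe.
pipe = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
pipe = step(pipe, 35)    # Exec signals are available 1 cycle after decode
pipe = step(pipe, None)  # Mem signals are available 2 cycles after decode
pipe = step(pipe, None)  # WrB signals are available 3 cycles after decode
print(pipe["MEM/WB"])
```

Note how the Exec group is dropped once it has been used, so later pipeline registers get narrower.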
9
Control and Data
10
Aside
- Sounds more complex than it really is to implement
- In Verilog or VHDL you would just write simple logical expressions (or just rename wires)
- Ex:
    assign ALUSrc = (Inst[31:26] == 35 ||
                     Inst[31:26] == 43);   // 35 = lw, 43 = sw
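For readers who want to experiment without a Verilog simulator, the same expression can be modeled in Python with shifts and masks. This is a sketch under stated assumptions: the helper names are invented here, and the field positions (rt in bits 20:16, rd in bits 15:11) follow the standard MIPS encoding rather than anything shown on the slide.

```python
# Sketch: software model of the ALUSrc expression and of the RegDst choice.

LW, SW = 35, 43  # MIPS opcodes for lw and sw

def opcode(inst):
    """Inst[31:26]: the top six bits of a 32-bit instruction word."""
    return (inst >> 26) & 0x3F

def alu_src(inst):
    """True when the second ALU operand is the immediate (lw and sw)."""
    return opcode(inst) in (LW, SW)

def dest_reg(inst):
    """RegDst: lw names its destination in bits 20:16, R-type in bits 15:11."""
    if opcode(inst) == LW:
        return (inst >> 16) & 0x1F
    return (inst >> 11) & 0x1F

lw_word = 0x8C220064   # lw  $2, 100($1)
sub_word = 0x00231022  # sub $2, $1, $3
print(alu_src(lw_word), alu_src(sub_word))
print(dest_reg(lw_word), dest_reg(sub_word))
```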
11
Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: clock timing diagrams for the sequence Load, Store, R-type. Single Cycle Implementation: Load in Cycle 1, Store in Cycle 2, with part of the cycle wasted. Multiple Cycle Implementation (Cycles 1 to 10): Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; R-type then begins with Ifetch. Pipelined Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr in overlapped fashion.]
12
CPU Designs: Summary
- Disadvantages of the Single Cycle Processor
    - Long cycle time
    - Cycle time wasted for the faster instructions
- Multiple Clock Cycle Processor
    - Divide the instructions into smaller steps
    - Execute each step (instead of the entire instruction) in 1 cycle
- Pipelined Processor
    - Natural enhancement of the multiple clock cycle processor
    - Each functional unit used only once per instruction
    - If an instruction is going to use a functional unit, it must use it at the same stage as all other instructions
    - Pipeline Control: each stage’s control signal depends ONLY on the instruction that is currently in that stage
13
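Speedup & Throughput of a Pipeline
\[
\begin{aligned}
n &= \text{number of pipe stages} \quad (n \ge 1) \\
N &= \text{number of operands} \quad (N \ge 1) \\
t &= \text{stage delay} \\
\text{Execution time without pipelining:}\quad T_{seq} &= n N t \\
\text{Execution time with pipelining:}\quad T_{p} &= (n + N - 1)\, t \\
\text{CPI with pipelining} &= \frac{T_{p}}{N t} = 1 + \frac{n - 1}{N} \\
\text{Speedup of pipeline} &= \frac{T_{seq}}{T_{p}} = \frac{n N}{n + N - 1} \le \min(n, N) \\
\text{Throughput} &= \frac{N}{T_{p}} = \frac{N}{(n + N - 1)\, t} \le \frac{1}{t}
\end{aligned}
\]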
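To get a feel for these quantities, the short Python sketch below evaluates them for a given pipeline. It assumes the usual ideal model, with n stages, N operands, and stage delay t, so that the unpipelined time is n·N·t and the pipelined time is (n + N − 1)·t; the function name and the returned dictionary keys are illustrative.

```python
# Sketch: ideal pipeline metrics for n stages, N operands, stage delay t.

def pipeline_metrics(n, N, t):
    t_seq = n * N * t          # every operand takes all n stages, one at a time
    t_pipe = (n + N - 1) * t   # n cycles to fill, then one result per cycle
    return {
        "T_seq": t_seq,
        "T_p": t_pipe,
        "CPI": t_pipe / (N * t),
        "speedup": t_seq / t_pipe,
        "throughput": N / t_pipe,
    }

for N in (1, 10, 1000):
    m = pipeline_metrics(n=5, N=N, t=1.0)
    print(N, round(m["CPI"], 3), round(m["speedup"], 3), round(m["throughput"], 3))
```

As N grows, CPI approaches 1, the speedup approaches n, and the throughput approaches 1/t; for a single operand there is no speedup at all.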
14
Example from HP4
- Non-pipelined, 1ns clk, 4 cycles for ALU ops & branches, and 5 cycles for memory ops
- Relative frequencies: 40% (ALU ops), 20% (branches), 40% (memory ops)
- Pipelined, 0.2ns overhead (setup, etc)
Avg. execution time (non-pipelined) = Clock × Avg. CPI
    = 1ns × ((40% + 20%) × 4 + 40% × 5)
    = 1ns × 4.4
    = 4.4 ns
Avg. execution time (pipelined) = 1ns + 0.2ns overhead = 1.2 ns (one instruction completes per clock)
Speedup = Unpipelined / Pipelined
    = 4.4 ns / 1.2 ns
    = 3.7 times
- The overhead limits total speedup
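The arithmetic of this example can be reproduced in a few lines of Python. The variable names are illustrative, and the pipelined machine is assumed to complete one instruction per clock, with the clock stretched by the 0.2 ns overhead.

```python
# Sketch: the HP4 speedup example as plain arithmetic.

clock_ns = 1.0
overhead_ns = 0.2

# instruction class -> (relative frequency, cycles when not pipelined)
mix = {"alu": (0.40, 4), "branch": (0.20, 4), "memory": (0.40, 5)}

avg_cpi = sum(freq * cycles for freq, cycles in mix.values())
unpipelined_ns = clock_ns * avg_cpi
pipelined_ns = clock_ns + overhead_ns  # CPI of 1 at the slower clock
speedup = unpipelined_ns / pipelined_ns

print(f"avg CPI = {avg_cpi:.1f}, speedup = {speedup:.1f}")
```

Setting `overhead_ns` to 0 gives a speedup of 4.4, which is the sense in which the overhead limits the total speedup.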
15
Not Quite this Rosy
- Run into problems with contention for resources and dependencies between instructions
- Next: Hazards
16
Pipeline Hazards: Structural Hazard
- A relation between two instructions indicating that the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.) at the same time
- In principle, eliminated by duplicating resources
    - At the price of low hardware utilization and increased cost
- MIPS pipeline as designed so far does not have structural hazards
    - But we had to take care to avoid them (see example later)
- Usually occurs when a long-latency functional unit is not fully pipelined (e.g., a floating point unit)
17
Structural Hazard: Example
Consider a system with single-ported memory
18
Solutions
- Stall (insert a bubble)
- We could also use other techniques, such as split cache, instruction buffer, etc. More when we discuss memory
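The stall solution can be sketched in Python for the single-ported memory case. The model below is an assumption-laden simplification: a five-stage pipe in which a load or store uses the memory port three cycles after its own fetch, instruction fetch waits whenever the port is busy, and nothing else ever stalls. The function name is hypothetical.

```python
# Sketch: instruction fetch stalls (a bubble) when a load/store owns the
# single memory port in the same cycle.

def fetch_cycles(is_mem_op):
    """Return the cycle in which each instruction is fetched.

    is_mem_op[i] is True when instruction i is a load or store.
    """
    port_busy = set()  # cycles in which some MEM stage owns the port
    cycles = []
    cycle = 1
    for mem_op in is_mem_op:
        while cycle in port_busy:  # port taken by an earlier load/store
            cycle += 1             # insert a bubble and retry
        cycles.append(cycle)
        if mem_op:
            port_busy.add(cycle + 3)  # IF, ID, EX, then MEM
        cycle += 1
    return cycles

# A load followed by four ALU instructions: the fourth instruction would be
# fetched in the load's MEM cycle, so it slips by one cycle.
print(fetch_cycles([True, False, False, False, False]))
```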
19
Resolving Structural Hazards
- Early resolution (scheduling)
    - Done well before the collision could occur, and usually at a place different from where the collision could happen
    - Example: instructions are delayed in the ID stage
- Late resolution
    - Done at the place where the collision might happen
    - Done just before the collision is about to happen
    - Example: Using an arbiter or a priority encoder
        - One instruction wins
        - Others are denied access, stall, and wait for their next chance
- Why allow structural hazards in the first place?
    - Reduce cost
    - Reduce unit latency (by avoiding pipeline latch delays)
    - Hazards may be infrequent (“make common case fast”)
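Late resolution with an arbiter can be illustrated with a small Python sketch. It models a fixed-priority scheme in which the lowest-numbered requester wins; that priority rule, like the function name, is an assumption chosen for simplicity rather than something the slides specify.

```python
# Sketch: fixed-priority arbitration for one shared resource.

def arbitrate(requests):
    """Return (winner, stalled) for a list of request flags.

    The lowest index with an active request wins; every other requester
    is denied access and must retry in a later cycle.
    """
    requesters = [i for i, wants in enumerate(requests) if wants]
    if not requesters:
        return None, []
    return requesters[0], requesters[1:]

winner, stalled = arbitrate([False, True, True])
print(winner, stalled)
```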
20
Data Hazard: Example
Consider the following code fragment
    sub $2, $1, $3      # Reg 2 written
    and $12, $2, $5     # $2 used
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Clearly the programmer would expect the newly set value of register 2 to be used
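Which of these readers actually see a stale $2 depends only on their distance from the sub. The Python sketch below works that out for a five-stage pipe without forwarding, assuming the register is written in WB and read in ID, and that no instruction in between rewrites the register. The tuple encoding and the status strings are illustrative.

```python
# Sketch: classify each reader of the register written by the first instruction.

# (mnemonic, destination register or None, source registers)
program = [
    ("sub", 2, [1, 3]),
    ("and", 12, [2, 5]),
    ("or", 13, [6, 2]),
    ("add", 14, [2, 2]),
    ("sw", None, [15, 2]),
]

def classify_reads(program, producer=0):
    """Map each later reader of the producer's destination to a status."""
    dest = program[producer][1]
    result = {}
    for distance, (op, _, sources) in enumerate(program[producer + 1:], start=1):
        if dest not in sources:
            continue
        if distance >= 4:
            result[op] = "ok"           # ID happens after the producer's WB
        elif distance == 3:
            result[op] = "split-cycle"  # ID and WB share a cycle
        else:
            result[op] = "hazard"       # ID happens before the producer's WB
    return result

print(classify_reads(program))
```

Here "split-cycle" means the read is correct only if the register file is written in the first half of the cycle and read in the second half.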
21
Data Hazard: Example (contd)
22
No Problem with “sw”
23
Maybe OK with “add”
If the register file can be read and written in a half cycle each
24
Not Correct Result for “and”, “or”
25
Types of Data Hazards
- RAW hazards corresponding to value dependences are most difficult to deal with, since they can never be eliminated
    - The second instruction is waiting for information produced by the first instruction
- WAR and WAW hazards are name dependences
    - Two instructions happen to use the same register (name), although they don’t have to
    - Can often be eliminated by renaming, either in software or hardware
        - Implies the use of additional resources, hence additional cost
        - Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
    - These hazards don’t cause problems for MIPS pipeline
        - Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in pipeline
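The three categories can be told apart mechanically from the register names alone. The following Python sketch classifies the dependences of a later instruction on an earlier one; the (destination, sources) tuple encoding and the function name are assumptions for illustration, and only explicit register operands are considered.

```python
# Sketch: classify register dependences between two instructions in program order.

def dependences(first, second):
    """Each instruction is (dest, sources); dest is None if nothing is written."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 is not None and dest1 in srcs2:
        kinds.add("RAW")  # second reads what first writes: value dependence
    if dest2 is not None and dest2 in srcs1:
        kinds.add("WAR")  # second overwrites a register first still reads
    if dest1 is not None and dest1 == dest2:
        kinds.add("WAW")  # both write the same register name
    return kinds

# and $4, $2, $5 followed by or $4, $4, $2
print(sorted(dependences((4, [2, 5]), (4, [4, 2]))))
```

Renaming the destination of the second instruction would remove the WAW entry in this example, but the RAW entry would remain.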
26
Easy Fix!!! Let Compiler Deal
    sub $2, $1, $3      # Reg 2 written
    nop
    nop
    and $12, $2, $5     # $2 used, now OK
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Original code sequence common, though, so it’s not a good solution to waste so much (clock) time
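What the compiler has to do here can be sketched as a simple padding pass in Python. The sketch assumes that a reader is safe once it sits at least three instructions after the writer (which relies on the register file being written and then read within one cycle); the tuple encoding and the function name are illustrative, not a real compiler interface.

```python
# Sketch: pad a straight-line program with nops to hide RAW hazards.

NOP = ("nop", None, [])

program = [
    ("sub", 2, [1, 3]),
    ("and", 12, [2, 5]),
    ("or", 13, [6, 2]),
    ("add", 14, [2, 2]),
    ("sw", None, [15, 2]),
]

def insert_nops(program, gap=2):
    """Ensure `gap` instructions separate every writer from its readers."""
    out = []
    for instr in program:
        sources = instr[2]
        needed = 0
        # Look back over the last `gap` emitted instructions for a writer.
        for back, prev in enumerate(reversed(out[-gap:]), start=1):
            if prev[1] is not None and prev[1] in sources:
                needed = max(needed, gap - back + 1)
        out.extend([NOP] * needed)
        out.append(instr)
    return out

print([op for op, _, _ in insert_nops(program)])
```

On the fragment above this emits exactly the two nops shown on the slide, after which the remaining readers of $2 are already far enough away.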
27
Hardware Solution: Forwarding
Correct value of $2 is available, just not stored in the register. Send from where available!
28
How to Detect the Hazard
- Let’s look at logic for the two types of data hazards we’ve seen so far
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
- Rd: destination
- Rs and Rt: two sources
29
Type 1a and 2b Hazards
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
To detect a hazard, AND these conditions with the RegWrite signal held in the same pipeline register (EX/MEM or MEM/WB)
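The four comparisons, each gated by RegWrite, translate directly into code. The Python sketch below represents each pipeline register as a dictionary; the field names mirror the slide's notation, but the dictionary layout and the function name are assumptions.

```python
# Sketch: detect hazard types 1a, 1b, 2a, 2b for the instruction in EX.

def detect_hazards(ex_mem, mem_wb, id_ex):
    """Return the set of hazard labels that apply in the current cycle."""
    hits = set()
    if ex_mem["RegWrite"]:
        if ex_mem["Rd"] == id_ex["Rs"]:
            hits.add("1a")
        if ex_mem["Rd"] == id_ex["Rt"]:
            hits.add("1b")
    if mem_wb["RegWrite"]:
        if mem_wb["Rd"] == id_ex["Rs"]:
            hits.add("2a")
        if mem_wb["Rd"] == id_ex["Rt"]:
            hits.add("2b")
    return hits

# "and $12, $2, $5" in EX while "sub $2, $1, $3" sits in EX/MEM.
bubble = {"RegWrite": False, "Rd": 0}
print(detect_hazards({"RegWrite": True, "Rd": 2}, bubble, {"Rs": 2, "Rt": 5}))
```

Without the RegWrite gate, a store whose register field happens to match a later source would be flagged by mistake.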
30
From This Pipeline to…
31
Modified Pipeline
From EX/MEM & MEM/WB to each of 2 reg inputs
32
With Control (RegWr)
33
Example
    sub $2, $1, $3
    and $4, $2, $5
    or  $4, $4, $2
    add $9, $4, $2
Lots of potential hazards
34
Clock 3
35
Clock 4
36
Clock 5
37
Clock 6
Make sure we get the correct value of $4
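Getting the correct value of $4 comes down to priority: when both EX/MEM and MEM/WB hold a result for the same register, the younger one in EX/MEM must win. The Python sketch below shows one way to express that selection for a single ALU operand. The function name, the return labels, and the extra check that register 0 is never forwarded are assumptions, not taken from the slides.

```python
# Sketch: choose the source of one ALU operand, most recent producer first.

def forward_select(ex_mem, mem_wb, src_reg):
    """Return 'EX/MEM', 'MEM/WB', or 'REG' for the operand read from src_reg."""
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return "EX/MEM"  # most recent result takes priority
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return "MEM/WB"
    return "REG"         # no producer in flight: use the register file

# Clock 6: "add $9, $4, $2" is in EX, "or $4, $4, $2" is in EX/MEM,
# and "and $4, $2, $5" is in MEM/WB.
or_result = {"RegWrite": True, "Rd": 4}
and_result = {"RegWrite": True, "Rd": 4}
print(forward_select(or_result, and_result, 4))
print(forward_select(or_result, and_result, 2))
```

Checking MEM/WB first would hand the add the older value produced by the and, which is the mistake this slide warns about.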
38
Next Time
- Sometimes forwarding not good enough
- Control hazards