1 Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. Z3 used mechanical...

Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. Z3 used mechanical relays and the program was on a punched tape. It used a binary floating point format. The picture above was of a reproduction built in the 1960s.
  • date post

    19-Dec-2015
  • Category

    Documents




COMP 206: Computer Architecture and Implementation

Montek Singh

Thu, Feb 5, 2009

Topic: Pipelining II (Intermediate Concepts)


Outline
  • Control of the pipeline
  • Performance improvement
  • Problems: Hazards
      • Structural hazards
      • Data hazards
      • Hazard resolution

Reading: Appendix A (HP4)


Pipeline with Control Signals

Note that we want control to follow the instruction. For example, RegWrite is applied at the WB stage, not at ID.


Detail: ALU Control

ALU op in low-order bits

[Figure: instruction format. Fields from high to low bits: Op (31-26), Rs1 (25-21), Rs2 (20-16), Rd (15-11), Opx (low-order bits). The figure also marks bit positions 10, 6, 5, and 0.]
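The field layout can be exercised with a short Python sketch. It assumes the MIPS-style R-type layout, with Opx taken to be the low six bits (an assumption, since the figure's exact Opx width is not recoverable here). The function name `decode_r_type` and the sample word are illustrative only.

```python
def decode_r_type(word):
    """Split a 32-bit instruction word into R-type fields.

    Assumes a MIPS-style layout; the 6-bit width of Opx is an assumption.
    """
    return {
        "op":  (word >> 26) & 0x3F,  # bits 31..26
        "rs1": (word >> 21) & 0x1F,  # bits 25..21
        "rs2": (word >> 16) & 0x1F,  # bits 20..16
        "rd":  (word >> 11) & 0x1F,  # bits 15..11
        "opx": word & 0x3F,          # bits 5..0 (assumed)
    }


# Standard MIPS encoding of: sub $2, $1, $3
fields = decode_r_type(0x00231022)
print(fields)
```

For an R-type instruction the Op field is 0, so the ALU operation has to come from the low-order Opx bits, which is the point of the slide.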


ALUSrc

Simple control: register for R-type, immediate for lw and sw


RegDst

Chooses portion of instruction to use as destination register (R-type and lw encode the destination in different fields)


Not to Belabor the Point…
  • Signals are decoded and carried as far as necessary
  • “Main Control”: generates control signals during Reg/Dec
      • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
      • Control signals for Mem (MemWr, Branch) are used 2 cycles later
      • Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later
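A minimal Python sketch of "decode once, carry as far as necessary". It rests on several assumptions: only lw (opcode 35), sw (opcode 43) and R-type (opcode 0) are modeled, the signals are grouped by the stage that consumes them, and RegWr is treated as the write-back signal. The names `main_control` and `carry_control` are hypothetical.

```python
def main_control(op):
    """Generate all control signals during Reg/Dec, grouped by consuming stage.

    Only lw (35), sw (43) and R-type (0) are modeled in this sketch.
    """
    is_lw, is_sw, is_r = op == 35, op == 43, op == 0
    return {
        "EX":  {"ALUSrc": is_lw or is_sw, "RegDst": is_r},
        "MEM": {"MemWr": is_sw},
        "WB":  {"MemtoReg": is_lw, "RegWr": is_lw or is_r},
    }


def carry_control(op):
    """Pass the decoded signals through the pipeline registers stage by stage."""
    id_ex = main_control(op)                       # all groups leave Reg/Dec
    ex_signals = id_ex["EX"]                       # used 1 cycle later
    ex_mem = {k: id_ex[k] for k in ("MEM", "WB")}  # EX group no longer needed
    mem_signals = ex_mem["MEM"]                    # used 2 cycles later
    mem_wb = {"WB": ex_mem["WB"]}                  # only the WB group remains
    wb_signals = mem_wb["WB"]                      # used 3 cycles later
    return ex_signals, mem_signals, wb_signals


lw_ex, lw_mem, lw_wb = carry_control(35)
sw_ex, sw_mem, sw_wb = carry_control(43)
```

Each pipeline register holds only the groups that later stages still need, so the register write enable reaches the register file in the same cycle as the instruction it belongs to.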


Control and Data


Aside

Sounds more complex than it really is to implement

In Verilog or VHDL you would just write simple logical expressions (or just rename wires)

Ex:
    assign ALUSrc = (Inst[31:26] == 35 ||
                     Inst[31:26] == 43);


Single Cycle vs. Multiple Cycle vs. Pipelined

[Figure: timing diagrams of the three implementations against Clk.
 Single Cycle Implementation: Load in Cycle 1, Store in Cycle 2, with part of the Store cycle marked "Waste".
 Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type beginning with Ifetch.
 Pipelined Implementation: Load, Store, and R-type (each Ifetch, Reg, Exec, Mem, Wr) overlapped in time.]


CPU Designs: Summary
  • Disadvantages of the Single Cycle Processor
      • Long cycle time
      • Cycle time wasted for the faster instructions
  • Multiple Clock Cycle Processor
      • Divide the instructions into smaller steps
      • Execute each step (instead of the entire instruction) in 1 cycle
  • Pipelined Processor
      • Natural enhancement of the multiple clock cycle processor
      • Each functional unit used only once per instruction
      • If an instruction is going to use a functional unit:
          • it must use it at the same stage as all other instructions
      • Pipeline Control:
          • each stage’s control signal depends ONLY on the instruction that is currently in that stage


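Speedup & Throughput of a Pipeline

\begin{align*}
n &\ge 1 && \text{Number of pipe stages} \\
N &\ge 1 && \text{Number of operands} \\
t & && \text{Stage delay} \\
T_{seq} &= n N t && \text{Execution time without pipelining} \\
T_p &= n t + (N-1)\,t && \text{Execution time with pipelining} \\
\mathrm{CPI} &= \frac{T_p}{N t} = \frac{n+N-1}{N} = 1 + \frac{n-1}{N} && \text{CPI with pipelining} \\
S &= \frac{T_{seq}}{T_p} = \frac{nN}{n+N-1}, \quad 1 \le S \le \min(n, N) && \text{Speedup of pipeline} \\
\frac{N}{T_p} &= \frac{N}{(n+N-1)\,t} \le \frac{1}{t} && \text{Throughput}
\end{align*}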
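The quantities on this slide can be checked numerically. The sketch below assumes the usual linear-pipeline model, T_seq = nNt and T_p = (n + N - 1)t, with n stages, N operands, and stage delay t. The function name `pipeline_metrics` and the sample values are illustrative.

```python
def pipeline_metrics(n, N, t):
    """Timing of an n-stage pipeline processing N operands with stage delay t.

    Assumes the simple model: T_seq = n*N*t, T_p = (n + N - 1)*t.
    """
    t_seq = n * N * t          # execution time without pipelining
    t_p = (n + N - 1) * t      # fill the pipe once, then one result per cycle
    return {
        "t_seq": t_seq,
        "t_p": t_p,
        "cpi": t_p / (N * t),      # approaches 1 as N grows
        "speedup": t_seq / t_p,    # bounded by min(n, N)
        "throughput": N / t_p,     # bounded by 1/t
    }


m = pipeline_metrics(n=5, N=3, t=1.0)
big = pipeline_metrics(n=5, N=1_000_000, t=1.0)
print(m["speedup"], big["speedup"])
```

With only three operands the five-stage pipeline is far from its limit; as N grows, the speedup approaches n and the throughput approaches 1/t.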


Example from HP4
  • Non-pipelined, 1ns clk, 4 cycles for ALU ops & branches, and 5 cycles for memory ops
      • Relative frequencies: 40%, 20%, 40%
  • Pipelined, 0.2ns overhead (setup, etc)

Avg. execution time (non-pipelined) = Clock × Avg. CPI
    = 1ns × ((40% + 20%) × 4 + 40% × 5)
    = 1ns × 4.4
    = 4.4 ns

Avg. execution time (pipelined) = 1ns + 0.2ns overhead
    = 1.2 ns (ideally one instruction completes per cycle)

Speedup = Unpipelined / Pipelined
    = 4.4 ns / 1.2 ns
    = 3.7 times

The overhead limits total speedup
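The arithmetic of this example can be reproduced in a few lines of Python. The sketch assumes the pipelined machine completes one instruction per 1.2 ns clock (1 ns plus the 0.2 ns overhead). Variable names are illustrative.

```python
clock_ns = 1.0
overhead_ns = 0.2

# (relative frequency, cycles) for ALU ops, branches, memory ops
mix = [(0.40, 4), (0.20, 4), (0.40, 5)]

avg_cpi = sum(freq * cycles for freq, cycles in mix)
unpipelined_ns = clock_ns * avg_cpi

# Assumption: ideal pipelined CPI of 1, clock stretched by the overhead.
pipelined_ns = clock_ns + overhead_ns

speedup = unpipelined_ns / pipelined_ns
print(f"{unpipelined_ns:.1f} ns / {pipelined_ns:.1f} ns = {speedup:.1f}x")
# prints: 4.4 ns / 1.2 ns = 3.7x
```

Setting `overhead_ns` to 0 raises the speedup to 4.4, which shows how much of the ideal gain the latch overhead consumes.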


Not Quite this Rosy
  • Run into problems with contention for resources and dependencies between instructions

Next: Hazards


Pipeline Hazards: Structural Hazard
  • A relation between two instructions indicating that:
      • the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.) …
      • …at the same time
  • In principle, eliminated by duplicating resources
      • Low hardware utilization; increased cost
  • MIPS pipeline as designed so far does not have structural hazards
      • But we had to avoid them (see example later)
  • Usually occurs when a long-latency functional unit is not fully pipelined (e.g., a floating point unit)


Structural Hazard: Example

Consider system w/ single-ported memory


Solutions
  • Stall (insert a bubble)
  • We could also use other techniques, such as split cache, instruction buffer, etc. More when we discuss memory
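The stall solution for a single-ported memory can be sketched as a small simulation. It assumes a five-stage pipeline in which the MEM stage falls three cycles after fetch, that a data access takes priority over an instruction fetch, and that no other hazards occur. The function name `fetch_cycles` and the program encoding are hypothetical.

```python
MEM_OFFSET = 3  # IF = +0, ID = +1, EX = +2, MEM = +3


def fetch_cycles(program):
    """Return the cycle in which each instruction is fetched.

    program: list of booleans, True if the instruction accesses data memory.
    A fetch that collides with an earlier instruction's MEM stage is delayed
    by one cycle (a bubble), because the memory has a single port.
    """
    mem_busy = set()   # cycles in which the memory port serves a data access
    fetches = []
    cycle = 0
    for uses_mem in program:
        cycle += 1
        while cycle in mem_busy:
            cycle += 1             # stall: insert a bubble
        fetches.append(cycle)
        if uses_mem:
            mem_busy.add(cycle + MEM_OFFSET)
    return fetches


# A load followed by four instructions that do not touch data memory.
schedule = fetch_cycles([True, False, False, False, False])
print(schedule)
```

In this run the fourth instruction would have been fetched in cycle 4, exactly when the load is in MEM, so it slips to cycle 5 and every later instruction slips with it.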


Resolving Structural Hazards
  • Early resolution (scheduling)
      • Done well before the collision could occur, and usually at a place different from where the collision could happen
      • Example: instructions are delayed in the ID stage
  • Late resolution
      • Done at the place where the collision might happen
      • Done just before the collision is about to happen
      • Example: Using an arbiter or a priority encoder
          • One instruction wins
          • Others are denied access, stall, and wait for their next chance
  • Why allow structural hazards in the first place?
      • Reduce cost
      • Reduce unit latency (by avoiding pipeline latch delays)
      • Hazards may be infrequent (“make common case fast”)
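Late resolution with a priority encoder can be sketched as a fixed-priority arbiter. The priority order (lowest index wins) and the function name `arbitrate` are assumptions made for illustration; a real design might rotate priorities to avoid starvation.

```python
def arbitrate(requests):
    """Fixed-priority arbiter: grant the lowest-numbered requester only.

    requests: list of booleans, one per requester.
    Returns a list of booleans of the same length with at most one True.
    """
    grants = [False] * len(requests)
    for i, wants in enumerate(requests):
        if wants:
            grants[i] = True   # one instruction wins
            break              # the others are denied and must retry
    return grants


grants = arbitrate([False, True, True])
print(grants)
```

The denied requester keeps its request asserted and stalls until a later cycle in which it becomes the highest-priority requester.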


Data Hazard: Example
  • Consider the following code fragment

    sub $2, $1, $3      # Reg 2 written
    and $12, $2, $5     # $2 used
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

  • Clearly the programmer would expect the newly set value of register 2 to be used


Data Hazard: Example (contd)


No Problem with “sw”


Maybe OK with “add”

If the register file can be written and read in a half cycle each (write in the first half of the cycle, read in the second half)


Not Correct Result for “and”, “or”
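The verdicts for sw, add, and, and or follow from comparing the cycle in which each consumer reads $2 (its ID stage) with the cycle in which sub writes it (its WB stage). The Python sketch below assumes a five-stage pipeline with no stalls and no forwarding. The stage offsets and the name `read_status` are illustrative.

```python
ID_OFFSET = 1  # registers are read one cycle after fetch
WB_OFFSET = 4  # registers are written four cycles after fetch


def read_status(distance, split_cycle_regfile=True):
    """Classify a register read made `distance` instructions after the producer.

    Cycles are counted relative to the producer's fetch cycle.
    """
    write_cycle = WB_OFFSET
    read_cycle = distance + ID_OFFSET
    if read_cycle > write_cycle:
        return "ok"
    if read_cycle == write_cycle and split_cycle_regfile:
        return "ok (write in first half, read in second half)"
    return "stale"


consumers = ["and", "or", "add", "sw"]  # in program order after sub
status = {name: read_status(d) for d, name in enumerate(consumers, start=1)}
for name, verdict in status.items():
    print(f"{name:3s} -> {verdict}")
```

Passing `split_cycle_regfile=False` turns the add case into a stale read as well, which is why the slide labels it only "maybe OK".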


Types of Data Hazards
  • RAW hazards corresponding to value dependences are most difficult to deal with, since they can never be eliminated
      • The second instruction is waiting for information produced by the first instruction
  • WAR and WAW hazards are name dependences
      • Two instructions happen to use the same register (name), although they don’t have to
      • Can often be eliminated by renaming, either in software or hardware
          • Implies the use of additional resources, hence additional cost
          • Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
      • These hazards don’t cause problems for MIPS pipeline
          • Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in pipeline
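The three hazard types can be told apart by comparing the destination and source registers of two instructions in program order. The sketch below is a minimal classifier. The tuple encoding `(dest, sources)` and the name `classify` are assumptions for illustration.

```python
def classify(first, second):
    """Return the set of data hazards between two instructions in program order.

    Each instruction is (dest, sources); dest is None if nothing is written.
    """
    d1, s1 = first
    d2, s2 = second
    hazards = set()
    if d1 is not None and d1 in s2:
        hazards.add("RAW")   # second reads what first writes (value dependence)
    if d2 is not None and d2 in s1:
        hazards.add("WAR")   # second overwrites what first reads (name dependence)
    if d1 is not None and d1 == d2:
        hazards.add("WAW")   # both write the same register (name dependence)
    return hazards


sub_i = ("$2", ("$1", "$3"))     # sub $2, $1, $3
and_i = ("$12", ("$2", "$5"))    # and $12, $2, $5
raw_only = classify(sub_i, and_i)
print(raw_only)
```

Renaming the destination of the second instruction removes a WAR or WAW result from this classifier but can never remove a RAW result, which matches the distinction the slide draws.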


Easy Fix!!! Let Compiler Deal

    sub $2, $1, $3      # Reg 2 written
    nop
    nop
    and $12, $2, $5     # $2 used, now OK
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)

Original code sequence common, though, so it’s not a good solution to waste so much (clock) time
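A compiler pass that pads with nops can be sketched as follows. It assumes a register file that is written in the first half of a cycle and read in the second, so a consumer must sit at least three slots after its producer. The function name `insert_nops` and the `(text, dest, sources)` encoding are hypothetical.

```python
def insert_nops(program, min_distance=3):
    """Pad a program with nops so every read is far enough from the write.

    program: list of (text, dest, sources); dest is None if nothing is written.
    min_distance=3 assumes write-then-read within one cycle in the register file.
    """
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for text, dest, sources in program:
        for src in sources:
            if src in last_write:
                while len(out) - last_write[src] < min_distance:
                    out.append("nop")
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


program = [
    ("sub $2, $1, $3",   "$2",  ("$1", "$3")),
    ("and $12, $2, $5",  "$12", ("$2", "$5")),
    ("or $13, $6, $2",   "$13", ("$6", "$2")),
    ("add $14, $2, $2",  "$14", ("$2", "$2")),
    ("sw $15, 100($2)",  None,  ("$15", "$2")),
]
padded = insert_nops(program)
print("\n".join(padded))
```

Two of the seven issue slots in the padded sequence do no useful work, which is the wasted clock time the slide objects to.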


Hardware Solution: Forwarding
  • Correct value of $2 is available, just not stored in the register. Send from where available!


How to Detect the Hazard
  • Let’s look at logic for the two types of data hazards we’ve seen so far

    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

  • Rd – destination
  • Rs and Rt – two sources


Type 1a and 2b Hazards

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs

2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

To detect hazard, AND these conditions with the RegWrite signal
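The detection conditions, ANDed with RegWrite, can be sketched as a forwarding selector for one ALU operand. Two details are assumptions borrowed from the usual textbook treatment rather than from this slide: register 0 is never forwarded, and the EX/MEM result takes priority over MEM/WB because it is more recent. The function name and dictionary keys are hypothetical.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Pick where one ALU operand should come from.

    ex_mem, mem_wb: dicts with 'RegWrite' (bool) and 'Rd' (register number)
    describing the instructions currently held in those pipeline registers.
    """
    # Type 1 hazard: producer is one instruction ahead (result in EX/MEM).
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return "EX/MEM"
    # Type 2 hazard: producer is two instructions ahead (result in MEM/WB).
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"


# "or $13, $6, $2" in EX: "and $12, ..." is in EX/MEM, "sub $2, ..." in MEM/WB.
ex_mem = {"RegWrite": True, "Rd": 12}
mem_wb = {"RegWrite": True, "Rd": 2}
rs_source = forward_select(6, ex_mem, mem_wb)   # Rs = $6
rt_source = forward_select(2, ex_mem, mem_wb)   # Rt = $2, a type 2b hazard
print(rs_source, rt_source)
```

Checking EX/MEM first matters when two in-flight instructions write the same register: the consumer must receive the younger value.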


From This Pipeline to…


Modified Pipeline

Forwarding paths from EX/MEM & MEM/WB to each of the 2 register-operand inputs of the ALU


With Control (RegWr)


Example

    sub $2, $1, $3
    and $4, $2, $5
    or  $4, $4, $2
    add $9, $4, $2

Lots of potential hazards


Clock 3


Clock 4


Clock 5


Clock 6

Make sure we get the correct value of $4


Next Time
  • Sometimes forwarding not good enough
  • Control hazards