Gary Marsden Slide 1 University of Cape Town Pipelining: Technique where multiple instructions are...


Gary Marsden Slide 1 University of Cape Town

Pipelining
Technique where multiple instructions are overlapped in execution (key for speed)

[Figure: two timing charts of tasks A, B, C and D, listed in task order, plotted against time from 6 PM to 2 AM.]

Gary Marsden Slide 2 University of Cape Town

Analogy
Each step is called a pipe stage or pipe segment
Pipelining improves throughput rather than the speed of a given instruction
– Concorde vs 747
Only possible in multi-cycle datapaths
All stages must be ready to proceed at the same time
Clock cycle determined by slowest stage
Goal: balance length of each stage
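
The point about the slowest stage can be checked with a little arithmetic. The sketch below is only illustrative: the stage times (in picoseconds) are made-up values, not figures from the slides, and the function names are hypothetical.

```python
# Hypothetical stage latencies in picoseconds; both pipelines do 8000 ps of work.
UNBALANCED = [1000, 1000, 4000, 1000, 1000]
BALANCED = [1600, 1600, 1600, 1600, 1600]

def clock_cycle_ps(stage_ps):
    """The clock must be long enough for the slowest stage."""
    return max(stage_ps)

def pipelined_latency_ps(stage_ps):
    """One instruction spends a full clock cycle in every stage."""
    return len(stage_ps) * clock_cycle_ps(stage_ps)

def unpipelined_latency_ps(stage_ps):
    """Without pipelining an instruction takes the sum of its stage times."""
    return sum(stage_ps)

def instructions_per_microsecond(stage_ps):
    """Steady-state throughput: one instruction completes per clock cycle."""
    return 1_000_000 // clock_cycle_ps(stage_ps)
```

With these assumed numbers the unbalanced pipe is clocked at 4000 ps and the balanced one at 1600 ps, so balancing raises throughput from 250 to 625 instructions per microsecond. In neither case does a single instruction finish sooner than the 8000 ps it would take unpipelined.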

Gary Marsden Slide 3 University of Cape Town

Pipelined datapath
Having 5 steps in an instruction means a 5 stage pipeline
– 5 instructions being executed at a given time
1. IF: Instruction fetch
2. ID: Instruction Decode
3. EX: Execute and effective address calculation
4. MEM: Memory access
5. WB: Write back
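
To see why five stages means five instructions in flight, the stage occupancy can be tabulated. This is a minimal sketch that assumes an ideal pipe with no stalls; `pipeline_chart` and `in_flight` are names invented for the illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction.

    row[c] is the stage that instruction occupies in clock cycle c + 1,
    or an empty string when it is not in the pipe. Assumes no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    chart = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters stage s in cycle i + s + 1
        chart.append(row)
    return chart

def in_flight(chart, cycle):
    """Number of instructions in the pipe during a 1-based clock cycle."""
    return sum(1 for row in chart if row[cycle - 1])
```

For seven instructions the chart spans eleven clock cycles, and from clock cycle 5 until the pipe starts to drain all five stages are busy at once.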

Gary Marsden Slide 4 University of Cape Town

Comparative timing

[Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program execution order, each passing through Instruction fetch, Reg, ALU, Data access, Reg. Top chart (not pipelined): a new instruction starts every 8 ns, time axis 2 to 18 ns. Bottom chart (pipelined): a new instruction starts every 2 ns, time axis 2 to 14 ns.]
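
The totals implied by the timing figure can be reproduced directly. The sketch below takes the 8 ns instruction time and 2 ns stage time from the figure; the function names and the five-stage default are assumptions made for the illustration.

```python
def nonpipelined_total_ns(n_instructions, instruction_ns=8):
    """Each instruction runs to completion before the next one starts."""
    return n_instructions * instruction_ns

def pipelined_total_ns(n_instructions, stage_ns=2, n_stages=5):
    """The first instruction needs n_stages cycles; each later one
    completes one cycle after its predecessor (no stalls assumed)."""
    return (n_stages + n_instructions - 1) * stage_ns
```

For the three loads this gives 24 ns without pipelining and 14 ns with it. For 1000 instructions the totals are 8000 ns and 2008 ns, so the speed-up approaches 4 rather than 5, because the 2 ns clock is set by the slowest stage.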

Gary Marsden Slide 5 University of Cape Town

View of datapath

[Figure: datapath (PC, instruction memory, registers, sign extend, shift left 2, adders, ALU, data memory and muxes) divided into five sections: IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back.]

Gary Marsden Slide 6 University of Cape Town

Progression in pipe
General left-right progression
– Like a car assembly line
Two exceptions
– Write back stage places the result back into the register file, which is in the middle of the datapath
– Selection of PC value - could be a branch
Right-left flow may affect subsequent instructions
Like the multi-cycle datapath, we need registers to hold values between stages

Gary Marsden Slide 7 University of Cape Town

Symbolic view

[Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program execution order, each drawn as IM, Reg, ALU, DM, Reg and offset by one clock cycle, across CC 1 to CC 7 (time in clock cycles).]

Gary Marsden Slide 8 University of Cape Town

Extra registers

[Figure: the datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB inserted between the stages.]

Gary Marsden Slide 9 University of Cape Town

Execution of load instruction

[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) during execution of lw.]

Gary Marsden Slide 10 University of Cape Town

Execution of store

[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) during execution of sw.]

Gary Marsden Slide 11 University of Cape Town

Ooops
When doing a write back for ‘lw’ we don’t know where to write!

[Figure: the pipelined datapath with IF/ID, ID/EX, EX/MEM and MEM/WB registers.]

Gary Marsden Slide 12 University of Cape Town

A note on notations

[Figure: two equivalent notations for lw $10, 20($1) followed by sub $11, $2, $3, in program execution order across CC 1 to CC 6 (time in clock cycles). The graphical form draws each instruction as IM, Reg, ALU, DM, Reg. The textual form labels the same stages Instruction fetch, Instruction decode, Execution, Data access, Write back.]

Gary Marsden Slide 13 University of Cape Town

Pipeline control
Just like we did for the single datapath machine, but with a twist
– Label control lines on existing data path
– Assume PC written on each cycle (no PCWrite)
– To control pipeline stage, need only control values for that stage
– Usual five stages for control: IF, ID, EXE, MEM, WB

Gary Marsden Slide 14 University of Cape Town

Pipeline control diagram

[Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with its control lines labelled: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite and MemtoReg. Instruction[15–0] feeds the sign extend unit; Instruction[20–16] and Instruction[15–11] feed the write register mux.]

Gary Marsden Slide 15 University of Cape Town

Buffering pipeline control

[Figure: Control takes the instruction from IF/ID and produces three groups of control values, EX, M and WB. ID/EX holds EX, M and WB; EX/MEM holds M and WB; MEM/WB holds WB.]

Gary Marsden Slide 16 University of Cape Town

Another scary picture

[Figure: the full pipelined datapath with Control, the EX, M and WB control groups carried through ID/EX, EX/MEM and MEM/WB, and the control lines PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite and MemtoReg connected.]

Gary Marsden Slide 17 University of Cape Town

Observations
Although a new instruction starts every clock cycle, each still needs 5 cycles to complete
Takes four cycles before we are up to full efficiency
When a stage is inactive, its control lines are deasserted
Control sequencing is implicit in pipeline stages
– No microinstructions like before

Gary Marsden Slide 18 University of Cape Town

Data hazard
Sequences of instructions with dependencies make high-performance pipelines hard to design
– sub $2, $1, $3
– and $12, $2, $5
– Oopsie!
Resolving
– Forbid the compiler to do this
• Interleave only independent instructions
• Use a No-op (wasteful)
– Stall
– Forward
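
A compiler or simulator can spot the sub/and dependency mechanically. The following is a rough sketch, not a description of any real tool: the `Instr` tuple, the `raw_hazards` function and the `max_distance` window of three instructions are all assumptions chosen for the illustration.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[int]       # register written, or None
    srcs: Tuple[int, ...]     # registers read

def raw_hazards(program: List[Instr], max_distance: int = 3):
    """Return (producer_index, consumer_index, register) triples where an
    instruction reads a register written by one of the max_distance
    instructions before it. Register $0 is ignored."""
    hazards = []
    for j, consumer in enumerate(program):
        for reg in sorted(set(consumer.srcs)):
            if reg == 0:
                continue
            # Search backwards so the nearest writer is reported.
            for i in range(j - 1, max(j - max_distance, 0) - 1, -1):
                if program[i].dest == reg:
                    hazards.append((i, j, reg))
                    break
    return hazards

PROGRAM = [
    Instr("sub $2, $1, $3", 2, (1, 3)),
    Instr("and $12, $2, $5", 12, (2, 5)),
]
```

Here `raw_hazards(PROGRAM)` reports that instruction 1 (and) reads register $2 written by instruction 0 (sub), which is exactly the case the slide flags.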

Gary Marsden Slide 19 University of Cape Town

Data hazard diagram

[Figure: sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2) in program execution order, each drawn as IM, Reg, ALU, DM, Reg across CC 1 to CC 9 (time in clock cycles). Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9.]

Gary Marsden Slide 20 University of Cape Town

Overcoming data hazards
The ‘add’ problem we can overcome with hardware design
– Write register file in first half of clock cycle, read in second
Doesn’t help with ‘and’ and ‘or’
– Need to detect hazard and forward correct value

Gary Marsden Slide 21 University of Cape Town

Detecting hazards
We can’t get the computer to draw a diagram, instead we use the following notation
– 1(a) EX/MEM.WriteReg = IF/ID.ReadReg1
– 1(b) EX/MEM.WriteReg = IF/ID.ReadReg2
– 2(a) MEM/WB.WriteReg = IF/ID.ReadReg1
– 2(b) MEM/WB.WriteReg = IF/ID.ReadReg2
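
The four conditions translate almost directly into comparisons. The sketch below keeps the slide's field names as parameter names; using `None` to mean "this instruction writes no register" is an assumption added here, since the slide does not say how that case is represented.

```python
def classify_hazard(ex_mem_write_reg, mem_wb_write_reg, read_reg1, read_reg2):
    """Return the labels of the hazard conditions that hold.

    ex_mem_write_reg / mem_wb_write_reg: register number being written by
    the instruction in that pipeline register, or None (assumption) if it
    writes nothing. read_reg1 / read_reg2: registers the later instruction reads.
    """
    found = []
    if ex_mem_write_reg is not None:
        if ex_mem_write_reg == read_reg1:
            found.append("1a")
        if ex_mem_write_reg == read_reg2:
            found.append("1b")
    if mem_wb_write_reg is not None:
        if mem_wb_write_reg == read_reg1:
            found.append("2a")
        if mem_wb_write_reg == read_reg2:
            found.append("2b")
    return found
```

For example, with sub $2, $1, $3 in EX/MEM and and $12, $2, $5 reading registers 2 and 5, the result is `["1a"]`. With the sub in MEM/WB and or $13, $6, $2 reading registers 6 and 2, the result is `["2b"]`.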

Gary Marsden Slide 22 University of Cape Town

Forwarding
If we can detect a hazard, we can forward the correct value as soon as it is available
– We will see how to do this soon
By ‘forwarding’ we can pull the value from the appropriate pipeline register rather than waiting for it to be written back at the end of an instruction
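
Pulling the value from a pipeline register amounts to a three-way choice for each ALU operand. This is a simplified software sketch, not the hardware mux logic: the `(register, value)` tuples and the function name are hypothetical, and it assumes EX/MEM takes priority because it holds the more recent result.

```python
def select_operand(reg, regfile_value, ex_mem=None, mem_wb=None):
    """Pick the value an ALU operand should use.

    ex_mem / mem_wb: (write_reg, value) held in that pipeline register,
    or None if the instruction there writes no register (assumption).
    """
    if ex_mem is not None and ex_mem[0] == reg:
        return ex_mem[1]      # forward the newest result
    if mem_wb is not None and mem_wb[0] == reg:
        return mem_wb[1]      # forward the older, not yet written result
    return regfile_value      # no hazard: the register file is up to date
```

Using the values from the data hazard example, where register $2 still reads 10 while sub has produced -20, `select_operand(2, 10, ex_mem=(2, -20))` yields -20, whereas an operand in register $5 is taken from the register file unchanged.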

Gary Marsden Slide 23 University of Cape Town

Forwarding to resolve hazards

[Figure: sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2) in program execution order across CC 1 to CC 9 (time in clock cycles). Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9. Value of EX/MEM: -20 in CC 4, X otherwise. Value of MEM/WB: -20 in CC 5, X otherwise.]

Gary Marsden Slide 24 University of Cape Town

Achieving control for forwarding

[Figure, part a (No forwarding): Registers, ID/EX, ALU, EX/MEM, Data memory, MEM/WB and the write-back mux. Part b (With forwarding): the same datapath with a mux on each ALU input, selected by ForwardA and ForwardB from a Forwarding unit whose inputs are Rs and Rt from ID/EX, EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

Gary Marsden Slide 25 University of Cape Town

Until…

[Figure: the pipelined datapath with Control (EX, M, WB groups) and the Forwarding unit added. IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd pass through ID/EX as Rs, Rt and Rd; the Forwarding unit also receives EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

Gary Marsden Slide 26 University of Cape Town

Stalls
Forwarding is an efficient way to solve data hazards, but not all can be solved this way

[Figure: lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7 in program execution order, each drawn as IM, Reg, ALU, DM, Reg across CC 1 to CC 9 (time in clock cycles).]

Gary Marsden Slide 27 University of Cape Town

Load Word problems
We cannot forward when an instruction tries to read a register immediately following a lw that is writing to that register
We need to detect this
– Hazard detection unit in addition to the forwarding unit
Conditions
– If (ID/EX.MemRead AND
– ((ID/EX.RegWrite = IF/ID.RegRead1) OR
– (ID/EX.RegWrite = IF/ID.RegRead2)))
lw is the only instruction that sets the MemRead line
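
The condition above is a single boolean expression. In this sketch the parameter names mirror the slide's fields, and `id_ex_write_reg` is assumed to hold the number of the register the load will write (what the slide calls ID/EX.RegWrite).

```python
def must_stall(id_ex_mem_read, id_ex_write_reg, if_id_read_reg1, if_id_read_reg2):
    """True when the instruction in ID/EX is a load whose destination is
    read by the instruction currently in IF/ID."""
    return bool(
        id_ex_mem_read
        and (id_ex_write_reg == if_id_read_reg1
             or id_ex_write_reg == if_id_read_reg2)
    )
```

With lw $2, 20($1) in ID/EX and and $4, $2, $5 in IF/ID, `must_stall(True, 2, 2, 5)` is true. If the earlier instruction is not a load, or the following instruction reads other registers, no stall is requested.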

Gary Marsden Slide 28 University of Cape Town

Stalls
Once detected, we have to stall execution until the value is available (whereupon it is forwarded)
Sometimes called a ‘bubble’, the idea being that we send an air bubble up the pipe, not data
Not strictly true
– The control unit just gets the stalled stages of the pipeline to repeat what they were doing until the value is available

Gary Marsden Slide 29 University of Cape Town

Bubbles in the pipe

[Figure: lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7 in program execution order across CC 1 to CC 10 (time in clock cycles), with a bubble inserted after the lw so that the following instructions are delayed by one clock cycle.]
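
The effect of the bubble on the schedule can be worked out with a small model. This is a deliberately simplified sketch: it assumes full forwarding, so the only stall is a single cycle when an instruction reads the register loaded by the instruction immediately before it. The tuple layout and function names are invented for the example.

```python
# (text, is_load, destination register, source registers)
PROGRAM = [
    ("lw $2, 20($1)", True, 2, (1,)),
    ("and $4, $2, $5", False, 4, (2, 5)),
    ("or $8, $2, $6", False, 8, (2, 6)),
    ("add $9, $4, $2", False, 9, (4, 2)),
    ("slt $1, $6, $7", False, 1, (6, 7)),
]

def ex_cycles(program):
    """Clock cycle in which each instruction is in the EX stage."""
    cycles = []
    for i, (_text, _is_load, _dest, srcs) in enumerate(program):
        if i == 0:
            cycles.append(3)  # IF in CC 1, ID in CC 2, EX in CC 3
            continue
        _, prev_is_load, prev_dest, _ = program[i - 1]
        bubble = 1 if prev_is_load and prev_dest in srcs else 0
        cycles.append(cycles[-1] + 1 + bubble)
    return cycles

def wb_cycles(program):
    """Write back happens two stages (MEM, WB) after EX."""
    return [c + 2 for c in ex_cycles(program)]
```

Under these assumptions the and reaches EX in CC 5 instead of CC 4, and the final slt writes back in CC 10, one cycle later than the nine cycles an unstalled five-instruction sequence would need.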

Gary Marsden Slide 30 University of Cape Town

Adding hazard detection

[Figure: the pipelined datapath with a Hazard detection unit added alongside the Forwarding unit. The Hazard detection unit reads ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt, and drives PCWrite, IF/IDWrite and a mux that can substitute 0 for the control values entering ID/EX.]

Gary Marsden Slide 31 University of Cape Town

Branch hazards
Another type of hazard involves branches: an instruction must be fetched every cycle to keep the pipeline full… but the decision about a branch does not come until the MEM stage
Called a ‘control’ or a ‘branch’ hazard
– Occur less frequently than data hazards
– Are simple to understand
– Not much we can do really

Gary Marsden Slide 32 University of Cape Town

Effect of a branch

[Figure: instructions at addresses 40 beq $1, $3, 7; 44 and $12, $2, $5; 48 or $13, $6, $2; 52 add $14, $2, $2; 72 lw $4, 50($7), in program execution order, each drawn as IM, Reg, ALU, DM, Reg across CC 1 to CC 9 (time in clock cycles).]

Gary Marsden Slide 33 University of Cape Town

Coping with branching
Stall subsequent instructions on ‘beq’
– This increases the cost of a branch from one cycle to four cycles
Assume branch not taken
– Carry on as before
– Only penalty will be if the branch is taken
– We can then ‘flush’ buffers
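
The two strategies can be compared by their average cost per branch. In this sketch the three-cycle penalty follows from the slide's one-to-four-cycle figure; the fraction of branches that are taken is a hypothetical input, not a number from the slides.

```python
def branch_cycles_always_stall(penalty=3):
    """Every branch pays the full penalty on top of its own cycle."""
    return 1 + penalty

def branch_cycles_assume_not_taken(taken_fraction, penalty=3):
    """Only taken branches pay the penalty, because the instructions
    fetched after the branch then have to be flushed."""
    return 1 + penalty * taken_fraction
```

If half of all branches were taken, assuming not taken would cost 2.5 cycles per branch on average instead of 4. If every branch were taken the two strategies would cost the same.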

Gary Marsden Slide 34 University of Cape Town

Lessening the impact
Currently branch decisions are made at stage 4
We could save one stage by getting the value from the buffer at stage 3 (like forwarding)
Can even calculate the branch in the ID stage!
– Move branch adder from MEM to ID stage
– Add a bunch of XOR gates to do comparison of register values (do not use the ALU)
– Need to alter forwarding unit to cope with this
Impact down to one lost cycle
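
The XOR comparison works because two values are equal exactly when every bit of their XOR is zero. A minimal software analogue, assuming 32-bit registers (the function name and `width` parameter are illustrative):

```python
def registers_equal(a, b, width=32):
    """Equality test built from XOR: XOR each pair of bits, then check
    that no bit of the result is set (an OR-reduction in hardware)."""
    mask = (1 << width) - 1
    return ((a ^ b) & mask) == 0
```

This needs no subtraction and no carry chain, which is why it is cheap enough to sit in the ID stage rather than borrowing the ALU.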

Gary Marsden Slide 35 University of Cape Town

Datapath to lessen branch impact

[Figure: the pipelined datapath with the branch hardware moved forward: sign extend, shift left 2 and a register equality comparator (=) sit before ID/EX, and an IF.Flush control line clears IF/ID. The Hazard detection unit, Forwarding unit, Control and the EX, M and WB control groups are as before.]

Gary Marsden Slide 36 University of Cape Town

Branch prediction
‘Assume branch not taken’ is a very primitive form of branch prediction
We can use a ‘branch prediction buffer’ or ‘branch history table’ to see what happened the last time the branch was executed
– Think about loops
Buffers are usually 2-bit
– One bit buffers can flip-flop
– 2 bit buffers need two wrong guesses before they change
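
The loop case shows why two bits help. The sketch below models a single branch with a 1-bit predictor and with a 2-bit saturating counter, which is one common 2-bit scheme and is assumed here rather than taken from the slides. The loop shape (nine taken iterations, then one exit, repeated three times) is also invented for the illustration.

```python
def mispredictions_1bit(outcomes, initial=True):
    """Predict whatever the branch did last time."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken
    return misses

def mispredictions_2bit(outcomes, initial=3):
    """Saturating counter 0..3; predict taken when the counter is 2 or 3."""
    state, misses = initial, 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

# Loop-closing branch: taken nine times, then falls through; run three times.
LOOP = ([True] * 9 + [False]) * 3
```

The 1-bit predictor misses twice per loop after the first (once on exit, once on re-entry), for 5 misses here. The 2-bit counter misses only on each exit, for 3, because a single wrong guess does not flip its prediction.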

Gary Marsden Slide 37 University of Cape Town

It doesn’t stop there
Some processors support ‘superpipeline’
– These are simply pipelines with more stages
Others have ‘superscalar’ pipelines
– Basically the entire pipeline is replicated
– Big overhead in control
– Usually between 2 and 9 datapaths
• 4 superscalar pipelines give a best-case CPI of 0.25!
Final wrinkle is dynamic pipeline scheduling
– Copes with stalls by stalling the next instruction but allowing non-dependent subsequent instructions to go


Gary Marsden Slide 38 University of Cape Town

Pipelining for real
Both the Pentium Pro and PPC 604 use dynamically scheduled pipelines
– Have a 512 entry branch prediction table

Gary Marsden Slide 39 University of Cape Town

Pipelines in reality
30% of Pentium is legacy

[Figure: processor block diagram with Branch, Instruction cache and fetch unit, Instruction decode, Microcode (control), Reorder buffer (control), Reservation stations (control), Memory buffer, I/O unit, Data cache, Integer datapath and Floating-point datapath.]

Gary Marsden Slide 40 University of Cape Town

Pentium fetch/execute
1. Prefetch/Fetch: Instructions are fetched from the instruction cache and aligned in prefetch buffers for decoding.
2. Decode1: Instructions are decoded into the Pentium's internal instruction format. Branch prediction also takes place at this stage.
3. Decode2: Same as above, and microcode ROM kicks in here, if necessary. Also, address computations take place at this stage.
4. Execute: The integer hardware executes the instruction.
5. Write-back: The results of the computation are written back to the register file.

Gary Marsden Slide 41 University of Cape Town

Pentium branch prediction
3 types of prediction
– Only 20% miss

Gary Marsden Slide 42 University of Cape Town

P4 pipeline
20 stages deep

Gary Marsden Slide 43 University of Cape Town

PowerPC processor
Scary!

Gary Marsden Slide 44 University of Cape Town

Beware
Pipelining is not as easy as it looks
– Subtle and complex interplay
Instruction set has a huge impact on pipeline efficiency
– Variable instruction lengths and addressing modes problematic
Increasing depth of pipe does not always improve performance

Gary Marsden Slide 45 University of Cape Town

Performance trade off

[Figure: relative performance (axis 0.0 to 3.0) plotted against pipeline depth (1, 2, 4, 8, 16).]

Gary Marsden Slide 46 University of Cape Town

Comparisons

[Figure: two charts placing the single-cycle datapath (section 5.3), the multicycle datapath (section 5.4) and the pipelined datapath (Chapter 6). First chart axes: clock rate (slower to faster) against instruction throughput (instructions per clock cycle or 1/CPI, slower to faster). Second chart axes: hardware (shared to specialized) against clock cycles of latency for an instruction (1 to several).]

Gary Marsden Slide 47 University of Cape Town

Summary
Pipelines speed up throughput
Pipeline has stages corresponding to execution steps of multi-cycle instructions
Requires buffers and special purpose components to be added
Problems with data hazards
– Forwarding and stalling
Problems with branch hazards
– Do nothing, assume not taken, move comparison early, use branch prediction table