Pipelining Lessons

28
cs 152 L1 3 .1 DAP Fa97, U.CB Pipelining Lessons ° Pipelining doesn’t help latency of single task, it helps throughput of entire workload ° Multiple tasks operating simultaneously using different resources ° Potential speedup = Number pipe stages ° Pipeline rate limited by slowest pipeline stage ° Unbalanced lengths of pipe stages reduces speedup ° Time to “fillpipeline and time to drain” it reduces speedup 6 PM 7 8 9 Time B C D A 30 30 30 30 30 30 30 T a s k O r d e r

description

30. 30. 30. 30. 30. 30. 30. A. B. C. D. Pipelining Lessons. 6 PM. 7. 8. 9. Pipelining doesn’t help latency of single task, it helps throughput of entire workload Multiple tasks operating simultaneously using different resources Potential speedup = Number pipe stages - PowerPoint PPT Presentation

Transcript of Pipelining Lessons

Page 1: Pipelining Lessons

cs 152 L1 3 .1 DAP Fa97, U.CB

Pipelining Lessons

° Pipelining doesn’t help latency of single task, it helps throughput of entire workload

° Multiple tasks operating simultaneously using different resources

° Potential speedup = Number pipe stages

° Pipeline rate limited by slowest pipeline stage

° Unbalanced lengths of pipe stages reduces speedup

° Time to “fill” pipeline and time to “drain” it reduces speedup

° Stall for Dependences

6 PM 7 8 9

Time

B

C

D

A

303030 3030 3030Task

Order

Page 2: Pipelining Lessons

cs 152 L1 3 .2 DAP Fa97, U.CB

The Five Stages of Load

° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Read the data from the Data Memory

° Wr: Write the data back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Ifetch Reg/Dec Exec Mem WrLoad

Page 3: Pipelining Lessons

cs 152 L1 3 .3 DAP Fa97, U.CB

Conventional Pipelined Execution Representation

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WB

IFetch Dcd Exec Mem WBProgram Flow

Time

Page 4: Pipelining Lessons

cs 152 L1 3 .4 DAP Fa97, U.CB

Single Cycle, Multiple Cycle, vs. Pipeline

Clk

Cycle 1

Multiple Cycle Implementation:

Ifetch Reg Exec Mem Wr

Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Load Ifetch Reg Exec Mem Wr

Ifetch Reg Exec Mem

Load Store

Pipeline Implementation:

Ifetch Reg Exec Mem WrStore

Clk

Single Cycle Implementation:

Load Store Waste

Ifetch

R-type

Ifetch Reg Exec Mem WrR-type

Cycle 1 Cycle 2

Page 5: Pipelining Lessons

cs 152 L1 3 .5 DAP Fa97, U.CB

Why Pipeline? Because the resources are there!

Instr.

Order

Time (clock cycles)

Inst 0

Inst 1

Inst 2

Inst 4

Inst 3

AL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

AL

UIm Reg Dm Reg

Page 6: Pipelining Lessons

cs 152 L1 3 .6 DAP Fa97, U.CB

Can pipelining get us into trouble?

° Yes: Pipeline Hazards• structural hazards: attempt to use the same resource two

different ways at the same time

- E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)

• data hazards: attempt to use item before it is ready

- E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer

- instruction depends on result of prior instruction still in the pipeline

• control hazards: attempt to make a decision before condition is evaulated

- E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in

- branch instructions

° Can always resolve hazards by waiting• pipeline control must detect the hazard

• take action (or delay action) to resolve hazards

Page 7: Pipelining Lessons

cs 152 L1 3 .7 DAP Fa97, U.CB

Mem

Single Memory is a Structural Hazard

Instr.

Order

Time (clock cycles)

Load

Instr 1

Instr 2

Instr 3

Instr 4A

LUMem Reg Mem Reg

AL

UMem Reg Mem Reg

AL

UMem Reg Mem RegA

LUReg Mem Reg

AL

UMem Reg Mem Reg

Right half highlight means read, left half write

Page 8: Pipelining Lessons

cs 152 L1 3 .8 DAP Fa97, U.CB

° Stall: wait until decision is clear• It’s possible to move up decision to 2nd stage by adding

hardware to check registers as being read

° Impact: 2 clock cycles per branch instruction => slow

Control Hazard Solutions

Instr.

Order

Time (clock cycles)

Add

Beq

Load

AL

UMem Reg Mem Reg

AL

UMem Reg Mem RegA

LUReg Mem RegMem

Page 9: Pipelining Lessons

cs 152 L1 3 .9 DAP Fa97, U.CB

° Predict: guess one direction then back up if wrong• Predict not taken

° Impact: 2 clock cycles if wrong (right 50% of time)

° More dynamic scheme: history of 1 branch ( 90%)

Control Hazard Solutions

Instr.

Order

Time (clock cycles)

Add

Beq

Load

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

MemA

LUReg Mem Reg

Page 10: Pipelining Lessons

cs 152 L1 3 .10 DAP Fa97, U.CB

° Redefine branch behavior (takes place after next instruction) “delayed branch”

° Impact: 0 clock cycles per branch instruction if can find instruction to put in “slot” ( 50% of time)

° As launch more instruction per clock cycle, less useful

Control Hazard Solutions

Instr.

Order

Time (clock cycles)

Add

Beq

Misc

AL

UMem Reg Mem Reg

AL

UMem Reg Mem Reg

MemA

LUReg Mem Reg

Load Mem

AL

UReg Mem Reg

Page 11: Pipelining Lessons

cs 152 L1 3 .11 DAP Fa97, U.CB

Data Hazard on r1

add r1 ,r2,r3

sub r4, r1 ,r3

and r6, r1 ,r7

or r8, r1 ,r9

xor r10, r1 ,r11

Page 12: Pipelining Lessons

cs 152 L1 3 .12 DAP Fa97, U.CB

• Dependencies backwards in time are hazards

Data Hazard on r1:

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

Page 13: Pipelining Lessons

cs 152 L1 3 .13 DAP Fa97, U.CB

• “Forward” result from one stage to another

• “or” OK if define read/write properly

Data Hazard Solution:

Instr.

Order

Time (clock cycles)

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm RegA

LUIm Reg Dm Reg

Im

AL

UReg Dm Reg

AL

UIm Reg Dm Reg

Page 14: Pipelining Lessons

cs 152 L1 3 .14 DAP Fa97, U.CB

• Dependencies backwards in time are hazards

• Can’t solve with forwarding: • Must delay/stall instruction dependent on loads

Forwarding (or Bypassing): What about Loads

Time (clock cycles)

lw r1,0(r2)

sub r4,r1,r3

IF

ID/RF

EX MEM WBAL

UIm Reg Dm Reg

AL

UIm Reg Dm Reg

Page 15: Pipelining Lessons

cs 152 L1 3 .15 DAP Fa97, U.CB

Pipelining the Load Instruction

° The five independent functional units in the pipeline datapath are:

• Instruction Memory for the Ifetch stage

• Register File’s Read ports (Read Register 1 and Read Register 2) for the Reg/Dec stage

• ALU for the Exec stage

• Data Memory for the Mem stage

• Register File’s Write port (Write Register) for the Wr stage

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Ifetch Reg/Dec Exec Mem Wr1st lw

Ifetch Reg/Dec Exec Mem Wr2nd lw

Ifetch Reg/Dec Exec Mem Wr3rd lw

Page 16: Pipelining Lessons

cs 152 L1 3 .16 DAP Fa97, U.CB

The Four Stages of R-type

° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: • ALU operates on the two register operands

• Update PC

° Wr: Write the ALU output back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec WrR-type

Page 17: Pipelining Lessons

cs 152 L1 3 .17 DAP Fa97, U.CB

Pipelining the R-type and Load Instruction

° We have pipeline conflict or structural hazard:• Two instructions try to write to the register file at the same time!

• Only one write port

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type

Ops! We have a problem!

Page 18: Pipelining Lessons

cs 152 L1 3 .18 DAP Fa97, U.CB

Important Observation

° Each functional unit can only be used once per instruction

° Each functional unit must be used at the same stage for all instructions:

• Load uses Register File’s Write Port during its 5th stage

• R-type uses Register File’s Write Port during its 4th stage

Ifetch Reg/Dec Exec Mem WrLoad

1 2 3 4 5

Ifetch Reg/Dec Exec WrR-type

1 2 3 4

° 2 ways to solve this pipeline hazard.

Page 19: Pipelining Lessons

cs 152 L1 3 .19 DAP Fa97, U.CB

Solution 1: Insert “Bubble” into the Pipeline

° Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle

• The control logic can be complex.

• Lose instruction fetch and issue opportunity.

° No instruction is started in Cycle 6!

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Exec WrR-type

Ifetch Reg/Dec Exec WrR-type Pipeline

Bubble

Ifetch Reg/Dec Exec Wr

Page 20: Pipelining Lessons

cs 152 L1 3 .20 DAP Fa97, U.CB

Solution 2: Delay R-type’s Write by One Cycle

° Delay R-type’s register write by one cycle:• Now R-type instructions also use Reg File’s write port at Stage 5

• Mem stage is a NOOP stage: nothing is being done.

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec Mem WrLoad

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Mem WrR-type

Ifetch Reg/Dec Exec WrR-type Mem

Exec

Exec

Exec

Exec

1 2 3 4 5

Page 21: Pipelining Lessons

cs 152 L1 3 .21 DAP Fa97, U.CB

The Four Stages of Store

° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

° Reg/Dec: Registers Fetch and Instruction Decode

° Exec: Calculate the memory address

° Mem: Write the data into the Data Memory

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemStore Wr

Page 22: Pipelining Lessons

cs 152 L1 3 .22 DAP Fa97, U.CB

The Three Stages of Beq

° Ifetch: Instruction Fetch• Fetch the instruction from the Instruction Memory

° Reg/Dec: • Registers Fetch and Instruction Decode

° Exec: • compares the two register operand,

• select correct branch target address

• update PC

Cycle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec MemBeq Wr

Page 23: Pipelining Lessons

cs 152 L1 3 .23 DAP Fa97, U.CB

Pipelined Datapath

Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Datamemory

Address

Page 24: Pipelining Lessons

cs 152 L1 3 .24 DAP Fa97, U.CB

Corrected Datapath

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0

Address

Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

Datamemory

1

ALUresult

Mux

ALUZero

ID/EX

Page 25: Pipelining Lessons

cs 152 L1 3 .25 DAP Fa97, U.CB

Pipeline Control

PC

Instructionmemory

Address

Inst

ruct

ion

Instruction[20– 16]

MemtoReg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0Registers

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1Write

data

Read

data Mux

1

ALUcontrol

RegWrite

MemRead

Instruction[15– 11]

6

IF/ID ID/EX EX/MEM MEM/WB

MemWrite

Address

Datamemory

PCSrc

Zero

AddAdd

result

Shiftleft 2

ALUresult

ALU

Zero

Add

0

1

Mux

0

1

Mux

Page 26: Pipelining Lessons

cs 152 L1 3 .26 DAP Fa97, U.CB

° We have 5 stages. What needs to be controlled in each stage?• Instruction Fetch and PC Increment• Instruction Decode / Register Fetch• Execution• Memory Stage• Write Back

° How would control be handled in an automobile plant?• a fancy control center telling everyone what to do?• should we use a finite state machine?

Pipeline control

Page 27: Pipelining Lessons

cs 152 L1 3 .27 DAP Fa97, U.CB

° Pass control signals along just like the data

Pipeline Control

Execution/Address Calculation stage control lines

Memory access stage control lines

Write-back stage control

lines

InstructionReg Dst

ALU Op1

ALU Op0

ALU Src Branch

Mem Read

Mem Write

Reg write

Mem to Reg

R-format 1 1 0 0 0 0 0 1 0lw 0 0 0 1 0 1 0 1 1sw X 0 0 1 0 0 1 0 Xbeq X 0 1 0 1 0 0 0 X

Control

EX

M

WB

M

WB

WB

IF/ID ID/EX EX/MEM MEM/WB

Instruction

Page 28: Pipelining Lessons

cs 152 L1 3 .28 DAP Fa97, U.CB

Datapath with Control

PC

Instructionmemory

Inst

ruct

ion

Add

Instruction[20– 16]

Me

mto

Re

g

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0

Mux

0

1

Add Addresult

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

Writedata

Readdata

Mux

1

ALUcontrol

Shiftleft 2

Re

gWrit

e

MemRead

Control

ALU

Instruction[15– 11]

6

EX

M

WB

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

Mux

0

1

Me

mW

rite

AddressData

memory

Address