Pipelined Datapath

Pipelined Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili


Page 1: Pipelined  Datapath

Pipelined Datapath

Lecture notes from MKP, H. H. Lee and S. Yalamanchili

Page 2: Pipelined  Datapath

(2)

Reading
• Sections 4.5 – 4.10
• Practice Problems: 1, 3, 8, 12

Page 3: Pipelined  Datapath

(3)

Pipeline Performance
• Assume time for stages is
    100ps for register read or write
    200ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200ps         100ps           200ps    200ps           100ps            800ps
sw         200ps         100ps           200ps    200ps           -                700ps
R-format   200ps         100ps           200ps    -               100ps            600ps
beq        200ps         100ps           200ps    -               -                500ps
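The totals in the table, and the clock period each design needs, can be checked with a short Python sketch. The stage latencies come from the assumptions above; the names STAGE_PS, USES, and the helper functions are illustrative and not part of the lecture.

```python
# Stage latencies in picoseconds, taken from the assumptions above.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses (one row of the table each).
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def total_time_ps(instr):
    """Sum of the latencies of the stages this instruction class uses."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


def single_cycle_period_ps():
    """A single-cycle clock must be long enough for the slowest instruction."""
    return max(total_time_ps(instr) for instr in USES)


def pipelined_period_ps():
    """A pipelined clock must be long enough for the slowest stage."""
    return max(STAGE_PS.values())
```

Under these assumptions the single-cycle period is set by lw and the pipelined period by the slowest stage.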

Page 4: Pipelined  Datapath

(4)

Pipeline Performance

Single-cycle (Tc = 800ps)

Pipelined (Tc = 200ps)

Page 5: Pipelined  Datapath

(5)

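Pipeline Speedup
• If all stages are balanced, i.e., all take the same time
    Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
• If not balanced, speedup is less
• Speedup due to increased throughput
    Latency (time for each instruction) does not decrease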
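A minimal Python sketch of the throughput argument, assuming the 800ps single-cycle clock and the 200ps five-stage pipelined clock from the earlier slides, and an ideal pipeline with no stalls. The function names are hypothetical.

```python
def single_cycle_time_ps(n, period_ps=800):
    """Total time for n instructions when each takes one long clock cycle."""
    return n * period_ps


def pipelined_time_ps(n, stages=5, period_ps=200):
    """Total time for n instructions in an ideal pipeline (no stalls).

    The first instruction needs `stages` cycles; each later instruction
    completes one cycle after the one before it.
    """
    return (stages + n - 1) * period_ps


def speedup(n):
    """Ratio of single-cycle time to pipelined time for n instructions."""
    return single_cycle_time_ps(n) / pipelined_time_ps(n)
```

For long instruction streams the ratio approaches 4 rather than 5, because the stages are not balanced. The latency of one instruction is 5 x 200ps = 1000ps, which is longer than the 800ps single-cycle time.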

Page 6: Pipelined  Datapath

(6)

Basic Idea

All instructions are 32 bits
Few & regular instruction formats
Alignment of memory operands

Page 7: Pipelined  Datapath

(7)

Pipelining
• What makes it easy
    All instructions are the same length
    Simple instruction formats
    Memory operands appear only in loads and stores
• What makes it hard?
    structural hazards: suppose we had only one memory
    control hazards: need to worry about branch instructions
    data hazards: an instruction depends on a previous instruction
• What really makes it hard:
    exception handling
    trying to improve performance with out-of-order execution, etc.

Page 8: Pipelined  Datapath

(8)

Pipeline registers
• Need registers between stages
    To hold information produced in previous cycle

Pipeline stage execution time

Page 9: Pipelined  Datapath

(9)

Graphically Representing Pipelines

• Shading indicates the unit is being used by the instruction

• Shading on the right half of the register file (ID or WB) or memory means the element is being read in that stage

• Shading on the left half means the element is being written in that stage

[Figure: pipeline diagram of lw followed by add, each passing through IF, ID, EX, MEM, WB; time axis marked 2, 4, 6, 8, 10]

Page 10: Pipelined  Datapath

(10)

Graphically Representing Pipelines

• Can help with answering questions like:
    how many cycles does it take to execute this code?
    what is the ALU doing during cycle 4?
    use this representation to help understand datapaths

[Figure: multiple-clock-cycle pipeline diagram. Horizontal axis: time in clock cycles (CC 1 to CC 6). Vertical axis: program execution order (in instructions). lw $10, 20($1) and sub $11, $2, $3 each pass through IM, Reg, ALU, DM, Reg]
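The two questions above can be answered mechanically for an ideal pipeline. The sketch below is an illustration in Python, assuming no stalls and one instruction entering IF per cycle; the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(index, cycle):
    """Stage occupied in 1-based `cycle` by the instruction at 0-based `index`.

    Returns None when the instruction has not started or has already finished.
    """
    s = cycle - 1 - index
    return STAGES[s] if 0 <= s < len(STAGES) else None


def total_cycles(n):
    """Cycles needed to run n instructions through the ideal pipeline."""
    return len(STAGES) + n - 1 if n else 0


def unit_user(program, stage, cycle):
    """Instruction using `stage` during `cycle`, or None if the stage is idle."""
    for index, instr in enumerate(program):
        if stage_at(index, cycle) == stage:
            return instr
    return None
```

For the two-instruction program lw $10, 20($1); sub $11, $2, $3 this gives six cycles in total, and the ALU (EX stage) is working on the sub during cycle 4.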

Page 11: Pipelined  Datapath

(11)

Structural Hazard

[Figure: pipeline diagram of lw, add, sub, add, each passing through IF, ID, EX, MEM, WB; time axis marked 2, 4, 6, 8, 10]

Need to separate instruction and data memory

Page 12: Pipelined  Datapath

(12)

IF for Load, Store, …

Pipeline stage execution time

Page 13: Pipelined  Datapath

(13)

ID for Load, Store, …

Pipeline stage execution time

Page 14: Pipelined  Datapath

(14)

EX for Load

Pipeline stage execution time

Page 15: Pipelined  Datapath

(15)

MEM for Load

Pipeline stage execution time

Page 16: Pipelined  Datapath

(16)

WB for Load

Wrong register number

Page 17: Pipelined  Datapath

(17)

Corrected Datapath for Load

Pipeline stage execution time

Page 18: Pipelined  Datapath

(18)

EX for Store

Pipeline stage execution time

Page 19: Pipelined  Datapath

(19)

MEM for Store

Pipeline stage execution time

Page 20: Pipelined  Datapath

(20)

WB for Store

Pipeline stage execution time

Page 21: Pipelined  Datapath

(21)

Pipelining Example

[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Instructions labeled across the stages: add $14, $5, $6; lw $13, 24($1); add $12, $3, $4; sub $11, $2, $3; lw $10, 20($1)]

Note what is happening in the register file

Page 22: Pipelined  Datapath

(22)

Pipelined Control (Simplified)

Page 23: Pipelined  Datapath

(23)

Pipelined Control

• Control signals derived from instruction
    As in single-cycle implementation
• Pass control signals along like data

              Execution/Address Calculation     Memory access stage         Write-back stage
              stage control lines               control lines               control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0          1         0
lw            0       0       0       1         0       1        0          1         1
sw            X       0       0       1         0       0        1          0         X
beq           X       0       1       0         1       0        0          0         X
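One way to picture "pass control signals along like data" is to store the table as three groups per instruction and drop a group once its stage has consumed it. This Python sketch uses the values from the table; the group keys EX, M, WB and the helper names are illustrative assumptions.

```python
X = None  # don't-care entry in the table

CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw": {"EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
           "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
           "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw": {"EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
           "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
           "WB": {"RegWrite": 0, "MemtoReg": X}},
    "beq": {"EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
            "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
            "WB": {"RegWrite": 0, "MemtoReg": X}},
}


def bits(group):
    """Render one control group as a bit string, in table column order."""
    return "".join("X" if v is None else str(v) for v in group.values())


def id_ex(instr):
    """Control written into ID/EX at decode: all three groups."""
    return {stage: dict(signals) for stage, signals in CONTROL[instr].items()}


def ex_mem(ctrl):
    """EX consumes its group; M and WB move on to EX/MEM."""
    return {stage: g for stage, g in ctrl.items() if stage != "EX"}


def mem_wb(ctrl):
    """MEM consumes its group; only WB moves on to MEM/WB."""
    return {stage: g for stage, g in ctrl.items() if stage == "WB"}
```

For lw this yields EX = 0001, M = 010, and WB = 11, and only the WB group is still present by the time the instruction reaches MEM/WB.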

Page 24: Pipelined  Datapath

(24)

Pipelined Control

Page 25: Pipelined  Datapath

(25)

Datapath with Control

[Figure: pipelined datapath with control. IF: lw $10, 9($1)]

Page 26: Pipelined  Datapath

(26)

Datapath with Control

[Figure: pipelined datapath with control. IF: sub $11, $2, $3; ID: lw $10, 9($1). Control values decoded for “lw”: WB = 11, M = 010, EX = 0001]

Page 27: Pipelined  Datapath

(27)

Datapath with Control

[Figure: pipelined datapath with control. IF: and $12, $4, $5; ID: sub $11, $2, $3; EX: lw $10, 9($1). Control values decoded for “sub”: WB = 10, M = 000, EX = 1100]

Page 28: Pipelined  Datapath

(28)

Datapath with Control

[Figure: pipelined datapath with control. IF: or $13, $6, $7; ID: and $12, $4, $5; EX: sub $11, $2, $3; MEM: lw $10, 9($1). Control values decoded for “and”: WB = 10, M = 000, EX = 1100]

Page 29: Pipelined  Datapath

(29)

Datapath with Control

[Figure: pipelined datapath with control. IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $12, $4, $5; MEM: sub $11, ..; WB: lw $10, 9($1)]

Page 30: Pipelined  Datapath

(30)

Datapath with Control

[Figure: pipelined datapath with control. IF: xxxx; ID: add $14, $8, $9; EX: or $13, $6, $7; MEM: and $12…; WB: sub $11, ..]

Page 31: Pipelined  Datapath

(31)

Datapath with Control

[Figure: pipelined datapath with control. IF: xxxx; ID: xxxx; EX: add $14, $8, $9; MEM: or $13, ..; WB: and $12…]

Page 32: Pipelined  Datapath

(32)

Datapath with Control

[Figure: pipelined datapath with control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: add $14, ..; WB: or $13…]

Page 33: Pipelined  Datapath

(33)

Datapath with Control

[Figure: pipelined datapath with control. IF: xxxx; ID: xxxx; EX: xxxx; MEM: xxxx; WB: add $14..]

Page 34: Pipelined  Datapath

(34)

Data Hazards (4.7)
• An instruction depends on completion of data access by a previous instruction

    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Page 35: Pipelined  Datapath

(35)

Dependencies
• Problem with starting next instruction before first is finished
    dependencies that “go backward in time” are data hazards

[Figure: multiple-clock-cycle pipeline diagram (CC 1 to CC 9; program execution order in instructions) for the sequence below. Value of register $2: 10 in CC 1 to CC 4, 10/–20 in CC 5, –20 in CC 6 to CC 9]

    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

Page 36: Pipelined  Datapath

(36)

Software Solution
• Have compiler guarantee no hazards
• Where do we insert the “nops”?

    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

• Problem: this really slows us down!
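A rough Python sketch of where a compiler would place the nops for this sequence. It assumes no forwarding hardware, and it assumes the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. The tuple layout (text, destination, sources) and the function name are hypothetical.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Insert nops so every consumer is at least `min_distance` slots
    after the instruction that produces one of its source registers.

    Each instruction is a tuple (text, dest_register, source_registers).
    """
    out = []
    for text, dest, srcs in program:
        needed = 0
        # Only the last (min_distance - 1) emitted slots can be too close.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest != "$0" and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((text, dest, srcs))
    return out


PROGRAM = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None, ("$15", "$2")),
]
```

Under these assumptions two nops go between the sub and the and; the later instructions are then far enough from the sub to read the new value of $2 from the register file.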

Page 37: Pipelined  Datapath

(37)

A Better Solution
• Consider this sequence:

    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)

• We can resolve hazards with forwarding
    How do we detect when to forward?

Page 38: Pipelined  Datapath

(38)

Dependencies & Forwarding

Do not wait for results to be written to the register file: find them in the pipeline and forward them to the ALU

Page 39: Pipelined  Datapath

(39)

Forwarding

[Figure: pipelined datapath with control. IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $6, $4, $5; MEM: sub $11, ..; WB: lw $10, 9($1)]

Page 40: Pipelined  Datapath

(40)

Forwarding (simplified)

[Figure: simplified datapath: Register File, ALU, Data Memory, and MUX, separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers]

Page 41: Pipelined  Datapath

(41)

Forwarding (from EX/MEM)

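[Figure: simplified datapath with MUXes added at the ALU inputs: Register File, ALU, Data Memory, and MUX, separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers]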

Page 42: Pipelined  Datapath

(42)

Forwarding (from MEM/WB)

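[Figure: simplified datapath with MUXes at the ALU inputs: Register File, ALU, Data Memory, and MUX, separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers]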

Page 43: Pipelined  Datapath

(43)

Forwarding (operand selection)

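[Figure: simplified datapath with MUXes at the ALU inputs and a Forwarding Unit: Register File, ALU, Data Memory, and MUX, separated by the ID/EX, EX/MEM, and MEM/WB pipeline registers]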

Page 44: Pipelined  Datapath

(44)

Forwarding (operand propagation)

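[Figure: simplified datapath with MUXes at the ALU inputs and a Forwarding Unit. Register numbers Rs, Rt, and Rd are carried along the pipeline; the Forwarding Unit also receives EX/MEM Rd and MEM/WB Rd]

The Forwarding Unit is combinational logic!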

Page 45: Pipelined  Datapath

(45)

Detecting the Need to Forward

• Pass register numbers along pipeline
    e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
• ALU operand register numbers in EX stage are given by ID/EX.RegisterRs, ID/EX.RegisterRt
• Data hazards when
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
        (1a, 1b: Fwd from EX/MEM pipeline reg)
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
        (2a, 2b: Fwd from MEM/WB pipeline reg)

Page 46: Pipelined  Datapath

(46)

Detecting the Need to Forward
• But only if forwarding instruction will write to a register!
    EX/MEM.RegWrite, MEM/WB.RegWrite
• And only if Rd for that instruction is not $zero
    EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Page 47: Pipelined  Datapath

(47)

Forwarding Paths

Page 48: Pipelined  Datapath

(48)

Forwarding Conditions
• EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
• MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01

Page 49: Pipelined  Datapath

(49)

Double Data Hazard
• Consider the sequence:

    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4

• Both hazards occur
    Want to use the most recent
• Revise MEM hazard condition
    Only forward if EX hazard condition isn’t true

Page 50: Pipelined  Datapath

(50)

Revised Forwarding Condition
• MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01

The “not (…)” clause checks the precedence of the EX hazard
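The EX and revised MEM hazard conditions can be combined into one small function per ALU operand. This is a Python sketch of the combinational logic, not the hardware itself: the PipeReg class and function names are hypothetical, and the value 00 (operand taken from the register file) is assumed as the default when neither condition holds.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The two fields of EX/MEM or MEM/WB that the forwarding unit reads."""
    reg_write: bool = False
    rd: int = 0


def forward_select(ex_mem, mem_wb, operand_reg):
    """Mux select for one ALU operand (ID/EX.RegisterRs or ID/EX.RegisterRt)."""
    ex_hazard = ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == operand_reg
    if ex_hazard:
        return 0b10  # forward from EX/MEM (most recent result)
    mem_hazard = mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == operand_reg
    if mem_hazard:
        return 0b01  # forward from MEM/WB, only when there is no EX hazard
    return 0b00      # assumed default: use the register file value


def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB)."""
    return (forward_select(ex_mem, mem_wb, id_ex_rs),
            forward_select(ex_mem, mem_wb, id_ex_rt))
```

Testing the EX hazard first is what gives it precedence: in the double data hazard sequence, the third add sees $1 in both EX/MEM and MEM/WB and selects the EX/MEM copy.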

Page 51: Pipelined  Datapath

(51)

Datapath with Forwarding

Page 52: Pipelined  Datapath

(52)

Concurrent Execution
• Correct execution is about managing dependencies
    Producer-consumer
    Structural (using the same hardware component)
• We will come across other types of dependencies later!

Page 53: Pipelined  Datapath

(53)

Load-Use Data Hazard

Need to stall for one cycle

Page 54: Pipelined  Datapath

(54)

Forwarding

[Figure: pipelined datapath with control. IF: add $14, $8, $9; ID: or $13, $6, $7; EX: and $6, $4, $11; MEM: lw $11, 0($2); WB: lw $10, 9($1)]

Page 55: Pipelined  Datapath

(55)

Load-Use Hazard Detection
• Check when using instruction is decoded in ID stage
• ALU operand register numbers in ID stage are given by IF/ID.RegisterRs, IF/ID.RegisterRt
• Load-use hazard when
    ID/EX.MemRead and
      ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
       (ID/EX.RegisterRt = IF/ID.RegisterRt))
• If detected, stall and insert bubble

Page 56: Pipelined  Datapath

(56)

Code Scheduling to Avoid Stalls
• Reorder code to avoid use of load result in the next instruction
• C code for A = B + E; C = B + F;

Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    (stall)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    (stall)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
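The 13-cycle and 11-cycle figures can be reproduced with a short Python sketch. It assumes a five-stage pipeline with full forwarding, where the only stall is one cycle when an instruction uses the result of the load immediately before it. The tuple layout (opcode, destination, sources) and the function name are hypothetical.

```python
def count_cycles(program, stages=5):
    """Cycles to run `program`: one per instruction, plus pipeline fill,
    plus one stall for every load whose result is used by the next instruction.

    Each instruction is a tuple (opcode, dest_register, source_registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        opcode, dest, _ = prev
        if opcode == "lw" and dest in cur[2]:
            stalls += 1
    return len(program) + (stages - 1) + stalls


ORIGINAL = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]

REORDERED = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]
```

Moving the third lw up separates each load from its first use, which removes both stalls.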

Page 57: Pipelined  Datapath

(57)

How to Stall the Pipeline
• Force control values in ID/EX register to 0
    EX, MEM and WB perform a nop (no-operation)
• Prevent update of PC and IF/ID register
    Using instruction is decoded again
    Following instruction is fetched again
    1-cycle stall allows MEM to read data for lw
        o Can subsequently forward to EX stage
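The load-use check and the stall actions above can be sketched together in Python. The output names PCWrite, IFIDWrite, and ZeroIDEXControl are labels chosen for this illustration; the slides describe the actions but do not name the signals.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in EX is a load whose destination (Rt)
    is a source register of the instruction being decoded in ID."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Stall controls for one cycle (signal names are assumptions)."""
    stall = load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,      # hold the PC so the next fetch repeats
        "IFIDWrite": not stall,    # hold IF/ID so the decode repeats
        "ZeroIDEXControl": stall,  # bubble: EX, MEM, WB perform a nop
    }
```

With lw $11, 0($2) in EX and and $6, $4, $11 in ID, the unit requests a stall; once the load has moved on to MEM, its data can be forwarded to EX.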

Page 58: Pipelined  Datapath

(58)

Stall/Bubble in the Pipeline

Stall inserted here

Page 59: Pipelined  Datapath

(59)

Stall/Bubble in the Pipeline

Or, more accurately…

Page 60: Pipelined  Datapath

(60)

Datapath with Hazard Detection

Pipeline stage execution time

ALUSrc mux is missing!

Page 61: Pipelined  Datapath

(61)

Control Hazards (4.8)
• Branch instruction determines flow of control
    Fetching next instruction depends on branch outcome
    Pipeline cannot always fetch correct instruction
        o Still working on ID stage of branch
• In MIPS pipeline
    Need to compare registers and determine the branch condition

Page 62: Pipelined  Datapath

(62)

Branch Hazards
• If branch outcome determined in MEM

[Figure: pipeline diagram with the PC labeled. Annotation: Flush these instructions (Set control values to 0)]

Page 63: Pipelined  Datapath

(63)

Reducing Branch Delay
• Move hardware to determine outcome to ID stage
    Target address adder
    Register comparator
    Add IF.Flush signal to squash IF/ID register
• Example: branch taken

    36: sub $10, $4, $8
    40: beq $1, $3, 72
    44: and $12, $2, $5
    48: or $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
        ...
    72: lw $4, 50($7)
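The target address adder works on the branch's 16-bit offset field, which MIPS counts in words relative to PC + 4. The Python sketch below relates that field to the byte address written in the example (72); the function names are hypothetical.

```python
def branch_target(pc, offset_field):
    """Byte address of the branch target: PC + 4 + (offset shifted left 2)."""
    return pc + 4 + (offset_field << 2)


def offset_field(pc, target):
    """Word offset stored in the branch instruction for a given target."""
    delta = target - (pc + 4)
    if delta % 4 != 0:
        raise ValueError("branch target must be word aligned")
    return delta // 4
```

For the beq at address 40 with target 72, the encoded offset is 7. When the branch is taken in ID, the instruction already fetched from address 44 is the one squashed by IF.Flush.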

Page 64: Pipelined  Datapath

(64)

Example: Branch Taken

Page 65: Pipelined  Datapath

(65)

Example: Branch Taken

Page 66: Pipelined  Datapath

(66)

Data Hazards for Branches
• If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

[Figure: pipeline diagram with four rows of IF ID EX MEM WB; labeled instructions:]

    add $1, $2, $3
    add $4, $5, $6
    beq $1, $4, target

Can resolve using forwarding

Page 67: Pipelined  Datapath

(67)

Data Hazards for Branches
• If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
    Need 1 stall cycle

[Figure: pipeline diagram; the beq repeats ID for one cycle before continuing through EX, MEM, WB]

    lw $1, addr
    add $4, $5, $6
    beq stalled
    beq $1, $4, target

Page 68: Pipelined  Datapath

(68)

Data Hazards for Branches
• If a comparison register is a destination of immediately preceding load instruction
    Need 2 stall cycles

[Figure: pipeline diagram; the beq repeats ID for two cycles before continuing through EX, MEM, WB]

    lw $1, addr
    beq stalled
    beq stalled
    beq $1, $0, target

Page 69: Pipelined  Datapath

(69)

Delay Slot (MIPS)
• Expose pipeline
• Load and jump/branch entail a “delay slot”
• The instruction right after the jump or branch is executed before the jump/branch

    jal function_A
    add $4, $5, $6    ; executed before jmp
    lw $12, 8($4)     ; executed after return

• Jump/branch and the delay slot instruction are considered “indivisible”
• In the delay slot, the compiler needs to schedule
    A useful instruction (either before the jmp, or after the jmp w/o side effect)
    otherwise a NOP

Page 70: Pipelined  Datapath

(70)

Branch Prediction
• Longer pipelines cannot readily determine branch outcome early
    Stall penalty becomes unacceptable
• Predict outcome of branch
    Only stall if prediction is wrong
• In MIPS pipeline
    Can predict branches not taken
    Fetch instruction after branch, with no delay

Page 71: Pipelined  Datapath

(71)

MIPS with Predict Not Taken

Prediction correct

Prediction incorrect

Page 72: Pipelined  Datapath

(72)

1-Bit Predictor: Shortcoming
• Inner loop branches mispredicted twice!

    outer: …
           …
    inner: …
           …
           beq …, …, inner
           …
           beq …, …, outer

    Mispredict as taken on last iteration of inner loop
    Then mispredict as not taken on first iteration of inner loop next time around

Page 73: Pipelined  Datapath

(73)

2-Bit Predictor: State Machine
• Only change prediction on two successive mispredictions
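The difference between the two predictors on an inner-loop branch can be shown with a small Python simulation. The state diagram itself is not reproduced in this transcript, so the 2-bit model below is an assumption: it keeps a prediction plus a "missed once" flag and flips the prediction only after two successive mispredictions. Function names and the starting state (predict not taken) are also assumptions.

```python
def one_bit_mispredictions(outcomes, prediction=False):
    """1-bit predictor: always predict the last outcome seen."""
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken
    return misses


def two_bit_mispredictions(outcomes, prediction=False):
    """2-bit predictor: change the prediction only after two
    successive mispredictions."""
    misses = 0
    missed_once = False
    for taken in outcomes:
        if prediction == taken:
            missed_once = False
        else:
            misses += 1
            if missed_once:
                prediction = taken
                missed_once = False
            else:
                missed_once = True
    return misses


# Inner-loop branch: taken three times, then falls through; run three times.
INNER_BRANCH = ([True] * 3 + [False]) * 3
```

After warm-up, the 1-bit predictor misses twice per pass through the inner loop (on exit and on re-entry), while the 2-bit predictor misses only on exit.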

Page 74: Pipelined  Datapath

(74)

More-Realistic Branch Prediction
• Static branch prediction
    Based on typical branch behavior
    Example: loop and if-statement branches
        o Predict backward branches taken
        o Predict forward branches not taken
• Dynamic branch prediction
    Hardware measures actual branch behavior
        o e.g., record recent history of each branch
    Assume future behavior will continue the trend
        o When wrong, stall while re-fetching, and update history

Page 75: Pipelined  Datapath

(75)

AMD Bobcat

[Figure: AMD Bobcat block diagram (source: http://hothardware.com). Annotations: “Instruction Level Parallelism (ILP)”, “Later in this course”, “ECE 6100”]

Page 76: Pipelined  Datapath

(76)

Intel Sandy Bridge

[Figure: Intel Sandy Bridge block diagram (source: bdti.com)]

Page 77: Pipelined  Datapath

(77)

Exceptions and Interrupts (4.9)
• “Unexpected” events requiring change in flow of control
    Different ISAs use the terms differently
• Exception
    Arises within the CPU
        o e.g., undefined opcode, overflow, syscall, …
• Interrupt
    From an external I/O controller
• Dealing with them without sacrificing performance is hard

Page 78: Pipelined  Datapath

(78)

Handling Exceptions
• In MIPS, exceptions managed by a System Control Coprocessor (CP0)
• Save PC of offending (or interrupted) instruction
    In MIPS: Exception Program Counter (EPC)
• Save indication of the problem
    In MIPS: Cause register
    We’ll assume 1-bit
        o 0 for undefined opcode, 1 for overflow
• Jump to handler at 80000180 (hex)

Page 79: Pipelined  Datapath

(79)

An Alternate Mechanism
• Vectored Interrupts
    Handler address determined by the cause
• Example:
    Undefined opcode: C000 0000
    Overflow: C000 0020
    …: C000 0040
• Instructions either
    Deal with the interrupt, or
    Jump to real handler

Page 80: Pipelined  Datapath

(80)

Handler Actions
• Read cause, and transfer to relevant handler
• Determine action required
• If restartable
    Take corrective action
    Use EPC to return to program
• Otherwise
    Terminate program
    Report error using EPC, cause, …

Page 81: Pipelined  Datapath

(81)

Exceptions in a Pipeline
• Another form of control hazard
• Consider overflow on add in EX stage
    add $1, $2, $1
    Prevent $1 from being clobbered
    Complete previous instructions
    Flush add and subsequent instructions
    Set Cause and EPC register values
    Transfer control to handler
• Similar to mispredicted branch
    Use much of the same hardware

Page 82: Pipelined  Datapath

(82)

Pipeline with Exceptions

Page 83: Pipelined  Datapath

(83)

Exception Properties
• Restartable exceptions
    Pipeline can flush the instruction
    Handler executes, then returns to the instruction
        o Re-fetched and executed from scratch
• PC saved in EPC register
    Identifies causing instruction
    Actually PC + 4 is saved
        o Handler must adjust

Page 84: Pipelined  Datapath

(84)

Exception Example
• Exception on add in

    40  sub $11, $2, $4
    44  and $12, $2, $5
    48  or  $13, $2, $6
    4C  add $1, $2, $1
    50  slt $15, $6, $7
    54  lw  $16, 50($7)
    …

• Handler

    80000180  sw $25, 1000($0)
    80000184  sw $26, 1004($0)
    …
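The register updates for this example can be sketched in Python under the simplified model used in these slides: a 1-bit Cause value, EPC holding PC + 4, and a single handler entry point. The dictionary layout and function names are illustrative assumptions, not the MIPS CP0 interface.

```python
HANDLER_ADDRESS = 0x80000180  # handler entry point given in the slides

CAUSE_UNDEFINED_OPCODE = 0    # 1-bit cause encoding assumed in the slides
CAUSE_OVERFLOW = 1


def take_exception(faulting_pc, cause):
    """State written when the instruction at `faulting_pc` raises an exception.

    PC + 4 is what gets saved in EPC, so the handler must adjust it.
    """
    return {"EPC": faulting_pc + 4, "Cause": cause, "PC": HANDLER_ADDRESS}


def faulting_instruction_address(epc):
    """Handler-side adjustment: recover the address of the offending instruction."""
    return epc - 4
```

For the overflowing add at 4C, EPC holds 50, Cause holds 1, and the next fetch comes from 80000180; subtracting 4 from EPC recovers 4C so the add can be re-fetched if the exception is restartable.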

Page 85: Pipelined  Datapath

(85)

Exception Example

Page 86: Pipelined  Datapath

(86)

Exception Example

Page 87: Pipelined  Datapath

(87)

Multiple Exceptions
• Pipelining overlaps multiple instructions
    Could have multiple exceptions at once
• Simple approach: deal with exception from earliest instruction
    Flush subsequent instructions
    “Precise” exceptions
• In complex pipelines
    Multiple instructions issued per cycle
    Out-of-order completion
    Maintaining precise exceptions is difficult!

Page 88: Pipelined  Datapath

(88)

Imprecise Exceptions
• Just stop pipeline and save state
    Including exception cause(s)
• Let the handler work out
    Which instruction(s) had exceptions
    Which to complete or flush
        o May require “manual” completion
• Simplifies hardware, but more complex handler software
• Not feasible for complex multiple-issue out-of-order pipelines

Page 89: Pipelined  Datapath

(89)

Performance
• How do we assess the impact of stall cycles?

• How close do we approach the ideal of one instruction per cycle execution time?

• Back to the CPI model!

Page 90: Pipelined  Datapath

(90)

Recall: Program Execution time ≈ Instruction_count * CPIavg * clock_cycle_time

    Instruction_count: algorithms/compiler
    CPIavg: architecture
    clock_cycle_time: technology

[Figure: formula for CPIavg, annotated with “Relative frequency” and “Number of instruction classes”]

Page 91: Pipelined  Datapath

(91)

Assessing Performance
• Ideal CPI is increased by dependencies
• Performance impact on CPI can be assessed by computing the impact on a per instruction basis

    Effective CPI = Base CPI + Probability_of_event * penalty_for_event

    For example, an event may be a branch misprediction or the occurrence of a data hazard
    The probability is computed for the occurrence of the event on an instruction
• Examples: pipelined processors
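As a worked illustration of the per-instruction CPI model, here is a short Python sketch. The event probabilities and penalties in the example are hypothetical numbers chosen for the arithmetic, not measurements from the lecture, and the function name is an assumption.

```python
def effective_cpi(base_cpi, events):
    """Base CPI plus the expected stall cycles per instruction.

    `events` is a list of (probability_per_instruction, penalty_in_cycles).
    """
    return base_cpi + sum(prob * penalty for prob, penalty in events)


# Hypothetical workload:
#   5% of instructions are load-use stalls costing 1 cycle each,
#   6% of instructions are mispredicted branches costing 2 cycles each.
EXAMPLE_EVENTS = [(0.05, 1), (0.06, 2)]
```

With a base CPI of 1 this example gives 1 + 0.05 + 0.12 = 1.17 cycles per instruction.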

Page 92: Pipelined  Datapath

(92)

Instruction-Level Parallelism (ILP) (4.10)
• Pipelining: executing multiple instructions in parallel
• To increase ILP
    Deeper pipeline
        o Less work per stage ⇒ shorter clock cycle
    Multiple issue
        o Replicate pipeline stages ⇒ multiple pipelines
        o Start multiple instructions per clock cycle
        o CPI < 1, so use Instructions Per Cycle (IPC)
        o E.g., 4GHz 4-way multiple-issue: 16 BIPS, peak CPI = 0.25, peak IPC = 4
        o But dependencies reduce this in practice

Page 93: Pipelined  Datapath

(93)

Multiple Issue
• Static multiple issue
    Compiler groups instructions to be issued together
    Packages them into “issue slots”
    Compiler detects and avoids hazards
• Dynamic multiple issue
    CPU examines instruction stream and chooses instructions to issue each cycle
    Compiler can help by reordering instructions
    CPU resolves hazards using advanced techniques at runtime

Page 94: Pipelined  Datapath

(94)

MIPS with Static Dual Issue
• Two-issue packets
    One ALU/branch instruction
    One load/store instruction
    64-bit aligned
        o ALU/branch, then load/store
        o Pad an unused instruction with nop

Address   Instruction type   Pipeline Stages
n         ALU/branch         IF  ID  EX  MEM WB
n + 4     Load/store         IF  ID  EX  MEM WB
n + 8     ALU/branch             IF  ID  EX  MEM WB
n + 12    Load/store             IF  ID  EX  MEM WB
n + 16    ALU/branch                 IF  ID  EX  MEM WB
n + 20    Load/store                 IF  ID  EX  MEM WB

Page 95: Pipelined  Datapath

(95)

MIPS with Static Dual Issue

[Figure: static dual-issue datapath fetching an aligned instruction pair. Annotations: ALU operation (ALU computation); Load/store (address computation)]

Page 96: Pipelined  Datapath

(96)

Instruction Level Parallelism (ILP)

[Figure: pipeline with IF, ID, MEM, and WB stages and multiple instructions in EX at the same time]

• Single (program) thread of execution
• Issue multiple instructions from the same instruction stream
• Average CPI < 1
• Often called out of order (OOO) cores

Page 97: Pipelined  Datapath

(97)

Dynamically Scheduled CPU

[Figure: dynamically scheduled CPU, with annotations:]
    Preserves dependencies
    Hold pending operands
    Results also sent to any waiting reservation stations
    Reorder buffer for register writes
    Can supply operands for issued instructions

Page 98: Pipelined  Datapath

(98)

AMD Bulldozer

[Figure: AMD Bulldozer block diagram (source: forum.beyond3d.com)]

Page 99: Pipelined  Datapath

(99)

AMD Bobcat

[Figure: AMD Bobcat block diagram (source: http://hothardware.com). Annotations: “Instruction Level Parallelism (ILP)”, “Later in this course”, “ECE 6100”]

Page 100: Pipelined  Datapath

(100)

The P4 Microarchitecture

From “The Microarchitecture of the Pentium 4 Processor,” G. Hinton et al., Intel Technology Journal Q1, 2001

Page 101: Pipelined  Datapath

(101)

Study Guide
• Given a code block, and initial register values (those that are accessed), be able to determine state of all pipeline registers at some future clock cycle.
• Determine the size of each pipeline register
• Track pipeline state in the case of forwarding and branches
• Compute the number of cycles to execute a code block
• Modify the datapath to include forwarding and hazard detection for branches (this is trickier and time consuming but well worth it)

Page 102: Pipelined  Datapath

(102)

Study Guide (cont.)
• Schedule code (manually) to improve performance, for example to eliminate hazards and fill delay slots
• Modify the data path to add new instructions such as j
• Modify the data path to accommodate a two cycle data memory access, i.e., the data memory itself is a two cycle pipeline
    Modify the forwarding and hazard control logic
• Given a code sequence, be able to compute the number of stall cycles

Page 103: Pipelined  Datapath

(103)

Study Guide (cont.)
• Track the state of the 2-bit branch predictor over a sequence of branches in a code segment, for example a for-loop
• Show the pipeline state before and after an exception has taken place.

Page 104: Pipelined  Datapath

(104)

Glossary
• Branch prediction
• Branch hazards
• Branch delay
• Control hazard
• Data hazard
• Delay slot
• Dynamic instruction issue
• Forwarding
• Imprecise exception
• Instruction scheduling
• Instruction level parallelism (ILP)
• Load-to-use hazard
• Pipeline bubbles
• Stall cycles
• Static instruction issue
• Structural hazard