Pipelining IV

Page 1: Pipelining IV

Pipelining IV

Topics
- Implementing pipeline control
- Pipelining and performance analysis

Systems I

Page 2: Pipelining IV

Implementing Pipeline Control

- Combinational logic generates pipeline control signals
- Action occurs at start of following cycle

[Figure: pipeline registers F, D, E, M, and W. The pipe control logic takes D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Bch, and M_icode as inputs and generates F_stall, D_stall, D_bubble, and E_bubble.]

Page 3: Pipelining IV

Initial Version of Pipeline Control

    bool F_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode };

    bool D_stall =
        # Conditions for a load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Bubble for ret
        IRET in { D_icode, E_icode, M_icode };

    bool E_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Load/use hazard
        E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
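To experiment with these four signals outside a hardware simulator, the HCL can be mirrored in Python. This is only a sketch: the signal names come from the slide, but the string values for the instruction codes, the register names, and the function name `initial_pipe_control` are assumptions made for illustration.

```python
# Instruction codes are named as in the HCL; the string values are placeholders.
IMRMOVL, IPOPL, IJXX, IRET, IOPL, INOP = "mrmovl", "popl", "jXX", "ret", "opl", "nop"


def initial_pipe_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Return the four control signals of the initial (uncorrected) logic."""
    # Load in execute whose destination is read by the instruction in decode.
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    # Conditional jump in execute that turned out not to be taken.
    mispredict = E_icode == IJXX and not e_Bch
    # A ret somewhere in decode, execute, or memory.
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    return {
        "F_stall": load_use or ret_in_flight,
        "D_stall": load_use,
        "D_bubble": mispredict or ret_in_flight,
        "E_bubble": mispredict or load_use,
    }


# Load/use hazard: mrmovl in E writes %eax, the instruction in D reads %eax.
hazard = initial_pipe_control(IOPL, IMRMOVL, INOP, "%eax", "%eax", "%ebx", False)
print(hazard)
```

Because the signals are pure functions of the current pipeline register contents, they can be checked one condition at a time before looking at what happens when two conditions hold in the same cycle.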

Page 4: Pipelining IV

Control Combinations

Special cases that can arise on same clock cycle

Combination A
- Not-taken branch
- ret instruction at branch target

Combination B
- Instruction that reads from memory to %esp
- Followed by ret instruction

[Figure: pipeline states for each special case. Load/use: load in E, use in D. Mispredict: JXX in E. ret 1, ret 2, ret 3: ret in D, E, and M respectively, with bubbles in the stages behind it. Combination A overlaps Mispredict with ret 1; Combination B overlaps Load/use with ret 1.]

Page 5: Pipelining IV

Control Combination A

- Should handle as mispredicted branch
- Stalls F pipeline register
- But PC selection logic will be using M_valA anyhow

[Figure: Mispredict state (JXX in E) overlapping the ret 1 state (ret in D), forming Combination A.]

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Mispredicted branch  normal  bubble  bubble  normal  normal
Combination          stall   bubble  bubble  normal  normal

Page 6: Pipelining IV

Stall in F

Your book provides two inconsistent meanings for “stall in F”

Instruction remains in F and injects a bubble into D

Instruction squashed into D, same PC fetched

Figure 4.61

Use the one that keeps 1 instr per pipeline stage

[Figure: pipeline-stage diagrams (F D E M W) illustrating the stall.]

Page 7: Pipelining IV

JXX + ret works great!

                                         1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                    F  D  E  M  W
0x002: jne target      # Not taken          F  D  E  M  W
0x011: t: ret          # Target                F  D
       bubble                                        E  M  W
0x012: nop             # Target + 1               F
       bubble                                        D  E  M  W
0x007: irmovl $1,%eax  # Fall through                F  D  E  M  W
0x00d: nop                                              F  D  E  M  W

Page 8: Pipelining IV

Control Combination B

- Would attempt to bubble and stall pipeline register D
- Signaled by processor as pipeline error

[Figure: Load/use state (load in E, use in D) overlapping the ret 1 state (ret in D), forming Combination B.]

Condition        F      D               E       M       W
Processing ret   stall  bubble          normal  normal  normal
Load/use hazard  stall  stall           bubble  normal  normal
Combination      stall  bubble + stall  bubble  normal  normal
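The "Combination" rows in these tables come from merging the per-stage requests of two conditions that hold in the same cycle. A minimal Python sketch of that merge is shown here, assuming a register that receives both a stall and a bubble request is reported as an error; the dictionary encoding and the helper name `combine` are illustrative and not part of the slides.

```python
STAGES = ("F", "D", "E", "M", "W")

# Per-condition requests copied from the slide tables; stages not listed are "normal".
RET = {"F": "stall", "D": "bubble"}
MISPREDICT = {"D": "bubble", "E": "bubble"}
LOAD_USE = {"F": "stall", "D": "stall", "E": "bubble"}


def combine(*conditions):
    """Merge the stall/bubble requests of several conditions, stage by stage."""
    result = {}
    for stage in STAGES:
        requests = {cond[stage] for cond in conditions if stage in cond}
        if requests == {"stall", "bubble"}:
            # The register is asked to hold its value and to clear it at once.
            result[stage] = "error"
        elif requests:
            result[stage] = requests.pop()
        else:
            result[stage] = "normal"
    return result


combination_a = combine(RET, MISPREDICT)
combination_b = combine(RET, LOAD_USE)
print(combination_a)
print(combination_b)
```

Combination A merges cleanly, while Combination B produces the conflicting request on register D that the slide describes as a pipeline error.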

Page 9: Pipelining IV

Handling Control Combination B

- Load/use hazard should get priority
- ret instruction should be held in decode stage for additional cycle

[Figure: Load/use state (load in E, use in D) overlapping the ret 1 state (ret in D), forming Combination B.]

Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal
Load/use hazard  stall  stall   bubble  normal  normal
Combination      stall  stall   bubble  normal  normal

Page 10: Pipelining IV

Corrected Pipeline Control Logic

- Load/use hazard should get priority
- ret instruction should be held in decode stage for additional cycle

Condition        F      D       E       M       W
Processing ret   stall  bubble  normal  normal  normal
Load/use hazard  stall  stall   bubble  normal  normal
Combination      stall  stall   bubble  normal  normal

    bool D_bubble =
        # Mispredicted branch
        (E_icode == IJXX && !e_Bch) ||
        # Stalling at fetch while ret passes through pipeline
        IRET in { D_icode, E_icode, M_icode }
        # but not condition for a load/use hazard
        && !(E_icode in { IMRMOVL, IPOPL }
             && E_dstM in { d_srcA, d_srcB });
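The corrected D_bubble condition can be checked against Combination B with a small Python sketch. The signal names follow the HCL; the string instruction codes, the function names, and the assumption that ret reads %esp through both d_srcA and d_srcB are illustrative choices, not taken from the slides.

```python
# Instruction codes are named as in the HCL; the string values are placeholders.
IMRMOVL, IPOPL, IJXX, IRET, INOP = "mrmovl", "popl", "jXX", "ret", "nop"


def load_use(E_icode, E_dstM, d_srcA, d_srcB):
    """Load in execute whose destination is read by the instruction in decode."""
    return E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)


def d_bubble_corrected(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Corrected D_bubble: the ret bubble is suppressed during a load/use hazard."""
    mispredict = E_icode == IJXX and not e_Bch
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    return mispredict or (
        ret_in_flight and not load_use(E_icode, E_dstM, d_srcA, d_srcB)
    )


# Combination B: mrmovl in E loads %esp while ret (which reads %esp) sits in D.
stall_d = load_use(IMRMOVL, "%esp", "%esp", "%esp")
bubble_d = d_bubble_corrected(IRET, IMRMOVL, INOP, "%esp", "%esp", "%esp", False)
print(stall_d, bubble_d)

# One cycle later the load has moved on, so ret now gets its bubble as usual.
bubble_next = d_bubble_corrected(IRET, INOP, IMRMOVL, "none", "%esp", "%esp", False)
print(bubble_next)
```

With the extra term, register D receives only a stall request in the hazard cycle, so the load/use hazard takes priority and ret is held in decode for one more cycle.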

Page 11: Pipelining IV

Load/use hazard with ret

Through cycle 2:
mrmovl  F D
ret       F

Through cycle 3:
mrmovl  F D E
ret       F D
addl        F

Through cycle 4:
mrmovl  F D E M
bubble        E
ret       F D D
addl        F F

Through cycle 5:
mrmovl  F D E M W
bubble        E M
ret       F D D E
addl        F F
bubble          D
addl            F

Page 12: Pipelining IV

Pipeline Summary

Data Hazards
- Most handled by forwarding
  - No performance penalty
- Load/use hazard requires one cycle stall

Control Hazards
- Cancel instructions when a mispredicted branch is detected
  - Two clock cycles wasted
- Stall fetch stage while ret passes through pipeline
  - Three clock cycles wasted

Control Combinations
- Must analyze carefully
- First version had subtle bug
  - Only arises with unusual instruction combination

Page 13: Pipelining IV

Performance Analysis with Pipelining

Ideal pipelined machine: CPI = 1
- One instruction completed per cycle
- But much faster cycle time than unpipelined machine

However, hazards work against the ideal
- Hazards resolved using forwarding are fine
- Stalling degrades performance because the instruction completion rate is interrupted

CPI is a measure of the “architectural efficiency” of the design

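$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$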
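As a quick numeric illustration of the CPU time equation, the three factors can be multiplied directly. The instruction count and the 1 ns cycle time in this Python sketch are hypothetical values chosen for the example, and `cpu_time` is an illustrative helper name.

```python
def cpu_time(instructions, cpi, seconds_per_cycle):
    """CPU time = instructions/program * cycles/instruction * seconds/cycle."""
    return instructions * cpi * seconds_per_cycle


# Hypothetical program: one million instructions on a machine with a 1 ns cycle.
ideal = cpu_time(1_000_000, 1.0, 1e-9)
with_stalls = cpu_time(1_000_000, 1.27, 1e-9)
print(ideal, with_stalls)
```

For a fixed instruction count and cycle time, CPU time scales linearly with CPI, which is why CPI is singled out as the measure of the design's efficiency.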

Page 14: Pipelining IV

CPI for PIPE

CPI ≈ 1.0
- Fetch instruction each clock cycle
- Effectively process new instruction almost every cycle
  - Although each individual instruction has latency of 5 cycles

CPI > 1.0
- Sometimes must stall or cancel branches

Computing CPI
- C clock cycles
- I instructions executed to completion
- B bubbles injected (C = I + B)

CPI = C/I = (I+B)/I = 1.0 + B/I
- Factor B/I represents average penalty due to bubbles
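The relation CPI = 1.0 + B/I can be evaluated from raw counts, for example counts reported by a simulator. In this Python sketch the counts are hypothetical and `cpi_from_counts` is an illustrative helper name.

```python
def cpi_from_counts(instructions, bubbles):
    """CPI = C/I with C = I + B, i.e. 1.0 + B/I."""
    cycles = instructions + bubbles
    return cycles / instructions


# Hypothetical run: 1000 instructions completed, 270 bubbles injected.
cpi = cpi_from_counts(1000, 270)
print(cpi)
```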

Page 15: Pipelining IV

Computing CPI

CPI
- Function of useful instructions and bubbles
- Cb/Ci represents the pipeline penalty due to stalls

$$\mathrm{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i}$$

Can reformulate to account for
- load penalties (lp)
- branch misprediction penalties (mp)
- return penalties (rp)

$$\mathrm{CPI} = 1.0 + lp + mp + rp$$

Page 16: Pipelining IV

Computing CPI - II

So how do we determine the penalties?
- Depends on how often each situation occurs on average
- How often does a load occur, and how often does that load cause a stall?
- How often does a branch occur, and how often is it mispredicted?
- How often does a return occur?

We can measure these
- simulator
- hardware performance counters

We can estimate through historical averages
- Then use them to make early design tradeoffs for the architecture

Page 17: Pipelining IV

Computing CPI - III

Cause          Name  Instruction frequency  Condition frequency  Stalls  Product
Load/use       lp    0.30                   0.3                  1       0.09
Mispredict     mp    0.20                   0.4                  2       0.16
Return         rp    0.02                   1.0                  3       0.06
Total penalty                                                            0.31

CPI = 1 + 0.31 = 1.31, i.e. 31% worse than ideal

This gets worse when we:
- Account for non-ideal memory access latency
- Use deeper pipelines (where stalls per hazard increase)
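Each product in the table is instruction frequency × condition frequency × stall cycles, and the total penalty is their sum. The Python sketch below reproduces that arithmetic; the frequencies are the ones given on the slide, while the tuple layout and the name `total_penalty` are illustrative.

```python
import math

# (cause, instruction frequency, condition frequency, stall cycles), from the table.
PENALTIES = [
    ("load/use", 0.30, 0.3, 1),
    ("mispredict", 0.20, 0.4, 2),
    ("return", 0.02, 1.0, 3),
]


def total_penalty(rows):
    """Sum of instruction frequency * condition frequency * stalls over all causes."""
    return sum(instr * cond * stalls for _, instr, cond, stalls in rows)


cpi = 1.0 + total_penalty(PENALTIES)
print(round(cpi, 2))
```

Keeping the three factors separate makes it easy to see which one to attack: better branch prediction lowers the condition frequency for mp, while a deeper pipeline raises the stall count for every row.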

Page 18: Pipelining IV

CPI for PIPE (Cont.)

B/I = LP + MP + RP

                                                       Typical values
LP: Penalty due to load/use hazard stalling
- Fraction of instructions that are loads              0.25
- Fraction of load instructions requiring stall        0.20
- Number of bubbles injected each time                 1
  LP = 0.25 * 0.20 * 1 = 0.05

MP: Penalty due to mispredicted branches
- Fraction of instructions that are cond. jumps        0.20
- Fraction of cond. jumps mispredicted                 0.40
- Number of bubbles injected each time                 2
  MP = 0.20 * 0.40 * 2 = 0.16

RP: Penalty due to ret instructions
- Fraction of instructions that are returns            0.02
- Number of bubbles injected each time                 3
  RP = 0.02 * 3 = 0.06

Net effect of penalties: 0.05 + 0.16 + 0.06 = 0.27
CPI = 1.27 (Not bad!)

Page 19: Pipelining IV

Summary

Today
- Pipeline control logic
- Effect on CPI and performance

Next Time
- Further mitigation of branch mispredictions
- State machine design