Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn http://list.zju.edu.cn/kaibu/comparch

Page 1: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Lecture 7: Pipelining Review

Kai Bu kaibu@zju.edu.cn

http://list.zju.edu.cn/kaibu/comparch

Page 2: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Appendix C

Lectures 4-6

Page 3: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Pipelining

start executing one instruction before completing the previous one

Page 4: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Outline

• What’s Pipelining
• How Pipelining Works
• Pipeline Hazards
• Pipeline with Multicycle FP Operations

Page 6: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Laundry Example

Ann, Brian, Cathy, Dave
Each has one load of clothes to wash, dry, fold.

washer: 30 mins

dryer: 40 mins

folder: 20 mins

Page 7: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Sequential Laundry

What would you do?

[Figure: task order A, B, C, D along a time axis; each load takes 30 + 40 + 20 mins, one load after another]

6 Hours

Page 9: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Pipelined Laundry

Observations
• A task has a series of stages;
• Stage dependency: e.g., wash before dry;
• Multiple tasks with overlapping stages;
• Simultaneously use different resources to speed up;
• Slowest stage determines the finish time;

[Figure: task order A, B, C, D overlapped along a time axis with intervals of 30, 40, 40, 40, 40, 20 mins]

3.5 Hours

Page 10: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Pipelined Laundry

Observations
• No speedup for an individual task; e.g., A still takes 30+40+20 = 90 mins
• But speedup for average task execution time; e.g., 3.5*60/4 = 52.5 mins < 30+40+20 = 90 mins

[Figure: task order A, B, C, D overlapped along a time axis with intervals of 30, 40, 40, 40, 40, 20 mins]

3.5 Hours
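The laundry arithmetic can be checked with a small simulation. This is a minimal sketch rather than anything from the slides: the function names are hypothetical, and it assumes a load may wait between machines for as long as needed.

```python
def sequential_minutes(stage_minutes, num_loads):
    """Each load runs through every stage before the next load starts."""
    return num_loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, num_loads):
    """Loads overlap; each machine handles one load at a time."""
    stage_free_at = [0] * len(stage_minutes)  # when each machine is next free
    finish = 0
    for _ in range(num_loads):
        ready = 0  # time at which this load left its previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            ready = start + minutes
            stage_free_at[s] = ready
        finish = ready
    return finish


stages = [30, 40, 20]  # washer, dryer, folder (minutes, from the slides)
loads = 4              # Ann, Brian, Cathy, Dave

seq = sequential_minutes(stages, loads)
pipe = pipelined_minutes(stages, loads)
print(seq / 60, pipe / 60, pipe / loads)  # prints: 6.0 3.5 52.5
```

The 40-minute dryer is the bottleneck: after the first wash, one load leaves the dryer every 40 minutes, which is where the 30 + 4*40 + 20 = 210 minutes come from.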

Page 11: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Assembly Line

Auto

Cola

Page 12: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Pipelining

• An implementation technique whereby multiple instructions are overlapped in execution.
e.g., B washes while A dries

• Essence: Start executing one instruction before completing the previous one.

• Significance: Make fast CPUs.

[Figure: tasks A and B overlapped in time]

Page 13: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Balanced Pipeline

• Equal-length pipe stages
e.g., wash, dry, fold = 40 mins each
unpipelined time per load = 40x3 mins
3 pipe stages: wash, dry, fold

[Figure: loads A, B, C, D overlapped across time slots T1 to T4, 40 min each]

Page 16: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Balanced Pipeline

• Equal-length pipe stages
e.g., wash, dry, fold = 40 mins each
unpipelined time per load = 40x3 mins
3 pipe stages: wash, dry, fold

[Figure: loads A, B, C, D overlapped across time slots T1 to T4, 40 min each]

• Performance
One task/instruction per 40 mins
Time per instruction by pipeline = (Time per instr on unpipelined machine) / (Number of pipe stages)
Speedup by pipeline = Number of pipe stages
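The ideal speedup holds once the pipeline is full. The sketch below, with hypothetical function names, shows how a finite number of tasks falls short of it because of the fill time of the first task.

```python
def pipelined_total_time(num_tasks, num_stages, stage_time):
    """The first task needs num_stages slots; each later task finishes one slot after."""
    return (num_stages + num_tasks - 1) * stage_time


def speedup(num_tasks, num_stages, stage_time):
    """Unpipelined total time divided by pipelined total time."""
    unpipelined = num_tasks * num_stages * stage_time
    return unpipelined / pipelined_total_time(num_tasks, num_stages, stage_time)


stage_time = 40   # minutes per stage (from the slide)
num_stages = 3    # wash, dry, fold

time_per_task_unpipelined = num_stages * stage_time
time_per_task_pipelined = time_per_task_unpipelined / num_stages

print(time_per_task_pipelined)   # 40.0
print(speedup(4, 3, 40))         # 2.0 for only four loads
print(speedup(10_000, 3, 40))    # close to 3, the number of pipe stages
```

With many tasks the fill time becomes negligible and the speedup approaches the number of pipe stages, as the slide states.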

Page 17: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Pipelining Terminology

• Latency: the time for an instruction to complete.

• Throughput of a CPU: the number of instructions completed per second.

• Clock cycle: everything in CPU moves in lockstep; synchronized by the clock.

• Processor cycle: time required to move an instruction one step down the pipeline;
= time required to complete a pipe stage;
= max(times for completing all stages);
= one or two clock cycles, but rarely more.

• CPI: clock cycles per instruction

Page 18: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Outline

• What’s Pipelining
• How Pipelining Works
• Pipeline Hazards
• Pipeline with Multicycle FP Operations

Page 19: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Five-Stage Pipeline

• How it works
separate instruction and data mems to eliminate conflicts for a single memory between instruction fetch and data memory access.

IF: instr mem
MEM: data mem

Page 20: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Five-Stage Pipeline

• How it works
use the register file in two stages (read in ID, write in WB), each taking half a clock cycle;

in one clock cycle, write before read

Page 21: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Five-Stage Pipeline

• How it works
introduce pipeline registers between successive stages;
pipeline registers store the results of a stage and use them as the input of the next stage.

Page 22: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Five-Stage Pipeline

• How it works

Page 23: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Five-Stage Pipeline

• How it works
pipeline regs are omitted from the figure for simplicity, but they are required in an implementation

Page 24: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 1)
IF ID EX MEM WB
• Instruction Fetch cycle

send the PC to memory;
fetch the current instruction from mem;
PC = PC + 4; // each instr is 4 bytes

Page 25: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 2)
IF ID EX MEM WB
• Instruction Decode/register fetch cycle

decode the instruction;
read the registers (corresponding to register source specifiers);

Page 26: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 3)
IF ID EX MEM WB
• Execution/effective address cycle

ALU operates on the operands from ID: 3 functions depending on the instr type
1. Memory reference: ALU adds base register and offset to form effective address;

Page 27: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 3)
IF ID EX MEM WB
• Execution/effective address cycle

ALU operates on the operands from ID: 3 functions depending on the instr type
2. Register-Register ALU instruction: ALU performs the operation specified by opcode on the values read from the register file;

Page 28: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 3)
IF ID EX MEM WB
• EXecution/effective address cycle

ALU operates on the operands from ID: 3 functions depending on the instr type
3. Register-Immediate ALU instruction: ALU operates on the first value read from the register file and the sign-extended immediate.

Page 29: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 4)
IF ID EX MEM WB
• MEMory access

for load instr: the memory does a read using the effective address;
for store instr: the memory writes the data from the second register using the effective address.

Page 30: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

at most 5 clock cycles per instruction (stage 5)
IF ID EX MEM WB
• Write-Back cycle

for Register-Register ALU or load instr;
write the result into the register file, whether it comes from the memory (for load) or from the ALU (for ALU instr).

Page 31: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

3 classes of instructions (1)
• ALU (Arithmetic Logic Unit) instructions

operate on two regs or a reg + a sign-extended immediate;
store the result into a third reg;
e.g., add (DADD), subtract (DSUB), logical operations AND, OR

Page 32: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

3 classes of instructions (2)
• Load (LD) and store (SD) instructions

operands: base register + offset;
the sum (called effective address) is used as a memory address;
Load: use a second reg operand as the destination for the data loaded from memory;
Store: use a second reg operand as the source of the data stored into memory.

Page 33: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RISC: Reduced Instruction Set Computer

3 classes of instructions (3)
• Branches and jumps

conditional transfers of control;
Branch:
specify the branch condition with a set of condition bits or comparisons between two regs or between a reg and zero;
decide the branch destination by adding a sign-extended offset to the current PC (program counter);

Page 34: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction

• at most 5 clock cycles per instruction
• IF ID EX MEM WB

Page 35: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction

IF ID EX MEM WB

IF:
IR ← Mem[PC];
NPC ← PC + 4;

Page 36: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction

IF ID EX MEM WB

ID:
A ← Regs[rs];
B ← Regs[rt];
Imm ← sign-extended immediate field of IR (lower 16 bits)

Page 37: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction

IF ID EX MEM WB

EX:
Memory reference: ALUOutput ← A + Imm;

Register-register ALU: ALUOutput ← A func B;

Register-immediate ALU: ALUOutput ← A op Imm;

Branch: ALUOutput ← NPC + (Imm<<2); Cond ← (A == 0);

Page 38: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction

IF ID EX MEM WB

MEM:
Load: LMD ← Mem[ALUOutput];
Store: Mem[ALUOutput] ← B;

Branch: if (cond) PC ← ALUOutput;

Page 39: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction

IF ID EX MEM WB

WB:
Register-register ALU: Regs[rd] ← ALUOutput;

Register-immediate ALU: Regs[rt] ← ALUOutput;

Load: Regs[rt] ← LMD;
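The register transfers of the five steps can be strung together into a one-instruction-at-a-time interpreter. This is only a sketch under stated assumptions: instructions are pre-decoded Python dicts (a hypothetical encoding, not the MIPS binary format), the immediate is taken as already sign-extended, and nothing is overlapped, so no hazards arise.

```python
import operator


def step(pc, regs, instr_mem, data_mem):
    """Run one instruction through IF, ID, EX, MEM, WB and return the next PC."""
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir = instr_mem[pc]
    npc = pc + 4

    # ID: A <- Regs[rs]; B <- Regs[rt]; Imm <- immediate field
    a = regs[ir.get("rs", 0)]
    b = regs[ir.get("rt", 0)]
    imm = ir.get("imm", 0)
    kind = ir["kind"]

    # EX: one of four ALU functions, depending on the instruction type
    cond = False
    if kind in ("load", "store"):
        alu_output = a + imm
    elif kind == "rr":
        alu_output = ir["func"](a, b)
    elif kind == "ri":
        alu_output = ir["func"](a, imm)
    elif kind == "branch":
        alu_output = npc + (imm << 2)
        cond = (a == 0)
    else:
        raise ValueError(f"unknown instruction kind: {kind}")

    # MEM: load, store, or branch completion
    next_pc = npc
    lmd = None
    if kind == "load":
        lmd = data_mem[alu_output]
    elif kind == "store":
        data_mem[alu_output] = b
    elif kind == "branch" and cond:
        next_pc = alu_output

    # WB: write the result into the register file
    if kind == "rr":
        regs[ir["rd"]] = alu_output
    elif kind == "ri":
        regs[ir["rt"]] = alu_output
    elif kind == "load":
        regs[ir["rt"]] = lmd
    return next_pc


regs = [0] * 32
regs[2] = 100
data_mem = {100: 7}
instr_mem = {
    0: {"kind": "load", "rs": 2, "rt": 1, "imm": 0},                      # R1 = Mem[R2 + 0]
    4: {"kind": "ri", "rs": 1, "rt": 3, "imm": 5, "func": operator.add},  # R3 = R1 + 5
    8: {"kind": "rr", "rs": 3, "rt": 1, "rd": 4, "func": operator.sub},   # R4 = R3 - R1
    12: {"kind": "store", "rs": 2, "rt": 4, "imm": 8},                    # Mem[R2 + 8] = R4
    16: {"kind": "branch", "rs": 0, "imm": 2},                            # branch if R0 == 0
}

pc = 0
for _ in range(5):
    pc = step(pc, regs, instr_mem, data_mem)
print(regs[1], regs[3], regs[4], data_mem[108], pc)  # prints: 7 12 5 5 28
```

The final PC is 28 because the branch at address 16 is taken and its target is NPC + (Imm<<2) = 20 + 8.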

Page 40: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS Instruction Demo

• Prof. Gurpur Prabhu, Iowa State Univ http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/DLXimplem.html

• Load, Store
• Register-register ALU
• Register-immediate ALU
• Branch

Page 41: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Load

Page 47: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Store

Page 53: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Register-Register ALU

Page 59: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Register-Immediate ALU

Page 65: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch

Page 71: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Outline

• What’s Pipelining
• How Pipelining Works
• Pipeline Hazards
• Pipeline with Multicycle FP Operations

Page 72: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

When Pipeline Is Stuck

LD R1, 0(R2)

DSUB R4, R1, R5

[Figure: R1 is produced by LD and consumed by DSUB]

Page 73: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Structural Hazard

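• Example
1 mem port
mem conflict: data access vs instr fetch

[Figure: Load, Instr i+1, Instr i+2, Instr i+3 in the pipeline; the MEM stage of Load and the IF stage of Instr i+3 need the memory in the same clock cycle]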

Page 74: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Structural Hazard

Stall Instr i+3 till CC 5
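The stall can be reproduced with a small scheduling sketch. It assumes a single memory port, gives priority to the MEM stage of the older instruction, and models no other hazards; the function name and the boolean encoding are hypothetical.

```python
def schedule_if_cycles(uses_data_mem):
    """uses_data_mem: one bool per instruction, True for loads and stores.

    Returns the clock cycle (1-based) in which each instruction is fetched.
    """
    mem_busy = set()  # cycles in which the single port serves a MEM stage
    if_cycles = []
    next_if = 1
    for uses_mem in uses_data_mem:
        cycle = next_if
        while cycle in mem_busy:  # port taken by an older load/store: stall IF
            cycle += 1
        if_cycles.append(cycle)
        if uses_mem:
            mem_busy.add(cycle + 3)  # MEM comes three stages after IF
        next_if = cycle + 1
    return if_cycles


# Load, Instr i+1, Instr i+2, Instr i+3
print(schedule_if_cycles([True, False, False, False]))  # prints: [1, 2, 3, 5]
```

Instr i+3 would normally be fetched in CC 4, but Load occupies the memory port in that cycle, so the fetch slips to CC 5.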

Page 75: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

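Data Hazard

DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

[Figure: R1 is written by DADD and read by the following instructions]

No hazard when the register file is written in the 1st half cycle (w) and read in the 2nd half cycle (r).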

Page 76: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard

• Solution: forwarding
directly feed back the results in the EX/MEM & MEM/WB pipeline regs to the ALU inputs;

if forwarding hw detects that a previous ALU operation has written the reg corresponding to a source for the current ALU operation, control logic selects the forwarded result as the ALU input.
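The selection done by the control logic can be sketched as a function that picks one ALU source operand. The names and the (dest_reg, value) tuple layout are assumptions made for illustration; only ALU results are modelled, and the newer result in EX/MEM takes priority over MEM/WB.

```python
def select_alu_input(src_reg, regfile_value, ex_mem=None, mem_wb=None):
    """Choose the value for one ALU source operand.

    ex_mem / mem_wb: (dest_reg, value) held in that pipeline register, or None.
    Returns (value, where_it_came_from).
    """
    for name, latch in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        if latch is not None and latch[0] == src_reg:
            return latch[1], name
    return regfile_value, "register file"


# DADD R1, R2, R3 produced 30; the register file still holds a stale R1 = 0.
# DSUB R4, R1, R5 is in EX while DADD's result sits in EX/MEM.
print(select_alu_input(1, 0, ex_mem=(1, 30)))                   # (30, 'EX/MEM')
# AND R6, R1, R7 is in EX one cycle later: DADD's result has moved to MEM/WB.
print(select_alu_input(1, 0, ex_mem=(4, 25), mem_wb=(1, 30)))   # (30, 'MEM/WB')
# Later instructions read the register file, which by then holds the new value.
print(select_alu_input(1, 30))                                  # (30, 'register file')
```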

Page 77: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard: Forwarding

DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

[Figure: pipeline diagram highlighting R1]

Page 78: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard: Forwarding

DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

[Figure: R1 forwarded from the EX/MEM pipeline register]

Page 79: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard: Forwarding

DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11

[Figure: R1 forwarded from the MEM/WB pipeline register]

Page 80: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard: Forwarding

• Generalized forwarding
pass a result directly to the functional unit that requires it;

forward results to not only ALU inputs but also other types of functional units;

Page 81: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard: Forwarding

• Generalized forwarding

DADD R1, R2, R3

LD R4, 0(R1)

SD R4, 12(R1)

[Figure: forwarding paths for R1 and R4]

Page 82: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Data Hazard

• Sometimes stall is necessary

LD R1, 0(R2)

DSUB R4, R1, R5

[Figure: R1 reaches the MEM/WB pipeline register only after DSUB needs it at the ALU input]

Forwarding cannot go backward in time.

Has to stall.
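The case where forwarding is not enough can be detected by comparing each load with the instruction right after it. This is a simplified sketch: the `Instr` tuple and function names are hypothetical, and every source register is treated as needed at the start of EX, which is stricter than necessary for the data register of a store.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def needs_stall(prev: Instr, curr: Instr) -> bool:
    """True when curr uses the register that the immediately preceding load writes."""
    return prev.op == "LD" and prev.dest in curr.srcs


def count_stalls(program):
    """Number of one-cycle load-use stalls in a straight-line program."""
    return sum(needs_stall(p, c) for p, c in zip(program, program[1:]))


load_then_use = [
    Instr("LD", "R1", ("R2",)),          # LD   R1, 0(R2)
    Instr("DSUB", "R4", ("R1", "R5")),   # DSUB R4, R1, R5
]
alu_then_use = [
    Instr("DADD", "R1", ("R2", "R3")),   # DADD R1, R2, R3
    Instr("DSUB", "R4", ("R1", "R5")),   # DSUB R4, R1, R5
]
print(count_stalls(load_then_use), count_stalls(alu_then_use))  # prints: 1 0
```

The ALU-to-ALU dependence needs no stall because the result is already in EX/MEM when DSUB enters EX; the loaded value does not exist until the end of MEM.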

Page 83: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard

• Redo IF: essentially a stall

If the branch is untaken, the stall is unnecessary.

Page 84: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard: Solutions

4 simple compile time schemes (1)
• Freeze or flush the pipeline

hold or delete any instructions after the branch till the branch dst is known;

i.e., Redo IF w/o the first IF

Page 85: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard: Solutions

4 simple compile time schemes (2)
• Predicted-untaken

simply treat every branch as untaken;

when the branch is untaken, pipelining proceeds as if there were no hazard.

Page 86: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard: Solutions

4 simple compile time schemes (2)
• Predicted-untaken

but if the branch is taken:
turn the fetched instr into a no-op (idle);
restart the IF at the branch target addr
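The difference between freezing on every branch and predicting untaken can be put into numbers. The sketch below is illustrative only: the one-cycle penalty, the instruction count, and the branch outcomes are made-up assumptions, and the function names are hypothetical.

```python
def stall_cycles_freeze(branch_taken, penalty=1):
    """Freeze/flush: every branch pays the penalty, taken or not."""
    return penalty * len(branch_taken)


def stall_cycles_predicted_untaken(branch_taken, penalty=1):
    """Predicted-untaken: only taken branches turn the fetched instr into a no-op."""
    return penalty * sum(1 for taken in branch_taken if taken)


def effective_cpi(num_instrs, stall_cycles):
    """Ideal CPI of 1 plus the stall cycles spread over all instructions."""
    return (num_instrs + stall_cycles) / num_instrs


# Assumed workload: 100 instructions, 20 of them branches, 12 of those taken.
outcomes = [True] * 12 + [False] * 8

print(effective_cpi(100, stall_cycles_freeze(outcomes)))             # 1.2
print(effective_cpi(100, stall_cycles_predicted_untaken(outcomes)))  # 1.12
```

The gain comes entirely from the untaken branches, which proceed as if there were no hazard.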

Page 87: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard: Solutions

4 simple compile time schemes (3)
• Predicted-taken

simply treat every branch as taken;

does not apply to the five-stage pipeline;

applies to scenarios where the branch target addr is known before the branch outcome.

Page 88: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard: Solutions

4 simple compile time schemes (4)
• Delayed branch

execute the next instruction before the branch takes effect;

pipelining sequence:
branch instruction
sequential successor
branch target if taken

Branch delay slot: the next instruction (the sequential successor)

Page 89: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Branch Hazard: Solutions
• Delayed branch

Page 90: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Outline

• What’s Pipelining
• How Pipelining Works
• Pipeline Hazards
• Pipeline with Multicycle FP Operations

Page 91: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Multicycle FP Operation
• FP pipeline

allow for a longer latency for op;
two changes over integer pipeline:

repeat EX;
use multiple FP functional units;

Page 92: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

FP Pipeline

integer unit: loads and stores, integer ALU operations, branches

FP adder: FP add, FP subtract, FP conversion

FP and integer multiplier

FP and integer divider

Page 93: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Generalized FP Pipeline

• EX is pipelined (except for FP divider)
• Additional pipeline registers
e.g., ID/A1

FP divider: 24 CCs

Page 94: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Generalized FP Pipeline

• Example
italics: stage where data is needed
bold: stage where a result is available

Page 95: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Hazard

• Divider is not fully pipelined – structural hazard

Page 96: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Hazard

• Instructions have varying running times, so more than one register write may be required in a cycle - structural hazard

Page 97: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Hazard

• Instructions no longer reach WB in order – Write after write (WAW) hazard

Page 98: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Hazard

• Instructions may complete in a different order than they were issued – exceptions

Page 99: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Hazard

• Longer latency of operations – more frequent stalls for RAW hazards

Page 100: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

RAW Hazards

Page 101: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

Structural Hazards

Page 102: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

WAW Hazards

• If L.D were issued one cycle earlier
• L.D would write F2 one cycle earlier than ADD.D: WAW hazard

what if another instruction uses F2 between them? No WAW
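Write-port conflicts and WAW hazards can be spotted by comparing the cycles in which instructions reach WB. This is a rough sketch with hypothetical names; the EX stage counts (1 for integer and load, 4 for FP add, 7 for FP multiply) and the fetch cycles are assumptions chosen for illustration, not values stated on these slides.

```python
# Assumed number of EX stages per functional unit.
EX_STAGES = {"int": 1, "load": 1, "fp_add": 4, "fp_mul": 7}


def wb_cycle(if_cycle, unit):
    """IF, then ID, then the unit's EX stages, then MEM, then WB."""
    return if_cycle + 3 + EX_STAGES[unit]


def find_write_hazards(instrs):
    """instrs: list of (name, if_cycle, unit, dest) in program order."""
    hazards = []
    for i, (name1, if1, unit1, dest1) in enumerate(instrs):
        for name2, if2, unit2, dest2 in instrs[i + 1:]:
            w1, w2 = wb_cycle(if1, unit1), wb_cycle(if2, unit2)
            if w1 == w2:
                hazards.append(("structural", name1, name2))  # two writes in one cycle
            elif dest1 == dest2 and w2 < w1:
                hazards.append(("WAW", name1, name2))  # younger write lands first
    return hazards


same_cycle = [("ADD.D", 4, "fp_add", "F2"), ("L.D", 7, "load", "F2")]
one_earlier = [("ADD.D", 4, "fp_add", "F2"), ("L.D", 6, "load", "F2")]

print(find_write_hazards(same_cycle))   # [('structural', 'ADD.D', 'L.D')]
print(find_write_hazards(one_earlier))  # [('WAW', 'ADD.D', 'L.D')]
```

In the second case L.D writes F2 in cycle 10 and ADD.D overwrites it in cycle 11, so F2 ends up holding the older instruction's value.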

Page 103: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

All in MIPS R4000

Page 104: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• 5-stage -> 8-stage
• Higher clock rate

Page 105: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• IF: first half of instruction fetch;
PC selection;
initiation of instruction cache access;

Page 106: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• IS: second half of instruction fetch;
completion of instruction cache access;

Page 107: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• RF: instruction decode and register fetch;
hazard checking;
instruction cache hit detection;

Page 108: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• EX: execution
effective address calculation;
ALU operation;
branch-target computation and condition evaluation;

Page 109: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• DF: data fetch
first half of data access;

Page 110: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• DS: second half of data fetch
completion of data cache access;

Page 111: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• TC: tag check
determine whether the data cache access hit;

Page 112: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• WB: write back
for loads and register-register operations;

Page 113: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• 2-cycle load delay

Page 114: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• 3-cycle branch delay

Page 115: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• FP unit with eight different stages

Page 116: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• FP operations: latency and initiation interval

Page 117: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• FP operations Example 1: FP multiply + FP add

Page 118: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• FP operations Example 2: FP add + FP multiply

Page 119: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• FP operations Example 3: FP divide + FP add

Page 120: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

MIPS R4000

• FP operations Example 4: FP add + FP divide

Page 121: Lecture 7: Pipelining Review Kai Bu kaibu@zju.edu.cn .

?