Advanced Pipelining

COMP25212

Out of Order Processors

Classic 5-stage pipeline

[Figure: Fetch Logic → Decode Logic → Exec Logic → Mem Logic → Write Logic, with an Inst Cache and a Data Cache]

• A single execution flow

Modern Pipelines

[Figure: Fetch (from the Inst Cache) → Decode → several parallel execution flows through the Functional Units (FU): Add1; Ld1 → Ld2; Mul1 → Mul2 → Mul3; Div1 → Div2 → Div3; each flow ends in its own Write Back stage]

• Many execution flows

ARM Pipelines

[Figure: an ARM in-order processor pipeline and an ARM out-of-order processor pipeline]

Out of Order Execution

• The original program order is not preserved

• Processors execute instructions as soon as their input data become available

• Pipeline stalls caused by conflicted instructions are avoided by processing instructions that are able to run immediately

• Takes advantage of instruction-level parallelism (ILP)

• Instructions per cycle (IPC) increases

Conflicted Instructions

• Cache misses: a long wait before execution finishes

• Structural hazard: the required resources are not available

• Data hazard: dependencies between instructions

Structural Hazards

• Functional units are often not pipelined (and are assumed not to be in the examples that follow)

• This means only one instruction can use each of them at a time

• If all functional units suitable for executing an instruction are busy, the instruction cannot be executed

• This is known as a structural hazard

Data dependencies

• True dependency, read-after-write (RAW):
  r1 <- r2 op r3
  r4 <- r1 op r5

• Anti-dependency, write-after-read (WAR):
  r1 <- r2 op r3
  r2 <- r4 op r5

• Output dependency, write-after-write (WAW):
  r1 <- r2 op r3
  r1 <- r4 op r5
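The three cases can be checked mechanically. The sketch below is minimal Python and reduces each instruction to one destination register and a tuple of source registers. `Instr` and `hazards` are hypothetical names, not part of the course material. Given an earlier instruction and a later one, it returns the dependence types that exist between them:

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """An instruction in the slide's notation: dest <- src1 op src2."""
    dest: str
    srcs: tuple


def hazards(first: Instr, second: Instr) -> set:
    """Dependences from `first` to the later instruction `second`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # true dependency: second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # anti-dependency: second overwrites a source of first
    if first.dest == second.dest:
        found.add("WAW")  # output dependency: both write the same register
    return found


a = Instr("r1", ("r2", "r3"))
print(hazards(a, Instr("r4", ("r1", "r5"))))  # {'RAW'}
print(hazards(a, Instr("r2", ("r4", "r5"))))  # {'WAR'}
print(hazards(a, Instr("r1", ("r4", "r5"))))  # {'WAW'}
```

A single pair can carry more than one type. For SUB.D F8, F6, F2 followed by ADD.D F6, F8, F2, the function reports both RAW (through F8) and WAR (through F6).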

Dynamic Scheduling

• Key idea: allow instructions behind a stall to proceed, so that instructions execute in parallel. There are multiple execution units, so use them.

DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F12, F8, F14

Even though ADDD stalls (it needs F0 from DIVD), SUBD has no dependencies and can run.

– This enables out-of-order execution, which implies out-of-order completion.

Dynamic pipeline scheduling overcomes the limitations of in-order pipelined execution by allowing out-of-order instruction execution.
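A toy Python model makes the gain concrete. It rests on these assumptions:

- One instruction is fetched per cycle.
- Functional units are unlimited.
- Latencies are borrowed from the later scoreboard example: divide 40, add/subtract 2.
- WAR and WAW hazards are ignored, since none occur in these three instructions.

The names `LATENCY`, `PROGRAM` and `start_cycles` are hypothetical. For each instruction the model computes the cycle in which it can start executing:

```python
LATENCY = {"DIVD": 40, "ADDD": 2, "SUBD": 2}  # assumed, from the scoreboard example

PROGRAM = [  # (operation, destination, sources)
    ("DIVD", "F0", ("F2", "F4")),
    ("ADDD", "F10", ("F0", "F8")),
    ("SUBD", "F12", ("F8", "F14")),
]


def start_cycles(program, in_order):
    """Cycle in which each instruction starts executing (toy model)."""
    ready = {}  # register -> cycle in which its new value becomes available
    starts = []
    for i, (op, dest, srcs) in enumerate(program):
        t = max([i] + [ready[s] for s in srcs if s in ready])  # fetched by cycle i
        if in_order and starts:
            t = max(t, starts[-1] + 1)  # may not overtake the previous instruction
        starts.append(t)
        ready[dest] = t + LATENCY[op]
    return starts


print(start_cycles(PROGRAM, in_order=True))   # [0, 40, 41]
print(start_cycles(PROGRAM, in_order=False))  # [0, 40, 2]
```

With in-order execution, SUBD waits behind the stalled ADDD until cycle 41. With dynamic scheduling it starts in cycle 2.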

Out of Order Execution with Scoreboard

Scoreboard

• The scoreboard is a centralized hardware mechanism

– Instructions are executed as soon as their operands are available and there are no hazard conditions

• It dynamically constructs the dependency graph in hardware for a window of instructions as they are issued in program order

• The scoreboard is a “data structure” that provides the information necessary for all pieces of the processor to work together

(Appendix A.8) Introduced in the CDC 6600 (1963)

The Key Idea of Scoreboards

• Out-of-order execution divides the ID stage into two steps:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until there are no data hazards, then read operands

• The scoreboard allows an instruction to execute whenever 1 and 2 hold, without waiting for prior instructions

• We will use in-order issue, out-of-order execution and out-of-order commit (also called completion)

Typical Scoreboard Structure

Stages of a Scoreboard Pipeline

[Figure: Issue → Read Operands → one of several execution units (Execute Integer, Execute FP Multiplication ×2, Execute FP Add, Execute FP Division), each followed by its own Write Back]

Stages of a Scoreboard Pipeline

1. Issue: decode instructions and check for structural and WAW hazards (ID). Always done in program order.

If a functional unit for the instruction is free (no structural hazard) and no other active instruction has the same destination register (no WAW hazard), the scoreboard issues the instruction to the functional unit and updates its internal data structure.

If a structural or WAW hazard exists, instruction issue stalls, and no further instructions are issued until these hazards are cleared.

2. Read operands: wait until there are no data hazards, then read operands (RO). Can be done out of program order.

A source operand is available once no earlier-issued, still-active instruction is going to write it, that is, once any pending producer has written its result to the register file (no RAW hazard).

When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

Stages of a Scoreboard Pipeline

3. Execution: operate on operands (EX)

The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB)

Once the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.

Example:

DIVD F0, F2, F4
ADDD F10, F0, F8
SUBD F8, F8, F14

The scoreboard would stall SUBD's write result until ADDD has read its operands. ADDD must read the old value of F8, but it cannot read any operands until DIVD produces F0.

Information within the Scoreboard

1. Instruction status: which of the 4 steps the instruction is in

2. Functional unit status: indicates the state of the functional unit (FU); 9 fields for each functional unit
   Busy: indicates whether the unit is being used or not
   Op: operation to perform in the unit (e.g., + or –)
   Fi: destination register
   Fj, Fk: source-register numbers
   Qj, Qk: functional units producing source registers Fj, Fk
   Rj, Rk: flags indicating when Fj, Fk are ready; set to No after the operands are read

3. Register result status: indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register
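The three tables and the hazard checks of the four stages can be sketched as a small Python class. This is a simplified model, not the CDC 6600 logic:

- Register names are plain strings.
- Timing and the instruction status table are left out.
- `FUStatus`, `Scoreboard` and all method names are hypothetical.

Each check mirrors one stage. Issue tests for structural and WAW hazards, read operands tests Rj/Rk (RAW), and write result tests for WAR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # FU that will produce fj (None: no pending producer)
    qk: Optional[str] = None
    rj: bool = False          # fj ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.fu = {n: FUStatus(n) for n in unit_names}
        self.result = {}  # register result status: register -> FU that will write it

    def can_issue(self, unit, dest):
        # Stage 1: the FU is free (no structural hazard) and no active
        # instruction will write dest (no WAW hazard).
        return not self.fu[unit].busy and dest not in self.result

    def issue(self, unit, op, dest, src1, src2):
        qj, qk = self.result.get(src1), self.result.get(src2)
        self.fu[unit] = FUStatus(unit, True, op, dest, src1, src2,
                                 qj, qk, qj is None, qk is None)
        self.result[dest] = unit

    def can_read(self, unit):
        # Stage 2: both operands are ready (no RAW hazard).
        return self.fu[unit].rj and self.fu[unit].rk

    def read_operands(self, unit):
        s = self.fu[unit]
        s.rj = s.rk = False  # operands consumed

    def can_write(self, unit):
        # Stage 4: no other FU still has to read the old value of dest (no WAR).
        dest = self.fu[unit].fi
        return not any((f.fj == dest and f.rj) or (f.fk == dest and f.rk)
                       for f in self.fu.values() if f.name != unit)

    def write_result(self, unit):
        for f in self.fu.values():  # wake up FUs waiting for this result
            if f.qj == unit:
                f.qj, f.rj = None, True
            if f.qk == unit:
                f.qk, f.rk = None, True
        del self.result[self.fu[unit].fi]
        self.fu[unit] = FUStatus(unit)


sb = Scoreboard(["Integer", "Mult1", "Mult2", "Add", "Divide"])
sb.issue("Mult1", "Mult", "F0", "F2", "F4")
sb.issue("Divide", "Div", "F10", "F0", "F6")  # waits for F0 from Mult1
sb.issue("Add", "Add", "F6", "F8", "F2")      # will overwrite F6, which Divide needs
print(sb.can_read("Divide"))  # False: RAW on F0
sb.read_operands("Add")
print(sb.can_write("Add"))    # False: WAR on F6
sb.read_operands("Mult1")
sb.write_result("Mult1")
sb.read_operands("Divide")
print(sb.can_write("Add"))    # True
```

The short run shows the same WAR situation as the example that follows: the add cannot write F6 until the divide has read it.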

A Scoreboard Example

The following code is run on a scoreboard pipeline:

L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

with these functional units:

Functional Unit (FU)      # of FUs  EX cycles
Integer (Mem)             1         1
Floating Point Multiply   2         10
Floating Point Add        1         2
Floating Point Divide     1         40

Functional units are not pipelined.

Dependency Graph For Example Code

Example code:
1  L.D F6, 34(R2)
2  L.D F2, 45(R3)
3  MUL.D F0, F2, F4
4  SUB.D F8, F6, F2
5  DIV.D F10, F0, F6
6  ADD.D F6, F8, F2

[Figure: dependency graph over instructions 1 to 6, with edges marked as real data dependence (RAW), anti-dependence (WAR) or output dependence (WAW)]

Data dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
Output dependence (WAW): (1, 6)
Anti-dependence (WAR): (4, 6) (5, 6)

The WAR between 4 and 6 (on F6) never causes a stall on its own: 6 also depends on 4 through F8, so 4 has always read F6 before 6 can write it.
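The lists can be rebuilt mechanically. The sketch below is minimal Python and makes two simplifications:

- Each instruction is reduced to (number, destination, sources).
- The integer base registers of the loads are ignored.

`PROGRAM` and `dependences` are hypothetical names. For every register the sketch tracks the latest writer and the readers since that write, so it reports the nearest dependences, which are the ones a graph like this draws:

```python
PROGRAM = [  # (number, destination, sources); load base registers omitted
    (1, "F6", ()),             # L.D   F6, 34(R2)
    (2, "F2", ()),             # L.D   F2, 45(R3)
    (3, "F0", ("F2", "F4")),   # MUL.D F0, F2, F4
    (4, "F8", ("F6", "F2")),   # SUB.D F8, F6, F2
    (5, "F10", ("F0", "F6")),  # DIV.D F10, F0, F6
    (6, "F6", ("F8", "F2")),   # ADD.D F6, F8, F2
]


def dependences(program):
    """Nearest RAW, WAR and WAW pairs (earlier, later) in a straight-line block."""
    raw, war, waw = set(), set(), set()
    last_writer = {}  # register -> latest instruction that wrote it
    readers = {}      # register -> instructions that read it since that write
    for n, dest, srcs in program:
        for s in srcs:
            if s in last_writer:
                raw.add((last_writer[s], n))
            readers.setdefault(s, []).append(n)
        for r in readers.get(dest, []):
            if r != n:  # an instruction reading its own destination is not a WAR
                war.add((r, n))
        if dest in last_writer:
            waw.add((last_writer[dest], n))
        last_writer[dest] = n
        readers[dest] = []
    return raw, war, waw


raw, war, waw = dependences(PROGRAM)
print("RAW:", sorted(raw))
print("WAR:", sorted(war))
print("WAW:", sorted(waw))
```

Running it prints the RAW and WAW pairs listed above. It also prints both anti-dependences on F6, (4, 6) and (5, 6).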

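Scoreboard Example

Functional units: 1 Integer, 2 Multiplication, 1 Addition, 1 Division.

The tables use the following conventions:

- LD, MULTD, SUBD, DIVD and ADDD stand for L.D, MUL.D, SUB.D, DIV.D and ADD.D.
- Read = read operands, Exec = execution complete, Write = write result.
- In the functional unit status, Fi is the destination, Fj/Fk are the sources (S1, S2), Qj/Qk are the FUs producing them, and Rj/Rk show whether they are ready.
- Time counts down the remaining execution cycles of each FU.
- Each slide is labelled with the clock cycle counter.
- A dash marks an empty field.

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status: all registers (F0 ... F30) blank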

Scoreboard Example Cycle 1

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6   -    R2   -        -        -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status: F6 = Integer

Scoreboard Example Cycle 2

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6   -    R2   -        -        -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status: F6 = Integer

Scoreboard Example Cycle 3

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6   -    R2   -        -        -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status: F6 = Integer

Scoreboard Example Cycle 4

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6   -    R2   -        -        -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status: F6 = Integer

Scoreboard Example Cycle 5

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2   -    R3   -        -        -    Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status: F2 = Integer

Scoreboard Example Cycle 6

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6
MULTD F0, F2, F4    6
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2   -    R3   -        -        -    Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
      Mult2    No
      Add      No
      Divide   No

Register result status: F0 = Mult1, F2 = Integer

Scoreboard Example Cycle 7

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7
MULTD F0, F2, F4    6
SUBD  F8, F6, F2    7
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2   -    R3   -        -        -    Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
      Divide   No

Register result status: F0 = Mult1, F2 = Integer, F8 = Add

Scoreboard Example Cycle 8a

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7
MULTD F0, F2, F4    6
SUBD  F8, F6, F2    7
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2   -    R3   -        -        -    Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer  -        No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2   -        Integer  Yes  No
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide

Scoreboard Example Cycle 8b

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6
SUBD  F8, F6, F2    7
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example Cycle 9

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example Cycle 11

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
8     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
0     Add      Yes   Sub   F8   F6   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example Cycle 12

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
7     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
      Add      No
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F10 = Divide

Scoreboard Example Cycle 13

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
6     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 14

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
5     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
2     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 15

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
4     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
1     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 16

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14    16

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
3     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
0     Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 17

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14    16

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 18

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14    16

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
1     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 19

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9     19
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14    16

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
0     Mult1    Yes   Mult  F0   F2   F4   -        -        Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1    -        No   Yes

Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example Cycle 20

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9     19    20
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13     14    16

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status: F6 = Add, F10 = Divide

Scoreboard Example Cycle 21

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9     19    20
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8      21
ADDD  F6, F8, F2    13     14    16

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2   -        -        Yes  Yes
      Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status: F6 = Add, F10 = Divide

Scoreboard Example Cycle 22

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9     19    20
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8      21
ADDD  F6, F8, F2    13     14    16    22

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
40    Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status: F10 = Divide

39 cycles later…

Scoreboard Example Cycle 61

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9     19    20
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8      21    61
ADDD  F6, F8, F2    13     14    16    22

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10  F0   F6   -        -        Yes  Yes

Register result status: F10 = Divide

Scoreboard Example Cycle 62

Instruction status:
Instruction         Issue  Read  Exec  Write
LD    F6, 34(R2)    1      2     3     4
LD    F2, 45(R3)    5      6     7     8
MULTD F0, F2, F4    6      9     19    20
SUBD  F8, F6, F2    7      9     11    12
DIVD  F10, F0, F6   8      21    61    62
ADDD  F6, F8, F2    13     14    16    22

Functional unit status:
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   No

Register result status: all registers blank

• In-order issue, out-of-order execution and out-of-order completion.
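The final instruction status table can be reproduced with a short single-pass model. The sketch below is Python and makes these assumptions, all read off the tables above:

- Each step happens at least one cycle after the previous one.
- Execution complete = read operands + EX cycles.
- A register written in cycle t can be read from cycle t + 1.
- A functional unit can accept a new instruction the cycle after it writes its result.

Because issue is in order, every constraint on an instruction depends only on earlier instructions, so one pass in program order is enough. `UNITS`, `LATENCY`, `PROGRAM` and `scoreboard_timing` are hypothetical names.

```python
UNITS = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}      # number of FUs
LATENCY = {"Integer": 1, "Mult": 10, "Add": 2, "Divide": 40}  # EX cycles

PROGRAM = [  # (FU type, destination, sources); load base registers omitted
    ("Integer", "F6", ()),            # L.D   F6, 34(R2)
    ("Integer", "F2", ()),            # L.D   F2, 45(R3)
    ("Mult", "F0", ("F2", "F4")),     # MUL.D F0, F2, F4
    ("Add", "F8", ("F6", "F2")),      # SUB.D F8, F6, F2
    ("Divide", "F10", ("F0", "F6")),  # DIV.D F10, F0, F6
    ("Add", "F6", ("F8", "F2")),      # ADD.D F6, F8, F2
]


def scoreboard_timing(program):
    """(issue, read operands, execution complete, write result) per instruction."""
    free_at = {fu: [1] * n for fu, n in UNITS.items()}  # unit can accept an issue from
    rows = []
    for i, (fu, dest, srcs) in enumerate(program):
        earlier = list(enumerate(program[:i]))
        unit = min(range(UNITS[fu]), key=lambda u: free_at[fu][u])
        # Issue: in order, one per cycle, once structural and WAW hazards clear.
        issue = max(rows[-1][0] + 1 if rows else 1, free_at[fu][unit])
        for j, (_, d, _) in earlier:
            if d == dest:
                issue = max(issue, rows[j][3] + 1)
        # Read operands: after the latest earlier writer of each source (RAW).
        read = issue + 1
        for s in srcs:
            writers = [j for j, (_, d, _) in earlier if d == s]
            if writers:
                read = max(read, rows[writers[-1]][3] + 1)
        complete = read + LATENCY[fu]
        # Write result: not before every earlier reader of dest has read it (WAR).
        write = complete + 1
        for j, (_, _, s_j) in earlier:
            if dest in s_j:
                write = max(write, rows[j][1] + 1)
        free_at[fu][unit] = write + 1
        rows.append((issue, read, complete, write))
    return rows


for row in scoreboard_timing(PROGRAM):
    print(row)
```

Running it prints the Issue / Read / Exec / Write rows of the cycle-62 table. The ADD.D row shows the WAR stall: it completes in cycle 16 but cannot write until cycle 22, after DIV.D has read F6 in cycle 21.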

Summary

• Scoreboard techniques to deal with hazards:

– Result forwarding to reduce or eliminate RAW hazards

– Hazard detection hardware to stall the pipeline during hazards

– A hardware-based mechanism that rearranges instruction execution order at runtime to reduce stalls (dynamic scheduling)

» Better dynamic exploitation of instruction-level parallelism (ILP)

Limitations of Scoreboard

• The amount of parallelism available among a block of instructions

• The number of scoreboard entries, which determines the window size (typically small, ~5 instructions)

• The number and types of functional units (structural hazards increase with out-of-order execution)

• The presence of anti-dependences and output dependences, which lead to WAR and WAW stalls

Out of Order Execution with Tomasulo

Tomasulo's Algorithm

• Control logic for out-of-order execution is decentralized

– Reservation stations (RS) in the functional units keep instruction information

– In addition, the reservation stations transparently rename registers

• A Common Data Bus (CDB) broadcasts data and results to the different devices

– Only one instruction can broadcast its result on the CDB each cycle

• Distributed control allows a larger window of instructions for dynamic scheduling

Tomasulo's Algorithm

• Structural hazards stall the pipeline

• Each RS tracks when its operands become available and buffers them as soon as they are produced

– Operands do not have to be read back from the register bank: the RS stores either the value or the source that will produce it

• The impact of RAW dependences is limited

– An instruction executes as soon as its operands are available

• WAW and WAR dependences are avoided

– Register renaming

Register Renaming (Example)

Original code, with anti-dependences on F8 and F6 and an output dependence on F6:

DIV.D F0, F2, F4
ADD.D F6, F0, F8
ST.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8

Renamed code, using temporary registers S and T; only the true dependences remain:

DIV.D F0, F2, F4
ADD.D S, F0, F8
ST.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T

• Eliminates WAR and WAW hazards by renaming all destination registers.

• Can be done by the compiler.
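A renaming pass in the spirit of this example can be sketched in Python. It differs from the slide in two ways:

- Instead of the slide's S and T, it assigns hypothetical fresh names P0, P1, … to every destination, as the bullet "renaming all destination registers" suggests.
- Sources read the current name of their register, and the store is modelled as having no register destination.

`rename` and `CODE` are illustrative names.

```python
from itertools import count


def rename(program):
    """Give every destination a fresh name; sources use the current mapping."""
    fresh = (f"P{i}" for i in count())
    mapping = {}  # architectural register -> its current (renamed) name
    out = []
    for op, dest, srcs in program:
        new_srcs = tuple(mapping.get(s, s) for s in srcs)  # read before renaming dest
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            mapping[dest] = new_dest
        out.append((op, new_dest, new_srcs))
    return out


CODE = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("ST.D", None, ("F6", "R1")),  # store: no register destination
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]

for instr in rename(CODE):
    print(instr)
```

In the output, P1 and P2 play the roles of S and T. Because every destination name is new, no later instruction can overwrite a register that an earlier one still reads or writes, so only the true dependences remain. A real processor would also have to remember at the end that F0 now lives in P0 and F6 in P3.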

Tomasulo Organization

[Figure: the FP Op Queue feeds reservation stations in front of the FP adders (Add1, Add2, Add3) and FP multipliers (Mult1, Mult2), plus load buffers (Load1 to Load6, from memory) and store buffers (to memory); the FP registers supply operands, and all results are broadcast on the Common Data Bus (CDB)]

Normal data bus: data + destination. Common data bus: data + source.

Stages of a Tomasulo Pipeline

[Figure: Issue → one of several execution units (Execute Integer, Execute FP Multiplication ×2, Execute FP Add, Execute FP Division), each followed by its own Write Back; there is no separate Read Operands stage]

Three Stages of Tomasulo Algorithm

1. Issue: get an instruction from the FP Op Queue. If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renaming registers).

2. Execute: operate on operands (EX). When both source operands are ready, execute; if not, watch the Common Data Bus for the result.

3. Write result: finish execution (WB). Write on the Common Data Bus to all awaiting units; mark the reservation station available.

• Normal data bus: data + destination (“go to” bus)

• Common data bus: data + source (“come from” bus)

– 64 bits of data + 4 bits of Functional Unit source address

– A unit writes (captures) the value if the source matches the Functional Unit it expects to produce the result

– Does the broadcast
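The "data + source" format can be sketched as one 68-bit bus word in Python. This is an illustration only, under these assumptions:

- Result sources are numbered 1 to 11 in the order of the organization slide.
- Tag 0 is reserved for "no producer".
- The 64 data bits hold an IEEE-754 double.

`SOURCES`, `TAG`, `encode` and `decode` are hypothetical. Only the standard `struct` module is used.

```python
import struct

# Hypothetical numbering of the result sources in the organization slide;
# tag 0 is reserved to mean "no producer" (Qj, Qk = 0 => ready).
SOURCES = ["Load1", "Load2", "Load3", "Load4", "Load5", "Load6",
           "Add1", "Add2", "Add3", "Mult1", "Mult2"]
TAG = {name: i + 1 for i, name in enumerate(SOURCES)}
assert max(TAG.values()) < 16  # fits in the 4-bit source field


def encode(source: str, value: float) -> int:
    """Put the 4-bit source tag above 64 bits of IEEE-754 data (68 bits in total)."""
    data = struct.unpack("<Q", struct.pack("<d", value))[0]
    return (TAG[source] << 64) | data


def decode(word: int) -> tuple:
    """Split a bus word back into (tag, value)."""
    data = word & ((1 << 64) - 1)
    return word >> 64, struct.unpack("<d", struct.pack("<Q", data))[0]


word = encode("Mult1", 2.5)
tag, value = decode(word)
print(tag, value)               # 10 2.5
print(word.bit_length() <= 68)  # True
```

A waiting unit compares the tag with the one it recorded at issue ("come from"), instead of the bus carrying a destination address ("go to").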

Reservation Station Components

Op: operation to perform in the unit (e.g., + or –)

Vj, Vk: values of the source operands
– Store buffers have a V field: the result to be stored

Qj, Qk: reservation stations producing the source registers (the value to be written)
– Note: Qj, Qk = 0 => ready
– Store buffers only have Qi, for the RS producing the result

Busy: indicates that the reservation station or FU is busy

Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
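The fields above and the issue and write-result steps can be sketched in Python. This is a simplified model:

- Register values are plain floats.
- `None` stands in for the slide's 0, meaning "no pending producer".
- Store buffers are not modelled.

`ReservationStation`, `issue` and `broadcast` are hypothetical names. Issue looks up the register result status rather than the register itself, and that lookup is where renaming happens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None    # producing RS; None plays the role of Qj = 0
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src1, src2, regs, status):
    """Copy values or producer tags into rs; dest is renamed to rs."""
    rs.busy, rs.op = True, op
    rs.qj, rs.qk = status.get(src1), status.get(src2)
    rs.vj = None if rs.qj else regs[src1]
    rs.vk = None if rs.qk else regs[src2]
    status[dest] = rs.name


def broadcast(source, value, stations, regs, status):
    """Write result: every waiting station and register snoops the CDB."""
    for rs in stations:
        if rs.qj == source:
            rs.vj, rs.qj = value, None
        if rs.qk == source:
            rs.vk, rs.qk = value, None
        if rs.name == source:
            rs.busy = False  # mark the producing station available
    for reg in [r for r, producer in status.items() if producer == source]:
        regs[reg] = value
        del status[reg]


regs = {"F2": 0.0, "F4": 3.0, "F6": 8.0}
status = {"F2": "Load2"}  # a load into F2 is still pending
mult1, mult2 = ReservationStation("Mult1"), ReservationStation("Mult2")
issue(mult1, "MULTD", "F0", "F2", "F4", regs, status)
issue(mult2, "DIVD", "F10", "F0", "F6", regs, status)
print(mult1.ready(), mult2.ready())         # False False
broadcast("Load2", 2.0, [mult1, mult2], regs, status)
print(mult1.ready(), mult1.vj)              # True 2.0
broadcast("Mult1", 6.0, [mult1, mult2], regs, status)
print(mult2.ready(), mult2.vj, regs["F0"])  # True 6.0 6.0
```

This mirrors the example that follows. DIVD keeps its own copy of F6 in Vk from the moment it issues, so a later write to F6 cannot disturb it, and no WAR stall is needed.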

A Tomasulo Example

The following code is run on a Tomasulo pipeline:

L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

with these functional units:

Functional Unit (FU)      # of FUs  EX cycles
FP Multiply/Division      2         10/40
FP Addition/Subtraction   3         2
Mem Load                  3

Functional units are not pipelined.
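As with the scoreboard, the timing can be sketched as a single pass in program order. This Python model makes these assumptions:

- Latencies come from the table above.
- The load latency of 2 cycles is inferred from the status tables that follow, where each load completes two cycles after it issues; it is not stated in the table.
- Execution starts the cycle after issue and after the last operand is broadcast.
- A station can be reused the cycle after it writes.
- Only one result is broadcast per cycle, with older instructions served first.

All names are hypothetical.

```python
STATIONS = {"Load": 3, "Add": 3, "Mult": 2}             # buffers / reservation stations
LATENCY = {"Load": 2, "Add": 2, "Mult": 10, "Div": 40}  # Load latency is an assumption

PROGRAM = [  # (operation class, destination, sources)
    ("Load", "F6", ()),            # L.D   F6, 34(R2)
    ("Load", "F2", ()),            # L.D   F2, 45(R3)
    ("Mult", "F0", ("F2", "F4")),  # MUL.D F0, F2, F4
    ("Add", "F8", ("F6", "F2")),   # SUB.D F8, F6, F2
    ("Div", "F10", ("F0", "F6")),  # DIV.D F10, F0, F6
    ("Add", "F6", ("F8", "F2")),   # ADD.D F6, F8, F2
]


def tomasulo_timing(program):
    """Return (issue, execution complete, write result) for each instruction."""
    free_at = {kind: [1] * n for kind, n in STATIONS.items()}  # station reusable from
    produced = {}     # register -> cycle its latest producer broadcasts on the CDB
    cdb_used = set()  # cycles in which the single CDB already carries a result
    rows = []
    for op, dest, srcs in program:
        kind = "Mult" if op == "Div" else op  # divides use the multiplier stations
        st = min(range(STATIONS[kind]), key=lambda s: free_at[kind][s])
        issue = max(rows[-1][0] + 1 if rows else 1, free_at[kind][st])  # in order
        # Operands come from the registers or the CDB; only RAW can delay execution,
        # because renaming through the stations removes WAR and WAW hazards.
        start = max([issue + 1] + [produced[s] + 1 for s in srcs if s in produced])
        complete = start + LATENCY[op] - 1
        write = complete + 1
        while write in cdb_used:  # one broadcast per cycle
            write += 1
        cdb_used.add(write)
        free_at[kind][st] = write + 1
        produced[dest] = write
        rows.append((issue, complete, write))
    return rows


for row in tomasulo_timing(PROGRAM):
    print(row)
```

Under these assumptions the sketch matches the Issue / Exec complete / Write result entries shown in the cycle-by-cycle tables. In particular, ADD.D writes F6 in cycle 11, against cycle 22 with the scoreboard, because the WAR hazard with DIV.D no longer exists.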

Dependency Graph

Example code:
1  L.D F6, 34(R2)
2  L.D F2, 45(R3)
3  MUL.D F0, F2, F4
4  SUB.D F8, F6, F2
5  DIV.D F10, F0, F6
6  ADD.D F6, F8, F2

[Figure: dependency graph over instructions 1 to 6, with edges marked as real data dependence (RAW), anti-dependence (WAR) or output dependence (WAW)]

Data dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
Output dependence (WAW): (1, 6)
Anti-dependence (WAR): (4, 6) (5, 6)

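Tomasulo Example

Resources: 3 load buffers, 3 FP adder reservation stations and 2 FP multiplier reservation stations.

The tables use the following conventions:

- Exec = execution complete, Write = write result.
- Time counts down the remaining execution cycles of each reservation station.
- Each slide is labelled with the clock cycle counter.
- A dash marks an empty field.
- R(F4) is the value read from register F4; M(A1) and M(A2) are the values loaded by the first and second loads.
- Expressions such as (M-M) stand for computed results.

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status (clock 0): all registers (F0 ... F30) blank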

Tomasulo Example Cycle 1

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status: F6 = Load1

Tomasulo Example Cycle 2

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1
LD    F2, 45(R3)    2
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status: F2 = Load2, F6 = Load1

Tomasulo Example Cycle 3

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3
LD    F2, 45(R3)    2
MULTD F0, F2, F4    3
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD  -        R(F4)    Load2  -
      Mult2  No

Register result status: F0 = Mult1, F2 = Load2, F6 = Load1

Tomasulo Example Cycle 4

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   Yes   SUBD   M(A1)    -        -      Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD  -        R(F4)    Load2  -
      Mult2  No

Register result status: F0 = Mult1, F2 = Load2, F6 = M(A1), F8 = Add1

Tomasulo Example Cycle 5

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
2     Add1   Yes   SUBD   M(A1)    M(A2)    -      -
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = M(A1), F8 = Add1, F10 = Mult2

Tomasulo Example Cycle 6

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
1     Add1   Yes   SUBD   M(A1)    M(A2)    -      -
      Add2   Yes   ADDD   -        M(A2)    Add1   -
      Add3   No
9     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = Add1, F10 = Mult2

Tomasulo Example Cycle 7

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4      7
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
0     Add1   Yes   SUBD   M(A1)    M(A2)    -      -
      Add2   Yes   ADDD   -        M(A2)    Add1   -
      Add3   No
8     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = Add1, F10 = Mult2

Tomasulo Example Cycle 8

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4      7     8
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
2     Add2   Yes   ADDD   (M-M)    M(A2)    -      -
      Add3   No
7     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = (M-M), F10 = Mult2

Tomasulo Example Cycle 9

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4      7     8
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
1     Add2   Yes   ADDD   (M-M)    M(A2)    -      -
      Add3   No
6     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = (M-M), F10 = Mult2

Tomasulo Example Cycle 10

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4      7     8
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6      10

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
0     Add2   Yes   ADDD   (M-M)    M(A2)    -      -
      Add3   No
5     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = Add2, F8 = (M-M), F10 = Mult2

Tomasulo Example Cycle 11

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4      7     8
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6      10    11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2

Tomasulo Example Cycle 12

Instruction status:
Instruction         Issue  Exec  Write
LD    F6, 34(R2)    1      3     4
LD    F2, 45(R3)    2      4     5
MULTD F0, F2, F4    3
SUBD  F8, F6, F2    4      7     8
DIVD  F10, F0, F6   5
ADDD  F6, F8, F2    6      10    11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj       Vk       Qj     Qk
      Add1   No
      Add2   No
      Add3   No
3     Mult1  Yes   MULTD  M(A2)    R(F4)    -      -
      Mult2  Yes   DIVD   -        M(A1)    Mult1  -

Register result status: F0 = Mult1, F2 = M(A2), F6 = (M-M+M), F8 = (M-M), F10 = Mult2

Page 71: Advanced Pipelining


Tomasulo Example Cycle 13

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
2     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
13     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

Page 72: Advanced Pipelining


Tomasulo Example Cycle 14

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
14     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

Page 73: Advanced Pipelining


Tomasulo Example Cycle 15

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3      15
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(A2)    R(F4)
      Mult2  Yes   DIVD            M(A1)    Mult1

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
15     FU  Mult1  M(A2)         (M-M+M)  (M-M)  Mult2

Page 74: Advanced Pipelining


Tomasulo Example Cycle 16

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3      15         16
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
40    Mult2  Yes   DIVD   M*F4     M(A1)

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
16     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
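The change between cycles 15 and 16 is a single CDB broadcast: Mult1's result is captured by every station whose Qj or Qk names Mult1, and by every register whose result status names Mult1. Below is a minimal sketch, assuming stations are plain dicts keyed by the slide's field names and using a hypothetical `broadcast` helper. Because each listener compares tags independently, the same loop would release any number of waiting stations in one broadcast.

```python
def broadcast(tag, value, stations, reg_status):
    """Put one result on the CDB; every listener waiting on `tag` captures `value`."""
    for rs in stations:
        if rs["qj"] == tag:  # first operand was waiting on this producer
            rs["vj"], rs["qj"] = value, None
        if rs["qk"] == tag:  # second operand was waiting on this producer
            rs["vk"], rs["qk"] = value, None
        if rs["name"] == tag:  # the producing station is released
            rs.update(busy=False, op=None, vj=None, vk=None, qj=None, qk=None)
    for reg, producer in reg_status.items():
        if producer == tag:  # the register takes the value; the tag is dropped
            reg_status[reg] = value


# Busy stations and register result status at the end of cycle 15.
stations = [
    {"name": "Mult1", "busy": True, "op": "MULTD",
     "vj": "M(A2)", "vk": "R(F4)", "qj": None, "qk": None},
    {"name": "Mult2", "busy": True, "op": "DIVD",
     "vj": None, "vk": "M(A1)", "qj": "Mult1", "qk": None},
]
reg_status = {"F0": "Mult1", "F2": "M(A2)", "F6": "(M-M+M)",
              "F8": "(M-M)", "F10": "Mult2"}

broadcast("Mult1", "M*F4", stations, reg_status)
print(stations[1])        # Mult2 now holds Vj = M*F4 and no longer waits
print(reg_status["F0"])   # M*F4
```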

Page 75: Advanced Pipelining

39 cycles later…

Page 76: Advanced Pipelining


Tomasulo Example Cycle 55

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3      15         16
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
1     Mult2  Yes   DIVD   M*F4     M(A1)

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
55     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2

Page 77: Advanced Pipelining


Tomasulo Example Cycle 56

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3      15         16
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5      56
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   DIVD   M*F4     M(A1)

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
56     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Mult2

Page 78: Advanced Pipelining


Tomasulo Example Cycle 57

Instruction status:
Instruction      j     k      Issue  Exec Comp  Write Result
LD     F6        34+   R2     1      3          4
LD     F2        45+   R3     2      4          5
MULTD  F0        F2    F4     3      15         16
SUBD   F8        F6    F2     4      7          8
DIVD   F10       F0    F6     5      56         57
ADDD   F6        F8    F2     6      10         11

Load buffer  Busy  Address
Load1        No
Load2        No
Load3        No

Reservation Stations:
Time  Name   Busy  Op     Vj (S1)  Vk (S2)  Qj (RS)  Qk (RS)
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6       F8     F10    F12 ... F30
57     FU  M*F4   M(A2)         (M-M+M)  (M-M)  Result

• In-order issue, out-of-order execution and out-of-order completion.
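The final instruction-status table can be reproduced with a small timing model. This is a sketch under stated assumptions, not the lecture's own method: latencies (LD 2, ADDD/SUBD 2, MULTD 10, DIVD 40 cycles) are inferred from the example's issue and completion cycles, an instruction starts executing in the cycle after it issues or after its last operand appears on the CDB, every instruction finds a free reservation station, and stations never compete for a functional unit. The name `tomasulo_timing` is hypothetical.

```python
# Latencies inferred from the example's Issue/Exec Comp columns (an assumption).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, source registers); the loads' base registers are treated as ready.
PROGRAM = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]


def tomasulo_timing(program):
    """Return (issue, exec_complete, write_result) cycles for each instruction.

    Simplifications: one issue per cycle, a free station is always available,
    no functional-unit contention, and a single CDB carries at most one result
    per cycle (older instructions claim their slot first).
    """
    producer = {}     # register -> index of the latest issued writer (renaming)
    cdb_used = set()  # cycles in which the CDB already carries a result
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1
        # Each pending source arrives on the CDB in its producer's write cycle.
        ready = max([issue] + [rows[producer[s]][2] for s in srcs if s in producer])
        complete = ready + LATENCY[op]  # runs from ready + 1 for LATENCY cycles
        write = complete + 1
        while write in cdb_used:        # wait for a free CDB slot
            write += 1
        cdb_used.add(write)
        rows.append((issue, complete, write))
        producer[dest] = i              # later readers of dest wait on this instruction
    return rows


for (op, dest, _), (iss, comp, wr) in zip(PROGRAM, tomasulo_timing(PROGRAM)):
    print(f"{op:<6} {dest:<4} issue={iss:<2} exec_comp={comp:<2} write={wr}")
```

Under these assumptions the model yields 1/3/4, 2/4/5, 3/15/16, 4/7/8, 5/56/57 and 6/10/11, the same issue, completion and write-result cycles as the example. DIVD records LD F6 as the producer of its F6 operand when it issues, so the later ADDD F6 can overwrite `producer["F6"]` without waiting: the WAR hazard on F6 costs no cycles.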

Page 79: Advanced Pipelining


Tomasulo’s advantages

(1) Distributed hazard detection logic

– Distributed reservation stations and the common data bus (CDB)

– If multiple instructions are waiting on a single result and each already has its other operand, they can all be released simultaneously when that result is broadcast on the CDB

– If a centralized register file were used instead, the units would have to read their operands from the registers whenever register buses became available

(2) Avoids stalling due to WAW or WAR hazards, because the reservation stations rename registers

Page 80: Advanced Pipelining


Tomasulo Drawbacks

• Hardware complexity

• Performance limited by the Common Data Bus

– Each CDB must reach every functional unit, giving high capacitance and high wiring density

– Only one functional unit can complete (write its result) per cycle!

» Multiple CDBs would require more FU logic to handle parallel stores
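The one-completion-per-cycle limit is easy to see in a small model. This sketch uses a hypothetical `assign_cdb_slots` helper and assumes that the oldest finished instruction wins arbitration and that a result is written no earlier than the cycle after execution completes; the `num_cdbs` parameter shows what extra buses would buy.

```python
def assign_cdb_slots(finish_cycles, num_cdbs=1):
    """Return the write-result cycle for each instruction.

    `finish_cycles` lists the cycle each instruction completes execution,
    oldest instruction first; older instructions get the bus first.
    """
    used = {}  # cycle -> number of CDBs already taken in that cycle
    writes = []
    for done in finish_cycles:
        cycle = done + 1
        while used.get(cycle, 0) >= num_cdbs:  # every bus busy: try next cycle
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        writes.append(cycle)
    return writes


print(assign_cdb_slots([5, 5, 5]))              # [6, 7, 8]
print(assign_cdb_slots([5, 5, 5], num_cdbs=2))  # [6, 6, 7]
```

With one CDB, three units that finish together must write their results in three successive cycles; each extra bus removes some of that delay at the cost of the wiring and FU logic listed above.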

Page 81: Advanced Pipelining


Summary

• Reservation stations: implicit register renaming to a larger set of registers, plus buffering of source operands

– Prevents registers from becoming a bottleneck

– Avoids the WAR and WAW hazards of the scoreboard

• Lasting Contributions

– Dynamic scheduling

– Register renaming

– Load/store disambiguation
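Implicit renaming happens at issue: each source register is replaced by the tag found in the register result status, and the destination's entry is overwritten with the new station's tag. Below is a minimal sketch using a hypothetical `rename` helper. It assumes the station assignment seen in the Tomasulo example (Load1, Load2, Mult1, Add1, Mult2, Add2) and ignores timing, treating every earlier writer as still in flight.

```python
# The example's instruction sequence and the station each instruction was given.
PROGRAM = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]
TAGS = ["Load1", "Load2", "Mult1", "Add1", "Mult2", "Add2"]


def rename(program, tags):
    """Issue in order, binding each source register to its producer's tag.

    A source with no earlier writer is read from the register file,
    shown here as "R(reg)".
    """
    reg_status = {}  # register -> tag of the station that will write it
    renamed = []
    for (_, dest, srcs), tag in zip(program, tags):
        renamed.append([reg_status.get(s, f"R({s})") for s in srcs])
        reg_status[dest] = tag  # newest writer wins; older readers keep their tag
    return renamed, reg_status


renamed, status = rename(PROGRAM, TAGS)
print(renamed[4])    # DIVD's sources: ['Mult1', 'Load1']
print(status["F6"])  # Add2
```

DIVD is bound to Load1 for F6 before ADDD re-tags F6 as Add2, so the WAR hazard disappears. Had Load1 still been executing, its broadcast would no longer update F6 in the register file, because F6's status now names Add2; that is how the WAW hazard between the two writes to F6 is avoided.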

Page 82: Advanced Pipelining

Summary of Out-of-Order Processors

Page 83: Advanced Pipelining


Out of Order Processors

BENEFITS:

• Accelerates the execution of programs

• More efficient design

– Increases the utilisation of processor resources

LIMITATIONS:

• More complex design

• Very expensive in terms of area and power

• Non-precise interrupts

– Interrupting exactly after an instruction might not be possible
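The write-result cycles from the Tomasulo example show why the state can be imprecise: a younger instruction may already have updated a register while an older one is still executing. In this sketch the fault scenario is hypothetical, and `already_written_after` is an illustrative helper, not part of any real interrupt mechanism.

```python
# (instruction, write-result cycle) in program order, from the Tomasulo example.
WRITES = [
    ("LD F6, 34+R2", 4),
    ("LD F2, 45+R3", 5),
    ("MULTD F0, F2, F4", 16),
    ("SUBD F8, F6, F2", 8),
    ("DIVD F10, F0, F6", 57),
    ("ADDD F6, F8, F2", 11),
]


def already_written_after(faulting, cycle, writes):
    """Return younger instructions whose results were written by `cycle`.

    `faulting` is the program-order index of the instruction that traps.
    A non-empty result means the register file no longer matches the state
    just after that instruction, i.e. the interrupt would be imprecise.
    """
    return [name for name, written in writes[faulting + 1:] if written <= cycle]


# Hypothetical: DIVD raises an exception at cycle 30, while still executing.
print(already_written_after(4, 30, WRITES))  # ['ADDD F6, F8, F2']
```

When DIVD traps, ADDD has already overwritten F6, so the processor cannot present a state in which everything up to DIVD has completed and nothing after it has.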

Page 84: Advanced Pipelining


Scoreboard vs Tomasulo

                      Scoreboard              Tomasulo
Window size:          ≤ 5 instructions        ≤ 14 instructions
Structural hazard:    Stall issue             Stall issue
WAR dependency:       Stall completion        Renaming avoids
WAW dependency:       Stall issue             Renaming avoids
Results forwarding:   Write/read registers    Broadcast from FU
Control structure:    Central scoreboard      Distributed reservation stations
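The WAR row corresponds to the F6 case in the Tomasulo example: ADDD F6 finishes long before DIVD has read F6. A scoreboard reads both source operands only once both are available, so it must hold ADDD's write until DIVD has read F6. Under Tomasulo, DIVD already holds F6 as M(A1) in its reservation station, so ADDD writes at cycle 11. Below is a minimal sketch of the scoreboard's write-result check, using a hypothetical `scoreboard_can_write` helper.

```python
def scoreboard_can_write(dest, pending_reads):
    """Scoreboard write-result check for WAR hazards.

    `pending_reads` maps each earlier, still-waiting instruction to the set
    of source registers it has not read yet. Writing `dest` must stall while
    any of them still needs the old value of `dest`.
    """
    return all(dest not in srcs for srcs in pending_reads.values())


# DIVD F10, F0, F6 is still waiting for F0, so it has read neither operand yet.
waiting = {"DIVD": {"F0", "F6"}}
print(scoreboard_can_write("F6", waiting))  # False: ADDD F6 must stall completion
print(scoreboard_can_write("F8", waiting))  # True: no earlier reader of F8
```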