Instruction Level Parallelism2
Transcript of Instruction Level Parallelism2
-
7/29/2019 Instruction Level Parallelism2
1/56
Instruction Level Parallelism
-
7/29/2019 Instruction Level Parallelism2
2/56
Instruction Level Parallelism(ILP)
Overlap the execution of instructions to
improve performance
Two approaches to exploit ILP
1. dynamic and hardware intensive
(desktop and server markets)
2. static and software intensive(embedded market)
-
7/29/2019 Instruction Level Parallelism2
3/56
Instruction Level Parallelism(ILP)
Pipeline CPI = Ideal pipeline CPI +
structural stalls +
data hazard stalls +control stalls
reduce each of the terms in the RHS to reduce
the overall CPI and thus increase instructionsper cycle (IPC)
-
7/29/2019 Instruction Level Parallelism2
4/56
Instruction Level Parallelism(ILP)
Amount of parallelism available within a Basic
Block is very small.
Therefore, we exploit ILP across multiple
blocks
-
7/29/2019 Instruction Level Parallelism2
5/56
Instruction Level Parallelism(ILP)
Loop-level parallelism: exploit parallelism
across iterations of a loop
Ex:
for (i=1;i
-
7/29/2019 Instruction Level Parallelism2
6/56
Instruction Level Parallelism(ILP)
Converting loop-level parallelism intoinstruction level parallelism either statically bythe compiler or dynamically by the hardware
Alternatively, use vector instructions thatoperate on a sequence of data items
ex: we need 4 vector instructions to execute
this code: 2 for loading x and y into memory, 2for adding the two vectors and 1 for storingback the result vector
-
7/29/2019 Instruction Level Parallelism2
7/56
Data dependences and Hazards
Finding dependences is critical in determining
how much parallelism exists in a program
Which instructions can be executed in
parallel?
Whether an instruction is dependent on other
instruction?
-
7/29/2019 Instruction Level Parallelism2
8/56
Data dependences and Hazards
Three different types of dependences:
1. data dependences (also called true data
dependences)2. name dependences
3. control dependences
-
7/29/2019 Instruction Level Parallelism2
9/56
Data dependences
An instruction j is data dependent on instruction Iif either of the following holds:
1. instruction i produces a result that may be
used by instruction j, or2. instruction j is data dependent on
instruction k, and instruction k is data
dependent on instruction i(this dependence chain can be as long as the
entire program)
-
7/29/2019 Instruction Level Parallelism2
10/56
Data dependences
Ex: LOOP: L.D F0, 0(R1)
ADD.D F4, F0, F2
S.D F4, 0(R1)
DAAUI R1, R1, #-8
BNE R1, R2, LOOP
If two instructions are data dependent, they cannot beexecuted simultaneously or be completely overlapped
The dependence implies that there would be a chain of oneor more data hazards between the two instructions
-
7/29/2019 Instruction Level Parallelism2
11/56
Data dependences
The effect of the original data dependence mustbe preserved
The presence of the dependence indicates the
potential for a hazard, but the actual hazard andthe length of any stall is a property of thepipeline.
Ex: there is a data dependence between DADDIU and BNE;
this dependence causes a stall because we moved the branchtest for the MIPS pipeline to the ID stage. Had the branch test
stayed in EX, this dependence would not cause a stall
-
7/29/2019 Instruction Level Parallelism2
12/56
Data dependences
The importance of the data dependences isthat a dependence
(1) indicates the possibility of a hazard.
(2) determines the order in which results must
be calculated, and
(3) sets an upper bound on how much
parallelism can possibly be exploited
-
7/29/2019 Instruction Level Parallelism2
13/56
Data dependences
A dependence can be overcome in two
different ways:
1. maintaining the dependence but avoiding a
hazard, &
2. eliminating a dependence by transforming
the code
-
7/29/2019 Instruction Level Parallelism2
14/56
Data dependences
Primary method used to avoid a hazard is byscheduling the code without altering thedependencies (we see a hardware scheme forscheduling code dynamically as it is executed)
Dependences that flow through registers areeasy to detect than dependences that flowthrough memory locations(register names are fixed in the instruction,
100(R4) & 20(R6) may be identical,
20(R4) & 20(R4) may be different)
-
7/29/2019 Instruction Level Parallelism2
15/56
Name dependences
Occurs when two instructions use the same
register or memory location, called a name,
but there is no flow of data between the
instructions associated with that name.
-
7/29/2019 Instruction Level Parallelism2
16/56
Name dependences
There are two types of name dependencesbetween an instruction i that precedesinstruction j in program order:
1. An antidependence between instruction i &instruction j occurs when instruction j
writes register or memory location that
instruction i reads. The original orderingmust be preserved to ensure that I reads the
correct values
-
7/29/2019 Instruction Level Parallelism2
17/56
Name dependences
2. An output dependence occurs when
instruction i & instruction j writes the same
register or memory location. The ordering
between the instructions must be preservedto ensure that value finally written
corresponds to instruction j
-
7/29/2019 Instruction Level Parallelism2
18/56
Name dependences
Instructions involved in a name dependence
can execute simultaneously or be reordered,
if the name( register number or memory)
used in the instruction is changed so theinstructions do not conflict (register
renaming)
-
7/29/2019 Instruction Level Parallelism2
19/56
Dependences and data hazards
Data hazards are of three types depending on
the order of read and write accesses in the
instructions:
Consider two instructions i and j, with i
occurring before j in program order
1. RAW (read after write) true data
dependence. Ex: LOAD followed by an ALU
instrn that directly uses the LOAD result
-
7/29/2019 Instruction Level Parallelism2
20/56
Dependences and data hazards
2. WAW (write after write) output
dependence.
3. WAR (write after read) - antidependence
RAR (read after read) is not a hazard
-
7/29/2019 Instruction Level Parallelism2
21/56
Control dependences
Determines the ordering of an instruction i
with respect to a branch instruction
Ex: if p1 {
s1;
}
if p2 {s2;
}
s1 is control dependent on p1,
and s2 is control dependent on
p2, but not on p1
-
7/29/2019 Instruction Level Parallelism2
22/56
Control dependences
Two constraints imposed by control
dependencies:
1. An instruction that is control dependent on a
branch cannot be moved before the branch
so that its execution is no longer controlled
by the branch
2. An instruction that is not control dependent
on a branch cannot be moved after the
branch so that its execution is controlled by
the branch
-
7/29/2019 Instruction Level Parallelism2
23/56
Control dependences
Control dependencies are preserved by two
properties in a simple pipeline:
1. instructions execute in program order: this
ensures that the instruction that occurs
before the branch is executed before the
branch
2. the detection of a control or branch hazard
ensures that an instruction that is control
dependent on a branch is not executed until
the branch direction is known.
-
7/29/2019 Instruction Level Parallelism2
24/56
Control dependences
However we may be willing to violate the
control dependencies without affecting the
correctness of the program
Two properties that are critical to program
correctness and that must be preserved using
data and control dependencies are
exception behavior and the data flow.
-
7/29/2019 Instruction Level Parallelism2
25/56
Control dependences
Preserving the exception behavior: reordering of
instruction execution must not change how
exceptions are raised in the program (or must not
cause any new exceptions in the program)
Ex: DADDU R2, R3, R4
BEQZ R2, L1
LW R1, 0(R2)
Speculation a hardware technique which allowsus to overcome this exception problem
To show how maintainingdata and control dependences do
not cause any new exceptions
after instruction reordering
-
7/29/2019 Instruction Level Parallelism2
26/56
Control dependences
Preserving the data flow : data flow is the actual
flow of data values among instructions that produce
results and those that consume them
Branches make the data flow dynamic
ex: DADDU R1, R2, R3
BEQZ R4, L
DSUBU R1, R5, R6
L1: OR R7, R1, R8
Data dependence alone is
insufficient for correct execution.
Instead when the instructions
execute, the data flow must be
preserved. This data flow is
preserved by preserving controlflow.
-
7/29/2019 Instruction Level Parallelism2
27/56
Overcoming Data Hazards using
dynamic scheduling
Hardware rearranges the instruction execution
to reduce the stalls while maintaining data
flow and exception behavior
Advantage of dynamic scheduling is gained at
the cost of significant increase in hardware
complexity
-
7/29/2019 Instruction Level Parallelism2
28/56
Dynamic scheduling
Limitation of a simple pipeline: Instructions
are issued in program order, and if an
instruction is stalled in the pipeline, no later
instruction can proceed
Ex: DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14
( SUB.D cannot execute because the dependence of ADD.D on
DIV.D causes the pipeline to stall even though it is not data
dependent on anything in the pipeline)
-
7/29/2019 Instruction Level Parallelism2
29/56
Dynamic scheduling
In the 5-stage pipeline, both structural and data
hazards are checked in the ID stage
In order to begin executing SUB.D , we must separate
the ID stage into two parts:
1. checking for structural hazards2. Waiting for the absence of data hazard
We still use in-order instruction issue, but we want
an instruction to begin execution as soon as its dataoperands are available (out-of-order execution and
hence out-of order-completion)
-
7/29/2019 Instruction Level Parallelism2
30/56
Dynamic scheduling
Out-of-order execution introduces the possibility of
WAR and WAW hazards
Ex: DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8
( WAR hazard because of ADD.D and SUB.D if SUB.D
executes before ADD.D.WAW hazard because of ADD.D and MUL.D )
-
7/29/2019 Instruction Level Parallelism2
31/56
Dynamic schedulingTomosulos
approach
The scheme
- tracks when operands for instructions are
available, to minimize RAW hazards and- uses register renaming, to minimize WAR and WAW
hazards
In Tomosulos approach, register renaming is
provided by the reservation stations
-
7/29/2019 Instruction Level Parallelism2
32/56
Dynamic schedulingTomasulos
approach
How register renaming eliminates WAR and WAW
hazards:
Ex: code before renaming:
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
Code after renaming:
DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T
-
7/29/2019 Instruction Level Parallelism2
33/56
Tomasulo-based MIPS processor
-
7/29/2019 Instruction Level Parallelism2
34/56
Tomasulo-based MIPS processor
Each reservation station has 7 fields
Op:Operation to perform on source operands
Vj, Vk: Value of Source operands
Loads have offset value in Vk field
Qj, Qk: Reservation stations producing the corresponding source
operands (value of 0 indicates that the value of source operand
already available in Vj or Vk, or is unnecessary)
A : used to hold information for the memory address
calculation for a load or store
Busy: Indicates reservation station or Functional unit is busy
-
7/29/2019 Instruction Level Parallelism2
35/56
Tomasulo-based MIPS processor
The register file has one field
Qi : The number/name of the reservation station that
contains the operation whose result should be
stored into this register
The Load and Store buffers each have a field
A : holds the result of the effective address
-
7/29/2019 Instruction Level Parallelism2
36/56
Three Stages of Tomasulo
Algorithm1. Issueget instruction from FP Op Queue (Maintains
the correct data flow)
If reservation station free (no structural hazard),
control issues instr & sends operands (renames registers).2. Executeoperate on operands (EX)
When both operands ready then execute;
if not ready, watch Common Data Bus for result
3. Write resultfinish execution (WB)Write on Common Data Bus to all awaiting units;
mark reservation station available
-
7/29/2019 Instruction Level Parallelism2
37/56
Dynamic Scheduling : Example
Show what the information tables look like for the following codesequence when only the first load has completed and written its
result:
1. L.D F6, 34(R2)
2. L.D F2, 45(R3)
3. MUL.D F0, F2, F4
4. SUB.D F8, F2, F6
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2
-
7/29/2019 Instruction Level Parallelism2
38/56
Tomasulo ExampleInstruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
L.D F6 34+ R2 Load1 NoL.D F2 45+ R3 Load2 No
MULT.D F0 F2 F4 Load3 No
SUB.D F8 F6 F2
DIV.D F10 F0 F6
ADD.D F6 F8 F2
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
0 FU
-
7/29/2019 Instruction Level Parallelism2
39/56
Tomasulo Example Cycle 1Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Load1
-
7/29/2019 Instruction Level Parallelism2
40/56
Tomasulo Example Cycle 2Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1
-
7/29/2019 Instruction Level Parallelism2
41/56
Tomasulo Example Cycle 3Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 Yes MULTD R(F4) Load2
Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Mult1 Load2 Load1
-
7/29/2019 Instruction Level Parallelism2
42/56
Tomasulo Example Cycle 4Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6
ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 Yes SUBD M(A1) Load2
Add2 No
Add3 No
Mult1 Yes MULTD R(F4) Load2
Mult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Mult1 Load2 M(A1) Add1
Load2 completing; what is waiting for Load2?
-
7/29/2019 Instruction Level Parallelism2
43/56
Tomasulo Example Cycle 5Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
2 Add1 Yes SUBD M(A1) M(A2)
Add2 No
Add3 No
10 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Mult1 M(A2) M(A1) Add1 Mult2
-
7/29/2019 Instruction Level Parallelism2
44/56
Tomasulo Example Cycle 6Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
1 Add1 Yes SUBD M(A1) M(A2)
Add2 Yes ADDD M(A2) Add1
Add3 No
9 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 M(A2) Add2 Add1 Mult2
-
7/29/2019 Instruction Level Parallelism2
45/56
Tomasulo Example Cycle 7Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
0 Add1 Yes SUBD M(A1) M(A2)
Add2 Yes ADDD M(A2) Add1
Add3 No
8 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 M(A2) Add2 Add1 Mult2
Add1 completing; what is waiting for it?
-
7/29/2019 Instruction Level Parallelism2
46/56
Tomasulo Example Cycle 8Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
2 Add2 Yes ADDD (M-M) M(A2)
Add3 No
7 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 M(A2) Add2 (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
47/56
Tomasulo Example Cycle 9Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
1 Add2 Yes ADDD (M-M) M(A2)
Add3 No
6 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 M(A2) Add2 (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
48/56
Tomasulo Example Cycle 10Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
0 Add2 Yes ADDD (M-M) M(A2)
Add3 No
5 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 M(A2) Add2 (M-M) Mult2
Add2 completing; what is waiting for it?
-
7/29/2019 Instruction Level Parallelism2
49/56
Tomasulo Example Cycle 11Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
4 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 M(A2) (M-M+ (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
50/56
Tomasulo Example Cycle 12Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
3 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 M(A2) (M-M+ (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
51/56
Tomasulo Example Cycle 13Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk Add1 No
Add2 No
Add3 No
2 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 M(A2) (M-M+ (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
52/56
Tomasulo Example Cycle 14Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk Add1 No
Add2 No
Add3 No
1 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 M(A2) (M-M+ (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
53/56
Tomasulo Example Cycle 15Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk Add1 No
Add2 No
Add3 No
0 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 M(A2) (M-M+ (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
54/56
Tomasulo Example Cycle 16Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk Add1 No
Add2 No
Add3 No
Mult1 No
40 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU M*F4 M(A2) (M-M+ (M-M) Mult2
-
7/29/2019 Instruction Level Parallelism2
55/56
Faster than light computation
(skip a couple of cycles)
l l l
-
7/29/2019 Instruction Level Parallelism2
56/56
Tomasulo Example Cycle 55Instruction status: Exec Write
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk Add1 No
Add2 No
Add3 No
Mult1 No
1 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
55 FU M*F4 M(A2) (M-M+ (M-M) Mult2