L27,28-Superscaler
8/6/2019 L27,28-Superscaler
What is Superscalar?
Common instructions (arithmetic, load/store,
conditional branch) can be initiated
simultaneously and executed independently
A superscalar processor is one in which
multiple independent instruction pipelines are
used. Each pipeline consists of multiple
stages, so that each pipeline can handle
multiple instructions at a time.
Equally applicable to RISC & CISC
In practice usually RISC
-
Why Superscalar?
Most operations are on scalar quantities
Performance of execution of scalar instructions
can be improved by using superscalar processing.
The superscalar approach is the next step in the
evolution of high-performance general-purpose processors.
Superscalar allows fetch and execute to proceed in parallel
-
Superpipelined approach
Many pipeline stages need less than half a clock
cycle
Thus, a doubled internal clock speed allows the
performance of two tasks in one external clock
cycle
-
Superscalar implementations achieve instruction-level parallelism (ILP)
ILP can be maximized by combining compiler optimization
with hardware techniques
Limitations to parallelism are:
True data dependency
Procedural dependency
Resource conflicts
Output dependency
Antidependency
-
True Data Dependency
Also called flow dependency or write-read
dependency
If the 2nd instruction depends on the 1st, the 2nd
instruction is delayed as many clock cycles as required
to remove the data dependency
With no dependency, instructions can be fetched,
decoded and executed in parallel
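A minimal sketch of the write-read check, assuming each instruction is described only by one destination register and a tuple of source registers. The `Instr` type and the register names are hypothetical and are not taken from the slides.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def has_true_dependency(first: Instr, second: Instr) -> bool:
    """Return True if `second` reads the register that `first` writes."""
    return first.dest in second.srcs


i1 = Instr("R1", ("R2", "R3"))  # R1 = R2 + R3
i2 = Instr("R4", ("R1",))       # R4 = R1 + 1, needs the result of i1
i3 = Instr("R5", ("R6",))       # R5 = R6 + 1, independent of i1

print(has_true_dependency(i1, i2))  # True: i2 must wait for i1
print(has_true_dependency(i1, i3))  # False: i1 and i3 can run in parallel
```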
-
Procedural Dependency
Cannot execute instructions following a branch
instruction until the branch is executed
Also, if instruction length is not fixed, instructions
have to be partially decoded before the following
instruction can be fetched
This prevents the simultaneous fetches
required by a superscalar pipeline.
-
Resource Conflict
Two or more instructions requiring access to the
same resource at the same time
Examples of resources include memories, caches,
buses, register-file ports, functional units (like an
ALU adder)
Can duplicate resources
e.g. have two arithmetic units so two
arithmetic instructions can be executed in
parallel
-
Design Issues
Instruction level parallelism & Machine Parallelism
Instruction-issue policy
Register Renaming
Branch Prediction
-
Design Issues
Instruction level parallelism exists when
instructions in a sequence are independent
and can be executed in parallel by overlapping
Limitations: true data and procedural dependency
Machine Parallelism
Measure of the ability of the processor to take advantage
of instruction level parallelism
Determined by the number of instructions that can be
fetched and executed at the same time (number of
pipelines) and by the speed and mechanism that the
processor uses to find independent instructions
-
Instruction Issue Policy
Instruction issue refers to the process of initiating
instruction execution in the processor's functional
units
Instruction-issue policy refers to the protocol
used to issue instructions
The following orderings are important:
Order in which instructions are fetched
Order in which instructions are executed
Order in which instructions change
registers and memory
-
In general, superscalar instruction issue policies are
grouped into the following categories:
1. In-order issue with in-order completion
2. In-order issue with out-of-order completion
3. Out-of-order issue with out-of-order completion
-
1. In-Order Issue In-Order Completion
Issue instructions in the order they occur i.e. issue
instructions in the order that would achieve
sequential execution
Result is written in the same order
Not very efficient
May fetch >1 instruction
Instructions are fetched 2 at a time and passed to the
decode unit.
Fetched Instructions must wait until the pair of
decode pipeline stages has cleared.
-
Example:
I1 requires two clock cycles to execute
I3 and I4 conflict for the same functional unit
I5 depends on the value produced by I4
I5 and I6 conflict for a functional unit
-
2. In-Order Issue Out-of-Order Completion
Used in superscalar RISC processors
Requires more complex logic than in-order completion
Output dependency should be avoided. Consider the following
instructions:
R3:= R3 + R5; (I1)
R4:= R3 + 1; (I2)
R3:= R5 + 1; (I3)
R7:= R3 + R4; (I4)
I2 depends on the result of I1: true data dependency
If I3 completes before I1, I4 reads the wrong value of R3: this is
output (write-write) dependency
For correct output, execution of I3 must be stalled if its
result might be overwritten by an older instruction that
takes longer to complete
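A small sketch of why completion order matters for I1 and I3, assuming hypothetical starting values R3 = 10 and R5 = 2 (the slides give no values). Both instructions read their operands at issue time, in program order; only the order of the writes to R3 changes.

```python
def final_r3(completion_order):
    """Apply the register writes of I1 and I3 in the given completion order."""
    regs = {"R3": 10, "R5": 2}  # hypothetical starting values
    # Operands are read at issue, which happens in program order.
    results = {
        "I1": regs["R3"] + regs["R5"],  # R3 := R3 + R5
        "I3": regs["R5"] + 1,           # R3 := R5 + 1
    }
    for name in completion_order:
        regs["R3"] = results[name]      # each completion writes R3
    return regs["R3"]


print(final_r3(["I1", "I3"]))  # 3: I4 sees the value written by I3
print(final_r3(["I3", "I1"]))  # 12: I1 finishes last and overwrites I3
```

In the second case I4 would read the result of I1 instead of I3, which is the wrong value the slide refers to.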
-
3. Out-of-Order Issue Out-of-Order Completion
Decouple decode & execution stages of the pipeline
This is done with a buffer referred to as an
instruction window
When the processor finishes decoding an instruction,
it is placed in the instruction window
Can continue to fetch and decode until this buffer is
full
When a functional unit becomes available an
instruction can be executed
So we can say that the processor has a lookahead capability
It identifies independent instructions and brings them to the
execution stage irrespective of their original program
order
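The selection step can be sketched roughly as follows. This is a simplification under stated assumptions: the window is a plain list of `(name, dest, srcs)` tuples, `ready_regs` is the set of registers whose values are already available, and output dependencies and antidependencies are ignored (for example because registers have been renamed). All names are hypothetical.

```python
def issue_from_window(window, ready_regs, free_units):
    """Issue up to `free_units` instructions whose source registers are ready.

    Issued instructions are removed from `window`; the rest stay buffered.
    """
    issued = []
    for instr in list(window):          # iterate over a copy while removing
        if len(issued) == free_units:
            break
        name, dest, srcs = instr
        if all(reg in ready_regs for reg in srcs):
            issued.append(name)
            window.remove(instr)
    return issued


window = [
    ("I1", "R1", ("R2",)),   # R2 is ready, so I1 can issue
    ("I2", "R3", ("R1",)),   # waits for the result of I1
    ("I3", "R4", ("R5",)),   # independent, can issue ahead of I2
]
issued = issue_from_window(window, ready_regs={"R2", "R5"}, free_units=2)
print(issued)                 # ['I1', 'I3']
print([w[0] for w in window]) # ['I2'] remains in the window
```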
-
Antidependency
This dependency is similar to true data
dependency, but reversed
Here the 2nd instruction destroys a value that the 1st
instruction uses
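The reversed dependency described here can be detected alongside true and output dependencies by tracking, for each register, its last writer and the readers since that write. The sketch below applies this to the four-instruction sequence I1 to I4 from the in-order issue, out-of-order completion slide. The tuple encoding and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# (name, destination register, source registers); immediates are omitted
program = [
    ("I1", "R3", ("R3", "R5")),  # R3 := R3 + R5
    ("I2", "R4", ("R3",)),       # R4 := R3 + 1
    ("I3", "R3", ("R5",)),       # R3 := R5 + 1
    ("I4", "R7", ("R3", "R4")),  # R7 := R3 + R4
]


def find_dependencies(prog):
    last_writer = {}              # register -> most recent writer
    readers = defaultdict(list)   # register -> readers since last write
    found = []
    for name, dest, srcs in prog:
        for reg in srcs:
            if reg in last_writer:           # read after write
                found.append((last_writer[reg], name, "true"))
        if dest in last_writer:              # write after write
            found.append((last_writer[dest], name, "output"))
        for reader in readers[dest]:         # write after read
            found.append((reader, name, "anti"))
        for reg in srcs:
            readers[reg].append(name)
        last_writer[dest] = name
        readers[dest] = []                   # new value has no readers yet
    return found


for dep in find_dependencies(program):
    print(dep)
```

The antidependency reported is I2 to I3: I3 must not overwrite R3 before I2 has read it.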
-
Register Renaming
Out-of-order issue and out-of-order completion
of instructions give rise to the possibility of
output dependencies and antidependencies
Output dependencies and antidependencies occur because
register contents may not reflect the correct
ordering of values dictated by the program flow
Both dependencies are examples of storage
conflicts
Multiple instructions compete for the use of the same
register locations. A solution to this is duplication
of resources. The technique is called register
renaming
-
Register Renaming (Contd.)
In this method registers are allocated
dynamically by the processor hardware
When an instruction that has a
register as a destination operand executes, a new
register is allocated for that value
Subsequent instructions that access that value
as a source operand in that register must go
through a renaming process
-
For each instruction that writes to a register X, a new
register X is instantiated
Multiple register Xs can co-exist
Consider
S1: R3 = R3 + R5
S2: R4 = R3 + 1
S3: R3 = R5 + 1
S4: R7 = R3 + R4
becomes
S1: R3b = R3a + R5a
S2: R4b = R3b + 1
S3: R3c = R5a + 1
S4: R7b = R3c + R4b
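The renaming above can be reproduced with a short sketch, assuming every register starts at version "a" and each write allocates the next letter. The list-of-tuples encoding and the `rename` function are hypothetical. Real hardware maps architectural registers onto a pool of physical registers and does not use letter suffixes.

```python
import string


def rename(program):
    """Rename registers so that every write gets a fresh version."""
    version = {}  # register -> index of its current version (0 means 'a')

    def current(operand):
        if not operand.startswith("R"):
            return operand  # immediates such as "1" are left unchanged
        return operand + string.ascii_lowercase[version.setdefault(operand, 0)]

    renamed = []
    for dest, srcs in program:
        # Sources are looked up before the destination gets its new version.
        new_srcs = [current(s) for s in srcs]
        version[dest] = version.get(dest, 0) + 1
        new_dest = dest + string.ascii_lowercase[version[dest]]
        renamed.append(f"{new_dest} = " + " + ".join(new_srcs))
    return renamed


program = [
    ("R3", ["R3", "R5"]),  # S1
    ("R4", ["R3", "1"]),   # S2
    ("R3", ["R5", "1"]),   # S3
    ("R7", ["R3", "R4"]),  # S4
]
for i, line in enumerate(rename(program), start=1):
    print(f"S{i}: {line}")
```

After renaming, S3 writes R3c while S1 writes R3b, so the write-write conflict between S1 and S3 and the read-write conflict between S2 and S3 disappear; only the true dependencies remain.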
-
Branch Prediction
Any high-performance pipelined machine must use a
method of dealing with branches
80486 fetches both the next sequential instruction after a branch
and the branch target instruction
Other methods for prediction are:
Predict never taken
Predict always taken
Predict by opcode
Taken/Not taken switch
Branch History Table
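As an illustration of the last item, here is a minimal sketch of a branch history table that keeps a 2-bit saturating counter per branch address. The counter width, the initial state (weakly not taken) and the class name are assumptions; the slides do not specify them.

```python
class BranchHistoryTable:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in the range 0..3

    def predict(self, addr):
        # Unseen branches start at 1 (weakly not taken), an assumed default.
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(3, c + 1) if taken else max(0, c - 1)


# A loop branch taken 9 times and then not taken once, run twice.
outcomes = ([True] * 9 + [False]) * 2
bht = BranchHistoryTable()
correct = 0
for taken in outcomes:
    if bht.predict(0x40) == taken:
        correct += 1
    bht.update(0x40, taken)

print(correct, "of", len(outcomes))  # 17 of 20
```

With two bits of history a single loop exit does not flip the prediction, so the second run of the loop mispredicts only at its exit.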
-
Conceptual view of Superscalar execution
-
Superscalar implementation
Implementing a superscalar processor requires some additional processor
hardware. The key elements are:
1) Instruction fetch strategies
2) Logic for determining true dependencies and a
mechanism for communicating these values to
where they are needed
3) Mechanism for initiating multiple instructions in
parallel
4) Resources for parallel execution of multiple
instructions
5) Mechanism for committing the process state in
correct order