L27,28-Superscaler

download L27,28-Superscaler

of 28

Transcript of L27,28-Superscaler

  • 8/6/2019 L27,28-Superscaler

    1/28

    What is Superscalar?

    Common instructions (arithmetic, load/store,

    conditional branch) can be initiated

    simultaneously and executed independently A superscalar processor is one in which

    multiple independent instruction pipelines are

    used. Each pipeline consists of multiple

    stages, so that each pipeline can handle

    multiple instructions at a time.

    Equally applicable to RISC & CISC

    In practice usually RISC

  • 8/6/2019 L27,28-Superscaler

    2/28

    Why Superscalar?

    Most operations are on scalar quantities

    Performance of execution of scalar instructions

    can be improved by using superscalar processing.

    Superscalar approach is the next step in the

    evolution of high-performance general-purposeprocessors.

    Superscalar allows parallel fetch execute

  • 8/6/2019 L27,28-Superscaler

    3/28

    Superpipelined approach

    Many pipeline stages need less than half a clock

    cycle

    Thus, a doubled internal clock speed allows the

    performance of two tasks in one external clock

    cycle

  • 8/6/2019 L27,28-Superscaler

    4/28

  • 8/6/2019 L27,28-Superscaler

    5/28

    Superscalar implementations achievesInstruction level parallelism (ILP)

    ILP can be maximized by optimizing the

    compiler & Hardware techniques

    Limitations to parallelism are

    True data dependency

    Procedural dependency

    Resource conflicts Output dependency

    Antidependency

  • 8/6/2019 L27,28-Superscaler

    6/28

    Also called as Flow dependency or Write-read

    dependency

    In case of dependency b/w 1st and 2nd instructions , 2nd

    instructions is delayed as many clock cycles as required

    to remove data dependency

    With no dependency instructions can be fetched,decoded and executed in parallel

  • 8/6/2019 L27,28-Superscaler

    7/28

    Procedural Dependency

    Can not execute instructions following a branch

    instruction until the branch is executed

    Also, if instruction length is not fixed, instructionshave to be partially decoded before the following

    instruction can be fetched

    This prevents simultaneous fetches which is

    required by superscalar pipeline.

  • 8/6/2019 L27,28-Superscaler

    8/28

    Resource Conflict

    Two or more instructions requiring access to thesame resource at the same time

    e.g. of resources include memories, caches,

    buses, register-file ports, functional units (likeALU adder)

    Can duplicate resources

    e.g. have two arithmetic units so two

    arithmetic instructions can be executed inparallel

  • 8/6/2019 L27,28-Superscaler

    9/28

  • 8/6/2019 L27,28-Superscaler

    10/28

    Design Issues

    Instruction level parallelism & MachineParallelism

    Instruction-issue policy

    Register Renaming

    Branch Prediction

  • 8/6/2019 L27,28-Superscaler

    11/28

    Design Issues

    Instruction level parallelism exists when Instructions in a sequence are independent Executed in parallel by overlapping

    Limitations: true data and procedural dependency

    Machine Parallelism Measure of ability of processor to take advantage

    of instruction level parallelism

    Determined by number of instructions that can be

    fetched and executed at the same time (no ofpipelines) & by the speed & mechanism that the

    processor uses to find independent instructions

  • 8/6/2019 L27,28-Superscaler

    12/28

    Instruction Issue Policy

    Instruction issue refers to the process of initiatinginstruction execution in the processors functional

    unit

    Instruction-issue policy refers to the protocol

    used to issue instructions Following orderings are important:

    Order in which instructions are fetched

    Order in which instructions are executed Order in which instructions change

    registers and memory

  • 8/6/2019 L27,28-Superscaler

    13/28

    In general, superscalar instruction issue policies are

    grouped into the following categories :1. In-order issue with in-order completion

    2. In-order issue with out-of-order completion

    3. out-of-order issue with out-of-order completion

  • 8/6/2019 L27,28-Superscaler

    14/28

    1. In-Order Issue In-Order Completion

    Issue instructions in the order they occur i.e. issue

    instructions in the order that would achieve

    sequential execution

    Result is written in the same order

    Not very efficient

    May fetch >1 instruction

    Instructions are fetched 2 at a time and passed to the

    decode unit.

    Fetched Instructions must wait until the pair of

    decode pipeline stages has cleared.

  • 8/6/2019 L27,28-Superscaler

    15/28

    Example:

    I1 requires two clock cycles to execute

    I3 and I4 conflict for the same functional unit

    I5 depends on the value produced by I4

    I5 and I6 conflict for a functional unit

  • 8/6/2019 L27,28-Superscaler

    16/28

  • 8/6/2019 L27,28-Superscaler

    17/28

    2. In-Order Issue Out-of-Order Completion

    Used in superscalar RISC processors

    Requires more complex logic than in-order completion

    Output dependency should be avoided. Consider following

    instructions:

    R3:= R3 + R5; (I1) R4:= R3 + 1; (I2)

    R3:= R5 + 1; (I3)

    R7:= R3 + R4; (I4)

    I2 depends on result of I1 - data dependency

    If I3 completes before I1, the result will be wrong this isoutput (Write-write) dependency

    To have correct o/p execution of I3 must be stalled if its

    result might be overwritten by an older instruction that

    takes longer to complete

  • 8/6/2019 L27,28-Superscaler

    18/28

  • 8/6/2019 L27,28-Superscaler

    19/28

    3. Out-of-Order Issue Out-of-Order Completion

    Decouple decode & execution stages of the pipeline This is done with a buffer referred to as an

    instruction window

    When the processor finishes decoding an instruction,

    it is placed in the instruction window

    Can continue to fetch and decode until this buffer is

    full

    When a functional unit becomes available an

    instruction can be executed

    So we can say that processor has look ahead capacity

    Identifies independent instructions and bring it to

    execution stage irrespective of their original program

    order

  • 8/6/2019 L27,28-Superscaler

    20/28

  • 8/6/2019 L27,28-Superscaler

    21/28

    This dependency is similar to that of true data

    dependency, but reversed

    Here 2nd instruction destroys a value that the 1st

    instruction uses

  • 8/6/2019 L27,28-Superscaler

    22/28

    Register Renaming

    Out-of-order issuing and out-of-order completion

    of instruction, gives rise to the possibility of

    output dependency and antidependencies

    Output and antidependencies occur becauseregister contents may not reflect the correct

    ordering of values dedicated by the program flow

    Both dependencies are examples of storage

    conflicts Multiple instructions compete for the use of same

    register locations. A solution to this is duplication

    of resources. The technique is called as Register

    Renaming

  • 8/6/2019 L27,28-Superscaler

    23/28

    Register Renaming (Contd.)

    In this method registers are allocated

    dynamically by the processor h/w

    When an instruction executes that has aregister as a destination operand, a new

    register is allocated for that value

    Subsequent instructions that access that value

    as a source operand in that register must gothrough a renaming process

  • 8/6/2019 L27,28-Superscaler

    24/28

    For each instruction that writes to a register X, a newregister X is instantiated

    Multiple register Xs can co-exist

    Consider

    S1: R3 = R3 + R5

    S2: R4 = R3 + 1 S3: R3 = R5 + 1

    S4: R7 = R3 + R4

    becomes

    S1: R3b = R3a + R5a S2: R4b = R3b + 1

    S3: R3c = R5a + 1

    S4: R7b = R3c + R4b

  • 8/6/2019 L27,28-Superscaler

    25/28

  • 8/6/2019 L27,28-Superscaler

    26/28

    Branch Prediction

    Any high performance pipelined m/c must use a

    method of dealing with branches 80486 fetches both next sequential instruction after branch

    and branch target instruction Other methods for prediction are:

    Predict never taken

    Predict always taken

    Predict by opcode

    Taken/Not taken switch

    Branch History Table

  • 8/6/2019 L27,28-Superscaler

    27/28

    Conceptual view of Superscalar execution

  • 8/6/2019 L27,28-Superscaler

    28/28

    Superscalar implementationFor implementing superscalar processors some processor

    hardware is required. Following are the key elements:

    1) Instruction fetch strategies

    2) Logic for determining true dependencies &

    mechanism for communicating these values towhere they are needed

    3) Mechanism for initiating multiple instructions in

    parallel

    4) Resources for parallel execution of multipleinstructions

    5) Mechanism for committing the process state in

    correct order