Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.
-
Upload
anna-bruce -
Category
Documents
-
view
238 -
download
3
Transcript of Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.
![Page 1: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/1.jpg)
Computer Organization
and Architecture
Instruction-Level Parallelism and Superscalar Processors
![Page 2: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/2.jpg)
Outline
What is a superscalar architecture? Superpipelining Features of superscalar architectures Data dependencies Policies for parallel instruction execution Register renaming
![Page 3: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/3.jpg)
Performance Improvement
There are two typical approaches today, in order to improve performance: Superpipelining Superscalar
![Page 4: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/4.jpg)
Superpipelining
Many pipeline stages need less than half a clock cycle
Double internal clock speed gets two tasks per external clock cycle
Superpipelining is based on dividing the stages of a pipeline into substages and thus increasing the number of instructions which are supported by the pipeline at a given moment.
![Page 5: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/5.jpg)
Superpipelining By dividing each stage into two, the clock cycle period t
will be reduced to the half, t/2; hence, at the maximum capacity, the pipeline produces a result every t/2 s
For a given architecture and the corresponding instruction set there is an optimal number of pipeline stages; increasing the number of stages over this limit reduces the performance
A solution to further improve speed is the superscalar architecture
![Page 6: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/6.jpg)
Pipelined, Superpiplined and Superscalar Execution
![Page 7: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/7.jpg)
What is a Superscalar Architecture? A superscalar architecture is one in which several
instructions can be initiated simultaneously and executed independently.
Pipelining allows several instructions to be executed at the same time, but have to be in different pipeline stages at a given moment.
Superscalar architectures include all features of pipelining but, in addition, can be several instructions executing simultaneously in the same pipeline stage.
![Page 8: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/8.jpg)
Superscalar Architectures Superscalar architectures allow several
instructions to be issued and completed per clock cycle
A superscalar architecture consists of a number of pipelines that are working in parallel
Depending on the number and kind of parallel units available, a certain number of instructions can be executed in parallel.
![Page 9: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/9.jpg)
General Superscalar Organization
![Page 10: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/10.jpg)
Superscalar Architecture
![Page 11: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/11.jpg)
Limitations on Parallel Execution The situations which prevent instructions to be executed
in parallel by a superscalar architecture are very similar to those which prevent an efficient execution on any pipelined architecture
The consequences of these situations on superscalar architectures are more severe than those on simple pipelines, because the potential of parallelism is greater and, thus, a greater opportunity is lost
Limited by: Resource conflicts Procedural dependency (control dependency) Data dependency
![Page 12: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/12.jpg)
Resource Conflicts
They occur if two or more instructions compete for the same resource (register, memory, functional unit) at the same time; they are similar to structural hazards discussed with pipelines.
Introducing several parallel pipelined units, superscalar architectures try to reduce a part of possible resource conflicts.
![Page 13: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/13.jpg)
Control Dependency
The presence of branches creates major problems in assuring an optimal parallelism
If instructions are of variable length, they cannot be fetched and issued in parallel; an instruction has to be decoded in order to identify the following one and to fetch it
Superscalar techniques are efficiently applicable to RISCs, with fixed instruction length and format
![Page 14: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/14.jpg)
Data Conflicts Data conflicts are produced by dependencies Because superscalar architectures provide a great
liberty in the order in which instructions can be issued and completed, data dependencies have to be considered with much attention.
Three types of data dependencies True data dependency Output dependency Antidependency
![Page 15: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/15.jpg)
Pipelined, Superpiplined and Superscalar Execution
![Page 16: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/16.jpg)
Dependencies
![Page 17: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/17.jpg)
True Data Dependency
True data dependency exists when the output of one instruction is required as an input to a subsequent instruction
![Page 18: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/18.jpg)
True Data Dependency True data dependencies are intrinsic features of
the user program. They cannot be eliminated by compiler or hardware techniques.
The simplest solution is to stall the adder until the multiplier has finished.
In order to avoid the adder to be stalled, the compiler or hardware can find other instructions which can be executed by the adder until result of the multiplication is available.
![Page 19: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/19.jpg)
Output Dependency
An output dependency exists if two instructions are writing into the same location; if the second instruction writes before the first one, an error occurs:
![Page 20: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/20.jpg)
Antidependency
An antidependency exists if an instruction uses a location as an operand while a following one is writing into that location; if the first one is still using the location when the second one writes into it, an error occurs:
![Page 21: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/21.jpg)
Output Dependencies and Antidependencies
![Page 22: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/22.jpg)
Solution: Additional Registers
![Page 23: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/23.jpg)
Policies for Parallel Instruction Execution (1)
![Page 24: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/24.jpg)
Policies for Parallel Instruction Execution (2)
![Page 25: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/25.jpg)
Policies for Parallel Instruction Execution (3)
![Page 26: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/26.jpg)
Execution Policies
In-order issue with in-order completion. In-order issue with out-of-order completion. Out-of-order issue with out-of-order completion.
![Page 27: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/27.jpg)
Example We consider the following superscalar
architecture Two instructions can be fetched and decoded at a
time Three functional units can work in parallel: floating
point unit, integer adder, integer multiplier Two instructions can be written back (completed) at
a time; results are written (completion) in the same order
![Page 28: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/28.jpg)
Example
![Page 29: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/29.jpg)
In-Order Issue with In-Order Completion
![Page 30: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/30.jpg)
In-Order Issue with In-Order Completion
To guarantee in-order completion, when there is a conflict for a functional unit or when a functional unit require more than 1 cycle, the issuing temporarily stalls.
![Page 31: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/31.jpg)
In-Order Issue with In-Order Completion
Has not to bother about output-dependency and antidependency
It has only to detect true data dependencies.
No one of two dependencies will be violated if instructions are issued/completed in-order
![Page 32: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/32.jpg)
In-Order Issue with In-Order Completion
![Page 33: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/33.jpg)
In-Order Issue with In-Order Completion
![Page 34: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/34.jpg)
In-Order Issue with Out-of-Order Completion
![Page 35: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/35.jpg)
Out-of-Order Issue with Out-of-Order Completion
![Page 36: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/36.jpg)
Out-of-Order Issue with Out-of-Order Completion
![Page 37: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/37.jpg)
Out-of-Order Issue with Out-of-Order Completion
![Page 38: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/38.jpg)
Register Renaming
![Page 39: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/39.jpg)
Register Renaming
![Page 40: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/40.jpg)
Comments
![Page 41: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/41.jpg)
Comments
![Page 42: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/42.jpg)
Some Architectures PowerPC 604
six independent execution units: Branch execution unit Load/Store unit 3 Integer units Floating-point unit
in-order issue Power PC 620
provides in addition to the 604 out-of-order issue
![Page 43: Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.](https://reader035.fdocuments.net/reader035/viewer/2022081506/56649e4a5503460f94b3eac0/html5/thumbnails/43.jpg)
Some Architectures
Pentium three independent execution units:
2 Integer units Floating point unit
in-order issue Intel P6
provides in addition to the Pentium out-of-order issue