EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and...
Transcript of EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and...
![Page 1: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/1.jpg)
EXPLOITING ILPB649
Parallel Architectures and Pipelining
![Page 2: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/2.jpg)
B649: Parallel Architectures and Programming, Spring 2009
What is ILP?
• Use hardware techniques to minimize stalls★ branch prediction★ dynamic scheduling★ speculation
• Pick instructions across branches (i.e., across basic blocks) to overlap★ on MIPS average dynamic branch frequency is 15% to 25%★ Pick instructions across loop iterations
2
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
for (i=1; i<=1000; i++)x[i] = x[i] + y[i];
![Page 3: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/3.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Data Dependences• Conditions for data dependence from instruction i
to instruction j★ i and j access a common memory location / register★ at least one of those accesses is a write★ there is a valid control-flow path from i to j
• Data dependence is transitive★ j depends on i, k depends on j ⇒ k depends on i
• Three types of data dependences★ true dependence (RAW)★ anti-dependence (WAR)★ output dependence (WAW)
3
![Page 4: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/4.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Control Dependences
• Dependences arising out of program control-flow• Prevent reordering to maintain program correctness (just like data dependences)
4
if (c1) {S1;
}if (c2) {
S2;}S3;
![Page 5: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/5.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Control Dependences
• Dependences arising out of program control-flow• Prevent reordering to maintain program correctness (just like data dependences)
4
if (c1) {S1;
}if (c2) {
S2;}S3;
if (c1)
if (c2)
S1;
S2;
S3;
![Page 6: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/6.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Another (Equivalent) View
• Data dependences★ true dependences (RAW)
• Name dependences★ anti-dependences (WAR) and output dependences (WAW)★ not “true” dependences
• Control dependences
5
![Page 7: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/7.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Another (Equivalent) View
• Data dependences★ true dependences (RAW)
• Name dependences★ anti-dependences (WAR) and output dependences (WAW)★ not “true” dependences
• Control dependences
5
Both software and hardware will reorder to improve performance as long as the “observed behavior” of the program does not change.
(Principle of observational equivalence)
![Page 8: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/8.jpg)
BASIC COMPILER TECHNIQUES:LOOP UNROLLING AND
SCHEDULING
![Page 9: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/9.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Simple Example
7
for (i=1000; i>0; i--)x[i] = x[i] + s;
![Page 10: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/10.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Simple Example
7
for (i=1000; i>0; i--)x[i] = x[i] + s;
![Page 11: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/11.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Pipelined Machine
8
![Page 12: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/12.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Scheduled Loop
9
for (i=1000; i>0; i--)x[i] = x[i] + s;
![Page 13: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/13.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Scheduled Loop
10
for (i=1000; i>0; i--)x[i] = x[i] + s;
![Page 14: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/14.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Unrolling
11
![Page 15: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/15.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Unrolling
11
Takes 27 cycles
![Page 16: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/16.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Schedule Unrolled Loop
12
![Page 17: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/17.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Schedule Unrolled Loop
12
Takes 14 cycles
![Page 18: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/18.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Unrolling
13
• Unroll a small number of times (called unroll factor)★ reduces branches★ bigger body enables better instruction scheduling
• Rename registers to avoid name dependences★ too much unrolling can cause register pressure
• Reorder instructions to reduce stalls• Generate startup and / or cleanup loops for
iterations that are not multiple of unroll factor
![Page 19: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/19.jpg)
BRANCH PREDICTION
![Page 20: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/20.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Static Branch Prediction
• Delay slots help★ can schedule instructions in the delay slots from the branch
direction taken more often
• Static prediction approaches★ predict taken (average misprediction for SPEC is 34%,
ranging from 9% to 59%)★ use profile information
15
![Page 21: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/21.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Misprediction Rate Based on Profile DataSpec92 Benchmarks
16
![Page 22: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/22.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Dynamic Branch Prediction
17
• “Branch prediction buffer” or “branch history table”• Small 1-bit memory (cache) indexed by lower bits of
address
bits to index BPB
Address bits
BranchPredictionBuffer1
0
10
![Page 23: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/23.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Dynamic Branch Prediction
17
• “Branch prediction buffer” or “branch history table”• Small 1-bit memory (cache) indexed by lower bits of
address
Problem: Predicts incorrectly twice
bits to index BPB
Address bits
BranchPredictionBuffer1
0
10
![Page 24: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/24.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Two-bit Predictors
18
![Page 25: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/25.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Prediction Accuracy: 4K 2-bit Entries(Spec89 Benchmarks)
19
![Page 26: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/26.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Prediction Accuracy: 4K vs Infinite
20
![Page 27: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/27.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Correlating Branch Predictors
21
if (aa == 2)aa = 0;
if (bb == 2)bb = 0;
if (aa != bb) {...
}
• Motivating example
• Idea★ “correlate” last m branches
![Page 28: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/28.jpg)
B649: Parallel Architectures and Programming, Spring 2009
General Correlating Predictors
• (m,n) predictor★ use last m branches to predict using n bit saturating
counters
22
bits to index BPB
Address bits
BranchPredictionBuffer
global shift register (m bits)
n bits
![Page 29: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/29.jpg)
B649: Parallel Architectures and Programming, Spring 2009
General Correlating Predictors
• (m,n) predictor★ use last m branches to predict using n bit saturating
counters
23
bits to index BPB
Address bits
BranchPredictionBuffer
global shift register (m bits)
n bits
![Page 30: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/30.jpg)
B649: Parallel Architectures and Programming, Spring 2009
General Correlating Predictors
• (m,n) predictor★ use last m branches to predict using n bit saturating
counters
24
if b = number of entries in BTB,
number of bits in BTB = 2^m × 2^n × b
![Page 31: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/31.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Comparison of 2-bit Predictors
25
![Page 32: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/32.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tournament Predictors(Spec89 Benchmarks)
26
![Page 33: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/33.jpg)
DYNAMIC SCHEDULING
![Page 34: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/34.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Comments on Assignment 1
28
• Start NOW!• Use course blog for discussions• Needs work, but rewards await you• You may mix languages
★ e.g., parsing might be easier with a scripting language
• Matrix-matrix computation == matrix multiply• Note differences in Tomasulo’s approach for assignment from the
textbook Figure 2.9• Extra credit possibilities (case-by-case)
★ forwarding in pipelined architecture★ speculation in Tomasulo’s approach★ other applications★ transformations such as, loop unrolling
![Page 35: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/35.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Why Dynamic Scheduling?
• Reduces stalls★ tolerates data hazards★ tolerates cache misses
• Code optimized for one pipeline can run on another• Can work with statically scheduled code• BUT, needs significantly more hardware
29
![Page 36: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/36.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Dynamic Scheduling: Idea
• Out-of-order execution• Out-of-order completion• Additional issues:
★ WAR and WAW hazards★ precise exceptions
• Implementation:★ split ID into Issue and Read Operands★ multiple instructions in execution need multiple functional
units
• In order issue, out-of-order execution
30
![Page 37: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/37.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Dynamic Scheduling: Idea
• Out-of-order execution• Out-of-order completion• Additional issues:
★ WAR and WAW hazards★ precise exceptions
• Implementation:★ split ID into Issue and Read Operands★ multiple instructions in execution need multiple functional
units
• In order issue, out-of-order execution
30
DIV.D F0, F2, F4ADD.D F10,F0,F8SUB.D F12,F8,F14
![Page 38: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/38.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Dynamic Scheduling: Idea
• Out-of-order execution• Out-of-order completion• Additional issues:
★ WAR and WAW hazards★ precise exceptions
• Implementation:★ split ID into Issue and Read Operands★ multiple instructions in execution need multiple functional
units
• In order issue, out-of-order execution
30
DIV.D F0, F2, F4ADD.D F10,F0,F8SUB.D F12,F8,F14
DIV.D F0, F2, F4ADD.D F6,F0,F8SUB.D F8,F10,F14MUL.D F6,F10,F8
![Page 39: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/39.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach
• Invented by Robert Tomasulo for IBM 360★ 360 had only 4 FP regs and long floating point delays
• Avoids RAW and WAW hazards by register renaming★ use of reservation stations
31
DIV.D F0, F2, F4ADD.D F6,F0,F8S.D F6,0(R1)SUB.D F8,F10,F14MUL.D F6,F10,F8
![Page 40: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/40.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach
• Invented by Robert Tomasulo for IBM 360★ 360 had only 4 FP regs and long floating point delays
• Avoids RAW and WAW hazards by register renaming★ use of reservation stations
31
DIV.D F0, F2, F4ADD.D F6,F0,F8S.D F6,0(R1)SUB.D F8,F10,F14MUL.D F6,F10,F8
DIV.D F0, F2, F4ADD.D S,F0,F8S.D S,0(R1)SUB.D T,F10,F14MUL.D F6,F10,T
![Page 41: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/41.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach: Basic Structure
32
![Page 42: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/42.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach: Steps
33
• Issue★ get next instruction from queue★ move to a matching reservation station (or stall if none available)★ get operand values if in registers, else keep track of units that will produce
them
• Execute★ if an operand not available, monitor the CDB★ if all operands are ready, execute the instruction when the functional unit
becomes available★ loads and stores take two steps: read register and wait for memory
• Write results★ functional units write into CDB (and from there into registers)★ stores write to memory when both value and store register are available
![Page 43: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/43.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach: Fields
• Reservation station:★ Op: operation★ Qj, Qk: operands, to come from reservation stations★ Vj, Vk: operands, available in registers★ A: effective address (initialized with immediate value)★ Busy: bit to indicate the reservation station is busy
• Register file:★ Qi: the number of the reservation station whose result will go into this
register; blank (or zero) indicates the register already has the value
34
![Page 44: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/44.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Example
35
1. L.D F6,32(R2)2. L.D F2,44(R3)3. MUL.D F0,F2,F44. SUB.D F8,F2,F65. DIV.D F10,F0,F66. ADD.D F6,F8,F2
![Page 45: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/45.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Example
36
![Page 46: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/46.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Example (contd.)
37
![Page 47: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/47.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Observations
• RAW hazards handled by waiting for operands• WAR and WAW hazards handled by register
renaming★ only WAR and WAW hazards between instructions
currently in the pipeline are handled; is this a problem?★ larger number of hidden names reduces name dependences
• CDB implements forwarding
38
![Page 48: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/48.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Example
39
Loop: L.D F0,0(R1)MUL.D F4,F0,F2S.D F4,0(R1)DADDIU R1,R1,-8BNE R1,R2,loop
![Page 49: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/49.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Example
40
![Page 50: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/50.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Summary of Tomasulo’s Approach
41
• Need to check WAR and WAW hazards★ through registers★ through loads and stores
• Works very well if branches predicted accurately★ instruction not allowed to execute unless all preceding branches
finished
• Out-of-order completion results in imprecise exceptions• Widely popular
★ high performance without compiler assistance★ can hide cache latencies★ reasonable performance for code difficult to schedule statically★ key component of speculation
![Page 51: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/51.jpg)
HARDWARE-BASED SPECULATION
![Page 52: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/52.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Speculation: Handling Control Dependences
• Fetch, issue, and execute instructions as if branch predictions always right
• Mechanism to handle situation where prediction was incorrect
• Combines three key ideas:★ dynamic branch prediction★ speculation to execute without waiting for control
dependences to resolve★ dynamic scheduling
43
![Page 53: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/53.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Speculation: Handling Control Dependences
• Fetch, issue, and execute instructions as if branch predictions always right
• Mechanism to handle situation where prediction was incorrect
• Combines three key ideas:★ dynamic branch prediction★ speculation to execute without waiting for control
dependences to resolve★ dynamic scheduling
43
Data flow execution
![Page 54: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/54.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Extending Tomasulo’s Algorithm
• Separate execution from completion★ execute: when data dependences are resolved★ commit: when control dependences are resolved
• Out-of-order execution, but in-order commit• Store uncommitted instructions in a reorder
buffer (ROB)• Written results found in ROB, until committed
★ similar to store buffer
• Register file / memory written upon commit
44
![Page 55: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/55.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach + Speculation
45
![Page 56: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/56.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Tomasulo’s Approach + Speculation
45
Fields in ROB1. Instruction type2. Destination3. Value4. Ready
![Page 57: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/57.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Speculation StepsIssue → Execute → Write → Commit
46
• Issue★ get instruction from queue★ issue if empty reservation station and empty ROB slot
✴ otherwise, stall★ update control fields to indicate buffers are in use★ send the reserved ROB entry number to the reservation
station for tagging
![Page 58: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/58.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Speculation StepsIssue → Execute → Write → Commit
47
• Execute★ if an operand not ready, monitor the CDB
✴ checks RAW hazards★ execute when both operands available
✴ loads require two steps (why?)✴ stores only need base register (why?)
![Page 59: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/59.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Speculation StepsIssue → Execute → Write → Commit
48
• Write★ upon completion, write result+tag on CDB★ CDB is read by ROB and any waiting reservation stations★ mark reservation station available★ for store:
✴ write in ROB’s Value field if value available✴ otherwise, keep monitoring CDB for the value
![Page 60: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/60.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Speculation StepsIssue → Execute → Write → Commit
49
• Commit (also called “completion” or “graduation”)★ normal commit
✴ update the register with the result, remove entry from ROB★ store
✴ update memory with the result, remove entry from ROB★ correctly predicted branch
✴ finish the branch★ incorrectly predicted branch
✴ flush the ROB✴ restart at the correct successor
![Page 61: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/61.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Example
50
1. L.D F6,32(R2)2. L.D F2,44(R3)3. MUL.D F0,F2,F44. SUB.D F8,F2,F65. DIV.D F10,F0,F66. ADD.D F6,F8,F2
![Page 62: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/62.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Example
51
![Page 63: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/63.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Example
52
Loop: L.D F0,0(R1)MUL.D F4,F0,F2S.D F4,0(R1)DADDIU R1,R1,-8BNE R1,R2,loop
![Page 64: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/64.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Example
53
![Page 65: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/65.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Observations on Speculation
54
• Speculation enables precise exception handling★ defer exception handling until instruction ready to commit
• Branches are critical to performance★ prediction accuracy★ latency of misprediction detection★ misprediction recovery time
• Must avoid hazards through memory ★WAR and WAW already taken care of (how?)★ for RAW
✴ don’t allow load to proceed if an active ROB entry has Destination field matching with A field of load
✴ maintain program order for effective address computation (why?)
![Page 66: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/66.jpg)
SUPERSCALAR PROCESSORS
![Page 67: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/67.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Improving ILP: Multiple Issue
• Statically scheduled superscalar processors• VLIW (Very Long Instruction Word) processors• Dynamically scheduled superscalar processors
56
![Page 68: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/68.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Multiple Issue Processor Types
57
![Page 69: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/69.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Dyn. Scheduling+Multiple Issue+Speculation
58
• Design parameters★ two-way issue (two instruction issues per cycle)★ pipelined and separate integer and FP functional units★ dynamic scheduling, but not out-of-order issue★ speculative execution
• Task per issue: assign reservation station and update pipeline control tables (i.e., control signals)
• Two possible techniques★ do the task in half a clock cycle★ build wider logic to issue any pair of instructions together
• Modern processors use both (4 or more way superscalar)
![Page 70: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/70.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Loop Example
59
Loop: LD R2,0(R1) ;R2=array element DADDIU R2,R2,#1 ;increment R2 SD R2,0(R1) ;store result DADDIU R1,R1,#8 ;increment pointer BNE R2,R2,Loop ;branch if not last
![Page 71: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/71.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Two-Way Issue, Without Speculation
60
![Page 72: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/72.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Two-Way Issue, With Speculation
61
Compare with 14 and 19, without speculation
![Page 73: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/73.jpg)
ADVANCED TECHNIQUES FORINSTRUCTION DELIVERY AND
SPECULATION
![Page 74: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/74.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Increasing Fetch Bandwidth:Branch Target Buffers
63
![Page 75: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/75.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Increasing Fetch Bandwidth:Branch Target Buffers
64
![Page 76: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/76.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Increasing Fetch Bandwidth:Branch Target Buffers
65
![Page 77: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/77.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Increasing Fetch Bandwidth:Branch Target Buffers: Variation
66
Target instruction(s)+
![Page 78: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/78.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Increasing Fetch Bandwidth
67
• Return address predictors★ assuming a special instruction for return (not a jump)★ returns are indirect (why?)★ more important for OO and dynamic languages (esp. VMs)
![Page 79: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/79.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Return Address Prediction Accuracy
68
![Page 80: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/80.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Increasing Fetch Bandwidth
69
• Return address predictors★ assuming a special instruction for return (not a jump)★ returns are indirect (why?)★ more important for OO and dynamic languages (esp. VMs)
• Integrated (stand-alone) instruction fetch units (not just a pipeline stage)★ integrated branch prediction★ instruction prefetch (when might this be useful?)★ instruction memory access and buffering (e.g., trace cache
on Pentium 4)
![Page 81: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/81.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Improving Speculation: Register Renaming• Use an extended set of “physical” registers, instead of
“architectural” registers★ beware of the terminology difference with memory system
• Physical registers hold values, instead of reservation stations or ROB
• Map of architectural to physical registers• Commit steps:
★ make the map entry for written architectural reg. “permanent”★ deallocate physical registers containing “older” value
• Deallocating physical registers★ look at the source operands to find unused physical registers★ wait until next instruction writing the same architectural register
commits (how does this work?)70
![Page 82: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/82.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Improving Speculation
• Limiting speculation★ avoid speculating when it may cause an expensive
exception, such as TLB miss or L2 miss
• Speculating through multiple branches★ speculate on a subsequent branch while the previous one
still pending✴ useful when high branch frequency, clustered branches, or long
functional unit delays★ speculating on more than branch in one cycle
✴ no architecture combines multiple branch prediction in one cycle with full speculation
• Value prediction71
![Page 83: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/83.jpg)
INTEL PENTIUM 4
![Page 84: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/84.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Overall Architecture
73
![Page 85: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/85.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Overall Architecture
74
![Page 86: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/86.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Overall Architecture
75
![Page 87: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/87.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Overall Architecture
76
![Page 88: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/88.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Overall Architecture
77
![Page 89: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/89.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Characteristics
78
![Page 90: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/90.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Branch Misprediction Rate
79
![Page 91: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/91.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Percentage Mispredicted uops
80
![Page 92: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/92.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Data Cache Misses
81
![Page 93: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/93.jpg)
B649: Parallel Architectures and Programming, Spring 2009
CPI
82
![Page 94: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/94.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Intel Pentium 4 vs AMD Opteron: CPI
83
![Page 95: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/95.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Intel Pentium 4 vs AMD Opteron: Performance
84
Pentium 4: 3.2 GHz Opteron: 2.6 GHz
![Page 96: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/96.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Intel Pentium 4 vs IBM Power5
85
Pentium 4: 3.2 GHz Power5: 1.9 GHz
![Page 97: EXPLOITING ILP - Indiana University BloomingtonEXPLOITING ILP B649 Parallel Architectures and Pipelining B649: Parallel Architectures and Programming, Spring 2009 What is ILP? •Use](https://reader030.fdocuments.net/reader030/viewer/2022011917/5fe9d603ce052a0c845d1790/html5/thumbnails/97.jpg)
B649: Parallel Architectures and Programming, Spring 2009
Recap
86
• Many advanced techniques on modern processors★ pipelining★ dynamic scheduling★ branch prediction ★ speculation★ multiple issue★ branch target buffers★ register renaming★ ...
• Too much complexity ⇒ Multiple cores