RECAP - Indiana University Bloomington · RECAP B649 Parallel Architectures and Programming. B629:...
Transcript of RECAP - Indiana University Bloomington · RECAP B649 Parallel Architectures and Programming. B629:...
RECAPB649
Parallel Architectures and Programming
B629: Practical Compiling for Modern Machines
RECAP
2
B629: Practical Compiling for Modern Machines
Recap
• ILP• Exploiting ILP• Dynamic scheduling• Thread-level Parallelism• Memory Hierarchy• Other topics through student presentations• Virtual Machines
3
B629: Practical Compiling for Modern Machines
ILP: Pipelining
4
B629: Practical Compiling for Modern Machines
Pipelining: Adding Latches
5
B629: Practical Compiling for Modern Machines
Pipelining: Adding Forwarding
6
B629: Practical Compiling for Modern Machines
Pipelining: Adding Branch Delay Slots
7
B629: Practical Compiling for Modern Machines
Extending the Basic Pipeline
8
B629: Practical Compiling for Modern Machines
Extending the Basic Pipeline
8
B629: Practical Compiling for Modern Machines
Extending the Basic Pipeline
8
B629: Practical Compiling for Modern Machines
Exploiting ILP Through Compiler Techniques
9
• Loop unrolling• Making use of branch delayed slots• Static branch prediction• Loop fusion• Unroll and jam• ...
B629: Practical Compiling for Modern Machines
Dynamic Branch Prediction
10
bits to index BPB
Address bits
BranchPredictionBuffer1
0
10
B629: Practical Compiling for Modern Machines
Two-bit Branch Predictor
11
B629: Practical Compiling for Modern Machines
General n-bit Correlating Branch Predictors
12
bits to index BPB
Address bits
BranchPredictionBuffer
global shift register (m bits)
n bits
Use Branch Target Buffers (BTBs) for caching branch targets
B629: Practical Compiling for Modern Machines
Dynamic Scheduling: Tomasulo’s Approach
13
B629: Practical Compiling for Modern Machines
Tomasulo’s Approach: Observations
• RAW hazards handled by waiting for operands• WAR and WAW hazards handled by register
renaming★ only WAR and WAW hazards between instructions
currently in the pipeline are handled; is this a problem?★ larger number of hidden names reduces name dependences
• CDB implements forwarding
14
B629: Practical Compiling for Modern Machines
Tomasulo’s Approach + Speculation
15
Fields in ROB1. Instruction type2. Destination3. Value4. Ready
B629: Practical Compiling for Modern Machines
Observations on Speculation
16
• Speculation enables precise exception handling★ defer exception handling until instruction ready to commit
• Branches are critical to performance★ prediction accuracy★ latency of misprediction detection★ misprediction recovery time
•Must avoid hazards through memory ★WAR and WAW already taken care of (how?)★ for RAW
✴ don’t allow load to proceed if an active ROB entry has Destination field matching with A field of load
✴ maintain program order for effective address computation (why?)
B629: Practical Compiling for Modern Machines
Multiple Issue Processor Types
17
B629: Practical Compiling for Modern Machines
Dyn. Scheduling+Multiple Issue+Speculation
18
• Design parameters★ two-way issue (two instruction issues per cycle)★ pipelined and separate integer and FP functional units★ dynamic scheduling, but not out-of-order issue★ speculative execution
• Task per issue: assign reservation station and update pipeline control tables (i.e., control signals)
• Two possible techniques★ do the task in half a clock cycle★ build wider logic to issue any pair of instructions together
• Modern processors use both (4 or more way superscalar)
B629: Practical Compiling for Modern Machines
Shared-Memory Multiprocessors
19
B629: Practical Compiling for Modern Machines
Distributed-Memory Multiprocessors
20
B629: Practical Compiling for Modern Machines
Other Ways to Categorize Parallel Programming
21
data vs task parallel
client-sever vs p-to-p /
master-slave vs symm.
shared m
em vs m
sg
passing
tight
ly v
s lo
osel
y co
uple
dthreads vs
producer-consumer
course vs fine grained
SPM
D vs
MPM
Drecursive vs iterative
synch. vs asynch.
Parallel Programs
B629: Practical Compiling for Modern Machines
Write Invalidate Cache Coherence Protocolfor Write-Back Caches
22
B629: Practical Compiling for Modern Machines
Distributed Memory+Directories
23
B629: Practical Compiling for Modern Machines
Directory-Based Cache Coherence
24
B629: Practical Compiling for Modern Machines
Other Topics
• x86 assembly programming• VLIW / EPIC• Vector processors• Embedded systems• Scientific applications• GPUs and GPGPUs• CUDA and OpenCL• Interconnection networks• Virtualization
25
B629: Practical Compiling for Modern Machines
WHAT’S NEXT?
26
B629: Practical Compiling for Modern Machines
Future
• Continued importance of parallel programming★ challenge: how to program multiprocessors★ role of programming languages and compilers
• Convergence or specialization?★ “standardization” of general purpose architecture★ migration of “special-purpose” CPUs for general use
27
B629: Practical Compiling for Modern Machines
Landscape of Parallel Computing Research:A View from Berkeley
28