RECAP - Indiana University Bloomington · RECAP B649 Parallel Architectures and Programming. B629:...

RECAP

B649 Parallel Architectures and Programming

B629: Practical Compiling for Modern Machines

Recap

• ILP
• Exploiting ILP
• Dynamic scheduling
• Thread-level Parallelism
• Memory Hierarchy
• Other topics through student presentations
• Virtual Machines

ILP: Pipelining

Pipelining: Adding Latches

Pipelining: Adding Forwarding

Pipelining: Adding Branch Delay Slots

Extending the Basic Pipeline

Exploiting ILP Through Compiler Techniques

• Loop unrolling
• Making use of branch delay slots
• Static branch prediction
• Loop fusion
• Unroll and jam
• ...
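As an illustration of the first item, here is a minimal source-level sketch of loop unrolling. The function names and the unroll factor of 4 are assumptions made for this example; a compiler would apply the same transformation to machine code, where fewer loop-control instructions and a longer straight-line body give the scheduler more room.

```python
def dot(a, b):
    """Reference loop: one multiply-add per loop-control check."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation with the loop body replicated 4 times."""
    n = len(a)
    total = 0
    i = 0
    # Main unrolled body: four multiply-adds per loop-control check.
    while i + 4 <= n:
        total += a[i] * b[i]
        total += a[i + 1] * b[i + 1]
        total += a[i + 2] * b[i + 2]
        total += a[i + 3] * b[i + 3]
        i += 4
    # Cleanup loop for the leftover 0 to 3 iterations.
    while i < n:
        total += a[i] * b[i]
        i += 1
    return total
```

The cleanup loop is what keeps the transformation correct when the trip count is not a multiple of the unroll factor.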

Dynamic Branch Prediction

[Figure: selected address bits index the Branch Prediction Buffer (BPB); the entries shown hold 1 and 0]

Two-bit Branch Predictor

General n-bit Correlating Branch Predictors

[Figure: address bits, combined with an m-bit global shift register, index the Branch Prediction Buffer (BPB); each entry is n bits]

Use Branch Target Buffers (BTBs) for caching branch targets
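The prediction side of this scheme (not the BTB) can be sketched as a small simulator, assuming an (m, n) predictor: m bits of global history select one of 2^m n-bit saturating counters in the row chosen by the branch address. The class name, the `index_bits` parameter, the 4-byte instruction alignment, and the `run` helper are all assumptions made for this example. With m = 0 the sketch reduces to a plain n-bit predictor.

```python
class CorrelatingPredictor:
    def __init__(self, m=2, n=2, index_bits=4):
        self.m, self.n, self.index_bits = m, n, index_bits
        self.history = 0                     # global shift register (m bits)
        self.max_count = (1 << n) - 1        # saturation limit of an n-bit counter
        # One n-bit counter per (BPB row, history pattern), all starting at 0.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumes 4-byte instructions: drop the byte offset, keep low-order bits.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        counter = self.table[self._row(pc)][self.history]
        return counter >= (1 << (self.n - 1))    # upper half means "taken"

    def update(self, pc, taken):
        row = self.table[self._row(pc)]
        c = row[self.history]
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def run(predictor, pc, outcomes):
    """Return how many of the given outcomes were predicted correctly."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct
```

On a strictly alternating taken / not-taken branch, a (2, 2) predictor becomes fully accurate after warm-up, while a (0, 2) predictor starting from the same state stays at about half.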

Dynamic Scheduling: Tomasulo’s Approach

Tomasulo’s Approach: Observations

• RAW hazards handled by waiting for operands
• WAR and WAW hazards handled by register renaming
  ★ only WAR and WAW hazards between instructions currently in the pipeline are handled; is this a problem?
  ★ larger number of hidden names reduces name dependences
• CDB implements forwarding
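The renaming observation can be illustrated with a small software sketch. It is not how the hardware is built: in Tomasulo's scheme the reservation-station tags play the role of the hidden names. The `rename` function, the tuple encoding `(dest, src1, src2)`, and the `t0, t1, ...` names are assumptions made for this example.

```python
from itertools import count


def rename(instructions):
    """Give every write a fresh hidden name; reads use the latest mapping."""
    fresh = count()      # source of hidden names t0, t1, ...
    mapping = {}         # architectural register -> current hidden name
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the most recent producer, so RAW dependences are kept.
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        # A new name per write removes WAR and WAW conflicts on `dest`.
        mapping[dest] = f"t{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("F0", "F2", "F4"),
    ("F6", "F0", "F8"),    # RAW on F0
    ("F8", "F10", "F14"),  # WAR on F8 with the previous instruction
    ("F6", "F10", "F8"),   # WAW on F6, RAW on F8
]
for instr in rename(program):
    print(instr)
```

After renaming, the only shared names left between instructions are the true (RAW) dependences.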

Tomasulo’s Approach + Speculation

Fields in ROB
1. Instruction type
2. Destination
3. Value
4. Ready
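A minimal sketch of a reorder buffer built from these four fields, assuming only register-writing instructions. The class and method names, and the use of a plain dict as the register file, are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    instr_type: str               # e.g. "alu" or "load" (labels are assumptions)
    destination: str              # register that receives the result
    value: Optional[int] = None   # result, once it has been computed
    ready: bool = False           # True when `value` is valid


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()    # program order: head is the oldest instruction
        self.registers = {}       # architectural state, updated only at commit

    def issue(self, instr_type, destination):
        entry = ROBEntry(instr_type, destination)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        # Execution may finish out of order; the result waits in the ROB.
        entry.value = value
        entry.ready = True

    def commit(self):
        """Retire ready entries from the head only, preserving program order."""
        committed = []
        while self.entries and self.entries[0].ready:
            e = self.entries.popleft()
            self.registers[e.destination] = e.value
            committed.append(e.destination)
        return committed
```

Because only the head can commit, a younger instruction that finishes early cannot change architectural state before an older one does.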

Observations on Speculation

• Speculation enables precise exception handling
  ★ defer exception handling until instruction ready to commit
• Branches are critical to performance
  ★ prediction accuracy
  ★ latency of misprediction detection
  ★ misprediction recovery time
• Must avoid hazards through memory
  ★ WAR and WAW already taken care of (how?)
  ★ for RAW
    ✴ don’t allow load to proceed if an active ROB entry has Destination field matching with A field of load
    ✴ maintain program order for effective address computation (why?)
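The first RAW rule above can be written as a one-function check. This is a sketch under assumptions: active ROB entries are modeled as `(instr_type, destination)` tuples, the load's A field is passed as `load_address`, and the function name is made up for this example.

```python
def load_may_proceed(load_address, active_rob_entries):
    """Return False if an active (uncommitted) store targets the load's address."""
    return all(
        not (instr_type == "store" and destination == load_address)
        for instr_type, destination in active_rob_entries
    )
```

For example, `load_may_proceed(0x100, [("store", 0x100)])` is false, so the load waits for that store.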

Multiple Issue Processor Types

Dynamic Scheduling + Multiple Issue + Speculation

• Design parameters
  ★ two-way issue (two instruction issues per cycle)
  ★ pipelined and separate integer and FP functional units
  ★ dynamic scheduling, but not out-of-order issue
  ★ speculative execution
• Task per issue: assign reservation station and update pipeline control tables (i.e., control signals)
• Two possible techniques
  ★ do the task in half a clock cycle
  ★ build wider logic to issue any pair of instructions together
• Modern processors use both (4 or more way superscalar)

Shared-Memory Multiprocessors

Distributed-Memory Multiprocessors

Other Ways to Categorize Parallel Programming

Parallel Programs

• data vs task parallel
• client-server vs peer-to-peer / master-slave vs symmetric
• shared memory vs message passing
• tightly vs loosely coupled
• threads vs producer-consumer
• coarse vs fine grained
• SPMD vs MPMD
• recursive vs iterative
• synchronous vs asynchronous

Write Invalidate Cache Coherence Protocol for Write-Back Caches
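The slide's state diagram did not survive in this transcript. As a rough stand-in, the sketch below tracks one memory block across several snooping write-back caches, assuming the common three-state scheme with Invalid (I), Shared (S), and Modified (M) states. The class name, state letters, and write-back counter are assumptions made for this example, not a reconstruction of the original figure.

```python
class CoherentBlock:
    """State of a single block in each cache under write invalidate."""

    def __init__(self, n_caches):
        self.states = ["I"] * n_caches
        self.writebacks = 0        # dirty copies written back to memory

    def read(self, cpu):
        if self.states[cpu] == "I":            # read miss goes on the bus
            for other, state in enumerate(self.states):
                if other != cpu and state == "M":
                    self.writebacks += 1       # owner supplies the dirty block
                    self.states[other] = "S"
            self.states[cpu] = "S"
        # A read hit in S or M needs no bus traffic.

    def write(self, cpu):
        if self.states[cpu] != "M":            # write miss or upgrade from S
            for other, state in enumerate(self.states):
                if other != cpu:
                    if state == "M":
                        self.writebacks += 1   # previous owner writes back
                    self.states[other] = "I"   # invalidate every other copy
            self.states[cpu] = "M"
        # A write hit in M stays local: the cache already holds the only copy.
```

A short trace: after `read(0)`, `read(1)`, `write(1)`, `read(2)` on three caches, the states are I, S, S and one write-back has occurred.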

Distributed Memory + Directories

Directory-Based Cache Coherence

Other Topics

• x86 assembly programming
• VLIW / EPIC
• Vector processors
• Embedded systems
• Scientific applications
• GPUs and GPGPUs
• CUDA and OpenCL
• Interconnection networks
• Virtualization

WHAT’S NEXT?

Future

• Continued importance of parallel programming
  ★ challenge: how to program multiprocessors
  ★ role of programming languages and compilers
• Convergence or specialization?
  ★ “standardization” of general purpose architecture
  ★ migration of “special-purpose” CPUs for general use

Landscape of Parallel Computing Research: A View from Berkeley
