
CS 7810 Lecture 18

The Potential for Using Thread-Level DataSpeculation to Facilitate Automatic Parallelization

J.G. Steffan and T.C. Mowry
Proceedings of HPCA-4

February 1998

Multi-Threading

• CMPs advocate low-complexity, static approaches to parallelism extraction

• Resolving memory dependences for integer codes is not easy!

Large window: 100 in-flight instrs

Compiler-generated threads: 4 windows of 25 instrs each

Probable Conflicts

[Figure: two ambiguous pointers, p and q, whose accesses may conflict across threads]
Example: Compress

Example Execution


Compiler Optimizations

• Induction variables: in_count

• Reduction: out_count

• Parallel I/O: getchar() and putchar()

• Scalar forwarding: free_entries

• Ambiguous loads and stores: hash[…]
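The optimizations above can be illustrated with a hedged sketch: a compress-like loop split into epochs, where the induction variable (in_count) is recomputed from the epoch base rather than carried across epochs, and the reduction (out_count) is accumulated in per-epoch partials. The variable names follow the slides; the loop body is a stand-in, not the real compress code.

```python
# Sketch (assumed, illustrative) of the induction-variable and reduction
# rewrites that remove cross-epoch dependences in a compress-like loop.

def run_epochs(data, epoch_size):
    partials = []
    # Each epoch keeps a private partial count, so out_count no longer
    # serializes epochs; partials are combined after the parallel region.
    for e in range(0, len(data), epoch_size):
        epoch = data[e:e + epoch_size]
        local_out = 0
        for i, byte in enumerate(epoch):
            # Induction variable: derived from the epoch base instead of
            # being carried from the previous iteration/epoch.
            in_count = e + i
            if byte % 2 == 0:          # stand-in for "emitted a code"
                local_out += 1
        partials.append(local_out)
    return sum(partials)               # reduction combine step
```

With the carried dependences gone, each epoch's body can run speculatively in parallel; only ambiguous loads/stores (e.g. hash[...]) still need hardware checking.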

Methodology

• Threads (epochs) were constructed by hand

• The processors are in-order and instructions have unit latency

Ambiguous Loads and Stores

Average Run Lengths

Forwarding Registers and Scalars

Average Run Lengths

Realistic Models

• 10-cycle forwarding latency

• Sharing at cache line granularity

• Recovery from misspeculation

• Results are not sensitive to forwarding latency or cache line size

Hardware Support

• Cache coherence protocol for the L1 caches

• For each cache line, keep track of whether the line has been read/modified

• When the oldest thread writes to a cache line, an invalidate is sent to the other caches

• A younger thread sets a violation flag if it has speculatively loaded the line -- s/w recovery is initiated when that thread commits

• Cache line evictions also cause violations (not common)
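The detection mechanism above can be sketched as a minimal simulation, assuming (as an illustration, not the paper's exact protocol) one speculative-load set and one violation flag per L1 cache:

```python
# Minimal sketch of speculative violation detection: each L1 tracks which
# lines its epoch has speculatively loaded; a store by the oldest epoch
# invalidates the line in younger caches and flags a violation wherever
# the line was already read speculatively. Names are illustrative.

class SpecL1Cache:
    def __init__(self):
        self.spec_loaded = set()   # lines this epoch has speculatively read
        self.violation = False     # set -> s/w recovery runs at commit

    def load(self, line):
        self.spec_loaded.add(line)

def oldest_store(line, younger_caches):
    # Invalidate the line in all younger caches; any younger epoch that
    # speculatively loaded it has consumed a stale value and must flag
    # a violation so software recovery can re-execute it.
    for cache in younger_caches:
        if line in cache.spec_loaded:
            cache.violation = True
        cache.spec_loaded.discard(line)

young = SpecL1Cache()
young.load(0x40)               # younger epoch speculatively reads line 0x40
oldest_store(0x40, [young])    # oldest epoch then writes it -> violation
```

A store to a line no younger epoch has read simply invalidates it, with no violation; this is why infrequent true conflicts let the compiler parallelize aggressively.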

Role of the Compiler

• Profiling to identify epochs large enough to offset thread management and communication costs, yet small enough to keep speculative state small

• Estimate probability of violation (static/dynamic)

• Optimizations (induction, reduction, parallel I/O)

• Scalar forwarding and rescheduling

• Insertion of register recovery code
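The profiling criteria above suggest a simple selection heuristic; the sketch below is a hypothetical cost model with made-up thresholds, not one taken from the paper:

```python
# Hypothetical epoch-selection heuristic: an epoch is worth speculating on
# if it amortizes thread overhead, fits in bounded speculative state, and
# rarely violates. All constants are illustrative assumptions.

THREAD_OVERHEAD_CYCLES = 10   # assumed thread management + communication cost
MAX_SPEC_LINES = 64           # assumed cap on buffered speculative cache lines

def good_epoch(avg_cycles, spec_lines, violation_prob):
    big_enough = avg_cycles > 5 * THREAD_OVERHEAD_CYCLES  # amortize overhead
    small_enough = spec_lines <= MAX_SPEC_LINES           # bounded spec state
    likely_safe = violation_prob < 0.1                    # seldom mis-speculates
    return big_enough and small_enough and likely_safe
```

The violation probability could come from static analysis or from dynamic profiling, matching the static/dynamic estimate bullet above.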

Conclusions

• Hardware catches violations; compiler can parallelize aggressively

• Competitive implementation: large window with store sets prediction
