Transactional Memory CDA6159

Page 1: Transactional Memory CDA6159

Transactional Memory

CDA6159

Page 2: Transactional Memory CDA6159

Outline

Introduction

Paper 1: Transactional Memory: Architectural Support for Lock-Free Data Structures (Herlihy and Moss, ISCA '93)

Paper 2: Transactional Memory Coherence and Consistency (Hammond et al., ISCA '04)

Page 3: Transactional Memory CDA6159

Introduction

Transaction

A sequence of actions that appears indivisible and instantaneous to an outside observer.

Four specific attributes: atomicity, consistency, isolation, and durability, collectively known as the ACID properties.

Page 4: Transactional Memory CDA6159

Introduction

Concurrency control

Locks? Poor performance, deadlock, etc.

Alternative: lock-free, optimistic concurrency control

Herlihy and Moss in 1993 proposed hardware-supported transactional memory as a mechanism for building lock-free data structures.

Page 5: Transactional Memory CDA6159

Basic Transactional Mechanisms

Isolation: detect when transactions conflict; track read and write sets.

Version management: record new and old values.

Atomicity: commit new values, or abort back to old values.
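
The three mechanisms can be sketched in software. This is a minimal illustration, assuming a toy memory held in a dict and lazy versioning (new values are buffered, old values stay in memory). The class `Txn` and its method names are hypothetical and come from neither paper.

```python
class Txn:
    """Toy transaction over a shared dict. All names are hypothetical."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()   # isolation: addresses this transaction loaded
        self.new_values = {}    # version management: tentative values (write set)

    def load(self, addr):
        self.read_set.add(addr)
        if addr in self.new_values:        # see our own tentative write
            return self.new_values[addr]
        return self.memory[addr]           # the old value stays in memory

    def store(self, addr, value):
        self.new_values[addr] = value

    def conflicts_with(self, other):
        # Conflict: one side wrote an address the other read or wrote.
        mine, theirs = set(self.new_values), set(other.new_values)
        return bool(mine & (other.read_set | theirs)) or bool(theirs & self.read_set)

    def commit(self):
        self.memory.update(self.new_values)  # atomicity: publish new values
        self.new_values.clear()

    def abort(self):
        self.new_values.clear()              # atomicity: old values survive


memory = {"x": 1, "y": 2}
t1, t2 = Txn(memory), Txn(memory)
t1.store("x", t1.load("x") + 10)   # t1 reads and tentatively writes x
seen_by_t2 = t2.load("x")          # t2 still sees the old value
conflict = t1.conflicts_with(t2)   # t1 wrote what t2 read
t1.abort()
t3 = Txn(memory)
t3.store("y", 5)
t3.commit()
print(seen_by_t2, conflict, memory)  # 1 True {'x': 1, 'y': 5}
```

Buffering new values makes abort trivial; an undo log (write in place, keep old values) would make commit trivial instead.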

Page 6: Transactional Memory CDA6159

H/W Transactional Memory Systems

Knight’s Lisp work; Transactional Memory; Oklahoma Update; SLE/TLR; Transactional Coherence and Consistency; Unbounded TM; Virtual TM; Thread-level TM

Page 7: Transactional Memory CDA6159

Outline

Introduction

Paper 1: Transactional Memory: Architectural Support for Lock-Free Data Structures (Herlihy and Moss, ISCA '93)

Paper 2: Transactional Memory Coherence and Consistency (Hammond et al., ISCA '04)

Page 8: Transactional Memory CDA6159

Locks and Their Problems

Locks are commonly used with shared data.

Priority inversion: a lower-priority process holds a lock needed by a higher-priority process.

Convoy effect: when the lock holder is interrupted, the others are forced to wait.

Deadlock: circular dependence between processes acquiring locks, so everyone just waits for locks.

Page 9: Transactional Memory CDA6159

H&M’s Transactional Memory [’93]

Intended to replace short critical sections; motivated by lock-free data structures.

Transactions: read and write multiple locations; commit in arbitrary order; implicit begin, explicit commit operations; abort affects memory, not registers.

Software manages restarting execution; the validate instruction detects a pending abort.

Implementation extends cache coherence: read/write locks correspond to MESI states; add orthogonal transaction states.

Page 10: Transactional Memory CDA6159

Transactional Hardware State

Processor state:

Transaction active flag (TACTIVE): whether a transaction is in progress; implicitly set by the first xactional op.

Transaction status flag (TSTATUS): whether the transaction is active (true) or aborted (false).

Small, fully associative xactional cache: disjoint from the L1 cache (data can be in only one or the other).

Holds tentative writes before propagation: invalidated if aborted; snooped and/or written back if committed.

Two copies of each xactional line, to avoid writebacks to memory; this lets xactional writes hold both the old and the new value.

Aborts another xaction that would cause a conflict.

Aborted by interrupts and xactional cache overflows.

Acts like a regular cache when not in a xaction.

Fast commit and abort (in a single cache cycle).

Page 11: Transactional Memory CDA6159

TM Instructions

Instructions for accessing memory:

Load-transactional (LT): reads from shared memory into a private register.

Load-transactional-exclusive (LTX): like LT, plus a hint that a write is coming up.

Store-transactional (ST): tentatively writes from a private register to shared memory; the new value is not visible to other processors until commit.

Instructions for manipulating xaction state:

Commit: tries to make the tentative writes permanent. Succeeds if no other processor has read its write set or written its read/write set; the write set then becomes visible to others. On failure, discards all updates to the write set.

Abort: discards all updates to the write set.

Validate: returns the current transaction status, indicating whether it has aborted. If the current status is false, discards all updates to the write set.

Page 12: Transactional Memory CDA6159

Transaction Example

/* keep trying */
while ( true ) {
    /* read variables */
    v1 = LT ( V1 ); …; vn = LT ( Vn );
    /* check consistency */
    if ( ! VALIDATE () ) continue;
    /* compute new values */
    compute ( v1, … , vn );
    /* write tentative values */
    ST ( v1, V1 ); …; ST ( vn, Vn );
    /* try to commit */
    if ( COMMIT () ) return result;
    else backoff ();  /* wait before retrying */
}
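
The retry loop above can be exercised in software. This is a rough sketch, assuming a single-threaded stand-in for the hardware: `ToyTM`, `remote_conflict`, and `increment` are hypothetical names, and the conflict is injected by hand rather than detected by a coherence protocol.

```python
class ToyTM:
    """Single-threaded stand-in for LT/ST/VALIDATE/COMMIT (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.tentative = {}     # tentative stores, invisible until commit
        self.tstatus = True     # True = active, False = aborted

    def LT(self, addr):
        if addr in self.tentative:
            return self.tentative[addr]
        return self.memory[addr]

    def ST(self, value, addr):  # argument order follows the slide: ST(v, V)
        self.tentative[addr] = value

    def VALIDATE(self):
        ok = self.tstatus
        if not ok:
            self.tstatus = True  # ready for the next attempt
        return ok

    def COMMIT(self):
        ok = self.tstatus
        if ok:
            self.memory.update(self.tentative)
        self.tentative.clear()
        self.tstatus = True
        return ok

    def remote_conflict(self):
        """Models another processor forcing this transaction to abort."""
        self.tstatus = False
        self.tentative.clear()


def increment(tm, addr, conflict_on_attempts=()):
    attempts = 0
    while True:                               # keep trying
        attempts += 1
        v = tm.LT(addr)                       # read variables
        if attempts in conflict_on_attempts:
            tm.remote_conflict()              # injected for the demo only
        if not tm.VALIDATE():                 # check consistency
            continue
        tm.ST(v + 1, addr)                    # write tentative value
        if tm.COMMIT():                       # try to commit
            return attempts
        # a real implementation would back off here before retrying


memory = {"counter": 0}
attempts = increment(ToyTM(memory), "counter", conflict_on_attempts={1})
print(attempts, memory["counter"])  # 2 1
```

The first attempt is aborted by the injected conflict and caught by VALIDATE; the second attempt commits, so the counter is incremented exactly once.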

Page 13: Transactional Memory CDA6159

Transactional Cache

Extend cache coherency protocols: any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost. This includes snoopy bus and directory protocols.

Additional transactional tags: EMPTY, NORMAL, XCOMMIT, XABORT.

Two entries per xaction datum: XCOMMIT and XABORT.

Allocation policy: EMPTY > NORMAL > XCOMMIT.

Bus cycles: T_READ and T_RFO (read for ownership); BUSY response.

A request can be refused by responding BUSY; when BUSY is received, the xaction is aborted.

This prevents deadlock and continual mutual aborts.
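
The allocation policy can be read as a priority search over the transactional tags. A small sketch, assuming the cache is represented only by its list of tags; `pick_victim` is a hypothetical helper, and returning `None` when only XABORT entries remain is this sketch's way of signalling overflow.

```python
EMPTY, NORMAL, XCOMMIT, XABORT = "EMPTY", "NORMAL", "XCOMMIT", "XABORT"


def pick_victim(tags):
    """Return the index of the entry to reuse, preferring EMPTY, then NORMAL,
    then XCOMMIT. XABORT entries hold tentative data and are never chosen."""
    for wanted in (EMPTY, NORMAL, XCOMMIT):
        for index, tag in enumerate(tags):
            if tag == wanted:
                return index
    return None  # nothing replaceable: treated as overflow in this sketch


print(pick_victim([XABORT, NORMAL, EMPTY, XCOMMIT]))  # 2
print(pick_victim([XABORT, XCOMMIT, NORMAL]))         # 2
print(pick_victim([XCOMMIT, XABORT]))                 # 0
print(pick_victim([XABORT]))                          # None
```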

Page 14: Transactional Memory CDA6159

Processor Operations

LT: check for an XABORT entry. If there is none, check for a NORMAL entry: switch NORMAL to XABORT and allocate XCOMMIT. If there is none, issue T_READ on the bus, then allocate XABORT and XCOMMIT.

If T_READ receives BUSY, abort: set TSTATUS to false; drop all XABORT entries; set all XCOMMIT entries to NORMAL; return arbitrary data.

LTX, ST: same as LT, except use T_RFO on a miss rather than T_READ, and set the cache line state to RESERVED. For ST, the XABORT entry is updated.
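
The LT lookup order can be traced with a toy model. This is a sketch under stated assumptions: entries are `[tag, addr, data]` lists, the bus is a callback `t_read(addr)` that returns data or the `BUSY` sentinel, dropped entries are simply removed, and the "arbitrary data" returned on abort is 0. `XCache` and its methods are hypothetical names.

```python
BUSY = object()  # sentinel: the bus request was refused


class XCache:
    """Toy transactional cache; each entry is [tag, addr, data]."""

    def __init__(self, entries=()):
        self.entries = [list(e) for e in entries]
        self.tstatus = True

    def _find(self, tag, addr):
        for entry in self.entries:
            if entry[0] == tag and entry[1] == addr:
                return entry
        return None

    def lt(self, addr, t_read):
        hit = self._find("XABORT", addr)          # 1. XABORT entry?
        if hit is not None:
            return hit[2]
        hit = self._find("NORMAL", addr)          # 2. NORMAL entry?
        if hit is not None:
            hit[0] = "XABORT"
            self.entries.append(["XCOMMIT", addr, hit[2]])
            return hit[2]
        data = t_read(addr)                       # 3. miss: T_READ on the bus
        if data is BUSY:                          # refused: abort
            self.tstatus = False
            self.entries = [
                ["NORMAL" if tag == "XCOMMIT" else tag, a, d]
                for tag, a, d in self.entries
                if tag != "XABORT"
            ]
            return 0                              # arbitrary data
        self.entries += [["XABORT", addr, data], ["XCOMMIT", addr, data]]
        return data


cache = XCache([["NORMAL", 8, 5]])
a = cache.lt(8, lambda addr: BUSY)    # NORMAL hit, no bus cycle needed
b = cache.lt(16, lambda addr: 7)      # miss, T_READ succeeds
c = cache.lt(24, lambda addr: BUSY)   # miss, T_READ refused: abort
print(a, b, cache.tstatus, cache.entries)
```

After the refused T_READ, only the old (XCOMMIT) copies survive, retagged NORMAL, which is what lets the abort complete without touching memory.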

Page 15: Transactional Memory CDA6159

Processor Operations

VALIDATE: return the TSTATUS flag; if it is false, set TSTATUS true and TACTIVE false.

ABORT: set TSTATUS true and TACTIVE false; change XABORT to EMPTY and XCOMMIT to NORMAL.

COMMIT: return TSTATUS, then set TSTATUS true and TACTIVE false; drop all XCOMMIT entries and change all XABORT to NORMAL.
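
These three operations are mostly tag rewrites plus flag updates. A minimal sketch, assuming the cache is reduced to a list of tags; `XState` and `_retag` are hypothetical names.

```python
class XState:
    """Flags plus transactional tags only; data values are left out."""

    def __init__(self, tags, tstatus=True):
        self.tags = list(tags)
        self.tstatus = tstatus
        self.tactive = True

    def _retag(self, mapping):
        self.tags = [mapping.get(tag, tag) for tag in self.tags]

    def validate(self):
        status = self.tstatus
        if not status:
            self.tstatus, self.tactive = True, False
        return status

    def abort(self):
        self.tstatus, self.tactive = True, False
        self._retag({"XABORT": "EMPTY", "XCOMMIT": "NORMAL"})

    def commit(self):
        status = self.tstatus
        self.tstatus, self.tactive = True, False
        # If the transaction was already aborted, its XABORT entries were
        # dropped when TSTATUS was cleared, so this retag changes nothing.
        self._retag({"XCOMMIT": "EMPTY", "XABORT": "NORMAL"})
        return status


committed = XState(["XCOMMIT", "XABORT", "NORMAL"])
ok = committed.commit()
aborted = XState(["XCOMMIT", "XABORT", "NORMAL"])
aborted.abort()
print(ok, committed.tags)   # True ['EMPTY', 'NORMAL', 'NORMAL']
print(aborted.tags)         # ['NORMAL', 'EMPTY', 'NORMAL']
```

Commit keeps the new copy and abort keeps the old one; in both cases no data moves, which is consistent with the single-cycle commit and abort claimed earlier.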

Page 16: Transactional Memory CDA6159

Snoopy Cache Actions

Regular cache: acts as MESI; treats T_READ like READ and T_RFO like RFO.

Transactional cache: on a non-xactional cycle, acts like a regular cache, using NORMAL entries only. On T_READ, if the entry is valid (shared), returns the value. On all other cycles, responds BUSY.

Memory: responds to READ, T_READ, RFO, and T_RFO when no cache responds, and to WRITE.
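
The transactional cache's snoop rule can be written as a small decision function. This sketch encodes only the slide's summary; the function name, the state name `"VALID"`, and the result labels (`"AS_REGULAR"`, `"IGNORE"`, `"DATA"`, `"BUSY"`) are assumptions made for illustration.

```python
def xcache_snoop(cycle, tag, state):
    """Response of the transactional cache to a snooped bus cycle that hits
    an entry with the given transactional tag and coherence state."""
    if cycle in ("READ", "RFO"):            # non-xactional cycle
        return "AS_REGULAR" if tag == "NORMAL" else "IGNORE"
    if cycle == "T_READ" and state == "VALID":
        return "DATA"                       # shared copy can be supplied
    return "BUSY"                           # requester will abort


print(xcache_snoop("READ", "NORMAL", "VALID"))      # AS_REGULAR
print(xcache_snoop("T_READ", "XABORT", "VALID"))    # DATA
print(xcache_snoop("T_RFO", "XABORT", "VALID"))     # BUSY
```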

Page 17: Transactional Memory CDA6159

Advantages and disadvantages

Single cache for both regular and xaction data: set size would determine the max xaction size; parallel commit/abort logic would be needed for a larger cache.

Xaction size is limited by the xactional cache size: overflow traps into software; the xaction data set is small.

Xactions cannot survive an interrupt.

Page 18: Transactional Memory CDA6159

Simulation

Proteus simulator; 32 processors.

Regular cache: direct mapped, 2048 8-byte lines.

Transactional cache: fully associative, 64 8-byte lines.

Single-cycle cache access; 4-cycle memory access.

Both snoopy bus and directory protocols are simulated.

2-stage network with a switch delay of 1 cycle each.
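
For scale, the simulated cache capacities work out as follows. The last figure is a derived estimate, assuming every transactionally accessed line occupies both an XCOMMIT and an XABORT entry as described on the Transactional Cache slide.

```python
LINE_BYTES = 8

regular_cache_bytes = 2048 * LINE_BYTES   # direct-mapped regular cache
xact_cache_bytes = 64 * LINE_BYTES        # fully associative transactional cache
max_xact_lines = 64 // 2                  # two entries per transactional line

print(regular_cache_bytes, xact_cache_bytes, max_xact_lines)  # 16384 512 32
```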

Page 19: Transactional Memory CDA6159

Benchmarks

Counter: n processors each increment a shared counter (2^16)/n times.

Producer/consumer buffer: n/2 processors produce and n/2 processors consume through a shared FIFO; ends when 2^16 items are consumed.

Doubly-linked list: n processors try to rotate the contents from tail to head; ends when 2^16 items are moved. Shared variables are conditional. Traditional locking methods can introduce deadlock.

Page 20: Transactional Memory CDA6159

Comparisons

Competitors: transactional memory; load-locked/store-conditional (Alpha); spin lock with backoff; software queue; hardware queue.

Page 21: Transactional Memory CDA6159

Counter Result

Page 22: Transactional Memory CDA6159

Producer/Consumer Result

Page 23: Transactional Memory CDA6159

Doubly Linked List Result

Page 24: Transactional Memory CDA6159

Conclusion

Avoids an extra lock variable and the problems of locks.

Trades deadlock for possible livelock/starvation.

Comparable performance to locking techniques when the shared data structure is small.

Relatively easy to implement.

Page 25: Transactional Memory CDA6159

Outline

Introduction

Paper 1: Transactional Memory: Architectural Support for Lock-Free Data Structures (Herlihy and Moss, ISCA '93)

Paper 2: Transactional Memory Coherence and Consistency (Hammond et al., ISCA '04)

Pages 26-43: Transactional Memory CDA6159 (no text transcribed)
Page 44: Transactional Memory CDA6159

Basic TCC Transaction Control Bits

In each local cache:

Read bits (per cache line, or per word to eliminate false sharing): set on speculative loads; snooped by a committing transaction (writes by other CPUs).

Modified bits (per cache line): set on speculative stores; indicate what to roll back if a violation is detected; different from the dirty bit.
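
The role of the two kinds of bits can be sketched as sets of line addresses. This is an illustration only, assuming per-line granularity; `TccCache` and `snoop_commit` are hypothetical names, not TCC's interface.

```python
class TccCache:
    """Tracks only the speculative read and modified bits of one local cache."""

    def __init__(self):
        self.read_bits = set()       # lines speculatively loaded
        self.modified_bits = set()   # lines speculatively stored

    def load(self, line):
        self.read_bits.add(line)

    def store(self, line):
        self.modified_bits.add(line)

    def snoop_commit(self, committed_lines):
        """Called when another CPU commits. Returns the lines to roll back,
        or an empty set if no violation was detected."""
        if not self.read_bits & set(committed_lines):
            return set()
        to_roll_back = set(self.modified_bits)
        self.read_bits.clear()
        self.modified_bits.clear()
        return to_roll_back


cache = TccCache()
cache.load(0x40)
cache.store(0x80)
no_violation = cache.snoop_commit([0x100])   # unrelated line
violation = cache.snoop_commit([0x40])       # a line we speculatively read
print(no_violation, violation)  # set() {128}
```

Tracking read bits per word instead of per line would shrink the overlap test and so avoid violations caused by false sharing.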

Page 45: Transactional Memory CDA6159

During A Transaction Commit

Need to collect all of the modified cache lines together into a commit packet.

Potential solutions: a separate write buffer, or an address buffer maintaining a list of the line tags to be committed. Size?

Broadcast all writes out as one single (large) packet to the rest of the system.
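
A write-buffer variant of the commit packet can be sketched as follows. The function names and the packet layout, a list of (line tag, data) pairs, are assumptions for illustration; the slides do not specify a format.

```python
def build_commit_packet(speculative_stores):
    """speculative_stores: iterable of (line_tag, data) in program order.
    The write buffer keeps one entry per line; a later store overwrites."""
    write_buffer = {}
    for line_tag, data in speculative_stores:
        write_buffer[line_tag] = data
    return list(write_buffer.items())


def broadcast(packet, other_read_sets):
    """Return the indices of the other caches whose speculatively read lines
    overlap the packet, i.e. the caches that must roll back."""
    lines = {line_tag for line_tag, _ in packet}
    return [i for i, reads in enumerate(other_read_sets) if reads & lines]


packet = build_commit_packet([(0x40, 1), (0x80, 2), (0x40, 3)])
violated = broadcast(packet, [{0x40}, {0x200}, set()])
print(packet, violated)  # [(64, 3), (128, 2)] [0]
```

An address buffer would store only the line tags and read the data from the cache at commit time, trading buffer size for an extra cache access per line.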

Page 46: Transactional Memory CDA6159

Other

Rollback is needed when a transaction cannot commit; checkpoints are needed prior to a transaction.

Checkpoint register state. Hardware approach: flash-copying the rename table / architectural register file. Software approach: extra instruction overhead.

Overflow issue: conflict or capacity misses require all the victim lines to be kept somewhere (e.g. a victim cache); alternatively, stall temporarily and request to commit.
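
Register checkpointing can be pictured with a software analogue. This is a sketch only: `Core` is a hypothetical class, and copying a Python list stands in for the hardware flash copy of the register file.

```python
class Core:
    """Toy processor core with a register file and one checkpoint."""

    def __init__(self, n_regs=4):
        self.regs = [0] * n_regs
        self._checkpoint = None

    def begin_transaction(self):
        self._checkpoint = list(self.regs)   # snapshot before speculation

    def rollback(self):
        self.regs = list(self._checkpoint)   # restore pre-transaction state


core = Core()
core.regs[0] = 7
core.begin_transaction()
core.regs[0] = 99        # speculative register updates
core.regs[1] = 5
core.rollback()          # the transaction could not commit
print(core.regs)  # [7, 0, 0, 0]
```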

Page 47: Transactional Memory CDA6159
Page 48: Transactional Memory CDA6159
Page 49: Transactional Memory CDA6159
Page 50: Transactional Memory CDA6159
Page 51: Transactional Memory CDA6159
Page 52: Transactional Memory CDA6159
Page 53: Transactional Memory CDA6159
Page 54: Transactional Memory CDA6159

Thanks!