Transactional Memory An Overview of Hardware Alternatives

David A. Wood, University of Wisconsin. Transactional Memory Workshop, April 8th, 2005.


Page 1: Transactional Memory An Overview of Hardware Alternatives

Transactional Memory: An Overview of Hardware Alternatives

David A. Wood
University of Wisconsin

Transactional Memory Workshop
April 8th, 2005

Page 2: Transactional Memory An Overview of Hardware Alternatives

October 21, 2004 Thread-Level Transactional Memory 2

What’s database got to do with it?
- Atomicity
  - All updates, or none
- Consistency
  - Correct at begin and end
- Isolation
  - Partial work not visible
  - Inputs stay stable
- Durability
  - Survive “system” failures

Despite increasing awareness of failures

All (or some) memory ops, not just database objects

Page 3: Transactional Memory An Overview of Hardware Alternatives


801 Database Storage
- Lock bits on virtual memory
  - 128-byte granularity
  - Added to pagetable and TLB
  - Caches user’s lock state
  - Trap on lock conflict
- No h/w for logging, abort, etc.
- Only uniprocessors
  - 801 and RS/6000

[Figure: CPU, TLB, and Memory; labels: PPN, Tid, Lock bits, Tid]

Was this transactional memory?

Page 4: Transactional Memory An Overview of Hardware Alternatives


SQL/801

“The development of SQL/801 was greatly simplified because, with minor exceptions, it considers only a single user. It achieves multiuser concurrency [on a uniprocessor] by running in multiple processes using the shared database storage….” Chang and Mergen, ’88

- Largest transactional memory application
- Only real hardware transactional memory implementation
- No one seems to be looking at what they learned

Page 5: Transactional Memory An Overview of Hardware Alternatives


Basic Transactional Mechanisms
- Isolation
  - Detect when transactions conflict
  - Track read and write sets
- Version management
  - Record new and old values
- Atomicity
  - Commit new values
  - Abort back to old values
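The three mechanisms on this slide can be sketched together in software. This is a minimal toy model, not any of the hardware designs surveyed in the talk: the `Transaction` class, its method names, and the dictionary standing in for memory are all hypothetical, and buffering new values until commit is only one possible version-management choice.

```python
class Transaction:
    """Toy transaction: tracks a read set and buffers new values."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: address -> committed (old) value
        self.read_set = set()     # addresses read, used for isolation
        self.new_values = {}      # address -> new value (also the write set)

    def read(self, addr):
        self.read_set.add(addr)
        # A transaction sees its own uncommitted writes first.
        return self.new_values.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.new_values[addr] = value

    def conflicts_with(self, other):
        """True if one transaction writes what the other reads or writes."""
        mine, theirs = set(self.new_values), set(other.new_values)
        return bool(mine & (other.read_set | theirs) or theirs & self.read_set)

    def commit(self):
        self.memory.update(self.new_values)   # all updates become visible
        self._reset()

    def abort(self):
        self._reset()                         # memory still holds the old values

    def _reset(self):
        self.read_set.clear()
        self.new_values.clear()


memory = {"x": 1, "y": 2}
t1, t2 = Transaction(memory), Transaction(memory)
t1.write("x", t1.read("x") + 10)
t2.read("x")
conflict = t1.conflicts_with(t2)   # t1 writes x while t2 reads x
t2.abort()
t1.commit()
```

In this sketch the conflict is resolved by aborting the reader; which side aborts, and when the check runs, is exactly where the systems on the following slides differ.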

Page 6: Transactional Memory An Overview of Hardware Alternatives


H/W Transactional Memory Systems
- Knight’s Lisp Work
- Transactional Memory
- Oklahoma Update
- SLE/TLR
- Transactional Coherence and Consistency
- Unbounded TM
- Virtual TM
- Thread-level TM

Page 7: Transactional Memory An Overview of Hardware Alternatives


Knight’s Lisp Work [’86]
- Parallel execution of sequential code
- Break program into “transaction blocks”
  - Multiple loads in a transaction
  - Exactly one store ends the transaction
  - No register state passed between transactions
- Execute transactions in parallel
  - Track dependences (i.e., read set)
  - Abort and restart on conflicting write
- Transactions commit in sequential order
  - Broadcast writes on commit

Page 8: Transactional Memory An Overview of Hardware Alternatives


Knight’s Hardware
- Two caches
  - Dependency cache
    - Tracks read set
    - Bus monitor detects conflicts
  - Confirm cache
    - Holds write set
    - Supports multiple writes
- Commits
  - Check dep. cache
  - Broadcast writes
- Fast aborts
  - Invalidate Confirm cache
  - Use old values in Dep. Cache
  - Immediately restart execution

[Figure: CPU, Confirm Cache, Dependency Cache, and Memory; entry labels: Valid, Old Value, Depends, Old Value, Invalid, A, New Value]

Spawned two threads: TLS & TM

Page 9: Transactional Memory An Overview of Hardware Alternatives


H&M’s Transactional Memory [’93]
- Targets explicitly parallel (non-functional) codes
  - Motivated by lock-free data structures
- Transactions:
  - Read and write multiple locations
  - Commit in arbitrary order
  - Implicit begin, explicit commit operations
  - Abort affects memory, not registers
- Software manages restarting execution
  - Validate instruction detects pending abort
- Implementation extends cache coherence
  - Read/Write locks correspond to MOESI states
  - Add orthogonal transaction states

Page 10: Transactional Memory An Overview of Hardware Alternatives


H&M’s Transactional Memory
- Adds Transaction Cache
  - Stores all data accessed by transactions
- 2 copies of each line
  - Before and after image
  - Even for read-only data
- Small, fully associative
- Abort on all conflicts
  - NACK conflicting requests
  - Abort NACKed transaction
- Fast commit and abort
  - Change trans. cache state

[Figure: CPU, Cache, Transaction Cache, and Memory. Labels as extracted: M; S; S; M XCommit; New Value; M XAbort; Old Value; S XCommit; Old Value; S XAbort; Old Value]

Page 11: Transactional Memory An Overview of Hardware Alternatives


SLE/TLR
- Hardware exploits speculative processors
  - Read sets tracked by coherence protocol
  - Write set maintained in store queue
  - Abort restarts execution, including register state
- Speculative Lock Elision (SLE)
  - Elide locks from the dynamic execution stream
  - Convert critical sections to optimistic transactions
  - Concurrently execute non-conflicting transactions
  - Fall back on explicit locks if conflicts
- Transactional Lock Removal (TLR)
  - Resolve conflicts using priority ordering (timestamps)
  - Delay lower priority transactions
  - Deadlock and starvation free
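The TLR bullets (timestamp priority, delaying the lower-priority transaction) can be illustrated with a small simulation. This is a sketch under stated assumptions: `Txn`, `resolve`, `run_conflicting`, and the global counter used as a clock are hypothetical names, and the model assumes every transaction conflicts on the same data and keeps its timestamp while it waits.

```python
from dataclasses import dataclass, field
from itertools import count

_clock = count()  # hypothetical global timestamp source


@dataclass
class Txn:
    name: str
    # The timestamp is taken once and kept across delays, so a waiting
    # transaction eventually becomes the oldest one (no starvation).
    timestamp: int = field(default_factory=lambda: next(_clock))


def resolve(a, b):
    """Return (winner, delayed) for two conflicting transactions."""
    return (a, b) if a.timestamp < b.timestamp else (b, a)


def run_conflicting(txns):
    """Commit order when all transactions conflict on the same data."""
    pending, order = list(txns), []
    while pending:
        winner = pending[0]
        for other in pending[1:]:
            winner, _ = resolve(winner, other)
        order.append(winner.name)   # winner commits; the rest stay delayed
        pending.remove(winner)
    return order


t_a, t_b, t_c = Txn("A"), Txn("B"), Txn("C")
commit_order = run_conflicting([t_c, t_a, t_b])
```

Because priority is a total order on timestamps, two transactions can never each wait on the other, which is the intuition behind the deadlock-free claim on the slide.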

Page 12: Transactional Memory An Overview of Hardware Alternatives


Transactional Coherence and Consistency [’04]
- TCC unifies coherence, memory consistency, and transaction support
  - All transactions, all the time
- Transaction ordering
  - Ordered, Unordered, Partially Ordered
  - Supports thread-level speculation
- Optimistic concurrency model
  - Unordered transactions serialize at commit
  - Conflicts detected at commit
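Commit-time conflict detection, as described in the last bullet group, can be sketched as follows. This is a toy model and not the TCC hardware: `TccProcessor`, `snoop_commit`, and `commit` are hypothetical names, a dictionary stands in for shared memory, and a violated processor simply discards its state so that software can re-execute.

```python
class TccProcessor:
    """Toy processor: read set plus a write buffer held until commit."""

    def __init__(self, name):
        self.name = name
        self.read_set = set()
        self.write_buffer = {}    # address -> new value, invisible to others
        self.violations = 0

    def load(self, memory, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return memory[addr]

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def snoop_commit(self, addrs):
        """Another processor committed writes to addrs."""
        if self.read_set & addrs:          # we read data that just changed
            self.violations += 1
            self.read_set.clear()
            self.write_buffer.clear()


def commit(proc, memory, all_procs):
    """Serialize at commit: publish the writes, then broadcast the addresses."""
    addrs = set(proc.write_buffer)
    memory.update(proc.write_buffer)
    for other in all_procs:
        if other is not proc:
            other.snoop_commit(addrs)
    proc.read_set.clear()
    proc.write_buffer.clear()


memory = {"a": 0}
p0, p1 = TccProcessor("p0"), TccProcessor("p1")
for p in (p0, p1):                         # both increment "a" optimistically
    p.store("a", p.load(memory, "a") + 1)
commit(p0, memory, [p0, p1])               # p1 is violated here
p1.store("a", p1.load(memory, "a") + 1)    # p1 re-executes
commit(p1, memory, [p0, p1])
```

Nothing is checked while the transactions run; the loser only learns of the conflict when the winner's commit is broadcast.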

Page 13: Transactional Memory An Overview of Hardware Alternatives


TCC

[Figure: CPU, L1 D cache, write buffer, shadow register file (SRF), on-chip interconnect, L2 cache]

- L2 cache: logically shared
- Write buffer: ~4 kB, holds new values until commit
- On-chip interconnect: broadcast updates at commit
- L1 cache tracks read set, bit per line
- Shadow register file (SRF) checkpoints architectural registers

Page 14: Transactional Memory An Overview of Hardware Alternatives


TCC
- Commits are sequential
  - Broadcasts addresses of all updates
- Supports large transactions
  - Serialize all other transactions
  - Grabs and holds the commit bus
- Cannot abort large transactions
  - Updates affect L2/Mem; no undo
- Extensions forthcoming
  - Talk to Kunle and Christos

Page 15: Transactional Memory An Overview of Hardware Alternatives


Unbounded Transactional Memory (UTM)
- Unbounded transactions
  - Arbitrary size
    - Not limited by write buffer, cache, or memory
  - Arbitrary duration
    - Not limited by interrupts, context switch, etc.
- Complex implementation
  - Not justified by performance
- Settle for “nearly” unbounded transactions
  - Much simpler hardware

Page 16: Transactional Memory An Overview of Hardware Alternatives


Transactional Linux
- Almost all of the transactions require < 100 cache lines
  - 99.9% need fewer than 54 cache lines
- There are, however, some very large transactions!
  - >500k-byte fully-associative cache required

[Figure: number of overflowing transactions (y axis, 1 to 9.355x10^6) versus fully associative cache size in 64-byte lines (x axis, 1 to 8144), log-log scale; series: make, dbench]
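The cache sizes on this slide follow from the 64-byte line size. The short calculation below is only a worked check; it assumes that 8144, the largest value on the plot's cache-size axis, is the line count behind the ">500k-byte" figure, which the slide does not state outright. The function name `cache_bytes` is hypothetical.

```python
LINE_BYTES = 64  # line size given on the plot's x axis


def cache_bytes(lines, line_bytes=LINE_BYTES):
    """Capacity in bytes of a fully associative cache holding `lines` lines."""
    return lines * line_bytes


common_case = cache_bytes(54)      # covers 99.9% of transactions
worst_case = cache_bytes(8144)     # assumed largest point on the plot
print(common_case, worst_case)
```

Under that assumption, 54 lines is about 3.4 KB while 8144 lines is 521,216 bytes, consistent with the ">500k-byte" claim.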

Page 17: Transactional Memory An Overview of Hardware Alternatives


Large Transactional Memory (LTM)
- Register checkpoints
  - Snapshot of rename maps
- Cache tracks read and write sets
  - T-bits mark transactional blocks
  - Cache holds new data values “in place”
  - O-bit indicates overflow to in-memory hashtable
- Memory holds committed state
  - Abort invalidates all modified blocks
    - Miss on re-execution
  - Transactional writes force memory updates
    - Repeated writes (e.g., to local data) are written through
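The overflow path (new values held in the cache, an O-bit per set, and an in-memory hash table for evicted transactional lines) can be sketched in software. This is a toy model under several assumptions: `LtmCache` and its methods are hypothetical names, the set index is simply the address modulo the number of sets, every cached line is transactional (so the T-bit is implicit), and read-set tracking is omitted.

```python
class LtmCache:
    """Toy set-associative cache with per-set O-bits and an overflow table."""

    def __init__(self, memory, num_sets=2, ways=1):
        self.memory = memory                            # committed state
        self.ways = ways
        self.sets = [dict() for _ in range(num_sets)]   # addr -> new value
        self.o_bits = [False] * num_sets
        self.overflow = {}                              # in-memory hash table

    def _index(self, addr):
        return addr % len(self.sets)

    def tx_write(self, addr, value):
        idx = self._index(addr)
        lines = self.sets[idx]
        if addr in self.overflow:                 # line was already spilled
            self.overflow[addr] = value
            return
        if addr not in lines and len(lines) == self.ways:
            victim = next(iter(lines))            # evict the oldest line
            self.overflow[victim] = lines.pop(victim)
            self.o_bits[idx] = True               # remember the set overflowed
        lines[addr] = value

    def tx_read(self, addr):
        idx = self._index(addr)
        if addr in self.sets[idx]:
            return self.sets[idx][addr]
        if self.o_bits[idx] and addr in self.overflow:
            return self.overflow[addr]            # slow path: table lookup
        return self.memory[addr]

    def _clear(self):
        for lines in self.sets:
            lines.clear()
        self.o_bits = [False] * len(self.sets)
        self.overflow.clear()

    def commit(self):
        for lines in self.sets:
            self.memory.update(lines)
        self.memory.update(self.overflow)
        self._clear()

    def abort(self):
        self._clear()                             # memory never saw new values


memory = {0: 0, 2: 0, 4: 0}
cache = LtmCache(memory)
cache.tx_write(0, 10)
cache.tx_write(2, 20)            # same set as address 0, forces an overflow
overflowed = cache.o_bits[0]
seen = cache.tx_read(0)          # served from the overflow table
cache.abort()
after_abort = dict(memory)
cache.tx_write(0, 10)
cache.tx_write(2, 20)
cache.commit()
```

Abort is cheap in this model because memory holds only committed state; the cost moves to re-execution, which misses on every discarded block.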

Page 18: Transactional Memory An Overview of Hardware Alternatives


Virtual Transactional Memory (VTM)
- Only an overflow mechanism
  - No overhead on common in-cache case
  - Check shared overflow counter on cache miss
- Low overhead when no conflict
  - Shared Bloom filter rules out conflicts
  - Filter resides in virtual memory
- Higher overhead on possible conflict
  - Hardware table walk to detect actual conflict
  - Table resides in virtual memory
  - Only incurred by large transactions with likely conflict
- Supports context switches and paging
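The three-level check on a cache miss (overflow counter, then Bloom filter, then table walk) can be sketched as below. This is an illustrative model only: `BloomFilter`, `OverflowState`, `record_overflow`, and `check_on_miss` are hypothetical names, the filter parameters are arbitrary, and SHA-256 from Python's `hashlib` stands in for whatever hash functions real hardware would use.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, addr):
        for pos in self._positions(addr):
            self.array |= 1 << pos

    def might_contain(self, addr):
        return all((self.array >> pos) & 1 for pos in self._positions(addr))


class OverflowState:
    """Shared state consulted only when an access misses in the cache."""

    def __init__(self):
        self.overflow_count = 0
        self.filter = BloomFilter()
        self.table = {}                      # address -> owning transaction id

    def record_overflow(self, addr, txn_id):
        self.overflow_count += 1
        self.filter.add(addr)
        self.table[addr] = txn_id

    def check_on_miss(self, addr, txn_id):
        """Return (conflict, deepest level consulted)."""
        if self.overflow_count == 0:
            return False, "counter"          # common case: nothing overflowed
        if not self.filter.might_contain(addr):
            return False, "filter"           # filter rules the conflict out
        owner = self.table.get(addr)         # possible conflict: walk the table
        return (owner is not None and owner != txn_id), "table"


state = OverflowState()
before = state.check_on_miss(0x1000, txn_id=2)
state.record_overflow(0x1000, txn_id=1)
hit = state.check_on_miss(0x1000, txn_id=2)
other = state.check_on_miss(0x2000, txn_id=2)
```

The filter can report a false positive, so an unrelated address may still reach the table walk; it can never hide a real conflict, which is why it is safe to use as the fast path.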

Page 19: Transactional Memory An Overview of Hardware Alternatives


801 revisited
- Why didn’t 801 database storage succeed?
  - Lock bits helped performance and simplified software
- Answer #1:
  - Changing lock bits requires TLB shootdown
  - Too complicated for the benefits?
  - Not a current problem: transaction h/w is easy
- Answer #2: Not universally available
  - DB2 was (is) multiplatform
  - Can’t rely on feature only available in one architecture
  - Still a relevant concern

Page 20: Transactional Memory An Overview of Hardware Alternatives


Need Standard Transaction Interface
- Abstract away resource requirements
  - Support large, long transactions
- Virtualize transactional memory
  - Transaction semantics between threads
  - NOT a hardware property
- Permit range of implementations
  - Hardware, software, and combinations

Page 21: Transactional Memory An Overview of Hardware Alternatives


Thread-level Transactional Memory
- Abstract mechanisms
- Version management
  - Update memory “in place”
  - Log “before images” to thread level VM
- Isolation
  - Logically extend memory words with read and write bits
  - Implementations can be conservative (e.g., blocks)
- Atomicity
  - Commits easy due to in place updates
  - Aborts trap to user-level software
- Hardware can accelerate common case
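The version-management choice on this slide (update memory in place, log before images) can be sketched as an undo log. This is a minimal software model under stated assumptions: `LogTransaction` and its fields are hypothetical names, a Python list stands in for the per-thread log in virtual memory, and the read and write bits are kept as sets per word rather than per block.

```python
class LogTransaction:
    """Toy transaction with in-place updates and a before-image log."""

    def __init__(self, memory):
        self.memory = memory       # shared dict: address -> value
        self.undo_log = []         # (address, before image), oldest first
        self.read_bits = set()     # words read by this transaction
        self.write_bits = set()    # words written by this transaction

    def read(self, addr):
        self.read_bits.add(addr)
        return self.memory[addr]

    def write(self, addr, value):
        if addr not in self.write_bits:
            # Log the old value only on the first write to each word.
            self.undo_log.append((addr, self.memory[addr]))
            self.write_bits.add(addr)
        self.memory[addr] = value  # update in place

    def commit(self):
        # New values are already in memory: just discard the log and bits.
        self._reset()

    def abort(self):
        # Software handler: restore before images, newest first.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self._reset()

    def _reset(self):
        self.undo_log.clear()
        self.read_bits.clear()
        self.write_bits.clear()


memory = {0: 5, 1: 7}
tx = LogTransaction(memory)
tx.write(0, 6)
tx.write(0, 9)
tx.write(1, 8)
log_length = len(tx.undo_log)   # one entry per word, not per write
tx.abort()
after_abort = dict(memory)
tx.write(1, tx.read(1) + 1)
tx.commit()
```

The asymmetry matches the slide: commit does almost no work, while abort walks the log, which is acceptable if aborts are the uncommon case.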

Page 22: Transactional Memory An Overview of Hardware Alternatives


Conclusions
- Make the common case fast
  - 99+% of transactions fit in hardware
  - Lots of alternatives
  - Make both commits and aborts fast
- Handle the uncommon case
  - Large transactions will occur, deal with ’em
  - Shouldn’t be limited by hardware
- Agree on a common abstraction
  - Success requires multi-platform support
  - Let vendors compete on price-performance