Transactional Memory: An Overview of Hardware Alternatives
David A. Wood, University of Wisconsin
Transactional Memory Workshop, April 8th, 2005
October 21, 2004 Thread-Level Transactional Memory 2
What’s database got to do with it?
  Atomicity
    All updates, or none
    All (or some) memory ops, not just database objects
  Consistency
    Correct at begin and end
  Isolation
    Partial work not visible
    Inputs stay stable
  Durability
    Survive “system” failures
    Despite increasing awareness of failures
801 Database Storage
  Lock bits on virtual memory
    128 byte granularity
    Added to pagetable and TLB
    Caches user’s lock state
    Trap on lock conflict
  No h/w for logging, abort, etc.
  Only uniprocessors
    801 and RS/6000
  [Figure: CPU, TLB, and Memory; labels: PPN, Tid, Lock bits]
  Was this transactional memory?
SQL/801
  “The development of SQL/801 was greatly simplified because, with minor exceptions, it considers only a single user. It achieves multiuser concurrency [on a uniprocessor] by running in multiple processes using the shared database storage….” Chang and Mergen, ’88
  Largest transactional memory application
  Only real hardware transactional memory implementation
  No one seems to be looking at what they learned
Basic Transactional Mechanisms
  Isolation
    Detect when transactions conflict
    Track read and write sets
  Version management
    Record new and old values
  Atomicity
    Commit new values
    Abort back to old values
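The three mechanisms above can be illustrated with a small software model. This is a minimal sketch, not any particular hardware design: the `Transaction` class, its method names, and the use of a Python dict as "memory" are all assumptions made for illustration. New values are buffered (one of several possible version-management choices), so memory keeps the old values until commit.

```python
class Transaction:
    """Toy transaction over a shared dict acting as memory (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()   # isolation: addresses this transaction has read
        self.new_values = {}    # version management: buffered new values
                                # (the old values stay untouched in memory)

    def load(self, addr):
        self.read_set.add(addr)
        # A transaction sees its own earlier stores first.
        if addr in self.new_values:
            return self.new_values[addr]
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        self.new_values[addr] = value

    @property
    def write_set(self):
        return set(self.new_values)

    def conflicts_with(self, other):
        # Conflict: one transaction writes what the other reads or writes.
        return bool(
            self.write_set & (other.read_set | other.write_set)
            or other.write_set & self.read_set
        )

    def commit(self):
        # Atomicity: all buffered values become visible together.
        self.memory.update(self.new_values)
        self._clear()

    def abort(self):
        # Atomicity: drop the new values; memory still holds the old ones.
        self._clear()

    def _clear(self):
        self.read_set.clear()
        self.new_values.clear()


memory = {"x": 1, "y": 10}
t1, t2 = Transaction(memory), Transaction(memory)
t1.store("x", t1.load("x") + 1)   # t1 reads and writes x
t2.store("y", t2.load("x") + 5)   # t2 reads x and writes y
if t1.conflicts_with(t2):         # t1 writes what t2 read
    t2.abort()
t1.commit()
```

Here `t2` is aborted because it read `x`, which `t1` writes; after `t1` commits, `t2` would be re-executed against the updated memory.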
H/W Transactional Memory Systems
  Knight’s Lisp Work
  Transactional Memory
  Oklahoma Update
  SLE/TLR
  Transactional Coherence and Consistency
  Unbounded TM
  Virtual TM
  Thread-level TM
Knight’s Lisp Work [’86]
  Parallel execution of sequential code
  Break program into “transaction blocks”
    Multiple loads in a transaction
    Exactly one store ends the transaction
    No register state passed between transactions
  Execute transactions in parallel
    Track dependences (i.e., read set)
    Abort and restart on conflicting write
  Transactions commit in sequential order
    Broadcast writes on commit
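The scheme above can be mimicked in software to show why in-order commit preserves sequential semantics. This is a rough sketch under stated assumptions: each transaction block is modeled as a hypothetical Python function that receives a `load` callback and returns its single `(address, value)` store, and "parallel" execution is simulated by running every block against the same starting memory. The names `execute` and `run_in_order` are invented for this example.

```python
def execute(block, memory):
    """Run one transaction block; return (read_set, (addr, value))."""
    read_set = set()

    def load(addr):
        read_set.add(addr)      # track dependences (the read set)
        return memory[addr]

    store = block(load)         # the block's single store ends it
    return read_set, store


def run_in_order(blocks, memory):
    # Speculative phase: every block runs against the same starting memory,
    # standing in for parallel execution on separate processors.
    pending = [execute(block, memory) for block in blocks]

    # Commit phase: strictly in program order.
    for i in range(len(blocks)):
        _, (addr, value) = pending[i]
        memory[addr] = value    # "broadcast" the committed write
        for j in range(i + 1, len(blocks)):
            if addr in pending[j][0]:
                # A later block read a stale value: abort and restart it.
                pending[j] = execute(blocks[j], memory)
    return memory


blocks = [
    lambda load: ("a", load("a") + 1),
    lambda load: ("b", load("a") * 2),    # depends on the first block's store
    lambda load: ("c", load("c") + 100),  # independent of the others
]
result = run_in_order(blocks, {"a": 1, "b": 0, "c": 0})
```

Only the second block is restarted, because it read `a` before the first block's write was committed; the third block commits its speculative result unchanged.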
Knight’s Hardware
  Two caches
    Dependency cache
      Tracks read set
      Bus monitor detects conflicts
    Confirm cache
      Holds write set
      Supports multiple writes
  Commits
    Check dep. cache
    Broadcast writes
  Fast aborts
    Invalidate Confirm cache
    Use old values in Dep. Cache
    Immediately restart execution
  [Figure: CPU, Confirm Cache, Dependency Cache, and Memory; cache entry labels: Valid, Depends, Invalid, Old Value, New Value]
  Spawned two threads: TLS & TM
H&M’s Transactional Memory [’93]
  Targets explicitly parallel (non-functional) codes
    Motivated by lock-free data structures
  Transactions:
    Read and write multiple locations
    Commit in arbitrary order
    Implicit begin, explicit commit operations
    Abort affects memory, not registers
  Software manages restarting execution
    Validate instruction detects pending abort
  Implementation extends cache coherence
    Read/Write locks correspond to MOESI states
    Add orthogonal transaction states
H&M’s Transactional Memory
  Adds Transaction Cache
    Stores all data accessed by transactions
  2 copies of each line
    Before and after image
    Even for read-only data
  Small, fully associative
  Abort on all conflicts
    NACK conflicting requests
    Abort NACKed transaction
  Fast commit and abort
    Change trans. cache state
  [Figure: CPU, Cache (lines in M and S states), Transaction Cache (entries in M or S state tagged XCommit or XAbort, holding old and new values), and Memory]
SLE/TLR
  Hardware exploits speculative processors
    Read sets tracked by coherence protocol
    Write set maintained in store queue
    Abort restarts execution, including register state
  Speculative Lock Elision (SLE)
    Elide locks from the dynamic execution stream
    Convert critical sections to optimistic transactions
    Concurrently execute non-conflicting transactions
    Fall back on explicit locks if conflicts
  Transactional Lock Removal (TLR)
    Resolve conflicts using priority ordering (timestamps)
    Delay lower priority transactions
    Deadlock and starvation free
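The timestamp rule for TLR can be sketched in a few lines. This is only an illustration: it assumes that an earlier timestamp means higher priority and that a transaction keeps its timestamp until it completes (so a transaction that keeps losing eventually becomes the oldest). The `Txn` class and the `resolve` and `completion_order` helpers are hypothetical names, not part of any real interface.

```python
import itertools

_clock = itertools.count()      # stand-in for a global logical clock


class Txn:
    def __init__(self, name):
        self.name = name
        # Assumption: assigned once and kept until the transaction completes.
        self.timestamp = next(_clock)


def resolve(a, b):
    """Return (winner, delayed): the earlier timestamp has priority."""
    return (a, b) if a.timestamp < b.timestamp else (b, a)


def completion_order(conflicting):
    """Order in which mutually conflicting transactions get to finish."""
    return sorted(conflicting, key=lambda t: t.timestamp)


older = Txn("older")
younger = Txn("younger")
winner, delayed = resolve(younger, older)   # argument order does not matter
```

Because priorities form a total order and never change while a transaction is live, no cycle of waiting transactions can form, and the oldest transaction always makes progress.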
Transactional Coherence and Consistency [’04]
  TCC unifies coherence, memory consistency, and transaction support
    All transactions, all the time
  Transaction ordering
    Ordered, Unordered, Partially Ordered
    Supports thread-level speculation
  Optimistic concurrency model
    Unordered transactions serialize at commit
    Conflicts detected at commit
TCC
  [Figure: CPU, L1 D cache, write buffer, shadow register file (SRF), on-chip interconnect, and L2 cache]
  L1 cache tracks read set, bit per line
  Write buffer (~4 kB) holds new values until commit
  Shadow register file (SRF) checkpoints architectural registers
  On-chip interconnect: broadcast updates at commit
  L2 cache is logically shared
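How the read bits, write buffer, and commit broadcast interact can be sketched as a small software model. This is an assumption-laden illustration rather than the TCC hardware: the `TccProcessor` class and its methods are hypothetical, a dict stands in for the logically shared L2, and register checkpointing is omitted.

```python
class TccProcessor:
    def __init__(self, name, shared):
        self.name = name
        self.shared = shared        # stands in for the logically shared L2
        self.read_bits = set()      # lines whose L1 read bit is set
        self.write_buffer = {}      # new values, held until commit
        self.violated = False

    def load(self, addr):
        self.read_bits.add(addr)
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.shared.get(addr, 0)

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def snoop(self, committed_addrs):
        # Another processor committed. A hit in our read set means we
        # consumed stale data, so the speculative state is thrown away.
        if self.read_bits & committed_addrs:
            self.violated = True
            self.read_bits.clear()
            self.write_buffer.clear()

    def commit(self, others):
        committed = set(self.write_buffer)
        self.shared.update(self.write_buffer)   # updates become visible
        for other in others:
            other.snoop(committed)              # broadcast at commit
        self.read_bits.clear()
        self.write_buffer.clear()


shared = {"counter": 0}
p0, p1 = TccProcessor("p0", shared), TccProcessor("p1", shared)
p0.store("counter", p0.load("counter") + 1)
p1.store("counter", p1.load("counter") + 1)    # same stale read as p0
p0.commit([p1])                                # p1 is violated here
if p1.violated:
    p1.violated = False
    p1.store("counter", p1.load("counter") + 1)  # re-execute
p1.commit([p0])
```

Conflicts surface only when a transaction commits, which is the optimistic model the slides describe: `p1` runs to the point of commit on stale data and is rolled back by `p0`'s broadcast.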
TCC
  Commits are sequential
    Broadcasts addresses of all updates
  Supports large transactions
    Serialize all other transactions
    Grabs and holds the commit bus
  Cannot abort large transactions
    Updates affect L2/Mem; no undo
  Extensions forthcoming
    Talk to Kunle and Christos
Unbounded Transactional Memory (UTM)
  Unbounded transactions
    Arbitrary size
      Not limited by write buffer, cache, or memory
    Arbitrary duration
      Not limited by interrupts, context switch, etc.
  Complex implementation
    Not justified by performance
  Settle for “nearly” unbounded transactions
    Much simpler hardware
Transactional Linux
  Almost all of the transactions require < 100 cache lines
    99.9% need fewer than 54 cache lines
  There are, however, some very large transactions!
    >500k-byte fully-associative cache required
  [Figure: log-log plot of the number of overflowing transactions (1 to 9.355x10^6) against fully associative cache size in 64-byte lines (1 to 8144), for the make and dbench workloads]
Large Transaction Memory (LTM)
  Register checkpoints
    Snapshot of rename maps
  Cache tracks read and write sets
    T-bits mark transactional blocks
    Cache holds new data values “in place”
    O-bit indicates overflow to in-memory hashtable
  Memory holds committed state
    Abort invalidates all modified blocks
      Miss on re-execution
    Transactional writes force memory updates
      Repeated writes (e.g., to local data) are written through
Virtual Transactional Memory (VTM)
  Only an overflow mechanism
    No overhead on common in-cache case
    Check shared overflow counter on cache miss
  Low overhead when no conflict
    Shared Bloom Filter rules out conflicts
    Filter resides in virtual memory
  Higher overhead on possible conflict
    Hardware table walk to detect actual conflict
    Table resides in virtual memory
    Only incurred by large transactions with likely conflict
  Supports context switches and paging
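The three-tier check described above (counter, then filter, then table walk) can be sketched as follows. This is a simplified illustration, not VTM's actual data structures: the `OverflowState` class, the two-hash filter built on SHA-256, the 64-bit filter size, and the dict used as the overflow table are all assumptions, and entries are never removed from the filter.

```python
import hashlib


class OverflowState:
    def __init__(self, filter_bits=64):
        self.overflow_count = 0     # shared overflow counter
        self.filter_bits = filter_bits
        self.bloom = 0              # Bloom filter stored as a bit mask
        self.table = {}             # overflow table: address -> owning txn
        self.table_walks = 0        # how often the slow path was taken

    def _bit_positions(self, addr):
        digest = hashlib.sha256(str(addr).encode()).digest()
        return [digest[i] % self.filter_bits for i in range(2)]

    def overflow(self, addr, txn):
        """Record that txn evicted a transactional line for addr."""
        self.overflow_count += 1
        for pos in self._bit_positions(addr):
            self.bloom |= 1 << pos
        self.table[addr] = txn

    def check_on_miss(self, addr, txn):
        """On a cache miss, return the conflicting owner or None."""
        if self.overflow_count == 0:
            return None             # common case: nothing has overflowed
        for pos in self._bit_positions(addr):
            if ((self.bloom >> pos) & 1) == 0:
                return None         # filter proves addr never overflowed
        self.table_walks += 1       # possible conflict: walk the table
        owner = self.table.get(addr)
        return owner if owner is not None and owner != txn else None


state = OverflowState()
before = state.check_on_miss(0x40, "B")   # no overflow yet: fast path
state.overflow(0x40, "A")
after = state.check_on_miss(0x40, "B")    # filter hit, then table walk
```

A Bloom filter can report false positives but never false negatives, so a clear bit safely rules out a conflict, while a filter hit still needs the table walk to confirm one.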
801 revisited
  Why didn’t 801 database storage succeed?
    Lock bits helped performance and simplified software
  Answer #1:
    Changing lock bits requires TLB shootdown
    Too complicated for the benefits?
    Not a current problem: transaction h/w is easy
  Answer #2: Not universally available
    DB2 was (is) multiplatform
    Can’t rely on feature only available in one architecture
    Still a relevant concern
Need Standard Transaction Interface
  Abstract away resource requirements
    Support large, long transactions
  Virtualize transactional memory
    Transaction semantics between threads
    NOT a hardware property
  Permit range of implementations
    Hardware, software, and combinations
Thread-level Transactional Memory
  Abstract mechanisms
  Version management
    Update memory “in place”
    Log “before images” to thread level VM
  Isolation
    Logically extend memory words with read and write bits
    Implementations can be conservative (e.g., blocks)
  Atomicity
    Commits easy due to in place updates
    Aborts trap to user-level software
  Hardware can accelerate common case
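In-place updates with a per-thread log of before images can be sketched as below. This is a minimal software model under stated assumptions: the `ThreadTransaction` class and its methods are hypothetical, a dict stands in for memory, the read and write bits are kept as per-transaction sets, and conflict detection between threads is left out.

```python
class ThreadTransaction:
    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []        # per-thread log of (addr, before image)
        self.read_bits = set()    # logical read bit per memory word
        self.write_bits = set()   # logical write bit per memory word

    def load(self, addr):
        self.read_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        if addr not in self.write_bits:
            # First write to this word: save its before image.
            self.undo_log.append((addr, self.memory[addr]))
            self.write_bits.add(addr)
        self.memory[addr] = value     # update memory in place

    def commit(self):
        # Easy: memory already holds the new values.
        self._clear()

    def abort(self):
        # Software handler: walk the log backwards, restoring old values.
        for addr, old_value in reversed(self.undo_log):
            self.memory[addr] = old_value
        self._clear()

    def _clear(self):
        self.undo_log.clear()
        self.read_bits.clear()
        self.write_bits.clear()


memory = {"a": 1, "b": 2}
txn = ThreadTransaction(memory)
txn.store("a", 5)
txn.store("a", 6)                 # second write to a: nothing new is logged
txn.store("b", txn.load("a") + 1)
during = dict(memory)             # new values are already visible in place
txn.abort()
```

The design trade-off matches the slide: commit does no copying at all, while abort pays for the log walk, which is acceptable if aborts are the uncommon case.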
Conclusions
  Make the common case fast
    99+% of transactions fit in hardware
    Lots of alternatives
    Make both commits and aborts fast
  Handle the uncommon case
    Large transactions will occur, deal with ‘em
    Shouldn’t be limited by hardware
  Agree on a common abstraction
    Success requires multi-platform support
    Let vendors compete on price-performance