Transactional Memory

65
Art of Multiprocessor Programming 1 Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

description

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit. Fine grained parallelism has huge performance benefit. The reason we get only 2.9 speedup. c. c. c. c. c. c. c. c. c. c. c. c. c. c. c. c. c. c. c. c. - PowerPoint PPT Presentation

Transcript of Transactional Memory

Page 1: Transactional Memory

Art of Multiprocessor Programming

1

Transactional Memory

Companion slides forThe Art of Multiprocessor

Programmingby Maurice Herlihy & Nir Shavit

Page 2: Transactional Memory

Shared Data Structures

75%Unshared

25%Shared

c cc c

c cc c

CoarseGrained

c

cc

c

c

c

c c

c cc c

c cc c

FineGrained c c

cc

cc

cc

The reason we get

only 2.9 speedup

75%Unshared

25%Shared

Page 3: Transactional Memory

A FIFO Queue

b c d

TailHead

a

Enqueue(d)Dequeue() => a

Page 4: Transactional Memory

A Concurrent FIFO Queue

Object lock

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Simple Code, easy to prove correct

Contention and sequential bottleneck

Page 5: Transactional Memory

Fine Grain Locks

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Finer Granularity, More Complex Code

Verification nightmare: worry about deadlock, livelock…

Page 6: Transactional Memory

Fine Grain Locks

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(b)

b

TailHead

a

Worry how to acquire multiple locks

Complex boundary cases: empty queue, last item

Page 7: Transactional Memory

Art of Multiprocessor Programming

77

Moreover: Locking Relies on Conventions

• Relation between– Lock bit and object bits– Exists only in programmer’s mind

/* * When a locked buffer is visible to the I/O layer * BH_Launder is set. This means before unlocking * we must clear BH_Launder,mb() on alpha and then * clear BH_Lock, so no reader can see BH_Launder set * on an unlocked buffer and then risk to deadlock. */

Actual comment from Linux Kernel

(hat tip: Bradley Kuszmaul)

Page 8: Transactional Memory

Lock-Free (JDK 1.5+)

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Even Finer Granularity, Even More Complex Code

Worry about starvation, subtle bugs, hardness to modify…

Page 9: Transactional Memory

Composing Objects

Complex: Move data atomically between structures

More than twice the worry…

b c d

TailHead

a

P: Dequeue(Q1,a)

c d a

TailHead

b

Enqueue(Q2,a)

Page 10: Transactional Memory

Transactional Memory

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Don’t worry about deadlock, livelock, subtle bugs, etc…

Great Performance, Simple Code

Page 11: Transactional Memory

Promise of Transactional Memory

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

b

TailHead

a

TM deals with boundary cases under the hood

Don’t worry which locks need to cover which variables when…

Page 12: Transactional Memory

Composing ObjectsWill be easy to modify multiple structures atomically

Provide Composability…

b c d

TailHead

a

P: Dequeue(Q1,a)

c d a

TailHead

b

Enqueue(Q2,a)

Page 13: Transactional Memory

Art of Multiprocessor Programming

1313

The Transactional Manifesto

• Current practice inadequate– to meet the multicore challenge

• Research Agenda– Replace locking with a transactional

API – Design languages to support this model– Implement the run-time to be fast

enough

Page 14: Transactional Memory

Art of Multiprocessor Programming

1414

Transactions

• Atomic– Commit: takes effect– Abort: effects rolled back

• Usually retried

• Serizalizable– Appear to happen in one-at-a-time

order

Page 15: Transactional Memory

Art of Multiprocessor Programming

1515

atomic { x.remove(3); y.add(3);}

atomic { y = null;}

Atomic Blocks

Page 16: Transactional Memory

Art of Multiprocessor Programming

1616

atomic { x.remove(3); y.add(3);}

atomic { y = null;}

Atomic Blocks

No data race

Page 17: Transactional Memory

Art of Multiprocessor Programming

1717

Public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q;}

Designing a FIFO Queue

Write sequential Code

Page 18: Transactional Memory

Art of Multiprocessor Programming

1818

Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; }}

Designing a FIFO Queue

Page 19: Transactional Memory

Art of Multiprocessor Programming

1919

Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; }}

Designing a FIFO Queue

Enclose in atomic block

Page 20: Transactional Memory

Art of Multiprocessor Programming

2020

Warning• Not always this

simple– Conditional waits– Enhanced concurrency– Complex patterns

• But often it is

Page 21: Transactional Memory

Art of Multiprocessor Programming

2121

Public void Transfer(Queue<T> q1, q2){ atomic { T x = q1.deq(); q2.enq(x); }}

Composition

Trivial or what?

Page 22: Transactional Memory

Art of Multiprocessor Programming

2222

Public T LeftDeq() { atomic { if (this.left == null) retry; … }}

Roll Back

Roll back transaction and restart when something

changes

Page 23: Transactional Memory

Art of Multiprocessor Programming

2323

OrElse Compositionatomic { x = q1.deq(); } orElse { x = q2.deq();}

Run 1st method. If it retries …Run 2nd method. If it retries …

Entire statement retries

Page 24: Transactional Memory

Art of Multiprocessor Programming

24

Transactional Memory

• Software transactional memory (STM)

• Hardware transactional memory (HTM)

• Hybrid transactional memory (HyTM, try in hardware and default to software if unsuccessful)

24

Page 25: Transactional Memory

Art of Multiprocessor Programming

2525

Hardware versus Software

• Do we need hardware at all?– Analogies:

• Virtual memory: yes!• Garbage collection: no!

– Probably do need HW for performance

• Do we need software?– Policy issues don’t make sense for

hardware

Page 26: Transactional Memory

Transactional Consistency

• Memory Transactions are collections of reads and writes executed atomically

• Tranactions should maintain internal and external consistency– External: with respect to the interleavings of

other transactions.– Internal: the transaction itself should

operate on a consistent state.

Page 27: Transactional Memory

Art of Multiprocessor Programming

27

External Consistency

Application Memory

x

y

4

2

8

4

Invariant x = 2y

Transaction A: Write xWrite y

Transaction B: Read xRead y Compute z = 1/(x-y)

Page 28: Transactional Memory

Art of Multiprocessor Programming

28

Simple Lock-Based STM

• STMs come in different forms– Lock-based– Lock-free

• Here we will describe a simple lock-based STM

Page 29: Transactional Memory

Art of Multiprocessor Programming

29

Synchronization

• Transaction keeps– Read set: locations & values read– Write set: locations & values to be

written• Deferred update

– Changes installed at commit• Lazy conflict detection

– Conflicts detected at commit

Page 30: Transactional Memory

Art of Multiprocessor Programming

3030

STM: Transactional Locking

Map

Array of Versioned-Write-Locks

Application Memory

V#

V#

V#

Page 31: Transactional Memory

Art of Multiprocessor Programming

3131

Reading an Object

• Check not locked• Put V#s & value in RS

MemLocks

V#

V#

V#

V#

V#

Page 32: Transactional Memory

Art of Multiprocessor Programming

3232

To Write an Object

• Add V# and new value to WS

MemLocks

V#

V#

V#

V#

V#

Page 33: Transactional Memory

Art of Multiprocessor Programming

3333

To Commit

• Acquire W locks• Check V#s unchanged

• In RS only• Install new values• Increment V#s• Release …

MemLocks

V#

V#

V#

V#

V#

X

Y

V#+1

V#+1

Page 34: Transactional Memory

Problem: Internal Inconsistency

• A Zombie is a currently active transaction that is destined to abort because it saw an inconsistent state

• If Zombies that see inconsistent states are allowed to have irreversible impact on execution state then errors can occur

• Eventual abort does not save us

Page 35: Transactional Memory

Art of Multiprocessor Programming

35

Internal Consistency

Application Memory

x

y

4

2

8

4

Invariant x \neq 0 && x = 2y

Transaction A: Write x (kills B)

Write y

Transaction B: Read x = 4

Transaction B: (zombie) Read y = 4Compute z = 1/(x-y)

DIV by 0 ERROR

Page 36: Transactional Memory

Art of Multiprocessor Programming

36

Solution: The “Global Clock”

• Have one shared global clock• Incremented by (small subset of)

writing transactions• Read by all transactions• Used to validate that state worked

on is always consistent

Page 37: Transactional Memory

Art of Multiprocessor Programming

3737

Read-Only Transactions

• Copy V Clock to RV• Read lock,V#• Read mem• Check unlocked (1)

• Recheck V# unchanged (2)• (1)+(2)v# and mem content consistent

• Check V# < RV

MemLocks

12

32

56

19

17

100

Shared Version Clock

Private Read Version (RV)

100

Reads from a snapshot of memory.No read set!

Page 38: Transactional Memory

Art of Multiprocessor Programming

3838

Regular Transactions – during trans. prev. to commit

• Copy V Clock to RV• On read/write, check:

• Unlocked (acquire)• V# < RV• Add to R/W set

MemLocks

69

Shared Version Clock

69

12

32

56

19

17 Private Read Version (RV)

Page 39: Transactional Memory

Art of Multiprocessor Programming

3939

Regular Transactions- Commit• Acquire locks (write set only)• WV = Fetch&Inc(V Clock)• For all read set

• check unlock and • revalidate V# < RV

• Update write set• Set write write set V#s to WV• Release locks

MemLocks

100

Shared Version Clock

69101

x

y

12

32

56

19

17

100

100 Private Read Version (RV)

Page 40: Transactional Memory

Art of Multiprocessor Programming

40

• When two transactions have their read and write sets intersected, but both succeed to read before the write of the other transaction occurs, then there is no way to serialize them (see example below by Nir Shavit). Hence the need to revalidate that the read set *after* locking the write set.

• Also, upon commit of transaction A, *after* A already took the locks on the write set, and after or while the read set revalidated, another transaction cannot succeed to read from A's write set before A writes it (because it is locked).

Detailed example: Take two transactions T1 and T2. Lets say that there are 2 memory locations initialized to 0. Lets say that both transactions read both locations, and T1 writes 1 to location 1 if it saw all 0's and T2 writes 1 to location 2 if it saw all 0's. Now if they both do not revalidate the read locations this means that T1 does not revalidate location 2 after acquiring the lock and T2 does not revalidate location 1 after grabbing the lock.

So if they both run, both read both locations, both see all 0's in a snapshot, then both grab locks on their respective write locations, revalidate their own write locations, and write the 1 value with a timestamp greater by 1. Since they only revalidated their write locations after locking, neither saw that the other thread changed the location they only read to a 1 with a larger timestamp. Now we have a memory with two 1's in it even though there is no such serializable execution.

Seeing a snapshot before grabbing the locks in the commit is thus not sufficient and the algorithm must have the transactions each revalidate the read set locations after acquiring the locks.

Some explanations to lock-based implementation of regular transactions: why do we need to revalidate reads?

Page 41: Transactional Memory

Art of Multiprocessor Programming

4141

Hardware Transactional Memory

• Exploit Cache coherence

• Already almost does it– Invalidation– Consistency checking

• Speculative execution– Branch prediction =

optimistic synch!

Page 42: Transactional Memory

Art of Multiprocessor Programming

4242

HW Transactional Memory

Interconnect

caches

memory

read active

T

Page 43: Transactional Memory

Art of Multiprocessor Programming

4343

Transactional Memory

caches

memory

read

activeTT

active

Page 44: Transactional Memory

Art of Multiprocessor Programming

4444

Transactional Memory

caches

memory

activeTT

activecommitted

Page 45: Transactional Memory

Art of Multiprocessor Programming

4545

Transactional Memory

caches

memory

write

active

committed

TD

Page 46: Transactional Memory

Art of Multiprocessor Programming

4646

Rewind

caches

memory

activeTT

activewriteaborted

D

Page 47: Transactional Memory

Art of Multiprocessor Programming

4747

Transaction Commit

• At commit point– If no cache conflicts, we win.

• Mark transactional entries– Read-only: valid– Modified: dirty (eventually written

back)

• That’s all, folks!– Except for a few details …

Page 48: Transactional Memory

Art of Multiprocessor Programming

4848

Not all Skittles and Beer

• Limits to– Transactional cache size– Scheduling quantum

• Transaction cannot commit if it is– Too big– Too slow– Actual limits platform-dependent

Page 49: Transactional Memory

Art of Multiprocessor Programming

49

TM Design Issues

• Implementation choices

• Language design issues

• Semantic issues

Page 50: Transactional Memory

Art of Multiprocessor Programming

50

Granularity

• Object– managed languages, Java, C#, …– Easy to control interactions between

transactional & non-trans threads

• Word– C, C++, …– Hard to control interactions between

transactional & non-trans threads

Page 51: Transactional Memory

Art of Multiprocessor Programming

51

Direct/Deferred Update

• Deferred – modify private copies & install on

commit– Commit requires work– Consistency easier

• Direct – Modify in place, roll back on abort– Makes commit efficient– Consistency harder

Page 52: Transactional Memory

Art of Multiprocessor Programming

52

Conflict Detection

• Eager– Detect before conflict arises– “Contention manager” module

resolves

• Lazy– Detect on commit/abort

• Mixed– Eager write/write, lazy read/write …

Page 53: Transactional Memory

Art of Multiprocessor Programming

53

Conflict Detection

• Eager detection may abort transaction that could have committed.

• Lazy detection discards more computation.

Page 54: Transactional Memory

Art of Multiprocessor Programming

54

Contention Management & Scheduling

• How to resolve conflicts?

• Who moves forward and who rolls back?

Page 55: Transactional Memory

Art of Multiprocessor Programming

55

Contention Manager Strategies

• Exponential backoff• Priority to

– Oldest?– Most work?– Non-waiting?

• None DominatesJudgment of Solomon

Page 56: Transactional Memory

Art of Multiprocessor Programming

56

I/O & System Calls?

• Some I/O revocable– Provide transaction-

safe libraries– Undoable file

system/DB calls

• Some not– Opening cash

drawer– Firing missile

Page 57: Transactional Memory

Art of Multiprocessor Programming

57

I/O & System Calls

• One solution: make transaction irrevocable– If transaction tries I/O, switch to

irrevocable mode.• There can be only one …

– Requires serial execution• No explicit aborts

– In irrevocable transactions

Page 58: Transactional Memory

Art of Multiprocessor Programming

58

Exceptions

int i = 0;try { atomic { i++; node = new Node(); }} catch (Exception e) { print(i);}

int i = 0;try { atomic { i++; node = new Node(); }} catch (Exception e) { print(i);}

Page 59: Transactional Memory

Art of Multiprocessor Programming

59

Exceptions

int i = 0;try { atomic { i++; node = new Node(); }} catch (Exception e) { print(i);}

int i = 0;try { atomic { i++; node = new Node(); }} catch (Exception e) { print(i);}

Throws OutOfMemoryException!

Page 60: Transactional Memory

Art of Multiprocessor Programming

60

Exceptions

int i = 0;try { atomic { i++; node = new Node(); }} catch (Exception e) { print(i);}

int i = 0;try { atomic { i++; node = new Node(); }} catch (Exception e) { print(i);}

Throws OutOfMemoryException!

What is printed?

Page 61: Transactional Memory

Art of Multiprocessor Programming

61

Unhandled Exceptions

• Aborts transaction– Preserves invariants– Safer

• Commits transaction– Like locking semantics– What if exception object refers to

values modified in transaction?

Page 62: Transactional Memory

Art of Multiprocessor Programming

62

Nested Transactions

atomic void foo() { bar();}

atomic void bar() { …}

atomic void foo() { bar();}

atomic void bar() { …}

Page 63: Transactional Memory

Art of Multiprocessor Programming

63

Nested Transactions

• Needed for modularity– Who knew that cosine() contained a

transaction?• Flat nesting

– If child aborts, so does parent• First-class nesting

– If child aborts, partial rollback of child only

Page 64: Transactional Memory

Art of Multiprocessor Programming

64

Open Nested Transactions

• Normally, child commit– Visible only to parent

• In open nested transactions– Commit visible to all– Escape mechanism– Dangerous, but useful

• What escape mechanisms are needed?

Page 65: Transactional Memory

Art of Multiprocessor Programming

65

Strong vs Weak Isolation

• How do transactional & non-transactional threads synchronize?

• Interaction with memory-model?

• Efficient algorithms?