Hybrid Transactional Memory

Reza Sherafat, Prof. Cristiana Amza, University of Toronto, Dec 4, 2006


Page 1: Hybrid Transactional Memory

1

Hybrid Transactional Memory

Reza Sherafat

Prof. Cristiana Amza

University of Toronto

Dec 4, 2006

Page 2: Hybrid Transactional Memory

2

Quick Background Review

• A transaction is a sequence of operations that is performed atomically “as a whole”.

• Life cycle of a transaction:

  – Initialization: start a transaction by storing the current state;

  – Execution: open objects for read/write;
    • Data modifications are hidden from others;
    • Watch for conflicts;

  – Termination: end the transaction
    • Successful completion (commit): other threads are told about the changes, and the modifications take effect; or
    • Unsuccessful completion (abort): discard the modifications.
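The life cycle above can be sketched in a few lines of Python. This is a toy, single-threaded model and not the implementation discussed in the talk: the `Transaction` class, its method names, and the use of a plain dict as "shared memory" are all assumptions made for illustration, and conflict watching is left out.

```python
import copy


class Transaction:
    """Toy model of the life cycle: initialize, execute, terminate (hypothetical API)."""

    def __init__(self, shared):
        self.shared = shared                    # dict standing in for shared memory
        self.snapshot = copy.deepcopy(shared)   # initialization: store the current state
        self.writes = {}                        # modifications stay hidden from others
        self.status = "ACTIVE"

    def read(self, key):
        # A transaction sees its own pending writes first, then its snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.snapshot[key]

    def write(self, key, value):
        self.writes[key] = value                # buffered, not yet visible

    def commit(self):
        self.shared.update(self.writes)         # modifications take effect
        self.status = "COMMITTED"

    def abort(self):
        self.writes.clear()                     # discard modifications
        self.status = "ABORTED"
```

Until `commit()` runs, the shared dict is untouched, which is the "hidden from others" property; `abort()` only has to drop the write buffer.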

Page 3: Hybrid Transactional Memory

3

Outline

• Motivations

• Hybrid Transactional Memory

• Implementation

• Evaluations

• Conclusions

Page 4: Hybrid Transactional Memory

4

Motivations

• In parallel programs we must protect concurrent access to shared data.

• Locks are widely used, but several problems are associated with using locks:

  – Performance (speedup)
    • Overhead of locking (wait time, acquire, release)
    • Granularity (hard to balance wait time against overhead)
    • Over-serialization

  – Programming
    • Hard for programmers to write and debug
    • Deadlocks are hard to avoid

  – Other problems
    • Priority inversion
    • Problems when a process holding the lock crashes

Page 5: Hybrid Transactional Memory

5

Transactional Memory (TM)

• Main idea: non-blocking execution
  – Execute each concurrent transaction speculatively;
  – Apply changes when the transaction has completed successfully.

• Non-conflicting access to shared objects within transactions is allowed:
  – Once a conflict is detected, the transaction rolls back and the state is restored (abort);

• TM support is provided through an API:
  – Start a transaction
  – Abort/commit a transaction
  – Wrap objects in TM objects

• Properties of transactions:
  – Atomic: a transaction is like a single unit (all-or-nothing)
  – Serializable: concurrent transactions are performed in some serial order
  – Obstruction-free: guarantees progress of one process in the absence of contention
  – No deadlock
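To make the API bullets concrete, here is a minimal sketch of what "start, abort/commit, wrap objects" could look like from the programmer's side. All names (`TMObject`, `Tx`, `atomic`, `AbortError`) are hypothetical, and the commit-time version check is a simplification swapped in for brevity; it is not the conflict-detection scheme described in these slides.

```python
class AbortError(Exception):
    """Raised when a transaction has to roll back."""


class TMObject:
    """Hypothetical wrapper: a committed value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Tx:
    def __init__(self):
        self.reads = {}     # TMObject -> version seen when first opened
        self.writes = {}    # TMObject -> speculative value

    def open_read(self, obj):
        if obj in self.writes:
            return self.writes[obj]
        self.reads.setdefault(obj, obj.version)
        return obj.value

    def open_write(self, obj, value):
        self.reads.setdefault(obj, obj.version)
        self.writes[obj] = value

    def commit(self):
        # Abort if anything we opened was committed by someone else meanwhile.
        for obj, seen in self.reads.items():
            if obj.version != seen:
                raise AbortError()
        for obj, value in self.writes.items():
            obj.value = value
            obj.version += 1


def atomic(body, max_attempts=10):
    """Run body(tx) speculatively and retry it after an abort."""
    for _ in range(max_attempts):
        tx = Tx()
        try:
            result = body(tx)
            tx.commit()
            return result
        except AbortError:
            continue
    raise RuntimeError("transaction did not commit")
```

A caller wraps shared data once (`counter = TMObject(0)`) and then passes a function to `atomic`; an aborted attempt leaves `counter` unchanged, which is the all-or-nothing property.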

Page 6: Hybrid Transactional Memory

6

Conflicting Access to Shared Data

• Conflicts in accessing shared data may result in data inconsistencies.

• A conflict happens when an object that has been accessed (read or written) by other transactions is updated before those transactions commit.
  – Multiple readers are allowed
  – Only one writer is allowed at a time

• The system ensures that transactions that access data don’t conflict. If no conflicts occur, the transactions are serializable.

• Conflict resolution: once a conflict is detected, we can get a serializable execution by aborting all but one of the conflicting transactions.

• Speculative modifications of aborted transactions are discarded. The old values from before the transaction started become valid again.
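The reader/writer rule and the "abort all but one" resolution can be written down directly. The sketch below is illustrative only: the tuple encoding of an access and the "earliest access wins" policy are assumptions, since the slides do not say which transaction survives.

```python
def conflicts(a, b):
    """a and b are (tx_id, obj, mode) tuples, with mode 'r' or 'w'.

    Two accesses conflict when they come from different transactions,
    touch the same object, and at least one of them is a write.
    """
    return a[0] != b[0] and a[1] == b[1] and "w" in (a[2], b[2])


def resolve(accesses):
    """Return the set of transaction ids to abort.

    Assumed policy: accesses are listed in arrival order and the earlier
    transaction wins, so later conflicting transactions are aborted.
    """
    aborted = set()
    for i, a in enumerate(accesses):
        if a[0] in aborted:
            continue
        for b in accesses[i + 1:]:
            if b[0] not in aborted and conflicts(a, b):
                aborted.add(b[0])
    return aborted
```

For example, two readers followed by a writer on the same object leave both readers running and abort only the writer, while accesses to different objects never conflict.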

Page 7: Hybrid Transactional Memory

7

Hybrid TM

Each approach should implement TM semantics: start transaction, open object, detect conflicts, abort, commit.

• Hardware-based approaches:
  – Bounded number of locations
  – Maintain versions in cache
  → Low overhead

• Software-based approaches:
  – Unbounded number of locations can be accessed within a transaction
  – Slow due to the overhead of maintaining multiple copies
  – Potentially orders of magnitude slower

• Hybrid: combines the benefits of both approaches
  – High performance (unless the transaction exceeds HW limits)
  – Support for unlimited transactional objects
  – Handles simultaneous data access from HW/SW modes

Page 8: Hybrid Transactional Memory

8

Implementations

• Two modes for executing transactions: HW vs. SW.

• In general, HW mode is preferred (it is faster), unless we run out of resources.

• Naïve approach: the system has a universal mode of operation.

• A better approach: transactions have two modes to choose from.
  – Each transaction separately chooses its mode of operation when it starts.
  – Better performance and utilization of system resources

• Other policies may also be applied to choose the mode: for example, if the transaction fails a number of times (e.g., 3), then restart it in SW mode.

• Pure HW/SW implementations must be tailored so that they can coexist.
  – Objects may be accessed simultaneously by transactions in HW and SW modes.
  – Interoperability is a must.
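The per-transaction mode choice with a retry threshold might look like the following sketch. The exception types, the `run_hybrid` name, and the two callables are hypothetical stand-ins; only the policy itself (try HW, fall back to SW after a few failures or when HW resources run out) comes from the slide.

```python
class Conflict(Exception):
    """The HW attempt was aborted by a conflicting transaction."""


class HardwareLimitExceeded(Exception):
    """The transaction does not fit in the HW resources."""


def run_hybrid(body_hw, body_sw, max_hw_attempts=3):
    """Try the transaction in HW mode first, then fall back to SW mode.

    body_hw and body_sw are callables that run the same transaction in
    each mode. Returns (mode_used, result).
    """
    for _ in range(max_hw_attempts):
        try:
            return "HW", body_hw()
        except HardwareLimitExceeded:
            break               # retrying in HW cannot help
        except Conflict:
            continue            # aborted by a conflict: try HW again
    return "SW", body_sw()
```

The threshold of 3 is the example value from the slide; a real system would tune it, and the SW path would itself need to retry on conflicts.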

Page 9: Hybrid Transactional Memory

9

Hardware TM

An HW-TM scheme that can be used for the hybrid implementation; it relies on the standard cache coherence protocol and some additional components.

• The cache coherence protocol handles data consistency across multiple processors:
  – Only one processor has permission to write to a cache line;
  – No processor can read a line that another processor has permission to write to.

• Additional components on each processor store speculative data and check for conflicts:

  – ISA extensions
    • Instructions for: transactional begin, commit, abort, load/store, etc.

  – Additional components on the processor chip (in parallel with the L1 cache)
    • Transactional buffer: old,
    • Transactional state table: state of the contexts (threads) running on the processor

• All memory accesses within a transaction are done transactionally.

Page 10: Hybrid Transactional Memory

10

HW-TM

• The Old field keeps speculative values

• Transactional semantics:
  – Start transaction: the transactional state for that context is set to SELECT, ALL.
  – Abort: the exception flag is set, the corresponding read/write bits are cleared, and speculatively written data is invalidated.
  – Commit: update the transactional state.
  – Detect conflicts: read/write bit vector

• If the exception flag is set, any attempt by the transaction to commit or load/store results in a trap that is handled by the exception handler.

Question: How is abort implemented across multiple processors? Through the cache coherence protocol (CCP)!
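The interplay of read/write bit vectors, the exception flag, and the trap can be modelled in software. The sketch below is a simplified simulation, not the hardware design: the class names, the "requester wins" policy (the context issuing the access aborts the others), and the per-context dict used as a transactional buffer are all assumptions.

```python
class Trap(Exception):
    """Stands in for the trap taken when an aborted transaction keeps going."""


class Context:
    def __init__(self, n_lines):
        self.read_bits = [False] * n_lines    # lines read transactionally
        self.write_bits = [False] * n_lines   # lines written transactionally
        self.buffer = {}                      # line -> speculative value
        self.exception = False                # set when the transaction is aborted


class ToyHTM:
    def __init__(self, n_lines, n_contexts):
        self.memory = [0] * n_lines
        self.contexts = [Context(n_lines) for _ in range(n_contexts)]

    def begin(self, cid):
        self.contexts[cid] = Context(len(self.memory))

    def _abort(self, ctx):
        ctx.exception = True
        ctx.read_bits = [False] * len(self.memory)
        ctx.write_bits = [False] * len(self.memory)
        ctx.buffer.clear()                    # invalidate speculative data

    def _enter(self, cid):
        ctx = self.contexts[cid]
        if ctx.exception:
            raise Trap(cid)                   # handled by an exception handler
        return ctx

    def _abort_others(self, cid, line, include_readers):
        for other_id, other in enumerate(self.contexts):
            hit = other.write_bits[line] or (include_readers and other.read_bits[line])
            if other_id != cid and hit:
                self._abort(other)

    def load(self, cid, line):
        ctx = self._enter(cid)
        self._abort_others(cid, line, include_readers=False)
        ctx.read_bits[line] = True
        return ctx.buffer.get(line, self.memory[line])

    def store(self, cid, line, value):
        ctx = self._enter(cid)
        self._abort_others(cid, line, include_readers=True)
        ctx.write_bits[line] = True
        ctx.buffer[line] = value

    def commit(self, cid):
        ctx = self._enter(cid)
        for line, value in ctx.buffer.items():
            self.memory[line] = value         # speculative values become visible
        self.begin(cid)                       # reset the context's state
```

In real hardware the cross-processor part of `_abort_others` would be carried by coherence messages rather than a loop over contexts.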

Page 11: Hybrid Transactional Memory

11

Quick Review of DSTM

[Figure: DSTM object layout. An Object Pointer refers to a locator with State Pointer, Old, and New fields; the State Pointer refers to a State, and Old and New refer to Object Contents. Annotations: "Valid Copy", "Before accessing an object within a transaction", "Modify".]
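The DSTM locator scheme can be sketched as follows. This is a single-threaded illustration with hypothetical names; real DSTM installs the new locator and changes the transaction status with compare-and-swap, and its contention manager decides who gets aborted, whereas this sketch simply aborts the current owner.

```python
import copy

ACTIVE, COMMITTED, ABORTED = "ACTIVE", "COMMITTED", "ABORTED"


class TxState:
    def __init__(self, status=ACTIVE):
        self.status = status


class Locator:
    def __init__(self, state, old, new):
        self.state = state      # state of the transaction that installed it
        self.old = old          # version before that transaction
        self.new = new          # that transaction's private copy


class TMObject:
    def __init__(self, value):
        # Start out as if a committed transaction had written `value`.
        self.locator = Locator(TxState(COMMITTED), None, value)


def valid_copy(locator):
    """New is valid only if its owner committed; otherwise Old is."""
    return locator.new if locator.state.status == COMMITTED else locator.old


def open_write(obj, tx):
    current = obj.locator
    if current.state is tx:
        return current.new                   # already opened by this transaction
    if current.state.status == ACTIVE:
        current.state.status = ABORTED       # assumed policy: abort the owner
    old = valid_copy(current)
    new = copy.deepcopy(old)                 # copy, then modify the copy
    obj.locator = Locator(tx, old, new)      # real DSTM: compare-and-swap here
    return new
```

Committing is then a single status change: setting `tx.status = COMMITTED` makes every `new` copy owned by `tx` the valid one at once.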

Page 12: Hybrid Transactional Memory

12

Software TM

• Uses a locator similar to DSTM:
  – Redirection and object copying.

• The locator also keeps track of the readers.
  – As opposed to local hash tables that store the last data value in each read transaction.
  – This helps early abort, and avoids validation when committing.

• A locator consists of:
  – Valid field
  – Write state (one)
  – Read state (multiple)
  – Old/new objects
  – Object size

[Figure: A locator object in Hybrid-TM]
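The locator fields listed above map naturally onto a small record type. In this sketch the field types, the meaning of `valid` (a plain boolean placeholder), and the two helper functions are assumptions; the slides give only the field names. The helpers show why tracking readers in the locator allows early abort: a writer can find and abort every registered reader directly.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

ACTIVE, ABORTED = "ACTIVE", "ABORTED"


@dataclass
class TxState:
    status: str = ACTIVE


@dataclass
class HybridLocator:
    valid: bool = True                         # encoding not given in the slides
    write_state: Optional[TxState] = None      # at most one writer
    read_states: List[TxState] = field(default_factory=list)  # many readers
    old: Any = None
    new: Any = None
    size: int = 0


def open_read(locator, tx):
    """Register tx as a reader, aborting an active writer (assumed policy)."""
    writer = locator.write_state
    if writer is not None and writer is not tx and writer.status == ACTIVE:
        writer.status = ABORTED
    locator.read_states.append(tx)


def open_write(locator, tx):
    """Make tx the writer, aborting registered readers and any other writer."""
    for reader in locator.read_states:
        if reader is not tx and reader.status == ACTIVE:
            reader.status = ABORTED            # early abort, no commit-time validation
    writer = locator.write_state
    if writer is not None and writer is not tx and writer.status == ACTIVE:
        writer.status = ABORTED
    locator.read_states = []
    locator.write_state = tx
```

For brevity this updates one locator in place; a DSTM-style design would install a fresh locator on each write open.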

Page 13: Hybrid Transactional Memory

13

Putting Things Together

• Transactions in HW may conflict with those in SW, and vice versa.

  – Opening an object in HW:
    • Read the TMObject pointer transactionally
    • Abort all conflicting HW/SW transactions

  – Opening an object in SW:
    • Create a state object, and load it transactionally
    • Abort conflicting HW/SW transactions

  – Hardware aborts Hardware
    • A load/store (transactional by default) causes an abort

  – Software aborts Hardware
    • When SW opens a TMObject, it assigns it a new locator. Since the object was transactionally read by the HW, the HW transaction is aborted.

  – Hardware aborts Software
    • When HW opens a TMObject, it writes ABORTED to the state of the transactions that have this object open

  – Software aborts Software
    • Write ABORTED to the states reached through the reader/writer pointers.
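The "Software aborts Hardware" case relies on the HW transaction having read the TMObject pointer transactionally, so that a later write to that pointer invalidates it. A minimal sketch, in which an explicit `snoop` call stands in for the coherence invalidation that real hardware would receive (all names are hypothetical):

```python
class TMObject:
    def __init__(self, locator):
        self.locator = locator            # the "TMObject pointer"


class HWTransaction:
    def __init__(self):
        self.read_set = {}                # TMObject -> locator seen at open time
        self.aborted = False

    def open(self, obj):
        # Transactional read of the TMObject pointer.
        self.read_set[obj] = obj.locator
        return obj.locator

    def snoop(self, obj):
        # Stand-in for the invalidation a real HW transaction would observe.
        if obj in self.read_set and obj.locator is not self.read_set[obj]:
            self.aborted = True


def sw_open(obj, new_locator, hw_transactions):
    """SW opens the object by installing a new locator."""
    obj.locator = new_locator
    for hw in hw_transactions:
        hw.snoop(obj)
```

Only HW transactions that actually opened the object are affected; one that never read the pointer keeps running.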

Page 14: Hybrid Transactional Memory

14

Software aborts Hardware

[Figure: locator layout (Object Pointer; locators with State Pointer, Old, and New fields; State; Object Contents). In the software mode: copy and modify. In the hardware mode: modify in place. Thread 1: HW mode; Thread 2: HW mode; Thread 3: SW mode. The conflict is detected by the threads in the hardware mode.]

Page 15: Hybrid Transactional Memory

15

Evaluations

• Three microbenchmarks
  – VR: small critical section (overhead of starting/committing transactions)
  – HT: simultaneous lookup operations (per-object overhead of transactions)
  – GU: coarse-grained locking vs. transactional memory

• For each case, two scenarios: low and high contention

• Compare four synchronization implementations
  – Lock
  – Pure Hardware Transactional Memory
  – Pure Software Transactional Memory
  – Hybrid Transactional Memory

Page 16: Hybrid Transactional Memory

17

Evaluations (Hybrid Execution)

• In all cases of hybrid execution, the ratio of SW/HW mode is very small.

• This is due to the large size of the transactional buffer relative to the size of the transactional objects. (Is this realistic?)

• Since HW mode is used in most transactions, this does not give a good view of the overhead of the slow SW mode.

Page 17: Hybrid Transactional Memory

18

Evaluations (VR)

• When the number of processors grows, contention does not grow significantly
  – This is because the transactions are too small (conflicts rarely happen)

Page 18: Hybrid Transactional Memory

19

Evaluations (HT)

• It is true that several lookup operations can be performed simultaneously; however, those operations are all rolled back together once a conflict with a writer occurs
  – This seems to be significant for slightly longer transactions
  – The lock performance is better.

• The paper claims similar behavior would be achieved by reader-writer locks;
  – I expect those would perform much better, since concurrent operations will not be undone once they are underway

Page 19: Hybrid Transactional Memory

20

Evaluations (GU)

• Why does the execution time decrease in the lock implementation from GU-low to GU-high?

• It is usually the reverse!
  – Do locks have back-offs?

Page 20: Hybrid Transactional Memory

21

Conclusions

• Transactional memory outperforms lock-based synchronization in most cases

• The Hybrid Transactional Memory approach gives a good balance between the scalability of SW and the performance of HW
  – Requires only modest hardware support (transactional buffer, state table)
  – Within system limits: good performance for most transactions
  – Exceeding system limits: falls back to software mode when a transaction cannot complete within the hardware bounds

• More needs to be done to ensure progress.

Page 21: Hybrid Transactional Memory

22

Questions?!

Page 22: Hybrid Transactional Memory

23

• Nested transactions?

• Additional limits for the HW:
  – Contexts

• Hybrid has limitations:
  – Uses transactional buffer

• I am not sure how the non-blocking mechanism is implemented across multiple processors.