©2009 HP Confidential
A Proposal to Incorporate Software Transactional Memory (STM) Support
in the Open64 Compiler
Dhruva R. Chakrabarti, HP Labs, USA
2
How is transactional memory better than locks?
Traditional threaded programming is hard
Requires reasoning with locks
Prone to synchronization errors
Tools exist but complexities still remain
Transactional memory programming raises the abstraction level
Locks are not exposed
Program with atomic sections
Atomicity, Consistency, and Isolation (ACI) properties guaranteed
Underlying system implements transactions
Deadlock freedom at the programmer level
3
Quantitative comparison between locks and TM
Source: Rossbach et al., Is Transactional Programming actually easier?, PPoPP 2010
User study of undergrads in an OS class
Same programs written with coarse-grain locks, fine-grain locks, monitors, and TM
Compared development effort, ease, and programming errors
Conclusion: TM was harder to use than coarse-grain locks but easier than fine-grain locks
The study used API-based STM libraries for Java (DSTM2 and JDASTM)
Compiler support for atomic-section-based programming should make TM even easier to use
Synchronization errors were much less frequent with transactions
On a similar programming problem, the error rate was 70% with fine-grain locks but 10% with transactions
4
What about performance?
Single-thread overhead is large because of logging costs
Multi-threaded performance is typically much better than with coarse-grain locks and approaches that of fine-grain locks
Shown on microbenchmarks using hashtable, map, and tree operations [source: PLDI 2006 papers on transactions]
Reasonable scalability using large transactions for STAMP benchmarks has been shown [source: http://stamp.stanford.edu ]
Good results on minimum spanning forest of sparse graphs [source: Kang et al., An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs, PPoPP 2009]
Large programs such as Quake and RMS applications have been transactified [http://www.bscmsrc.eu/research/software]
Numerous research papers have shown how to reduce overheads
Some are purely library-based approaches
Some optimize the calls made to the STM, potentially reducing transaction regions
Some feed the STM with information that leads to an optimized STM
5
Tool support
Few exist
Herlihy et al., tm_db (an open-source generic debugging library for transactional programs)
Debugging/profiling support (Zyulkyarov et al., Debugging programs that use atomic blocks and transactional memory, PPoPP 2010)
Fine grain conflict graph to aid performance analysis (Chakrabarti et al., New abstractions for effective performance analysis of STM programs, PPoPP 2010)
Going forward, debuggers and performance analysis tools will be important to adoption of STMs
6
State of STMs today
Has shown a lot of promise
Atomic sections are included in emerging languages, e.g. Fortress, X10, Chapel
Improved programmability over locks
Does require programmer annotations
Performance benefits have been shown but pathological situations exist
Debuggers and performance tools starting to show up
A small set of benchmarks exists, and some large programs have been transactified
More benchmarks and applications are required
Not all multi-threaded programming paradigms are easily expressed in terms of atomic sections
An example is cond-wait (or retry)
Atomic sections will have to co-exist with locks
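To make the cond-wait point concrete, here is the classic lock-based pattern as a minimal sketch with POSIX threads (the one-slot `mailbox` type and its functions are hypothetical names). `pthread_cond_wait` releases the mutex while the consumer sleeps and reacquires it on wakeup; an atomic section has no way to give up its isolation part-way through, which is why TM proposals add a separate construct such as `retry` or keep locks for this case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One-slot mailbox protected by a mutex; the consumer blocks until full. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    bool            full;
    int             value;
} mailbox;

static void mailbox_init(mailbox *m) {
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->nonempty, NULL);
    m->full = false;
    m->value = 0;
}

static void mailbox_put(mailbox *m, int v) {
    pthread_mutex_lock(&m->lock);
    assert(!m->full);                 /* this sketch never overwrites a value */
    m->value = v;
    m->full = true;
    pthread_cond_signal(&m->nonempty);
    pthread_mutex_unlock(&m->lock);
}

/* Blocks until a value is present; the wait releases the mutex meanwhile. */
static int mailbox_take(mailbox *m) {
    pthread_mutex_lock(&m->lock);
    while (!m->full)                  /* loop: wakeups may be spurious */
        pthread_cond_wait(&m->nonempty, &m->lock);
    int v = m->value;
    m->full = false;
    pthread_mutex_unlock(&m->lock);
    return v;
}

static void *producer(void *arg) {
    mailbox_put(arg, 42);
    return NULL;
}

/* The calling thread consumes; a second thread produces. */
int demo_cond_wait(void) {
    mailbox m;
    pthread_t t;
    mailbox_init(&m);
    pthread_create(&t, NULL, producer, &m);
    int v = mailbox_take(&m);
    pthread_join(t, NULL);
    pthread_cond_destroy(&m.nonempty);
    pthread_mutex_destroy(&m.lock);
    return v;
}
```

`demo_cond_wait()` returns 42 whichever thread runs first, because the consumer re-checks `full` under the mutex before sleeping.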
7
Outline
What is an atomic section
What is an STM library
Basic STM API and a flavor of the draft spec
Basic STM library/compiler interface and a flavor of the Intel ABI
Platforms available today and their state
Proposal to incorporate STM in Open64 framework
8
Shared counter update
lock(L)
++counter
unlock(L)
Lock-based
atomic { ++counter }
Atomic section-based
For atomic section-based code:
• No need to associate shared data with locks
• Still need to identify atomic sections
• No deadlocks at the programmer level since there are no locks
• Livelocks could be present but are usually resolved by the contention manager
• Data races could still be present
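A runnable version of the lock-based column, as a minimal sketch with POSIX threads (`worker` and `run_counter` are illustrative names), shows the bookkeeping that the atomic-section form removes: the programmer must pair `counter` with `L` on every access. The atomic-section column needs compiler TM support and is therefore not reproduced here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared counter and the lock the programmer has associated with it. */
static long counter = 0;
static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&L);      /* lock(L)   */
        ++counter;                   /* ++counter */
        pthread_mutex_unlock(&L);    /* unlock(L) */
    }
    return NULL;
}

/* Runs nthreads workers, each incrementing the counter iters times. */
long run_counter(int nthreads, int iters) {
    enum { MAX_THREADS = 16 };
    pthread_t t[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Forgetting the lock in any one place reintroduces the race, which is the class of error the user study above counted.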
9
STM library/compiler interface
atomic { x = y; w = z; }
Compiler
TxStart()
TxRead(y)
TxWrite(x)
TxRead(z)
TxWrite(w)
TxCommit()
STM
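The lowering can be made concrete by hand-writing what a compiler might emit for `atomic { x = y; w = z; }`, together with a toy single-threaded STM so that it runs. The slide gives only the names `TxStart`, `TxRead`, `TxWrite` and `TxCommit`; their signatures here are assumptions, and the toy keeps no locks or versions. It only buffers writes until commit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded "STM": writes go to a redo log and are published at
 * commit. No locking, versioning or conflict detection. */
enum { LOG_MAX = 8 };
static struct { int *addr; int val; } redo[LOG_MAX];
static size_t nredo;

void TxStart(void) { nredo = 0; }

int TxRead(const int *addr) {
    for (size_t i = nredo; i-- > 0; )   /* newest buffered write wins */
        if (redo[i].addr == addr) return redo[i].val;
    return *addr;
}

void TxWrite(int *addr, int val) {
    assert(nredo < LOG_MAX);
    redo[nredo].addr = addr;
    redo[nredo].val = val;
    nredo++;
}

void TxCommit(void) {
    for (size_t i = 0; i < nredo; i++)
        *redo[i].addr = redo[i].val;
    nredo = 0;
}

int x, y = 1, w, z = 2;

/* Hand-lowered form of:  atomic { x = y; w = z; } */
void lowered_section(void) {
    TxStart();
    int t1 = TxRead(&y);
    TxWrite(&x, t1);
    int t2 = TxRead(&z);
    TxWrite(&w, t2);
    TxCommit();
}
```

After `lowered_section()` returns, `x == 1` and `w == 2`; until `TxCommit()` runs, `x` and `w` are untouched in memory.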
10
Different implementation strategies
Non-blocking vs blocking
Strong vs weak isolation
It has been shown that strong isolation is very hard to provide
Most STMs today only support weak isolation
Direct vs deferred update
Flattened vs closed vs open nesting
Transaction granularity: object vs word
Pessimistic vs optimistic concurrency control
11
Main STM data structures (blocking implementation): shared lock table
A hash function maps a given address to an entry in the (tagless) hash table
Designed to get to the lock without locking the hash table
[Figure: shared addresses mapped to entries of the lock table]
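A tagless lock table of this kind can be sketched in C11 as follows. The table size, the shift-and-mask hash and the lock-word layout (bit 0 = locked, remaining bits = version) are assumptions for illustration, and `lock_for`, `try_lock` and `unlock_with_version` are hypothetical helpers. Finding a lock is pure arithmetic, so the table itself is never locked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Tagless table of versioned locks: bit 0 = locked, other bits = version. */
enum { LOCK_TABLE_BITS = 10, LOCK_TABLE_SIZE = 1 << LOCK_TABLE_BITS };
static _Atomic uintptr_t lock_table[LOCK_TABLE_SIZE];

/* Drop the low 3 bits (8-byte words), then keep LOCK_TABLE_BITS bits. */
static size_t lock_index(const void *addr) {
    return ((uintptr_t)addr >> 3) & (LOCK_TABLE_SIZE - 1);
}

static _Atomic uintptr_t *lock_for(const void *addr) {
    return &lock_table[lock_index(addr)];
}

static int is_locked(uintptr_t word) { return (int)(word & 1u); }
static uintptr_t version_of(uintptr_t word) { return word >> 1; }

/* Acquire with a compare-and-swap; fails if the entry is already locked. */
static int try_lock(_Atomic uintptr_t *l) {
    uintptr_t old = atomic_load(l);
    if (is_locked(old)) return 0;
    return atomic_compare_exchange_strong(l, &old, old | 1u);
}

/* Release and publish a new version in one store. */
static void unlock_with_version(_Atomic uintptr_t *l, uintptr_t version) {
    atomic_store(l, version << 1);
}
```

Because the table stores no tags, unrelated addresses can share an entry: for a `uint64_t` array, `&arr[0]` and `&arr[1024]` map to the same lock with 1024 entries. This is one source of the false conflicts mentioned later.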
12
Transactional read
Reads are typically optimistic: no locks are acquired
Read from shared memory location into local buffer
Validate readset to ensure its consistency
If validation fails, the transaction is rolled back
The address and its current version are entered into a readset
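The read steps can be sketched as follows, assuming a single-threaded demo with a versioned lock table like the one on the previous slide (bit 0 = locked, upper bits = version). `tx_read`, `validate` and `tx_t` are hypothetical names. The read set records the lock entry covering the address together with the version observed. A concurrent implementation would also need atomic loads and memory fences around the data read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Versioned lock table (layout assumed): bit 0 = locked, rest = version. */
enum { NLOCKS = 256, MAXREADS = 64 };
static uintptr_t locks[NLOCKS];
static uintptr_t *lock_for(const void *a) {
    return &locks[((uintptr_t)a >> 3) % NLOCKS];
}

typedef struct { uintptr_t *lock; uintptr_t seen; } read_entry;
typedef struct { read_entry rs[MAXREADS]; size_t n; } tx_t;

/* Every recorded lock word must be unchanged: same version, still unlocked. */
static int validate(const tx_t *tx) {
    for (size_t i = 0; i < tx->n; i++)
        if (*tx->rs[i].lock != tx->rs[i].seen) return 0;
    return 1;
}

/* Optimistic read, no lock acquired. Returns 0 if the caller must roll back
 * the transaction, otherwise stores the value in *out and returns 1. */
int tx_read(tx_t *tx, const long *addr, long *out) {
    uintptr_t *l = lock_for(addr);
    uintptr_t before = *l;
    long v = *addr;                      /* copy into a local buffer */
    if ((before & 1) || *l != before)    /* locked, or changed under us */
        return 0;
    if (!validate(tx))                   /* are earlier reads still valid? */
        return 0;
    assert(tx->n < MAXREADS);
    tx->rs[tx->n++] = (read_entry){ l, before };  /* location + version */
    *out = v;
    return 1;
}
```

If another transaction commits to a location already in the read set, the next `tx_read` fails validation and the caller rolls back.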
13
Transactional write
Buffered write
Make the change in a local buffer
Add the location and the new value to a write set (redo log)
All subsequent reads of this location are serviced from the write set
The original location is unchanged
Direct update
Acquire the lock corresponding to the shared location; if it is already locked, abort
Log the old value into a write set (undo log)
Directly change the shared location
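Both write strategies can be sketched with simple logs. All names are hypothetical, and the direct-update path omits the lock acquisition and abort-if-locked check so that the logging itself stays visible.

```c
#include <assert.h>
#include <stddef.h>

enum { LOG_MAX = 16 };
typedef struct { long *addr; long val; } log_entry;

/* Buffered (deferred) update: new values go to a redo log; shared memory
 * is untouched until commit. */
typedef struct { log_entry e[LOG_MAX]; size_t n; } redo_log;

void redo_write(redo_log *w, long *addr, long val) {
    for (size_t i = 0; i < w->n; i++)
        if (w->e[i].addr == addr) { w->e[i].val = val; return; }
    assert(w->n < LOG_MAX);
    w->e[w->n++] = (log_entry){ addr, val };
}

/* Subsequent reads of a written location are served from the write set. */
long redo_read(const redo_log *w, const long *addr) {
    for (size_t i = 0; i < w->n; i++)
        if (w->e[i].addr == addr) return w->e[i].val;
    return *addr;
}

/* Direct update: save the old value in an undo log, then write in place.
 * (Lock acquisition and the abort-if-locked check are omitted.) */
typedef struct { log_entry e[LOG_MAX]; size_t n; } undo_log;

void undo_write(undo_log *u, long *addr, long val) {
    assert(u->n < LOG_MAX);
    u->e[u->n++] = (log_entry){ addr, *addr };
    *addr = val;
}

/* On abort, restore old values newest-first. */
void undo_rollback(undo_log *u) {
    while (u->n > 0) {
        u->n--;
        *u->e[u->n].addr = u->e[u->n].val;
    }
}
```

The trade-off is visible in the code: the redo log makes every transactional read search the write set, while the undo log makes commit trivial but abort more expensive.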
14
Transactional commit
Acquire locks for all entries of the writeset, if not already done
If a lock is held by another transaction, abort
Validate the read set, aborting if required
Copy all buffered data to shared locations, if required
Release all locks and update the versions of modified locations
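For the buffered-update case, the commit steps might look like this single-threaded sketch. The lock-word layout (bit 0 = locked, rest = version), the table, and names such as `tx_commit` are assumptions; a real STM would acquire locks with an atomic compare-and-swap. One detail matters here: a location that is both read and written is locked by this transaction during validation, so its pre-lock word is compared instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NLOCKS = 256, MAXLOG = 16 };
static uintptr_t locks[NLOCKS];
static uintptr_t *lock_for(const void *a) {
    return &locks[((uintptr_t)a >> 3) % NLOCKS];
}

typedef struct { uintptr_t *lock; uintptr_t seen; } rd_entry;  /* read set  */
typedef struct { long *addr; long val; } wr_entry;             /* write set */
typedef struct {
    rd_entry rs[MAXLOG]; size_t nr;
    wr_entry ws[MAXLOG]; size_t nw;
} tx_t;

/* Position of l among the locks this commit holds, or -1. */
static int held_index(uintptr_t *const held[], size_t nh, const uintptr_t *l) {
    for (size_t j = 0; j < nh; j++)
        if (held[j] == l) return (int)j;
    return -1;
}

/* Returns 1 if the transaction committed, 0 if it must abort. */
int tx_commit(tx_t *tx) {
    uintptr_t *held[MAXLOG];
    uintptr_t old[MAXLOG];
    size_t nh = 0;
    /* 1. Acquire the lock of every write-set entry (once per lock). */
    for (size_t i = 0; i < tx->nw; i++) {
        uintptr_t *l = lock_for(tx->ws[i].addr);
        if (held_index(held, nh, l) >= 0) continue;
        if (*l & 1) goto fail;              /* held by another transaction */
        old[nh] = *l;
        held[nh++] = l;
        *l |= 1;
    }
    /* 2. Validate the read set (use the pre-lock word for our own locks). */
    for (size_t i = 0; i < tx->nr; i++) {
        int j = held_index(held, nh, tx->rs[i].lock);
        uintptr_t cur = (j >= 0) ? old[j] : *tx->rs[i].lock;
        if (cur != tx->rs[i].seen) goto fail;
    }
    /* 3. Copy buffered values to the shared locations. */
    for (size_t i = 0; i < tx->nw; i++)
        *tx->ws[i].addr = tx->ws[i].val;
    /* 4. Release the locks, bumping each version (+2 keeps bit 0 clear). */
    for (size_t j = 0; j < nh; j++)
        *held[j] = old[j] + 2;
    return 1;
fail:
    for (size_t j = 0; j < nh; j++)         /* drop locks, versions unchanged */
        *held[j] = old[j];
    return 0;
}
```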
15
A more elaborate API
Refer to the Draft Specification of Transactional Language Constructs for C++ [1] for more details
Irrevocability of certain statements introduces complications
Statements can be either safe or unsafe
A conventional atomic section can contain only safe statements
Necessitates use of attributes in certain cases
Annotation of functions called within a transaction
A relaxed transaction can contain unsafe statements
Supporting explicit abort/cancel of a transaction
Only a conventional atomic section can contain an abort/cancel statement
Allowing nesting of transactions
Exceptions, exception specifications
[1] http://software.intel.com/file/21569
16
Compiler/STM interface
Intel has released the Intel® Transactional Memory Compiler and Runtime Application Binary Interface [2]
An interface that a compiler writer or an expert user calling the STM has to conform to
Enables use of different STMs without changing the application
A fixed naming convention of library routines is imposed
Standard interfaces for starting a transaction, getting a handle to a transaction, aborting/committing a transaction, reading/writing memory locations, etc.
[2] http://software.intel.com/file/8097
17
Platforms available today
Some examples:
Intel STM compiler and library
IBM XL C/C++ for Transactional Memory for AIX
SkySTM: Sun Studio based compiler and STM
Microsoft .NET implementation
gcc transactional memory support (currently on a branch)
TL2
TinySTM
…
18
Proposal to incorporate STM support in Open64
Writing an STM library
First step is to support the stock atomic section
Provide a blocking implementation
Support closed nesting with partial roll-back
Initial work is to provide support for the minimal set of entry points needed for simple programs to pass
Will follow the ABI document released by Intel
Could leverage work done in gcc space
In the compiler space, mostly front-end work for functional completeness
Main task is to lower the atomic section into calls to STM
Will need to support annotations to enable static checking of proper use of TM constructs
Will follow the draft spec of the API
Could leverage work done in gcc space
Set up a small number of benchmarks and applications for testing
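As an illustration of closed nesting with partial roll-back, the sketch below keeps one undo log and records its length when each nested transaction starts. It assumes a direct-update design and omits locking and validation. `tx_begin`, `tx_write`, `tx_commit` and `tx_abort` are hypothetical names, not entry points from the Intel ABI.

```c
#include <assert.h>
#include <stddef.h>

enum { LOG_MAX = 32, DEPTH_MAX = 8 };
typedef struct { long *addr; long old; } undo_entry;
typedef struct {
    undo_entry log[LOG_MAX]; size_t n;      /* undo log shared by all levels */
    size_t mark[DEPTH_MAX];  size_t depth;  /* log length at each level start */
} tx_t;

void tx_begin(tx_t *tx) {
    assert(tx->depth < DEPTH_MAX);
    tx->mark[tx->depth++] = tx->n;
}

void tx_write(tx_t *tx, long *addr, long val) {
    assert(tx->depth > 0 && tx->n < LOG_MAX);
    tx->log[tx->n++] = (undo_entry){ addr, *addr };
    *addr = val;
}

/* Inner commit keeps its entries so the parent can still roll them back;
 * the outermost commit discards the log. */
void tx_commit(tx_t *tx) {
    assert(tx->depth > 0);
    if (--tx->depth == 0) tx->n = 0;
}

/* Partial roll-back: undo only the innermost level's writes. */
void tx_abort(tx_t *tx) {
    assert(tx->depth > 0);
    size_t m = tx->mark[--tx->depth];
    while (tx->n > m) {
        tx->n--;
        *tx->log[tx->n].addr = tx->log[tx->n].old;
    }
}
```

Aborting an inner transaction restores only what it wrote, while the parent's earlier writes survive; aborting the parent afterwards still undoes everything, including writes from inner transactions that committed into it.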
19
Possible optimizations
Inter-procedural optimizations can reduce overheads substantially
Not required for the initial implementation, but something that could give an Open64-based implementation an edge over other frameworks
Redundant calls to STM could be removed
Read after read, write after read (of the same location) could be optimized
Recognition of local memory accesses could remove barriers altogether
IPA could feed the STM with critical information to help reduce STM overheads
The STM library admits numerous optimizations
Use of pessimistic concurrency in addition to the primarily optimistic scheme
Use of application-specific policies
Reduction of false conflicts
Reduction of validation costs
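The redundant-call and local-access optimizations can be illustrated by comparing two hand-lowered versions of the same atomic section. The `TxRead`/`TxWrite` stubs are hypothetical and only count calls while accessing memory directly. Reusing a read is valid only when the transaction makes no intervening write to that location.

```c
#include <assert.h>

/* Hypothetical barrier stubs: they count calls and access memory directly. */
static int reads, writes;
static long TxRead(const long *p)    { reads++;  return *p; }
static void TxWrite(long *p, long v) { writes++; *p = v; }

long shared_y = 5, shared_a, shared_b;

/* Source (inside an atomic section):
 *   long tmp = shared_y + shared_y;  shared_a = tmp;  shared_b = shared_y;  */

/* Naive lowering: every access gets a barrier, including the local tmp. */
void naive(void) {
    long tmp;
    TxWrite(&tmp, TxRead(&shared_y) + TxRead(&shared_y));
    TxWrite(&shared_a, TxRead(&tmp));
    TxWrite(&shared_b, TxRead(&shared_y));
}

/* Optimized: later reads of shared_y reuse the first value (read after
 * read, no intervening write), and tmp is known to be local, so it needs
 * no barrier at all. */
void optimized(void) {
    long y = TxRead(&shared_y);
    long tmp = y + y;
    TxWrite(&shared_a, tmp);
    TxWrite(&shared_b, y);
}
```

In this example, the naive lowering issues 4 reads and 3 writes. The optimized one issues 1 read and 2 writes and produces the same results.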
20
Summary
An STM implementation based on Open64 will be useful to the community
Should conform to the draft API and the ABI
Will provide a great research platform for further advances in STMs
Will help development of more transactional applications