Remote Core Locking (RCL)
Presentation Outline
1. Introduction
2. Motivation
3. RCL
3.1 Core algorithm
3.2 Profiling
3.3 Re-engineering
3.4 RCL runtime implementation
4. Evaluation
4.1 Comparison with other locks
4.2 Comparison of application performance
4.3 Locality analysis
5. Related work
6. References
1. Introduction
The locking algorithm used by a multithreaded application is a key factor in scaling its performance on multicore machines.
Remote Core Locking (RCL) is a recently proposed locking technique that reduces cache misses and lock access contention simultaneously.
3. RCL
The main idea of RCL is:
- Client cores post their lock requests into a request array.
- A dedicated remote server core executes the critical section (CS) and returns the results to the client core.
3.1 Core algorithm
Fig. 3: The request array [1]
The "service thread" (ST) of the server core repeatedly scans the request array, checking the third field of each request. If it finds a non-NULL third field and the requested lock is free, it executes the critical section using the stored function pointer and context.
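The scan loop described above can be sketched as follows. This is a minimal single-threaded illustration, not the actual RCL implementation: the struct layout, the names, and the omission of atomic accesses and of the free-lock check are all simplifying assumptions.

```c
#include <stddef.h>

/* One slot per client core. The third field (the function pointer)
 * doubles as the "request pending" flag: non-NULL means a client is
 * waiting for its critical section to be executed. The real runtime
 * accesses these fields atomically and also checks that the lock in
 * the first field is free before executing the CS. */
typedef struct {
    void *lock;               /* lock the critical section needs    */
    void *context;            /* variables used by the CS           */
    void *(*func)(void *);    /* CS body; NULL = no pending request */
} rcl_request;

enum { MAX_CLIENTS = 4 };
static rcl_request requests[MAX_CLIENTS];

/* One pass of the service thread over the request array: execute
 * every pending request, then clear the function field to signal
 * completion back to the client core. */
static int service_pass(void) {
    int served = 0;
    for (int i = 0; i < MAX_CLIENTS; i++) {
        void *(*f)(void *) = requests[i].func;
        if (f != NULL) {                       /* non-NULL third field */
            requests[i].context = f(requests[i].context);
            requests[i].func = NULL;           /* signal completion    */
            served++;
        }
    }
    return served;
}

/* Example client request: increment a shared counter inside the CS. */
static void *incr_cs(void *ctx) { (*(int *)ctx)++; return ctx; }
```

A client "acquires" the lock by filling its slot and spinning until the function field returns to NULL, which is why the critical section appears to execute instantly from the client's point of view.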
3.2 Profiling
The profiler is a tool that dynamically loads a library into the application and intercepts its calls.
It extracts information about the application's POSIX locks, condition variables, threads, etc., and determines which locks could be improved by using RCL.
3.3 Re-engineering
The re-engineering tool extracts the critical section code into a separate function.
This function receives the values of the variables used by the critical section and returns their updated values.
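As an illustration of this transformation (a hypothetical example, not output of the actual tool), a critical section that updates two shared variables becomes a function operating on a context structure:

```c
/* Before: the critical section is inlined at the call site.
 *
 *     pthread_mutex_lock(&m);
 *     total += amount;
 *     count++;
 *     pthread_mutex_unlock(&m);
 */

/* After: the CS body is extracted into a function that receives the
 * live variables through a context structure and writes the updated
 * values back, so a server core can execute it on the client's behalf. */
typedef struct { long total; long count; long amount; } cs_context;

static void *cs_deposit(void *p) {
    cs_context *ctx = p;
    ctx->total += ctx->amount;   /* former critical-section body */
    ctx->count++;
    return ctx;
}
```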
3.4 RCL runtime implementation
It is difficult to ensure liveness and responsiveness using only a single server thread, because the thread can:
(i) be blocked by the operating system;
(ii) spin while acquiring a spin lock or a nested RCL lock, or while executing some form of ad hoc synchronization;
(iii) be pre-empted by the operating system when its time slice runs out, or due to a page fault.
In the RCL runtime, a "management thread" (MT) is responsible for keeping RCL live by managing the pool of service threads.
The MT is activated at a fixed frequency; when active, it runs at the highest priority.
The MT checks a global flag that indicates whether the ST has made progress since the MT's last activation.
If the flag has not been updated, the MT concludes that the ST is waiting or blocked, and it adds a free thread from the service-thread pool.
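The progress-flag check can be sketched as follows. This is a deliberately simplified, single-shot sketch with invented names: the real runtime performs this check periodically, at real-time priority, against a pool of service threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the service thread each time it completes a request. */
static atomic_bool st_made_progress = false;

static int active_service_threads = 1;

/* One activation of the management thread: if the ST has made no
 * progress since the last activation, assume it is waiting or
 * blocked and elect a free service thread; otherwise just clear
 * the flag for the next period. */
static void management_tick(void) {
    if (!atomic_exchange(&st_made_progress, false))
        active_service_threads++;   /* add a free ST from the pool */
}
```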
Four strategies improve responsiveness:
i) The RCL runtime uses the POSIX FIFO scheduling policy to prevent thread pre-emption by the OS scheduler.
ii) The RCL runtime minimizes the number of STs, so that an unblocked servicing thread is rescheduled with less delay.
iii) When the servicing threads are blocked by the OS, the RCL runtime uses a backup thread, with lower priority than the STs, to clear the global flag and wake up the MT.
iv) When a nested RCL lock is handled by the same core, the lock may already be owned by another servicing thread. In this case the servicing thread yields immediately, so that the owner of the lock can run and release it.
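Strategy (i) corresponds to creating the server threads with an explicit SCHED_FIFO attribute. The sketch below is an assumption about how this looks in practice, not RCL's actual code; since setting a real-time policy usually requires elevated privileges, it falls back to the default policy when FIFO creation fails.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static void *server_main(void *arg) { return arg; }

/* Create a service thread under SCHED_FIFO so the OS scheduler will
 * not pre-empt it in favor of ordinary time-sharing threads. */
static int create_fifo_thread(pthread_t *tid) {
    pthread_attr_t attr;
    struct sched_param sp = {
        .sched_priority = sched_get_priority_min(SCHED_FIFO)
    };
    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    /* Without this, the new thread would inherit the creator's
     * policy and the FIFO setting above would be ignored. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    int err = pthread_create(tid, &attr, server_main, NULL);
    if (err != 0)  /* e.g. EPERM without privileges: fall back */
        err = pthread_create(tid, NULL, server_main, NULL);
    pthread_attr_destroy(&attr);
    return err;
}
```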
Using the FIFO policy introduces two further problems:
1) FIFO scheduling can cause priority mismatches, e.g. between the BT and an ST, or between an ST and the MT. This problem is solved by using only lock-free algorithms inside the RCL runtime.
2) When an ST spins in an active wait loop, it will not be pre-empted, so a free thread cannot be elected. In this case the MT detects that the servicing thread is making no progress, decreases the priority of that particular ST, and then increases the priorities of all STs.
4. Evaluation
- Comparison with other locks using a custom microbenchmark
- Comparison of the application performance
- Locality analysis
4.2 Comparison of application performance
Fig. 5: Best performance for each type of lock relative to the best performance for POSIX locks [1].
5. Related work
Attaluri et al. proposed concurrency control with lock pre-emption and restoration in 1995 [2].
Abellán et al. have proposed the concept of GLocks [3].
Suleman et al. have proposed executing critical sections on a special fast core of an ACMP, introducing new instructions to hand over control [4].
Hendler et al. have suggested a software-only solution called "flat combining", based on coarse-grained locking [5].
6. References
[1] Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia Lawall, and Gilles Muller. Remote core locking: Migrating critical-section execution to improve the performance of multithreaded applications. In 2012 USENIX Annual Technical Conference, Boston, MA, 2012.
[2] G. K. Attaluri, J. Slonim, and P. Larson. Concurrency control with lock preemption and restoration. CASCON '95, 1995.
[3] J. L. Abellán, J. Fernández, and M. E. Acacio. GLocks: Efficient support for highly-contended locks in many-core CMPs. In 25th IPDPS, 2011.
[4] M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. ASPLOS, pages 253-264, 2009.
[5] D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism tradeoff. SPAA '10, pages 355-364, 2010.