CONCURRENT PROGRAMMING Introduction to Locks and Lock-free data structures 1.

Slide 1

CONCURRENT PROGRAMMING: Introduction to Locks and Lock-free Data Structures

Slide 2
Agenda
  • Concurrency and mutual exclusion
  • Mutual exclusion without hardware primitives
  • Mutual exclusion using locks and critical sections
  • Lock-based stack
  • Lock freedom
  • Reasoning about concurrency: linearizability
  • Disadvantages of lock-based data structures
  • A lock-free stack using CAS
  • The ABA problem in the stack we just implemented
  • Fix
  • Other problems with CAS
  • We need better hardware primitives
  • Transactional memory

Slide 3
Mutual Exclusion
  • Mutual exclusion aims to avoid the simultaneous use of a common resource, e.g. global variables, databases, etc.
  • Solutions:
      Software: Peterson's algorithm, Dekker's algorithm, Bakery, etc.
      Hardware: atomic test-and-set, compare-and-set, LL/SC, etc.

Slide 4
Using the hardware instruction Test and Set
Test and Set, from here on TS. TS on a boolean variable flag sets the flag and reports whether it was already set:

    #atomic   // the lines below execute one after the other without interruption
    bool old = flag;
    flag = true;
    return old;
    #end atomic

    bool lock = false;   // shared lock variable

    // Process i
    init i;
    while (true) {
        while (TS(lock)) {}    // entry protocol: spin until TS returns false, i.e. the lock was free
        critical section #i;
        lock = false;          // exit protocol
        // remainder of code
    }

Slide 5
Software solution: Peterson's Algorithm
  • One of the purely software solutions to the mutual exclusion problem, based on shared memory.
  • A simple solution for two processes, P0 and P1, that would like to share the use of a single resource R.
  • More rigorously, P1 shouldn't have access to R while P0 is modifying or reading R, and vice versa.
[Figure: processes P0 and P1 sharing resource R]

Slide 6
Peterson's Algorithm: two-processor version
  • Requires one global int variable (turn) and one bool variable (flag) per process.
  • The global variable is turn; each processor has a signal variable flag.
  • flag[0] = true is processor P0's signal that it wants to enter the critical section.
  • turn = 0 says that it is processor P0's turn to enter the critical section.
  • Can be extended to N processors.

Slide 7
How to think about it
Consider that you are in a hallway that is only wide enough for one person to walk. However, you see a guy walking in the opposite direction. Once you approach him, you have two options:
  • Be a gentleman and step to the side so that he may walk first, and continue after he passes (Peterson's algorithm).
  • Beat him up and walk over him (critical section violation).

Slide 8
The algorithm in code

    // Shared variables
    bool flag[2] = {false, false};
    int turn = 0;

    // Process 0
    init;
    while (true) {
        // entry protocol
        flag[0] = true;
        turn = 1;
        while (flag[1] && turn == 1) {};
        critical section #0;
        // exit protocol
        flag[0] = false;
        // remainder code
    }

    // Process 1
    init;
    while (true) {
        // entry protocol
        flag[1] = true;
        turn = 0;
        while (flag[0] && turn == 0) {};
        critical section #1;
        // exit protocol
        flag[1] = false;
        // remainder code
    }

Slide 9
Requirements for Peterson's
  • Reads and writes have to be atomic.
  • No reordering of instructions or memory accesses. In-order processors sometimes reorder memory accesses even if they don't reorder instructions. In that case one needs to use memory barrier instructions.
  • Visibility: any change to a variable has to take immediate effect so that everybody knows about it. Keyword volatile in Java.

Slide 10
So why don't people use Peterson's?
Notice the while loop in the algorithm. If process 0 has to wait a long time to enter the critical section, it continually checks flag and turn to see whether it can enter, while not doing any useful work.
This is termed busy waiting, and locking mechanisms like Peterson's have a major disadvantage in that regard.
Locks that continuously check a flag in this way are called spin locks.
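Peterson's entry protocol is itself a spin lock. The following is a minimal Java sketch of the two-process algorithm from Slide 8, written under the visibility requirement from Slide 9. It is an illustration, not code from the slides: the class name PetersonLock, the method names lock, unlock and runDemo, and the shared counter standing in for resource R are all assumptions made for this example. It uses AtomicBoolean for the flags because the elements of a plain boolean[] cannot be declared volatile.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical class name; a sketch of the two-process algorithm, not the slides' own code.
class PetersonLock {
    // flag[i] is true while process i wants to enter its critical section.
    private final AtomicBoolean[] flag = { new AtomicBoolean(false), new AtomicBoolean(false) };
    // volatile: writes to turn become visible to the other thread and are not reordered.
    private volatile int turn = 0;

    // Entry protocol for process `me` (0 or 1).
    void lock(int me) {
        int other = 1 - me;
        flag[me].set(true);   // signal that we want to enter
        turn = other;         // let the other process go first if it also wants in
        while (flag[other].get() && turn == other) {
            Thread.yield();   // busy waiting: keep re-checking flag and turn
        }
    }

    // Exit protocol for process `me`.
    void unlock(int me) {
        flag[me].set(false);
    }

    // Stand-in for the shared resource R; only touched inside the critical section.
    private static int counter = 0;

    // Starts two threads that each increment the counter `iterations` times
    // under the lock, waits for both, and returns the final counter value.
    static int runDemo(int iterations) {
        PetersonLock lock = new PetersonLock();
        counter = 0;
        Thread[] threads = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock(me);
                    counter++;          // critical section
                    lock.unlock(me);
                }
            });
            threads[id].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(100_000));
    }
}
```

If mutual exclusion holds, no increment is lost and runDemo returns twice the iteration count. Replacing the AtomicBoolean flags or the volatile turn with plain fields removes the ordering and visibility guarantees the algorithm depends on.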
Spin locks are good when you know that the wait will be short.

    while (flag[1] && turn == 1) {};   // the busy-wait loop from Process 0's entry protocol

Slide 11
Properties of Peterson's algorithm
  • Mutual exclusion.
  • Absence of livelocks and deadlocks: a livelock is similar to a deadlock, except that the competing processes continually change state while none of them makes any progress.
  • Eventual entry: guaranteed even if the scheduling policy is only weakly fair. A weakly fair scheduling policy guarantees that if a process requests to enter its critical section (and does not withdraw the request), the process will eventually enter its critical section.

Slide 12
Comparison with Test and Set

                                    Test and Set                       Peterson's algorithm
    Mutual exclusion                Yes                                Yes
    Absence of deadlocks            Yes                                Yes
    Absence of unnecessary delay    Yes                                Yes
    Eventual entry                  Strongly fair scheduling policy    Weakly fair scheduling policy
    Practical issues                Special instructions;              Standard instructions;
                                    easy to implement for any          more than 2 processes becomes
                                    number of processors               complex but doable

Slide 13
Putting it all together: a lock-based stack
  • Stack: a list- or array-based data structure that enforces last-in-first-out ordering of elements.
  • Operations:
      void push(T data): pushes the variable data onto the stack.
      T pop(): removes the last item that was pushed onto the stack. Throws a stackEmptyException if the stack is empty.
      int size(): returns the size of the stack.
  • All operations are synchronized using one common lock object.

Slide 14
Code: Java

    import java.util.ArrayList;
    import java.util.EmptyStackException;
    import java.util.concurrent.locks.ReentrantLock;

    class Stack<T> {
        private final ArrayList<T> _container = new ArrayList<>();
        private final ReentrantLock _lock = new ReentrantLock();

        public void push(T data) {
            _lock.lock();
            try {
                _container.add(data);
            } finally {
                _lock.unlock();   // always release the lock, even on an exception
            }
        }

        public int size() {
            _lock.lock();
            try {
                return _container.size();
            } finally {
                _lock.unlock();
            }
        }

        public T pop() {
            _lock.lock();
            try {
                if (_container.isEmpty()) {
                    throw new EmptyStackException();   // stack empty
                }
                // remove, not just read, the most recently pushed element
                return _container.remove(_container.size() - 1);
            } finally {
                _lock.unlock();
            }
        }
    }

Slide 15
Problems with locks
Stack is simple enough.
There is only one lock. The overhead isn't that much. But there are data structures that could have multiple locks.
Problems with locking:
  • Deadlock
  • Priority inversion
  • Convoying
  • Kill-tolerant availability
  • Preemption tolerance
  • Overall performance

Slide 16
Problems with locking 2
Priority inversion:
  • Assume two threads: T1 with very low priority and T2 with very high priority.
  • Both need to access a shared resource R, but T1 holds the lock to R.
  • T1 takes a long time to complete its operation, leaving the higher-priority thread T2 waiting. By extension, T2 has effectively been given a lower priority.
  • Possible solution: priority inheritance.

Slide 17
Problems with locking 3
Deadlock: processes can't proceed because each of them is waiting for the other to release a needed resource.
Scenario:
  • There are two locks, A and B.
  • Process 1 needs A and B, in that order, to safely execute.
  • Process 2 needs B and A, in that order, to safely execute.
  • Process 1 acquires A and Process 2 acquires B.
  • Now Process 1 is waiting for Process 2 to release B, and Process 2 is waiting for Process 1 to release A.

Slide 18
Problems with locking 4
Convoying: all the processes need a lock A to proceed, but a lower-priority process acquires it first. Then all the other processes slow down to the speed of the lower-priority process.
Think of a freeway: you are driving an Aston Martin, but you are stuck behind a beat-up old pickup truck that is moving very slowly, and there is no way to overtake it.

Slide 19
Problems with locking 5
Kill tolerance:
  • What happens when a process holding a lock is killed?
  • Everybody else waiting for the lock may never get it and would wait forever.
Async-signal safety:
  • Signal handlers can't use lock-based primitives.
  • Why?
  • Suppose a thread receives a signal while holding a user-level lock in the memory allocator.
  • The signal handler executes, calls malloc, and wants the lock. The interrupted thread cannot release the lock until the handler returns, so the handler waits forever.

Slide 20
Problems with locking 6
Overall performance:
  • Arguable.
  • Efficient lock-based algorithms exist.
  • Constant struggle between simplicity and efficiency.
  • Example: a thread-safe linked list with lots of nodes.
      Lock the whole list for every operation?
      Reader/writer locks?
      Allow locking individual elements of the list?

Slide 21
A possible solution: lock-free programming

Slide 22
Lock-free data structures
  • A data structure in which no explicit locks are used to synchronize multiple threads, and the progress of one thread doesn't block or impede the progress of another.
  • Doesn't imply starvation freedom (meaning one thread could potentially wait forever), but nobody starves in practice.
  • Advantages: you don't run into all the problems that you would with using locks.
  • Disadvantages: to be discussed later.

Slide 23
Lock-free programming
  • Think in terms of Algorithms + Data Structures = Programs.
  • Thread-safe access to shared data without the use of locks, mutexes, etc.
  • Possible, but not practical or feasible in the absence of hardware support.
  • So what do we need?
  • A compare-and-set primitive from the hardware guys, abbreviated CAS (to be discussed in the next slide).
  • Interesting tidbit: lots of music sharing and streaming applications use lock-free data structures, e.g. PortAudio, PortMidi, and SuperCollider.

Slide 24
Lock-free programming: the compare-and-set primitive

    boolean cas(int *valueToChange, int *valueToSetTo, int *valueToCompareTo)

Semantics: the pseudocode below executes atomically, without interruption.

    if (*valueToChange == *valueToCompareTo) {
        *valueToChange = *valueToSetTo;
        return true;
    } else {
        return false;
    }

  • This function is exposed in Java through the java.util.concurrent.atomic package; in C++, depending on the OS and architecture, you find libraries.
  • CAS is all you need for lock-free queues, stacks, linked lists, and sets.

Slide 25
Trick to building lock-free data structures
  • Limit the scope of changes to a single atomic variable.
  • Stack: head.
  • Queue: head or tail, depending on enqueue or dequeue.

Slide 26
A simple lock-free example
  • A lock-free stack.
  • Adapted from Geoff Langdale at CMU.
  • Intended to illustrate the design of lock-free data structures and the problems with lock-free synchronization.
  • There is a primitive operation we need: CAS or Compare a