Local-Spin Algorithms
Multiprocessor synchronization algorithms (20225241)
Lecturer: Danny Hendler
This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman
The CC and DSM models
This figure is taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman
Remote and local memory accesses
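In a DSM system: an access of v by p is local if v resides in p’s own memory module, and remote otherwise.
In a cache-coherent (CC) system: an access of v by p is remote if it is p’s first access of v or if v has been written by another process since p’s last access of it; otherwise it is local.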
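The two definitions can be turned into a small counting sketch. The trace format (a list of process, operation, variable triples), the `home` map and both function names are assumptions made for illustration; the CC rule is taken literally from the definition above and ignores protocol details such as write-through versus write-back.

```python
def dsm_rmrs(trace, home):
    """Count remote accesses per process in the DSM model.

    trace: list of (process, op, variable) with op in {"r", "w"}.
    home:  maps each variable to the process whose memory module holds it.
    """
    counts = {}
    for p, _op, v in trace:
        counts.setdefault(p, 0)
        if home[v] != p:              # variable lives in another module
            counts[p] += 1
    return counts


def cc_rmrs(trace):
    """Count remote accesses per process in the CC model.

    An access of v by p is remote if it is p's first access of v, or if
    another process has written v since p's last access of it.
    """
    counts = {}
    valid = set()                     # (process, variable) pairs with a valid copy
    for p, op, v in trace:
        counts.setdefault(p, 0)
        if (p, v) not in valid:
            counts[p] += 1
            valid.add((p, v))
        if op == "w":                 # a write invalidates every other copy of v
            valid = {(q, u) for (q, u) in valid if u != v or q == p}
    return counts


# Process 0 reads x three times, process 1 writes x, process 0 reads x again.
trace = [(0, "r", "x")] * 3 + [(1, "w", "x"), (0, "r", "x")]
print(cc_rmrs(trace))                 # {0: 2, 1: 1}
print(dsm_rmrs(trace, {"x": 1}))      # {0: 4, 1: 0}
```

In the CC count the repeated reads are free until another process writes x; in the DSM count every access by process 0 is remote because x is placed in process 1's module.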
Local-spin algorithms
In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses that do not cause interconnect traffic.
The same algorithm may be local-spin on one architecture (DSM/CC) and non-local-spin on the other!
For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).
Peterson’s 2-process algorithm
Program for process 1
  b[1] := true
  turn := 1
  await (b[0] = false or turn = 0)
  CS
  b[1] := false
Program for process 0
  b[0] := true
  turn := 0
  await (b[1] = false or turn = 1)
  CS
  b[0] := false
Is this algorithm local-spin on a DSM machine? No (both processes spin on turn, which can reside in only one process’s memory module)
Is this algorithm local-spin on a CC machine? Yes
Peterson’s 2-process algorithm
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
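To see where "unbounded" and "constant" come from, the sketch below counts the RMRs charged to process 1 while it re-evaluates its await condition and process 0 stays in the critical section. The function name and the DSM placement (b[0] and turn in p0's module, b[1] in p1's module) are assumptions for illustration, not part of the slides.

```python
def spin_rmrs(iterations, model):
    """RMRs charged to process 1 while it evaluates its await condition
    `iterations` times and process 0 stays inside the critical section.

    Assumed DSM placement (hypothetical): b[0] and turn live in p0's
    memory module, b[1] in p1's module.
    """
    home = {"b[0]": 0, "b[1]": 1, "turn": 0}
    cached = set()          # CC only: variables p1 holds a valid cached copy of
    rmrs = 0
    for _ in range(iterations):
        for var in ("b[0]", "turn"):      # the two reads of one await test
            if model == "DSM":
                if home[var] != 1:        # not in p1's own module
                    rmrs += 1
            elif var not in cached:       # CC: only a cache miss is remote
                rmrs += 1
                cached.add(var)           # nobody writes var while p0 is in the CS
    return rmrs


for k in (1, 10, 1000):
    print(k, spin_rmrs(k, "DSM"), spin_rmrs(k, "CC"))
# 1 2 2
# 10 20 2
# 1000 2000 2
```

The DSM count grows with the number of spin iterations, so it has no bound that is independent of how long process 0 stays in the critical section; the CC count stays at the two initial cache misses.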
Kessel’s single-writer algorithm
Program for process 0
  b[0] := true
  local[0] := turn[1]
  turn[0] := local[0]
  await (b[1] = false or local[0] ≠ turn[1])
  CS
  b[0] := false
Program for process 1
  b[1] := true
  local[1] := 1 - turn[0]
  turn[1] := local[1]
  await (b[0] = false or local[1] = turn[0])
  CS
  b[1] := false
Can Kessel’s algorithm be made local-spin on a DSM machine? Yes, if:
b[1], turn[1] are located at p0’s memory module
b[0], turn[0] are located at p1’s memory module
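The effect of this placement can be checked with a small simulation. The sketch below is a hypothetical model, not a real multiprocessor run: each process is a Python generator, every `yield` is a point where a seeded random scheduler may switch processes, and `home` encodes the placement listed above. As a simplification, one evaluation of the await condition is treated as a single scheduler step.

```python
import random


def kessel_run(rounds, seed=0):
    """Interleave two simulated processes running Kessel's algorithm.

    Returns (largest number of processes seen in the CS together,
             remote accesses per process in the DSM model).
    """
    # home[v] = process whose memory module holds v
    home = {("b", 0): 1, ("turn", 0): 1, ("b", 1): 0, ("turn", 1): 0}
    mem = {("b", 0): False, ("turn", 0): 0, ("b", 1): False, ("turn", 1): 0}
    rmrs = [0, 0]
    cs = {"inside": 0, "max_inside": 0}

    def read(p, var):
        rmrs[p] += int(home[var] != p)
        return mem[var]

    def write(p, var, value):
        rmrs[p] += int(home[var] != p)
        mem[var] = value

    def must_wait(p, local):
        # Negation of the await condition; both reads are local to p.
        if not read(p, ("b", 1 - p)):
            return False
        other_turn = read(p, ("turn", 1 - p))
        return local == other_turn if p == 0 else local != other_turn

    def process(p):
        for _ in range(rounds):
            write(p, ("b", p), True)
            yield                               # scheduler may switch here
            other_turn = read(p, ("turn", 1 - p))
            local = other_turn if p == 0 else 1 - other_turn
            yield
            write(p, ("turn", p), local)
            yield
            while must_wait(p, local):          # local spinning
                yield
            cs["inside"] += 1                   # critical section
            cs["max_inside"] = max(cs["max_inside"], cs["inside"])
            yield
            cs["inside"] -= 1
            write(p, ("b", p), False)
            yield

    rng = random.Random(seed)
    live = [process(0), process(1)]
    while live:
        proc = rng.choice(live)
        try:
            next(proc)
        except StopIteration:
            live.remove(proc)
    return cs["max_inside"], rmrs


print(kessel_run(rounds=5))    # (1, [15, 15])
```

Each passage costs exactly three remote writes (setting b, setting turn, clearing b) however long the process spins, because every variable read in the await loop sits in the spinning process's own module.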
Anderson’s queue-based algorithm
Shared:
  integer ticket, an RMW object, initially 0
  bit valid[0..n-1], initially valid[0]=1 and valid[i]=0 for i ∈ {1,..,n-1}
Local:
  integer myTicket
Program for process i
  myTicket := fetch-and-inc-modulo-n(ticket)   ; take a ticket
  await valid[myTicket] = 1                    ; wait for your turn
  CS
  valid[myTicket] := 0                         ; dequeue
  valid[(myTicket+1) mod n] := 1               ; signal successor
[Figure: the valid array, with entries indexed 0, 1, 2, 3, ..., n-1, and the ticket counter]
Anderson’s queue-based algorithm (cont’d)
Initial configuration:
  ticket = 0
  valid = 1 0 0 0 0
After entry section of p3:
  ticket = 1
  valid = 1 0 0 0 0
  p3: myTicket = 0
After p1 performs entry section:
  ticket = 2
  valid = 1 0 0 0 0
  p3: myTicket = 0, p1: myTicket = 1
After p3 exits:
  ticket = 2
  valid = 0 1 0 0 0
  p1: myTicket = 1
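The configurations above can be replayed with a small sequential model. The class and method names are hypothetical, and the model involves no real concurrency: `take_ticket` stands in for the atomic fetch-and-inc-modulo-n, and `may_enter` is the condition tested by the await.

```python
class AndersonLock:
    """Sequential model of Anderson's queue-based lock (illustration only)."""

    def __init__(self, n):
        self.n = n
        self.ticket = 0
        self.valid = [1] + [0] * (n - 1)

    def take_ticket(self):
        # fetch-and-inc-modulo-n(ticket): returns the old value
        my_ticket = self.ticket
        self.ticket = (self.ticket + 1) % self.n
        return my_ticket

    def may_enter(self, my_ticket):
        # the condition a process spins on
        return self.valid[my_ticket] == 1

    def release(self, my_ticket):
        self.valid[my_ticket] = 0                   # dequeue
        self.valid[(my_ticket + 1) % self.n] = 1    # signal successor


lock = AndersonLock(5)
t3 = lock.take_ticket()                         # p3 performs its entry section
t1 = lock.take_ticket()                         # p1 performs its entry section
print(t3, t1, lock.ticket)                      # 0 1 2
print(lock.may_enter(t3), lock.may_enter(t1))   # True False
lock.release(t3)                                # p3 exits
print(lock.valid, lock.may_enter(t1))           # [0, 1, 0, 0, 0] True
```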
Anderson’s queue-based algorithm (cont’d)
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
Graunke and Thakkar’s algorithm
Uses the more common swap primitive:
swap(w, new)
  do atomically
    prev := w
    w := new
    return prev
Graunke and Thakkar’s algorithm (cont’d)
Shared:
  bit slots[0..n-1], initially slots[i]=1 for i ∈ {0,..,n-1}
  structure {bit value, bit *slot} tail, initially {0, &slots[0]}
Local:
  structure {bit value, bit *slot} myRecord, prev
  bit temp
[Figure: tail, initially holding value 0 and pointing to slots[0]; the slots array, entries 0..n-1, all initially 1]
Graunke and Thakkar’s algorithm (cont’d)
Program for process i
  myRecord.value := slots[i]           ; prepare to thread yourself to queue
  myRecord.slot := &slots[i]
  prev := swap(&tail, myRecord)        ; prev now points to predecessor
  await (*prev.slot ≠ prev.value)      ; local spin until predecessor’s value changes
  CS
  temp := 1 - slots[i]
  slots[i] := temp                     ; signal successor
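The program can be exercised with a small simulation. This is a sketch under stated assumptions: each process is a Python generator, every `yield` marks a point where a seeded random scheduler may switch processes, a slot index stands in for the pointer `&slots[j]`, and the function name is hypothetical. The swap is modelled by doing its read and write inside one scheduler step.

```python
import random


def graunke_thakkar_run(n, rounds, seed=0):
    """Randomly interleave n simulated processes. Returns the largest number
    of processes ever observed inside the critical section together."""
    slots = [1] * n
    shared = {"tail": (0, 0)}      # (value, slot index), i.e. {0, &slots[0]}
    cs = {"inside": 0, "max_inside": 0}

    def process(i):
        for _ in range(rounds):
            my_record = (slots[i], i)               # prepare to join the queue
            yield
            prev = shared["tail"]                   # swap(&tail, myRecord):
            shared["tail"] = my_record              # both lines form one atomic step
            yield
            prev_value, prev_slot = prev
            while slots[prev_slot] == prev_value:   # spin on predecessor's slot
                yield
            cs["inside"] += 1                       # critical section
            cs["max_inside"] = max(cs["max_inside"], cs["inside"])
            yield
            cs["inside"] -= 1
            slots[i] = 1 - slots[i]                 # signal successor
            yield

    rng = random.Random(seed)
    live = [process(i) for i in range(n)]
    while live:
        proc = rng.choice(live)
        try:
            next(proc)
        except StopIteration:
            live.remove(proc)
    return cs["max_inside"]


print(graunke_thakkar_run(n=4, rounds=5))    # 1
```

Note that the spin loop reads the predecessor's slot rather than the process's own, which is why the spinning is local under CC but not under a fixed DSM placement.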
Graunke and Thakkar’s algorithm (cont’d)
What is the RMR complexity on a DSM machine? Unbounded
What is the RMR complexity on a CC machine? Constant
The MCS queue-based algorithm
Type:
  Qnode: structure {bit locked, Qnode *next}
Shared:
  Qnode nodes[0..n-1]
  Qnode *tail, initially nil
Local:
  Qnode *myNode, initially &nodes[i]
  Qnode *prev, *successor
Has constant RMR complexity under both the DSM and CC models
Uses swap and CAS
The MCS queue-based algorithm (cont’d)
Program for process i
  myNode->next := nil                      ; prepare to be last in queue
  prev := swap(&tail, myNode)              ; thread yourself: tail now points to myNode
  if (prev ≠ nil)                          ; I need to wait for a predecessor
    myNode->locked := true                 ; prepare to wait
    prev->next := myNode                   ; let my predecessor know it has to unlock me
    await (myNode->locked = false)         ; spin on my own node
  CS
  if (myNode->next = nil)                  ; if not sure there is a successor
    if (compare-and-swap(&tail, myNode, nil) = false)   ; if there is a successor
      await (myNode->next ≠ nil)           ; spin until successor lets me know its identity
      successor := myNode->next            ; get a pointer to my successor
      successor->locked := false           ; unlock my successor
  else                                     ; for sure, I have a successor
    successor := myNode->next              ; get a pointer to my successor
    successor->locked := false             ; unlock my successor
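A minimal simulation sketch of the MCS entry and exit code follows. As assumptions for illustration: each process is a Python generator, every `yield` is a point where a seeded random scheduler may switch processes, swap and compare-and-swap are modelled by doing their read and write inside one scheduler step, and the names `QNode` and `mcs_run` are hypothetical.

```python
import random


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


def mcs_run(n, rounds, seed=0):
    """Returns (largest number of processes seen in the CS together,
                whether the queue is empty once everyone has finished)."""
    nodes = [QNode() for _ in range(n)]
    shared = {"tail": None}
    cs = {"inside": 0, "max_inside": 0}

    def process(i):
        me = nodes[i]
        for _ in range(rounds):
            me.next = None                      # prepare to be last in queue
            yield
            prev = shared["tail"]               # swap(&tail, myNode):
            shared["tail"] = me                 # both lines form one atomic step
            yield
            if prev is not None:                # I need to wait for a predecessor
                me.locked = True
                yield
                prev.next = me                  # tell predecessor whom to unlock
                yield
                while me.locked:                # spin on my own node
                    yield
            cs["inside"] += 1                   # critical section
            cs["max_inside"] = max(cs["max_inside"], cs["inside"])
            yield
            cs["inside"] -= 1
            successor = me.next
            yield
            if successor is None:               # not sure there is a successor
                if shared["tail"] is me:        # compare-and-swap(&tail, myNode, nil)
                    shared["tail"] = None       # succeeded: nobody is waiting
                    yield
                    continue
                yield
                while me.next is None:          # successor has not linked itself yet
                    yield
                successor = me.next
            successor.locked = False            # unlock my successor
            yield

    rng = random.Random(seed)
    live = [process(i) for i in range(n)]
    while live:
        proc = rng.choice(live)
        try:
            next(proc)
        except StopIteration:
            live.remove(proc)
    return cs["max_inside"], shared["tail"] is None


print(mcs_run(n=4, rounds=5))    # (1, True)
```

Both spin loops read fields of the process's own node, which is what allows constant RMR complexity under DSM when each node is placed in its owner's memory module.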
A local-spin tournament-tree algorithm (Anderson, Yang, 1993)
O(log n) RMR complexity for both DSM and CC systems
This is ‘suspected’ to be optimal!
Uses O(n log n) registers
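[Figure: a binary tournament tree with rows of 1, 2 and 4 internal nodes above 8 processes numbered 0..7; the internal rows are labeled Level 0, Level 1 and Level 2, and nodes within a row are numbered from 0]
Each node is identified by (level, number)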
A local-spin tournament-tree algorithm (cont’d)
Shared:
  - For each node v, there are 3 registers:
      name[level, 2node], name[level, 2node+1], initially -1
      turn[level, node]
  - For each level l and process i, a spin flag: flag[level, i]
Local:
  level, node, id
A local-spin tournament-tree algorithm (cont’d)
Program for process i
  id := i
  for level = 0 to log n - 1 do                      ; from leaf to root
    node := ⌊id/2⌋                                   ; the current node
    name[level, 2node+(id mod 2)] := i               ; identify yourself
    turn[level, node] := i                           ; update the tie-breaker
    flag[level, i] := 0                              ; initialize the locally-accessible spin flag
    if (even(id))
      rival := name[level, id+1]
    else
      rival := name[level, id-1]
    if ((rival ≠ -1) and (turn[level, node] = i))    ; if not sure I should precede rival
      if (flag[level, rival] = 0)
        flag[level, rival] := 1                      ; release the rival from waiting
      await flag[level, i] ≠ 0                       ; wait until sure the rival updated the tie-breaker
      if (turn[level, node] = i)                     ; if I lost
        await flag[level, i] = 2                     ; wait till rival notifies me it’s my turn
    id := node                                       ; move to the next level
  CS
  for level = log n - 1 downto 0 do                  ; begin exit code
    id := ⌊i/2^level⌋, node := ⌊id/2⌋                ; set node and id
    name[level, 2node+(id mod 2)] := -1              ; erase name
    rival := turn[level, node]                       ; find who rival is (if there is one)
    if rival ≠ i                                     ; if there is a rival
      flag[level, rival] := 2                        ; notify rival
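The entry and exit code can be exercised with a simulation sketch. Assumptions made for illustration: n is a power of two, each process is a Python generator, every `yield` is a point where a seeded random scheduler may switch processes, and the tie-breaker stores the process identifier i, which is what the later comparisons against i and the exit code's use of turn as a process index require. The function name is hypothetical.

```python
import random


def tournament_run(n, rounds, seed=0):
    """n must be a power of two. Returns the largest number of processes
    ever observed inside the critical section together."""
    levels = n.bit_length() - 1                     # log2(n)
    name = [[-1] * n for _ in range(levels)]        # name[level][id at that level]
    turn = [[-1] * n for _ in range(levels)]        # turn[level][node]
    flag = [[0] * n for _ in range(levels)]         # flag[level][process]
    cs = {"inside": 0, "max_inside": 0}

    def process(i):
        for _ in range(rounds):
            ident = i
            for level in range(levels):             # from leaf to root
                node = ident // 2
                name[level][ident] = i              # identify yourself
                yield
                turn[level][node] = i               # update the tie-breaker
                yield
                flag[level][i] = 0                  # reset my spin flag
                yield
                rival = name[level][ident ^ 1]      # other side of this node
                yield
                if rival != -1 and turn[level][node] == i:
                    yield
                    if flag[level][rival] == 0:
                        yield
                        flag[level][rival] = 1      # release the rival from waiting
                    yield
                    while flag[level][i] == 0:      # local spin
                        yield
                    yield
                    if turn[level][node] == i:      # I lost the tie-break
                        yield
                        while flag[level][i] != 2:  # local spin until notified
                            yield
                ident = node                        # move to the next level
            cs["inside"] += 1                       # critical section
            cs["max_inside"] = max(cs["max_inside"], cs["inside"])
            yield
            cs["inside"] -= 1
            for level in reversed(range(levels)):   # exit code, root to leaf
                ident = i >> level
                node = ident // 2
                name[level][ident] = -1             # erase name
                yield
                rival = turn[level][node]
                yield
                if rival != i:                      # there is a rival
                    flag[level][rival] = 2          # notify rival
                    yield

    rng = random.Random(seed)
    live = [process(i) for i in range(n)]
    while live:
        proc = rng.choice(live)
        try:
            next(proc)
        except StopIteration:
            live.remove(proc)
    return cs["max_inside"]


print(tournament_run(n=8, rounds=3))    # 1
```

Each process spins only on flag[level][i], a register that no other process spins on, so it can be placed in that process's own module on a DSM machine.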