Practical Solutions for Multicore Programming

Transcript of Practical Solutions for Multicore Programming

  1. Practical Solutions for Multicore Programming, Dr. Guy Korland
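  2. Process 1 and Process 2 update the same account, in this order:
       Process 1: a = acc.get()
       Process 1: a = a + 100
       Process 2: b = acc.get()
       Process 2: b = b + 50
       Process 2: acc.set(b)
       Process 1: acc.set(a)
     ... Lost Update! ...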
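The interleaving on this slide can be replayed on a single thread to make the outcome deterministic. The sketch below is only an illustration: the `Account` class and the starting balance are assumptions, and only `get()` and `set()` come from the slide.

```java
// Hypothetical account type; only get()/set() appear on the slide.
class Account {
    private int balance;
    Account(int initial) { balance = initial; }
    int get() { return balance; }
    void set(int value) { balance = value; }
}

class LostUpdateDemo {
    // Replays the slide's interleaving step by step on one thread.
    static int replay(int initial) {
        Account acc = new Account(initial);
        int a = acc.get();   // Process 1 reads
        a = a + 100;         // Process 1 computes
        int b = acc.get();   // Process 2 reads the same old value
        b = b + 50;          // Process 2 computes
        acc.set(b);          // Process 2 writes
        acc.set(a);          // Process 1 overwrites, so the +50 is lost
        return acc.get();
    }

    public static void main(String[] args) {
        // Starting from 0, the two deposits should add up to 150.
        System.out.println(replay(0));
    }
}
```

Starting from 0, the replay ends at 100 rather than 150, because Process 1 writes back a value computed from a stale read.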
  3. Process 1: lock(A) ... lock(B)
     Process 2: lock(B) ... lock(A)
     ... DeadLock! ...
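A minimal sketch of this hazard, assuming the slide intends the classic case where the two processes take locks A and B in opposite orders. To keep the demo from hanging, it swaps the second blocking `lock()` for `ReentrantLock.tryLock` with a timeout, so a failed acquisition stands in for the deadlock. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class DeadlockSketch {
    // Takes `first`, waits until the other thread holds its own first lock,
    // then tries `second` with a timeout. Returns true if `second` was acquired.
    static boolean attempt(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep holding `first` until the other thread has tried too
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Returns true when neither thread could take its second lock.
    static boolean bothBlocked() {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // Process 1 locks A then B; Process 2 locks B then A.
        Thread p1 = new Thread(() -> acquired[0] = attempt(a, b, bothHoldFirst, bothTried));
        Thread p2 = new Thread(() -> acquired[1] = attempt(b, a, bothHoldFirst, bothTried));
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !acquired[0] && !acquired[1];
    }
}
```

With plain `lock()` calls in place of the timed `tryLock`, both threads would wait forever at the second acquisition.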
  4. Process 1: atomic { a = acc.get(); a = a + 100; acc.set(a) }
     Process 2: atomic { b = acc.get(); b = b + 50; acc.set(b) }
     ... WIIIII! ...
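Java has no `atomic` block, so the sketch below only emulates the effect of this slide with a single `synchronized` lock. That is a stand-in technique, not transactional memory, and the class and method names are hypothetical. It shows the property the slide is after: each read-modify-write runs as one indivisible step, so neither update is lost.

```java
class AtomicSketch {
    private int balance;                       // stand-in for the slide's acc
    private final Object lock = new Object();  // one global lock emulates atomic { ... }

    // Emulates: atomic { v = acc.get(); v = v + delta; acc.set(v); }
    void deposit(int delta) {
        synchronized (lock) {
            int v = balance;
            v = v + delta;
            balance = v;
        }
    }

    int get() {
        synchronized (lock) {
            return balance;
        }
    }

    // Runs the two processes from the slide concurrently and returns the final balance.
    static int runBoth() {
        AtomicSketch acc = new AtomicSketch();
        Thread p1 = new Thread(() -> acc.deposit(100));
        Thread p2 = new Thread(() -> acc.deposit(50));
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acc.get();
    }
}
```

Whichever thread runs first, the final balance is 150. A real STM aims for the same guarantee without serializing unrelated transactions behind one lock.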
  5. Intel TSX
       if (_xbegin() == -1) {
         if (!fallback_mutex.is_acquired()) {
           sums[mygroup] += data[i];
         } else {
           _xabort(1);
         }
         _xend();
       } else {
         fallback_mutex.acquire();
         sums[mygroup] += data[i];
         fallback_mutex.release();
       }
     Callouts: Limited to simple instructions. Requires fall-back. Relying on cache coherency.
  6. We still need Software Transactional Memory
  7. DSTM2
     Maurice Herlihy et al., A flexible framework [OOPSLA06]
       @atomic public interface INode {
         int getValue();
         void setValue(int value);
       }
       Factory factory = Thread.makeFactory(INode.class);
       final INode node = factory.create();
       result = Thread.doIt(new Callable() {
         public Boolean call() {
           return node.setValue(value);
         }
       });
     Callouts: Limited to Objects. Very intrusive. Doesn't support libraries. Bad performance (fork).
  8. JVSTM
     João Cachopo and António Rito-Silva, Versioned boxes as the basis for memory transactions [SCOOL05]
       public class Account {
         private VBox balance = new VBox();
         public @Atomic void withdraw(long amount) {
           balance.put(balance.get() - amount);
         }
       }
     Callouts: Doesn't support libraries. Less intrusive. Need to announce shared fields.
  9. Atom-Java
     B. Hindman and D. Grossman, Atomicity via source-to-source translation [MSPC06]
       public void update(double value) {
         Atomic {
           commission += value;
         }
       }
     Callouts: Add a reserved word. Need pre-compilation. Doesn't support libraries. Even less intrusive.
  10. Deuce STM - API
     G. Korland, N. Shavit and P. Felber, Noninvasive Java Concurrency with Deuce STM [MultiProg '10]
       public class Bank {
         private double commission = 10;
         @Atomic
         public void transaction(Account ac1, Account ac2, double amount) {
           ac1.balance -= (amount + commission);
           ac2.balance += amount;
         }
       }
     Callouts: No reserved words. Annotation based. No need for pre-compilation. Supports external libraries. Extendable to research.
  11. Deuce STM - Overview
  12. Benchmarks (Sun UltraSPARC T2 Plus 2 x Quad x 8 HT)
  13. Benchmarks (Azul Vega2 2 x 48)
  14. Benchmark - the dark side [chart: y-axis from 0 to 1.2, x-axis from 1 to 10]
  15. Overhead
       Contention: retries, aborts, contention manager
       STM algorithm: data structures, optimistic, pessimistic
       Semantic: consistency model, privatization
       Instrumented memory access: linear overhead on every read/write
  16. Static analysis Optimizations
       1. Avoiding instrumentation of accesses to immutable and transaction-local memory.
       2. Avoiding lock acquisition and releases for thread-local memory.
       3. Avoiding readset population in read-only transactions.
  17. Novel Static analysis Optimizations
     Y. Afek, G. Korland, and A. Zilberstein, Lowering STM Overhead with Static Analysis, LCPC'10
       1. Reduce amount of instrumented memory reads using load elimination.
       2. Reduce amount of instrumented memory writes using scalar promotion.
       3. Avoid writeset lookups for memory not yet written to.
       4. Avoid writeset record keeping for memory that will not be read.
       5. Reduce false conflicts by transaction re-scoping.
     ...
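To make the first item (load elimination) concrete, here is a hedged sketch of its effect. The `readBarrier` method and the call counter are hypothetical stand-ins for the per-read instrumentation an STM inserts; they are not Deuce's actual API. The point is only that two source-level reads of the same field inside a transaction can share one instrumented read.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LoadEliminationSketch {
    // Hypothetical counter of instrumented reads, for illustration only.
    static final AtomicInteger barrierCalls = new AtomicInteger();

    static class Point {
        int x = 3;
    }

    // Hypothetical stand-in for an STM read barrier around the field p.x.
    static int readBarrier(Point p) {
        barrierCalls.incrementAndGet();
        return p.x;
    }

    // Naive instrumentation: every source-level read of p.x goes through the barrier.
    static int squareNaive(Point p) {
        return readBarrier(p) * readBarrier(p);
    }

    // After load elimination: the second read reuses the value loaded by the first.
    static int squareOptimized(Point p) {
        int x = readBarrier(p);
        return x * x;
    }

    // Returns how many barrier calls the given body performs.
    static int countBarrierCalls(Runnable body) {
        barrierCalls.set(0);
        body.run();
        return barrierCalls.get();
    }
}
```

Both versions compute the same result, but the optimized one pays the instrumentation cost once instead of twice. Within a transaction this reuse is safe because reads are expected to observe a consistent snapshot.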
  18. Benchmarks K-Means
  19. We still need Fine-Grained Concurrent Data Structures
  20. e.g. Pool [diagram: producers P1, P2, ..., Pn call Put(x), Put(y), Put(z) on a shared pool; consumers C1, C2, ..., Cn call Get( )]
  21. Java - pools
       1. SynchronousQueue/Stack - pairing up function without buffering. Producers and consumers wait for one another.
       2. LinkedBlockingQueue - Producers put their value and leave, Consumers wait for a value to become available.
       3. ConcurrentLinkedQueue - Producers put their value and leave, Consumers return null if the pool is empty.
     Callouts: Limited scalability. Pool doesn't need LIFO/FIFO.
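The three behaviours listed on this slide can be observed directly with the standard `java.util.concurrent` classes. The sketch below is a small single-threaded probe; the class and method names are hypothetical, and a timed `poll` stands in for a blocking `take()` so the example terminates.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class JavaPoolsSketch {
    // SynchronousQueue has no buffer: a non-blocking offer fails
    // unless a consumer is already waiting to pair up with it.
    static boolean synchronousOfferWithoutConsumer() {
        return new SynchronousQueue<String>().offer("x");
    }

    // LinkedBlockingQueue buffers: the producer leaves at once
    // and the value stays available for a later consumer.
    static String linkedBlockingPutThenGet() {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>();
        q.offer("y");
        return q.poll();
    }

    // A LinkedBlockingQueue consumer waits for a value; here it gives up after 50 ms.
    static String linkedBlockingEmptyTimedPoll() {
        try {
            return new LinkedBlockingQueue<String>().poll(50, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // ConcurrentLinkedQueue never blocks: an empty pool yields null immediately.
    static String concurrentLinkedEmptyPoll() {
        return new ConcurrentLinkedQueue<String>().poll();
    }
}
```

All three classes impose a FIFO or LIFO order on their elements, which is the constraint the slide argues a pool does not need.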
  22. ED-Tree
     Scalable Producer-Consumer Pools Based on Elimination-Diffraction Trees (Y. Afek, G. Korland, M. Natanzon, N. Shavit)
     Merge:
       Diffracting-Tree Structure (Shavit and Zemach)
       Elimination-Tree Structure (Shavit and Touitou)
       LinkedBlockingQueue
  23. Performance
  24. What about other cases?
  25. Do we really need Linearizability?
  26. Can we make it more formal?
  27. The solution: Relax the Linearizability Requirements
     Y. Afek, G. Korland, and A. Yanovsky, Quasi-Linearizability: relaxed consistency for improved concurrency, OPODIS'10
  28. e.g. Task Queue [diagram: a queue of Tasks with Head and Tail, shared by Producers and Consumers]
  29. K-Quasi Task Queue [diagram: a task queue with Head and Tail, a window of k Tasks, and several Consumers]
  30. Quasi Linearizable Definition
       H  = 1 2 3 4 5 6
       H' = 4 1 2 3 5 6
       Distance 3
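The distance on this slide can be checked with a short computation. The sketch assumes that the distance between two orderings of the same elements is the largest number of positions any single element has moved. That reading matches the slide's example, but the method and class names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class QuasiDistanceSketch {
    // Assumed definition: the maximum displacement of any element
    // between two orderings of the same set of elements.
    static int distance(List<Integer> h, List<Integer> hPrime) {
        int max = 0;
        for (int i = 0; i < h.size(); i++) {
            int j = hPrime.indexOf(h.get(i));
            if (j < 0) {
                throw new IllegalArgumentException("element missing from second history: " + h.get(i));
            }
            max = Math.max(max, Math.abs(i - j));
        }
        return max;
    }

    public static void main(String[] args) {
        List<Integer> h = Arrays.asList(1, 2, 3, 4, 5, 6);
        List<Integer> hPrime = Arrays.asList(4, 1, 2, 3, 5, 6);
        System.out.println(distance(h, hPrime));
    }
}
```

In the slide's example, element 4 moves from the fourth position to the first, a displacement of 3, while every other element moves by at most 1.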
  31. More motivation...
       Statistical Counter
       ID generator
       Web Cache