Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation...
![Page 1: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/1.jpg)
Distributed Galois
Andrew Lenharth
2/27/2015
![Page 2: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/2.jpg)
Goals
• An implementation of the operator formulation for distributed memory
  – Ideally forward-compatible where possible
• Both simple programming model and fast implementation
  – Like Galois, may need restrictions or structure for highest performance
![Page 3: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/3.jpg)
Overview
• PGAS (using fat pointers)
• Implicit, asynchronous communication
• Default execution mode:
  – Galois compatible
  – Implicit locking and data movement
  – Pluggable schedulers
  – Speculative execution
• All D-Galois programs are valid Galois programs
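As a rough illustration of "PGAS (using fat pointers)", the sketch below pairs a host id with a host-local address and resolves the pointer either locally or through a stand-in for a network fetch. The names `FatPointer`, `Host`, and `resolve` are hypothetical and are not taken from the Galois code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatPointer:
    host: int     # id of the host that owns the object (hypothetical field)
    address: int  # key into that host's local store (hypothetical field)


class Host:
    def __init__(self, host_id):
        self.host_id = host_id
        self.store = {}        # objects owned by this host, keyed by address
        self.next_address = 0

    def allocate(self, value):
        """Store a value locally and return a fat pointer to it."""
        ptr = FatPointer(self.host_id, self.next_address)
        self.store[ptr.address] = value
        self.next_address += 1
        return ptr


def resolve(ptr, here, hosts):
    """Return (value, was_remote) for a pointer dereferenced on host `here`."""
    if ptr.host == here.host_id:
        return here.store[ptr.address], False
    # Stand-in for a network fetch from the owning host.
    return hosts[ptr.host].store[ptr.address], True


hosts = [Host(0), Host(1)]
p = hosts[1].allocate("node-data")
local_view = resolve(p, hosts[1], hosts)   # dereferenced on the owner
remote_view = resolve(p, hosts[0], hosts)  # dereferenced on another host
```

The same pointer value is valid on every host; only the cost of dereferencing it differs.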
![Page 4: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/4.jpg)
Support
Galois Implementation
User Code
User Context
Graph
Parallel Loop
Contention Manager
Memory Management
Statistics
Topology
Scheduler
Barrier
Termination Etc
![Page 5: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/5.jpg)
Support
Distributed Galois Implementation
User Code
User Context
Graph
Parallel Loop
Contention Manager
Memory Management
Statistics
Topology
Scheduler
Barrier
Termination Etc
Network
Directory
Remote Store
![Page 6: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/6.jpg)
Current Status
• Working implementation of baseline
  – Asynchronous, speculative
![Page 7: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/7.jpg)
Interesting Problems
• Livelock
• Asynchronous directory
• Abstractions for building data-structures
• Network hardware
• Network software
• Remote updates
• Scheduling
![Page 8: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/8.jpg)
Solved: Livelock
• Source: object state transitions are more complex, are asynchronous, and may require multiple steps (hence they are interruptible)
• Solution: a scheme that ensures forward progress of one host
• Alternate: if this happens a lot for your application, coordinated scheduling (or relaxed consistency) may be more appropriate
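The slides do not say which scheme is used. One common way to guarantee that some host always makes progress is a fixed priority order among hosts; the sketch below assumes the lowest host id wins every conflict. The rule and all names (`resolve_conflict`, `assign_owners`, `hosts_that_progress`) are assumptions for illustration only.

```python
def resolve_conflict(holder, requester):
    """Assumed rule: the host with the lower id wins the object."""
    return min(holder, requester)


def assign_owners(requests):
    """requests: dict object -> list of host ids that want it."""
    owners = {}
    for obj, hosts in requests.items():
        winner = hosts[0]
        for h in hosts[1:]:
            winner = resolve_conflict(winner, h)
        owners[obj] = winner
    return owners


def hosts_that_progress(needs, owners):
    """A host can run its task only if it won every object the task needs."""
    return sorted(h for h, objs in needs.items()
                  if all(owners[o] == h for o in objs))


# Three hosts whose tasks form a cycle of conflicts over objects a, b, c.
needs = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "a"]}
requests = {}
for host, objs in needs.items():
    for o in objs:
        requests.setdefault(o, []).append(host)

owners = assign_owners(requests)
progress = hosts_that_progress(needs, owners)
```

Without a fixed order, the three hosts could keep stealing objects from each other indefinitely; with it, host 0 acquires everything it asks for and completes its task.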
![Page 9: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/9.jpg)
Asynchronous Directory
• Source: communication and workers interleave access to directory (and directly to objects stored in the directory)
• Solution: mostly just a pain.
![Page 10: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/10.jpg)
Abstraction for building DS
• Source: Distributed data structures are hard (so are shared-memory data structures).
• Solution: Set of abstractions
  – Federated object: different instance on each host/thread, pointers resolve locally.
  – Federation bootstrapped by runtime.
  – Federated objects don’t have any notion of exclusive behavior
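A minimal sketch of the federated-object idea, assuming a runtime-side registry that creates one instance per host and resolves a shared handle to the local instance. `Federation`, `LocalCounter`, and the method names are hypothetical, not the actual runtime API.

```python
class Federation:
    """Hypothetical registry: one instance per host for each federated id."""

    def __init__(self, num_hosts):
        self.num_hosts = num_hosts
        self.instances = [dict() for _ in range(num_hosts)]
        self.next_id = 0

    def create(self, factory):
        """Bootstrap a federated object: build one instance on every host."""
        fid = self.next_id
        self.next_id += 1
        for host in range(self.num_hosts):
            self.instances[host][fid] = factory(host)
        return fid

    def resolve(self, fid, host):
        """The same handle resolves to the instance local to `host`."""
        return self.instances[host][fid]


class LocalCounter:
    def __init__(self, host):
        self.host = host
        self.count = 0


fed = Federation(3)
counter = fed.create(LocalCounter)
fed.resolve(counter, 0).count += 5  # host 0 touches only its own instance
fed.resolve(counter, 2).count += 1  # host 2 likewise; no locking involved
counts = [fed.resolve(counter, h).count for h in range(3)]
```

Because every host works on its own instance, nothing here needs exclusive access; combining the per-host state is left to whatever data structure is built on top.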
![Page 11: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/11.jpg)
Remote Updates
• Directory synchronization really bad when not needed (essential when needed)
• Many algorithms have an update and schedule behavior for their neighbors
• Treat this behavior as a task type
  – Multiple task-types per loop
  – Quite similar to nested parallelism
![Page 12: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/12.jpg)
Remote Updates – PageRank
Original operator:
self.value += self.residual
for n : neighbor
    n.residual += f(self.residual)
    schedule (operator type on) {n}

Using an update task:
self.value += self.residual
for n : neighbor
    schedule (update type on) {n, f(self.residual)}

With a new operator:
self.residual += update
schedule (operator type on) {self}
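The two task types can be sketched as a single-process worklist loop. This is only an illustration: the damping factor, the tolerance cutoff, resetting the residual once it is folded into the value, and the concrete form of `f` are assumptions added so the loop terminates; the slides leave them unspecified.

```python
from collections import deque

DAMPING = 0.85    # assumed; the slides only write f(...)
TOLERANCE = 1e-6  # assumed cutoff so the loop terminates


def pagerank(graph):
    """graph: dict node -> list of out-neighbors (every node has at least one)."""
    value = {n: 0.0 for n in graph}
    residual = {n: 1.0 - DAMPING for n in graph}
    work = deque(("operator", n, None) for n in graph)
    while work:
        kind, node, payload = work.popleft()
        if kind == "update":
            # Update task: touches only `node`, so it can run wherever
            # `node` lives without acquiring the sender's data.
            residual[node] += payload
            if residual[node] > TOLERANCE:
                work.append(("operator", node, None))
        else:
            r = residual[node]
            if r <= TOLERANCE:
                continue
            residual[node] = 0.0  # assumption: residual is consumed here
            value[node] += r
            out = graph[node]
            for n in out:
                # Instead of writing n.residual directly, ship the delta.
                work.append(("update", n, DAMPING * r / len(out)))
    return value


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

The operator task never writes to a neighbor; every cross-node effect travels as an update task, which is what lets a distributed runtime skip directory synchronization for it.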
![Page 13: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/13.jpg)
Scheduling
• Source: Imagine SSSP using the existing (host-unaware) schedulers on distributed memory
• Need a scheduler with a way to anchor work to a data-structure element
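One way to picture anchoring is a worklist that routes each work item to the queue of the host owning the element it refers to. The class `AnchoredWorklist` and the modulo partition in `owner_of` are hypothetical choices made for this sketch.

```python
from collections import deque


class AnchoredWorklist:
    """Hypothetical worklist: each item is queued on the host owning its node."""

    def __init__(self, num_hosts, owner_of):
        self.queues = [deque() for _ in range(num_hosts)]
        self.owner_of = owner_of

    def push(self, node, item):
        self.queues[self.owner_of(node)].append((node, item))

    def pop(self, host):
        """Return the next item anchored to `host`, or None if it has none."""
        q = self.queues[host]
        return q.popleft() if q else None


NUM_HOSTS = 2


def owner_of(node):
    # Assumed partitioning: node ids are spread round-robin over hosts.
    return node % NUM_HOSTS


wl = AnchoredWorklist(NUM_HOSTS, owner_of)
for node, dist in [(0, 0), (3, 7), (4, 2), (1, 5)]:
    wl.push(node, dist)  # e.g. SSSP relaxations: (node, tentative distance)

host0 = [wl.pop(0), wl.pop(0), wl.pop(0)]
host1 = [wl.pop(1), wl.pop(1)]
```

A host-unaware scheduler would hand any item to any host, forcing the node data to move; here each host only ever dequeues work for nodes it already holds.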
![Page 14: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/14.jpg)
Network hardware
![Page 15: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/15.jpg)
![Page 16: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/16.jpg)
![Page 17: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/17.jpg)
Networks
• Small asynchronous messages are bad for throughput
• Scale-free graphs stress throughput
• Large messages are bad for latency
• Find optimal point
  – Sometimes latency is critical
![Page 18: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/18.jpg)
Nagle’s algorithm
• If you don’t have a large message, wait a while to get more data
• Bad for latency
• Also, keeps MPI in its broken behavior range
• Also, requires O(P) memory for communications (assuming direct pointwise)
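The trade-off can be sketched with a Nagle-style batching sender: small messages are held in a per-destination buffer until the buffer is large enough or has waited too long. This is not TCP's Nagle algorithm itself; the class `BatchingSender`, its thresholds, and the explicit logical clock `now` are assumptions for illustration.

```python
class BatchingSender:
    """Hypothetical sender with one buffer per destination (O(P) buffers)."""

    def __init__(self, num_hosts, max_bytes, max_wait):
        self.buffers = [[] for _ in range(num_hosts)]
        self.sizes = [0] * num_hosts
        self.oldest = [None] * num_hosts  # time the oldest buffered msg arrived
        self.max_bytes = max_bytes
        self.max_wait = max_wait
        self.sent = []  # (destination, [messages]) in the order flushed

    def send(self, dest, msg, now):
        if self.oldest[dest] is None:
            self.oldest[dest] = now
        self.buffers[dest].append(msg)
        self.sizes[dest] += len(msg)
        if self.sizes[dest] >= self.max_bytes:
            self._flush(dest)  # big enough: send one aggregated message

    def tick(self, now):
        """Flush any buffer whose oldest message has waited max_wait."""
        for dest, t in enumerate(self.oldest):
            if t is not None and now - t >= self.max_wait:
                self._flush(dest)

    def _flush(self, dest):
        self.sent.append((dest, self.buffers[dest]))
        self.buffers[dest] = []
        self.sizes[dest] = 0
        self.oldest[dest] = None


sender = BatchingSender(num_hosts=3, max_bytes=8, max_wait=5)
sender.send(1, b"abcd", now=0)
sender.send(2, b"xy", now=1)
sender.send(1, b"efgh", now=2)  # destination 1 reaches 8 bytes and flushes
sender.tick(now=3)              # destination 2 has waited only 2 ticks
sender.tick(now=6)              # destination 2 has waited 5 ticks and flushes
```

The two-byte message to destination 2 sits in its buffer for five ticks, which is the latency cost the slide points out, and the sender carries one buffer per peer whether or not that peer is used.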
![Page 19: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/19.jpg)
Communication pattern
![Page 20: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/20.jpg)
Communication pattern
![Page 21: Distributed Galois Andrew Lenharth 2/27/2015. Goals An implementation of the operator formulation for distributed memory – Ideally forward-compatible.](https://reader030.fdocuments.net/reader030/viewer/2022013004/56649f125503460f94c25c5e/html5/thumbnails/21.jpg)
Software Routing
• Pros: single communication channel
  – Scales with hosts
  – Aggregates all messages
• Cons: 2 hops (or more)
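The slides do not show the routing layout. As one possible reading, the sketch below arranges hosts in a grid and relays each message through the host in the sender's row and the receiver's column, so every host talks to only its row and column peers at the cost of an extra hop. The grid layout and the functions `route` and `peers` are assumptions.

```python
def route(src, dst, cols):
    """Hosts visited after `src` when routing to `dst` on a grid with `cols` columns."""
    src_row = src // cols
    dst_col = dst % cols
    relay = src_row * cols + dst_col  # same row as src, same column as dst
    path = []
    if relay != src:
        path.append(relay)
    if dst != relay:
        path.append(dst)
    return path


def peers(host, rows, cols):
    """Hosts that `host` ever sends to directly: its row and its column."""
    row, col = divmod(host, cols)
    same_row = {row * cols + c for c in range(cols)}
    same_col = {r * cols + col for r in range(rows)}
    return (same_row | same_col) - {host}


ROWS, COLS = 4, 4                # 16 hosts, numbered row by row
two_hops = route(0, 15, COLS)    # relayed through host 3
same_row = route(0, 3, COLS)     # destination is the relay itself
same_col = route(0, 12, COLS)    # source is its own relay
direct_peers = peers(0, ROWS, COLS)
```

With 16 hosts each host keeps buffers for 6 peers instead of 15, and messages bound for the same column are aggregated at the relay; the price is that most messages take two hops.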