
Distributed Galois

Andrew Lenharth, 2/27/2015

Goals

• An implementation of the operator formulation for distributed memory
  – Ideally forward-compatible where possible

• Both a simple programming model and a fast implementation
  – Like Galois, may need restrictions or structure for the highest performance

Overview

• PGAS (using fat pointers; see the sketch after this list)
• Implicit, asynchronous communication
• Default execution mode:
  – Galois compatible
  – Implicit locking and data movement
  – Pluggable schedulers
  – Speculative execution

• All D-Galois programs are valid Galois programs
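The PGAS layer rests on fat pointers. A minimal sketch of what such a pointer might look like in C++; the field names and the resolve helper are illustrative assumptions, not the actual D-Galois API:

#include <cstdint>

// A fat pointer names an object by (owning host, local address).
struct FatPointer {
  std::uint32_t host;       // owning host in the global address space
  std::uintptr_t localPtr;  // address within that host's memory
  bool isLocal(std::uint32_t myHost) const { return host == myHost; }
};

// Dereferencing resolves locally when possible; otherwise the runtime's
// directory would fetch or forward the object (elided here).
template <typename T>
T* resolve(const FatPointer& fp, std::uint32_t myHost) {
  if (fp.isLocal(myHost))
    return reinterpret_cast<T*>(fp.localPtr);
  return nullptr;  // remote: handled by the directory, not shown
}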

Galois Implementation

[Diagram: the Galois implementation stack: User Code and User Context over Graph and Parallel Loop, built on support components: Contention Manager, Memory Management, Statistics, Topology, Scheduler, Barrier, Termination, etc.]

Distributed Galois Implementation

[Diagram: the same stack as Galois, with three additional support components for distributed memory: Network, Directory, Remote Store]

Current Status

• Working implementation of the baseline
  – Asynchronous, speculative

Interesting Problems

• Livelock
• Asynchronous directory
• Abstractions for building data structures
• Network hardware
• Network software
• Remote updates
• Scheduling

Solved: Livelock

• Source: object state transitions are more complex, asynchronous, and may require multiple steps (hence interruptible)

• Solution: a scheme to ensure forward progress of one host (sketched below)

• Alternative: if this happens a lot for your application, coordinated scheduling may be more appropriate (or relaxed consistency)
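One simple forward-progress rule is a fixed total order on hosts: when two fetch requests for an object conflict, the lower-numbered host always wins, so at least that host completes its multi-step transition. This policy is an illustrative assumption, not necessarily the exact scheme used here:

#include <cstdint>

// A request for an object, tagged with the requesting host.
struct Request { std::uint32_t host; };

// Decide whether an incoming request may interrupt the current holder's
// in-flight state transition. Because the order is total and fixed, the
// lowest-numbered contender is never preempted, which breaks the request
// cycles that cause livelock.
bool mayPreempt(const Request& incoming, const Request& holder) {
  return incoming.host < holder.host;
}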

Asynchronous Directory

• Source: communication and workers interleave access to the directory (and directly to objects stored in the directory)

• Solution: mostly just a pain.

Abstractions for Building Data Structures

• Source: distributed data structures are hard (so are shared-memory data structures).

• Solution: a set of abstractions (a sketch follows this list)
  – Federated object: a different instance on each host/thread; pointers resolve locally
  – Federation is bootstrapped by the runtime
  – Federated objects don't have any notion of exclusive behavior
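A minimal sketch of a federated object in C++, assuming the runtime hands every host the same shared id so each resolves to its own local instance; the registry mechanism is an illustrative stand-in for the runtime's bootstrap:

#include <cstddef>
#include <vector>

template <typename T>
class Federated {
  // One local instance per federated id on this host; the real runtime
  // would coordinate the ids across hosts during bootstrap.
  static std::vector<T*>& registry() {
    static std::vector<T*> r;
    return r;
  }
  std::size_t id;

public:
  explicit Federated(std::size_t sharedId) : id(sharedId) {
    auto& r = registry();
    if (r.size() <= id) r.resize(id + 1, nullptr);
    if (!r[id]) r[id] = new T();  // lazily create the local instance
  }
  // Dereference always resolves locally: no exclusive ownership, and no
  // directory traffic.
  T& operator*() { return *registry()[id]; }
  T* operator->() { return registry()[id]; }
};

Constructing Federated<Counter>(0) on every host would then yield a per-host Counter: the same id resolves to a different local object on each host.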

Remote Updates

• Directory synchronization is really bad when it isn't needed (and essential when it is)

• Many algorithms have an update-and-schedule behavior for their neighbors

• Treat this behavior as a task type
  – Multiple task types per loop
  – Quite similar to nested parallelism

Remote Updates – PageRank

Original operator (touches neighbor state directly):

  self.value += self.residual
  for n : neighbors
    n.residual += f(self.residual)
    schedule (operator type on) {n}

With update tasks (ships the delta instead):

  self.value += self.residual
  for n : neighbors
    schedule (update type on) {n, f(self.residual)}

With a new operator:

  self.residual += update
  schedule (operator type on) {self}
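A C++ rendering of the split above; the Node layout, the Scheduler interface, and f() are assumed stand-ins for the runtime's real interfaces:

#include <vector>

struct Node {
  double value = 0, residual = 0;
  std::vector<Node*> neighbors;
};

double f(double r) { return 0.85 * r; }  // assumed damping transfer

struct Scheduler {  // stand-in for per-type worklists
  void scheduleOperator(Node*) {}
  void scheduleUpdate(Node*, double) {}
};

// Original operator: writes neighbor state directly, so the directory
// must move/lock each (possibly remote) neighbor.
void operatorTask(Node& self, Scheduler& s) {
  self.value += self.residual;
  for (Node* n : self.neighbors) {
    n->residual += f(self.residual);
    s.scheduleOperator(n);
  }
}

// Remote-update form: ships {neighbor, delta} as a second task type.
void operatorTaskRU(Node& self, Scheduler& s) {
  self.value += self.residual;
  for (Node* n : self.neighbors)
    s.scheduleUpdate(n, f(self.residual));
}

// The new operator, applied on the host that owns the node.
void updateTask(Node& self, double delta, Scheduler& s) {
  self.residual += delta;
  s.scheduleOperator(&self);
}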

Scheduling

• Source: imagine SSSP using the existing (host-unaware) schedulers on distributed memory

• Need a scheduler with a way to anchor work to a data-structure element (sketched below)
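One way to read "anchoring": the scheduler asks the data structure which host owns a work item's element and queues the work there instead of where it was generated. The ownerOf() partitioning below is an assumption for illustration:

#include <cstdint>
#include <deque>
#include <vector>

struct WorkItem { std::uint64_t nodeId; int priority; };

// Assumed hook into the distributed graph: which host owns this node?
std::uint32_t ownerOf(std::uint64_t nodeId, std::uint32_t numHosts) {
  return static_cast<std::uint32_t>(nodeId % numHosts);  // mod partitioning
}

// Host-aware scheduler: work lands on the owner's queue, so executing it
// never requires moving the anchoring element between hosts.
struct HostAwareScheduler {
  std::vector<std::deque<WorkItem>> perHost;  // outbound queue per host
  explicit HostAwareScheduler(std::uint32_t hosts) : perHost(hosts) {}
  void push(const WorkItem& w) {
    auto hosts = static_cast<std::uint32_t>(perHost.size());
    perHost[ownerOf(w.nodeId, hosts)].push_back(w);
  }
};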

Network hardware

Networks

• Small asynchronous messages are bad for throughput
• Scale-free graphs stress throughput
• Large messages are bad for latency
• Find the optimal point
  – Sometimes latency is critical

Nagle’s algorithm

• If you don't have a large message, wait a while to get more data
• Bad for latency
• Also keeps MPI in its broken-behavior range
• Also requires O(P) memory for communication buffers (assuming direct pointwise communication); see the sketch below
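A sketch of this per-destination aggregation, with the size threshold, age bound, and transport call all assumed for illustration:

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nagle-style buffer: append small messages and flush on size or age.
// With direct pointwise communication, every host keeps one of these per
// peer, which is where the O(P) memory cost comes from.
struct AggBuffer {
  static constexpr std::size_t kFlushBytes = 64 * 1024;         // assumed
  static constexpr auto kFlushAge = std::chrono::microseconds(100);

  std::vector<std::uint8_t> data;
  std::chrono::steady_clock::time_point firstByte;

  void append(int dest, const std::uint8_t* msg, std::size_t len) {
    if (data.empty()) firstByte = std::chrono::steady_clock::now();
    data.insert(data.end(), msg, msg + len);
    if (data.size() >= kFlushBytes) flush(dest);  // throughput path
  }
  void poll(int dest) {  // called periodically; bounds the added latency
    if (!data.empty() &&
        std::chrono::steady_clock::now() - firstByte > kFlushAge)
      flush(dest);
  }
  void flush(int dest) { sendRaw(dest, data); data.clear(); }
  static void sendRaw(int, const std::vector<std::uint8_t>&) {}  // stand-in
};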

Communication pattern

[Figure: communication pattern]

Software Routing

• Pros: a single communication channel
  – Scales with hosts
  – Aggregates all messages

• Cons: 2 hops (or more); see the sketch below
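A sketch of the routing idea: each host sends everything over one channel to its router, and routers forward toward the destination, trading an extra hop for O(1) channels per host plus link-level aggregation. The topology (hosts grouped under routers) is an assumption for illustration:

#include <cstdint>
#include <vector>

struct Packet { std::uint32_t dstHost; std::vector<std::uint8_t> payload; };

struct Hop { bool toRouter; std::uint32_t id; };  // next hop: router or host

struct SoftwareRouter {
  std::uint32_t hostsPerRouter;

  std::uint32_t routerFor(std::uint32_t host) const {
    return host / hostsPerRouter;  // assumed grouping of hosts to routers
  }
  // The first hop goes router-to-router when the destination is remote;
  // the second hop delivers to the host, so every path is at most 2 hops.
  Hop nextHop(std::uint32_t myRouter, const Packet& p) const {
    std::uint32_t dr = routerFor(p.dstHost);
    if (dr != myRouter) return Hop{true, dr};
    return Hop{false, p.dstHost};
  }
};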