A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2...

43
A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of Computer Science, The University of Texas at Austin 2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

Transcript of A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2...

Page 1: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

A Shape Analysis for Optimizing Parallel Graph Programs

Dimitrios Prountzos1

Keshav Pingali1,2

Roman Manevich2

Kathryn S. McKinley1

1: Department of Computer Science, The University of Texas at Austin2: Institute for Computational Engineering and Sciences, The

University of Texas at Austin

Page 2: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

2

Motivation

• Graph algorithms are ubiquitous

• Goal: Compiler analysis for optimization of parallel graph algorithms

Computational biology Social Networks

Computer Graphics

Page 3: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

3

Organization• Parallelization of graph algorithms in Galois system

– Speculative execution– Example: Boruvka MST algorithm

• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis

• Lockset shape analysis – Abstract Data Type (ADT) modeling– Hierarchy summarization abstraction– Predicate discovery

• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x

Page 4: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

4

Boruvka’s Minimum Spanning Tree Algorithm

Build MST bottom-uprepeat { pick arbitrary node ‘a’ merge with lightest neighbor ‘lt’ add edge ‘a-lt’ to MST} until graph is a single node

c d

a b

e f

g

2 4

6

5

3

7

4

1d

a,c b

e f

g

4

6

3

4

17

lt

Page 5: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

5

• Algorithm = repeated application of operator to graph

– Active node: • Node where computation is needed

– Activity: • Application of operator to active node

– Neighborhood:• Sub-graph read/written to perform activity

– Unordered algorithms: • Active nodes can be processed in any order

• Amorphous data-parallelism– Parallel execution of activities, subject to neighborhood constraints

• Neighborhoods are functions of runtime values – Parallelism cannot be uncovered at compile time in general

Parallelism in Boruvka

i1

i2

i3

Page 6: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

6

Optimistic Parallelization in Galois• Programming model

– Client code has sequential semantics– Library of concurrent data structures

• Parallel execution model– Thread-level speculation (TLS)– Activities executed speculatively

• Conflict detection– Each node/edge has associated exclusive

lock– Graph operations acquire locks on

read/written nodes/edges– Lock owned by another thread conflict

iteration rolled back– All locks released at the end

• Two main overheads– Locking– Undo actions

i1

i2

i3

Page 7: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

7

Overheads (I): Locking• Optimizations– Redundant locking elimination– Lock removal for iteration private data– Lock removal for lock domination

• ACQ(P): set of definitely acquired locks per program point P• Given method call M at P:

Locks(M) ACQ(P) Redundant Locking

Page 8: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

8

Overheads (II): Undo actions

Lockset Grows

Lockset Stable

Failsafe

foreach (Node a : wl) {

}

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

Program point P is failsafe if: Q : Reaches(P,Q) Locks(Q) ACQ(P)

Page 9: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

9

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

Lockset Analysis

• Redundant Locking• Locks(M) ACQ(P)

• Undo elimination• Q : Reaches(P,Q)

Locks(Q) ACQ(P)

• Need to compute ACQ(P)

: Runtime overhead

Page 10: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

10

Analysis Challenges• The usual suspects: – Unbounded Memory Undecidability – Aliasing, Destructive updates

• Specific challenges:– Complex ADTs: unstructured graphs– Heap objects are locked– Adapt abstraction to ADTs

• We use Abstract Interpretation [CC’77]– Balance precision and realistic performance

Page 11: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

11

Organization• Parallelization of graph algorithms in Galois system

– Speculative execution– Example: Boruvka MST algorithm

• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis

• Lockset shape analysis – Abstract Data Type (ADT) modeling– Hierarchy summarization abstraction– Predicate discovery

• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x

Page 12: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

Shape Analysis Overview

12

HashMap-Graph

Tree-based Set

……

Graph { @rep nodes @rep edges …}

Graph Spec

Concrete ADTImplementationsin Galois library

Predicate Discovery

Shape Analysis

Boruvka.javaOptimizedBoruvka.java

Set { @rep cont …}

Set Spec

ADT Specifications

Page 13: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

13

ADT Specification

Graph<ND,ED> {

@rep set<Node> nodes @rep set<Edge> edges

Set<Node> neighbors(Node n);

}

Graph Spec

...Set<Node> S1 = g.neighbors(n);

...

Boruvka.java

Abstract ADT state by virtual set fields

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) )

Assumption: Implementation satisfies Spec

Page 14: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

14

Graph<ND,ED> {

@rep set<Node> nodes@rep set<Edge> edges

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);}

Modeling ADTs

c

a bGraph Spec

dst

src

srcdst

dst

src

Page 15: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

15

Modeling ADTs

c

a b

nodes edges

Abstract State

cont

ret nghbrs

Graph Spec

dst

src

srcdst

dst

src

Graph<ND,ED> {

@rep set<Node> nodes@rep set<Edge> edges

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);}

Page 16: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

16

Organization• Parallelization of graph algorithms in Galois system

– Speculative execution– Example: Boruvka MST algorithm

• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis

• Lockset shape analysis – Abstract Data Type (ADT) modeling– Hierarchy summarization abstraction– Predicate discovery

• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x

Page 17: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

17

cont cont

S1 S2L(S1.cont) L(S2.cont)

Abstraction Scheme

(S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)

• Parameterized by set of LockPaths: L(Path) o . o ∊ Path Locked(o)– Tracks subset of must-be-locked objects

• Abstract domain elements have the form: Aliasing-configs 2LockPaths …

Page 18: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

18

( L(y.nd) ) ( () L(x.nd) )

( L(y.nd) ) ( L(y.nd) L(x.rev(src)) ) ( () L(x.nd) )

Joining Abstract States

( L(y.nd) ) ( () L(x.nd) )

Aliasing is crucial for precisionMay-be-locked does not enable our optimizations

#Aliasing-configs : small constant (6)

Page 19: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

19

lt

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

Example Invariant in Boruvka

The immediate neighbors of a and lt are locked

a

( a ≠ lt ) ∧ L(a) L(a.rev(src)) L(a.rev(dst))∧ ∧ ∧ L(a.rev(src).dst) L(a.rev(dst).src) ∧ ∧ L(lt) L(lt.rev(dst)) L(lt.rev(src)) ∧ ∧ ∧ L(lt.rev(dst).src) L(lt.rev(src).dst)∧

…..

Page 20: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

20

Heuristics for Finding Paths

• Hierarchy Summarization (HS)– x.( fld )*– Type hierarchy graph acyclic

bounded number of paths

– Preflow-Push: • L(S.cont) L(S.cont.nd)∧• Nodes in set S and their data are locked

Set<Node>

S

Node

NodeData

cont

nd

Page 21: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

21

Footprint Graph Heuristic• Footprint Graphs (FG)[Calcagno et al. SAS’07]– All acyclic paths from arguments of ADT method to

locked objects– x.( fld | rev(fld) )* – Delaunay Refinement: L(S.cont) L(S.cont.rev(src)) L(S.cont.rev(dst)) ∧ ∧ ∧ L(S.cont.rev(src).dst) L(S.cont.rev(dst).src)∧– Nodes in set S and all of their immediate

neighbors are locked

• Composition of HS, FG– Preflow-Push: L(a.rev(src).ed)

FG HS

Page 22: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

22

Organization• Parallelization of graph algorithms in Galois system

– Speculative execution– Example: Boruvka MST algorithm

• Optimization opportunities– Reduce speculation overheads– Analysis problem: LockSet shape analysis

• Shape analysis – Abstract Data Type modeling– Hierarchy summarization abstraction– Predicate discovery

• Evaluation– Fast and infers all available optimizations– Optimizations give speedup up to 12x

Page 23: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

23

Experimental Evaluation• Implement on top of TVLA– Encode abstraction by 3-Valued Shape Analysis

[SRW TOPLAS’02]• Evaluation on 4 Lonestar Java benchmarks

• Inferred all available optimizations• # abstract states practically linear in program size

Benchmark Analysis Time (sec)

Boruvka MST 6

Preflow-Push Maxflow 7

Survey Propagation 12

Delaunay Mesh Refinement 16

Page 24: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

24

Impact of Optimizations for 8 Threads

Boruvka MST Delaunay Mesh Refinement

Survey Propaga-tion

Preflow-Push Maxflow

0

50

100

150

200

250

Baseline Optimized

Tim

e (s

ec)

5.6× 4.7×11.4×

2.9× 8-core Intel Xeon @ 3.00 GHz

Page 25: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

25

Related Work• Safe programmable speculative parallelism [Prabhu et al. PLDI’10]

– Focused on value speculation on ordered algorithms– Different rollback freedom condition

• Transactional Memory compiler optimizations [Harris et al. PLDI’06, Dragojevic et al. SPAA’09]– Similar optimizations– Don’t target rollback freedom– Imprecise for unbounded data-structures

• Optimizations for parallel graph programs [Mendez-Lojo et al. PPOPP’10]– Manual optimizations– Failsafe subsumes cautious

• Verifying conformance of ADT implementation to specification– The Jahob project (Kuncak, Rinard, Wies et al.)

Page 26: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

26

Conclusion• New application for static analysis – Optimization of optimistically parallelized graph

programs

• Novel shape analysis– Utilize observations on the structure of concrete

states and programming style

• Enables optimizations crucial for performance

Page 27: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

27

Thank You!

Page 28: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

28

Backup

Page 29: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

Outline of Boruvka MST CodeGSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

Pick arbitrary worklist node

Find lightest neighbor

Update neighbors of lightest

Update worklist and MST

Page 30: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

30

Approximating Sets of Locked Objects

Reachability-based Scheme

cont cont

S1 S2

Reach(S1.cont)Reach(S2.cont) Reach(S2.cont) Reach(S2.cont)

cont cont

S1 S2

Reach(S1.cont)Reach(S2.cont) Reach(S2.cont)

Page 31: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

31

cont cont

S1 S2L(S1.cont) L(S2.cont)

Approximating Sets of Locked Objects

Hierarchy Summarization Scheme

cont cont

S1 S2L(S1.cont) L(S2.cont) (S1 ≠ S2)

∧ L(S1.cont) ∧ L(S2.cont)

L(Path) o . o ∊ Path Locked(o)

Page 32: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

32

• Graph API uses flags to enable/disable locking and storing undo actions

– removeEdge(Node src, Node dst, Flag f);

Enabling Optimizations in Galois

Challenge: Find minimal flag per ADT method call Solution: Lockset analysis

UNDOLOCKS

NONE

ALL

Lock(src)Lock(dst)

Lock( (src,dst) )addEdge(src, dst);

Page 33: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

33

Optimization Conditions• set of definitely acquired locks per program

point

• Given method call at :

• Program point is failsafe if:

– Infer from by simple backward analysis

Page 34: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

34

Speculation Overheads and Optimizations

Source of Overhead Optimization

Locking shared objects• Redundant locking elimination• Lock elision for iteration private data• Lock domination

Backup original state for rollback • Avoid backups after failsafe points

Page 35: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

35

Modeling ADTs

c d

a b

7

23

4

5

ns es

Abstract State

cont

retnghbrs

Graph<ND,ED> {

@rep set<Node> ns; // nodes @rep set<Edge> es; // edges

@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)@op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n);

}

Graph Spec

a b

5

srcdst

ed

Page 36: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

36

Failsafe Points – Eliminating Undo Actions

c d

a b

7

23

4

5

Graph Node

Graph Edge

Edge Data

lt

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

a

g.neighbors(lt);CAUTIOUS OPERATOR [Mendez et al. PPOPP’10]

Page 37: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

37

Boruvka’s Minimum Spanning Tree Algorithm

• Build MST bottom uprepeat { pick arbitrary active node ‘a’ merge with lightest neighbor ‘lt’ add edge ‘a-lt’ to MST} until graph is a singular node

c d

a b

e f

g

2 4

6

5

3

7

4

1

3

7

Page 38: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

39

Failsafe Points – Eliminating Undo Actions

c d

a b

7

23

4

5

Graph Node

Graph Edge

Edge Data Acquired

New Lock

Redundant

lt

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

a

g.neighbors(lt);CAUTIOUS OPERATOR [Mendez et al. PPOPP’10]

Page 39: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

40

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

Redundant Locking Example

c d

a b

7

23

4

5

Graph Node

Graph Edge

Edge Data Acquired

New Lock

Redundant

lt

a

Page 40: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

41

Boruvka’s Minimum Spanning Tree Algorithm

c d

a b

e f

g

2 4

6

5

3

7

4

1

3

7

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

• Build MST iteratively• Pick random active node• Contract edge with lightest

neighbor

Page 41: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

42

GSet<Node> wl = new GSet<Node>();wl.addAll(g.getNodes());GBag<Weight> mst = new GBag<Weight>();

foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a);}

Parallelism in Boruvka’s Algorithm

c d

a b

e f

g

2 4

6

5

37

4

1 f

• Dependences between activities are functions of runtime values • Parallelism cannot be uncovered

at compile time in general

• Don’t Care Non-Determinism• All produced MSTs correct and

optimal

Page 42: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

43

Abstraction of a Single State

c d

a b

7

23

4

5

a

lt

State after first loop in Boruvka

( a lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧

UniqueRef(EdgeData)

We maintain We loose

Set of definitely locked objects denoted by lockpaths Maybe locked information

Aliasing of top level variables Cardinality of sets

Uniquely pointed-to types and objects referenced from the stack

Content sharing of multiple collections

Page 43: A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of.

44

Hierarchy Summarization Intuition

GraphSet<Node>

Weight

Node

Edge

Iterator<Node>

es

src,dst

nscont

ed

past, at,future

nd

all

gaNghbrsltNghbrsnIter

a, lt, n

w,wan,

minW

e,an

Gset<Node>

gcont

wl

GBag<Weight>

mst

bcont

Void