Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization Omer Tripp John Field Greta Yorsh Mooly Sagiv

Page 1

Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization

Omer Tripp, John Field, Greta Yorsh, Mooly Sagiv

Page 2

Dataflow Impediments to Parallelization

    public void set(Object o) {
        this.f = calc_f(o);
    }

    public void process() {
        Object o = this.f;
        if (o == null) {
            doA();
        } else {
            doB();
        }
    }

    public void setAndProcess(Object o) {
        set(o);
        process();
    }

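Can the two calls run in parallel, i.e. set(o) || process()?

No: process() reads this.f after set(o) writes it, which is a RAW (read-after-write) dependency.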
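A minimal runnable sketch of this dependency follows. The slide does not show the bodies of calc_f, doA or doB, so the versions here are placeholders (assumptions), and the `lastAction` field and `run` helper are hypothetical additions used only to observe which branch was taken.

```java
// Hypothetical completion of the slide's class. calc_f, doA and doB are
// placeholders (assumptions): the slide does not show their bodies.
class RawDependencyDemo {
    private Object f;                // written by set(), read by process()
    private String lastAction = "";  // records which branch process() took

    private Object calc_f(Object o) { return o; }  // assumed: identity

    public void set(Object o) { this.f = calc_f(o); }  // write of f

    public void process() {                            // read of f
        Object o = this.f;
        if (o == null) {
            lastAction = "A";   // stands in for doA()
        } else {
            lastAction = "B";   // stands in for doB()
        }
    }

    // Runs the two calls in the given order and reports the branch taken.
    static String run(boolean setFirst) {
        RawDependencyDemo d = new RawDependencyDemo();
        if (setFirst) {
            d.set("x");
            d.process();
        } else {
            d.process();
            d.set("x");
        }
        return d.lastAction;
    }

    public static void main(String[] args) {
        System.out.println("set; process -> do" + run(true));
        System.out.println("process; set -> do" + run(false));
    }
}
```

Swapping the order changes the branch that process() takes, which is why the two calls cannot simply be run side by side.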

Page 3

Sometimes It’s Less Obvious

    for (Vertex cutpoint : this.cutpoints) {
        UndirectedGraph subgraph = new SimpleGraph();
        subgraph.addVertex(cutpoint);
        this.cutpointGraphs.put(cutpoint, subgraph);
        this.addVertex(subgraph);
        Set blocks = this.vertex2blocks.get(cutpoint);
        for (UndirectedGraph block : blocks) {
            int oldHitCount = this.block2hits.get(block);
            this.block2hits.put(block, oldHitCount + 1);
            this.addEdge(subgraph, block);
        }
    }

Simplified version of the JGraphT algorithm for building a block-cutpoint graph

This loop offers ample parallelism, but a few impediments must be addressed before it can be parallelized.

How can we pinpoint these dependencies precisely and concisely?

Page 4

Field-based Dependence Analysis

Static dependence analysis is challenged by dynamic containers, aliasing, etc.

So let’s use dynamic dependence analysis instead…

Page 5

[Figure: concrete hash-table state of m: a modcount field (7, 8, 9) and table buckets [0] and [8] whose chained entries (key, value, next) hold K and K’.]

    m = new ConcurrentHashMap();
    m.put(k,1);
    m.put(k’,2);
    m.put(k,2);

Spurious dependencies, which inhibit m.put(k,1) || m.put(k’,2)!

Semantic dependency, which gets “lost” in the noise!
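To make the spurious dependency concrete, here is a small sketch of a toy hash table that logs the concrete locations each put writes. It is not the real java.util.HashMap or ConcurrentHashMap; the class, the one-slot-per-bucket layout and the location names are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy hash table that logs every concrete location a put() writes.
// The location names ("modCount", "table[i]") are illustrative and are
// not the real internals of java.util.HashMap or ConcurrentHashMap.
class ConcreteFootprintDemo {
    private static final int BUCKETS = 16;
    private final Object[][] table = new Object[BUCKETS][];  // one {key, value} slot per bucket
    private int modCount = 0;

    // Performs the put and returns the concrete locations it wrote.
    Set<String> put(Object key, Object value) {
        Set<String> writes = new HashSet<>();
        int i = Math.floorMod(key.hashCode(), BUCKETS);
        table[i] = new Object[] {key, value};
        writes.add("table[" + i + "]");
        modCount++;                       // bookkeeping write shared by every put
        writes.add("modCount");
        return writes;
    }

    // Locations written by both operations, i.e. candidate WAW conflicts.
    static Set<String> conflicts(Set<String> w1, Set<String> w2) {
        Set<String> common = new HashSet<>(w1);
        common.retainAll(w2);
        return common;
    }

    public static void main(String[] args) {
        ConcreteFootprintDemo m = new ConcreteFootprintDemo();
        Set<String> w1 = m.put(1, "a");   // keys 1 and 2 land in different buckets
        Set<String> w2 = m.put(2, "b");
        System.out.println(conflicts(w1, w2));  // prints [modCount]
    }
}
```

The two puts touch different keys and different buckets, yet a field-level tracker still reports a conflict on the shared counter.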

Page 6

Eureka: Let’s Use Abstraction

• Abstract Locking
• Galois
• Leveraging ADT semantics in STM conflict detection
• Using ADT semantics in DB concurrency control (Muth et al., 93)
• Exploiting commutativity in DB transactions (Bernstein, 66)

But… We need a predictive tool; our code is still sequential

We want the tool to pinpoint impediments to parallelization before applying parallelization transformations

Page 7

The Hawkeye Analysis Tool

[Figure: a representation function maps the concrete Map state (modcount, table buckets, and key/value/next entries for K and K’) to the Map ADT state (Key/Value pairs K → 1 and K’ → 2).]

• Dynamic analysis tool
• Uses abstraction while tracking (certain) dependencies
• User specifies representation function for data structures of choice; rest tracked concretely
• Allows concentrating on semantic dependencies while suppressing spurious dependencies

Page 8

Specification Language

Map

    foreach key k in m.keySet()
        adtState.add(m -> k);
    foreach entry (k,v) in m.entrySet()
        adtState.add(k -> v);

Graph

    foreach node n in g.nodes()
        adtState.add(g -> n);
    foreach edge (n1,n2) in g.edges()
        adtState.add(n1 -> n2);
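For readers more used to Java than to the specification notation, the Map specification can be approximated as an ordinary method. This is only a sketch: the class name, the `adtState` method and the choice to render each fact as a "source -> target" string are assumptions, not Hawkeye's actual API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the Map representation function in plain Java. Facts are
// rendered as "source -> target" strings; that encoding is an assumption.
class MapRepresentationFunction {
    static Set<String> adtState(String mapName, Map<?, ?> m) {
        Set<String> facts = new LinkedHashSet<>();
        for (Object k : m.keySet()) {
            facts.add(mapName + " -> " + k);                 // m -> k
        }
        for (Map.Entry<?, ?> e : m.entrySet()) {
            facts.add(e.getKey() + " -> " + e.getValue());   // k -> v
        }
        return facts;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("k", 1);
        m.put("k'", 2);
        // prints [m -> k, m -> k', k -> 1, k' -> 2]
        System.out.println(adtState("m", m));
    }
}
```

Nothing about buckets, chaining or modification counters appears in the result, which is the point of the abstraction.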

Page 9

Specification Language

DistanceFunction

    foreach instance i1 in instances()
        foreach instance i2 in instances()
            adtState.add((i1,i2) -> distance(i1,i2));
    …

Page 10

Specification Language

No need to model ADT operations

User can refine approximation (though our experience shows that the default is mostly accurate)

No need for a commutativity spec

Hawkeye uses heuristics for (sound) approximation of the footprint of an ADT operation

Page 11

The Hawkeye Algorithm

[Figure: the concrete state of map M (modcount 7, 8, 9; table buckets with key/value/next entries for K and K’) shown beside its logical state, whose abstract locations include (M,K), (M,K,1), (M,K’), (M,K’,2) and (M,K,2).]

Read and write sets over the logical state:

    m.put(k,1);     (R: {}, W: {(M,K),(M,K,1)})
    m.put(k’,2);    (R: {}, W: {(M,K’),(M,K’,2)})
    m.put(k,2);     (R: {}, W: {(M,K),(M,K,1),(M,K,2)})

WAW (the write sets of m.put(k,1) and m.put(k,2) overlap)

Our assumptions:

• linearizability – for trace abstraction
• encapsulation – for state abstraction
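The three put calls on this slide can be replayed in a small sketch that tracks writes over abstract locations only. It computes each write set as the plain difference between the abstract states before and after the call, which is an assumption of this sketch: the slide's set for the last call also lists (M,K), so Hawkeye's own approximation is evidently a little coarser. All class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AbstractWawDemo {
    // Representation function: abstract locations (M,k) and (M,k,v) per entry.
    static Set<String> abstractState(Map<String, Integer> m) {
        Set<String> state = new HashSet<>();
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            state.add("(M," + e.getKey() + ")");
            state.add("(M," + e.getKey() + "," + e.getValue() + ")");
        }
        return state;
    }

    // Executes m.put(k, v) and returns the abstract locations that appear in
    // exactly one of the before/after abstract states.
    static Set<String> putWriteSet(Map<String, Integer> m, String k, int v) {
        Set<String> before = abstractState(m);
        m.put(k, v);
        Set<String> after = abstractState(m);
        Set<String> delta = new HashSet<>();
        for (String loc : before) {
            if (!after.contains(loc)) delta.add(loc);   // de-allocated
        }
        for (String loc : after) {
            if (!before.contains(loc)) delta.add(loc);  // allocated
        }
        return delta;
    }

    // Two operations are WAW-dependent if their write sets intersect.
    static boolean waw(Set<String> w1, Set<String> w2) {
        return !Collections.disjoint(w1, w2);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        Set<String> w1 = putWriteSet(m, "K", 1);
        Set<String> w2 = putWriteSet(m, "K'", 2);
        Set<String> w3 = putWriteSet(m, "K", 2);
        System.out.println("put(k,1) vs put(k',2): " + waw(w1, w2));  // false
        System.out.println("put(k,1) vs put(k,2):  " + waw(w1, w3));  // true
    }
}
```

Even with the simpler delta, the two puts on distinct keys come out independent while the two puts on the same key still conflict on (M,K,1).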

Page 12

Challenges

• What is the meaning of dependencies under abstraction?

• How can we track both concrete and abstract dependencies simultaneously?

We’ve developed a uniform framework for tracking data dependencies…

Page 13

Best Write Set

• The write set of transition $\tau$ is the union of
  – the locations whose value was changed by $\tau$;
  – the locations allocated by $\tau$; and
  – the locations de-allocated by $\tau$.

Intuitively, the write set of a transition is its observable effect, i.e., the delta between the entry and exit states.

Page 14

Best Read Set (More Tricky)

• $M$ is a sufficient read set of transition $\tau = \langle \sigma, p, \sigma' \rangle$ iff for every $\tau_1 = \langle \sigma_1, p, \sigma_1' \rangle$ such that $\sigma$ and $\sigma_1$ agree on $M$, $\mathrm{write}(\tau) \equiv \mathrm{write}(\tau_1)$.

• The read set of transition $\tau$ is the union of all its minimal sufficient read sets.

Intuitively, the read set of a transition is the set of locations whose values determine the observable effect of the transition.

Page 15

Simple Example

([y=3], set(y,4), [y=4])
Read set: { y }
Write set: { y }
Secures y=4 in exit state

([y=3], set(y,3), [y=3])
Read set: { y }
Write set: { }
Secures empty write set
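Both examples can be checked by brute force on a tiny state space. The sketch below rests on two assumptions that the slides do not spell out: states range over two locations x and y with values in {3, 4}, and two write sets count as equivalent when they cover the same locations with the same exit values. Every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

class BestReadWriteSets {
    // Write effect of running p from entry: each location whose value differs
    // between the entry and exit states, mapped to its exit value.
    static Map<String, Integer> writeEffect(Map<String, Integer> entry,
                                            UnaryOperator<Map<String, Integer>> p) {
        Map<String, Integer> exit = p.apply(new TreeMap<>(entry));
        Set<String> locs = new TreeSet<>(entry.keySet());
        locs.addAll(exit.keySet());
        Map<String, Integer> effect = new TreeMap<>();
        for (String l : locs) {
            if (!Objects.equals(entry.get(l), exit.get(l))) effect.put(l, exit.get(l));
        }
        return effect;
    }

    // True iff the two states hold equal values on every location in m.
    static boolean agree(Map<String, Integer> a, Map<String, Integer> b, Set<String> m) {
        for (String l : m) {
            if (!Objects.equals(a.get(l), b.get(l))) return false;
        }
        return true;
    }

    // Union of the minimal sufficient read sets, found by enumerating subsets.
    static Set<String> readSet(Map<String, Integer> entry,
                               UnaryOperator<Map<String, Integer>> p,
                               List<Map<String, Integer>> allStates) {
        List<String> locs = new ArrayList<>(entry.keySet());
        Map<String, Integer> effect = writeEffect(entry, p);
        List<Set<String>> sufficient = new ArrayList<>();
        for (int mask = 0; mask < (1 << locs.size()); mask++) {
            Set<String> m = new TreeSet<>();
            for (int i = 0; i < locs.size(); i++) {
                if ((mask & (1 << i)) != 0) m.add(locs.get(i));
            }
            boolean ok = true;
            for (Map<String, Integer> other : allStates) {
                if (agree(entry, other, m) && !writeEffect(other, p).equals(effect)) ok = false;
            }
            if (ok) sufficient.add(m);
        }
        Set<String> read = new TreeSet<>();
        for (Set<String> m : sufficient) {
            boolean minimal = true;
            for (Set<String> n : sufficient) {
                if (!n.equals(m) && m.containsAll(n)) minimal = false;  // proper subset suffices
            }
            if (minimal) read.addAll(m);
        }
        return read;
    }

    // All states over locations x and y with values drawn from {3, 4}.
    static List<Map<String, Integer>> smallStateSpace() {
        List<Map<String, Integer>> states = new ArrayList<>();
        for (int x = 3; x <= 4; x++) {
            for (int y = 3; y <= 4; y++) {
                Map<String, Integer> s = new TreeMap<>();
                s.put("x", x);
                s.put("y", y);
                states.add(s);
            }
        }
        return states;
    }

    // The statement set(y, v) as a state transformer.
    static UnaryOperator<Map<String, Integer>> setY(int v) {
        return s -> { s.put("y", v); return s; };
    }

    public static void main(String[] args) {
        Map<String, Integer> entry = new TreeMap<>();
        entry.put("x", 3);
        entry.put("y", 3);
        List<Map<String, Integer>> all = smallStateSpace();
        System.out.println(readSet(entry, setY(4), all) + " " + writeEffect(entry, setY(4)).keySet());  // [y] [y]
        System.out.println(readSet(entry, setY(3), all) + " " + writeEffect(entry, setY(3)).keySet());  // [y] []
    }
}
```

In both cases y ends up in the read set because the write set depends on whether y already held the assigned value, while the unrelated location x never does.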

Page 16

Approximating the “Best” Definitions

• The good news: The “best” definitions apply both in concrete and in abstract semantics

• The bad news: The definition of the “best” read set is not computable in general

An approximation $r$, $w$ of read, write is sound iff
• $\mathrm{read} \subseteq r \cup w$
• $\mathrm{write} \subseteq w$

Page 17

Usage Scenario

[Figure: concrete Map state: modcount, table buckets [0] and [8], and chained entries (key, value, next) for K and K’.]

Hmmm… Too many dependencies!

Page 18

Usage Scenario

[Figure: Map ADT state with Key/Value pairs K → 1 and K’ → 2.]

Now I understand what’s going on!

Page 19

Usage Scenario

[Figure: Map ADT state with Key/Value pairs K → 1 and K’ → 2.]

Page 20

Name        Description                                      Trace Length
Boruvka     Solver for MST problem                           813,382
Cobertura   Java code coverage analysis                      2,629,457
JFileSync   Utility for synchronizing pairs of directories   1,733,552
JGraphT     Graph library                                    710,580
PMD         Java source code analyzer                        2,190,213
Weka        Machine-learning library                         17,945,255
WebLech     Web-site download and mirror tool                4,840,544

Page 21

[Bar chart: number of inter-iteration dependencies at the level of ADT operations with and without abstraction (Hawkeye vs. Baseline) for Boruvka, Cobertura, JFileSync, JGraphT, PMD and Weka; y-axis from 0 to 300.]

Only built-in spec (Java collections)

Page 22

[Bar chart: number of inter-iteration dependencies at the level of ADT operations with and without abstraction (Hawkeye vs. Baseline) for Boruvka, Cobertura, JFileSync, JGraphT, PMD and Weka; y-axis from 0 to 300.]

Including user spec (for user types)

Page 23

[Figure: the hash-table diagram (modcount, table buckets, next pointers) with its entries spelling out “THANK YOU!”]

Page 24

Backup

Page 25

Preliminaries

• A state $\sigma : L \to V$ maps memory locations to values.

• A transition is a triple $\langle \sigma, p, \sigma' \rangle$, where $p$ is a program statement and $\sigma, \sigma'$ are states, such that $\sigma' = [\![p]\!](\sigma)$.

• A program trace is a sequence of transitions.

• We assume an interleaving semantics of concurrency.

Page 31

Approximate Read Set

Take 1: all the locations reachable from arguments

Take 2: all the locations reachable from arguments that were accessed during the statement’s execution

Take 3: all the locations reachable from arguments that were accessed during the statement’s execution, with user specification of the frame
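The first two takes can be sketched over a toy heap graph. The sketch assumes the heap is given as a map from each location to the locations it points to, and that the set of accessed locations was recorded separately during execution; both are simplifications, and all names are hypothetical. Take 3 is left out because the slides do not describe the form of the frame specification.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ApproxReadSet {
    // Take 1: every location reachable from the arguments in the heap graph.
    static Set<String> reachable(Map<String, List<String>> heap, Collection<String> args) {
        Set<String> seen = new HashSet<>(args);
        Deque<String> work = new ArrayDeque<>(args);
        while (!work.isEmpty()) {
            String loc = work.pop();
            for (String next : heap.getOrDefault(loc, Collections.emptyList())) {
                if (seen.add(next)) work.push(next);   // visit each location once
            }
        }
        return seen;
    }

    // Take 2: reachable locations that the statement actually accessed.
    static Set<String> reachableAndAccessed(Map<String, List<String>> heap,
                                            Collection<String> args,
                                            Set<String> accessed) {
        Set<String> result = reachable(heap, args);
        result.retainAll(accessed);
        return result;
    }

    // A toy heap: map m points to its table, which holds two entries.
    static Map<String, List<String>> sampleHeap() {
        Map<String, List<String>> heap = new HashMap<>();
        heap.put("m", Arrays.asList("table"));
        heap.put("table", Arrays.asList("e1", "e2"));
        heap.put("e1", Arrays.asList("k1", "v1"));
        heap.put("e2", Arrays.asList("k2", "v2"));
        return heap;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = sampleHeap();
        Set<String> accessed = new HashSet<>(Arrays.asList("m", "table", "e1", "k1", "other"));
        System.out.println(reachable(heap, Arrays.asList("m")).size());            // 8
        System.out.println(reachableAndAccessed(heap, Arrays.asList("m"), accessed).size());  // 4
    }
}
```

Take 2 shrinks the approximation from the whole structure to the part of it that the statement touched, and ignores accessed locations that are not reachable from the arguments.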