Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization Omer Tripp John Field Greta...

Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization

Omer Tripp, John Field, Greta Yorsh, Mooly Sagiv

Dataflow Impediments to Parallelization

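```java
// Illustrative wrapper class; calc_f, doA, and doB are left abstract.
abstract class Example {
    private Object f;

    abstract Object calc_f(Object o);
    abstract void doA();
    abstract void doB();

    public void set(Object o) {
        this.f = calc_f(o);   // writes this.f
    }

    public void process() {
        Object o = this.f;    // reads this.f
        if (o == null) {
            doA();
        } else {
            doB();
        }
    }

    public void setAndProcess(Object o) {
        set(o);
        process();
    }
}
```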

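set(o) || process()?

No: process() reads this.f, which set(o) writes, so the two calls are linked by a RAW (read-after-write) dependency.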
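To make the RAW dependency concrete, here is a minimal sketch of field-level dynamic dependence tracking. It assumes a hypothetical trace in which each operation's read and write sets are filled in by hand (a real tool would obtain them through instrumentation); the class `RawDemo`, the `Op` type, and the location name `this.f` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RawDemo {
    // One operation in a dynamic trace, with the heap locations it read and wrote.
    public static final class Op {
        public final String name;
        public final Set<String> reads = new LinkedHashSet<>();
        public final Set<String> writes = new LinkedHashSet<>();

        public Op(String name) { this.name = name; }
    }

    // Links each read to the most recent earlier write of the same location (RAW).
    public static List<String> rawDependencies(List<Op> trace) {
        Map<String, String> lastWriter = new HashMap<>();
        List<String> deps = new ArrayList<>();
        for (Op op : trace) {
            for (String loc : op.reads) {
                String writer = lastWriter.get(loc);
                if (writer != null) {
                    deps.add(writer + " -> " + op.name + " on " + loc);
                }
            }
            for (String loc : op.writes) {
                lastWriter.put(loc, op.name);
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        Op set = new Op("set(o)");
        set.writes.add("this.f");      // this.f = calc_f(o);
        Op process = new Op("process()");
        process.reads.add("this.f");   // Object o = this.f;
        System.out.println(rawDependencies(List.of(set, process)));
        // prints [set(o) -> process() on this.f]
    }
}
```

Linking each read only to the last writer, rather than to every earlier writer, avoids reporting dependencies that an intermediate write has already overwritten.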

Sometimes It’s Less Obvious

```java
// `this` is itself a graph whose vertices are subgraphs (see this.addVertex(subgraph)).
for (Vertex cutpoint : this.cutpoints) {
    UndirectedGraph subgraph = new SimpleGraph();
    subgraph.addVertex(cutpoint);
    this.cutpointGraphs.put(cutpoint, subgraph);
    this.addVertex(subgraph);
    Set<UndirectedGraph> blocks = this.vertex2blocks.get(cutpoint);
    for (UndirectedGraph block : blocks) {
        // Read-modify-write of a shared counter: iterations whose cutpoints
        // share a block read and update the same block2hits entry.
        int oldHitCount = this.block2hits.get(block);
        this.block2hits.put(block, oldHitCount + 1);
        this.addEdge(subgraph, block);
    }
}
```

Simplified version of the JGraphT algorithm for building a block-cutpoint graph

This code has plenty of available parallelism, but a few impediments must be addressed before it can be parallelized.
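In the loop above, one impediment is the read-modify-write of block2hits: each iteration reads a block's counter and writes it back incremented, so two iterations whose cutpoints share a block depend on each other. The sketch below predicts which iterations collide on that counter. It assumes vertex2blocks is available as a plain `Map<String, Set<String>>`; the class `CutpointConflicts` and the names `c1`, `B1`, and so on are hypothetical, and the other shared accesses (addVertex, addEdge, cutpointGraphs.put) are not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CutpointConflicts {
    // For each pair of cutpoints (outer-loop iterations), report the blocks whose
    // block2hits entry both iterations read and then write (get followed by put).
    public static List<String> conflicts(Map<String, Set<String>> vertex2blocks) {
        List<String> cutpoints = new ArrayList<>(vertex2blocks.keySet());
        Collections.sort(cutpoints); // deterministic iteration order
        List<String> out = new ArrayList<>();
        for (int i = 0; i < cutpoints.size(); i++) {
            for (int j = i + 1; j < cutpoints.size(); j++) {
                Set<String> shared = new TreeSet<>(vertex2blocks.get(cutpoints.get(i)));
                shared.retainAll(vertex2blocks.get(cutpoints.get(j)));
                if (!shared.isEmpty()) {
                    out.add(cutpoints.get(i) + "/" + cutpoints.get(j) + " on " + shared);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: cutpoints c1 and c2 both belong to block B2.
        Map<String, Set<String>> v2b = Map.of(
            "c1", Set.of("B1", "B2"),
            "c2", Set.of("B2", "B3"),
            "c3", Set.of("B4"));
        System.out.println(conflicts(v2b)); // [c1/c2 on [B2]]
    }
}
```

Only iterations that share a block interfere on block2hits, which is why the loop still admits substantial parallelism.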

How can we pinpoint these dependencies precisely and concisely?

Field-based Dependence Analysis

Static dependence analysis is challenged by dynamic containers, aliasing, etc.

So let’s use dynamic dependence analysis instead…

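[Figure: concrete heap layout of the map m: a modcount field (789), a bucket array table whose slots ([0], [8]) hold entries linked through next, and entries with key K / value 1 and key K’ / value 2.]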

m.put(k,1);
m.put(k’,2);

Spurious dependencies, which inhibit m.put(k,1) || m.put(k’,2)!

m = new ConcurrentHashMap();
m.put(k,2);

Semantic dependency, which gets “lost” in the noise!
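To see where spurious dependencies come from, the sketch below uses a hypothetical toy chained hash map (not java.util.HashMap or ConcurrentHashMap) that logs the concrete locations each put touches. Its modCount and size fields play the role of the shared bookkeeping in the figure. Assuming, as the calls suggest, that the semantic dependency is the one between m.put(k,1) and m.put(k,2) on the same key, the sketch shows both kinds of overlap side by side.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SpuriousDemo {
    // Toy chained hash map (not java.util.HashMap) that logs the concrete
    // locations read and written by the most recent put().
    public static final class ToyMap {
        static final class Entry {
            final Object key;
            Object value;
            final Entry next;
            Entry(Object key, Object value, Entry next) {
                this.key = key; this.value = value; this.next = next;
            }
        }

        final Entry[] table = new Entry[16];
        int modCount;
        int size;
        public Set<String> reads = new HashSet<>();
        public Set<String> writes = new HashSet<>();

        public void put(Object key, Object value) {
            reads = new HashSet<>();
            writes = new HashSet<>();
            int i = (key.hashCode() & 0x7fffffff) % table.length;
            reads.add("table[" + i + "]");
            for (Entry e = table[i]; e != null; e = e.next) {
                reads.add("entry(" + e.key + ").key");
                if (e.key.equals(key)) {            // existing key: overwrite the value only
                    e.value = value;
                    writes.add("entry(" + e.key + ").value");
                    return;
                }
            }
            table[i] = new Entry(key, value, table[i]); // new key: link a fresh entry
            writes.add("table[" + i + "]");
            modCount++;                                  // bookkeeping shared by every insertion
            size++;
            reads.add("modCount"); writes.add("modCount");
            reads.add("size"); writes.add("size");
        }
    }

    // Locations common to two footprints, sorted for stable printing.
    public static Set<String> overlap(Set<String> a, Set<String> b) {
        Set<String> shared = new TreeSet<>(a);
        shared.retainAll(b);
        return shared;
    }

    public static void main(String[] args) {
        ToyMap m = new ToyMap();
        m.put("k", 1);
        Set<String> w1 = m.writes;
        m.put("k'", 2);
        Set<String> w2 = m.writes;
        // Different keys, yet both puts write modCount and size: a spurious WAW.
        System.out.println(overlap(w1, w2)); // [modCount, size]
        m.put("k", 2);
        // Same key: put(k,2) reads the bucket that put(k,1) wrote.
        System.out.println(overlap(w1, m.reads)); // [table[11]]
    }
}
```

At the concrete level, both overlaps look alike, which is exactly how the semantic dependency gets lost among the spurious ones.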

Eureka: Let’s Use Abstraction

• Abstract locking
• Galois
• Leveraging ADT semantics in STM conflict detection
• Using ADT semantics in DB concurrency control (Muth et al., 93)
• Exploiting commutativity in DB transactions (Bernstein, 66)

But… we need a predictive tool: our code is still sequential.

We want the tool to pinpoint impediments to parallelization before any parallelization transformations are applied.

The Hawkeye Analysis Tool

[Figure: a representation function maps the concrete Map state (the modcount field, the bucket array table, and the linked entries for keys K and K’) to the Map ADT state, a Key/Value table holding K → 1 and K’ → 2.]

• Dynamic analysis tool
• Uses abstraction while tracking (certain) dependencies
• The user specifies a representation function for data structures of choice; the rest is tracked concretely
• Allows concentrating on semantic dependencies while suppressing spurious dependencies

Specification Language

Map:
```
foreach key k in m.keySet() adtState.add(m -> k);
foreach entry (k,v) in m.entrySet() adtState.add(k -> v);
```

Graph:
```
foreach node n in g.nodes() adtState.add(g -> n);
foreach edge (n1,n2) in g.edges() adtState.add(n1 -> n2);
```

DistanceFunction:
```
foreach instance i1 in instances()
  foreach instance i2 in instances()
    adtState.add((i1,i2) -> distance(i1,i2));
…
```
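For readers more comfortable with Java than with the specification language, here is a rough Java analogue of the Map and Graph specifications. It is a sketch, not Hawkeye's API: the class `RepFunctionDemo` is hypothetical, each `src -> dst` edge is encoded as a two-element list, and the graph is assumed to be stored as an adjacency map in which every node is a key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RepFunctionDemo {
    // One abstract edge "src -> dst"; Arrays.asList tolerates null keys/values.
    static List<Object> edge(Object src, Object dst) {
        return Arrays.asList(src, dst);
    }

    // Map spec: m -> k for each key, k -> v for each entry.
    public static <K, V> Set<List<Object>> mapState(Object m, Map<K, V> map) {
        Set<List<Object>> adtState = new HashSet<>();
        for (K k : map.keySet()) adtState.add(edge(m, k));
        for (Map.Entry<K, V> e : map.entrySet()) adtState.add(edge(e.getKey(), e.getValue()));
        return adtState;
    }

    // Graph spec over a hypothetical adjacency-map encoding:
    // g -> n for each node, n1 -> n2 for each edge.
    public static <N> Set<List<Object>> graphState(Object g, Map<N, Set<N>> adjacency) {
        Set<List<Object>> adtState = new HashSet<>();
        for (Map.Entry<N, Set<N>> e : adjacency.entrySet()) {
            adtState.add(edge(g, e.getKey()));
            for (N n2 : e.getValue()) adtState.add(edge(e.getKey(), n2));
        }
        return adtState;
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        hash.put("k", 1);
        hash.put("k'", 2);
        Map<String, Integer> tree = new TreeMap<>();
        tree.put("k'", 2);
        tree.put("k", 1);
        // Different classes and heap layouts, identical abstract state.
        System.out.println(mapState("M", hash).equals(mapState("M", tree))); // true
    }
}
```

The point of the design is that internal fields such as table or modcount never appear in the abstract state, so updates to them cannot produce dependencies at this level.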


• No need to model ADT operations
  – Hawkeye uses heuristics for (sound) approximation of the footprint of an ADT operation
  – The user can refine the approximation (though our experience shows that the default is mostly accurate)
• No need for a commutativity spec

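The Hawkeye Algorithm

[Figure: the concrete state of map M (modcount 789, a bucket array table, and entries for keys K and K’) shown beside its logical state, in which M has abstract locations (M,K) and (M,K’) with value locations (M,K,1) and (M,K’,2); after m.put(k,2), (M,K,1) is replaced by (M,K,2).]

m.put(k,1);    (R: {}, W: {(M,K),(M,K,1)})
m.put(k’,2);   (R: {}, W: {(M,K’),(M,K’,2)})
m.put(k,2);    (R: {}, W: {(M,K),(M,K,1),(M,K,2)})

The first and third operations both write (M,K) and (M,K,1): a WAW dependency. The write set of the second operation is disjoint from both, so it is independent of them.

Our assumptions:

• linearizability, for trace abstraction
• encapsulation, for state abstraction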
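The abstract write sets of the three put operations can be reproduced mechanically. A minimal sketch, assuming our reading of the notation: the abstract location (M,K) holds the value bound to key K, and (M,K,v) stands for the value node v under K. It snapshots the abstract state before and after each put and collects the locations that changed, appeared (allocated), or disappeared (deallocated). The class `AbstractWriteSets` is hypothetical and is not Hawkeye's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class AbstractWriteSets {
    // Abstract locations of a map named mapId: (M,K) -> value, (M,K,v) -> v.
    static Map<String, Object> abstractState(String mapId, Map<String, Integer> m) {
        Map<String, Object> locs = new HashMap<>();
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            locs.put("(" + mapId + "," + e.getKey() + ")", e.getValue());
            locs.put("(" + mapId + "," + e.getKey() + "," + e.getValue() + ")", e.getValue());
        }
        return locs;
    }

    // Write set = changed locations + allocated locations + deallocated locations.
    static Set<String> writeSet(Map<String, Object> before, Map<String, Object> after) {
        Set<String> w = new TreeSet<>();
        for (String loc : before.keySet()) {
            if (!after.containsKey(loc) || !Objects.equals(before.get(loc), after.get(loc))) {
                w.add(loc); // deallocated or changed
            }
        }
        for (String loc : after.keySet()) {
            if (!before.containsKey(loc)) w.add(loc); // allocated
        }
        return w;
    }

    // Runs m.put(key, value) and returns its abstract write set.
    public static Set<String> put(Map<String, Integer> m, String key, int value) {
        Map<String, Object> before = abstractState("M", m);
        m.put(key, value);
        return writeSet(before, abstractState("M", m));
    }

    // Two operations are WAW-dependent if their write sets intersect.
    public static boolean waw(Set<String> w1, Set<String> w2) {
        return !Collections.disjoint(w1, w2);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        Set<String> w1 = put(m, "K", 1);   // [(M,K), (M,K,1)]
        Set<String> w2 = put(m, "K'", 2);  // [(M,K'), (M,K',2)]
        Set<String> w3 = put(m, "K", 2);   // [(M,K), (M,K,1), (M,K,2)]
        System.out.println(w1 + " " + w2 + " " + w3);
        System.out.println(waw(w1, w2) + " " + waw(w1, w3)); // false true
    }
}
```

Because only abstract locations are compared, the shared modcount and table fields never show up, and the single remaining WAW is the semantic one.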

Challenges

• What is the meaning of dependencies under abstraction?

• How can we track both concrete and abstract dependencies simultaneously?

We’ve developed a uniform framework for tracking data dependencies…

Best Write Set

• The write set of a transition $\langle \sigma, p, \sigma' \rangle$ is the union of:
  – the locations whose value was changed by the transition;
  – the locations allocated by the transition; and
  – the locations deallocated by the transition.

Intuitively, the write set of a transition is its observable effect, i.e., the delta between the entry and exit states.
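Treating a state as a partial map from locations to values, so that allocation adds a location to the domain and deallocation removes one (a modeling assumption on our part), the three cases combine into a single formula:

```latex
\mathit{write}(\langle \sigma, p, \sigma' \rangle) =
    \{\, \ell \in \mathrm{dom}(\sigma) \cap \mathrm{dom}(\sigma') \mid \sigma(\ell) \neq \sigma'(\ell) \,\}
    \;\cup\; \bigl(\mathrm{dom}(\sigma') \setminus \mathrm{dom}(\sigma)\bigr)
    \;\cup\; \bigl(\mathrm{dom}(\sigma) \setminus \mathrm{dom}(\sigma')\bigr)
```

For example, for $\langle [y=3], \mathit{set}(y,4), [y=4] \rangle$ nothing is allocated or deallocated and only $y$ changes, so the write set is $\{y\}$; for $\langle [y=3], \mathit{set}(y,3), [y=3] \rangle$ all three sets are empty.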

Best Read Set (More Tricky)

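• A set of locations $M$ is a sufficient read set of a transition $\langle \sigma, p, \sigma' \rangle$ iff for every transition $\langle \sigma_1, p, \sigma_1' \rangle$ such that $\sigma$ and $\sigma_1$ agree on $M$, $\mathit{write}(\langle \sigma, p, \sigma' \rangle) \equiv \mathit{write}(\langle \sigma_1, p, \sigma_1' \rangle)$.

• The read set of a transition is the union of all its minimal sufficient read sets.

Intuitively, the read set of a transition is the set of locations whose values determine the observable effect of the transition.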

Simple Example

([y=3], set(y,4), [y=4])    Read set: { y }    Write set: { y }    (reading y secures y=4 in the exit state)

([y=3], set(y,3), [y=3])    Read set: { y }    Write set: { }    (reading y secures the empty write set)
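On a finite toy domain, the "best" read set can be computed by brute force, which makes the definition concrete. This sketch assumes two locations, y and z, each holding a value in 0..4, and compares observable effects as maps from changed locations to new values. It enumerates every candidate set M, keeps the sufficient ones, and unions the minimal ones. The class `BestReadSet` and helper `setY` are hypothetical; real programs have unbounded state spaces, so this enumeration does not carry over.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

public class BestReadSet {
    static final List<String> LOCS = List.of("y", "z");
    static final int DOMAIN = 5; // toy assumption: every location holds a value in 0..4

    // Every state over LOCS; finiteness is what makes brute force possible here.
    static List<Map<String, Integer>> allStates() {
        List<Map<String, Integer>> states = new ArrayList<>();
        for (int y = 0; y < DOMAIN; y++)
            for (int z = 0; z < DOMAIN; z++)
                states.add(Map.of("y", y, "z", z));
        return states;
    }

    // Observable effect: changed locations with their new values (no allocation here).
    public static Map<String, Integer> write(Map<String, Integer> pre, Map<String, Integer> post) {
        Map<String, Integer> delta = new HashMap<>();
        for (String l : LOCS)
            if (!pre.get(l).equals(post.get(l))) delta.put(l, post.get(l));
        return delta;
    }

    // m is sufficient iff every state agreeing with sigma on m yields the same effect.
    static boolean sufficient(Set<String> m, Map<String, Integer> sigma,
                              UnaryOperator<Map<String, Integer>> p) {
        Map<String, Integer> effect = write(sigma, p.apply(sigma));
        for (Map<String, Integer> s1 : allStates()) {
            boolean agrees = m.stream().allMatch(l -> s1.get(l).equals(sigma.get(l)));
            if (agrees && !write(s1, p.apply(s1)).equals(effect)) return false;
        }
        return true;
    }

    // Union of all minimal sufficient read sets, by enumerating subsets of LOCS.
    public static Set<String> readSet(Map<String, Integer> sigma,
                                      UnaryOperator<Map<String, Integer>> p) {
        List<Set<String>> suff = new ArrayList<>();
        for (int mask = 0; mask < (1 << LOCS.size()); mask++) {
            Set<String> m = new TreeSet<>();
            for (int i = 0; i < LOCS.size(); i++)
                if ((mask & (1 << i)) != 0) m.add(LOCS.get(i));
            if (sufficient(m, sigma, p)) suff.add(m);
        }
        Set<String> read = new TreeSet<>();
        for (Set<String> m : suff) {
            boolean minimal = suff.stream().noneMatch(o -> !o.equals(m) && m.containsAll(o));
            if (minimal) read.addAll(m);
        }
        return read;
    }

    // The statement set(y, c) from the example.
    public static UnaryOperator<Map<String, Integer>> setY(int c) {
        return s -> {
            Map<String, Integer> t = new HashMap<>(s);
            t.put("y", c);
            return t;
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> sigma = Map.of("y", 3, "z", 0);
        System.out.println(readSet(sigma, setY(4)) + " " + write(sigma, setY(4).apply(sigma))); // [y] {y=4}
        System.out.println(readSet(sigma, setY(3)) + " " + write(sigma, setY(3).apply(sigma))); // [y] {}
    }
}
```

In both transitions the empty set is not sufficient (a start state with y=4 yields a different effect), {y} is, and z never matters, so the read set is { y } in both cases, matching the example.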

Approximating the “Best” Definitions

• The good news: The “best” definitions apply both in concrete and in abstract semantics

• The bad news: The definition of the “best” read set is not computable in general

An approximation $(r, w)$ of $(\mathit{read}, \mathit{write})$ is sound iff:
• $\mathit{read} \subseteq r \cup w$
• $\mathit{write} \subseteq w$
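The soundness conditions can be checked mechanically once the sets are known. A minimal sketch, assuming finite sets of location names and the conditions read ⊆ r ∪ w and write ⊆ w, applied to the best sets of ([y=3], set(y,4), [y=4]), namely read = { y } and write = { y }. The class `SoundnessCheck` is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class SoundnessCheck {
    // (r, w) soundly approximates (read, write) iff
    // read is a subset of (r union w) and write is a subset of w.
    public static boolean isSound(Set<String> read, Set<String> write,
                                  Set<String> r, Set<String> w) {
        Set<String> rOrW = new HashSet<>(r);
        rOrW.addAll(w);
        return rOrW.containsAll(read) && w.containsAll(write);
    }

    public static void main(String[] args) {
        Set<String> read = Set.of("y");
        Set<String> write = Set.of("y");
        System.out.println(isSound(read, write, Set.of(), Set.of("y")));              // true: y is covered by w
        System.out.println(isSound(read, write, Set.of("y"), Set.of()));              // false: the write to y is missed
        System.out.println(isSound(read, write, Set.of("y", "z"), Set.of("y", "z"))); // true: over-approximating is allowed
    }
}
```

The first case shows why the read condition is stated against r ∪ w: a location already reported as written need not also be reported as read.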

Usage Scenario

[Figure: dependencies shown over the concrete map state (modcount, the bucket array table, and the entries for keys K and K’).]

Hmmm… Too many dependencies!

[Figure: the same scenario viewed through the Map ADT state: K → 1 and K’ → 2.]

Now I understand what’s going on!


| Name | Description | Trace Length |
|---|---|---|
| Boruvka | Solver for MST problem | 813,382 |
| Cobertura | Java code coverage analysis | 2,629,457 |
| JFileSync | Utility for synchronizing pairs of directories | 1,733,552 |
| JGraphT | Graph library | 710,580 |
| PMD | Java source code analyzer | 2,190,213 |
| Weka | Machine-learning library | 17,945,255 |
| WebLech | Web-site download and mirror tool | 4,840,544 |

[Chart: number of inter-iteration dependencies at the level of ADT operations, with abstraction (Hawkeye) and without it (Baseline), for Boruvka, Cobertura, JFileSync, JGraphT, PMD, and Weka; y-axis from 0 to 300. Only the built-in spec (Java collections) is used.]

[Chart: number of inter-iteration dependencies at the level of ADT operations, with abstraction (Hawkeye) and without it (Baseline), for Boruvka, Cobertura, JFileSync, JGraphT, PMD, and Weka; y-axis from 0 to 300. Including the user spec (for user types).]

[Closing slide: “THANK YOU!”, drawn as entries chained in the hash-table diagram.]

Backup

Preliminaries

• A state $\sigma : L \to V$ maps memory locations to values.

• A transition is a triple $\langle \sigma, p, \sigma' \rangle$, where $p$ is a program statement and $\sigma, \sigma'$ are states, such that executing $p$ in $\sigma$ yields $\sigma'$.

• A program trace is a sequence of transitions.

• We assume an interleaving semantics of concurrency.


Approximate Read Set

Take 1: all the locations reachable from arguments

Take 2: all the locations reachable from arguments that were accessed during the statement’s execution

Take 3: all the locations reachable from arguments that were accessed during the statement’s execution, with user specification of the frame
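Takes 1 and 2 can be illustrated on an explicit object graph. This sketch works at object granularity rather than field granularity and assumes a hypothetical points-to map (object → objects its fields reference) plus a hypothetical log of objects the statement touched; the class `ApproxReadSet` and the object names are illustrative. Take 3 would further restrict the result using a user-supplied frame, which is not modeled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ApproxReadSet {
    // Take 1: every object reachable from the arguments, by graph search.
    public static Set<String> reachable(Map<String, List<String>> heap, List<String> args) {
        Set<String> seen = new LinkedHashSet<>(args);
        Deque<String> work = new ArrayDeque<>(args);
        while (!work.isEmpty()) {
            for (String succ : heap.getOrDefault(work.pop(), List.of())) {
                if (seen.add(succ)) work.push(succ); // visit each object once
            }
        }
        return seen;
    }

    // Take 2: keep only the reachable objects the statement actually accessed.
    public static Set<String> take2(Map<String, List<String>> heap, List<String> args,
                                    Set<String> accessed) {
        Set<String> r = new TreeSet<>(reachable(heap, args));
        r.retainAll(accessed);
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical heap: map m -> table -> entries e1, e3; e1 -> e2 (same bucket).
        Map<String, List<String>> heap = Map.of(
            "m", List.of("table"),
            "table", List.of("e1", "e3"),
            "e1", List.of("e2"));
        // Hypothetical access log; "other" is touched but not reachable from m.
        Set<String> accessed = Set.of("table", "e1", "other");
        System.out.println(reachable(heap, List.of("m")));        // Take 1: m, table, e1, e3, e2
        System.out.println(take2(heap, List.of("m"), accessed));  // Take 2: [e1, table]
    }
}
```

Take 2 is tighter than Take 1 because entries such as e2 and e3, though reachable, were never touched by the statement.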
