Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization Omer Tripp John Field Greta...

  • date post

    17-Dec-2015

  • Slide 1
  • Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization
    Omer Tripp, John Field, Greta Yorsh, Mooly Sagiv
  • Slide 2
  • Dataflow Impediments to Parallelization

        public void set(Object o) { this.f = calc_f(o); }

        public void process() {
            Object o = this.f;
            if (o == null) { doA(); } else { doB(); }
        }

        public void setAndProcess(Object o) { set(o); process(); }

    Can we run set(o) || process()? No: process() reads this.f, which set(o) writes, so there is a RAW (read-after-write) dependency.
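
To make the RAW dependency concrete, here is a minimal sketch that runs the two calls in both orders. The bodies of `calcF`, `doA` and `doB` are assumptions made for illustration (the slide does not show them), and the class name `RawDependencyDemo` is hypothetical.

```java
// Hypothetical completion of the slide's class; calcF, doA and doB bodies are assumed.
class RawDependencyDemo {
    private Object f;                                   // shared field: written by set, read by process
    private final StringBuilder log = new StringBuilder();

    private Object calcF(Object o) { return o; }        // assumption: identity
    private void doA() { log.append("A"); }             // assumption: just records the branch taken
    private void doB() { log.append("B"); }

    public void set(Object o) { this.f = calcF(o); }    // WRITE of this.f

    public void process() {
        Object o = this.f;                              // READ of this.f
        if (o == null) { doA(); } else { doB(); }
    }

    /** Sequential order from the slide: process() observes the value written by set(o). */
    static String sequentialOrder(Object o) {
        RawDependencyDemo d = new RawDependencyDemo();
        d.set(o);
        d.process();
        return d.log.toString();
    }

    /** One schedule a parallel run could produce: process() executes before set(o). */
    static String reorderedOrder(Object o) {
        RawDependencyDemo d = new RawDependencyDemo();
        d.process();
        d.set(o);
        return d.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(sequentialOrder("x"));       // prints "B"
        System.out.println(reorderedOrder("x"));        // prints "A"
    }
}
```

The two schedules take different branches, which is why set(o) and process() cannot simply be run in parallel.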
  • Slide 3
  • Sometimes It's Less Obvious

        for (Vertex cutpoint : this.cutpoints) {
            UndirectedGraph subgraph = new SimpleGraph();
            subgraph.addVertex(cutpoint);
            this.cutpointGraphs.put(cutpoint, subgraph);
            this.addVertex(subgraph);
            Set<UndirectedGraph> blocks = this.vertex2blocks.get(cutpoint);
            for (UndirectedGraph block : blocks) {
                int oldHitCount = this.block2hits.get(block);
                this.block2hits.put(block, oldHitCount + 1);
                this.addEdge(subgraph, block);
            }
        }

    Simplified version of the JGraphT algorithm for building a block-cutpoint graph.
    This code admits a lot of available parallelism, but a few impediments must be addressed before it can be parallelized.
    How can we pinpoint these dependencies precisely and concisely?
  • Slide 4
  • Field-based Dependence Analysis
    Static dependence analysis is challenged by dynamic containers, aliasing, etc.
    So let's use dynamic dependence analysis instead.
  • Slide 5
  • [Figure: concrete heap state of the map: modcount, table with buckets [0] and [8], and entries with key, value and next fields]
    m = new ConcurrentHashMap(); m.put(k,1); m.put(k,2);
    Spurious dependencies, which inhibit m.put(k,1) || m.put(k,2)!
    Semantic dependency, which gets lost in the noise!
  • Slide 6
  • Eureka: Let's Use Abstraction
      • Abstract locking
      • Galois
      • Leveraging ADT semantics in STM conflict detection
      • Using ADT semantics in DB concurrency control (Muth et al., 93)
      • Exploiting commutativity in DB transactions (Bernstein, 66)
    But:
      • We need a predictive tool; our code is still sequential
      • We want the tool to pinpoint impediments to parallelization before applying parallelization transformations
  • Slide 7
  • The Hawkeye Analysis Tool
    [Figure: a representation function maps the concrete Map state (modcount, table, entries with key, value and next fields) to the Map ADT state (key-value pairs)]
      • Dynamic analysis tool
      • Uses abstraction while tracking (certain) dependencies
      • User specifies a representation function for data structures of choice; the rest is tracked concretely
      • Allows concentrating on semantic dependencies while suppressing spurious dependencies
  • Slide 8
  • Specification Language
    Map:

        foreach key k in m.keySet()
            adtState.add(m -> k);
        foreach entry (k,v) in m.entrySet()
            adtState.add(k -> v);

    Graph:

        foreach node n in g.nodes()
            adtState.add(g -> n);
        foreach edge (n1,n2) in g.edges()
            adtState.add(n1 -> n2);
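
As an illustration only, the Graph specification above could be mirrored in plain Java roughly as follows. This is a sketch, not Hawkeye's specification language: the adjacency-map graph, the `"node"`/`"edge"` tags and the class name `GraphRepresentation` are all assumptions.

```java
import java.util.*;

class GraphRepresentation {
    /**
     * Representation function for a hypothetical adjacency-map graph.
     * Emits one ("node", g, n) fact per node and one ("edge", n1, n2) fact per edge,
     * mirroring adtState.add(g -> n) and adtState.add(n1 -> n2).
     */
    static Set<List<String>> abstractState(String graphId, Map<String, List<String>> adjacency) {
        Set<List<String>> adtState = new HashSet<>();
        for (String n : adjacency.keySet()) {
            adtState.add(List.of("node", graphId, n));
        }
        for (Map.Entry<String, List<String>> e : adjacency.entrySet()) {
            for (String successor : e.getValue()) {
                adtState.add(List.of("edge", e.getKey(), successor));
            }
        }
        return adtState;
    }

    public static void main(String[] args) {
        // Two concretely different encodings of the same graph.
        Map<String, List<String>> hashed = new HashMap<>();
        hashed.put("n1", new ArrayList<>(List.of("n2")));
        hashed.put("n2", new ArrayList<>());

        Map<String, List<String>> sorted = new TreeMap<>();
        sorted.put("n2", new LinkedList<>());
        sorted.put("n1", new LinkedList<>(List.of("n2")));

        // The abstraction ignores container layout, so both yield the same ADT state.
        System.out.println(abstractState("g", hashed).equals(abstractState("g", sorted))); // prints "true"
    }
}
```

Because the abstract state contains only node and edge facts, changes to internal container layout never show up as dependencies.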
  • Slide 9
  • Specification Language
    DistanceFunction:

        foreach instance i1 in instances()
            foreach instance i2 in instances()
                adtState.add((i1,i2) -> distance(i1,i2));
  • Slide 10
  • Specification Language
      • No need to model ADT operations
      • User can refine the approximation (though our experience shows that the default is mostly accurate)
      • No need for a commutativity spec
      • Hawkeye uses heuristics for (sound) approximation of the footprint of an ADT operation
  • Slide 11
  • The Hawkeye Algorithm
    [Figure: the concrete map state (modcount, table, entries with key, value and next fields) beside the logical state, with logical locations (M,X), (M,K), (M,K,1) and (M,K,2)]
    Trace: m.put(k,1); m.put(k,2);
    Footprints shown:

        (R: {}, W: {(M,K),(M,K,1)})
        (R: {}, W: {(M,K),(M,K,2)})
        (R: {}, W: {(M,K),(M,K,1),(M,K,2)})

    Result: WAW dependency
    Our assumptions:
      • linearizability for trace abstraction
      • encapsulation for state abstraction
  • Slide 12
  • Challenges
      • What is the meaning of dependencies under abstraction?
      • How can we track both concrete and abstract dependencies simultaneously?
    We've developed a uniform framework for tracking data dependencies.
  • Slide 13
  • Best Write Set
    The write set of transition τ is the union of: the locations whose value was changed by τ; the locations allocated by τ; and the locations de-allocated by τ.
    Intuitively, the write set of a transition is its observable effect, i.e., the delta between the entry and exit states.
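
The "delta between the entry and exit states" reading can be tried out on the m.put(k,1); m.put(k,2) trace from the Hawkeye Algorithm slide. The sketch below is an assumption-laden illustration, not Hawkeye's implementation: it encodes the abstract map state as (map, key) and (map, key, value) tuples and takes the write set to be the tuples added or removed by an operation. All class and method names are hypothetical.

```java
import java.util.*;

class AbstractWriteSetDemo {
    /** Assumed representation function: one (map, key) and one (map, key, value) tuple per entry. */
    static Set<List<Object>> abstractState(String mapId, Map<?, ?> m) {
        Set<List<Object>> state = new HashSet<>();
        for (Map.Entry<?, ?> e : m.entrySet()) {
            state.add(Arrays.<Object>asList(mapId, e.getKey()));
            state.add(Arrays.<Object>asList(mapId, e.getKey(), e.getValue()));
        }
        return state;
    }

    /** Delta between entry and exit abstract states: tuples present in exactly one of them. */
    static Set<List<Object>> writeSet(Set<List<Object>> entry, Set<List<Object>> exit) {
        Set<List<Object>> delta = new HashSet<>(entry);
        delta.addAll(exit);                       // union
        Set<List<Object>> common = new HashSet<>(entry);
        common.retainAll(exit);                   // intersection
        delta.removeAll(common);                  // symmetric difference
        return delta;
    }

    /** Executes m.put(key, value) and returns its abstract write set. */
    static Set<List<Object>> putAndTrack(String mapId, Map<String, Integer> m, String key, int value) {
        Set<List<Object>> entry = abstractState(mapId, m);
        m.put(key, value);
        return writeSet(entry, abstractState(mapId, m));
    }

    /** Two operations have a write-after-write dependency if their write sets overlap. */
    static boolean waw(Set<List<Object>> w1, Set<List<Object>> w2) {
        return !Collections.disjoint(w1, w2);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        Set<List<Object>> w1 = putAndTrack("M", m, "k", 1);   // adds (M,k) and (M,k,1)
        Set<List<Object>> w2 = putAndTrack("M", m, "k", 2);   // removes (M,k,1), adds (M,k,2)
        Set<List<Object>> w3 = putAndTrack("M", m, "j", 5);   // touches only key j
        System.out.println(waw(w1, w2));   // prints "true"
        System.out.println(waw(w1, w3));   // prints "false"
    }
}
```

Under this exact-delta reading the second put's write set is {(M,k,1), (M,k,2)}; the footprints on the Hawkeye Algorithm slide also list (M,K), which a sound over-approximation is free to include.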
  • Slide 14
  • Best Read Set (More Tricky)
    A set of locations r is a sufficient read set of transition τ = (σ, p, σ') iff for every transition τ₁ = (σ₁, p, σ₁') such that σ and σ₁ agree on r, write(τ) = write(τ₁).
    The read set of transition τ is the union of all its minimal sufficient read sets.
    Intuitively, the read set of a transition is the set of locations whose values determine the observable effect of the transition.
  • Slide 15
  • Simple Example
    ([y=3], set(y,4), [y=4])
        Read set: { y } (secures y=4 in exit state)
        Write set: { y }
    ([y=3], set(y,3), [y=3])
        Read set: { y } (secures empty write set)
        Write set: { }
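
The two transitions in the Simple Example can be checked by brute force on a toy state space. The sketch below makes several assumptions that are not on the slides: states range over two variables `x` and `y` with values in {3, 4}, there is no allocation, and two write sets are compared by plain set equality. The names `BestReadSetDemo`, `VARS`, `DOMAIN` and `setY` are hypothetical.

```java
import java.util.*;
import java.util.function.UnaryOperator;

class BestReadSetDemo {
    static final List<String> VARS = List.of("x", "y");
    static final int[] DOMAIN = {3, 4};   // assumed tiny domain so every state can be enumerated

    /** Toy write set: variables whose value differs between entry and exit state. */
    static Set<String> writeSet(Map<String, Integer> entry, Map<String, Integer> exit) {
        Set<String> w = new TreeSet<>();
        for (String v : VARS) if (!entry.get(v).equals(exit.get(v))) w.add(v);
        return w;
    }

    /** Every state over VARS and DOMAIN. */
    static List<Map<String, Integer>> allStates() {
        List<Map<String, Integer>> states = new ArrayList<>();
        for (int x : DOMAIN) for (int y : DOMAIN) states.add(Map.of("x", x, "y", y));
        return states;
    }

    /** r is sufficient iff every entry state agreeing with `entry` on r yields the same write set. */
    static boolean isSufficient(Set<String> r, Map<String, Integer> entry,
                                UnaryOperator<Map<String, Integer>> stmt) {
        Set<String> expected = writeSet(entry, stmt.apply(entry));
        for (Map<String, Integer> other : allStates()) {
            boolean agrees = true;
            for (String v : r) agrees &= other.get(v).equals(entry.get(v));
            if (agrees && !writeSet(other, stmt.apply(other)).equals(expected)) return false;
        }
        return true;
    }

    /** Read set: union of all minimal sufficient read sets. */
    static Set<String> readSet(Map<String, Integer> entry, UnaryOperator<Map<String, Integer>> stmt) {
        List<Set<String>> sufficient = new ArrayList<>();
        for (int mask = 0; mask < (1 << VARS.size()); mask++) {
            Set<String> r = new TreeSet<>();
            for (int i = 0; i < VARS.size(); i++) if ((mask & (1 << i)) != 0) r.add(VARS.get(i));
            if (isSufficient(r, entry, stmt)) sufficient.add(r);
        }
        Set<String> result = new TreeSet<>();
        for (Set<String> r : sufficient) {
            boolean minimal = true;
            for (Set<String> s : sufficient) if (!s.equals(r) && r.containsAll(s)) minimal = false;
            if (minimal) result.addAll(r);
        }
        return result;
    }

    /** The statement set(y, value) as a state transformer. */
    static UnaryOperator<Map<String, Integer>> setY(int value) {
        return s -> { Map<String, Integer> t = new HashMap<>(s); t.put("y", value); return t; };
    }

    public static void main(String[] args) {
        Map<String, Integer> entry = Map.of("x", 3, "y", 3);
        System.out.println(readSet(entry, setY(4)) + " " + writeSet(entry, setY(4).apply(entry))); // prints "[y] [y]"
        System.out.println(readSet(entry, setY(3)) + " " + writeSet(entry, setY(3).apply(entry))); // prints "[y] []"
    }
}
```

In both cases the empty set is not sufficient, because an entry state with a different `y` would change the write set, while `x` never matters and so stays out of the read set.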
  • Slide 16
  • Approximating the Best Definitions
      • The good news: the best definitions apply both in concrete and in abstract semantics
      • The bad news: the definition of the best read set is not computable in general
    An approximation (r, w) of (read, write) is sound iff read ⊆ r ∪ w and write ⊆ w.
  • Slide 17
  • Usage Scenario
    [Figure: concrete map state (modcount, table, entries with key, value and next fields)]
    "Hmmm... Too many dependencies!"
  • Slide 18
  • Usage Scenario
    [Figure: Map ADT state (key-value pairs)]
    "Now I understand what's going on!"
  • Slide 19
  • Usage Scenario
    [Figure: Map ADT state (key-value pairs)]
  • Slide 20
  • Slide 21
  • [Chart: number of inter-iteration dependencies at the level of ADT operations, with and without abstraction]
    Only built-in spec (Java collections)
  • Slide 22
  • [Chart: number of inter-iteration dependencies at the level of ADT operations, with and without abstraction]
    Including user spec (for user types)
  • Slide 23
  • Thank you!
  • Slide 24
  • Backup
  • Slide 25
  • Preliminaries
      • A state maps memory locations to values.
      • A transition is a triple (σ, p, σ'), where p is a program statement and σ, σ' are states, such that executing p in state σ yields state σ'.
      • A program trace is a sequence of transitions.
      • We assume an interleaving semantics of concurrency.
  • Slide 31
  • Approximate Read Set
      • Take 1: all the locations reachable from arguments
      • Take 2: all the locations reachable from arguments that were accessed during the statement's execution
      • Take 3: all the locations reachable from arguments that were accessed during the statement's execution, with user specification of the frame
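
Takes 1 and 2 can be sketched over a toy heap model. Everything here is an assumption for illustration: the heap is a map from a location name to the locations it points to, the set of accessed locations is supplied by the caller rather than recorded by instrumentation, and Take 3 is omitted because the slides do not show what a frame specification looks like.

```java
import java.util.*;

class ApproximateReadSet {
    /** Take 1: every location reachable from the arguments in the (toy) heap graph. */
    static Set<String> take1(Map<String, List<String>> heap, Collection<String> args) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(args);
        while (!work.isEmpty()) {
            String loc = work.pop();
            if (seen.add(loc)) {
                work.addAll(heap.getOrDefault(loc, List.of()));
            }
        }
        return seen;
    }

    /** Take 2: reachable locations that were also accessed while the statement executed. */
    static Set<String> take2(Map<String, List<String>> heap, Collection<String> args,
                             Set<String> accessed) {
        Set<String> result = take1(heap, args);
        result.retainAll(accessed);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical heap: a map object pointing to its table, one entry, and that entry's fields.
        Map<String, List<String>> heap = Map.of(
                "m", List.of("table"),
                "table", List.of("e1"),
                "e1", List.of("k", "v1"),
                "other", List.of("z"));
        Set<String> accessed = Set.of("m", "table", "z");

        System.out.println(take1(heap, List.of("m")).size());            // prints "5"
        System.out.println(take2(heap, List.of("m"), accessed).size());  // prints "2"
    }
}
```

Take 2 drops locations such as `e1`, `k` and `v1` that are reachable but never touched, and ignores `z`, which was touched but is not reachable from the arguments.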