Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability

  • date post

    31-Dec-2015
Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability. Seth Hallem and Eric Watkins. Exhaustive Analysis Papers. "Precise Interprocedural Dataflow Analysis via Graph Reachability", Reps, Horwitz, Sagiv -- POPL 1995.

Transcript of Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability

  • Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability

    Seth Hallem and Eric Watkins

  • Exhaustive Analysis Papers

    - "Precise Interprocedural Dataflow Analysis via Graph Reachability", Reps, Horwitz, Sagiv -- POPL 1995: applies CFL reachability to context-sensitive, interprocedural dataflow analysis.
    - "Program Analysis via Graph Reachability", Reps -- ILP 1997: describes two additional applications, interprocedural program slicing and shape analysis.

  • The Reduction to CFL Reachability

    - Question 1: What problems can we solve?
    - Question 2: How do we set up the problem?
    - Question 3: How do we solve the problem?
    - Question 4: What is the complexity of this approach?

    Running example: possibly uninitialized variables.

  • What problems can we solve?

    IFDS problems:
    - A finite set of dataflow facts, $D$.
    - A mapping from the edges of the CFG to dataflow functions $f : 2^D \to 2^D$.
    - Each $f$ is distributive with respect to the meet operator: $f(a \sqcap b) = f(a) \sqcap f(b)$.

    Possibly uninitialized variables:
    - Each program variable corresponds to a dataflow fact. When that fact holds, the variable may be uninitialized.
    - Transfer functions: a variable is uninitialized if it was just declared or if it is assigned an expression containing uninitialized variables.
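
    The distributivity requirement can be checked by brute force on a small domain. The sketch below is only illustrative: it assumes the meet is set union (as it is for possibly uninitialized variables), and the helper names `assign`, `needs_both`, and `is_distributive` are hypothetical, not taken from the papers.

```python
from itertools import combinations

D = frozenset({"x", "y", "z"})

def assign(target, used):
    """Transfer function for `target = expr`, where expr reads the variables in `used`."""
    def f(facts):
        facts = set(facts)
        if facts & set(used):
            facts.add(target)       # some operand may be uninitialized
        else:
            facts.discard(target)   # every operand is initialized
        return frozenset(facts)
    return f

def needs_both(facts):
    """A deliberately non-distributive function: z is tainted only if x AND y are."""
    facts = set(facts)
    if {"x", "y"} <= facts:
        facts.add("z")
    else:
        facts.discard("z")
    return frozenset(facts)

def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_distributive(f, domain):
    """Check f(a U b) == f(a) U f(b) for every pair of subsets of the domain."""
    subsets = powerset(domain)
    return all(f(a | b) == f(a) | f(b) for a in subsets for b in subsets)

y_gets_y_plus_x = assign("y", {"x", "y"})
print(is_distributive(y_gets_y_plus_x, D))
print(is_distributive(needs_both, D))
```

    The first function models `y = y + x` and passes the check; the second fails it because its result depends on two facts holding at once, which is exactly what the IFDS framework rules out.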

  • Simple Example

        int z;
        int main (void) {
            int x, y = 0;   /* {x, z} */
            y = y + x;      /* {x, y, z} */
            z = 0;          /* {x, y} */
        }

    $D = \{x, y, z\}$; the domain and range of the transfer functions is the power set of $D$ ($2^D$).
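
    The annotations in this example can be reproduced by applying one transfer function per statement. This is a minimal sketch under the slide's assumption that the global `z` counts as uninitialized on entry; `declare`, `assign`, and `analyze` are hypothetical helper names.

```python
def declare(var):
    """A freshly declared variable may be uninitialized."""
    return lambda facts: frozenset(facts) | {var}

def assign(target, used):
    """`target = expr`: target is tainted iff some variable read by expr is tainted."""
    def f(facts):
        facts = frozenset(facts)
        rest = facts - {target}
        return rest | {target} if facts & frozenset(used) else rest
    return f

# Each statement of main() paired with the transfer functions it applies.
program = [
    ("int x, y = 0;", [declare("x"), assign("y", [])]),
    ("y = y + x;",    [assign("y", ["y", "x"])]),
    ("z = 0;",        [assign("z", [])]),
]

def analyze(program, entry_facts):
    """Push a fact set through straight-line code, recording the set after each statement."""
    facts = frozenset(entry_facts)
    results = []
    for text, functions in program:
        for f in functions:
            facts = f(facts)
        results.append((text, set(facts)))
    return results

results = analyze(program, {"z"})   # the global z is assumed uninitialized on entry
for text, facts in results:
    print(f"{text:15} {sorted(facts)}")
```

    The three sets produced are {x, z}, {x, y, z}, and {x, y}, matching the comments in the example.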

  • How do we set up and solve IFDS problems?

    Inputs to the algorithm:
    - Exploded supergraph (next couple of slides)

    Outputs from the algorithm:
    - The meet-over-all-realizable-paths solution: $\mathrm{MRP}_n = \sqcap_{q \in \mathrm{Rpaths}(start_{main},\, n)} \; pf_q(\top)$

  • The Supergraph

  • Representation Relations

    Each dataflow function $f$ is converted to a representation relation, which is represented as a graph with $2D + 2$ nodes (here $D$ stands for the number of dataflow facts):
    - $D$ input nodes, one for each dataflow fact, plus the node $\Lambda$ (or 0), which corresponds to the empty set.
    - $D$ output nodes plus the node $\Lambda$.
    - There is an edge from input node $d_1$ to output node $d_2$ if $d_2 \in f(S)$ whenever $d_1 \in S$, and $d_2 \notin f(\emptyset)$.
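
    A representation relation can be built mechanically from a distributive function. The sketch below follows the definition in the Reps, Horwitz, Sagiv paper, which in addition to the edges listed above draws an edge from Λ to every fact in f(∅) and an edge from Λ to Λ. The function names are hypothetical.

```python
LAMBDA = "Λ"

def representation_relation(f, D):
    """Edges (d1, d2) of the bipartite graph that encodes the distributive function f."""
    generated = f(frozenset())                         # facts f produces from nothing
    edges = {(LAMBDA, LAMBDA)}
    edges |= {(LAMBDA, d2) for d2 in generated}
    for d1 in D:
        for d2 in f(frozenset({d1})) - generated:      # d2 holds only because d1 held
            edges.add((d1, d2))
    return edges

def apply_relation(edges, facts):
    """Recover f(facts) by following edges out of `facts` and out of Λ."""
    sources = set(facts) | {LAMBDA}
    return {d2 for d1, d2 in edges if d1 in sources and d2 != LAMBDA}

D = {"x", "y", "z"}

def first_statement(facts):
    """Effect of `int x, y = 0;`: x becomes possibly uninitialized, y is initialized."""
    return (frozenset(facts) | {"x"}) - {"y"}

edges = representation_relation(first_statement, D)
print(sorted(edges))
print(sorted(apply_relation(edges, {"y", "z"})))
```

    For this statement the relation has only three edges (Λ to Λ, Λ to x, and z to z), and following them from any input set gives the same answer as calling the function directly.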

  • More Representation Relations

    - (a) and (b) show representation relations for two functions (nodes $s_{main}$ and $n_1$).
    - (c) and (d) show two ways to compose these relations.
    - (d) illustrates the need for the $\Lambda$ node in each relation.
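
    Composing two representation relations is ordinary relational composition. The sketch below uses two made-up relations over D = {x, y} (not the ones in the slide's figure) to show why each relation carries a Λ node: without the Λ to Λ edge in the first relation, a fact generated unconditionally by the second function would be lost.

```python
LAMBDA = "Λ"

def compose(first, second):
    """Relational composition: follow an edge of `first`, then an edge of `second`."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}

# `x = 0;` kills x and preserves y.
kill_x = {(LAMBDA, LAMBDA), ("y", "y")}
# `int x;` generates x unconditionally and preserves y.
gen_x = {(LAMBDA, LAMBDA), (LAMBDA, "x"), ("y", "y")}

with_lambda = compose(kill_x, gen_x)
without_lambda = compose(kill_x - {(LAMBDA, LAMBDA)}, gen_x)

print(sorted(with_lambda))
print(sorted(without_lambda))
```

    With the Λ edge, the composed relation still records that x is generated; without it, only the y to y edge survives.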

  • Exploding the Supergraph

  • CFL Reachability

    - We want to solve the dataflow problem with a reachability query on the exploded supergraph.
    - Not all paths in G# are valid, though: calls must be matched with returns.
    - Insight: context-sensitivity = matching parentheses, and the language of matching parentheses is a CFL.

  • Context-Sensitivity = CFL

    Assign a unique index to each call site and define a CFL of matching calls and returns. Suppose we have two call sites to function P(), which we label i and k:
    - $(_i \; (_k \; )_k \; )_i$ is a valid path
    - $(_i \; (_k \; )_k$ is a valid path
    - $(_i \; (_k \; )_i$ is not
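
    The matching rule can be checked with a stack of pending call sites. This is a small sketch assuming a path is given only by its call and return labels (other edges are irrelevant to validity) and that paths start at the program entry, so a return with no pending call is rejected. The name `is_realizable` is hypothetical.

```python
def is_realizable(path):
    """path: list of ("call", site) or ("ret", site) labels in traversal order."""
    pending = []                           # call sites entered but not yet returned from
    for kind, site in path:
        if kind == "call":
            pending.append(site)
        else:
            # A return must match the most recent pending call.
            if not pending or pending.pop() != site:
                return False
    return True                            # unmatched calls may remain open

nested    = [("call", "i"), ("call", "k"), ("ret", "k"), ("ret", "i")]
unclosed  = [("call", "i"), ("call", "k"), ("ret", "k")]
mismatch  = [("call", "i"), ("call", "k"), ("ret", "i")]

print(is_realizable(nested), is_realizable(unclosed), is_realizable(mismatch))
```

    The three paths correspond, in order, to the three examples on the slide.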

  • Reachability Algorithm

    Dynamic programming is the key.
    - Start at the entry point to the program. Follow the edges in G#, recording which dataflow facts we can reach.
    - At a procedure call, follow the call. To avoid redoing any work, though, maintain a cache of edges that summarize pieces of the computation.
    - Summary edges record the results of an entire procedure: they start at a call site and end at the corresponding return site.
    - Path edges record the suffix of a valid path.
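
    The worklist scheme described above can be sketched compactly. What follows is a simplified rendering in the spirit of the tabulation algorithm of Reps, Horwitz, and Sagiv, not a faithful transcription; the graph encoding, the parameter names, and the tiny two-call-site example are all assumptions made for illustration.

```python
from collections import defaultdict

def tabulate(edges, calls, start_of, exit_of, entry):
    """edges: ((node, fact), (node, fact)) pairs; calls: {call node: (callee, return site)}."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    callers = defaultdict(set)
    for c, (callee, _) in calls.items():
        callers[callee].add(c)
    proc_of_exit = {e: p for p, e in exit_of.items()}

    path_edges = set()               # (procedure start node+fact, reachable node+fact)
    incoming = defaultdict(set)      # path-edge target -> path-edge sources
    summary = defaultdict(set)       # (call, fact) -> {(return site, fact)}
    worklist = []

    def propagate(src, dst):
        if (src, dst) not in path_edges:
            path_edges.add((src, dst))
            incoming[dst].add(src)
            worklist.append((src, dst))

    propagate(entry, entry)
    while worklist:
        src, (n, d) = worklist.pop()
        if n in calls:
            callee, _ = calls[n]
            for m, d3 in succ[(n, d)]:
                if m == start_of[callee]:
                    propagate((m, d3), (m, d3))        # start analysing the callee
                else:
                    propagate(src, (m, d3))            # call-to-return edge
            for target in list(summary[(n, d)]):
                propagate(src, target)                 # reuse a cached summary
        elif n in proc_of_exit:
            for c in callers[proc_of_exit[n]]:
                ret = calls[c][1]
                for cn, d4 in list(pred[src]):
                    if cn != c:
                        continue
                    for m, d5 in list(succ[(n, d)]):
                        if m == ret and (m, d5) not in summary[(c, d4)]:
                            summary[(c, d4)].add((m, d5))
                            for caller_src in list(incoming[(c, d4)]):
                                propagate(caller_src, (m, d5))
        else:
            for dst in succ[(n, d)]:
                propagate(src, dst)
    return {dst for _, dst in path_edges}

def ident(src, dst, facts):
    return {((src, d), (dst, d)) for d in facts}

# main: a is tainted; `a = p(a)` at site c1, then `b = p(b)` at site c2; p returns its formal g.
edges = {(("s_main", "0"), ("c1", "0")), (("s_main", "0"), ("c1", "a"))}
edges |= {(("c1", "0"), ("s_p", "0")), (("c1", "a"), ("s_p", "g"))}   # call edges, site 1
edges |= {(("c1", "0"), ("r1", "0")), (("c1", "b"), ("r1", "b"))}     # call-to-return, site 1
edges |= {(("e_p", "0"), ("r1", "0")), (("e_p", "g"), ("r1", "a"))}   # return edges, site 1
edges |= ident("r1", "c2", ["0", "a", "b"])
edges |= {(("c2", "0"), ("s_p", "0")), (("c2", "b"), ("s_p", "g"))}   # call edges, site 2
edges |= {(("c2", "0"), ("r2", "0")), (("c2", "a"), ("r2", "a"))}     # call-to-return, site 2
edges |= {(("e_p", "0"), ("r2", "0")), (("e_p", "g"), ("r2", "b"))}   # return edges, site 2
edges |= ident("r2", "e_main", ["0", "a", "b"])
edges |= ident("s_p", "e_p", ["0", "g"])

reach = tabulate(edges,
                 calls={"c1": ("p", "r1"), "c2": ("p", "r2")},
                 start_of={"main": "s_main", "p": "s_p"},
                 exit_of={"main": "e_main", "p": "e_p"},
                 entry=("s_main", "0"))
facts_at_r2 = {d for n, d in reach if n == "r2"}
print(sorted(facts_at_r2))
```

    In this example plain graph reachability would report `b` at `r2`, because the taint entering `p` from site `c1` could leave through the return edge of site `c2`. The tabulation keeps the two contexts apart and reports only `a` (plus the `0` fact) there.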

  • Dynamic Programming Details

  • Complexity

    - The worst case for general CFL reachability is cubic in the number of nodes in the graph.
    - We can do better for dataflow analysis: $O(ED^3)$ for any distributive problem and $O(Call \cdot D^3 + hED^2)$ for h-sparse problems, where $E$ is the number of supergraph edges, $D$ the number of dataflow facts, and $Call$ the number of call sites.
    - Possibly uninitialized variables is 2-sparse when aliasing is ignored: a variable's status as initialized or uninitialized can only affect itself and one other variable (if it is assigned to that variable).
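
    To get a feel for the gap between the two bounds, they can be evaluated on made-up sizes. The numbers below are purely hypothetical and constant factors are ignored, so only the ratio is meaningful.

```python
def distributive_bound(E, D):
    """O(E * D^3): bound for any distributive problem."""
    return E * D**3

def sparse_bound(call, E, D, h):
    """O(Call * D^3 + h * E * D^2): bound for h-sparse problems."""
    return call * D**3 + h * E * D**2

# Hypothetical program: 1000 supergraph edges, 50 facts, 100 call sites, 2-sparse.
general = distributive_bound(E=1000, D=50)
sparse = sparse_bound(call=100, E=1000, D=50, h=2)
print(general, sparse, round(general / sparse, 1))
```

    For these sizes the h-sparse bound is roughly seven times smaller, and the advantage grows with the number of facts as long as call sites are a small fraction of the edges.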

  • Other Applications

    Interprocedural slicing:
    - Identify all pieces of a program relevant to a particular statement.

    Shape analysis:
    - For any DAG data structure, determines a superset of the possible shapes for that data structure.
    - Each dataflow fact corresponds to a single possible shape.
    - Problem: there is an infinite number of shapes. The solution is to define shape at program point q in terms of shape at previous program points.
    - The ILP paper has an example of shape analysis of a linked list.

  • The other papers

    - "Demand Interprocedural Dataflow Analysis", Horwitz, Reps, Sagiv -- FSE 1995
    - "Demand-driven Computation of Interprocedural Data Flow", Duesterwald, Gupta, Soffa -- POPL 1995

    These provide two possible frameworks for transforming any IFDS analysis into a demand-driven analysis.

  • Steps to Demand-driven analysis

    - Define the problem in the IFDS framework.
    - Reverse the flow functions, or reverse the flow edges.
    - Start with an initial query < d, n >.
    - Propagate the query backwards until solved.

  • Reversing dataflow

    - In Duesterwald et al., the dataflow problem is specified with flow functions: reverse the functions.
    - For CFL problems, the problem is represented as a set of edges: just reverse the edges.
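
    For the edge-based formulation, a query can be answered by searching the reversed graph from the queried pair back toward the entry. The sketch below is intraprocedural only (it does no call/return matching) and encodes the exploded graph of the earlier simple example by hand; for brevity the global `z` is generated on the first edge. All names are hypothetical.

```python
from collections import defaultdict, deque

LAMBDA = "Λ"

def reverse_edges(edges):
    return {(dst, src) for src, dst in edges}

def query(edges, entry, target):
    """Answer the query <d, n> = target by backward search over reversed edges."""
    back = defaultdict(set)
    for src, dst in reverse_edges(edges):
        back[src].add(dst)
    seen, frontier = {target}, deque([target])
    while frontier:
        node = frontier.popleft()
        if node == entry:
            return True                    # early termination: the fact can hold
        for prev in back[node] - seen:
            seen.add(prev)
            frontier.append(prev)
    return False

# Pairs are (fact, program point); n0..n3 are the points around the three statements.
edges = {
    ((LAMBDA, "n0"), (LAMBDA, "n1")), ((LAMBDA, "n0"), ("x", "n1")),   # int x, y = 0;
    ((LAMBDA, "n0"), ("z", "n1")),                                     # z starts uninitialized
    ((LAMBDA, "n1"), (LAMBDA, "n2")), (("x", "n1"), ("x", "n2")),      # y = y + x;
    (("x", "n1"), ("y", "n2")), (("y", "n1"), ("y", "n2")),
    (("z", "n1"), ("z", "n2")),
    ((LAMBDA, "n2"), (LAMBDA, "n3")), (("x", "n2"), ("x", "n3")),      # z = 0;
    (("y", "n2"), ("y", "n3")),
}

entry = (LAMBDA, "n0")
print(query(edges, entry, ("y", "n3")), query(edges, entry, ("z", "n3")))
```

    The query for `y` at the end of `main` succeeds after visiting only the nodes that can influence it, while the query for `z` fails immediately because no edge leads into that node.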

  • Example: CCP

    Notation:
    - $x$: set of dataflow facts
    - $x_w$: dataflow fact for variable $w$
    - $f_n(x)_w$: transfer function for variable $w$ at node $n$
    - $[w = c]$: set of dataflow facts in which the fact for variable $w$ equals $c$

  • Query Algorithm

    - A worklist holds the set of outstanding queries.
    - While it is not empty, remove a query.
    - Propagate the query backwards one node in the flowgraph.
    - For a function call, create a backwards summary for that function and apply it.
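
    The worklist loop can be illustrated on a constant-propagation style query, "does variable w hold constant c after node n on every path?". The sketch below is a simplified, intraprocedural illustration only: it has no procedure summaries, the statement encoding is invented for this example, and it is not the algorithm as published by Duesterwald et al.

```python
def query_constant(stmts, preds, node, var, const):
    """Is `var == const` guaranteed after `node`?  Propagates the query backwards."""
    worklist = [(node, var)]
    seen = set()
    while worklist:
        n, w = worklist.pop()
        if (n, w) in seen:
            continue                              # already asked (also handles loops)
        seen.add((n, w))
        stmt = stmts[n]
        if stmt[0] == "const" and stmt[1] == w:
            if stmt[2] != const:
                return False                      # early termination on a wrong value
            continue                              # resolved on this path
        if stmt[0] == "copy" and stmt[1] == w:
            w = stmt[2]                           # reversed copy: ask about the source
        if not preds[n]:
            return False                          # reached entry with the value unknown
        worklist.extend((p, w) for p in preds[n])
    return True

# n1: x = 5;  n2: y = x;  then either n3: z = y  or  n4: z = 5;  n5: join point
stmts = {
    "n1": ("const", "x", 5),
    "n2": ("copy", "y", "x"),
    "n3": ("copy", "z", "y"),
    "n4": ("const", "z", 5),
    "n5": ("nop",),
}
preds = {"n1": [], "n2": ["n1"], "n3": ["n2"], "n4": ["n2"], "n5": ["n3", "n4"]}

print(query_constant(stmts, preds, "n5", "z", 5))
print(query_constant(stmts, preds, "n5", "z", 7))
```

    The first query is answered by following the copy chain z, y, x back to `x = 5` on one branch and by the direct assignment on the other; the second stops as soon as a conflicting constant is found.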

  • Query Propagation

    More notation:
    - $r_p$: entry node for procedure $p$
    - $m, n$: normal nodes
    - $f_m$: reverse dataflow function for node $m$
    - $N_{call}$: all nodes that are call sites
    - $call(m)$: the procedure called at node $m$
    - $f(r_p, e_p)$: summary function for procedure $p$

  • Backwards edge propagation

  • Query Algorithm Efficiency

    - Optimizations: function summaries, early termination, query result cache.
    - In the worst case, it's the same as exhaustive analysis.
    - Some problems work better than others for demand-driven analysis.
    - It depends on how much information you need to answer queries, or how many queries need to be made.

  • Conclusions

    - Demand-driven analysis is a powerful idea.
    - It saves time and space, but in the worst case it's no better than exhaustive analysis.
    - It only works for distributive problems.
    - The two approaches to demand-driven analysis are equivalent.

  • Discussion

    - Are these algorithms generally applicable?
    - Are they fast? There is no evidence in the papers, but the answer is yes (see ESP in a couple of weeks).
    - Why are they efficient (beyond the complexity guarantee)?
    - Is it always cheap to compute the exploded supergraph?
    - How can an imprecise alias analysis influence this step and the overall performance of the algorithm?

    Note: The meet operator takes a flow value from bottom to top, i.e. values start at bottom and work upwards. Here, bottom means no information and top means not possible.