Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to Analysis
description
Transcript of Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to Analysis
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to Analysis
Guoqing Xu and Atanas Rountev
Ohio State University
Supported by NSF under CAREER grant CCF-0546040
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
22
Precise and Scalable Points-to Analysis Analysis precision- Calling context sensitivity – e.g. chain of call sites- Heap cloning [Nystrom-PASTE’04, Lhotak-CC’06]- The most precise analysis: refinement-based
analysis [Sridharan-PLDI’06], but does not scale well
Analysis scalability - Millions of distinct call chains in a moderate-size
Java program- Option 1: sacrifice precision with k-length chain- Option 2: merge equivalent relationships using
BDDs- BDDs incurs running time overhead, and may not
scale for heap-cloning-based analysis
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Motivation Many clients require the underlying pointer
analysis to quickly provide a precise points-to solution for all variables (e.g., slicing; verification)
Example:- The refinement-based analysis used 1000 sec to
provide a precise solution for all variables in a 30-line Java program (including variables in all relevant library code)
- The solution produced under a 500-second budget made our slicer generate a slice containing the whole program
33
Goal: relatively precise result in a more scalable manner
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Merge Equivalent Contexts Equivalence classes exists in the
representation of calling contexts [Lhotak-CC’06]
Can we find and merge equivalent calling contexts?- We would be able to scale the points-to analysis
without relying on the merging inside the BDD “black box”
A unique replacement context (URC) can be used to represent an equivalence class in the points-to relationships
44
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Outline A model of equivalent contexts
- Abstraction functions for pointer variables and targets- Proposed for pointer analysis, but can be applied to
other context-sensitive analysis algorithms
A whole-program points-to analysis for Java- Implements the model- Context-sensitive for both pointer variables and
targets- Does not limit the length of context strings (not k-CFA)- Bottom-up, summary-based
Experimental evaluation- Much more precise and efficient than state-of-the-art
1-object-sensitive analysis with BDDs [Lhotak-CC’06]- More efficient than the refinement-based analysis
55
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Motivating Examplevoid main(String[] args){
A a1 = new A();
A a2 = new A();
foo(a1); //call site 1
foo(a2); //call site 2
}
void foo(A a){
t = new B();
a.fld = t ;
bar(a.fld); //call site 3
}
B bar(B b){
p = b;
return p; } 66
p(1,3)---> new B(1,3)
p(2,3)---> new B(2,3)
t1 ---> new B1
t2 ---> new B2
Observation:1. t points to new B, under all calling contexts2. p points to new B, under calling contexts (*,3)
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
A Better Representation? Can we represent the points-to relationships
like this?- t new B - p
3 new B 3
- 1 copy of t and p, 2 copies of new B
Key insights- Context-sensitivity corresponds to inlining; full
context-sensitivity is achieved if all reachable methods are inlined in main
- If a points-to relationship can be determined at method m during inlining, it will not be affected by m’s callers
77
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Two Hypothetical Inlining-based Analyses (v, cv, o, co) represents a fully context-sensitive
points-to relationship; v is local var; o is alloc site Analysis 1: all-the-way inlining
- A statement p := q is rewritten as p(e) := q(e), at call graph edge e
- An intraprocedural points-to analysis is performed on the body of main
Analysis 2: a variation of analysis 1- An intraprocedural analysis is performed immediately
after a method m inlines all its call sites- Suppose (v, cv, o, co) is produced for m- (v, (e) cv, o, (e) co) must be produced later for m’s
caller n; e is the call graph edge between n and m 88
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Calling Context Reduction Inlining preserves context-sensitive points-to
relationships- All statements in m are also in n
The lifetime of a tuple (v, cv, o, co) consists of- A single creation event in some method m: flowing
point- A sequence of inlining steps that increase both cv
and co in synch
URCs computed by abstraction functions for calling context; m is the flowing point- Fv(cv) = (e0, e1, …, ei) where e0.src = m- Fo(co) = (f0, f1, …, fj) where f0.src = m- Keep only the relevant suffix of the call chain
99
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Approximation in the Presence of Recursion Two-phase approximation: (1) map an infinite call
chain to a finite one, collapsing cycles while going backwards; (2) then, use the abstraction function shown earlier
1010
Precision loss may result from phase 1: e.g., p points to o under context eabcdabf, but does not under context eabf
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Using the Reduced Contexts A URC is used to represent a set of calling
contexts in the points-to relationships A query (v, c) can be answered as follows:- Find all (v, urcv, o, urco) such that substring(urcv, c)
holds- Return all (o, co) such that substring(urco, co) holds
A BDD may not be as effective for an analysis with heap cloning- Without heap cloning, an equivalence class is
defined by a single string urcv
- For an analysis with heap cloning, an equivalence class is defined by a pair (urcv , urco)
- More classes = fewer opportunities for merging
1111
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Points-to Analysis A specific algorithm that implements this
model- Using URCs to represent calling contexts- The use of URCs could be applicable to other
categories of points-to analysis
Resembles bottom-up inlining Heap-cloning-based - Context-sensitively treat both pointer variables and
targets
Partial unification (bi-directional flow of values)
1212
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Intraprocedural Analysis Symbolic points-to graph (SPG)- A symbolic object node is introduced for each (1)
formal parameter, (2) base variable v of load a = v.f, and (3) lhs v of a call site v = a.b(…)
- Standard points-to analysis algorithm [Lhotak-CC’03] is used for SPG construction
- SPG contains much fewer nodes and edges than the original program
Example
void add (Integer t) { this.names = t; }
1313
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Escape Analysis An allocation node new C or symbolic node
SO directly escapes a method, if - It is pointed to by a formal parameter- It is pointed to by a returned variable- It is pointed to by a static field
A node indirectly escapes a method, if- It is reachable from nodes that directly escape
Compute a set of allocation/symbolic nodes that escape the method where they are defined
1414
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Interprocedural Analysis Summary-based - Bottom-up traverse the call graph SCC-DAG
Summary function definition: - {[Of, Gf]}- Gf : the subgraph of all escaping objects (reachable
from Of) and their points-to edges
Clone a summary function for each incoming call graph edge e- If o1
c1 o2c2 where o1 and o2 escape: in the caller,
create o1(e) c1 o2
(e) c2
- If v c1 s c2 where s is an escaping symbolic node: create v (e) c1 s (e) c2
1515
f
f
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Interprocedural Analysis Composition of summary functions- For each [Oactual , Gactual ]- Find [O’formal , G’formal ]- Merge Gactual and G’formal
Subgraph merging- Simultaneously traverse Gactual and G’formal from Oactual
and O’formal
- Merge b and c, if
where d and e have been merged
1616
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Merging Two Nodes
1717
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Interprocedural Analysis The points-to solution is built on the fly- Once v c1 o c2 is formed, (v, c1, o, c2) is added to
the points-to solution- The edge is removed from the SPG: we have found
the flowing point and no further propagation is necessary
- (c1, c2) defines an equivalence class for contexts
Node merging is essentially a dynamic transitive closure computation for identifying memory aliases
1818
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
K-Last-Substring-Merging A scalability-precision tradeoff- When composing summary functions, do not clone
node oc1 in the callee if there already exists a node oc2 in the caller such that suffix(c1, k) == suffix(c2, k)
Does not limit the context length Can be used to tune the amount of context
merging throughout the program Example- k =3- O(defg) and O(hefg) are distinct nodes if d.src !=
h.src- O(defg) and O(hefg) are merged if d.src == h.src
1919
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Experiments Benchmark set contains 19 Java programs, from
SPECJVM, Ashes, and DaCapo Experimentally compared our analysis with
- Refinement: the refinement-based analysis from [Sridharan-PLDI’06] with its default budget for refinement the most precise publicly-available analysis for
computing an on-demand solution queried the points-to sets for all possible (v, c)
- 1H: 1-object-sensitive analysis with heap cloning, using BDDs [Lhotak-CC’06]; the most precise publicly-available analysis for computing a whole-program solution
Our analysis: computes a whole-program solution- 1Equiv: 1-last-substring-merging- 2Equiv: 2-last-substring-merging
2020
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Precision: #downcasts proven to be safe
2121
2Equiv proves 59% more safe downcasts than 1HBut less than Refinement
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Cost: time to get a whole-program solution
2222
2Equiv is 13 times faster than 1H5.3 times faster than Refinement
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
Conclusions Context equivalence class identification- An equivalence class is represented by a URC- Only URCs need to be explicitly represented in the
data structures of the analysis
A points-to analysis for Java- Uses URCs to represent contexts- Summary-based, bottom-up, with partial unification- Heap-cloning
Experimental evaluation- Precision approaches that of the refinement-based
analysis, and is much higher than that of 1H- Faster than both refinement-based and 1H