Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
Stephen Adams, Tom Ball, Manuvir Das
Sorin Lerner, Mark Seigle
Westley Weimer
Microsoft Research
University of Washington
UC Berkeley
Motivation
• Static analysis for program verification
• Complex dataflow analyses are popular
  – SLAM, ESP, BLAST, CQual, …
  – Flow-sensitive
  – Interprocedural
  – Expensive!
• Cut down on “data flow facts”
• Without losing anything important
General Idea
• If complex analysis is worse than O(N)
• And you have a cheap analysis that
  – Is O(N)
  – Reduces N
• Then composing them saves time
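As a rough worked illustration of why composing the two analyses pays off: the cost model below is an assumption made for this note, not a figure from the talk. Here c, a, k, and r are hypothetical constants: the complex analysis costs c·N^k, the cheap analysis costs a·N, and it keeps only a fraction r of the N dataflow facts.

```latex
\begin{align*}
T_{\text{alone}}(N)    &= c\,N^{k}, \qquad k > 1 \\
T_{\text{composed}}(N) &= a\,N + c\,(rN)^{k}, \qquad 0 < r < 1 \\
T_{\text{composed}}(N) < T_{\text{alone}}(N)
  &\iff a\,N < c\,\bigl(1 - r^{k}\bigr)\,N^{k}
\end{align*}
```

Under these assumptions, with k = 3 and r = 0.1 the expensive term shrinks by a factor of r^k = 1/1000, so the linear pre-pass is repaid for all but very small N.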
Value Flow Graph (VFG)
• Variant of a points-to graph
• Encodes the flow of values in the program
• Conservative approximation
• Lightweight, fast to compute and query
• Early queries can safely reduce
  – data-flow facts considered
  – program points considered
• Like slicing a program with respect to value flow
Computing a VFG
• Use a subtyping-based pointer analysis
  – We used One-Level Flow [Das]
• Process all assignments
  – Not just those involving pointers
• Represent constant values explicitly
  – Put them in the graph
• Label graph with source locations
  – Encodes program slices
Example Points-To Graph
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: points-to graph with nodes x and a. Legend: Points-to Edge, Source “Address” Node, Expr Node.]
One Level Flow Graph
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: one-level flow graph with nodes x and a. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node.]
Value Flow Graph
1: int a, *x;
2: x = &a;
3: *x = 7;
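[Figure: value flow graph with nodes 7, x, and a; edges are labeled with source lines (2, 3, and 2,3). Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node.]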
VFG Properties
• Computed in almost-linear time
• Get points-to sets from VFG in linear time
  – Backwards reachability via flow edges
  – Gather up all variables
• Get value flow from VFG in linear time
  – Backwards reachability via flow edges
  – Follow points-to edges up one
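The two queries can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Vfg` struct, the function names, and the graph built for the three-line example are all hypothetical, the “address” node is left out, and the "follow points-to edges up one" step of the value-flow query is omitted. The adjacency matrix keeps the code short but makes the scan quadratic; the linear bound would need adjacency lists.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_NODES = 8 };

typedef enum { NODE_VAR, NODE_CONST } NodeKind;

/* Hypothetical, simplified VFG representation (an assumption of this sketch). */
typedef struct {
    const char *name[MAX_NODES];
    NodeKind kind[MAX_NODES];
    int points_to[MAX_NODES];          /* pointee node index, or -1 */
    bool flow[MAX_NODES][MAX_NODES];   /* flow[s][d]: value may flow s -> d */
    int count;
} Vfg;

static int add_node(Vfg *g, const char *name, NodeKind kind) {
    assert(g->count < MAX_NODES);
    g->name[g->count] = name;
    g->kind[g->count] = kind;
    g->points_to[g->count] = -1;
    return g->count++;
}

/* Backward reachability over flow edges: marks every node whose value
   may reach `target` (the target itself is marked too). */
static void backward_reach(const Vfg *g, int target, bool seen[MAX_NODES]) {
    int stack[MAX_NODES], top = 0;
    for (int i = 0; i < MAX_NODES; i++) seen[i] = false;
    seen[target] = true;
    stack[top++] = target;
    while (top > 0) {
        int d = stack[--top];
        for (int s = 0; s < g->count; s++)
            if (g->flow[s][d] && !seen[s]) {
                seen[s] = true;          /* each node is pushed at most once */
                stack[top++] = s;
            }
    }
}

/* Points-to query: start at the pointee node of `ptr`, walk flow edges
   backwards, and gather the variables found. Returns how many. */
static int points_to_set(const Vfg *g, int ptr, const char *out[MAX_NODES]) {
    bool seen[MAX_NODES];
    int n = 0;
    if (g->points_to[ptr] < 0) return 0;
    backward_reach(g, g->points_to[ptr], seen);
    for (int i = 0; i < g->count; i++)
        if (seen[i] && g->kind[i] == NODE_VAR) out[n++] = g->name[i];
    return n;
}

/* Value-flow query: every node (constants included) that may flow into `var`. */
static int values_into(const Vfg *g, int var, const char *out[MAX_NODES]) {
    bool seen[MAX_NODES];
    int n = 0;
    backward_reach(g, var, seen);
    for (int i = 0; i < g->count; i++)
        if (seen[i] && i != var) out[n++] = g->name[i];
    return n;
}

/* Graph assumed for:  1: int a, *x;  2: x = &a;  3: *x = 7; */
static Vfg build_example(void) {
    Vfg g = {0};
    int x  = add_node(&g, "x", NODE_VAR);    /* index 0 */
    int a  = add_node(&g, "a", NODE_VAR);    /* index 1 */
    int c7 = add_node(&g, "7", NODE_CONST);  /* index 2 */
    g.points_to[x] = a;     /* line 2: x points to a */
    g.flow[c7][a] = true;   /* line 3: 7 flows into what x points to */
    return g;
}
```

Calling `points_to_set(&g, 0, out)` on the graph from `build_example()` gathers the variables x may point to, and `values_into(&g, 1, out)` gathers what may flow into a.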
VFG Query: Points-To of x
1: int a, *x;
2: x = &a;
3: *x = 7;
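[Figure: the value flow graph for this code (nodes 7, x, a; edges labeled with source lines 2, 3, and 2,3), shown for the points-to query on x. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node.]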
VFG Query: Value Flow into a
1: int a, *x;
2: x = &a;
3: *x = 7;
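[Figure: the value flow graph for this code (nodes 7, x, a; edges labeled with source lines 2, 3, and 2,3), shown for the value-flow query on a. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node.]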
VFG Summary
• Computed in almost-linear time
• Queries complete in linear time
• Approximates flow of values in program
• Show two applications that benefit
  – ESP
  – SLAM
Application 1: ESP
• Verification tool for large C++ programs
• Tracks “typestate” of values
  – Encoded as Finite State Machine
  – Special Error state
• Core: interprocedural data-flow engine
  – Flow sensitive: state at every point
• Performed bottom-up on call graph
• Requires function summaries
ESP Function Summaries
• Consider stateful memory locations
• Summarize function behavior for each loc
  – Reducing number of locs would be good!
  – But C has evil casts, so types cannot be used
• Worst case set of locations:
  – All globals and formal parameters
  – Everything transitively reachable from there
Reduce Location Set
• Location L needs to be considered in F if
  – Some exp E has its state changed in F
  – Value held by L at entry to F can flow into E
• Assuming state-changing ops are known
• Query VFG to find values that flow in
ESP Example
FILE *e, *f, *g, *h;
void foo() {
FILE **p;
int a = (int)h;
if (…) p = &e;
else p = &f;
*p = fopen(…);
}
Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }
ESP Example
FILE *e, *f, *g, *h;
void foo() {
FILE **p;
int a = (int)h;
if (…) p = &e;
else p = &f;
*p = fopen(…);
}
(1) Compute VFG
(2) Query value flow on *p
(3) Reduced locations to consider for foo() summary: { e, f }
(4) Reduce lines to consider for dataflow
ESP Results
• FILE * output in GCC
  – 140 KLOC, 2149 functions, 66 files, 1068 globals
• VFG queries take 200 seconds
• Reduce average number of locations per function summary from 1100 to <1
  – Median of 15 for functions with >0
• Verification takes 15 minutes
  – Infeasible otherwise
Application 2: SLAM
• Validates temporal safety properties
  – Boolean abstraction
  – Interprocedural dataflow analysis
  – Counterexample-driven refinement
• Convert C program to Boolean program
• Exhaustive dataflow analysis
  – No errors? Program is safe.
  – Real error? Program has a bug.
  – False error? Add predicates, repeat.
Boolean Programs
C Program
int x, y;
x = 5;
y = 6;
x = x * 2;
y = y * 2;
assert(x < y);

Predicates (important!)
p means “x == 5”
q means “x < y”

Boolean Program
bool p, q;
p = 1;
q = 1;
p = 0; q = 0;
q = 1;
assert(q)
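One way to read the slide is to run the C program and the Boolean program in lockstep and check that p and q always agree with their predicates. The sketch below does that for this single concrete run only; the function name `abstraction_agrees` is hypothetical, and SLAM itself derives the Boolean program automatically rather than by executing the original.

```c
#include <assert.h>
#include <stdbool.h>

/* Steps the C program and its Boolean abstraction side by side.
   Returns true if p tracks "x == 5" and q tracks "x < y" after each step. */
static bool abstraction_agrees(void) {
    int x, y;
    bool p, q;
    bool ok = true;

    x = 5;      p = true;
    ok = ok && (p == (x == 5));      /* q is not defined yet: y is unset */

    y = 6;      q = true;
    ok = ok && (p == (x == 5)) && (q == (x < y));

    x = x * 2;  p = false; q = false;
    ok = ok && (p == (x == 5)) && (q == (x < y));

    y = y * 2;  q = true;
    ok = ok && (p == (x == 5)) && (q == (x < y));

    assert(q);  /* the Boolean program's assert(q) stands in for assert(x < y) */
    return ok;
}
```

Because q holds at the end of the Boolean program, the abstraction is enough to establish `assert(x < y)` without tracking the integer values themselves.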
SLAM Predicates
• Hard to come up with good predicates
• Counterexample-driven refinement
  – Picks good predicates
  – Is very slow
• Taking all possible predicates
  – Is even slower
• Want “all the useful” predicates
Speeding Up SLAM
• For a simple subset of C
  – Similar to “Copy Constants”
  – Use VFG to find a sufficient set of predicates
  – Provably sufficient for this subset
• If this set fails to prove the real program
  – Fall back on counterexample-driven refinement
A Simple Language
s ::= vi = n              // constants
    | vi = vj             // variable copy
    | if (*) s1 else s2   // condition ignored
    | vi = fun(vj, …)     // function call
    | return(vi)          // function return
    | assert(vi op vj)    // safety property (op compares vi and vj)
Predicate Discovery
• High-level idea
  – Each flow edge in the VFG means “values may flow from X to Y”
  – Add predicates to see if they do
• For each assert(vi op vj)
  – Consider the chain of values flowing to vi, vj
  – Add an equality predicate for each link
  – Use constants to resolve scoping
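The chain-walking step can be sketched as below. It is an illustration under assumptions rather than SLAM's actual code: the `FlowEdge` type, the edge list (hand-written to mirror the sel()/main() example in this deck), and the function names are hypothetical, and the final step of using constants to resolve scoping is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { MAX_PREDS = 16, PRED_LEN = 32 };

/* One VFG flow edge: a value may flow from src to dst. */
typedef struct { const char *src, *dst; } FlowEdge;

/* Flow edges assumed for the sel()/main() example. */
static const FlowEdge edges[] = {
    {"1", "a"},  /* a = 1 */
    {"a", "f"},  /* b = sel(a): actual bound to formal */
    {"f", "r"},  /* r = f */
    {"3", "r"},  /* r = 3 */
    {"r", "b"},  /* return(r) assigned to b */
    {"2", "c"},  /* c = 2 */
    {"4", "c"},  /* c = 4 */
};
enum { N_EDGES = sizeof edges / sizeof edges[0] };

/* Follows flow edges backwards from `var`, emitting one equality
   predicate per link. `used` stops an edge from being emitted twice,
   which also guards against cycles. Returns the new predicate count. */
static int collect(const char *var, bool used[N_EDGES],
                   char preds[][PRED_LEN], int n) {
    for (int i = 0; i < N_EDGES; i++) {
        if (used[i] || strcmp(edges[i].dst, var) != 0) continue;
        used[i] = true;
        assert(n < MAX_PREDS);
        snprintf(preds[n++], PRED_LEN, "%s == %s", var, edges[i].src);
        n = collect(edges[i].src, used, preds, n);
    }
    return n;
}

/* Predicates for one operand of an assert, e.g. "b" in assert(b > c). */
static int predicates_for(const char *var, char preds[][PRED_LEN]) {
    bool used[N_EDGES] = {false};
    return collect(var, used, preds, 0);
}
```

For `assert(b > c)` one would call `predicates_for("b", preds)` and `predicates_for("c", preds)` and hand the union of both lists to the abstraction step. A predicate such as `f == a` still mixes two scopes at this point and would need the constant-based rewriting described on the slide.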
SLAM Example
int sel(int f) {
  int r;
  if (*) r = f;
  else r = 3;
  return(r);
}
void main() {
  int a, b, c;
  a = 1;
  b = sel(a);
  if (*) c = 2;
  else c = 4;
  assert(b > c);
}
[Figure: value flow graph for this program, with nodes a, 1, f, r, 3, b and 4, c, 2.]
Predicates For “b”
[Figure: the sel()/main() program again, with the value flow graph fragment for b (nodes a, 1, f, r, 3, b).]
Predicates:
b == r
r == 3
r == f
f == a
a == 1
Predicates For “b”
[Figure: the sel()/main() program again, with the value flow graph fragment for b (nodes a, 1, f, r, 3, b).]
Predicates:
b == r
r == 3
r == f
f == a // no scope!
a == 1
Predicates For “b”
[Figure: the sel()/main() program again, with the value flow graph fragment for b (nodes a, 1, f, r, 3, b).]
Predicates (left: as first derived; right: after using constants to resolve scoping):
b == r                   b == r
r == 3                   r == 3
r == f                   r == f
f == a // no scope!      f == 1, f == 3
a == 1                   a == 1, a == 3
Why does this work?
• Simple language
  – No arithmetic, etc.
  – Just copying around initial values
• Knowing final values of variables
  – Completely decides safety condition
• Still related to real life
  – Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.
Some SLAM Results
Program    LOC    Original Runtime   Improved Runtime   Generated Predicates   Missing Predicates
apmbatt    2207   229                22                 85                     0
pnpmem     3849   1132               125                143                    4
floppy     7562   1063               600                154                    33
iscsiprt   4543   **                 729                146                    42
The generated predicates cover between two-thirds and all of the necessary predicates. However, since SLAM must iterate once to generate every 3-7 missing predicates, the net performance increase is more than linear.
Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).
Conclusions
• Complex interprocedural analyses can benefit from inexpensive value-flow
• VFG encodes value flow– Constructed and queried quickly
• Prune the set of dataflow facts and program points considered
• Large net performance increase