Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis


Page 1: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Stephen Adams, Tom Ball, Manuvir Das (Microsoft Research)
Sorin Lerner, Mark Seigle (University of Washington)
Westley Weimer (UC Berkeley)

Page 2: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Motivation

• Static analysis for program verification

• Complex dataflow analyses are popular
  – SLAM, ESP, BLAST, CQual, …
  – Flow-sensitive
  – Interprocedural
  – Expensive!

• Cut down on “data flow facts”

• Without losing anything important

Page 3: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

General Idea

• If complex analysis is worse than O(N)

• And you have a cheap analysis that
  – Is O(N)
  – Reduces N

• Then composing them saves time
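As a rough illustration of this argument, assume (hypothetically) that the complex analysis costs c_2 N^2, that the cheap analysis costs c_1 N, and that the cheap analysis shrinks the problem to αN with 0 < α < 1. The constants and the quadratic cost are assumptions made for the sake of the example, not figures from the talk.

```latex
\begin{align*}
T_{\text{direct}}   &= c_2 N^2 \\
T_{\text{composed}} &= c_1 N + c_2 (\alpha N)^2 \\
T_{\text{composed}} < T_{\text{direct}}
  &\iff c_1 N < c_2 (1 - \alpha^2) N^2 \\
  &\iff N > \frac{c_1}{c_2 (1 - \alpha^2)}
\end{align*}
```

Under these assumptions the composition wins for every sufficiently large N, and the saving grows as α gets smaller.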

Page 4: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Value Flow Graph (VFG)

• Variant of a points-to graph

• Encodes the flow of values in the program

• Conservative approximation

• Lightweight, fast to compute and query

• Early queries can safely reduce
  – data-flow facts considered
  – program points considered

• Like slicing a program wrt. value flow

Page 5: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Computing a VFG

• Use a subtyping-based pointer analysis
  – We used One-Level Flow [Das]

• Process all assignments
  – Not just those involving pointers

• Represent constant values explicitly
  – Put them in the graph

• Label graph with source locations
  – Encodes program slices

Page 6: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Example Points-To Graph

1: int a, *x;

2: x = &a;

3: *x = 7;

[Figure: points-to graph for the code above, with nodes x and a. Legend: Points-to Edge, Source “Address” Node, Expr Node]

Page 7: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

One Level Flow Graph

1: int a, *x;

2: x = &a;

3: *x = 7;

[Figure: one-level flow graph for the code above, with nodes x and a. Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]

Page 8: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Value Flow Graph

1: int a, *x;

2: x = &a;

3: *x = 7;

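[Figure: value flow graph for the code above, with nodes 7, x, and a; edges are labeled with source lines (2, 3, and 2,3). Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]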

Page 9: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

VFG Properties

• Computed in almost-linear time

• Get points-to sets from VFG in linear time
  – Backwards reachability via flow edges
  – Gather up all variables

• Get value flow from VFG in linear time
  – Backwards reachability via flow edges
  – Follow points-to edges up one
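The backwards-reachability step behind both queries can be sketched as follows. This is a minimal sketch under stated assumptions: the integer node ids, the `FlowEdge` struct, and the `flows_into` function are hypothetical names, points-to edges are left out, and this is not the authors' data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VFG: nodes are small integers and each flow
 * edge says "a value may flow from src to dst". */
enum { MAX_NODES = 16 };

typedef struct { int src, dst; } FlowEdge;

/* Collect every node whose value may flow into `target` by walking flow
 * edges backwards. Each node is enqueued at most once. Returns the number
 * of nodes written to `out`, which must hold MAX_NODES entries. */
size_t flows_into(const FlowEdge *edges, size_t n_edges, int target, int *out)
{
    bool seen[MAX_NODES] = { false };
    int queue[MAX_NODES];
    size_t head = 0, tail = 0, n_out = 0;

    assert(target >= 0 && target < MAX_NODES);
    seen[target] = true;
    queue[tail++] = target;

    while (head < tail) {
        int cur = queue[head++];
        /* Scan for edges that end at `cur`. */
        for (size_t i = 0; i < n_edges; i++) {
            int src = edges[i].src;
            if (edges[i].dst != cur)
                continue;
            assert(src >= 0 && src < MAX_NODES);
            if (!seen[src]) {
                seen[src] = true;
                queue[tail++] = src;
                out[n_out++] = src;
            }
        }
    }
    return n_out;
}
```

For brevity the sketch rescans the whole edge array for every visited node, which is not linear. An implementation aiming for the linear bound stated above would presumably index incoming edges per node so that each edge is examined once.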

Page 10: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

VFG Query: Points-To of x

1: int a, *x;

2: x = &a;

3: *x = 7;

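[Figure: the value flow graph for the code above, queried for the points-to set of x; nodes 7, x, and a; edges are labeled with source lines (2, 3, and 2,3). Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]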

Page 11: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

VFG Query: Value Flow into a

1: int a, *x;

2: x = &a;

3: *x = 7;

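[Figure: the value flow graph for the code above, queried for the value flow into a; nodes 7, x, and a; edges are labeled with source lines (2, 3, and 2,3). Legend: Flow Edge, Points-to Edge, Source “Address” Node, Expr Node]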

Page 12: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

VFG Summary

• Computed in almost-linear time

• Queries complete in linear time

• Approximates flow of values in program

• Show two applications that benefit
  – ESP
  – SLAM

Page 13: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Application 1: ESP

• Verification tool for large C++ programs

• Tracks “typestate” of values
  – Encoded as Finite State Machine
  – Special Error state

• Core: interprocedural data-flow engine
  – Flow sensitive: state at every point

• Performed bottom-up on call graph

• Requires function summaries
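To make the typestate idea concrete, here is a minimal sketch of a finite state machine with a special error state. The file-handle protocol, the state and event names, and the `step` and `run` functions are all assumptions chosen for illustration; they are not taken from ESP.

```c
#include <assert.h>

/* Hypothetical typestate for a file handle. */
typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR, ST_COUNT } State;
typedef enum { EV_OPEN, EV_WRITE, EV_CLOSE, EV_COUNT } Event;

/* One transition of the state machine. Any operation that is illegal in
 * the current state leads to ST_ERROR, which is absorbing. */
State step(State s, Event e)
{
    static const State next[ST_COUNT][EV_COUNT] = {
        /* ST_CLOSED */ { ST_OPENED, ST_ERROR,  ST_ERROR  },
        /* ST_OPENED */ { ST_ERROR,  ST_OPENED, ST_CLOSED },
        /* ST_ERROR  */ { ST_ERROR,  ST_ERROR,  ST_ERROR  },
    };
    assert((int)s >= 0 && (int)s < ST_COUNT);
    assert((int)e >= 0 && (int)e < EV_COUNT);
    return next[s][e];
}

/* Feed a sequence of events through the machine and return the final state. */
State run(State start, const Event *events, int n)
{
    State s = start;
    for (int i = 0; i < n; i++)
        s = step(s, events[i]);
    return s;
}
```

With this table, the sequence open, write, close ends in `ST_CLOSED`, while open, close, write ends in `ST_ERROR`. A flow-sensitive analysis would track such a state for each value at every program point rather than along one concrete run.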

Page 14: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

ESP Function Summaries

• Consider stateful memory locations

• Summarize function behavior for each loc
  – Reducing number of locs would be good!
  – But C has evil casts, so types cannot be used

• Worst case set of locations:
  – All globals and formal parameters
  – Everything transitively reachable from there

Page 15: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Reduce Location Set

• Location L needs to be considered in F if
  – Some exp E has its state changed in F
  – Value held by L at entry to F can flow into E

• Assuming state-changing ops are known

• Query VFG to find values that flow in

Page 16: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

ESP Example

FILE *e, *f, *g, *h;

void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }

Page 17: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

ESP Example

FILE *e, *f, *g, *h;

void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

(1) Compute VFG

(2) Query value flow on *p

(3) Reduced locations to consider for foo() summary: { e, f }

(4) Reduce lines to consider for dataflow
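Steps (2) and (3) can be imitated with a much cruder model than a real VFG. The sketch below assumes the only relevant facts are the flow-insensitive address-of assignments `p = &e` and `p = &f`, and keeps just those candidate locations that the written-through pointer may refer to. The `AddrOf` struct and the `reduce_locations` function are hypothetical names, and this is not how ESP or the VFG query is actually implemented.

```c
#include <assert.h>
#include <string.h>

/* One flow-insensitive fact of the form "ptr = &target". */
typedef struct { const char *ptr; const char *target; } AddrOf;

/* Keep only the candidates that `ptr` may point to according to `facts`.
 * Writes the survivors to `out` (capacity n_cand) and returns the count. */
int reduce_locations(const AddrOf *facts, int n_facts, const char *ptr,
                     const char *const *cand, int n_cand, const char **out)
{
    int n_out = 0;
    assert(n_facts >= 0 && n_cand >= 0);
    for (int c = 0; c < n_cand; c++) {
        for (int i = 0; i < n_facts; i++) {
            if (strcmp(facts[i].ptr, ptr) == 0 &&
                strcmp(facts[i].target, cand[c]) == 0) {
                out[n_out++] = cand[c];
                break; /* candidate kept; move on to the next one */
            }
        }
    }
    return n_out;
}
```

Called with the facts `{p, e}` and `{p, f}`, the pointer `p`, and the candidates e, f, g, h, it keeps e and f. Both branches of the `if` contribute because the model ignores control flow, and h drops out because its value only flows into the integer `a`, whose state never changes.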

Page 18: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

ESP Results

• FILE * output in GCC
  – 140 KLOC, 2149 functions, 66 files, 1068 globals

• VFG queries take 200 seconds

• Reduce average number of locations per function summary from 1100 to <1
  – Median of 15 for functions with >0

• Verification takes 15 minutes
  – Infeasible otherwise

Page 19: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Application 2: SLAM

• Validates temporal safety properties
  – Boolean abstraction
  – Interprocedural dataflow analysis
  – Counterexample-driven refinement

• Convert C program to Boolean program

• Exhaustive dataflow analysis
  – No errors? Program is safe.
  – Real error? Program has a bug.
  – False error? Add predicates, repeat.

Page 20: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Boolean Programs

C Program            Boolean Program
int x, y;            bool p, q;
x = 5;               p = 1;
y = 6;               q = 1;
x = x * 2;           p = 0; q = 0;
y = y * 2;           q = 1;
assert(x < y)        assert(q)

Predicates (important!):
  p means “x == 5”
  q means “x < y”
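One way to check the Boolean program above is to run the C program concretely and evaluate both predicates after each statement. The sketch below does only that. It is not SLAM's abstraction procedure, and `Preds`, `eval_preds`, and `trace_program` are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Truth values of the two predicates at one program point. */
typedef struct { bool p, q; } Preds;

/* p means "x == 5", q means "x < y". */
static Preds eval_preds(int x, int y)
{
    Preds v = { x == 5, x < y };
    return v;
}

/* Run the C program from the slide and record the predicate values after
 * each statement, starting once both x and y are initialized.
 * `out` must hold at least 3 entries; returns the number recorded. */
int trace_program(Preds *out)
{
    int n = 0;
    int x, y;
    x = 5;
    y = 6;      out[n++] = eval_preds(x, y);  /* p = 1, q = 1 */
    x = x * 2;  out[n++] = eval_preds(x, y);  /* p = 0, q = 0 */
    y = y * 2;  out[n++] = eval_preds(x, y);  /* p = 0, q = 1 */
    assert(x < y);                            /* corresponds to assert(q) */
    return n;
}
```

The recorded values agree with the assignments in the Boolean program: p and q are both true after `y = 6`, both false after `x = x * 2`, and q is true again after `y = y * 2`, so the final assertion holds.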

Page 21: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

SLAM Predicates

• Hard to come up with good predicates

• Counterexample-driven refinement
  – Picks good predicates
  – Is very slow

• Taking all possible predicates
  – Is even slower

• Want “all the useful” predicates

Page 22: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Speeding Up SLAM

• For a simple subset of C
  – Similar to “Copy Constants”
  – Use VFG to find a sufficient set of predicates
  – Provably sufficient for this subset

• If this set fails to prove the real program
  – Fall back on counterexample-driven refinement

Page 23: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

A Simple Language

s ::= vi = n                // constants
    | vi = vj               // variable copy
    | if (*) s1 else s2     // condition ignored
    | vi = fun(vj, …)       // function call
    | return(vi)            // function return
    | assert(vi op vj)      // safety property (op: a relational operator)

Page 24: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Predicate Discovery

• High-level idea
  – Each flow edge in the VFG means “values may flow from X to Y”
  – Add predicates to see if they do

• For each assert(vi op vj)
  – Consider the chain of values flowing to vi, vj
  – Add an equality predicate for each link
  – Use constants to resolve scoping

Page 25: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

SLAM Example

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a, b, c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: VFG for this program, with nodes a, 1, f, r, 3, b and 4, c, 2]

Page 26: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a, b, c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: part of the VFG that flows into b, with nodes a, 1, f, r, 3, b]

Predicates:
  b == r
  r == 3
  r == f
  f == a
  a == 1
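The list of predicates for b can be reproduced mechanically from the flow edges of the example program. The sketch below is one possible reading of "add an equality predicate for each link": it walks flow edges backwards from the asserted variable and emits `dst == src` for every edge it crosses. The `Flow` struct, the `discover` function, and the string-based node names are assumptions for illustration, and the scoping step that uses constants is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum { MAX_EDGES = 16, PRED_LEN = 32 };

/* A flow edge: a value may flow from `src` to `dst`. */
typedef struct { const char *src, *dst; } Flow;

/* Walk flow edges backwards from `var` and write one "dst == src"
 * predicate per edge reached. Returns the number of predicates. */
int discover(const Flow *edges, int n_edges, const char *var,
             char preds[][PRED_LEN])
{
    const char *work[MAX_EDGES + 1];   /* at most one push per edge, plus var */
    bool used[MAX_EDGES] = { false };  /* each edge yields one predicate */
    int n_work = 0, n_preds = 0;

    assert(n_edges >= 0 && n_edges <= MAX_EDGES);
    work[n_work++] = var;
    for (int w = 0; w < n_work; w++) {
        for (int i = 0; i < n_edges; i++) {
            if (used[i] || strcmp(edges[i].dst, work[w]) != 0)
                continue;
            used[i] = true;
            snprintf(preds[n_preds++], PRED_LEN, "%s == %s",
                     edges[i].dst, edges[i].src);
            work[n_work++] = edges[i].src;  /* continue from the source */
        }
    }
    return n_preds;
}
```

Given the edges 1→a, a→f, f→r, 3→r, r→b, 2→c, and 4→c read off the program, a query for b yields `b == r`, `r == f`, `r == 3`, `f == a`, and `a == 1`, which is the same set as on the slide in a slightly different order. The edges into c are never visited.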

Page 27: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a, b, c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: part of the VFG that flows into b, with nodes a, 1, f, r, 3, b]

Predicates:
  b == r
  r == 3
  r == f
  f == a    // no scope!
  a == 1

Page 28: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a, b, c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: part of the VFG that flows into b, with nodes a, 1, f, r, 3, b]

Predicates:
  b == r                    b == r
  r == 3                    r == 3
  r == f                    r == f
  f == a    // no scope!    f == 1    f == 3
  a == 1                    a == 1    a == 3

Page 29: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Why does this work?

• Simple language
  – No arithmetic, etc.
  – Just copying around initial values

• Knowing final values of variables
  – Completely decides safety condition

• Still related to real life
  – Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.

Page 30: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Some SLAM Results

Program    LOC    Original Runtime   Improved Runtime   Generated Predicates   Missing Predicates
apmbatt    2207   229                22                 85                     0
pnpmem     3849   1132               125                143                    4
floppy     7562   1063               600                154                    33
iscsiprt   4543   **                 729                146                    42

The generated predicates cover between two-thirds and all of the necessary predicates. However, since SLAM must iterate once for every 3-7 missing predicates it generates, the net performance increase is more than linear.

Predicates can be specialized or simplified if the assert() condition uses a common relational operator (e.g., x==y, x<y, x==5).

Page 31: Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Conclusions

• Complex interprocedural analyses can benefit from inexpensive value-flow

• VFG encodes value flow
  – Constructed and queried quickly

• Prune the set of dataflow facts and program points considered

• Large net performance increase