Program Analysis via Graph Reachability
Thomas Reps
University of Wisconsin
PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000
http://www.cs.wisc.edu/~reps/
PLDI 00 Registration Form
• PLDI 00: …………………….. $ ____
• Tutorial (morning): …………… $ ____
• Tutorial (afternoon): ………….. $ ____
• Tutorial (evening): ……………. $ – 0 –
Applications
• Program optimization
• Program understanding and software reengineering
• Security
– information flow
• Verification
– model checking
– security of crypto-based protocols for distributed systems
[Figure: research timeline, 1987-1998, relating Slicing & Applications, Dataflow Analysis, Demand Algorithms, Set Constraints, Structure-Transmitted Dependences, and CFL-Reachability.]
. . . As Well As . . .
• Flow-insensitive points-to analysis
• Complexity results
– Linear . . . cubic . . . undecidable variants
– PTIME-completeness
• Model checking of recursive hierarchical finite-state machines
– “infinite”-state systems
– linear-time and cubic-time algorithms
. . . And Also
• Analysis of attribute grammars
• Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83]
• Formal-language problems
– CFL-recognition (given G and $\omega$, is $\omega \in L(G)$?)
– 2DPDA- and 2NPDA-simulation
• Given M and $\omega$, is $\omega \in L(M)$?
• String-matching problems
Unifying Conceptual Model for Dataflow-Analysis Literature
• Linear-time gen-kill [Hecht 76], [Kou 77]
• Path-constrained DFA [Holley & Rosen 81]
• Linear-time GMOD [Cooper & Kennedy 88]
• Flow-sensitive MOD [Callahan 88]
• Linear-time interprocedural gen-kill [Knoop & Steffen 93]
• Linear-time bidirectional gen-kill [Dhamdhere 94]
• Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]
Collaborators
• Susan Horwitz
• Mooly Sagiv
• Genevieve Rosay
• David Melski
• David Binkley
• Michael Benedikt
• Patrice Godefroid
Themes
• Harnessing CFL-reachability
• Relationship to other analysis paradigms
• Exhaustive alg. vs. demand alg.
• Understanding complexity
– Linear . . . cubic . . . undecidable
• Beyond CFL-reachability
Backward Slice
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
Backward slice with respect to “printf("%d\n", i)”
Slice Extraction
int main() {
    int i = 1;
    while (i < 11) {
        i = i + 1;
    }
    printf("%d\n", i);
}
Backward slice with respect to “printf("%d\n", i)”
Forward Slice
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
Forward slice with respect to “sum = 0”
What Are Slices Useful For?
• Understanding Programs
– What is affected by what?
• Restructuring Programs
– Isolation of separate “computational threads”
• Program Specialization and Reuse
– Slices = specialized programs
– Only reuse needed slices
• Program Differencing
– Compare slices to identify changes
• Testing
– What new test cases would improve coverage?
– What regression tests must be rerun after a change?
Line-Character-Count Program
void line_char_count(FILE *f) {
    int lines = 0;
    int chars;
    BOOL eof_flag = FALSE;
    int n;
    extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
    scan_line(f, &eof_flag, &n);
    chars = n;
    while (eof_flag == FALSE) {
        lines = lines + 1;
        scan_line(f, &eof_flag, &n);
        chars = chars + n;
    }
    printf("lines = %d\n", lines);
    printf("chars = %d\n", chars);
}
Character-Count Program
void char_count(FILE *f) {
    int lines = 0;
    int chars;
    BOOL eof_flag = FALSE;
    int n;
    extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
    scan_line(f, &eof_flag, &n);
    chars = n;
    while (eof_flag == FALSE) {
        lines = lines + 1;
        scan_line(f, &eof_flag, &n);
        chars = chars + n;
    }
    printf("lines = %d\n", lines);
    printf("chars = %d\n", chars);
}
Line-Count Program
void line_count(FILE *f) {
    int lines = 0;
    int chars;
    BOOL eof_flag = FALSE;
    int n;
    extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
    scan_line2(f, &eof_flag, &n);
    chars = n;
    while (eof_flag == FALSE) {
        lines = lines + 1;
        scan_line2(f, &eof_flag, &n);
        chars = chars + n;
    }
    printf("lines = %d\n", lines);
    printf("chars = %d\n", chars);
}
Specialization Via Slicing
wc -lc
wc -c wc -l
void line_count(FILE *f);
Not partial evaluation!
Control Flow Graph
[Figure: control-flow graph of the sum program. Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i). The while node has a T edge into the loop body and an F edge out of the loop.]
Flow Dependence Graph
Flow dependence p → q: the value of the variable assigned at p may be used at q.
[Figure: flow-dependence edges among the nodes of the sum program: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]
Control Dependence Graph
Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F).
[Figure: control-dependence edges of the sum program over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); every edge is labeled T.]
Program Dependence Graph (PDG)
[Figure: PDG of the sum program, combining control-dependence edges (labeled T) and flow-dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]
Program Dependence Graph (PDG): Opposite Order, Same PDG
int main() {
    int i = 1;
    int sum = 0;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
[Figure: the PDG is the same as for the original statement order.]
Backward Slice
[Figures (1)-(4): animation steps on the PDG of the sum program that highlight the backward slice with respect to printf(i).]
Slice Extraction
int main() {
    int i = 1;
    while (i < 11) {
        i = i + 1;
    }
    printf("%d\n", i);
}
[Figure: PDG of the slice, with nodes Enter, i = 1, while(i < 11), i = i + 1, printf(i).]
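The slicing steps shown on the PDG amount to graph reachability along dependence edges. A minimal Python sketch, assuming the edge list below (transcribed by hand from the sum program; `PDG_EDGES`, `backward_slice`, and `forward_slice` are illustrative names, not part of the slides):

```python
from collections import deque

# Assumed dependence edges (source -> dependent node) for the sum program.
# Control dependences first, then flow dependences.
PDG_EDGES = [
    ("Enter", "sum = 0"), ("Enter", "i = 1"), ("Enter", "while(i < 11)"),
    ("Enter", "printf(sum)"), ("Enter", "printf(i)"),
    ("while(i < 11)", "sum = sum + i"), ("while(i < 11)", "i = i + 1"),
    ("while(i < 11)", "while(i < 11)"),
    ("sum = 0", "sum = sum + i"), ("sum = 0", "printf(sum)"),
    ("sum = sum + i", "sum = sum + i"), ("sum = sum + i", "printf(sum)"),
    ("i = 1", "while(i < 11)"), ("i = 1", "sum = sum + i"),
    ("i = 1", "i = i + 1"), ("i = 1", "printf(i)"),
    ("i = i + 1", "while(i < 11)"), ("i = i + 1", "sum = sum + i"),
    ("i = i + 1", "i = i + 1"), ("i = i + 1", "printf(i)"),
]

def _reach(adjacency, start):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    work = deque([start])
    while work:
        node = work.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def backward_slice(edges, criterion):
    """Nodes that may affect the criterion: follow edges backward."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    return _reach(preds, criterion)

def forward_slice(edges, criterion):
    """Nodes that the criterion may affect: follow edges forward."""
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, set()).add(dst)
    return _reach(succs, criterion)

print(sorted(backward_slice(PDG_EDGES, "printf(i)")))
print(sorted(forward_slice(PDG_EDGES, "sum = 0")))
```

Under these assumed edges, the backward slice contains exactly the nodes that survive in the extracted program: Enter, i = 1, while(i < 11), i = i + 1, and printf(i).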
CodeSurfer
Browsing a Dependence Graph
Pretend this is your favorite browser
What does clicking on a link do? You get a new page
Or you move to an internal tag
Interprocedural Slice
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

int add(int x, int y) {
    return x + y;
}
Backward slice with respect to “printf("%d\n", i)”
Superfluous components are included by Weiser’s slicing algorithm [TSE 84]; they are left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].
System Dependence Graph (SDG)
[Figure: schematic SDG with Enter main, two Call p nodes, and Enter p.]
SDG for the Sum Program
[Figure: SDG with nodes Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i); two Call add nodes, one with x_in = sum, y_in = i, sum = x_out and one with x_in = i, y_in = 1, i = x_out; and Enter add with x = x_in, y = y_in, x = x + y, x_out = x.]
Interprocedural Backward Slice
[Figures (1)-(7): animation steps of the backward slice on the schematic SDG (Enter main, two Call p nodes, Enter p).]
Matched-Parenthesis Path
[Figure: the call and return edges of the schematic SDG carry the parenthesis labels ( ) and [ ]; example paths are annotated “)(” and “)[”.]
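Slice Extraction
[Figure: schematic SDG of the slice, with Enter main, one Call p node, and Enter p.]
Slice of the Sum Program
[Figure: SDG of the slice, with Enter main, i = 1, while(i < 11), printf(i); one Call add node with x_in = i, y_in = 1, i = x_out; and Enter add with x = x_in, y = y_in, x = x + y, x_out = x.]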
CFL-Reachability [Yannakakis 90]
• G: Graph (N nodes, E edges)
• L: A context-free language
• L-path from s to t iff $s \overset{\alpha}{\longrightarrow}{}^{*}\, t$ with $\alpha \in L$
• Running time: $O(N^3)$
Interprocedural Slicingvia CFL-Reachability
• Graph: System dependence graph
• L: L(matched) [roughly]
• Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n
Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
• CFL-reachability
– System dependence graph: N nodes, E edges
– Running time: $O(N^3)$
• System dependence graph: special structure
– Running time: $O(E + \mathit{CallSites} \times \mathit{MaxParams}^3)$
[Figure: example path whose edges are labeled with e, (, ), [, and ].]
matched → ε | e | [ matched ] | ( matched ) | matched matched
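CFL-Reachability
[Figure: nodes s and t connected by paths whose edges are labeled e, (, ), [, and ].]
Ordinary Graph Reachability
[Figure: nodes s and t connected by unlabeled paths.]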
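CFL-Reachability via Dynamic Programming
Grammar: A → B C
Graph: [Figure: whenever a B-edge is followed by a C-edge, an A-edge is added from the source of the B-edge to the target of the C-edge.]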
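The dynamic-programming view can be sketched as a worklist algorithm over labeled edges. The Python below is an illustrative sketch, not the algorithm of any cited paper: it assumes the grammar has been normalized to productions of the forms A → ε, A → B, and A → B C, and the names `cfl_reach`, `LP`, and `LB` are hypothetical. The example encodes the matched-parenthesis grammar, with M standing for matched.

```python
from collections import defaultdict, deque

def cfl_reach(nodes, edges, eps, unary, binary):
    """Close a labeled graph under a normalized grammar.

    edges:  iterable of (u, label, v) triples
    eps:    nonterminals A with a production A -> epsilon
    unary:  pairs (A, B) for productions A -> B
    binary: triples (A, B, C) for productions A -> B C
    Returns the set of all (u, label, v) edges, original and derived.
    """
    facts = set()
    work = deque()
    out_by = defaultdict(set)  # (u, label) -> targets v
    in_by = defaultdict(set)   # (v, label) -> sources u

    def add(u, label, v):
        if (u, label, v) not in facts:
            facts.add((u, label, v))
            out_by[(u, label)].add(v)
            in_by[(v, label)].add(u)
            work.append((u, label, v))

    for u, label, v in edges:
        add(u, label, v)
    for a in eps:
        for n in nodes:
            add(n, a, n)

    while work:
        u, x, v = work.popleft()
        for a, b in unary:
            if b == x:
                add(u, a, v)
        for a, b, c in binary:
            if b == x:  # u -B-> v followed by v -C-> w gives u -A-> w
                for w in list(out_by[(v, c)]):
                    add(u, a, w)
            if c == x:  # t -B-> u followed by u -C-> v gives t -A-> v
                for t in list(in_by[(u, b)]):
                    add(t, a, v)
    return facts

# matched -> eps | e | ( matched ) | [ matched ] | matched matched
NODES = ["s", "a", "b", "t", "u"]
EDGES = [("s", "(", "a"), ("a", "e", "b"), ("b", ")", "t"), ("b", "]", "u")]
EPS = ["M"]
UNARY = [("M", "e")]
BINARY = [("M", "M", "M"), ("LP", "(", "M"), ("M", "LP", ")"),
          ("LB", "[", "M"), ("M", "LB", "]")]

closure = cfl_reach(NODES, EDGES, EPS, UNARY, BINARY)
print(("s", "M", "t") in closure)  # path ( e ) is matched
print(("s", "M", "u") in closure)  # path ( e ] is not
```

In this toy graph both t and u are reachable from s in the ordinary sense, but only t is reachable along a matched path.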
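Degenerate Case: CFL-Recognition
Is “(a + b) * c” $\in L(\mathit{exp})$?
exp → id | exp + exp | exp * exp | ( exp )
[Figure: chain graph from s to t whose successive edges are labeled ( a + b ) * c.]
Degenerate Case: CFL-Recognition
Is “a + b) * c +” $\in L(\mathit{exp})$?
exp → id | exp + exp | exp * exp | ( exp )
[Figure: chain graph from s to t whose successive edges are labeled a + b ) * c +.]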
CYK: Context-Free Recognition
$\omega$ = “( [ ] ) [ ]”
Is $\omega \in L(M)$?
M → M M | ( M ) | [ M ] | ( ) | [ ]
CYK: Context-Free Recognition
Normalized grammar:
M → M M | LPM ) | LBM ] | ( ) | [ ]
LPM → ( M
LBM → [ M
Is “( [ ] ) [ ]” $\in L(M)$?
[Figure: CYK table indexed by start position and length. The length-1 entries are { ( } { [ } { ] } { ) } { [ } { ] }; longer entries shown are {M}, {M}, and {LPM}, filled in by productions such as M → [ ] and LPM → ( M.]
[Figure: the completed table, with {M} added for the whole string by M → M M.]
CYK: Graphs vs. Tables
Is “( [ ] ) [ ]” $\in L(M)$?
[Figure: chain graph from s to t with edges labeled ( [ ] ) [ ]; M and LPM edges are added over the chain, ending with an M edge from s to t.]
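The table-filling procedure can be sketched in Python using the normalized grammar from the slide. This is an illustrative sketch: the function name `cyk` and the table layout (a dictionary keyed by half-open token spans) are assumptions, and, as on the slide, the length-1 cells simply hold the terminal itself.

```python
def cyk(tokens, binary, start):
    """CYK recognition for a grammar whose productions are all A -> B C,
    where B and C may be terminals or nonterminals.

    Returns (accepted, table), where table[(i, j)] is the set of symbols
    that derive tokens[i:j].
    """
    n = len(tokens)
    if n == 0:
        return False, {}
    # Length-1 cells contain the terminal itself.
    table = {(i, i + 1): {tok} for i, tok in enumerate(tokens)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):  # split point
                for a, b, c in binary:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        cell.add(a)
            table[(i, j)] = cell
    return start in table[(0, n)], table

# M -> M M | LPM ) | LBM ] | ( ) | [ ]    LPM -> ( M    LBM -> [ M
GRAMMAR = [
    ("M", "M", "M"), ("M", "LPM", ")"), ("M", "LBM", "]"),
    ("M", "(", ")"), ("M", "[", "]"),
    ("LPM", "(", "M"), ("LBM", "[", "M"),
]

accepted, table = cyk(list("([])[]"), GRAMMAR, "M")
rejected, _ = cyk(list("([)]"), GRAMMAR, "M")
print(accepted, rejected)
```

The cell for the first three tokens holds LPM and the cell for the first four holds M, in line with the entries visible in the slide's table.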
CFL-Reachability via Dynamic Programming
Grammar: A → B C
Graph: [Figure: a B-edge followed by a C-edge induces an A-edge.]
Dynamic Transitive Closure ?!
• Aiken et al.
– Set-constraint solvers
– Points-to analysis
• Henglein et al.
– Type inference
• But a CFL captures a non-transitive reachability relation [Valiant 75]
S T
Program Chopping
Given source S and target T, what program points transmit effects from S to T?
Intersect forward slice from S with backward slice from T, right?
Non-Transitivity and Slicing
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

int add(int x, int y) {
    return x + y;
}
[Figures: the program is shown repeatedly with different regions highlighted:]
• Forward slice with respect to “sum = 0”
• Backward slice with respect to “printf("%d\n", i)”
• Forward slice and backward slice together
• Chop with respect to “sum = 0” and “printf("%d\n", i)”
Non-Transitivity and Slicing
[Figure: SDG for the sum program (Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), two Call add nodes with their x_in, y_in, x_out nodes, and Enter add), with a path annotated with the mismatched pair ( ].]
Program Chopping
Given source S and target T, what program points transmit effects from S to T?
S T
“Precise interprocedural chopping”[Reps & Rosay FSE 95]
CF-Recognition vs. CFL-Reachability
• CF-Recognition
– Chain graphs
– General grammar: sub-cubic time [Valiant 75]
– LL(1), LR(1): linear time
• CFL-Reachability
– General graphs: $O(N^3)$
– LL(1): $O(N^3)$
– LR(1): $O(N^3)$
– Certain kinds of graphs: $O(N+E)$
– Regular languages: $O(N+E)$
Gen/kill IDFA
GMOD IDFA
Regular-Language Reachability [Yannakakis 90]
• G: Graph (N nodes, E edges)
• L: A regular language
• L-path from s to t iff $s \overset{\alpha}{\longrightarrow}{}^{*}\, t$ with $\alpha \in L$
• Running time: $O(N+E)$, vs. $O(N^3)$
• Ordinary reachability (= transitive closure)
– Label each edge with e
– L is e*
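One common way to get a linear bound for a fixed regular language is to search the product of the graph with a finite automaton for L. The Python sketch below assumes L is given as a deterministic automaton (a transition dictionary, a start state, and a set of accepting states); the function name and this encoding are illustrative assumptions, not taken from [Yannakakis 90].

```python
from collections import deque

def regular_reach(edges, delta, q0, accepting, source):
    """Nodes t such that some path from source to t spells a word of L.

    edges:     iterable of (u, label, v)
    delta:     dict mapping (state, label) -> state (partial DFA)
    q0:        DFA start state
    accepting: set of accepting DFA states
    """
    succ = {}
    for u, label, v in edges:
        succ.setdefault(u, []).append((label, v))
    start = (source, q0)
    seen = {start}
    work = deque([start])
    while work:  # breadth-first search over (node, state) pairs
        node, state = work.popleft()
        for label, nxt in succ.get(node, ()):
            nstate = delta.get((state, label))
            if nstate is not None and (nxt, nstate) not in seen:
                seen.add((nxt, nstate))
                work.append((nxt, nstate))
    return {node for node, state in seen if state in accepting}

# Ordinary reachability: every edge is labeled e and L = e*.
plain = regular_reach(
    [("s", "e", "x"), ("x", "e", "y"), ("z", "e", "s")],
    {(0, "e"): 0}, 0, {0}, "s")

# L = a b*: one a-edge followed by any number of b-edges.
ab_star = regular_reach(
    [("s", "a", "x"), ("x", "b", "y"), ("s", "b", "z")],
    {(0, "a"): 1, (1, "b"): 1}, 0, {1}, "s")
print(sorted(plain), sorted(ab_star))
```

Each (node, state) pair is visited at most once, so for an automaton of fixed size the work is proportional to N + E.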
Security of Crypto-Based Protocols for Distributed Systems
• “Ping-pong” protocols
(1) X → Y: Encrypt_Y(M X)
(2) Y → X: Encrypt_X(M)
• [Dolev & Yao 83]
– $O(N^8)$ algorithm
• [Dolev, Even, & Karp 83]
– Less well known than [Dolev & Yao 83]
– $O(N^3)$ algorithm
[Dolev, Even, & Karp 83]
Id → Encrypt_X Id Decrypt_X
Id → Decrypt_X Id Encrypt_X
Id → . . .
Id ?
[Figure: Message and Saboteur nodes, with edges labeled E_Y, E_Y, A_X, A_Z.]
Themes
• Harnessing CFL-reachability
• Relationship to other analysis paradigms
• Exhaustive alg. vs. demand alg.
• Understanding complexity
– Linear . . . cubic . . . undecidable
• Beyond CFL-reachability
Relationship to Other Analysis Paradigms
• Dataflow analysis
–reachability versus equation solving
• Deduction
• Set constraints
Dataflow Analysis
• Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution
• Examples
– Constant propagation
– Reaching definitions
– Live variables
– Possibly uninitialized variables
Useful For . . .
• Optimizing compilers
• Parallelizing compilers
• Tools that detect possible logical errors
• Tools that show the effects of a proposed modification
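Possibly Uninitialized Variables
[Figure: control-flow graph with nodes Start, x = 3, if . . ., y = x, y = w, w = 8, printf(y). Nodes are annotated with dataflow functions and edges with sets of possibly uninitialized variables.]
Dataflow functions:
Start: $\lambda V.\{w, x, y\}$
x = 3: $\lambda V.\, V - \{x\}$
w = 8: $\lambda V.\, V - \{w\}$
y = x: $\lambda V.\ \mathbf{if}\ x \in V\ \mathbf{then}\ V \cup \{y\}\ \mathbf{else}\ V - \{y\}$
y = w: $\lambda V.\ \mathbf{if}\ w \in V\ \mathbf{then}\ V \cup \{y\}\ \mathbf{else}\ V - \{y\}$
Identity, $\lambda V.\, V$ (shown twice in the figure)
Edge annotations: {w,x,y}, {w,y}, {w,y}, {w,y}, {w}, {w,y}, {}, {w,y}, {}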
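Precise Intraprocedural Analysis
[Figure: a path p from start to n whose edges carry the dataflow functions $f_1, f_2, \ldots, f_{k-1}, f_k$; C is the value at start.]
$pf_p = f_k \circ f_{k-1} \circ \cdots \circ f_2 \circ f_1$
$\mathrm{MOP}[n] = \bigsqcup_{p \in \mathrm{PathsTo}[n]} pf_p(C)$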
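[Figure: control-flow graphs of an example program with two procedures. main: start main, x = 3, p(x,y), return from p, printf(y), exit main. p: start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p. Call and return edges are labeled with parentheses ( ) and [ ].]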
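Precise Interprocedural Analysis
[Figure: a path p from start to n that passes through a call to q (call q, start q, exit q, ret), with dataflow functions $f_1, f_2, \ldots, f_k$ on its edges; the call edge is labeled ( and the return edge ).]
$\mathrm{MOMP}[n] = \bigsqcup_{p \in \mathrm{MatchedPathsTo}[n]} pf_p(C)$
[Sharir & Pnueli 81]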
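Representing Dataflow Functions
[Figures: each function is drawn as a graph from a, b, c (above) to a, b, c (below).]
• Identity function: $f = \lambda V.\, V$, so $f(\{a,b\}) = \{a,b\}$
• Constant function: $f = \lambda V.\{b\}$, so $f(\{a,b\}) = \{b\}$
Representing Dataflow Functions
• “Gen/kill” function: $f = \lambda V.(V - \{b\}) \cup \{c\}$, so $f(\{a,b\}) = \{a,c\}$
• Non-“gen/kill” function: $f = \lambda V.\ \mathbf{if}\ a \in V\ \mathbf{then}\ V \cup \{b\}\ \mathbf{else}\ V - \{b\}$, so $f(\{a,b\}) = \{a,b\}$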
[Figure: the example program (main: x = 3, p(x,y), return from p, printf(y); p(a,b): if . . ., b = a, p(a,b), return from p, printf(b)) with each program point expanded into nodes for the variables x, y, a, b.]
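Composing Dataflow Functions
$f_1 = \lambda V.\ \mathbf{if}\ a \in V\ \mathbf{then}\ V \cup \{b\}\ \mathbf{else}\ V - \{b\}$
$f_2 = \lambda V.\ \mathbf{if}\ b \in V\ \mathbf{then}\ \{c\}\ \mathbf{else}\ \emptyset$
$f_2 \circ f_1(\{a,c\}) = \{c\}$
[Figure: the graphs of $f_1$ and $f_2$ over a, b, c, stacked one above the other.]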
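The graph representation of dataflow functions, and composition as path-following, can be sketched in Python. This sketch assumes the usual encoding for distributive functions over a finite set of facts: an extra node, called `ZERO` here (a hypothetical name), stands for the facts that the function generates from the empty set. The helper names `to_relation`, `apply_relation`, and `compose` are illustrative.

```python
ZERO = "0"  # assumed name for the special "always holds" node

def to_relation(f, domain):
    """Encode a distributive function f on subsets of domain as a set of
    edges (x, y) between ZERO-extended copies of the domain."""
    base = f(frozenset())
    rel = {(ZERO, ZERO)}
    rel |= {(ZERO, y) for y in base}
    for x in domain:
        rel |= {(x, y) for y in f(frozenset([x])) if y not in base}
    return rel

def apply_relation(rel, facts):
    """Recover f(facts) by following edges out of facts and ZERO."""
    sources = set(facts) | {ZERO}
    return {y for (x, y) in rel if x in sources and y != ZERO}

def compose(first, second):
    """Relational composition: apply first, then second."""
    return {(x, z) for (x, y1) in first for (y2, z) in second if y1 == y2}

D = {"a", "b", "c"}

def f1(V):  # if a in V then V + {b} else V - {b}
    return V | {"b"} if "a" in V else V - {"b"}

def f2(V):  # if b in V then {c} else {}
    return frozenset({"c"}) if "b" in V else frozenset()

r1 = to_relation(f1, D)
r2 = to_relation(f2, D)
r21 = compose(r1, r2)  # represents f2 after f1
print(apply_relation(r21, {"a", "c"}))
```

Composition never inspects the functions themselves, only their edges, which is what lets the analysis be phrased as a path problem in a single large graph.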
[Figure: the same expanded graph of the example program, over the variables x, y, a, b, with two questions:]
• Might b be uninitialized here (at printf(b))? NO! The path shown is labeled ( ].
• Might y be uninitialized here (at printf(y))? YES! The path shown is labeled ( ).
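matched → matched matched
        | (_i matched )_i      1 ≤ i ≤ CallSites
        | edge
        | ε
[Figure: stack height along a matched path such as ( ( ) ( ) ); dipping below the initial stack height is marked “Off Limits!”.]
unbalLeft → matched unbalLeft
          | (_i unbalLeft      1 ≤ i ≤ CallSites
          | ε
[Figure: stack height along an unbalLeft path, which may end with calls still pending on the stack; the region below the initial stack height is marked “Off Limits!”.]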
Interprocedural Dataflow Analysis via CFL-Reachability
• Graph: Exploded control-flow graph
• L: L(unbalLeft)
• Fact d holds at n iff there is an L(unbalLeft)-path from $\langle \mathit{start}_{main}, \Lambda \rangle$ to $\langle n, d \rangle$
Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
• CFL-reachability
– Exploded control-flow graph: ND nodes
– Running time: $O(N^3 D^3)$
• Exploded control-flow graph: special structure
– Running time: $O(E D^3)$
– Typically $E \approx N$, hence $O(E D^3) \approx O(N D^3)$
– “Gen/kill” problems: $O(ED)$
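L(unbalLeft)-reachability can be sketched with summary edges: first find, for every call site, which return sites are reachable by a matched path through the callee, then search forward over intraprocedural, summary, and call edges while never following a return edge directly. The Python below is a naive fixpoint written for illustration; it is not the tabulation algorithm behind the bounds above, and the function name, the edge encoding, and the toy graph are assumptions. Nodes can be any hashable values, for example (point, fact) pairs of an exploded graph.

```python
from collections import defaultdict

def unbal_left_reach(edges, calls, returns, source):
    """Nodes reachable from source along L(unbalLeft)-paths.

    edges:   (u, v) intraprocedural edges
    calls:   (call_node, site, entry_node) edges, labeled "(_site"
    returns: (exit_node, site, return_node) edges, labeled ")_site"
    """
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)

    def reach(start):
        seen = {start}
        stack = [start]
        while stack:
            n = stack.pop()
            for m in list(succ[n]):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    # Phase 1: add summary edges call_node -> return_node until stable.
    changed = True
    while changed:
        changed = False
        for c, site, entry in calls:
            inside = reach(entry)  # same-level paths inside the callee
            for x, rsite, r in returns:
                if rsite == site and x in inside and r not in succ[c]:
                    succ[c].add(r)
                    changed = True

    # Phase 2: unmatched "(" edges are allowed, ")" edges are not.
    for c, _site, entry in calls:
        succ[c].add(entry)
    return reach(source)

# main calls p at site 1; q, which main never calls, calls p at site 2.
EDGES = [("main_start", "call1"), ("ret1", "main_exit"),
         ("p_entry", "p_exit"),
         ("q_start", "call2"), ("ret2", "q_exit")]
CALLS = [("call1", 1, "p_entry"), ("call2", 2, "p_entry")]
RETURNS = [("p_exit", 1, "ret1"), ("p_exit", 2, "ret2")]

valid = unbal_left_reach(EDGES, CALLS, RETURNS, "main_start")
print(sorted(valid))
```

Plain reachability would also reach ret2 by entering p from site 1 and leaving through the return edge of site 2; matching call and return labels rules that path out.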
Why Bother?
“We’re only interested in million-line programs”
• Know thy enemy!
– “Any” algorithm must do these operations
– Avoid pitfalls (e.g., claiming $O(N^2)$ algorithm)
• The essence of “context sensitivity”
• Special cases
– “Gen/kill” problems: $O(ED)$
• Compression techniques
– Basic blocks
– SSA form, sparse evaluation graphs
• Demand algorithms
Relationship to Other Analysis Paradigms
• Dataflow analysis
–reachability versus equation solving
• Deduction
• Set constraints
The Need for Pointer Analysis
int main() {
    int sum = 0;
    int i = 1;
    int *p = &sum;
    int *q = &i;
    int (*f)(int,int) = add;
    while (*q < 11) {
        *p = (*f)(*p, *q);
        *q = (*f)(*q, 1);
    }
    printf("%d\n", *p);
    printf("%d\n", *q);
}

int add(int x, int y) {
    return x + y;
}
The Need for Pointer Analysis
[The slide then shows the same program with *p, *q, and (*f) replaced by sum, i, and add:]
int main() {
    int sum = 0;
    int i = 1;
    int *p = &sum;
    int *q = &i;
    int (*f)(int,int) = add;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}

int add(int x, int y) {
    return x + y;
}
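Flow-Sensitive Points-To Analysis
[Figure: effect of each statement form on the points-to graph, drawn over the variables p, q, r1, r2, s1, s2, s3:]
• p = &q;
• p = q;
• p = *q;
• *p = q;
Flow-Sensitive vs. Flow-Insensitive
[Figure: statements 1-5 between start main and exit main, drawn once for the flow-sensitive view and once for the flow-insensitive view.]
Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97]
[Figure: the same four statement forms (p = &q; p = q; p = *q; *p = q;) and their effect on the points-to graph over p, q, r1, r2, s1, s2, s3.]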
Flow-Insensitive Points-To Analysis
a = &e; b = a; c = &f; *b = c; d = *a;
[Figure: the resulting points-to graph over a, b, c, d, e, f.]
Flow-Insensitive Points-To Analysis
• Andersen [Thesis 94]
– Formulated using set constraints
– Cubic-time algorithm
• Shapiro & Horwitz (1995; [POPL 97])
– Re-formulated as a graph-grammar problem
• Reps (1995; [unpublished])
– Re-formulated as a Horn-clause program
• Melski (1996; see [Reps, IST98])
– Re-formulated via CFL-reachability
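CFL-Reachability via Dynamic Programming
Grammar: A → B C
Graph: [Figure: a B-edge followed by a C-edge induces an A-edge.]
CFL-Reachability = Chain Programs
Grammar: A → B C
Chain rule: a(X,Z) :- b(X,Y), c(Y,Z).
Graph: [Figure: a B-edge from x to y followed by a C-edge from y to z induces an A-edge from x to z.]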
Base Facts for Points-To Analysis
p = &q;     assignAddr(p,q).
p = q;      assign(p,q).
p = *q;     assignStar(p,q).
*p = q;     starAssign(p,q).
Rules for Points-To Analysis (I)
p = &q;     pointsTo(P,Q) :- assignAddr(P,Q).
p = q;      pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
[Figure: points-to graphs for p = &q and p = q, over p, q, r1, r2.]
Rules for Points-To Analysis (II)
p = *q;     pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
*p = q;     pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
[Figure: points-to graphs for p = *q and *p = q, over p, q, r1, r2, s1, s2, s3.]
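The four rules can be run to a fixpoint directly. The Python sketch below is a naive iteration written for illustration (it is neither Andersen's algorithm nor a CFL-reachability solver); the function name `points_to` and the encoding of base facts as lists of pairs are assumptions. The example uses the statements `a = &e; b = a; c = &f; *b = c; d = *a;` from the earlier slide.

```python
from collections import defaultdict

def _add(pts, var, targets):
    """Add targets to pts[var]; report whether anything was new."""
    new = set(targets) - pts[var]
    pts[var] |= new
    return bool(new)

def points_to(assign_addr, assign, assign_star, star_assign):
    """Apply the four pointsTo rules until no new fact is derived.

    Each argument is a list of (p, q) pairs for one statement form:
    p = &q, p = q, p = *q, and *p = q respectively.
    """
    pts = defaultdict(set)
    for p, q in assign_addr:            # pointsTo(P,Q) :- assignAddr(P,Q).
        pts[p].add(q)
    changed = True
    while changed:
        changed = False
        for p, q in assign:             # p = q
            changed |= _add(pts, p, pts[q])
        for p, q in assign_star:        # p = *q
            for r in list(pts[q]):
                changed |= _add(pts, p, pts[r])
        for p, q in star_assign:        # *p = q
            for r in list(pts[p]):
                changed |= _add(pts, r, pts[q])
    return {var: targets for var, targets in pts.items() if targets}

result = points_to(
    assign_addr=[("a", "e"), ("c", "f")],
    assign=[("b", "a")],
    assign_star=[("d", "a")],
    star_assign=[("b", "c")],
)
print(result)
```

Statement order plays no role in the result, which is what makes the analysis flow-insensitive.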
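Creating a Chain Program
*p = q;
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
[Figure: points-to graph for *p = q, over p, q, r1, r2, s1, s2.]
Reorder the body:
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
Replace the first literal by the inverse relation (written ⁻¹) so that the variables chain:
pointsTo(R,S) :- pointsTo⁻¹(R,P), starAssign(P,Q), pointsTo(Q,S).
pointsTo⁻¹(R,P) :- pointsTo(P,R).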
Base Facts for Points-To Analysis
Each base fact is paired with its inverse (written ⁻¹):
p = &q;     assignAddr(p,q).     assignAddr⁻¹(q,p).
p = q;      assign(p,q).         assign⁻¹(q,p).
p = *q;     assignStar(p,q).     assignStar⁻¹(q,p).
*p = q;     starAssign(p,q).     starAssign⁻¹(q,p).
Creating a Chain Program
pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo⁻¹(Q,P) :- assignAddr⁻¹(Q,P).
pointsTo(R,S) :- pointsTo⁻¹(R,P), starAssign(P,Q), pointsTo(Q,S).
pointsTo⁻¹(S,P) :- pointsTo⁻¹(S,R), pointsTo⁻¹(R,Q), assignStar⁻¹(Q,P).
pointsTo⁻¹(S,R) :- pointsTo⁻¹(S,Q), starAssign⁻¹(Q,P), pointsTo(P,R).
pointsTo⁻¹(R,P) :- pointsTo⁻¹(R,Q), assign⁻¹(Q,P).
. . . and now to CFL-Reachability
pointsTo → assign pointsTo
pointsTo → assignStar pointsTo pointsTo
pointsTo → assignAddr
pointsTo⁻¹ → assignAddr⁻¹
pointsTo → pointsTo⁻¹ starAssign pointsTo
pointsTo⁻¹ → pointsTo⁻¹ pointsTo⁻¹ assignStar⁻¹
pointsTo⁻¹ → pointsTo⁻¹ starAssign⁻¹ pointsTo
pointsTo⁻¹ → pointsTo⁻¹ assign⁻¹
Relationship to Other Analysis Paradigms
• Dataflow analysis
–reachability versus equation solving
• Deduction
• Set constraints
[Figure: the research timeline again, 1987-1998 (Slicing & Applications, Dataflow Analysis, Demand Algorithms, Set Constraints, Structure-Transmitted Dependences, CFL-Reachability), now highlighting Structure-Transmitted Dependences and Set Constraints.]
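Structure-Transmitted Dependences [Reps 1995]
McCarthy’s equations: car(cons(x,y)) = x    cdr(cons(x,y)) = y
w = cons(x,y);
v = car(w);
[Figure: dependence graph over x, y, w, v, with edges labeled hd, tl, and hd⁻¹; dep paths are built from dep edges and from hd . . . hd⁻¹ pairs.]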
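Set Constraints
w = cons(x,y);     $W \supseteq \mathrm{cons}(X, Y)$
v = car(w);        $V \supseteq \mathrm{cons}_1^{-1}(W)$
McCarthy’s Equations Revisited
$\mathrm{cons}_1^{-1}(\mathrm{cons}(X, Y)) = X$, provided $I(Y) \neq \emptyset$
Semantics of Set Constraints
$I(\mathrm{cons}(V_1, V_2)) = \{\mathrm{cons}(v_1, v_2) \mid v_1 \in I(V_1) \text{ and } v_2 \in I(V_2)\}$
$I(\mathrm{cons}_1^{-1}(V)) = \{v_1 \mid \mathrm{cons}(v_1, v_2) \in I(V)\}$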
CFL-Reachability versus Set Constraints
• Lazy languages: CFL-reachability is more natural
– car(cons(X,Y)) = X
• Strict languages: Set constraints are more natural
– car(cons(X,Y)) = X, provided I(Y) ≠ ∅
• But . . . SC and CFL-reachability are equivalent!
– [Melski & Reps 97]
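Solving Set Constraints
• $W \supseteq a$ implies W is “inhabited”
• $W \supseteq \mathrm{cons}(X, Y)$, X is “inhabited”, and Y is “inhabited” imply W is “inhabited”
• $W \supseteq \mathrm{cons}(X, Y)$, $V \supseteq \mathrm{cons}_1^{-1}(W)$, and Y is “inhabited” imply $V \supseteq X$
• $W \supseteq \mathrm{cons}(X, Y)$, $U \supseteq \mathrm{cons}_2^{-1}(W)$, and X is “inhabited” imply $U \supseteq Y$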
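Simulating “Inhabited”
[Figure: graph encoding of the constraint $W \supseteq a$, with edges labeled a, dep, and inhab over the nodes W, X, Y.]
Simulating “Inhabited”
[Figure: graph encoding of the constraint $W \supseteq \mathrm{cons}(X, Y)$, with edges labeled hd, tl, and inhab over the nodes V, W, X, Y.]
Simulating “Provided I(Y) ≠ ∅”
[Figure: graph encoding of $W \supseteq \mathrm{cons}(X, Y)$ and $V \supseteq \mathrm{cons}_1^{-1}(W)$, with edges labeled hd, tl, hd⁻¹, dep, and inhab.]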
Themes
• Harnessing CFL-reachability
• Relationship to other analysis paradigms
• Exhaustive alg. vs. demand alg.
• Understanding complexity
– Linear . . . cubic . . . undecidable
• Beyond CFL-reachability
Exhaustive Versus Demand Analysis
• Exhaustive analysis: All facts at all points
• Optimization: Concentrate on inner loops
• Program-understanding tools: Only some facts are of interest
Exhaustive Versus Demand Analysis
• Demand analysis:
– Does a given fact hold at a given point?
– Which facts hold at a given point?
– At which points does a given fact hold?
• Demand analysis via CFL-reachability
– single-source/single-target CFL-reachability
– single-source/multi-target CFL-reachability
– multi-source/single-target CFL-reachability
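[Figure: the example program (main: x = 3, p(x,y), return from p, printf(y); p(a,b): if . . ., b = a, p(a,b), return from p, printf(b)) expanded over the variables x, y, a, b, with two demands:]
• Might y be uninitialized here (at printf(y))? YES! The path shown is labeled ( ).
• Might b be uninitialized here (at printf(b))? NO!
“Semi-exhaustive”: All “appropriate” demands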
Experimental Results [Horwitz, Reps, & Sagiv 1995]
• 53 C programs (200-6,700 lines)
• For a single fact of interest:
– demand always better than exhaustive
• All “appropriate” demands beats exhaustive when percentage of “yes” answers is high
– Live variables
– Truly live variables
– Constant predicates
– . . .
A Related Result [Sagiv, Reps, & Horwitz 1996]
• [Uses a generalized analysis technique]
• 38 C programs (300-6,000 lines)
– copy-constant propagation
– linear-constant propagation
• All “appropriate” demands always beats exhaustive
– factor of 1.14 to about 6
Exhaustive Versus Demand Analysis
• Demand algorithms for
– Interprocedural dataflow analysis
– Set constraints
– Points-to analysis
Demand Analysis and LP Queries (I)
• Flow-insensitive points-to analysis
– Does variable p point to q?
  • Issue query: ?- pointsTo(p, q).
  • Solve single-source/single-target L(pointsTo)-reachability problem
– What does variable p point to?
  • Issue query: ?- pointsTo(p, Q).
  • Solve single-source L(pointsTo)-reachability problem
– What variables point to q?
  • Issue query: ?- pointsTo(P, q).
  • Solve single-target L(pointsTo)-reachability problem
Demand Analysis and LP Queries (II)
• Flow-sensitive analysis
– Does a given fact f hold at a given point p?
  ?- dfFact(p, f).
– Which facts hold at a given point p?
  ?- dfFact(p, F).
– At which points does a given fact f hold?
  ?- dfFact(P, f).
• E.g., flow-sensitive points-to analysis
  ?- dfFact(p, pointsTo(x, Y)).
  ?- dfFact(P, pointsTo(x, y)).
  etc.
Themes
• Harnessing CFL-reachability
• Relationship to other analysis paradigms
• Exhaustive alg. vs. demand alg.
• Understanding complexity
– Linear . . . cubic . . . undecidable
• Beyond CFL-reachability
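Interprocedural Backward Slice
[Figure: the schematic SDG (Enter main, two Call p nodes, Enter p) with call and return edges labeled ( ) and [ ].]
[Figure: the example program (main: x = 3, p(x,y), return from p, printf(y); p(a,b): if . . ., b = a, p(a,b), return from p, printf(b)) expanded over the variables x, y, a, b, with the same labels ( ) and [ ]; y may be uninitialized at printf(y).]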
Structure-Transmitted Dependences [Reps 1995]
McCarthy’s equations: car(cons(x,y)) = x    cdr(cons(x,y)) = y
w = cons(x,y);
v = car(w);
[Figure: dependence graph over x, y, w, v, with edges labeled hd, tl, and hd⁻¹.]
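Dependences + Matched Paths?
[Figure: SDG fragment with Enter main, Enter p, w = cons(x,y), two Call p nodes, and v = car(w), over the variables w, x, y; edges carry the structure labels hd, tl, hd⁻¹ as well as the call/return labels ( ) and [ ].]
Undecidable! [Reps, TOPLAS 00]
Interleaved Parentheses!
[Figure: a path labeled with hd, hd⁻¹, (, and ).]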
Themes
• Harnessing CFL-reachability
• Relationship to other analysis paradigms
• Exhaustive alg. vs. demand alg.
• Understanding complexity
– Linear . . . cubic . . . undecidable
• Beyond CFL-reachability
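CFL-Reachability via Dynamic Programming
Grammar: A → B C
Graph: [Figure: a B-edge followed by a C-edge induces an A-edge.]
Beyond CFL-Reachability: Composition of Linear Functions
[Figure: an edge labeled $\lambda x.3x+5$ followed by an edge labeled $\lambda x.2x+1$ induces an edge labeled $\lambda x.6x+11$.]
$(\lambda x.2x+1) \circ (\lambda x.3x+5) = \lambda x.6x+11$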
Beyond CFL-Reachability: Composition of Linear Functions
• Interprocedural constant propagation
– [Sagiv, Reps, & Horwitz TCS 96]
• Interprocedural path profiling
– The number of path fragments contributed by a procedure is a function
– [Melski & Reps CC 99]
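Edge labels that are linear functions stay linear under composition, so a label can be stored as a pair of coefficients. A minimal Python sketch, assuming each function $\lambda x.\,ax+b$ is represented by the tuple `(a, b)` (the representation and the names `compose` and `apply_linear` are illustrative):

```python
def compose(outer, inner):
    """Return outer after inner for linear functions stored as (a, b).

    outer(inner(x)) = a2 * (a1 * x + b1) + b2
                    = (a2 * a1) * x + (a2 * b1 + b2)
    """
    a2, b2 = outer
    a1, b1 = inner
    return (a2 * a1, a2 * b1 + b2)

def apply_linear(f, x):
    """Evaluate the linear function f = (a, b) at x."""
    a, b = f
    return a * x + b

first = (3, 5)    # lambda x. 3x + 5
second = (2, 1)   # lambda x. 2x + 1
both = compose(second, first)
print(both)       # coefficients of lambda x. 6x + 11
```

Because a composed label is again a pair of numbers, edges can be combined in the same induce-a-new-edge style as in CFL-reachability, with function composition taking the place of grammar productions.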
Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
• Non-recursive HFSMs [Alur & Yannakakis 98]
• Ordinary FSMs
– T-reachability/circularity queries
• Recursive HFSMs
– Matched-parenthesis T-reachability/circularity
• Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity
– Single-entry/multi-exit [or multi-entry/single-exit]
– Deterministic, multi-entry/multi-exit
T-Cyclicity in Hierarchical Kripke Structures

        SN/SX               SN/MX               MN/SX     MN/MX
        non-rec: O(|k|)     non-rec: O(|k|)     ?         ?
        rec: O(|k|^3)       rec: ?

        SN/SX     SN/MX     MN/SX     MN/MX
        O(|k|)    O(|k|)    O(|k|)    O(|k|^3)
        Further entries: O(|k||t|) [lin rec]; O(|k|) [det]

Recursive HFSMs: Data Complexity

        SN/SX               SN/MX               MN/SX     MN/MX
LTL     non-rec: O(|k|)     non-rec: O(|k|)     ?         ?
        rec: P-time         rec: ?
CTL     O(|k|)              bad                 ?         bad
CTL*    O(|k|^2) [L2]       bad                 ?         bad

Recursive HFSMs: Data Complexity

        SN/SX     SN/MX     MN/SX     MN/MX
LTL     O(|k|)    O(|k|)    O(|k|)    O(|k|^3)
        Further entries: O(|k||t|) [lin rec]; O(|k|) [det]
CTL     O(|k|)    bad       O(|k|)    bad
CTL*    O(|k|)    bad       O(|k|)    bad

Not Dual Problems!
CFL-Reachability: Scope of Applicability
• Static analysis
– Slicing, DFA, structure-transmitted dep., points-to analysis
• Verification
– Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83]
– Model-checking recursive HFSMs
• Formal-language theory
– CF-, 2DPDA-, 2NPDA-recognition
– Attribute-grammar analysis
CFL-Reachability: Benefits
• Algorithms
– Exhaustive & demand
• Complexity
– Linear-time and cubic-time algorithms
– PTIME-completeness
– Variants that are undecidable
• Complementary to
– Equations
– Set constraints
– Types
– . . .
Most Significant Contributions: 1987-2000
• Asymptotically fastest algorithms
– Interprocedural slicing
– Interprocedural dataflow analysis
• Demand algorithms
– Interprocedural dataflow analysis [CC94, FSE95]
– All “appropriate” demands beats exhaustive
• Tool for slicing and browsing ANSI C
– Slices programs as large as 75,000 lines
– University research distribution
– Commercial product: CodeSurfer (GrammaTech, Inc.)
Most Significant Contributions: 1987-2000
• Unifying conceptual model
– [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88], . . .
• Identifies fundamental bottlenecks
– Cubic-time “barrier”
– Litmus test: quadratic-time algorithm?!
– PTIME-complete limits to parallelizability
• Existence proofs for new algorithms
– Demand algorithm for set constraints
– Demand algorithm for points-to analysis
References
• Papers by Reps and collaborators:
– http://www.cs.wisc.edu/~reps/
• CFL-reachability
– Yannakakis, M., Graph-theoretic methods in database theory, PODS 90.
– Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.
References
• Slicing, chopping, etc.
– Horwitz, Reps, & Binkley, TOPLAS 90
– Reps, Horwitz, Sagiv, & Rosay, FSE 94
– Reps & Rosay, FSE 95
• Dataflow analysis
– Reps, Horwitz, & Sagiv, POPL 95
– Horwitz, Reps, & Sagiv, FSE 95, TR-1283
• Structure dependences; set constraints
– Reps, PEPM 95
– Melski & Reps, Theor. Comp. Sci. 00
References
• Complexity
– Undecidability: Reps, TOPLAS 00?
– PTIME-completeness: Reps, Acta Inf. 96.
• Verification
– Dolev, Even, & Karp, Inf & Control 82.
– Benedikt, Godefroid, & Reps, In prep.
• Beyond CFL-reachability
– Sagiv, Reps, Horwitz, Theor. Comp. Sci 96
– Melski & Reps, CC 99, TR-1382