Program Analysis
Mooly Sagiv
http://www.math.tau.ac.il/~sagiv/courses/pa.html
Tel Aviv University
640-6706
Sunday 18-21 Schreiber 8
Monday 10-12 Schreiber 317
Textbook: Principles of Program Analysis
Chapter 1.1-5
Outline
The Nature of Program Analysis
Setting the Scene
– The While language
– Reaching Definitions
Program Analysis Techniques
– Data Flow Analysis - the equational approach
– The Constraint Based Approach
– Abstract Interpretation
– Type and Effect Systems
– Algorithms
– Transformations
The Nature of Program Analysis
Compile-time techniques for predicting safe and computable approximations to the behaviors arising at run-time when executing a program
Differences with operational semantics
– The input state is not usually known at compile-time
– The compiler must always terminate (fast)
– The compiler can generate suboptimal code
The Nature of Program Analysis
Erring on the Safe Side
Universe of possible answers: {d1, d2, …, dN}
true-answer: {d1, d2, …, dn}
safe-answer: {d1, d2, …, dn, dn+1, …, dn+m}
Example
void main()
{
    int x, y, z;
    read(x);
    if (x > 0) then
        y = 1;
    else {
        y = 2;
        f();  /* f does not change y */
    }
    z = y;
}
Semantics Based Program Analysis
Information obtained can be proved safe (or correct) w.r.t. operational semantics
Earlier detection of conceptual compiler bugs
But not committing to semantics directed program analysis
– The structure of the program analysis algorithm need not reflect the structure of the semantics
The While Programming Language Revisited
Syntactic Categories
x, y ∈ Var    program variables
n ∈ Num       program numerals
a ∈ Aexp      arithmetic expressions
b ∈ Bexp      Boolean expressions
S ∈ Stm       set of program statements
l ∈ Lab       set of program labels
opa ∈ Opa     arithmetic operators
opb ∈ Opb     Boolean operators
opr ∈ Opr     relational operators
The While Programming Language Revisited
Abstract Syntax
a ::= x | n | a1 opa a2
b ::= true | false | not b | b1 opb b2 | a1 opr a2
S ::= [x := a]l | [skip]l | S1 ; S2 | if [b]l then S1 else S2 | while [b]l do S
The Factorial Program
[y := x]1;
[z := 1]2;
while [y>1]3 do (
    [z := z * y]4;
    [y := y - 1]5
);
[y := 0]6
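The labelled program can also be written down as data. The following is a minimal sketch in Python, assuming a hypothetical encoding (the names BLOCKS, FLOW and predecessors are illustrative and not part of the course material): each label maps to the variable its block assigns, and the control flow is a set of edges between labels.

```python
# Hypothetical encoding of the labelled factorial program.
# Each label maps to the variable assigned by that elementary block,
# or to None when the block is a test such as [y>1]3.
BLOCKS = {1: "y", 2: "z", 3: None, 4: "z", 5: "y", 6: "y"}

# Control flow edges (l, l2): control may pass directly from block l to block l2.
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (3, 6)}


def predecessors(label):
    """Labels of the blocks that immediately precede `label` in the flow graph."""
    return {src for (src, dst) in FLOW if dst == label}


# The loop test is entered from the initialisation and from the end of the body.
print(sorted(predecessors(3)))
```

The edge (5, 3) is the loop back edge, and (3, 6) is the loop exit.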
Example Program Analysis Problem
Reaching Definitions
An assignment (definition) of the form [x := a]l may reach an elementary block l' if
– there is an execution of the program that leads to l' in which x was last assigned at l
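The definition quantifies over executions, so one way to get a feel for it is to run the factorial program on a concrete input and record, at the entry of every block, which label last assigned each variable. This is only a sketch under that assumption: the function name and the bookkeeping are hypothetical, and a finite number of runs can only under-approximate the set of reaching definitions.

```python
def observed_reaching_definitions(x0):
    """Run factorial on input x0 and collect the (variable, label) pairs seen at each block entry."""
    env = {"x": x0, "y": None, "z": None}
    last = {"x": "?", "y": "?", "z": "?"}  # "?" means: not assigned inside the program
    seen = {label: set() for label in range(1, 7)}

    def enter(label):
        # Record the current "last assigned at" information for this block.
        seen[label] |= set(last.items())

    enter(1); env["y"] = env["x"]; last["y"] = 1
    enter(2); env["z"] = 1; last["z"] = 2
    enter(3)                                   # first evaluation of the loop test
    while env["y"] > 1:
        enter(4); env["z"] = env["z"] * env["y"]; last["z"] = 4
        enter(5); env["y"] = env["y"] - 1; last["y"] = 5
        enter(3)                               # the test is evaluated again
    enter(6); env["y"] = 0; last["y"] = 6
    return seen


seen = observed_reaching_definitions(3)
print(sorted(seen[4], key=str))
```

With input 3 the loop body runs twice, so block 4 is entered once with y and z last assigned at 1 and 2, and once with them last assigned at 5 and 4.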
Reaching Definitions in Factorial
[y := x]1;
[z := 1]2;
while [y>1]3 do (
    [z := z * y]4;
    [y := y - 1]5
);
[y := 0]6
Sets before and after each elementary block:
1: before {(x, ?), (y, ?), (z, ?)}, after {(x, ?), (y, 1), (z, ?)}
2: before {(x, ?), (y, 1), (z, ?)}, after {(x, ?), (y, 1), (z, 2)}
3: before {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}
4: before {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (z, 4)}
5: before {(x, ?), (y, 1), (y, 5), (z, 4)}, after {(x, ?), (y, 5), (z, 4)}
6: before {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}, after {(x, ?), (y, 6), (z, 2), (z, 4)}
Reaching Definitions in Factorial
l | RDentry(l) | RDexit(l)
1 | {(x, ?), (y, ?), (z, ?)} | {(x, ?), (y, 1), (z, ?)}
2 | {(x, ?), (y, 1), (z, ?)} | {(x, ?), (y, 1), (z, 2)}
3 | {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)} | {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}
4 | {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)} | {(x, ?), (y, 1), (y, 5), (z, 4)}
5 | {(x, ?), (y, 1), (y, 5), (z, 4)} | {(x, ?), (y, 5), (z, 4)}
6 | {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)} | {(x, ?), (y, 6), (z, 2), (z, 4)}
Usage of Reaching Definitions
Compiler optimizations
– An occurrence of a variable x in an elementary block l is the constant n if every reaching definition (x, l') at l assigns n to x
– Loop invariant code motion
– Program dependence graphs
Software quality tools
– A usage of a variable x in an elementary block may be uninitialized if ...
– Program slicing
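The constant test in the first bullet can be sketched as a small function over a set of reaching definitions. This is an illustration under assumptions, not the course's algorithm: the helper name constant_value and the table assigned_constant (label to the constant that the block assigns, when it assigns one) are hypothetical.

```python
def constant_value(var, reaching, assigned_constant):
    """Return n if every reaching definition of `var` assigns the same constant n, else None."""
    labels = [label for (v, label) in reaching if v == var]
    values = {assigned_constant.get(label) for label in labels}
    if len(values) == 1:
        (n,) = values
        return n  # still None when the only definition is not a constant assignment
    return None


# In the factorial program, [z := 1]2 assigns the constant 1 and [y := 0]6 the constant 0.
assigned_constant = {2: 1, 6: 0}

after_block_2 = {("x", "?"), ("y", 1), ("z", 2)}
loop_entry = {("x", "?"), ("y", 1), ("y", 5), ("z", 2), ("z", 4)}

print(constant_value("z", after_block_2, assigned_constant))
print(constant_value("z", loop_entry, assigned_constant))
```

Right after block 2 the only definition of z is the constant assignment at label 2. At the loop test z may also come from label 4, which is not a constant assignment, so no constant is reported there.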
Soundness in Reaching Definitions
Every reachable definition is detected
May include more definitions
– Fewer constants may be identified
– Not all the loop invariant code will be identified
– May warn against uninitialized variables that are in fact initialized
But never miss a reaching definition
– All constants are indeed such
– Never move non-invariant code
– Never miss an error
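Unsound Reaching Definitions 1
[y := x]1;
[z := 1]2;
while [y>1]3 do (
    [z := z * y]4;
    [y := y - 1]5
);
[y := 0]6
Sets before and after each elementary block:
1: before {(x, ?), (y, ?), (z, ?)}, after {(x, ?), (y, 1), (z, ?)}
2: before {(x, ?), (y, 1), (z, ?)}, after {(x, ?), (y, 1), (z, 2)}
3: before {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}
4: before {(x, ?), (y, 1), (z, 2)}, after {(x, ?), (y, 1), (z, 4)}
5: before {(x, ?), (y, 1), (z, 4)}, after {(x, ?), (y, 5), (z, 4)}
6: before {(x, ?), (y, 1), (z, 2)}, after {(x, ?), (y, 6), (z, 2), (z, 4)}
Unsound: (y, 5) and (z, 4) reach labels 4 and 6 after one loop iteration but are missing from the sets before 4 and before 6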
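Unsound Reaching Definitions 2
[y := x]1;
[z := 1]2;
while [y>1]3 do (
    [z := z * y]4;
    [y := y - 1]5
);
[y := 0]6
Sets before and after each elementary block:
1: before {(x, ?), (y, ?), (z, ?)}, after {(x, ?), (y, 1), (z, ?)}
2: before {(x, ?), (y, 1), (z, ?)}, after {(x, ?), (y, 1), (z, 2)}
3: before {(x, ?), (y, 5), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}
4: before {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (z, 4)}
5: before {(x, ?), (y, 1), (y, 5), (z, 4)}, after {(x, ?), (y, 5), (z, 4)}
6: before {(x, ?), (y, 1), (y, 5), (z, 2), (z, 4)}, after {(x, ?), (y, 6), (z, 4)}
Unsound: (y, 1) and (z, 2) reach label 3 when the loop is first entered but are missing from the set before 3, and (z, 2) is missing from the set after 6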
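Suboptimal Reaching Definitions
[y := x]1;
[z := 1]2;
while [y>1]3 do (
    [z := z * y]4;
    [y := y - 1]5
);
[y := 0]6
Sets before and after each elementary block:
1: before {(x, ?), (y, ?), (z, ?)}, after {(x, ?), (y, 1), (z, ?)}
2: before {(x, ?), (y, 1), (z, ?)}, after {(x, ?), (y, 1), (z, 2)}
3: before {(x, ?), (y, 1), (y, 5), (y, 6), (z, 2), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (y, 6), (z, 2), (z, 4)}
4: before {(x, ?), (y, 1), (y, 5), (y, 6), (z, 2), (z, 4)}, after {(x, ?), (y, 1), (y, 5), (y, 6), (z, 4)}
5: before {(x, ?), (y, 1), (y, 5), (y, 6), (z, 4)}, after {(x, ?), (y, 5), (z, 4)}
6: before {(x, ?), (y, 1), (y, 5), (y, 6), (z, 2), (z, 4)}, after {(x, ?), (y, 6), (z, 2), (z, 4)}
Safe but imprecise: (y, 6) never reaches any of these points before block 6 has executed, so the sets are larger than necessary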
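The difference between the unsound and the suboptimal answers can be checked mechanically: a candidate is safe when, at every program point, it contains at least the definitions that really reach that point. The sketch below assumes that the first "Reaching Definitions in Factorial" solution is the reference answer, and compares only the sets before blocks 3 and 4, copied from the slides above. The dictionary layout and function names are hypothetical.

```python
FULL = {("x", "?"), ("y", 1), ("y", 5), ("z", 2), ("z", 4)}

# Reference sets before blocks 3 and 4.
reference = {"before 3": FULL, "before 4": FULL}

candidates = {
    "unsound 1": {"before 3": FULL,
                  "before 4": {("x", "?"), ("y", 1), ("z", 2)}},
    "unsound 2": {"before 3": {("x", "?"), ("y", 5), ("z", 4)},
                  "before 4": FULL},
    "suboptimal": {"before 3": FULL | {("y", 6)},
                   "before 4": FULL | {("y", 6)}},
}


def is_safe(candidate):
    """Safe: no reference definition is missing at any compared point."""
    return all(reference[p] <= candidate[p] for p in reference)


def is_exact(candidate):
    """Exact: the candidate adds nothing beyond the reference either."""
    return all(reference[p] == candidate[p] for p in reference)


for name, candidate in candidates.items():
    print(name, "safe" if is_safe(candidate) else "unsafe",
          "exact" if is_exact(candidate) else "not exact")
```

Only the suboptimal candidate passes the safety check; it fails the exactness check because of the extra pair (y, 6).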
Program Analysis Techniques
Find sound solutions
– Data Flow Analysis - the equational approach
– The Constraint Based Approach
– Abstract Interpretation
– Type and Effect Systems
The Dataflow Analysis Approach
Generate a system of equations
Find the least solution in one of the following ways
– Start with the minimum element and iterate until no more changes occur
– Eliminate equations until every value is expressed in terms of the initial dataflow value when the program begins (not studied in this course)
Equations Generated for Reaching Definitions
Equations for elementary statements
– [skip]l
    RDexit(l) = RDentry(l)
– [b]l
    RDexit(l) = RDentry(l)
– [x := a]l
    RDexit(l) = (RDentry(l) - {(x, l') | l' ∈ Lab}) ∪ {(x, l)}
    (Lab is taken to include the special label ?, so (x, ?) is removed as well)
Equations for control flow constructs
    RDentry(l) = ∪ RDexit(l'), over all l' that immediately precede l in the control flow graph
An equation for the entry
    RDentry(1) = {(x, ?) | x is a variable in the program}
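The right-hand sides of these equations translate directly into set operations. A minimal sketch, assuming pairs (variable, label) are stored in Python sets and that a block is described by the variable it assigns (None for skip and for tests); the function names are hypothetical:

```python
def rd_exit(label, assigned_var, rd_entry):
    """Exit set of block `label`, given its entry set."""
    if assigned_var is None:
        # [skip] and [b]: nothing is defined, the set passes through unchanged.
        return set(rd_entry)
    # [x := a]: remove every earlier definition of x (including (x, "?")),
    # then add the definition made by this block.
    survivors = {(v, l) for (v, l) in rd_entry if v != assigned_var}
    return survivors | {(assigned_var, label)}


def rd_entry(predecessor_exits):
    """Entry set of a block: the union of the exit sets of its immediate predecessors."""
    return set().union(*predecessor_exits)


start = {("x", "?"), ("y", "?"), ("z", "?")}
print(sorted(rd_exit(1, "y", start), key=str))
```

Applied to [y := x]1 and the initial set, the pair (y, ?) is replaced by (y, 1), while x and z keep their (?, ?) entries.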
The Least Solution
12 equations, one for each of the sets RDentry(1), …, RDexit(6)
Can be written in vectorial form
    RD = F(RD)
Find the minimum solution
– Every component is minimal
– Since F is monotonic such a solution always exists
– Since the number of definitions is finite it is possible to compute the minimum solution iteratively
Chaotic Computation of the Least Solution
Initialize
    RDentry(1) = {(x, ?), (y, ?), (z, ?)}
    RDentry(2) = RDentry(3) = RDentry(4) = RDentry(5) = RDentry(6) = ∅
    RDexit(i) = ∅
    WL = {1, 2, 3, 4, 5, 6}
while WL != ∅ do
    select and remove an l from WL
    new = F_RDexit(l)(…)
    if (new != RDexit(l)) then
        RDexit(l) = new
        for all l' such that RDexit(l) is used in F_RDentry(l')(…) do
            RDentry(l') = RDentry(l') ∪ new
            WL := WL ∪ {l'}
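The worklist loop above can be run on the factorial program. The following is a sketch under assumptions: the encoding of blocks and flow edges is hypothetical, and the order in which labels are taken from the worklist is whatever Python's set.pop happens to choose, which is in the spirit of a chaotic iteration.

```python
BLOCKS = {1: "y", 2: "z", 3: None, 4: "z", 5: "y", 6: "y"}   # assigned variable, None for a test
FLOW = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (3, 6)}      # control flow edges
VARS = ["x", "y", "z"]


def transfer(label, entry):
    """Exit set of a block computed from its entry set."""
    var = BLOCKS[label]
    if var is None:
        return set(entry)
    return {(v, l) for (v, l) in entry if v != var} | {(var, label)}


def chaotic_iteration():
    entry = {l: set() for l in BLOCKS}
    exit_ = {l: set() for l in BLOCKS}
    entry[1] = {(v, "?") for v in VARS}
    worklist = set(BLOCKS)
    while worklist:
        l = worklist.pop()                 # select and remove an arbitrary label
        new = transfer(l, entry[l])
        if new != exit_[l]:
            exit_[l] = new
            for (src, dst) in FLOW:
                if src == l:               # RDexit(l) is used in the equation for RDentry(dst)
                    entry[dst] |= new
                    worklist.add(dst)
    return entry, exit_


entry, exit_ = chaotic_iteration()
for l in sorted(BLOCKS):
    print(l, sorted(entry[l], key=str), sorted(exit_[l], key=str))
```

Because every set only grows and there are finitely many (variable, label) pairs, the loop terminates whatever order is chosen, and the result does not depend on that order.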
The Constraint Based Approach
Generate a system of set inclusions X ⊆ Y
Fits very well with functional and object oriented programming languages in which the control flow graph is not immediately derived from the syntax
Find the least solution
Constraints Generated for Reaching Definitions
Constraints for elementary statements
– [skip]l
    RDexit(l) ⊇ RDentry(l)
– [b]l
    RDexit(l) ⊇ RDentry(l)
– [x := a]l
    RDexit(l) ⊇ (RDentry(l) - {(x, l') | l' ∈ Lab})
    RDexit(l) ⊇ {(x, l)}
Equations for control flow constructs
    RDentry(l) = ∪ RDexit(l'), over all l' that immediately precede l in the control flow graph
A constraint for the entry
    RDentry(1) ⊇ {(x, ?) | x is a variable in the program}
Constraints vs. Equations for Reaching Definitions
Every solution to the system of equations is a solution to the set of constraints
    RDexit(l) ⊇ (RDentry(l) - {(x, l') | l' ∈ Lab})
    RDexit(l) ⊇ {(x, l)}
  together amount to
    RDexit(l) ⊇ (RDentry(l) - {(x, l') | l' ∈ Lab}) ∪ {(x, l)}
But some solutions to the set of constraints are not solutions to the system of equations
The least solution is the same
The connection between constraints and equations is not always obvious
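A single assignment block is enough to see the difference. The sketch below uses a hypothetical block [y := x]1 with a fixed entry set, and checks two candidate exit sets against the two constraints and against the equation; the function names are illustrative.

```python
entry = {("x", "?"), ("y", "?")}

kept = {(v, l) for (v, l) in entry if v != "y"}   # RDentry(1) without the definitions of y
gen = {("y", 1)}                                  # the definition made by the block


def satisfies_constraints(exit_set):
    """Both inclusions hold: exit ⊇ kept and exit ⊇ gen."""
    return exit_set >= kept and exit_set >= gen


def satisfies_equation(exit_set):
    """The exit set is exactly kept ∪ gen."""
    return exit_set == kept | gen


least = kept | gen
larger = least | {("y", "?")}                     # keeps a definition the block overwrites

print(satisfies_constraints(least), satisfies_equation(least))
print(satisfies_constraints(larger), satisfies_equation(larger))
```

The set named least satisfies both formulations. The set named larger still satisfies the constraints but not the equation, which is why only the least solutions of the two systems are guaranteed to agree.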
The Control Flow Analysis Problem
Given a program in a functional programming language with higher order functions (functions can serve as parameters and return values)
Find out for each function invocation which functions may be applied
Obvious in C without function pointers
Difficult in C++, Java and ML
An ML Example
let f = fn x => x 1 ;
g = fn y => y + 2 ;
h = fn z => z + 3;
in (f g) + (f h)
An ML Example
let f = fn x => /* {g, h} */ x 1 ;
g = fn y => y + 2 ;
h = fn z => z + 3;
in (f g) + (f h)
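The annotation {g, h} says that the call x 1 inside f may invoke either g or h. As a rough illustration, the same program can be mimicked in Python and instrumented to record which functions actually get bound to x at run time. The set bound_to_x is a hypothetical addition for the demonstration; a static control flow analysis has to predict this set without running the program.

```python
bound_to_x = set()   # names of the functions passed to f during this run


def f(x):
    bound_to_x.add(x.__name__)   # record the actual argument
    return x(1)                  # the call whose target the analysis must predict


def g(y):
    return y + 2


def h(z):
    return z + 3


result = f(g) + f(h)
print(sorted(bound_to_x), result)
```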
Control Flow Analysis (pure) ML
Find out for every formal argument x the set of expressions that may be bound to x in some execution
Analyze all function invocations
Generate a set of constraints
– Label every program sub-expression
– The Control Flow Analysis algorithm needs to find a pair (C, p) where
  » C(l) is a superset of the potential sub-expressions that can occur at l
  » p(x) is a superset of the potential sub-expressions that x can be bound to
Generate constraints for (C, p)
Simplified Example
let f = [fn x => [[x]1 1]2]3;
    g = [fn y => [[y]4 + 2]5]6;
    h = [fn z => [[z]7 + 3]8]9;
in [f h]10
Simplified Constraints
let f = [fn x => [[x]1 1]2]3;
    g = [fn y => [[y]4 + 2]5]6;
    h = [fn z => [[z]7 + 3]8]9;
in [f h]10
C(1) ⊇ {[x]1}
C(2) ⊇ {[[x]1 1]2}
…
C(10) ⊇ {[f h]10}
C(1) ⊇ p(x)
C(4) ⊇ p(y)
C(7) ⊇ p(z)
p(x) ⊇ C(9)
C(10) ⊇ C(3)
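Inclusion constraints of this shape can be solved by repeatedly copying elements along the inclusions until nothing changes. The sketch below is a generic solver, not the course's algorithm. It is fed only the constraints listed above plus one assumed entry: the constraints for labels 3 to 9 are elided on the slide, so the constant for C(9) is a stand-in that follows the visible pattern, and it is marked as an assumption in the code.

```python
def solve(constants, inclusions):
    """Least solution of constraints  X ⊇ {c}  and  X ⊇ Y  over named sets."""
    sol = {}
    for name, element in constants:
        sol.setdefault(name, set()).add(element)
    for big, small in inclusions:
        sol.setdefault(big, set())
        sol.setdefault(small, set())
    changed = True
    while changed:                       # iterate until no set grows any more
        changed = False
        for big, small in inclusions:
            if not sol[small] <= sol[big]:
                sol[big] |= sol[small]
                changed = True
    return sol


FN_Z = "[fn z => [[z]7 + 3]8]9"          # ASSUMPTION: stand-in for an elided constraint on C(9)

constants = [
    ("C(1)", "[x]1"),
    ("C(2)", "[[x]1 1]2"),
    ("C(9)", FN_Z),
    ("C(10)", "[f h]10"),
]
inclusions = [                            # pairs (X, Y) meaning X ⊇ Y
    ("C(1)", "p(x)"), ("C(4)", "p(y)"), ("C(7)", "p(z)"),
    ("p(x)", "C(9)"), ("C(10)", "C(3)"),
]

solution = solve(constants, inclusions)
print(solution["p(x)"])
```

With these inputs the function bound to h flows from C(9) into p(x) and from there into C(1), matching the intuition that x is bound to h in the call f h.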