PRESTO Research Group, Ohio State University
Interprocedural Dataflow Analysis in the Presence of Large Libraries
Atanas (Nasko) Rountev, Scott Kagan
Ohio State University
Thomas Marlowe
Seton Hall University
3/30/06
CC 2006, Scott Kagan, PRESTO Research Group
Uses of Interprocedural Dataflow Analysis
- Performance optimizations in compilers
- Software understanding and transformation
  - e.g. dependence analysis for program slicing, change impact analysis, refactoring, etc.
- Software testing
  - e.g. dataflow-based testing; testing of object interactions in OO software
- Software checking
  - e.g. object protocols: open(read|write)*close
Model for Interprocedural Whole-Program Analysis
- Components C1, C2, …, Cn form a complete program
- Assumption: it is possible and desirable to analyze the source code of the entire program
[Diagram: code for C1, code for C2, …, code for Cn → Engine for Whole-Program Dataflow Analysis → dataflow solution for C1 + C2 + … + Cn]
A Specific Case: Main + Lib
- Main + Lib form a complete program
- What if we are using large libraries that need to be re-analyzed from scratch?
  - e.g. the standard Java libraries contain about 10,000 classes and 80,000 methods
  - need to be re-analyzed with every new Main component
[Diagram: code for Main, code for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main + Lib]
Example: Methods in Java Programs
[Figure: bar chart; y-axis: Number of methods, 0 to 20000; series: User Methods, Library Methods]
A Specific Case: Main + Lib
Goal: the solution for Main should be as good as the solution that would have been computed by a whole-program analysis (no loss of precision)
[Diagram: code for Lib → Summary Generation Analysis → summary for Lib; code for Main + summary for Lib → Engine for Whole-Program Dataflow Analysis → dataflow solution for Main]
Functional Approach to Whole-Program Analysis
- Sharir-Pnueli 1981
- Dataflow lattice L
- Edge function f: L → L for effects of a statement
- Path function: f = fn ∘ fn-1 ∘ … ∘ f2 ∘ f1
- Phase 1: summary functions φn: L → L
  - solution at node n as a function of the solution at the entry of n’s procedure
- Phase 2: solutions at start nodes of procedures
- Phase 3: solutions at the remaining nodes
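A minimal sketch of these definitions, assuming the lattice L is the powerset of a small set of facts with set union as the meet (a "may" problem). The helper names gen_kill, compose, and meet are illustrative and do not come from the slides.

```python
from functools import reduce

def gen_kill(gen=(), kill=()):
    """Edge function f: L -> L that removes `kill` facts, then adds `gen` facts."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda facts: (frozenset(facts) - kill) | gen

def compose(*fs):
    """compose(fn, ..., f2, f1) is the path function that applies f1 first."""
    return reduce(lambda outer, inner: lambda x: outer(inner(x)), fs, lambda x: x)

def meet(*fs):
    """Pointwise meet of functions; the meet on L is assumed to be set union."""
    return lambda x: frozenset().union(*(f(x) for f in fs))

# Hypothetical edge functions for three statements.
f1 = gen_kill(gen={"a"})
f2 = gen_kill(gen={"b"}, kill={"a"})
f3 = gen_kill(gen={"c"})

# Path function for the path f1 then f2.
path = compose(f2, f1)

# Two paths reaching the same node are combined with meet.
summary = meet(compose(f2, f1), compose(f3, f1))

print(sorted(path(frozenset())))
print(sorted(summary(frozenset())))
```

Here compose(f2, f1) plays the role of a path function, and meet combines the path functions of two paths that reach the same node, which is roughly what a phase 1 summary function does for the paths from a procedure's entry.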
Example: Functional Approach
[Figure: interprocedural control-flow graph with procedures main (nodes 1-6; edge functions f0, f1), p1 (nodes 7-13; f2, f3), p2 (nodes 14-24; f4, f5, f6), and p3 (nodes 25-28; f7, f8)]
φ28 = f8 ∘ f7
φ21 = f4 ∧ f5 ∧ (φ28 ∘ f6)
φ13 = (φ21 ∘ f2) ∧ (φ21 ∘ f3)
φ6 = φ13 ∘ f1 ∘ f0
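The equations above can be evaluated mechanically once the edge functions have a closed-form representation. The sketch below assumes gen/kill functions of the form x ↦ (x − kill) ∪ gen with set union as the meet, and reads the juxtapositions in the equations as composition, with the separate path terms combined by meet. The GK class and the concrete gen and kill sets for f0 through f8 are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GK:
    """Closed-form function x -> (x - kill) | gen over sets of facts."""
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()

    def __call__(self, x):
        return (frozenset(x) - self.kill) | self.gen

    def after(self, first):
        """Composition: apply `first`, then `self`."""
        return GK((first.gen - self.kill) | self.gen, first.kill | self.kill)

    def meet(self, other):
        """Pointwise union of the results of the two functions."""
        return GK(self.gen | other.gen, self.kill & other.kill)

def gk(gen="", kill=""):
    """Build a GK from strings of one-letter fact names (illustrative helper)."""
    return GK(frozenset(gen), frozenset(kill))

# Hypothetical edge functions; the slides do not define them.
f0, f1 = gk(gen="a"), gk(gen="b", kill="a")
f2, f3 = gk(gen="c"), gk(gen="d")
f4, f5, f6 = gk(gen="e"), gk(kill="b"), gk(gen="f")
f7, f8 = gk(gen="g", kill="f"), gk(gen="h")

# Phase 1, bottom-up over the call structure main -> p1 -> p2 -> p3.
phi28 = f8.after(f7)
phi21 = f4.meet(f5).meet(phi28.after(f6))
phi13 = phi21.after(f2).meet(phi21.after(f3))
phi6 = phi13.after(f1).after(f0)

print(sorted(phi6(frozenset())))
```

Because composition and meet stay inside the gen/kill form, each summary is a single pair of sets no matter how many paths it covers.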
Callbacks
[Figure: the example graph with procedures main (nodes 1-6; f0, f1), p1 (nodes 7-13; f2, f3), p2 (nodes 14-24; f4, f5, f6), p3 (nodes 25-28; f7, f8), and an additional procedure ext (nodes 29-31; f9)]
- e.g. function pointers in C
- e.g. virtual dispatch in C++ and Java
- Can no longer determine φ21 and φ13 without code for ext
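To make the dependence on ext concrete, the sketch below treats φ21 as a function of the summary of ext. It assumes the callback is the call at nodes 16-17 of p2, that dataflow information is a set of facts combined by union, and that all gen/kill sets (including the summary used for p3) are invented for illustration.

```python
def gen_kill(gen="", kill=""):
    """Function x -> (x - kill) | gen; facts are one-letter names."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: (frozenset(x) - kill) | gen

def compose(outer, inner):
    """Apply `inner` first, then `outer`."""
    return lambda x: outer(inner(x))

def meet(f, g):
    """Pointwise meet; the meet on facts is assumed to be set union."""
    return lambda x: f(x) | g(x)

# Hypothetical library edge functions and an assumed summary for p3.
f4, f5, f6 = gen_kill("e"), gen_kill(kill="b"), gen_kill("f")
phi28 = gen_kill("gh", "f")

def phi21(phi_ext):
    """Summary of p2, parameterized by the unknown callback target."""
    via_callback = compose(meet(f4, f5), phi_ext)
    via_p3 = compose(phi28, f6)
    return meet(via_callback, via_p3)

# Two possible client callbacks give two different summaries for p2.
ext_a = gen_kill("x")
ext_b = gen_kill(kill="b")

entry = frozenset("b")
print(sorted(phi21(ext_a)(entry)))
print(sorted(phi21(ext_b)(entry)))
```

The two printed sets differ, so no single φ21 can be fixed before the code for ext is known.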
Library Summary
- Idea: run “pieces” of phase 1
- Compute functions for sets of library-local paths
[Figure: library procedures p1 (nodes 7-13; f2, f3), p2 (nodes 14-24; f4, f5, f6), and p3 (nodes 25-28; f7, f8), annotated with summary functions]
- 14 → 16: φ = id
- 14 → 21: φ = f8 ∘ f7 ∘ f6
- 17 → 21: φ = f4 ∧ f5
- 7 → 11: φ = f2 ∧ f3
- 12 → 13: φ = id
Library Summary Generation
- “Fixed” call in the library
  - always invokes the same library procedure, independent of code for main component
- “Fixed” procedure in the library
  - makes no calls, or makes only fixed calls to fixed procedures
  - standard functional approach can be applied
- For any other procedure, compute φk→n, where
  - k is the start node, or
  - k is a return from a non-fixed call, or
  - k is a return from a fixed call to a non-fixed procedure
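The classification of procedures can be sketched as a small fixed-point computation. The sketch assumes the library's calls are already labeled fixed or non-fixed, uses a hypothetical four-procedure library with invented names, and treats recursion through fixed calls optimistically, which the slides do not spell out.

```python
def fixed_procedures(calls):
    """Return the set of fixed procedures.

    calls maps each procedure to a list of (callee, is_fixed_call) pairs;
    callee is None when the target is not known from the library alone.
    A procedure is non-fixed if it contains a non-fixed call or a fixed
    call to a non-fixed procedure; every other procedure is fixed.
    """
    non_fixed = {p for p, cs in calls.items()
                 if any(not is_fixed for _, is_fixed in cs)}
    changed = True
    while changed:
        changed = False
        for p, cs in calls.items():
            if p not in non_fixed and any(callee in non_fixed for callee, _ in cs):
                non_fixed.add(p)
                changed = True
    return set(calls) - non_fixed

# Hypothetical library: sort invokes a client-supplied comparator.
library = {
    "hash": [],
    "lookup": [("hash", True)],
    "sort": [(None, False), ("lookup", True)],
    "sort_all": [("sort", True)],
}

print(sorted(fixed_procedures(library)))
```

In this hypothetical library, sort_all becomes non-fixed only because it reaches the callback through sort, even though every call it makes directly is fixed.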
Example: Library Summary Generation
[Figure: library procedures p1 (nodes 7-13; f2, f3), p2 (nodes 14-24; f4, f5, f6), and p3 (nodes 25-28; f7, f8)]
- Fixed calls: 11-12 and 23-24
- Non-fixed calls: 16-17
- Fixed procedures: p3
- Non-fixed procedures: p1 and p2
- Contexts k for φk→n
  - 7 and 14: start nodes
  - 17: return from a non-fixed call
  - 12: return from a fixed call to a non-fixed procedure
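The contexts in this example can be derived mechanically from the call classification. The sketch below encodes the calls listed above; the callees of the two fixed calls (p2 and p3) and the start node of p3 (25) are read off the example graph and should be taken as assumptions.

```python
# Start node of each library procedure (25 for p3 is assumed).
start = {"p1": 7, "p2": 14, "p3": 25}

# (caller, call node, return node, callee, is fixed call);
# callee is None when the target depends on the main component.
calls = [
    ("p1", 11, 12, "p2", True),
    ("p2", 16, 17, None, False),
    ("p2", 23, 24, "p3", True),
]

fixed_procs = {"p3"}

def contexts(start, calls, fixed_procs):
    """Map each context node k of a non-fixed procedure to the reason it qualifies."""
    ks = {}
    for proc, node in start.items():
        if proc not in fixed_procs:
            ks[node] = "start node"
    for caller, _call, ret, callee, is_fixed in calls:
        if caller in fixed_procs:
            continue  # fixed procedures use the standard functional approach
        if not is_fixed:
            ks[ret] = "return from a non-fixed call"
        elif callee not in fixed_procs:
            ks[ret] = "return from a fixed call to a non-fixed procedure"
    return ks

for k, why in sorted(contexts(start, calls, fixed_procs).items()):
    print(k, why)
```

Node 24 is not a context because the call at 23-24 is a fixed call to the fixed procedure p3, so its effect can be folded into a summary function.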
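The Condensed Graph
[Figure: the original library graph (p1: nodes 7-13; p2: nodes 14-24; p3: nodes 25-28) next to the condensed graph, which keeps only nodes 7, 11, 12, 13 of p1 and nodes 14, 16, 17, 21 of p2, connected by summary edges]
- 14 → 16: φ = id
- 14 → 21: φ = f8 ∘ f7 ∘ f6
- 17 → 21: φ = f4 ∧ f5
- 7 → 11: φ = f2 ∧ f3
- 12 → 13: φ = id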
Analysis of a Main Component
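[Figure: “fake” whole-program graph combining main (nodes 1-6; f0, f1) and ext (nodes 29-31; f9) with the condensed library procedures p1 (nodes 7, 11, 12, 13) and p2 (nodes 14, 16, 17, 21)]
- Create a “fake” graph for the whole program
- Run a whole-program analysis engine
- Safe solutions for non-library nodes
  - precise for distributive problems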
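One way to see why distributivity matters: a summary edge that merges two branches (say f2 and f3 combined by meet) applies the rest of the path to the merged value, which matches the per-path result only when the later function distributes over meet. The sketch below uses constant propagation, a standard non-distributive problem, to show a result that is safe but less precise. The functions and the NAC ("not a constant") encoding are invented for illustration and are not taken from the slides.

```python
NAC = "NAC"  # "not a constant": the least precise value for a variable

def meet_val(a, b):
    """Meet of two abstract values: equal values survive, anything else is NAC."""
    return a if a == b else NAC

def meet_env(e1, e2):
    """Variable-wise meet of two environments over the same variables."""
    return {v: meet_val(e1[v], e2[v]) for v in e1}

def f2(env):
    """Branch 1: a = 1; b = 2."""
    return {**env, "a": 1, "b": 2}

def f3(env):
    """Branch 2: a = 2; b = 1."""
    return {**env, "a": 2, "b": 1}

def g(env):
    """Rest of the path: c = a + b."""
    a, b = env["a"], env["b"]
    c = NAC if NAC in (a, b) else a + b
    return {**env, "c": c}

entry = {"a": NAC, "b": NAC, "c": NAC}

# Meet taken after each full path: (g after f2) meet (g after f3).
per_path = meet_env(g(f2(entry)), g(f3(entry)))

# Meet taken at the merge point first: g after (f2 meet f3).
merged = g(meet_env(f2(entry), f3(entry)))

print(per_path["c"], merged["c"])
```

The merged result reports c as NAC where the per-path result finds the constant 3: still safe, but less precise. For gen/kill style functions, which are distributive, the two computations always agree.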
Original vs. Condensed Library CFGs: Number of Nodes
[Figure: bar chart; y-axis: Number of CFG nodes, 0 to 350000; series: Nodes in original CFGs, Nodes in condensed CFGs]
Original vs. Condensed Library CFGs: Number of Edges
[Figure: bar chart; y-axis: Number of CFG edges, 0 to 400000; series: Edges in original CFGs, Edges in condensed CFGs]
Discussion
- Flow and context insensitivity
- Cost reduction: time and memory
- Compact representation of functions
  - IFDS, IDE
- Use assumptions about the callback methods?
  - e.g. assume callback methods are “good”