End-User Shape Analysis
National Taiwan University – August 11, 2009
Xavier Rival, INRIA/ENS Paris
Bor-Yuh Evan Chang 張博聿, U of Colorado, Boulder
George C. Necula, U of California, Berkeley
Programming Languages Research at the University of Colorado, Boulder
Software errors cost a lot
~$60 billion annually (~0.5% of US GDP), 2002 National Institute of Standards and Technology report
[Slide graphic: comparisons against a total annual revenue and against >10x an annual budget; the organizations were shown only as logos]
Bor-Yuh Evan Chang 張博聿 , University of Colorado at Boulder - End-User Shape Analysis
But there’s hope in program analysis
Microsoft uses and distributes the Static Driver Verifier
Airbus applies the Astrée Static Analyzer
Companies, such as Coverity and Fortify, market static source code analysis tools
Because program analysis can eliminate entire classes of bugs
For example,
– Reading from a closed file: read( );
– Reacquiring a locked lock: acquire( );
How?
– Systematically examine the program
– Simulate running program on “all inputs”
– “Automated code review”
Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
… code …
// x now points to an unlocked lock
acquire(x);
… code …
acquire(x);
… code …
analysis state: x
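The double-acquire check above can be sketched as a tiny abstract simulation. This is only an illustration under assumptions: the names `LockState`, `DoubleAcquireChecker`, and `assumeUnlocked` are hypothetical and do not come from the talk or from any real analyzer.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstract lock states; the names are illustrative only. */
enum LockState { UNLOCKED, LOCKED }

/** Tracks one abstract state per lock variable and counts double acquires. */
final class DoubleAcquireChecker {
    private final Map<String, LockState> state = new HashMap<>();
    private int errors = 0;

    /** Records a fact such as "x now points to an unlocked lock". */
    void assumeUnlocked(String var) {
        state.put(var, LockState.UNLOCKED);
    }

    /** Abstractly executes acquire(var); acquiring a locked lock is reported. */
    void acquire(String var) {
        if (state.get(var) == LockState.LOCKED) {
            errors++; // reacquiring a locked lock
        }
        state.put(var, LockState.LOCKED);
    }

    int errorCount() {
        return errors;
    }
}
```

Calling `assumeUnlocked("x")` followed by two `acquire("x")` calls leaves `errorCount()` at 1, which mirrors the second `acquire(x)` in the slide's code.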
Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
ideal analysis state: x or x or x or …
undecidability
Must abstract
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
ideal analysis state: x or x or x or …
analysis state: x ?
For decidability, must abstract: “model all inputs” (e.g., merge objects)
Abstraction too coarse or not precise enough (e.g., loses the fact that x is always unlocked) mislabels good code as buggy
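To make the precision loss concrete, here is a minimal sketch of what merging objects can do to a lock abstraction. The three-valued `AbsLock` type and its method names are assumptions made for illustration; the talk does not give this lattice.

```java
/** Hypothetical three-valued lock abstraction; MAYBE_LOCKED means "could be either". */
enum AbsLock {
    UNLOCKED, LOCKED, MAYBE_LOCKED;

    /** Merging two abstract objects keeps only what both agree on. */
    AbsLock join(AbsLock other) {
        return this == other ? this : MAYBE_LOCKED;
    }

    /** An acquire is reported unless the lock is definitely unlocked. */
    boolean acquireMayFail() {
        return this != UNLOCKED;
    }
}
```

If the unlocked lock that x points to is merged with a locked neighbor, `UNLOCKED.join(LOCKED)` yields `MAYBE_LOCKED`, and a perfectly good `acquire(x)` gets reported.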
To address the precision challenge
Traditional program analysis mentality:
“Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
“Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that work hopefully most of the time.”
End-user approach:
“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?”
Summary of overview
Challenge in analysis: Finding a good abstraction, precise enough but not more than necessary
Powerful, generic abstractions: expensive, hard to use and understand
Built-in, default abstractions: often not precise enough (e.g., data structures)
End-user approach: Must involve the user in abstraction without expecting the user to be a program analysis expert
Overview of contributions
Extensible Inductive Shape Analysis (Xisa)
– Precise inference of data structure properties
– Able to check, for instance, the locking example
Targeted to software developers
– Uses data structure checking code for guidance
– Turns testing code into a specification for static analysis
Efficient
– ~10-100x speed-up over generic approaches
– Builds abstraction out of developer-supplied checking code
Extensible Inductive Shape Analysis
Precise inference of data structure properties
End-user approach
…
Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings
…
Shape analysis by example: Removing duplicates
// l is a sorted doubly-linked list
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
Example/Testing
– l: 2 2 4 4
– l: 2 4 4, with cur
– l: 2 4
Code Review/Static Analysis (program-specific)
– l: “sorted dl list”
– l: “segment with no duplicates”, then cur, then “sorted dl list”
– l: “no duplicates”
intermediate state more complicated
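The loop on this slide is pseudocode. A concrete version might look like the sketch below, assuming a hypothetical `DNode` type with `value`, `prev`, and `next` fields; the helpers `fromArray` and `show` exist only to make the example self-contained.

```java
/** Hypothetical node type for a doubly-linked list of ints. */
final class DNode {
    int value;
    DNode prev, next;
    DNode(int value) { this.value = value; }
}

final class RemoveDuplicates {
    /** Builds a doubly-linked list from the given values and returns its head. */
    static DNode fromArray(int... values) {
        DNode head = null, tail = null;
        for (int v : values) {
            DNode n = new DNode(v);
            if (tail == null) { head = n; } else { tail.next = n; n.prev = tail; }
            tail = n;
        }
        return head;
    }

    /** Removes every node whose value equals its predecessor's (list assumed sorted). */
    static void removeDuplicates(DNode l) {
        DNode cur = (l == null) ? null : l.next;
        while (cur != null) {
            DNode next = cur.next;
            if (cur.prev.value == cur.value) {
                cur.prev.next = cur.next;                        // unlink cur going forward
                if (cur.next != null) cur.next.prev = cur.prev;  // and going backward
            }
            cur = next;
        }
    }

    /** Renders the list by following next pointers, values separated by spaces. */
    static String show(DNode l) {
        StringBuilder sb = new StringBuilder();
        for (DNode n = l; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.value);
        }
        return sb.toString();
    }
}
```

Midway through the loop the list is neither fully deduplicated nor untouched, which is the "more complicated" intermediate state the analysis has to describe.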
Shape analysis is not yet practical
Choosing the heap abstraction difficult for precision
Some representative approaches:
TVLA [Sagiv et al.]: Parametric in low-level, analyzer-oriented predicates
+ Very general and expressive
- Harder for non-expert
Space Invader [Distefano et al.]: Built-in high-level predicates
- Harder to extend
+ No additional user effort (if precise enough)
End-user approach:
Xisa: Parametric in high-level, developer-oriented predicates
+ Extensible
+ Targeted at developers
Our approach: Executable specifications
Utilize “run-time checking code” as specification for static analysis.
assert(sorted_dll(l,…));
for each node cur in list l {
  remove cur if duplicate;
}
assert(sorted_dll_nodup(l,…));
checker
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and h→next.dll(h)
• p specifies where prev should point
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states
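Written as ordinary run-time checking code, the dll checker might look like the following sketch. The `Node` class and the `DllChecker.dll` method are hypothetical names chosen for this example; only the recursive structure comes from the slide.

```java
/** Hypothetical node type; field names follow the checker on the slide. */
final class Node {
    Node prev, next;
}

final class DllChecker {
    /**
     * h.dll(p): either h is null, or h.prev is p and the rest of the list
     * is a dll whose expected prev is h. Terminates on acyclic lists.
     */
    static boolean dll(Node h, Node p) {
        if (h == null) return true;
        return h.prev == p && dll(h.next, h);
    }
}
```

A developer would typically call it as `assert DllChecker.dll(l, null);` before and after an operation on the list.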
Xisa is …
An automated shape analysis with a precise memory abstraction based around invariant checkers.
• Extensible and targeted for developers
– Parametric in developer-supplied checkers, viewed as inductive definitions in separation logic
• Precise yet compact abstraction for efficiency
– Data structure-specific based on properties of interest to the developer
checkers
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and h→next.dll(h)
Shape analysis is an abstract interpretation on abstract memory descriptions with …
Splitting of summaries
To reflect updates precisely
And summarizing for termination
Roadmap: Components of Xisa
Xisa shape analyzer
abstract interpretation
– splitting and interpreting update
– summarizing
level-type inference on checker definitions
– Learn information about the checker to use it as an abstraction
checkers
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and h→next.dll(h)
Compare and contrast manual code review and our automated shape analysis
Overview: Split summaries to interpret updates precisely
The example at a high-level: iterate using cur changing the doubly-linked list from purple to red.
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
– split at cur
– update cur purple to red
Challenge: How does the analysis “split” summaries and know where to “split”?
“Split forward” by unfolding inductive definition
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and h→next.dll(h)
get: cur→next
[Figure: the summary dll(cur, p) at cur is unfolded into two cases: cur is null, or cur is a node with prev p and next n, followed by dll(n, cur)]
Analysis doesn’t forget the empty case
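One forward unfolding step can be sketched symbolically. This is a deliberately simplified illustration that represents heap facts as plain strings; the class `UnfoldSketch`, the method `unfoldDll`, and the fresh-name scheme are assumptions, not Xisa's actual data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UnfoldSketch {
    private int freshCounter = 0;

    /** Returns a fresh symbolic name such as "n0" for the unknown next node. */
    private String fresh() {
        return "n" + (freshCounter++);
    }

    /**
     * Unfolds the summary h.dll(p) once. Each inner list is one case,
     * written as plain-text facts about the heap.
     */
    List<List<String>> unfoldDll(String h, String p) {
        List<List<String>> cases = new ArrayList<>();
        // Base case of the checker: the list is empty.
        cases.add(Arrays.asList(h + " = null"));
        // Inductive case: one concrete cell for h, plus a summary for the rest.
        String n = fresh();
        cases.add(Arrays.asList(
            h + "->prev = " + p,
            h + "->next = " + n,
            n + ".dll(" + h + ")"));
        return cases;
    }
}
```

Both cases are returned, so the empty case is kept alongside the materialized cell, matching the remark on the slide.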
“Split backward” also possible and necessary
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and h→next.dll(h)
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
cur→prev→next = cur→next;
get: cur→prev→next
[Figure: before unfolding, a “dll segment” from l to cur, with labels p, n, and dll(n, cur)]
[Figure: after unfolding, a “dll segment” from l, then p0, then cur, with labels n and dll(n, cur); and a second case with labels l, cur, null, n, and dll(n, cur)]
How does the analysis know to do this unfolding?
Technical Details: How does the analysis do this unfolding? Why is this unfolding allowed? (Key: Segments are also inductively defined) [POPL’08]
Roadmap: Components of Xisa
Xisa shape analyzer
abstract interpretation
– splitting and interpreting update
– summarizing
level-type inference on checker definitions
– Derives additional information to guide unfolding
– How do we decide where to unfold? … to be discussed this afternoon
checkers
h.dll(p) =
  if (h = null) then
    true
  else
    h→prev = p and h→next.dll(h)
Contribution: Turns testing code into specification for static analysis
Summary of interpreting updates
Splitting of summaries needed for precision
Unfolding checkers is a natural way to do splitting
When checker traversal matches code traversal
Checker parameter type analysis
Useful for guiding unfolding in difficult cases, for example, “back pointer” traversals
Results: Performance
Benchmark | Max. Num. Graphs at a Program Pt | Analysis Time (ms)
singly-linked list reverse | 1 | 1.0
doubly-linked list reverse | 1 | 1.5
doubly-linked list copy | 2 | 5.4
doubly-linked list remove | 5 | 17.9
doubly-linked list remove and back | 5 | 18.1
search tree with parent insert | 3 | 16.6
search tree with parent insert and back | 5 | 64.7
two-level skip list rebalance | 1 | 11.7
Linux scull driver (894 loc) (char arrays ignored, functions inlined) | 4 | 3969.6
Times negligible for data structure operations (often in sec or 1/10 sec)
Expressiveness: Different data structures
Verified shape invariant as given by the checker is preserved across the operation.
TVLA: 850 ms
TVLA: 290 ms
Space Invader only analyzes lists (built-in)
Demo: Doubly-linked list reversal
http://www.cs.colorado.edu/~bec/
Body of loop over the elements: Swaps the next and prev fields of curr.
– Already reversed segment
– Node whose next and prev fields were swapped
– Not yet reversed list
Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays (in progress)
– More generic checkers:
  polymorphic: “element kind unspecified”
  higher-order: parameterized by other predicates
Future evaluation: user study
Near-term future work: Exploiting common specification framework
Scenario: Code instrumented with lots of checker calls (perhaps automatically with object invariants)
assert( mychecker(x) );
// … operation on x …
assert( mychecker(x) );
• Very slow to execute
• Hard to prove statically (in general)
Can we prove parts statically?
Static Analysis View: Hybrid checking
Testing View: Incrementalize invariant checking
Example: Insert in a sorted list
l: … u, v, w
Preservation of sortedness shown statically
Emit run-time check for new element: u ≤ v ≤ w
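The hybrid-checking idea for sorted insertion can be illustrated with a small sketch: instead of re-running a full sortedness checker after every insert, only the residual check u ≤ v ≤ w on the new element is executed. All names here (`SNode`, `SortedInsert`, `checkLocal`) are hypothetical, and the sketch assumes the rest of the list's sortedness has already been established by other means.

```java
/** Hypothetical singly-linked node for a sorted list of ints. */
final class SNode {
    final int value;
    SNode next;
    SNode(int value, SNode next) { this.value = value; this.next = next; }
}

final class SortedInsert {
    /** Inserts v into the sorted list l and returns the (possibly new) head. */
    static SNode insert(SNode l, int v) {
        if (l == null || v <= l.value) {
            SNode head = new SNode(v, l);
            checkLocal(null, head);
            return head;
        }
        SNode u = l;
        while (u.next != null && u.next.value < v) u = u.next;
        u.next = new SNode(v, u.next);
        checkLocal(u, u.next);
        return l;
    }

    /** Residual run-time check u <= v <= w, touching only the new element's neighbors. */
    static void checkLocal(SNode u, SNode vNode) {
        SNode w = vNode.next;
        if (u != null && u.value > vNode.value) throw new IllegalStateException("u > v");
        if (w != null && vNode.value > w.value) throw new IllegalStateException("v > w");
    }

    /** Full checker, linear in the list length; the local check stands in for it. */
    static boolean sorted(SNode l) {
        for (SNode n = l; n != null && n.next != null; n = n.next) {
            if (n.value > n.next.value) return false;
        }
        return true;
    }
}
```

The local check runs in constant time per insert, whereas calling `sorted` before and after each operation costs time proportional to the list length.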
Conclusion
Extensible Inductive Shape Analysis: precision-demanding program analysis improved by novel user interaction
Developer: Gets results corresponding to intuition
Analysis: Focused on what’s important to the developer
Practical precise tools for better software with an end-user approach!
Programming Languages Research at the University of Colorado, Boulder
Who we are
Faculty: Amer Diwan, Jeremy Siek, Sriram Sankaranarayanan, Bor-Yuh Evan Chang
Ph.D. Students
Outline
• Gradual Programming
– A new collaborative project involving Amer Diwan, Jeremy Siek, and myself
• Brief Sketches of Other Activities
Gradual Programming: Bridging the Semantic Gap
Have you noticed a time where your program is not optimized where you expect?
“I need a map data structure”
– Load class file
– Run class initialization
– Create hashtable
semantic gap
Observation: A disconnect between programmer intent and program meaning
Problem: Tools (IDEs, checkers, optimizers) have no knowledge of what the programmer cares about
… hampering programmer productivity, software reliability, and execution efficiency
Example: Iteration Order
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    for (int i = 0; i < data.length; i++) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
Must specify an iteration order even when it should not matter
Compiler cannot choose a different iteration order (e.g., parallel)
Wild and Crazy Idea: Use Non-Determinism
• Programmer starts with a potentially non-deterministic program
• Analysis identifies instances of “under-determinedness”
• Programmer eliminates “under-determinedness”
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    for (int i = 0; i < data.length; i++) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    i 0 .. data.length-1 {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
Spectrum: “over-determined”, just right, “under-determined” (starting point)
Question: What does this mean? Is it “under-determined”?
Response: Depends, is the iteration order important?
Let’s try a few program variants
public boolean contains(Object lookFor) {
  for (int i = data.length-1; i >= 0; i--) {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
public boolean contains(Object lookFor) {
  for (int i = 0; i < data.length; i++) {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
public boolean contains(Object lookFor) {
  parallel_for (0, data.length-1) i => {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
Do they compute the same result?
What about here?
Approach: Try to verify equivalence of program variants up to a specification
– Yes: Pick any one
– No: Ask user
Surprisingly, analysis says no. Why?
Exceptions!
a.data = [figure: array with a null element]
a.contains( )
– left-to-right iteration returns true
– right-to-left iteration throws NullPointerException
Need user interaction to refine specification that captures programmer intent
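The difference between the two iteration orders can be reproduced with a short sketch. The class and method names below are hypothetical, and the array contents are an assumed instance of the situation on the slide: the searched element sits before a null entry.

```java
final class IterationOrderDemo {
    /** Left-to-right scan, as in the original contains method. */
    static boolean containsLeftToRight(Double[] data, Object lookFor) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    /** Right-to-left scan over the same array. */
    static boolean containsRightToLeft(Double[] data, Object lookFor) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }
}
```

With `Double[] data = { 1.0, null }`, the left-to-right scan for `1.0` finds the match before reaching the null entry, while the right-to-left scan dereferences the null entry first and throws `NullPointerException`. The two variants are therefore not equivalent unless the specification says such exceptions do not matter.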
Proposal Summary
• “Fix semantics per program”: Abstract constructs with many possible concrete implementations
• Apply program analysis to find inconsistent implementations
• Interact with the user to refine the specification
• Language designer role can enumerate the possible implementations
Bridging the Semantic Gap
“I need a map data structure”
“Looks like iterator order matters for your program”
“Yes, I need iteration in sorted order”
“Let’s use a balanced binary tree (TreeMap)”
Other Activities
Formal Methods
Prof. Sriram Sankaranarayanan (CS)
Cyber-physical systems verification
– hybrid automata theory, control systems verification, analysis of Simulink and Stateflow diagrams
– advanced mathematical techniques:
  • convex optimization: linear and semi-definite
  • differential equations: set-valued analysis
  • SMT solvers over non-linear theories
– applications to automotive software (with NEC labs and GM labs)
Prof. Aaron Bradley (ECEE)
Decision procedures, Model checking
Prof. Fabio Somenzi (ECEE)
Programming Languages and Analysis
Prof. Amer Diwan (CS)
Performance analysis of computer systems
– How do we know that we have not perturbed our data?
– Using machine learning and statistical techniques to reason about data
Tool-assisted program transformations
– Algorithmic optimizations for performance
– Program metamorphosis for improving code quality
Prof. Jeremy Siek (ECEE/CS)
Gradual type checking: static (Java) dynamic (Python)
Meta-programming: programs that write programs
Compilers for optimizing scientific codes
Prof. Bor-Yuh Evan Chang (CS)
End-user program analysis
Precise analysis (shape, collections)
Interactive analysis refinement (type checking + symbolic evaluation)
Applying to Colorado
• Computer Science Department information: http://www.cs.colorado.edu/grad/admission/
• Deadlines: Dec 1 for Fall (Sep 1 for Spring)
• Graduate Advisor: Nicholas Vocatura
• Talk to me about application fee waiver: http://www.cs.colorado.edu/~bec/