Post on 17-Dec-2015
Proofs from Tests
Nels E. Beckman (Carnegie Mellon University)
Aditya V. Nori (Microsoft Research India)
Sriram K. Rajamani (Microsoft Research India)
Robert J. Simmons (Carnegie Mellon University)
The Problem
• Given
– a sequential program P with inputs I (say, written in C)
– an assertion “assert(e)”
• Questions
– Bug finding: Does there exist an execution of the program P for some input I such that the assertion is violated?
– Verification: Does the assertion hold for all possible inputs?
Possible solution: Testing
• The “old-fashioned” way
• Generate test cases and see if we can find an input that violates the assertion
• Possible approaches:
– Random test case generation
– Symbolic execution
– “Concolic” execution (more recent, e.g. DART/CUTE)
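A minimal sketch of the first approach, random test case generation, may help fix the idea. The program `program`, its assertion, and the input range are hypothetical and chosen only so that the violating inputs sit behind a narrow branch; they are not taken from the talk.

```python
import random

def program(x: int, y: int) -> None:
    """Hypothetical program P; the assert plays the role of assert(e)."""
    if x == 2 * y and x > 100:
        assert x % 7 != 0, "assertion violated"

def random_testing(trials: int, seed: int = 0):
    """Return an input that violates the assertion, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.randint(-1000, 1000), rng.randint(-1000, 1000)
        try:
            program(x, y)
        except AssertionError:
            return (x, y)
    return None
```

A result of None says nothing about whether a violating input exists (here, x = 112, y = 56 is one), which is the limitation the following slides address.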
What’s wrong with testing?
If we view testing as a “black-box” activity, Dijkstra is right! After executing many tests, we still don’t know if there is another test that can violate the assertion.
Our hypothesis
If we view testing as a “white-box” activity, and “observe” what happens inside the program (along with symbolic execution), we can do several interesting things:
– We can generate test cases in a directed manner to find the bug
– We can prove that the assertion holds for all inputs!
Tests and Proofs
[Figure: control-flow graph with nodes 1 to 12, shown in four frames. The second frame runs a test with a=true, b=false, limit=2 and marks the visited states with ×. The third and fourth frames show nodes split into primed copies (3’, then 2’/2’’ through 10’/10’’).]
DASH: Proofs from Tests
– Algorithm uses only test case generation operations
– Maintains two data structures:
• A forest of reachable concrete states (tests)
– Under-approximates executions of the program
• A region graph (an abstraction)
– Over-approximates all executions of the program
– Our goal: bug finding and proving
• If a test reaches an error, we have found a bug
• If we refine the abstraction so that there is *no* path from the initial region to the error region, we have a proof
– Handles the richness of C
• New operator WPα uses only aliases α that are present along concrete tests that are executed
• Algorithm uses recursive invocations to handle inter-procedural analysis
Empirical Evaluation
Current Status
Yogi works on 904 (driver, property) pairs!
31 properties on which Yogi terminates and SLAM “times/spaces out”
Key Idea - I
Frontier: Boundary between tested and untested regions
[Figure: abstract path through nodes 0, 1, 2, 3, 4, 7, 8, 9; × marks tested states, and the frontier is marked on the path]
Key Idea 2
WPα: New refinement operation that does not depend on whole program alias information.
DASH Algorithm
• Main workhorse: test case generation
• Use counterexamples from current abstraction to “extend frontier” and generate tests
• When test case generation fails, use this information to “refine” abstraction at the frontier
• Use only aliases that happen on the tests!
[Flowchart]
Input: Program P, Property ψ
1. Construct initial abstraction
2. Construct random tests
3. Test succeeded? yes: Bug! no: continue
4. Abstraction succeeded? yes: Proof! no: continue
5. τ = error path in abstraction, f = frontier of error path
6. Can extend test beyond frontier? yes: add the test and repeat from 3. no: refine abstraction and repeat from 4.
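The loop can be sketched on an explicit-state toy program. This is a simplified illustration under stated assumptions, not the DASH implementation: the toy program (`make_step`), the input domain `XS`, and all function names are hypothetical; regions are explicit sets of states; brute force over inputs stands in for symbolic execution and theorem proving; and refinement uses the exact set of states that can cross the frontier rather than WPα.

```python
from collections import deque

XS = range(6)                           # toy input domain (assumption)
PCS = [0, 1, 2, "ERR", "END"]           # program locations of the toy program

def make_step(m):
    """Hypothetical toy program; a state is (pc, x) and ERR is the error."""
    def step(state):
        pc, x = state
        if pc == 0:
            return (1, x) if x > 2 else (2, x)
        if pc == 1:
            return (2, x % m)
        if pc == 2:
            return ("ERR", x) if x == 3 else ("END", x)
        return None                     # ERR and END are terminal
    return step

def run(step, x0):
    """Execute one test from input x0 and return the set of visited states."""
    visited, state = set(), (0, x0)
    while state is not None:
        visited.add(state)
        state = step(state)
    return visited

def abstract_error_path(regions, step):
    """Breadth-first search in the region graph; returns region indices or None.
    An edge R -> R2 exists if some state of R steps into R2 (over-approximation)."""
    starts = [i for i, r in enumerate(regions) if any(pc == 0 for pc, _ in r)]
    parent = {i: None for i in starts}
    queue = deque(starts)
    while queue:
        i = queue.popleft()
        if any(pc == "ERR" for pc, _ in regions[i]):
            path = []
            while i is not None:
                path.append(i)
                i = parent[i]
            return path[::-1]
        for j, r in enumerate(regions):
            if j not in parent and any(step(s) in r for s in regions[i]):
                parent[j] = i
                queue.append(j)
    return None

def dash(step, first_input=0):
    regions = [frozenset((pc, x) for x in XS) for pc in PCS]   # initial abstraction
    tests = {first_input: run(step, first_input)}               # forest of tests
    while True:
        for x0, visited in tests.items():
            if any(pc == "ERR" for pc, _ in visited):
                return "fail", x0                               # Bug!
        path = abstract_error_path(regions, step)
        if path is None:
            return "pass", regions                              # Proof!
        reached = set().union(*tests.values())
        # Frontier: first region on the abstract error path that no test visits.
        k = next(n for n, i in enumerate(path) if not (regions[i] & reached))
        dst = regions[path[k]]
        # Brute force over inputs stands in for symbolic execution + theorem proving.
        x_new = next((x0 for x0 in XS if run(step, x0) & dst), None)
        if x_new is not None:
            tests[x_new] = run(step, x_new)                     # extend the frontier
        else:
            # k >= 1 here: an untested initial region can always be entered by a test.
            src = regions[path[k - 1]]
            rho = frozenset(s for s in src if step(s) in dst)   # states that can cross
            regions[path[k - 1]:path[k - 1] + 1] = [src - rho, rho]
```

With `make_step(3)` location 2 never sees x == 3, so the loop ends with "pass" after splitting the region for location 2; with `make_step(4)` the input 3 reaches ERR and the loop ends with "fail".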
Example
Error path in abstraction: τ=(0,1,2,3,4,7,8,9)
Example
y = 1
Symbolic execution + Theorem proving
[Figure: region graph with nodes 0 to 10; × marks the states visited by the test, and the frontier is marked on the error path]
Symbolic execution + Theorem Proving
τ=(0,1,2,3,4,7,8,9)
symbolic memory:
– y ↦ y0
– lock.state ↦ L
– x ↦ y0
constraints:
– (x = y) = (y0 = y0) = T
– (lock.state != L) = (L != L) = F
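The bookkeeping on this slide can be mimicked with a small sketch. The statement list `PATH` is an assumption (the example program itself is not in the text) and was picked so that it yields the symbolic memory and constraints shown. Constraints are decided only by syntactic equality of symbolic terms; anything else is reported as None, which is where a theorem prover would be needed.

```python
def symbolic_execute(path, inputs=("y",)):
    """Run a straight-line path symbolically; returns (memory, constraints)."""
    memory = {v: v + "0" for v in inputs}          # each input gets a fresh symbol
    constraints = []
    for kind, lhs, rhs in path:
        a, b = memory.get(lhs, lhs), memory.get(rhs, rhs)
        if kind == "assign":
            memory[lhs] = b
        elif kind == "assume_eq":                  # True only if terms are identical
            constraints.append((f"{lhs} = {rhs}", True if a == b else None))
        elif kind == "assume_ne":                  # False only if terms are identical
            constraints.append((f"{lhs} != {rhs}", False if a == b else None))
    return memory, constraints

# Hypothetical statements along the path (assumed for illustration).
PATH = [("assign", "lock.state", "L"), ("assign", "x", "y"),
        ("assume_eq", "x", "y"), ("assume_ne", "lock.state", "L")]
```

On `PATH` the second constraint evaluates to False, so no input can drive a test along this path past that point.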
Template-based refinement
ρ = (lock.state != L)
[Figure: region graph with nodes 0 to 10 and × marks on tested states. Region 8, the source of the edge from 8 to 9, is split into 8:¬ρ and 8:ρ; the second frame shows the refined graph.]
Example
τ=(0,1,2,3,4,7,<8,p>,9)
[Figure: refined region graph containing 8:¬ρ and 8:ρ, with × marks on tested states and the frontier marked]
Proof!
[Figure: final region graph with nodes 0 to 10 and × marks on tested states. Regions 4 to 8 have been split into 4∧¬s, 5∧¬s, 6∧¬r, 7∧¬q, 8∧¬p and 4∧s, 5∧s, 6∧r, 7∧q, 8∧p.]
Template-based refinement
[Figure: regions Sk-2, Sk-1, Sk with the frontier on the edge op between Sk-1 and Sk; × marks the test witness T]
op is one of:
• IF(i>=j)
• ASSGN(i=i+j)
• CALL(foo(i,j))

Template-based refinement
[Figure: Sk-1 is split into Sk-1∧¬ρ and Sk-1∧ρ using a suitable predicate ρ]
No theorem prover calls!
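One way to read “No theorem prover calls!” is that the split is a purely syntactic rewrite of the region graph. The sketch below assumes a graph stored as a set of (source, target) name pairs; the function name `refine` and this representation are hypothetical. Both halves of the split region inherit its edges, and only the edge from the ¬ρ half into the frontier target is dropped, which is justified when ρ is a suitable predicate.

```python
def refine(edges, src, dst, rho="ρ"):
    """Split region src on the frontier edge (src, dst) into src∧¬ρ and src∧ρ."""
    neg, pos = f"{src}∧¬{rho}", f"{src}∧{rho}"
    new_edges = set()
    for u, v in edges:
        sources = [neg, pos] if u == src else [u]
        targets = [neg, pos] if v == src else [v]
        for u2 in sources:
            for v2 in targets:
                if (u2, v2) != (neg, dst):   # the ¬ρ half cannot step into dst
                    new_edges.add((u2, v2))
    return new_edges
```

For example, splitting region "8" on the edge ("8", "9") keeps ("8∧ρ", "9") and removes only the edge from "8∧¬ρ" to "9".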
Candidates for suitable predicates
[Figure: Sk-1 split into Sk-1∧¬ρ and Sk-1∧ρ on the edge op into Sk]
A. Strongest postcondition (SP): Increased number of iterations, leading to non-termination in many cases
B. Weakest precondition (WP): Explodes in the presence of aliasing
What’s wrong with WP?
[Figure: edge ASSGN(i=j) from Sk-1 into the region *a<10; Sk-1 is split into Sk-1∧¬ρ and Sk-1∧ρ]
ρ = WP(*a<10, “i = j”)
ρ = (a≠&i ∧ *a<10) ∨ (a=&i ∧ j<10)

With the region *a+*b<10 and the same edge ASSGN(i=j):
ρ = (a≠&i ∧ b≠&i ∧ *a+*b<10) ∨ (a=&i ∧ b≠&i ∧ j+*b<10) ∨ (a≠&i ∧ b=&i ∧ *a+j<10) ∨ (a=&i ∧ b=&i ∧ j+j<10)
Sk-1 is split into the regions ρ and ¬ρ.
In practice, a global alias analysis is required to prune the formula generated by WP.
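The growth is easy to reproduce mechanically. The sketch below builds the alias cases as strings for an assignment to a variable (default `i = j`); the function name `wp_assign` and the template format are hypothetical, and it assumes the predicate mentions memory only through the listed pointer dereferences.

```python
from itertools import product

def wp_assign(pointers, template, lhs="i", rhs="j"):
    """List the disjuncts of WP(template, "lhs = rhs"), one per alias case."""
    disjuncts = []
    for aliased in product([False, True], repeat=len(pointers)):
        # Each pointer either points to lhs (so *p reads the new value rhs) or not.
        conds = [f"{p}{'=' if al else '≠'}&{lhs}" for p, al in zip(pointers, aliased)]
        values = [rhs if al else f"*{p}" for p, al in zip(pointers, aliased)]
        disjuncts.append("(" + " ∧ ".join(conds + [template.format(*values)]) + ")")
    return disjuncts
```

`wp_assign(["a"], "{}<10")` gives the two cases of the first formula, `wp_assign(["a", "b"], "{}+{}<10")` gives four, and each further dereference doubles the count.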
Deriving a suitable predicate
Target region: *a+*b<10; edge: ASSGN(i=j)
Sk-1 partitioned by alias case (× marks the test state in the figure):
– a≠&i ∧ b≠&i: *a+*b≥10 | *a+*b<10
– a=&i ∧ b≠&i: j+*b≥10 | j+*b<10
– a≠&i ∧ b=&i: *a+j≥10 | *a+j<10
– a=&i ∧ b=&i: j+j≥10 | j+j<10
Refining with suitable predicate WPα
Target region: *a+*b<10; edge: ASSGN(i=j)
¬ρ: a=&i ∧ b≠&i ∧ j+*b≥10
ρ: a≠&i ∨ b=&i ∨ j+*b<10
- No global alias analysis required!
- WPα stronger than WP and weaker than SP!
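A brute-force check over a small concrete memory can confirm the property that makes this predicate usable for the split: no state in the ¬ρ region steps into the target region across `i = j`. The memory model (three cells named i, j, k, three sample values) and all function names are assumptions made for this sketch.

```python
from itertools import product

CELLS = ("i", "j", "k")          # hypothetical memory cells
VALUES = (0, 5, 10)              # sample cell contents

def states():
    """All states: pointer targets for a and b plus the contents of each cell."""
    for a, b in product(CELLS, repeat=2):
        for vals in product(VALUES, repeat=3):
            yield a, b, dict(zip(CELLS, vals))

def assign_i_j(state):
    """Concrete effect of ASSGN(i=j)."""
    a, b, mem = state
    return a, b, {**mem, "i": mem["j"]}

def target(state):
    """Target region: *a + *b < 10."""
    a, b, mem = state
    return mem[a] + mem[b] < 10

def rho(state):
    """ρ from the slide: a≠&i ∨ b=&i ∨ j+*b<10."""
    a, b, mem = state
    return a != "i" or b == "i" or mem["j"] + mem[b] < 10

def no_step_from_not_rho_into_target():
    return all(not target(assign_i_j(s)) for s in states() if not rho(s))
```

The check only enumerates the one alias case that appears in ¬ρ, which mirrors the point that no global alias analysis is needed.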
WPα: Template-based refinement
Theorem: WPα(Sk, op) is a suitable predicate for template-based refinement
No theorem prover calls!
[Figure: Sk-1 on the edge op into Sk is split into Sk-1∧¬ρ and Sk-1∧ρ using the suitable predicate]
Example
Aliasing Example
[Figure: control-flow graph with nodes 0 to 7 and the following edge labels]
p1 = malloc(); p1->lock = 0
p2 = malloc(); p2->lock = 0
p = p1
p = p2
p->lock = 1
assume(p1->lock=1 ∧ p2->lock=1)
assume(!(p1->lock=1 ∧ p2->lock=1))
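Aliasing Example
[Figure: control-flow graph with nodes 0 to 7; × marks tested states and the frontier is marked]
ρ = WPα = (p1->lock=1 ∧ p2->lock=1)

Aliasing Example
[Figure: region 3 is split into 3:¬ρ and 3:ρ, and the frontier is marked again]
Predicate for region 2 = WPα = ¬((p≠p1 ∧ p≠p2) ∧ ¬(p1->lock=1 ∧ p2->lock=1))

Aliasing Example
[Figure: regions 2 and 3 are each split by their predicate and its negation]

Aliasing Example - Proof
[Figure: region 1 is also split, into 1:¬μ and 1:μ, alongside 3:¬ρ and 3:ρ]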
Generalized Example
What about procedures?
Key idea: Perform a recursive Dash query on the called procedure and use the result to either generate a test or compute WPα.
[Figure: regions Sk-2, Sk-1, Sk with the frontier on the edge CALL(foo(i,j)) between Sk-1 and Sk]

Interprocedural analysis
Dash[assume(φ1), foo(i, j), assert(¬φ2)]
- pass: perform refinement
- fail: generate test
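The dispatch on the sub-query result can be sketched as follows. Exhaustive search over a small input set stands in for the recursive Dash call, and `foo`, `INPUTS`, and both function names are hypothetical; the sketch only shows how "pass" and "fail" map to the two actions.

```python
from itertools import product

INPUTS = list(product(range(3), repeat=2))   # toy argument space (assumption)

def foo(i, j):
    """Hypothetical callee."""
    return i + j

def sub_query(phi1, proc, phi2, inputs):
    """Stand-in for Dash[assume(φ1), proc, assert(¬φ2)] by exhaustive search."""
    for args in inputs:
        if phi1(*args) and phi2(proc(*args)):
            return "fail", args      # assert(¬φ2) is violated: args cross the frontier
    return "pass", None              # the callee cannot reach φ2 from φ1

def handle_call_frontier(phi1, proc, phi2, inputs):
    verdict, witness = sub_query(phi1, proc, phi2, inputs)
    if verdict == "fail":
        return "generate test", witness
    return "perform refinement", None
```

With φ1 = "i ≥ 0 and j ≥ 0", a post-region "result < 0" is unreachable, so the caller refines; a post-region "result > 3" is reached by (2, 2), so the caller obtains a test.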
Soundness and Complexity
• Theorem. If Dash terminates on (P,φ), then either of the following is true:
– If Dash returns (“pass”, Σ≃), then Σ≃ is a proof that P cannot reach ¬φ
– If Dash returns (“fail”, t), then t certifies that P reaches ¬φ
• Theorem. The complexity of Dash is precisely one theorem-prover call per iteration
• Proofs at the same complexity as testing!
Acknowledgments
• Tom Ball
• Nikolaj Bjorner
• Leonardo de Moura
• Patrice Godefroid
• Akash Lal
• Jim Larus
• Rustan Leino
• Kanika Nema
• G. Ramalingam
• Sai Tetali
• Aditya Thakur
Rigorous Software Engineering
Microsoft Research India
http://research.microsoft.com/research/rse