The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.


Page 1: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

The Evolution of Symbolic Model Checking

Ken McMillan

Cadence Berkeley Labs

Page 2: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

What is Symbolic Model Checking?

• Trading in one difficulty...

– The state explosion problem

• ...for another difficulty.

– PSPACE completeness of QBF

Theoreticians are nonplused...

Page 3: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Another view

• Abstract interpretation

– Compute an approximation of the collecting semantics as a fixed point

• Symbolic model checking

– Compute the exact collecting semantics as a fixed point, using some compact representation

As a working definition, we’ll say SMC is computing exact fixed points with compact representations.

Page 4: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Example: CTL model checking

• Fixed point characterization of operators

$AG\,p = \nu Q.\ p \wedge AX\,Q$

$AF\,p = \mu Q.\ p \vee AX\,Q$

$EF\,p = \mu Q.\ p \vee EX\,Q$

$EG\,p = \nu Q.\ p \wedge EX\,Q$

• Image operators

$EX\,p = \lambda V.\ \exists V'.\ (p(V') \wedge T(V,V'))$

$AX\,p = \lambda V.\ \forall V'.\ (T(V,V') \Rightarrow p(V'))$

The trick is to reduce these QBF expressions to some compact normal form (hopeless, but interesting...)
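A minimal sketch of the fixed-point view, assuming an explicit set of states stands in for the compact representation (a real symbolic checker would use BDD's here). The names `ex`, `ef`, `ag` and the 4-state system are hypothetical illustrations, not part of the talk.

```python
def ex(p, trans):
    """EX p: states with at least one successor in p."""
    return {s for (s, t) in trans if t in p}

def ef(p, trans):
    """EF p as the least fixed point of Q = p | EX Q, starting from the empty set."""
    q = set()
    while True:
        q_next = p | ex(q, trans)
        if q_next == q:          # fixed point detected by comparing representations
            return q
        q = q_next

def ag(p, states, trans):
    """AG p computed through the duality AG p = not EF not p."""
    return states - ef(states - p, trans)

# Hypothetical system: 0 -> 1 -> 2 -> 2, and 3 -> 3.
STATES = {0, 1, 2, 3}
TRANS = {(0, 1), (1, 2), (2, 2), (3, 3)}

print(sorted(ef({2}, TRANS)))          # states that can reach state 2
print(sorted(ag({3}, STATES, TRANS)))  # states that stay in {3} forever
```

The loop terminates because each iterate contains the previous one and the state set is finite; the symbolic version replaces the set operations with operations on the chosen representation.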

Page 5: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

The magic of BDD’s

• BDD’s provide the following desiderata for SMC:

– Efficient Boolean operations

– Quantifier elimination (exponential, but efficient in practice)

– Efficient reduction to canonical form

Note, canonical form is useful, but not necessary for detecting fixed points. Main advantage is that it prevents explosion of the representation as we iterate.

• BDD’s exploit low mutual information of components

[Figure: Component A and Component B separated by a cut; cut width determined by mutual information]
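To illustrate what reduction to canonical form buys, here is a toy reduced ordered BDD, assuming a fixed variable order and a deliberately naive construction by full Shannon expansion (exponential, unlike the apply-style algorithms of real BDD packages). The class `MiniBDD` and its methods are hypothetical names for this sketch.

```python
class MiniBDD:
    """Toy reduced ordered BDD over variables 0..n-1 in a fixed order."""

    def __init__(self, num_vars):
        self.n = num_vars
        self.nodes = [None, None]   # ids 0 and 1 are the False/True terminals
        self.unique = {}            # (var, lo, hi) -> node id

    def mk(self, var, lo, hi):
        """Return the unique node for (var, lo, hi), applying both reduction rules."""
        if lo == hi:                # redundant test: skip the node
            return lo
        key = (var, lo, hi)
        if key not in self.unique:  # share isomorphic subgraphs
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def build(self, f, var=0, env=()):
        """Build the BDD of Boolean function f by Shannon expansion (naive)."""
        if var == self.n:
            return 1 if f(*env) else 0
        lo = self.build(f, var + 1, env + (False,))
        hi = self.build(f, var + 1, env + (True,))
        return self.mk(var, lo, hi)

bdd = MiniBDD(3)
f = bdd.build(lambda a, b, c: (a and b) or c)
g = bdd.build(lambda a, b, c: not ((not a or not b) and not c))
print(f == g)   # equivalent formulas reduce to the same node id
```

Because equivalent functions map to the same node, detecting a fixed point is a constant-time identity comparison, and repeated iteration cannot accumulate syntactically different copies of the same set.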

Page 6: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Application domain

• Symbolic model checking with BDD’s is appropriate when

– State space is dense, or

– Branching factor is high

• ...otherwise explicit state tends to be more efficient

• Hardware model checking is in the “sweet spot”

– Very fine-grain parallelism gives a dense state space

– Branching factor exponential in number of inputs

• Protocol verification is a poor application

– Few states reachable

– Branching factor linear in number of processes (interleaving)

Page 7: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

SMC as Paradigm

• The idea of computing fixed points with compact representations, and its embodiment as BDD-based SMC, have several qualities of Kuhn’s notion of a paradigm:

– Responds to a crisis

• the state explosion problem

– Solves some specific problem previously unsolvable

• say, verify the Encore Gigamax cache protocols

– Shows potential to solve many more problems

• advantage of magic – can’t prove it doesn’t work

– Leaves many more problems unsolved than solved

• leaves room for future research

Let’s follow the history of the elaboration of this paradigm

Page 8: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Development of a paradigm

• Finally, a research paradigm becomes just a tool in the toolbox

BDD-based SMC

Better BDD algorithms

More compact forms

Other applications

Abstraction methods, etc.

Page 9: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Image computation

• An intractable problem spins off many intractable subproblems.

– Main problem: transition relation cannot be expressed as one BDD

• Approaches to image computation

– Coudert and Madre

• Use vector function representation for transition relation

• Use constrain operation to reduce to range computation

• Case splitting strategies (over range, domain)

– Burch and Long

• Leave transition relation as implicit conjunction or disjunction

• Early quantification – push quantifiers inside $\wedge$ and $\vee$

• Many optimization approaches

Each of these creates its own intractable subproblems

Page 10: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Quantification scheduling

• Push quantifiers inside as we build the conjunction

– Try to minimize number of intermediate variables

$\exists V.\ (P \wedge T_1 \wedge T_2 \wedge \cdots \wedge T_n)$

$\exists V.\ (\exists v_1.\ (P \wedge T_1) \wedge T_2 \wedge \cdots \wedge T_n)$

US Patent Nr. 6,131,078

This basic idea spins off interesting optimization problems
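The identity behind early quantification can be checked on a small case. The sketch below assumes a hypothetical two-latch model (current state `v1`, `v2`; next state `w1`, `w2`) and represents Boolean functions as Python predicates over an environment, quantifying by brute-force enumeration rather than with BDD's. Since `T2` does not mention `v1`, the quantifier on `v1` can be pushed inside.

```python
from itertools import product

BOOLS = (False, True)

def exists(names, f):
    """Existentially quantify the variables in `names` out of predicate f."""
    def g(env):
        return any(f({**env, **dict(zip(names, bits))})
                   for bits in product(BOOLS, repeat=len(names)))
    return g

# Hypothetical model: P is the current state set, T1 and T2 are the
# conjuncts of the transition relation.
P = lambda e: e["v1"] or e["v2"]
T1 = lambda e: e["w1"] == (e["v1"] and e["v2"])   # depends on v1 and v2
T2 = lambda e: e["w2"] == (not e["v2"])           # depends on v2 only

# Monolithic image: quantify everything after building the full conjunction.
monolithic = exists(["v1", "v2"], lambda e: P(e) and T1(e) and T2(e))

# Scheduled image: quantify v1 as soon as no remaining conjunct mentions it.
inner = exists(["v1"], lambda e: P(e) and T1(e))
scheduled = exists(["v2"], lambda e: inner(e) and T2(e))

def image(f):
    """Collect the next states (w1, w2) accepted by predicate f."""
    return {(w1, w2) for w1, w2 in product(BOOLS, repeat=2)
            if f({"w1": w1, "w2": w2})}

print(image(monolithic) == image(scheduled))
```

In a BDD-based implementation the payoff is that the intermediate result `inner` has one variable fewer in its support, which is what the scheduling heuristics try to exploit.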

Page 11: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Optimizing Quantification

• This by itself is a hard optimization problem

– Many heuristic approaches (greedy, simulated annealing, etc.)

• Cuts at gate level also possible

– How to decompose? Fine grain or coarse grain?

• Case splitting can improve cut width

Find a series of cuts minimizing communication

These problems were never fully solved, but a consensus approach developed.

Page 12: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Variable ordering

• Optimizing BDD variable order is also an intractable problem

– Structural methods (many authors)

• Many variations, good for circuits, but not very effective for SMC

– Hill-climbing method (Rudell)

• Many variations (windows, etc.) – very time consuming

• Very tricky space/time tradeoffs

– Optimal methods seem out of reach

• Search ordering

– BFS often leads to large intermediate BDD’s

– Many heuristic strategies possible

Again, we never solved these problems, just declared victory.

The standard approaches to variable ordering and image computation bring us essentially to the current state-of-the-art.

Page 13: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

More compact representations

• BDD’s were first, but there is no reason to think they are the only representation

– Some simple structures are hard for BDD’s (e.g., pointers)

– Decision diagrams provide a nice paradigm

• Result: bewildering array of decision diagrams (*DD’s)

– Different node interpretations (ZDD’s, Kronecker DD’s, etc.)

– Decompositions

• Conjunctive, disjunctive, disjoint, etc...

– Representations based on minimal automata

• Tree BDD’s, cube sets, word automata, etc.

Many of these can be shown to be more compact than BDD’s for some motivating class of examples. Does this mean they are useful for SMC?

Page 14: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

BDD’s and DFA’s

• BDD is, approximately, a minimal DFA over fixed-length words

0 1 0 1 0

v1 v2 v3 v4 v5

• Tree BDD – extend by analogy to tree automata, for fixed trees

[Figure: a fixed tree over variables v1 to v6, each node carrying one bit of the input]

Page 15: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Extending the analogy

• BMD’s – word encodes a monomial term

The word 0 1 0 1 0 over v1 v2 v3 v4 v5 encodes the monomial $v_2 v_4$

Useful for binary arithmetic, though limited use for SMC

• Encode cubes with words

– ZDD representation of prime implicants

• Extension to unbounded/infinite words and trees

– Regular model checking, QDD’s, etc.

– Breaks paradigm – requires acceleration or widening

The right analogy often provides novel generalizations, as well as a unifying view.

Page 16: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Space/Time tradeoffs

• More compact representations typically require

– Greater overhead to reduce to canonical form

– Greater difficulty in optimizing representation parameters (e.g., order)

• BDD’s seem to be a “sweet spot”

– Substantial space reduction

– Fast reduction to canonical form

– Moderate cost to find a good (not best) variable order

For this reason, surprisingly, BDD’s have remained the representation of choice for finite-state SMC over nearly two decades.

Page 17: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Beyond Decision Diagrams

• Quantifier elimination using SAT solvers

• Iterative approach

– Find a set of satisfying assignments to free variables

– Block that set

– Repeat until unsatisfiable

• Different possible representations

– Cubes

– Circuit cofactors

This approach avoids the difficulty of large intermediate BDD’s. For technical reasons, it works well only for reverse image.

$\exists V'.\ (P' \wedge T_1 \wedge T_2 \wedge \cdots \wedge T_n)$
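The find/block/repeat loop can be sketched as follows, assuming a brute-force enumerator as a stand-in for a real SAT solver and blocking one full assignment of the free variables per iteration (practical methods generalize each model to a larger cube or a circuit cofactor). The functions `solve` and `project` and the two-bit transition relation are hypothetical.

```python
from itertools import product

def solve(variables, constraints):
    """Stand-in for a SAT solver: return one satisfying assignment or None."""
    for bits in product((False, True), repeat=len(variables)):
        env = dict(zip(variables, bits))
        if all(c(env) for c in constraints):
            return env
    return None

def project(formula, free_vars, bound_vars):
    """Enumerate the free-variable assignments satisfying  exists bound_vars. formula."""
    constraints = [formula]
    result = []
    while True:
        model = solve(free_vars + bound_vars, constraints)
        if model is None:                       # unsatisfiable: enumeration is complete
            return result
        cube = {v: model[v] for v in free_vars}
        result.append(cube)
        # Blocking clause: exclude this assignment to the free variables.
        constraints.append(
            lambda env, cube=cube: any(env[v] != cube[v] for v in free_vars))

# Hypothetical reverse image: V = (a, b), V' = (a2, b2),
# T: a2 = a xor b, b2 = a;  P' = a2 and b2.
T = lambda e: e["a2"] == (e["a"] != e["b"]) and e["b2"] == e["a"]
P_next = lambda e: e["a2"] and e["b2"]

pre_image = project(lambda e: P_next(e) and T(e), ["a", "b"], ["a2", "b2"])
print(pre_image)
```

No intermediate BDD is ever built: the only state carried between iterations is the growing list of blocking constraints.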

Page 18: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Comparison with BDD’s

[Figure: log-log scatter plot, both axes from 0.01 to 10000. Horizontal axis: run time of BDD-based method (s). Vertical axis: run time of SAT-based method (s).]

Note low correlation between the two methods.

SAT-based method may be a good alternative when BDD’s fail.

This is typical of algorithms for intractable problems. It opens up another class of research problems: how to efficiently combine methods when no one method dominates.

Page 19: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

New applications

• Timed automata

– KRONOS, COSPAN, UPPAAL

• Matrix problems

– Probabilistic verification [PRISM]

– Worst-case power estimation

• Parameterized/infinite state systems

– Regular model checking

– Real/rational variables

– QDD’s

– Invisible invariants

Each new application of the paradigm opens many new research problems. How are we to apply the paradigm in any given case?

Page 20: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

DD’s and timed automata

• There are many ways in which DD’s might be applied to timed automata:

[Figure: decision-diagram encodings for timed automata over the word 0 1 0 1 1 0 1 0, with the labels "binary timer value", "DBM’s at leaves" and "$t_i \le t_j$"]

According to Kuhn, much of “normal science” is devoted to such “puzzles”: how to apply a paradigm in a given situation.

Page 21: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Abstraction

• Crisis: SMC approach fails to scale to large designs

• Major critique of symbolic model checking

– Computing exact fixed points is too “eager” for many applications

– Weaker approximations may be sufficient to prove property

• May still be appropriate to compute exact fixed points in abstract models chosen in advance

– localization abstraction

– predicate abstraction

– compositional approaches

Page 22: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

CEGAR loop

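[Figure: CEGAR loop flowchart. Choose initial T#. Model check abstraction T# (using SMC): if the property is true, done; otherwise a counterexample (Cex) is produced. Can the Cex be extended from T# to T? If yes, report the Cex; if no, refine T# and model check again.]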

SMC is typically used in the CEGAR loop, but we no longer view finding the right symbolic representation as the key to scalability.
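A minimal sketch of the loop, assuming a toy concrete system (a counter that adds 2 modulo 8, with 3 as the bad state) and a hypothetical abstraction that tracks only the `k` low-order bits; refinement simply tracks one more bit. All names and the model are illustrative assumptions, and the abstract model check is explicit-state search standing in for SMC.

```python
def cegar(abstraction, model_check, extend, refine, max_rounds=100):
    """Generic CEGAR loop over caller-supplied callbacks."""
    for _ in range(max_rounds):
        cex = model_check(abstraction)
        if cex is None:
            return ("true", abstraction)        # property holds in the abstraction
        concrete = extend(cex)
        if concrete is not None:
            return ("cex", concrete)            # real counterexample
        abstraction = refine(abstraction, cex)  # spurious: refine and retry
    return ("unknown", abstraction)

# Hypothetical concrete system.
N, INIT, BAD = 8, 0, 3
step = lambda x: (x + 2) % N
alpha = lambda x, k: x % (2 ** k)   # abstraction: keep the k low-order bits

def model_check(k):
    """Breadth-first search in the abstract system; return an abstract path or None."""
    edges = {(alpha(x, k), alpha(step(x), k)) for x in range(N)}
    start, goal = alpha(INIT, k), alpha(BAD, k)
    paths, seen = [[start]], {start}
    while paths:
        path = paths.pop(0)
        if path[-1] == goal:
            return path
        for a, b in edges:
            if a == path[-1] and b not in seen:
                seen.add(b)
                paths.append(path + [b])
    return None

def extend(path):
    """Replay the (deterministic) concrete system for the length of the abstract path."""
    trace = [INIT]
    for _ in range(len(path) - 1):
        trace.append(step(trace[-1]))
    return trace if trace[-1] == BAD else None

result = cegar(0, model_check, extend, lambda k, cex: k + 1)
print(result)
```

With no bits tracked, every state collapses into one abstract state and the bad state looks reachable; tracking the parity bit is enough to prove it unreachable.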

Page 23: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Interpolants

• Interpolant-based model checking is SMC, but breaks paradigm

– No canonical or reduced representation

– No exact fixed point computation

• Answers the critique that exact image computation is too strong

– Here we see the paradigm breaking down in response to a crisis

[Figure: an unfolding of the transition relation T from P at t=0 to F at t=k, partitioned into A and B, with interpolant A']

Page 24: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

End of a paradigm?

• New research in symbolic model checking continues, but most breaks the paradigm in some way

– SAT-based image computations

– Interpolation

– Assume/guarantee via machine learning

– Infinite-state/probabilistic/etc.

• SMC is primarily a tool now

– Most hardware verification tools apply it in some form

– Software model checkers use it

• SLAM, BLAST, SATABS, FSOFT

– Variety of other tools

• KRONOS, PRISM, TLV

Page 25: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Ballistic trajectory

[Figure: Cadence SMV downloads by year, 1999 to 2005, on a scale of 0 to 2500, with an annotation marking when development ended]

Page 26: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

The new paradigms

• BMC, clearly

– responds to a crisis, solves some problems, leaves many open!

• CEGAR

• Hybridization

Perhaps the most important paradigm in model checking today is the combination of tools from many disciplines. Few tools today apply just one algorithm or technique. SMC has become primarily a component in more complex hybrid schemes.

Page 27: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Persistent ideas

• Early quantification

• Canonical acceptors

– Mona

• Conditional independence

Page 28: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Lazy abstraction and interpolants

• Lazy abstraction [Henzinger et al., 02]

– Refines predicate abstraction locally, as needed

– Avoids "big loop" in CEGAR

– Avoids computing unnecessary state information

• Interpolation-based model checking [McMillan, 03]

– Avoids expense of image computation

– Derives image approximations from refutations of bounded unfoldings

In this talk, we will see how to use interpolants as an alternative to predicate abstraction in the lazy abstraction paradigm for software model checking.

This avoids the expense of image computation in predicate abstraction, resulting in a large performance improvement.

Page 29: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

An example

do {
    lock();
    old = new;
    if (*) {        /* nondeterministic choice */
        unlock();
        new++;
    }
} while (new != old);

program fragment

[Figure: control-flow graph with edges labeled L=0; [L!=0]; L=1; old=new; L=0; new++; [new==old]; [new!=old]]

Page 30: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Unwinding the CFG

[Figure: the control-flow graph beside the first steps of its unwinding: vertices 0, 1 and 2 joined by edges L=0 and [L!=0]; labels start as T and are refined to L=0 and F]

Label error state with false, by refining labels on path

Page 31: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Unwinding the CFG

[Figure: the control-flow graph beside the unwinding extended through vertices 3, 4, 5 and 6 along edges L=1; old=new, L=0; new++, [new!=old] and [L!=0]; vertex labels are T, L=0 and F]

Cutoff: state 5 is subsumed by state 1.

Page 32: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Unwinding the CFG

[Figure: the control-flow graph beside the finished unwinding, vertices 0 to 11, with vertex labels T, L=0, old=new and F]

Another cutoff. Unwinding is now complete.

Page 33: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Comparisons

• Compared to CEGAR...

– Refinements are local

– Do not restart model checking after each refinement

– More refinements required

• Compared to lazy predicate abstraction [Henzinger et al. 02]...

– Extremely lazy.

– Does not require predicate image or "post" computation

• avoid exponential number of decision procedure calls

• avoid additional refinement of image approximation

• Compared to interpolation-based model checking [McMillan 03]...

– Exploits sequential control-flow structure

– Prover is not applied to full program unwinding.

Page 34: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Interpolation Lemma (Craig, 57)

• Notation: $L(\phi)$ is the set of FO formulas over the symbols of $\phi$

• If $A \wedge B = \mathit{false}$, there exists an interpolant $A'$ for $(A,B)$ such that:

$A \Rightarrow A'$

$A' \wedge B = \mathit{false}$

$A' \in L(A) \cap L(B)$

• Example:

– $A = p \wedge q$, $B = \neg q \wedge r$, $A' = q$

• Interpolants from proofs

– in certain quantifier-free theories, we can obtain an interpolant for a pair A,B from a refutation in linear time. [McMillan 05]

– in particular, we can have linear arithmetic, uninterpreted functions, and restricted use of arrays
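The three interpolant conditions can be checked by truth table for the propositional example on this slide, read here as A = p ∧ q, B = ¬q ∧ r, A' = q. This is a sketch with hypothetical helper names; the symbol sets are written out by hand rather than computed from the formulas.

```python
from itertools import product

NAMES = ["p", "q", "r"]

def valid(f):
    """True when predicate f holds under every assignment to NAMES."""
    return all(f(dict(zip(NAMES, bits)))
               for bits in product((False, True), repeat=len(NAMES)))

A = lambda e: e["p"] and e["q"]
B = lambda e: (not e["q"]) and e["r"]
ITP = lambda e: e["q"]                    # candidate interpolant A'

# Symbols of each formula, listed by hand for this example.
SYMS_A, SYMS_B, SYMS_ITP = {"p", "q"}, {"q", "r"}, {"q"}

a_and_b_unsat = valid(lambda e: not (A(e) and B(e)))      # A and B = false
a_implies_itp = valid(lambda e: (not A(e)) or ITP(e))     # A implies A'
itp_refutes_b = valid(lambda e: not (ITP(e) and B(e)))    # A' and B = false
common_symbols = SYMS_ITP <= (SYMS_A & SYMS_B)            # A' in L(A) cap L(B)

print(a_and_b_unsat, a_implies_itp, itp_refutes_b, common_symbols)
```

Note that p and r do not qualify as interpolants even though each is related to one side: p is not in the common vocabulary, and neither is r.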

Page 35: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

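Interpolants for sequences

• Let $A_1 \ldots A_n$ be a sequence of formulas

• A sequence $A'_0 \ldots A'_n$ is an interpolant for $A_1 \ldots A_n$ when

– $A'_0 = \mathit{True}$

– $A'_{i-1} \wedge A_i \Rightarrow A'_i$, for $i = 1..n$

– $A'_n = \mathit{False}$

– and finally, $A'_i \in L(A_1 \ldots A_i) \cap L(A_{i+1} \ldots A_n)$

[Figure: the formulas $A_1, A_2, A_3, \ldots, A_k$ drawn above the chain $\mathit{True}, A'_1, A'_2, A'_3, \ldots, A'_{k-1}, \mathit{False}$, with an implication arrow at each step]

In other words, the interpolant is a structured refutation of $A_1 \ldots A_n$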
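A small checker for the sequence conditions, assuming a hypothetical path `x = 0; y = x; [y != 0]` and a candidate interpolant True, x = 0, y = 0, False. Implications are tested over a small finite integer domain, so this is a sanity check rather than a proof; a real tool would extract the interpolant from a prover's refutation.

```python
from itertools import product

DOMAIN = range(-2, 3)   # small finite domain: a sanity check, not a proof

# Formulas A_1..A_3 for the hypothetical path  x = 0; y = x; [y != 0]
A = [
    lambda e: e["x"] == 0,
    lambda e: e["y"] == e["x"],
    lambda e: e["y"] != 0,
]

# Candidate sequence interpolant A'_0..A'_3. Vocabulary condition by inspection:
# A'_1 mentions only x (shared by A_1 and A_2), A'_2 only y (shared by A_2 and A_3).
ITP = [
    lambda e: True,
    lambda e: e["x"] == 0,
    lambda e: e["y"] == 0,
    lambda e: False,
]

def is_sequence_interpolant(formulas, itp):
    envs = [{"x": x, "y": y} for x, y in product(DOMAIN, repeat=2)]
    if not all(itp[0](e) for e in envs):      # A'_0 = True
        return False
    if any(itp[-1](e) for e in envs):         # A'_n = False
        return False
    # A'_{i-1} and A_i imply A'_i for every step i
    return all(itp[i + 1](e)
               for i, f in enumerate(formulas)
               for e in envs
               if itp[i](e) and f(e))

print(is_sequence_interpolant(A, ITP))
```

Weakening the middle of the sequence breaks the chain: replacing A'_1 by True leaves nothing from which y = 0 follows at the second step.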

Page 36: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Path refinements are interpolants

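[Figure: a program path written as a sequence of formulas, with its interpolant alongside. Legible formulas: L1 = 0; L2 = 1 and new1 = old0; a guard relating new1 and old0 whose operator was lost in extraction. Legible interpolant labels: True, True, new1 = old0, False. A separate fragment reads x=i, y=j; [x!=0] x--, y--; [x==0] [i==j] [y!=0].]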

1. Each formula implies the next

2. Each is over common symbols of prefix and suffix

3. Begins with true, ends with false

Path refinement procedure

[Figure: pipeline. SSA sequence, then Prover, yielding a proof; then Interpolation, yielding a structured proof; then Path Refinement.]

Page 37: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Unwinding the CFG

• An unwinding is a tree with an embedding in the CFG

[Figure: an unwinding tree (vertices 0, 1, 2, 3, 4, 8) mapped onto the control-flow graph by a vertex map Mv and an edge map Me]

Page 38: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Expansion

• Every non-leaf vertex of the unwinding must be fully expanded...

[Figure: vertex 0 and its edge L=0 to vertex 1, related to the CFG by Mv and Me. Annotations: "If this is not a leaf..." "...and this exists..." "...then this exists."]

...but we allow unexpanded leaves (i.e., we are building a finite prefix of the infinite unwinding)

Page 39: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Labeled unwinding

• A labeled unwinding is equipped with...

– a labeling function $\psi : V \to L(S)$

– a covering relation $\rhd \subseteq V \times V$

[Figure: unwinding with vertices 0 to 7, vertex labels T, L=0 and F, and covering arcs]

These two nodes are covered.

(have an ancestor at the tail of a covering arc)

Page 40: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Well-labeled unwinding

• An unwinding is well-labeled when...

– $\psi(\epsilon) = \mathit{True}$

– every edge is a valid Hoare triple

– if $x \rhd y$ then y not covered

[Figure: the same unwinding, vertices 0 to 7, with vertex labels T, L=0 and F]

Page 41: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Safe and complete

• An unwinding is

– safe if every error vertex is labeled False

– complete if every nonterminal leaf is covered

[Figure: the finished unwinding, vertices 0 to 10, with vertex labels T, L=0, old=new and F]

Theorem: A CFG with a safe complete unwinding is safe.
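The two definitions can be written down directly, assuming a hypothetical encoding in which the unwinding is a parent map, labels are formula strings, and a covering arc is a pair (x, y) meaning x is covered by y. The example tree and location names are illustrative, and well-labeledness (valid Hoare triples on edges) is not checked here.

```python
def ancestors_or_self(v, parent):
    """Yield v, then its parent, and so on up to the root."""
    while v is not None:
        yield v
        v = parent.get(v)

def is_covered(v, parent, covering):
    """A vertex is covered if it or an ancestor is at the tail of a covering arc."""
    tails = {x for (x, _) in covering}
    return any(w in tails for w in ancestors_or_self(v, parent))

def is_safe(label, location, error_loc):
    """Safe: every vertex mapped to the error location is labeled False."""
    return all(label[v] == "False" for v in label if location[v] == error_loc)

def is_complete(parent, location, covering, terminal_locs):
    """Complete: every leaf at a nonterminal location is covered."""
    leaves = set(location) - set(parent.values())
    return all(is_covered(v, parent, covering)
               for v in leaves if location[v] not in terminal_locs)

# Hypothetical unwinding: 0 -> 1 (error), 0 -> 2 -> 3, where 3 returns to 0's location.
parent = {1: 0, 2: 0, 3: 2}
location = {0: "head", 1: "error", 2: "body", 3: "head"}
label = {0: "True", 1: "False", 2: "True", 3: "True"}
covering = {(3, 0)}            # label(3) implies label(0), so 0 covers 3

print(is_safe(label, location, "error"))
print(is_complete(parent, location, covering, {"error"}))
```

Dropping the covering arc leaves vertex 3 as an uncovered nonterminal leaf, so the unwinding would have to be expanded further before the theorem applies.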

Page 42: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Unwinding steps

• Three basic operations:

– Expand a nonterminal leaf

– Cover: add a covering arc

– Refine: strengthen labels along a path so error vertex labeled False

Page 43: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

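Covering step

• If $\psi(x) \Rightarrow \psi(y)$...

– add covering arc $x \rhd y$

– remove all $z \rhd w$ for w descendant of y

[Figure: a vertex labeled x = y is covered by a vertex labeled $x \le y$; an existing covering arc is crossed out]

We restrict covers to be descending in a suitable total order on vertices. This prevents covering from diverging.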

Page 44: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

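Refinement step

• Label an error vertex False by refining the path to that vertex with an interpolant for that path.

• By refining with interpolants, we avoid predicate image computation.

[Figure: an unwinding with statements x = 0, y++, y = 2 and guards [x=y], [x≠y], [y=0]; labels T along the path are refined to x=0, y=0, y≠0 and F, and a covering arc is crossed out]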

Refinement may remove cutoffs

Page 45: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Forced cutoff

• Try to refine a sub-path to force a cutoff

– show that path from nearest common ancestor of x,y proves $\psi(x)$ at y

[Figure: the same unwinding, with statements x = 0, y++, y = 2 and guards [x=y], [x≠y], [y=0]; the sub-path marked "refine this path" is strengthened so that its end vertex is also labeled y≠0]

Forced cutoffs allow us to efficiently handle nested control structure

Page 46: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Overall algorithm

1. Do as much covering as possible

2. If a leaf can't be covered, try forced covering

3. If the leaf still can't be covered, expand it

4. Label all error states False by refining with an interpolant

5. Continue until unwinding is safe and complete

Page 47: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Experiments

• Windows device driver benchmarks from BLAST benchmark suite

– programs flattened to "simple goto programs"

• Compare performance against BLAST, a lazy predicate abstraction tool

name      source LOC  SGP LOC  BLAST (s)  IMPACT (s)  BLAST/IMPACT
kbfiltr   12K         2.3K     26.3       3.15        8.3
diskperf  14K         3.9K     102        20.0        5.1
cdaudio   44K         6.3K     310        19.1        16.2
floppy    18K         8.7K     455        17.8        25.6
parclass  138K        8.8K     5511       26.2        210
parport   61K         13K      8084       37.1        224

Almost all BLAST time spent in predicate image operation.

Page 48: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

The Saga Continues

• After these results, Ranjit Jhala modified BLAST

– vertices inherit predicates from their parents, reducing refinements

– fewer refinements allows more predicate localization

• Impact also made more eager, using some static analysis

name      source LOC  SGP LOC  BLAST (s)  IMPACT (s)  BLAST/IMPACT
kbfiltr   12K         2.3K     11.9       0.35        34
diskperf  14K         3.9K     117        2.37        49
cdaudio   44K         6.3K     202        1.51        134
floppy    18K         8.7K     164        4.09        41
parclass  138K        8.8K     463        3.84        121
parport   61K         13K      324        6.47        50

Page 49: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Conclusions

• Caveats

– Comparing different implementations is dangerous

– More and better software model checking benchmarks are needed

• Tentative conclusion

– For control-dominated codes, predicate abstraction is too "eager"

• By lazy abstraction with interpolants, we can

– Avoid the expense of the abstract "post" operator

– Avoid re-running the model checker with each refinement

– Avoid applying decision procedure to full program unwindings

• Result is an efficient procedure for checking control-dominated software

– Three orders of magnitude speedup in lazy model checking in 6 months!

Page 50: The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

Future work

• Procedure summaries

– Many similar subgraphs in unwinding due to procedure expansions

– Cannot handle recursion

– Can we use interpolants to compute approximate procedure summaries?

• Quantified interpolants

– Can be used to generate program invariants with quantifiers

– Works for simple examples, but need to prevent number of quantifiers from increasing without bound

• Richer theories

– In this work, all program variables modeled by integers

– Need an interpolating prover for bit vector theory

• Concurrency...