The Evolution of Symbolic Model Checking Ken McMillan Cadence Berkeley Labs.

The Evolution of Symbolic Model Checking

Ken McMillan

Cadence Berkeley Labs

What is Symbolic Model Checking?

• Trading in one difficulty...
  – The state explosion problem
• ...for another difficulty.
  – PSPACE-completeness of QBF

Theoreticians are nonplussed...

Another view

• Abstract interpretation
  – Compute an approximation of the collecting semantics as a fixed point
• Symbolic model checking
  – Compute the exact collecting semantics as a fixed point, using some compact representation

As a working definition, we’ll say SMC is computing exact fixed points with compact representations.

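Example: CTL model checking

• Fixed point characterization of operators
  $\mathrm{AG}\,p = \nu Q.\; p \wedge \mathrm{AX}\,Q$
  $\mathrm{AF}\,p = \mu Q.\; p \vee \mathrm{AX}\,Q$
  $\mathrm{EF}\,p = \mu Q.\; p \vee \mathrm{EX}\,Q$
  $\mathrm{EG}\,p = \nu Q.\; p \wedge \mathrm{EX}\,Q$
• Image operators
  $\mathrm{EX}\,p = \lambda V.\; \exists V'.\; (p(V') \wedge T(V,V'))$
  $\mathrm{AX}\,p = \lambda V.\; \forall V'.\; (T(V,V') \Rightarrow p(V'))$

The trick is to reduce these QBF expressions to some compact normal form (hopeless, but interesting...)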
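The fixed point characterizations above can be illustrated with a minimal sketch that uses explicit Python sets in place of a compact representation. The four-state transition system `T` and all function names are hypothetical, chosen only for illustration; a real symbolic model checker would perform the same iteration over BDD's or another compact form.

```python
# Hypothetical 4-state transition system: state -> set of successor states.
T = {0: {1}, 1: {2}, 2: {2, 3}, 3: {3}}
STATES = frozenset(T)

def ex(p):
    """EX p: states with at least one successor in p."""
    return frozenset(s for s in STATES if T[s] & p)

def ax(p):
    """AX p: states whose successors all lie in p."""
    return frozenset(s for s in STATES if T[s] <= p)

def fixpoint(f, start):
    """Iterate f from start until the set stops changing."""
    q = start
    while True:
        nxt = f(q)
        if nxt == q:
            return q
        q = nxt

def lfp(f):
    """Least fixed point: iterate upward from the empty set."""
    return fixpoint(f, frozenset())

def gfp(f):
    """Greatest fixed point: iterate downward from the full state set."""
    return fixpoint(f, STATES)

def ef(p): return lfp(lambda q: p | ex(q))
def af(p): return lfp(lambda q: p | ax(q))
def eg(p): return gfp(lambda q: p & ex(q))
def ag(p): return gfp(lambda q: p & ax(q))

print(sorted(ef(frozenset({3}))))  # [0, 1, 2, 3]: every state can reach 3
print(sorted(af(frozenset({3}))))  # [3]: state 2 may loop forever
```

Termination follows because each operator is monotone over a finite set of states; the cost in the symbolic setting lies in representing the intermediate sets, not in the iteration itself.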

The magic of BDD’s

• BDD’s provide the following desiderata for SMC:
  – Efficient Boolean operations
  – Quantifier elimination (exponential, but efficient in practice)
  – Efficient reduction to canonical form

Note: canonical form is useful, but not necessary for detecting fixed points. Its main advantage is that it prevents explosion of the representation as we iterate.

• BDD’s exploit low mutual information of components

[Figure: Component A and Component B separated by a cut; cut width determined by mutual information]

Application domain

• Symbolic model checking with BDD’s is appropriate when
  – State space is dense, or
  – Branching factor is high
• ...otherwise explicit state tends to be more efficient
• Hardware model checking is in the “sweet spot”
  – Very fine grain parallelism, hence dense state space
  – Branching factor exponential in number of inputs
• Protocol verification is a poor application
  – Few states reachable
  – Branching factor linear in number of processes (interleaving)

SMC as Paradigm

• The idea of computing fixed points with compact representations, and its embodiment as BDD-based SMC, have several qualities of Kuhn’s notion of a paradigm:
  – Responds to a crisis
    • the state explosion problem
  – Solves some specific problem previously unsolvable
    • say, verify the Encore Gigamax cache protocols
  – Shows potential to solve many more problems
    • advantage of magic – can’t prove it doesn’t work
  – Leaves many more problems unsolved than solved
    • leaves room for future research

Let’s follow the history of the elaboration of this paradigm.

Development of a paradigm

[Figure: diagram with the labels BDD-based SMC; Better BDD algorithms; More compact forms; Other applications; Abstraction methods, etc.]

• Finally, a research paradigm becomes just a tool in the toolbox

Image computation

• An intractable problem spins off many intractable subproblems.
  – Main problem: transition relation cannot be expressed as one BDD
• Approaches to image computation
  – Coudert and Madre
    • Use vector function representation for transition relation
    • Use constrain operation to reduce to range computation
    • Case splitting strategies (over range, domain)
  – Burch and Long
    • Leave transition relation as implicit conjunction or disjunction
    • Early quantification – push quantifiers inside $\wedge$ and $\vee$
    • Many optimization approaches

Each of these creates its own intractable subproblems.

Quantification scheduling

• Push quantifiers inside as we build the conjunction
  – Try to minimize number of intermediate variables

$\exists V.\; (P \wedge T_1 \wedge T_2 \wedge \cdots \wedge T_n)$

$\exists V.\; (\exists v_1.\,(P \wedge T_1) \wedge T_2 \wedge \cdots \wedge T_n)$

US Patent Nr. 6,131,078

This basic idea spins off interesting optimization problems.
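The soundness condition behind early quantification is that a variable may be quantified as soon as no remaining conjunct mentions it. The sketch below checks this by brute force on a hypothetical example; the variable names `v1`, `v2`, `x` and the functions `P`, `T1`, `T2` are assumptions made for illustration, and Boolean functions are modeled as plain Python callables rather than BDD's.

```python
from itertools import product

VARS = ["v1", "v2", "x"]  # hypothetical variable names

def exists(var, f):
    """Quantify var existentially by cofactoring on both of its values."""
    return lambda env: f({**env, var: False}) or f({**env, var: True})

def conj(f, g):
    """Conjunction of two Boolean functions."""
    return lambda env: f(env) and g(env)

# Hypothetical partitions: P mentions v1; T1 mentions v1, v2; T2 mentions v2, x.
P = lambda e: e["v1"]
T1 = lambda e: e["v2"] == (not e["v1"])
T2 = lambda e: e["x"] == e["v2"]

# Monolithic form: quantify v1 and v2 over the whole conjunction.
mono = exists("v2", exists("v1", conj(conj(P, T1), T2)))

# Early quantification: v1 does not occur in T2, so it can be
# eliminated before T2 is conjoined.
early = exists("v2", conj(exists("v1", conj(P, T1)), T2))

def equivalent(f, g):
    """Compare two functions on every assignment to VARS."""
    envs = (dict(zip(VARS, bits))
            for bits in product([False, True], repeat=len(VARS)))
    return all(f(env) == g(env) for env in envs)

print(equivalent(mono, early))  # True
```

Pushing `v2` inside as well would be unsound here, because `T2` still depends on it; choosing an order of conjunctions that lets variables die early is the scheduling problem the slide refers to.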

Optimizing Quantification

• This by itself is a hard optimization problem
  – Many heuristic approaches (greedy, simulated annealing, etc.)
• Cuts at gate level also possible
  – How to decompose? Fine grain or coarse grain?
• Case splitting can improve cut width

[Figure: find a series of cuts minimizing communication]

These problems were never fully solved, but a consensus approach developed.

Variable ordering

• Optimizing BDD variable order is also an intractable problem
  – Structural methods (many authors)
    • Many variations, good for circuits, but not very effective for SMC
  – Hill-climbing method (Rudell)
    • Many variations (windows, etc.) – very time consuming
    • Very tricky space/time tradeoffs
  – Optimal methods seem out of reach
• Search ordering
  – BFS often leads to large intermediate BDD’s
  – Many heuristic strategies possible

Again, we never solved these problems, just declared victory.

The standard approaches to variable ordering and image computation bring us essentially to the current state-of-the-art.

More compact representations

• BDD’s were first, but there is no reason to think they are the only representation
  – Some simple structures are hard for BDD’s (e.g., pointers)
  – Decision diagrams provide a nice paradigm
• Result: bewildering array of decision diagrams (*DD’s)
  – Different node interpretations (ZDD’s, Kronecker DD’s, etc.)
  – Decompositions
    • Conjunctive, disjunctive, disjoint, etc...
  – Representations based on minimal automata
    • Tree BDD’s, cube sets, word automata, etc.

Many of these can be shown to be more compact than BDD’s for some motivating class of examples. Does this mean they are useful for SMC?

BDD’s and DFA’s

• BDD is, approximately, a minimal DFA over fixed-length words

[Figure: the word 0 1 0 1 0 over variables v1 v2 v3 v4 v5]

• Tree BDD – extend by analogy to tree automata, for fixed trees

[Figure: a fixed tree over variables v1 to v6 with node values 0, 1, 1, 0, 1, 0]

Extending the analogy

• BMD’s – word encodes a monomial term

[Figure: the word 0 1 0 1 0 over variables v1 v2 v3 v4 v5 encodes the monomial v2v4]

Useful for binary arithmetic, though limited use for SMC

• Encode cubes with words
  – ZDD representation of prime implicants
• Extension to unbounded/infinite words and trees
  – Regular model checking, QDD’s, etc.
  – Breaks paradigm – requires acceleration or widening

The right analogy often provides novel generalizations, as well as a unifying view.

Space/Time tradeoffs

• More compact representations typically require
  – Greater overhead to reduce to canonical form
  – Greater difficulty in optimizing representation parameters (e.g., order)
• BDD’s seem to be a “sweet spot”
  – Substantial space reduction
  – Fast reduction to canonical form
  – Moderate cost to find a good (not best) variable order

For this reason, surprisingly, BDD’s have remained the representation of choice for finite-state SMC over nearly two decades.

Beyond Decision Diagrams

• Quantifier elimination using SAT solvers

$\exists V'.\; (P' \wedge T_1 \wedge T_2 \wedge \cdots \wedge T_n)$

• Iterative approach
  – Find a set of satisfying assignments to free variables
  – Block that set
  – Repeat until unsatisfiable
• Different possible representations
  – Cubes
  – Circuit cofactors

This approach avoids the difficulty of large intermediate BDD’s. For technical reasons, it works well only for reverse image.
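The iterative find-and-block scheme can be sketched as follows. This is a minimal illustration under loud assumptions: `solve` is a brute-force stand-in for a SAT solver, each blocked set is a single full assignment to the free variables (a real implementation would generalize it to a larger cube or a circuit cofactor), and the example relation with variables `x`, `y`, `xn`, `yn` is hypothetical.

```python
from itertools import product

def solve(formula, variables, blocked):
    """Stand-in for a SAT solver: brute-force search for a model of
    `formula` that is not covered by any blocked cube."""
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        covered = any(all(env[v] == val for v, val in cube)
                      for cube in blocked)
        if formula(env) and not covered:
            return env
    return None

def project(formula, free, bound):
    """Eliminate the `bound` variables existentially, returning the
    result as a list of cubes over the `free` variables."""
    cubes = []
    while True:
        model = solve(formula, free + bound, cubes)
        if model is None:          # unsatisfiable: enumeration is complete
            return cubes
        # Block the free-variable part of this model and repeat.
        cubes.append(tuple((v, model[v]) for v in free))

# Hypothetical reverse image: next-state variables xn, yn are bound.
# Transition relation: xn = (x or y), yn = x.  Target set P': xn and yn.
def pre_image_formula(e):
    trans = (e["xn"] == (e["x"] or e["y"])) and (e["yn"] == e["x"])
    return trans and e["xn"] and e["yn"]

cubes = project(pre_image_formula, ["x", "y"], ["xn", "yn"])
print(cubes)  # the two cubes with x = True, i.e. the pre-image is x
```

No intermediate representation of the conjunction is ever built; the cost moves into the number of solver calls, which depends on how well each blocked set generalizes.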

Comparison with BDD’s

[Figure: log-log scatter plot. x-axis: run time of BDD-based method (s); y-axis: run time of SAT-based method (s); both axes run from 0.01 to 10000]

Note the low correlation between the two methods. The SAT-based method may be a good alternative when BDD’s fail.

This is typical of algorithms for intractable problems. It opens up another class of research problems: how to efficiently combine methods when no one method dominates.

New applications

• Timed automata
  – KRONOS, COSPAN, UPPAAL
• Matrix problems
  – Probabilistic verification [PRISM]
  – Worst-case power estimation
• Parameterized/infinite state systems
  – Regular model checking
  – Real/rational variables
  – QDD’s
  – Invisible invariants

Each new application of the paradigm opens many new research problems. How are we to apply the paradigm in any given case?

DD’s and timed automata

• There are many ways in which DD’s might be applied to timed automata:

[Figure: decision-diagram encodings for timed automata: binary timer value (bits 0 1 0 1 1 0 1 0); DBM’s at leaves; nodes labeled $t_i \le t_j$]

According to Kuhn, much of “normal science” is devoted to such “puzzles”: how to apply a paradigm in a given situation.

Abstraction

• Crisis: SMC approach fails to scale to large designs
• Major critique of symbolic model checking
  – Computing exact fixed points is too “eager” for many applications
  – Weaker approximations may be sufficient to prove property
• May still be appropriate to compute exact fixed points in abstract models chosen in advance
  – localization abstraction
  – predicate abstraction
  – compositional approaches

CEGAR loop

[Figure: flowchart. Choose initial T#; model check abstraction T# (SMC); if true, done; if a counterexample (Cex) is found, ask whether Cex can be extended from T# to T; if yes, report Cex; if no, refine T# and model check again]

SMC is typically used in the CEGAR loop, but we no longer view finding the right symbolic representation as the key to scalability.

Interpolants

• Interpolant-based model checking is SMC, but breaks paradigm
  – No canonical or reduced representation
  – No exact fixed point computation
• Answers the critique that exact image computation is too strong
  – Here we see the paradigm breaking down in response to a crisis

[Figure: bounded unfolding P, T, T, ..., T, F from t=0 to t=k, partitioned into A and B, with interpolant A']

End of a paradigm?

• New research in symbolic model checking continues, but most breaks the paradigm in some way
  – SAT-based image computations
  – Interpolation
  – Assume/guarantee via machine learning
  – Infinite-state/probabilistic/etc.
• SMC is primarily a tool now
  – Most hardware verification tools apply it in some form
  – Software model checkers use it
    • SLAM, BLAST, SATABS, FSOFT
  – Variety of other tools
    • KRONOS, PRISM, TLV

Ballistic trajectory

[Figure: Cadence SMV downloads by year, 1999 to 2005; y-axis from 0 to 2500; annotation: development ended]

The new paradigms

• BMC, clearly
  – responds to a crisis, solves some problems, leaves many open!
• CEGAR
• Hybridization

Perhaps the most important paradigm in model checking today is the combination of tools from many disciplines. Few tools today apply just one algorithm or technique. SMC has become primarily a component in more complex hybrid schemes.

Persistent ideas

• Early quantification
• Canonical acceptors
  – Mona
• Conditional independence

Lazy abstraction and interpolants

• Lazy abstraction [Henzinger et al., 02]
  – Refines predicate abstraction locally, as needed
  – Avoids "big loop" in CEGAR
  – Avoids computing unnecessary state information
• Interpolation-based model checking [McMillan, 03]
  – Avoids expense of image computation
  – Derives image approximations from refutations of bounded unfoldings

In this talk, we will see how to use interpolants as an alternative to predicate abstraction in the lazy abstraction paradigm for software model checking.

This avoids the expense of image computation in predicate abstraction, resulting in a large performance improvement.

An example

    do {
        lock();
        old = new;
        if (*) {        /* nondeterministic choice */
            unlock();
            new++;
        }
    } while (new != old);

[Figure: the program fragment beside its control-flow graph, whose edges are labeled L=0; L=1; old=new; [L!=0]; L=0; new++; [new==old]; [new!=old]]

Unwinding the CFG

[Figure: the control-flow graph beside a partial unwinding. The path from vertex 0 to the error vertex 2 is labeled T, L=0, F; later vertices 3 to 6, reached by L=1;old=new, L=0; new++, [new!=old] and [L!=0], are labeled T]

Label error state with false, by refining labels on path.

Unwinding the CFG

[Figure: the unwinding extended to vertices 8 to 12; error vertices are labeled F, their predecessors L=0, and the remaining new vertices T]

Cutoff: state 5 is subsumed by state 1.

Unwinding the CFG

[Figure: the full unwinding, vertices 0 to 12, with labels T, F, L=0 and old=new, and vertex 7 reached by [new==old]]

Another cutoff. Unwinding is now complete.

Comparisons

• Compared to CEGAR...
  – Refinements are local
  – Do not restart model checking after each refinement
  – More refinements required
• Compared to lazy predicate abstraction [Henzinger et al. 02]...
  – Extremely lazy.
  – Does not require predicate image or "post" computation
    • avoid exponential number of decision procedure calls
    • avoid additional refinement of image approximation
• Compared to interpolation-based model checking [McMillan 03]...
  – Exploits sequential control-flow structure
  – Prover is not applied to full program unwinding.

Interpolation Lemma (Craig, 57)

• Notation: $L(\phi)$ is the set of FO formulas over the symbols of $\phi$
• If $A \wedge B = \mathit{false}$, there exists an interpolant $A'$ for $(A,B)$ such that:
  $A \Rightarrow A'$
  $A' \wedge B = \mathit{false}$
  $A' \in L(A) \cap L(B)$
• Example:
  – $A = p \wedge q$, $B = \neg q \wedge r$, $A' = q$
• Interpolants from proofs
  – in certain quantifier-free theories, we can obtain an interpolant for a pair A,B from a refutation in linear time. [McMillan 05]
  – in particular, we can have linear arithmetic, uninterpreted functions, and restricted use of arrays
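The three interpolant conditions can be checked by brute force for the propositional example on this slide. The sketch below is illustrative only: the helper names are hypothetical, and it verifies a given candidate rather than deriving one from a refutation as an interpolating prover would.

```python
from itertools import product

def models(symbols):
    """Enumerate all truth assignments over the given symbols."""
    for bits in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, bits))

def is_interpolant(a, b, itp, a_syms, b_syms, itp_syms):
    """Check that itp is an interpolant for the pair (a, b)."""
    all_syms = sorted(set(a_syms) | set(b_syms))
    # 1. A implies A'
    implied = all(itp(m) for m in models(all_syms) if a(m))
    # 2. A' and B is unsatisfiable
    inconsistent = not any(itp(m) and b(m) for m in models(all_syms))
    # 3. A' uses only symbols common to A and B
    common = set(itp_syms) <= (set(a_syms) & set(b_syms))
    return implied and inconsistent and common

# The slide's example: A = p and q, B = (not q) and r, A' = q.
A = lambda m: m["p"] and m["q"]
B = lambda m: (not m["q"]) and m["r"]
candidate = lambda m: m["q"]

print(is_interpolant(A, B, candidate, {"p", "q"}, {"q", "r"}, {"q"}))  # True
```

By contrast, `p` fails as a candidate: it is implied by A, but it is consistent with B and is not a symbol of B.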

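Interpolants for sequences

• Let $A_1 \ldots A_n$ be a sequence of formulas
• A sequence $A'_0 \ldots A'_n$ is an interpolant for $A_1 \ldots A_n$ when
  – $A'_0 = \mathit{True}$
  – $A'_{i-1} \wedge A_i \Rightarrow A'_i$, for $i = 1..n$
  – $A'_n = \mathit{False}$
  – and finally, $A'_i \in L(A_1 \ldots A_i) \cap L(A_{i+1} \ldots A_n)$

[Figure: the formulas $A_1, A_2, A_3, \ldots, A_k$ above the chain True, $A'_1$, $A'_2$, $A'_3$, ..., $A'_{k-1}$, False, with each step an implication]

In other words, the interpolant is a structured refutation of $A_1 \ldots A_n$.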
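A small check of the sequence conditions, again by brute force over propositional formulas. The example sequence (p, p implies q, not q) and its candidate interpolants are hypothetical, and for brevity the sketch omits the vocabulary condition on each $A'_i$; it checks only the start, end, and step conditions.

```python
from itertools import product

def is_sequence_interpolant(formulas, itps, symbols):
    """formulas = [A1..An], itps = [A'0..A'n]; each is a predicate on a
    truth assignment. The vocabulary condition is not checked here."""
    n = len(formulas)
    if len(itps) != n + 1:
        return False
    envs = [dict(zip(symbols, bits))
            for bits in product([False, True], repeat=len(symbols))]
    starts_true = all(itps[0](m) for m in envs)        # A'0 = True
    ends_false = not any(itps[n](m) for m in envs)     # A'n = False
    steps_hold = all(itps[i](m)                        # A'(i-1) and Ai imply A'i
                     for i in range(1, n + 1)
                     for m in envs
                     if itps[i - 1](m) and formulas[i - 1](m))
    return starts_true and ends_false and steps_hold

# Hypothetical inconsistent sequence: p, p -> q, not q.
seq = [lambda m: m["p"],
       lambda m: (not m["p"]) or m["q"],
       lambda m: not m["q"]]
# Candidate interpolant sequence: True, p, q, False.
itps = [lambda m: True, lambda m: m["p"], lambda m: m["q"], lambda m: False]

print(is_sequence_interpolant(seq, itps, ["p", "q"]))  # True
```

Each intermediate formula records just enough of the prefix to refute the suffix, which is why the sequence reads as a structured refutation.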

Path refinements are interpolants

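[Figure: a path annotated with the constraints $L_1 = 0$; $L_2 = 1$, $\mathit{new}_1 = \mathit{old}_0$; $\mathit{new}_1 \ne \mathit{old}_0$, and the interpolant sequence True, True, $\mathit{new}_1 = \mathit{old}_0$, False, each implying the next. The slide also carries the labels x=i, y=j; [x!=0] x--, y--; [x==0] [i==j] [y!=0]]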

1. Each formula implies the next

2. Each is over common symbols of prefix and suffix

3. Begins with true, ends with false

Path refinement procedure

[Figure: pipeline. SSA sequence → Prover → proof → Interpolation → structured proof → Path Refinement]

Unwinding the CFG

• An unwinding is a tree with an embedding in the CFG

[Figure: an unwinding tree (vertices 0 to 4 and 8) beside the control-flow graph, related by the vertex map Mv and the edge map Me]

Expansion

• Every non-leaf vertex of the unwinding must be fully expanded...

[Figure: under the maps Mv and Me, if this unwinding vertex is not a leaf, and this CFG edge (L=0) exists, then this unwinding edge exists]

...but we allow unexpanded leaves (i.e., we are building a finite prefix of the infinite unwinding)

Labeled unwinding

• A labeled unwinding is equipped with...
  – a labeling function $\psi : V \to L(S)$
  – a covering relation $\rhd \subseteq V \times V$

[Figure: unwinding with vertices 0 to 7, vertex labels T, F and L=0, and covering arcs]

These two nodes are covered (each has an ancestor at the tail of a covering arc).

Well-labeled unwinding

• An unwinding is well-labeled when...
  – $\psi(\epsilon) = \mathit{True}$, where $\epsilon$ is the root
  – every edge is a valid Hoare triple
  – if $x \rhd y$ then $y$ is not covered

[Figure: the same unwinding, vertices 0 to 7, with vertex labels T, F and L=0]

Safe and complete

• An unwinding is
  – safe if every error vertex is labeled False
  – complete if every nonterminal leaf is covered

[Figure: the complete unwinding, vertices 0 to 12, with vertex labels T, F, L=0 and old=new]

Theorem: A CFG with a safe complete unwinding is safe.

Unwinding steps

• Three basic operations:

– Expand a nonterminal leaf

– Cover: add a covering arc

– Refine: strengthen labels along a path so error vertex labeled False

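Covering step

• If $\psi(x) \Rightarrow \psi(y)$...
  – add covering arc $x \rhd y$
  – remove all $z \rhd w$ for $w$ descendant of $y$

[Figure: two vertices labeled $x \le y$ and $x = y$ joined by a covering arc; another covering arc is crossed out (X)]

We restrict covers to be descending in a suitable total order on vertices. This prevents covering from diverging.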

Refinement step

• Label an error vertex False by refining the path to that vertex with an interpolant for that path.
• By refining with interpolants, we avoid predicate image computation.

[Figure: unwinding with edges x = 0, [x=y], [x≠y], y++, [y=0] and y=2; labels T along the error path are refined to x=0, y=0, y≠0 and F; a covering arc is crossed out (X)]

Refinement may remove cutoffs.

Forced cutoff

• Try to refine a sub-path to force a cutoff
  – show that path from nearest common ancestor of x,y proves $\psi(x)$ at y

[Figure: the same unwinding, with edges x = 0, [x=y], [x≠y], y++, [y=0] and y=2 and labels x=0, y=0, y≠0 and F; the annotation "refine this path" marks the sub-path whose refinement yields the label y≠0]

Forced cutoffs allow us to efficiently handle nested control structure.

Overall algorithm

1. Do as much covering as possible

2. If a leaf can't be covered, try forced covering

3. If the leaf still can't be covered, expand it

4. Label all error states False by refining with an interpolant

5. Continue until unwinding is safe and complete

Experiments

• Windows device driver benchmarks from BLAST benchmark suite
  – programs flattened to "simple goto programs"
• Compare performance against BLAST, a lazy predicate abstraction tool

name      source LOC  SGP LOC  BLAST (s)  IMPACT (s)  BLAST/IMPACT
kbfiltr   12K         2.3K     26.3       3.15        8.3
diskperf  14K         3.9K     102        20.0        5.1
cdaudio   44K         6.3K     310        19.1        16.2
floppy    18K         8.7K     455        17.8        25.6
parclass  138K        8.8K     5511       26.2        210
parport   61K         13K      8084       37.1        224

Almost all BLAST time is spent in the predicate image operation.

The Saga Continues

• After these results, Ranjit Jhala modified BLAST
  – vertices inherit predicates from their parents, reducing refinements
  – fewer refinements allows more predicate localization
• Impact also made more eager, using some static analysis

name      source LOC  SGP LOC  BLAST (s)  IMPACT (s)  BLAST/IMPACT
kbfiltr   12K         2.3K     11.9       0.35        34
diskperf  14K         3.9K     117        2.37        49
cdaudio   44K         6.3K     202        1.51        134
floppy    18K         8.7K     164        4.09        41
parclass  138K        8.8K     463        3.84        121
parport   61K         13K      324        6.47        50

Conclusions

• Caveats
  – Comparing different implementations is dangerous
  – More and better software model checking benchmarks are needed
• Tentative conclusion
  – For control-dominated codes, predicate abstraction is too "eager"
• By lazy abstraction with interpolants, we can
  – Avoid the expense of the abstract "post" operator
  – Avoid re-running the model checker with each refinement
  – Avoid applying decision procedure to full program unwindings
• Result is an efficient procedure for checking control-dominated software
  – Three orders of magnitude speedup in lazy model checking in 6 months!

Future work

• Procedure summaries
  – Many similar subgraphs in unwinding due to procedure expansions
  – Cannot handle recursion
  – Can we use interpolants to compute approximate procedure summaries?
• Quantified interpolants
  – Can be used to generate program invariants with quantifiers
  – Works for simple examples, but need to prevent number of quantifiers from increasing without bound
• Richer theories
  – In this work, all program variables modeled by integers
  – Need an interpolating prover for bit vector theory
• Concurrency...
