From Under-approximations to Over-approximations and Back


Complementary material by Yuri Meshman


Example

Foo(int n):

1. i=0,x=0;
2. while (i<n)
3.     if (i <= 2)
4.         x = 0;
       else
5.         x = i;
6.     i = i + 1;
7. if (x < 0)
8.     ERROR
9. return;

Assume we have the code example above. In this case the ERROR label is not reachable, and we want to prove that with predicate abstraction.

First step: we want to find all the reachable locations.
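As a sanity check before any abstraction, Foo can be run concretely. The sketch below is a direct Python transcription of the nine lines; the function name `foo_reaches_error` and the tested input range are illustrative choices, not part of the slides. Testing a finite range is not a proof, which is why predicate abstraction is needed.

```python
def foo_reaches_error(n: int) -> bool:
    """Run Foo(n) concretely and report whether the ERROR line is reached."""
    i, x = 0, 0              # line 1
    while i < n:             # line 2
        if i <= 2:           # line 3
            x = 0            # line 4
        else:
            x = i            # line 5
        i = i + 1            # line 6
    return x < 0             # line 7: ERROR is reached exactly when x < 0


# Bounded testing only: a finite range of inputs, not a proof of safety.
reached = [n for n in range(-50, 51) if foo_reaches_error(n)]
print(reached)  # prints []
```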

ARG Definition

We want to build an abstract reachability graph (ARG) for Foo. The ARG is defined by the components listed below.

[Figure: ARG for Foo with nodes v1, v2, v3, v4, v5, v6, v2', v3', v7, v8, v9]

The ARG includes a map from nodes to control locations (several nodes can map to the same pc).

In the graph example, node vi maps to control reaching line i of the code. Apostrophes distinguish different nodes mapped to the same revisited line (e.g. v2 and v2').



The ARG includes a map from edges (E) to actions (instructions) of the program.

In the graph example, the edge (v1, v2) maps to “i=0,x=0;”.



The ARG includes a map from nodes (V) to formulas over program variables (the node labels).

In the graph example, option 1: all labels are true, which represents only the reachable locations.

[Figure: ARG for Foo with every node labeled {true}]
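The components described so far can be held in a small data structure. The following is a minimal sketch, assuming the edge set read off the slides' figure; only the action on (v1, v2) is stated in the slides, and the other action strings, the class name `ARG`, and its field names are hypothetical. Whether the figure also has an edge out of v8 is not recoverable, so none is included.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    nodes: list[str]
    edges: list[tuple[str, str]]
    location: dict[str, int]              # node -> control location (pc)
    action: dict[tuple[str, str], str]    # edge -> program instruction
    label: dict[str, str] = field(default_factory=dict)  # node -> formula (as text)


def build_foo_arg() -> ARG:
    # Action strings other than the one on (v1, v2) are assumptions.
    action = {
        ("v1", "v2"): "i=0,x=0;",
        ("v2", "v3"): "assume i<n", ("v2", "v7"): "assume !(i<n)",
        ("v3", "v4"): "assume i<=2", ("v3", "v5"): "assume !(i<=2)",
        ("v4", "v6"): "x=0;", ("v5", "v6"): "x=i;",
        ("v6", "v2'"): "i=i+1;",
        ("v2'", "v3'"): "assume i<n", ("v2'", "v7"): "assume !(i<n)",
        ("v7", "v8"): "assume x<0", ("v7", "v9"): "assume !(x<0)",
    }
    nodes = ["v1", "v2", "v3", "v4", "v5", "v6", "v2'", "v3'", "v7", "v8", "v9"]
    # Node vi and its primed copies map to line i.
    location = {v: int(v.strip("v'")) for v in nodes}
    label = {v: "true" for v in nodes}    # option 1: every label is true
    return ARG(nodes, list(action), location, action, label)


arg = build_foo_arg()
print(arg.location["v2"], arg.location["v2'"])  # both map to line 2
```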



The same map from nodes (V) to formulas over program variables, option 2: general formulas over variables, which abstract the variable values reaching each location.

[Figure: ARG for Foo with node labels drawn from {true}, {x ≥ 0}, {x ≥ 0 ∧ i ≥ 0}, and {false}]



The ARG includes an ancestor relation over the nodes.

In the graph example, v2' is covered by v2 if:
1. v2 is an ancestor of v2',
2. v2' is dominated by v2 (all paths from the entry node v1 to v2' pass through it),
3. both map to the same code line,
4. the label of v2' is subsumed by the label of v2.

This relation is used to define the fixed point and covered vertices. If v2' is covered by v2, we don't need to explore more iterations of the loop.
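The subsumption condition in the covering check is an implication between labels. A real tool would ask an SMT solver; the sketch below swaps in brute-force enumeration over a small finite domain, so it only approximates the check. The helper name `subsumed` and the lambda labels are illustrative.

```python
from itertools import product


def subsumed(label_new, label_old, domain=range(-5, 6)):
    """True if every (i, x) in the finite domain that satisfies label_new
    also satisfies label_old (a bounded stand-in for label_new => label_old)."""
    return all(label_old(i, x)
               for i, x in product(domain, repeat=2)
               if label_new(i, x))


x_nonneg = lambda i, x: x >= 0
both_nonneg = lambda i, x: x >= 0 and i >= 0
true_label = lambda i, x: True

print(subsumed(both_nonneg, x_nonneg))   # True: the stronger label is subsumed
print(subsumed(x_nonneg, both_nonneg))   # False: x >= 0 does not imply i >= 0
print(subsumed(x_nonneg, true_label))    # True: every label is subsumed by true
```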



The ARG includes a fixed linearization of the topological order, which gives the order in which to traverse the graph.

In the graph example, the order shown is one of several valid options.
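One way to obtain such a linearization is Python's standard `graphlib` module. The predecessor map below is an assumption read off the slides' figure, and the order it produces is just one valid option.

```python
from graphlib import TopologicalSorter

# node -> set of nodes that must come before it (assumed ARG edges)
preds = {
    "v2": {"v1"}, "v3": {"v2"}, "v4": {"v3"}, "v5": {"v3"},
    "v6": {"v4", "v5"}, "v2'": {"v6"}, "v3'": {"v2'"},
    "v7": {"v2", "v2'"}, "v8": {"v7"}, "v9": {"v7"},
}

order = list(TopologicalSorter(preds).static_order())
position = {v: k for k, v in enumerate(order)}

# Every node appears after all of its predecessors.
is_valid = all(position[p] < position[v] for v, ps in preds.items() for p in ps)
print(order, is_valid)
```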



Post operator in abstract interpretation:

Post operator:
Given:
- An abstract state u
- An operation (instruction from code)
- An abstraction level (such as a set of predicates)
Returns: the successor state abstraction.

Definition: Post(u,v) is a formula that holds in every state reached by executing the instruction on the edge (u,v) from a state satisfying the abstraction of u, where the instruction is interpreted under the chosen abstraction.

Example
Assume you have predicates P1: (i<n) and P2: (i<=n). You want to know their values after “i=i+1” (P1', P2') on an abstract edge (u,v).

- If P1 was true before “i=i+1”, we don't know P1'.
- But we know that P2' will be true.
- If P1 was false, then i>=n held before “i=i+1”, which means P1' and P2' will both be false after it.
- And so on.

As rules:

P1' = if ¬P1 then F else unknown
P2' = if P1 then T else if ¬P1 then F else unknown
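The rules for P1' and P2' can be reproduced mechanically. The following sketch assumes integer variables and replaces the theorem prover a real tool would use with enumeration over a small range; `abstract_post_increment` and its `bound` parameter are hypothetical names.

```python
from itertools import product

PREDICATES = {
    "P1": lambda i, n: i < n,
    "P2": lambda i, n: i <= n,
}


def abstract_post_increment(before, bound=6):
    """Abstract effect of "i=i+1" on P1 and P2.

    `before` fixes the truth value of some predicates before the assignment.
    Returns True, False, or None (unknown) for each predicate afterwards.
    """
    seen = {name: set() for name in PREDICATES}
    for i, n in product(range(-bound, bound + 1), repeat=2):
        if all(PREDICATES[p](i, n) == val for p, val in before.items()):
            for name, pred in PREDICATES.items():
                seen[name].add(pred(i + 1, n))   # evaluate after i=i+1
    return {name: (vals.pop() if len(vals) == 1 else None)
            for name, vals in seen.items()}


print(abstract_post_increment({"P1": True}))   # {'P1': None, 'P2': True}
print(abstract_post_increment({"P1": False}))  # {'P1': False, 'P2': False}
```

The two printed results match the rules: knowing P1 gives P2' but not P1', and knowing ¬P1 makes both false.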



Post operator run example

Predicates: P1: (i<n), P2: (i<=n)

The transition from v1 to v2 doesn't determine either predicate, so Post(v1,v2) = true. Both v1 and v2 are labeled {true}.

The transition from v2 to v3 sets both predicates to true: Post(v2,v3) = P1 ∧ P2, so v3 is labeled {i<n ∧ i≤n}.

The transitions from v3 to v4 and from v3 to v5 don't change the predicates, so v4 and v5 are both labeled {i<n ∧ i≤n}.

The same holds for the transitions from v4 to v6 and from v5 to v6, so their join at v6 is the same label.

The transition from v6 to v2' is as previously discussed:

P1' = if ¬P1 then F else unknown
P2' = if P1 then T else if (¬P1 ∨ ¬P2) then F else unknown

so v2' is labeled {i≤n}.

Under-approximation driven (UD) verification:

For UD, the Post operator always returns true, and we will see refinement using interpolants.

An initial node v1 is created and given the label true. v1 has a single successor, v2, which we continue to explore.

As previously mentioned, the Post operator returns true for v2. v2 has two possible successors, v3 and v7; we continue to explore v3 for now.

In that fashion, exploration continues until it finishes the loop iteration and reaches the beginning of the loop a second time, at a node v2'.

v2' has two successors: v3', which indicates a second iteration of the loop, and v7, which indicates exiting the loop after one iteration or more.

The label of v2' is subsumed by that of v2, meaning that exploring v3' will not provide new information, and its label will be the same as that of v3. This is indicated by the black arrow from v2' to v2.

After exploring all the paths, the label of the error node is not false. So we want to check:
1. whether there is a concrete counterpart to the 2 abstract paths to the error node;
2. if the error is not reachable, use interpolants to find new labels that capture why those paths are not feasible.

We describe next how this Counter Example Guided Abstraction Refinement (CEGAR) phase is done.

[Figure: fully explored ARG for Foo with every node, including v8 and v9, labeled {true}]

Building a formula for CEGAR

We ignore all nodes and edges irrelevant to the abstract paths to err. We also add a boolean variable for each node; for convenience it has the name of the node.

Intuitively, if the variables of all the nodes on a path to v8 are true, then this path is feasible under concrete execution.

Next, we add formulas for the edges, similar to the way it would be done for Bounded Model Checking.

[Figure: ARG restricted to the paths to the error node: v1, v2, v3, v4, v5, v6, v2', v7, v8]

We use Static Single Assignment (SSA) form.

Definition: A program is in SSA form if an assignment to each variable appears at most once in its syntax.

Therefore we rename variables for which assignments appear more than once. “x” has one name at lines 1 to 3, a second name at line 4, a third at line 5, and so on.

“6. i = i + 1;” translates to a formula on the edge (v6, v2') that relates the new copy of i to the previous one.

We use the path formulas to capture error executions in the ARG. The edge formula means that if v6 is reached, then the edge is taken and v2' is reached. To avoid name conflicts, each time a variable appears on the left side of an assignment it receives a new subscript (this is SSA).

For the graph example, the resulting formula is UNSAT.
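The path formulas can be illustrated by writing the two abstract paths to ERROR in SSA form and searching for a satisfying assignment. This is a sketch under stated assumptions: the SSA names (`i0`, `i1`, `x0`, `x1`) are hypothetical and need not match the slides' subscripts, the node boolean variables are omitted, and a bounded search over a small domain stands in for an SMT solver, so an empty result suggests UNSAT but does not prove it.

```python
from itertools import product


def path_zero_iterations(n, i0, x0):
    # v1 -> v2 -> v7 -> v8: initialise, skip the loop, take the error branch
    return i0 == 0 and x0 == 0 and not (i0 < n) and x0 < 0


def path_one_iteration(n, i0, x0, x1, i1):
    # v1 -> v2 -> v3 -> (v4 | v5) -> v6 -> v2' -> v7 -> v8
    then_branch = i0 <= 2 and x1 == 0          # line 4
    else_branch = not (i0 <= 2) and x1 == i0   # line 5
    return (i0 == 0 and x0 == 0 and i0 < n
            and (then_branch or else_branch)
            and i1 == i0 + 1                   # line 6 on edge (v6, v2')
            and not (i1 < n) and x1 < 0)


DOMAIN = range(-4, 5)
witnesses = [vals for vals in product(DOMAIN, repeat=5)   # (n, i0, x0, x1, i1)
             if path_zero_iterations(*vals[:3]) or path_one_iteration(*vals)]
print(witnesses)  # prints []
```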

Solving the formula for CEGAR

Definition
An interpolant for (A, B), where A ∧ B is UNSAT, is a formula I such that:
1. A ⇒ I,
2. I ∧ B is UNSAT,
3. I is over the intersection of the variables of A and B.
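The three conditions of the interpolant definition can be checked on a toy instance. In this sketch A, B and the candidate I are hypothetical formulas inspired by line 5 of Foo, not the formulas from the slides; condition 3 holds by construction because the candidate only receives the shared variable, and conditions 1 and 2 are checked by enumeration over a finite domain instead of a solver.

```python
from itertools import product


def is_interpolant_on(domain, a, b, itp):
    """Bounded check of A => I and of I and B being unsatisfiable.

    a(local_a, shared), b(shared, local_b), itp(shared): the candidate
    sees only the shared variable, so condition 3 holds by construction.
    """
    for local_a, shared, local_b in product(domain, repeat=3):
        if a(local_a, shared) and not itp(shared):
            return False          # condition 1 violated
        if itp(shared) and b(shared, local_b):
            return False          # condition 2 violated
    return True


A = lambda i, x: i >= 3 and x == i     # x was assigned from i with i > 2
B = lambda x, _unused: x < 0           # the error condition
DOMAIN = range(-6, 7)

print(is_interpolant_on(DOMAIN, A, B, lambda x: x >= 0))   # True
print(is_interpolant_on(DOMAIN, A, B, lambda x: x >= 5))   # False: A does not imply it
print(is_interpolant_on(DOMAIN, A, B, lambda x: True))     # False: consistent with B
```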

Note: In the following slides, links appear to implementations of the formulas in iZ3 (for interpolants) and Z3 (for general formulas). Pressing a link opens the online Z3 or iZ3 tool, and pressing play on the opened site should calculate the solutions.

We have a path formula that is UNSAT.

To derive a new label for v7, we can calculate an interpolant I7 for A and B, where A and B are the two parts into which the formula is split at v7.

To derive a new label for v2', we can calculate an interpolant I2' for A and B in the same way, splitting the formula at v2'.

http://rise4fun.com/iZ3/5b

The result is read after transforming it to NNF. Informally, it is a disjunction of two cases describing how execution can reach the node.

http://rise4fun.com/Z3/ngzO

The resulting formula needs cleaning to get a label for v2'.

Cleaning the formula of CEGAR

We want to extract the label for v2' from the interpolant I2'. Why does it mention x3? If we return to the equations we got the interpolants from, each SSA variable is relevant only for particular nodes.

To extract the label, we quantify out all the variables that are out of scope at v2', and all node variables other than that of v2'. To remove the node variable of v2' itself, we set it to true.

http://rise4fun.com/Z3/d8km

The result is the label for v2'.


$$\mathrm{CLEAN}(I_i) \triangleq \forall \{x \mid x \in var(I_i) \land \lnot inScope(x, u_i)\} \cdot \forall \{c_{u_j} \mid u_j \in V\} \cdot I_i[c_{u_i} \leftarrow T]$$

Where var(I_i) is the set of variables of I_i and c_{u_i} is the boolean variable we added for node u_i (so far the node and its variable were written the same way). inScope(x, u_i) means that the variable x is relevant to that node.

Why do we universally quantify the things we want to disappear? We want the invariant that holds at a node regardless of whether another node was reachable or not. So we look for a solution both for when that node's variable is true (reachable) and for when it is false.


(from the paper) The labels obtained from the interpolants must satisfy:
a. the first node (k=1) is labeled true and the last node (k=n) is labeled false;
b. for any two nodes connected by an edge, the label of the source node together with the formula on the edge (as shown previously) implies the label of the target node.

Each label is obtained by applying CLEAN to the corresponding interpolant.

Back to under-approximation driven verification:

[Figure: ARG for Foo after refinement, with node labels drawn from {true}, {x ≥ 0}, {x ≥ 0 ∧ i ≥ 0}, and {false}]

After cleaning we get a new label for each node.

If the label of v2' were no longer subsumed by the label of v2, we would continue to explore v3' and iterations 2, 3, etc. of the loop, with the Post operator returning true as the label of each new node.

In this case, the label of v2' is still subsumed by the label of v2, so the algorithm terminates.

Over-approximation driven (OD) verification:

Assume we started with the Post operator returning true, and the refinement stage returned the labels described before.

We take the predicates it used, in this case x ≥ 0 and i ≥ 0, and recalculate the Post operator as described before.

Statement “i=0,x=0;” sets both predicates to true, and they stay true through the rest of the program.

[Figure: ARG for Foo with the nodes from v2 onward labeled {x ≥ 0 ∧ i ≥ 0} and the error node labeled {false}]

UFO:

In this paper the authors start with UD and, after CEGAR, continue with the new Post operator they get.

Meaning, if the label of v2' were no longer subsumed by the label of v2, they would continue exploring from v3' with the Post operator for the predicates x ≥ 0 and i ≥ 0, so the new nodes could be labeled {x ≥ 0 ∧ i ≥ 0} instead of {true}.

[Figure: refined ARG for Foo with labels {true} and {x ≥ 0}, and the candidate labels {x ≥ 0 ∧ i ≥ 0} or {true} for the next node]

Boolean/Cartesian Predicate Abstraction

Boolean Predicate Abstraction
Given n predicates, we represent them using boolean vectors of length n, where each entry is T or F.

We will have 2^n possible states for each program counter location.

Cartesian Predicate Abstraction
We represent a cross product. At each location we store, separately for each predicate, whether it is T or F. If the predicate can be both, we store “*”. (Note that * is now also part of the state.)

This is a more compact representation (compared to Boolean), but we lose precision.

Example: the Boolean state set { (T,T,T), (F,T,T), (T,F,T) } is represented in the Cartesian abstraction as (*,*,T).
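The loss of precision can be made concrete with the slide's own example. The sketch below uses the strings "T", "F" and "*" for the three values; the function names `cartesian_abstraction` and `concretize` are illustrative.

```python
from itertools import product


def cartesian_abstraction(states):
    """Collapse a set of boolean vectors into one vector over {"T", "F", "*"}."""
    return tuple(vals[0] if len(set(vals)) == 1 else "*"
                 for vals in zip(*states))


def concretize(abstract):
    """All boolean vectors represented by a Cartesian state."""
    choices = [("T", "F") if v == "*" else (v,) for v in abstract]
    return set(product(*choices))


states = {("T", "T", "T"), ("F", "T", "T"), ("T", "F", "T")}
abstract = cartesian_abstraction(states)
extra = concretize(abstract) - states

print(abstract)  # ('*', '*', 'T')
print(extra)     # {('F', 'F', 'T')}
```

The extra vector (F,F,T) was not in the original set: the Cartesian state stores 3 entries instead of 3 vectors, but it can no longer exclude that combination.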

Results
• 105 programs in the benchmark
• Compared with Wolverine http://www.cprover.org/wolverine/
• 5 versions of UFO:
1. Pure UD, called ufoNo (Post returns true)
2. With Cartesian predicate abstraction, called ufoCP
3. With Boolean predicate abstraction, called ufoBP
4. Pure OD with Cartesian predicate abstraction, called CP
5. Pure OD with Boolean predicate abstraction, called BP
• Reports the number of instances solved, both for instances that should verify (#Safe) and for instances where an error should be discovered (#Unsafe).
• UFO performs much better than Wolverine.
• ufoCP performs significantly better than all other UFO configurations.
• In the next slide we go deeper into the results per example, first for #Safe instances and then for #Unsafe.
• Benchmarks include token ring protocols and various handshaking protocols of SSH servers.
• The fastest time in each row is emphasized.

Results: a closer look (Safe)

• The number of refinements goes down as you go down the predicate abstractions.
• CP failed for all but 3 examples, so it wasn't included in the results.
• There is no one clear winner in terms of time. This can also be seen in the Unsafe results.

Results: a closer look (Unsafe)
