Download - Modeling Data in Formal Verification Bits , Bit Vectors, or Words

Modeling Data in Formal Verification

Bits, Bit Vectors, or Words

Karam AbdElkaderBased on: Presentations form

• Randal E. Bryant - Carnegie Mellon University• Decision Procedures An Algorithmic Point of View

D.Kroening – Oxsoford Unversity, O.Strichman - Technion

– 2 –

Agenda

• Overview and Examples.

• Introduction to Bit-Vector Logic

• Syntax

• Semantics

• Decision procedures for Bit-Vector Logic• Flattening Bit-Vector Logic• Incremental Flattening• Bit-Vector Arithmetic With Abstraction

2

– 3 –

Issue How should data be modeled in formal analysis? Verification, test generation, security analysis, …

Approaches Bits: Every bit is represented individually

Basis for most CAD, model checking Words: View each word as arbitrary value

E.g., unbounded integersHistoric program verification work

Bit Vectors: Finite precision words

Captures true semantics of hardware and softwareMore opportunities for abstraction than with bits

Over View

– 4 –

Data Path

Com.Log.

1

Com.Log.

2

Bit-Level Modeling

Represent Every Bit of State Individually Behavior expressed as Boolean next-state over current state Historic method for most CAD, testing, and verification tools

E.g., model checkers

Control Logic

– 5 –

Bit-Level Modeling in Practice

Strengths Allows precise modeling of system Well developed technology

BDDs & SAT for Boolean reasoning

Limitations Every state bit introduces two Boolean variables

Current state & next state Overly detailed modeling of system functions

Don’t want to capture full details of FPU

Making It Work Use extensive abstraction to reduce bit count Hard to abstract functionality

– 6 –

Word-Level Abstraction #1:Bits → Integers

View Data as Symbolic Words Arbitrary integers

No assumptions about size or encodingClassic model for reasoning about software

Can store in memories & registers

x0x1x2

xn-1

x

– 7 –

Data Path

Com.Log.

1

Com.Log.

2

Abstracting Data BitsControl Logic

Data Path

Com.Log.

1

Com.Log.

1? ?

What do we do about logic functions?

– 8 –

Word-Level Abstraction #2:Uninterpreted Functions

For any Block that Transforms or Evaluates Data: Replace with generic, unspecified function Only assumed property is functional consistency:

a = x b = y f (a, b) = f (x, y)

ALUf

– 9 –

Abstracting Functions

For Any Block that Transforms Data: Replace by uninterpreted function Ignore detailed functionality Conservative approximation of actual system

Data Path

Control Logic

Com.Log.

1

Com.Log.

1F1 F2

– 10 –

Word-Level Modeling: History

Historic Used by theorem provers

More Recently Burch & Dill, CAV ’94

Verify that pipelined processor has same behavior as unpipelined reference model

Use word-level abstractions of data paths and memoriesUse decision procedure to determine equivalence

Bryant, Lahiri, Seshia, CAV ’02UCLID verifierTool for describing & verifying systems at word level

– 11 –

Pipeline Verification Example

Reg.File

IF/ID

InstrMem

+4

PC ID/EX

ALU

EX/WB

=

=

RdRa

Rb

Imm

Op

Adat

Control Control

Reg.File

IF/ID

InstrMem

+4

PC ID/EX

ALU

EX/WB

=

=

RdRa

Rb

Imm

Op

Adat

Control Control

Reg.File

InstrMem

+4

ALU

RdRa

Rb

Imm

Op

Adat

Control

Bdat

Reg.File

InstrMem

+4

ALU

RdRa

Rb

Imm

Op

Adat

Control

Bdat

Pipelined Processor

Reference Model

– 12 –

Abstracted Pipeline Verification

Pipelined Processor

Reference Model

Reg.File

IF/ID

InstrMem

+4

PC ID/EX

ALU

EX/WB

=

=

RdRa

Rb

Imm

Op

Adat

Control Control

Reg.File

IF/ID

InstrMem

+4

PC ID/EX

ALU

EX/WB

=

=

RdRa

Rb

Imm

Op

Adat

Control Control

F1

F2

F3

Reg.File

InstrMem

+4

ALU

RdRa

Rb

Imm

Op

Adat

Control

Bdat

Reg.File

InstrMem

+4

ALU

RdRa

Rb

Imm

Op

Adat

Control

Bdat

PC

F1

F2

F3

F1

F2

F3

– 13 –

Experience with Word-Level Modeling

Powerful Abstraction Tool Allows focus on control of large-scale system Can model systems with very large memories

Hard to Generate Abstract Model Hand-generated: how to validate? Automatic abstraction: limited success

Andraus & Sakallah, DAC 2004

Realistic Features Break Abstraction E.g., Set ALU function to A+0 to pass operand to output

Desire Should be able to mix detailed bit-level representation with

abstracted word-level representation

– 14 –

Bit Vectors: Motivating Example #1

Do these functions produce identical results?Strategy

Represent and reason about bit-level program behavior Specific to machine word size, integer representations,

and operations

int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1;}

int test_abs(int x) { return (x < 0) ? -x : x; }

– 15 –

Motivating Example #2

Is there an input string that causes value 234 to be written to address a4a3a2a1?

void fun() { char fmt[16]; fgets(fmt, 16, stdin); fmt[15] = '\0'; printf(fmt);}

Answer Yes: "a1a2a3a4%230g%n"

Depends on details of compilation But no exploit for buffer size less than 8 [Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]

– 16 –


Is there a way to expand the program sketch to make it match the spec?

bit[W] popSpec(bit[W] x){ int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt;}

Answer W=16:

[Solar-Lezama, et al., ASPLOS ‘06]

bit[W] popSketch(bit[W] x){ loop (??) { x = (x&??) + ((x>>??)&??); } return x;}

x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f);

– 17 –


Is pipelined microprocessor identical to sequential reference model?

Strategy Represent machine instructions, data, and state as bit vectors

Compatible with hardware description language representation Verifier finds abstractions automatically

Reg.File

IF/ID

InstrMem

+4

PC ID/EX

ALU

EX/WB

=

=

RdRa

Rb

Imm

Op

Adat

Control Control

Reg.File

IF/ID

InstrMem

+4

PC ID/EX

ALU

EX/WB

=

=

RdRa

Rb

Imm

Op

Adat

Control Control

Reg.File

InstrMem

+4

ALU

RdRa

Rb

Imm

Op

Adat

Control

Bdat

Reg.File

InstrMem

+4

ALU

RdRa

Rb

Imm

Op

Adat

Control

Bdat

Pipelined Microprocessor Sequential Reference Model

– 18 –

Decision Procedures for System-Level Software

• What kind of logic do we need for system-level software?

• We need bit-vector logic - with bit-wise operators, arithmeticoverflow

• We want to scale to large programs - must verify largeformulas

– 19 –

Decision Procedures for System-Level Software• What kind of logic do we need for system-level software?

• We need bit-vector logic - with bit-wise operators, arithmeticoverflow

• We want to scale to large programs - must verify largeformulas

• Examples of program analysis tools that generate bit-vectorformulas:

• CBMC• SATABS• SATURN (Stanford, Alex Aiken)• EXE (Stanford, Dawson Engler, David Dill)• Variants of those developed at IBM, Microsoft

– 20 –

Bit-Vector Logic: Syntax

– 22 –

Semantics

Danger!

(x − y > 0) if and only if (x > y)

Valid over R/N, but not over the bit-vectors.(Many compilers have this sort of bug)

– 23 –

Width and Encoding

The meaning depends on the width and encoding of thevariables.

7

– 24 –

The meaning depends on the width and encoding of thevariables.

Typical encodings:

Binary encoding

Two’s complement

But maybe also fixed-point, floating-point, . . .

7

Width and Encoding

– 25 –

Examples

– 26 –

Width and Encoding

Notation to clarify width and encoding:

– 27 –

Bit-vectors Made FormalDefinition (Bit-Vector)A bit-vector is a vector of Boolean values with a given length l:

b : {0,...,l − 1} → {0,1}

– 28 –

Bit-vectors Made FormalDefinition (Bit-Vector)

The value of bit number i of x is x(i).

We also write for

Definition (Bit-Vector)A bit-vector is a vector of Boolean values with a given length l:

b : {0,...,l − 1} → {0,1}

– 29 –

Lambda-Notation for Bit-Vectors

λ expressions are functions without a name

– 30 –

Examples:• The vector of length l that consists of zeros:

• A function that inverts (flips all bits in) a bit-vector:

• A bit-wise OR:

⇒ we now have semantics for the bit-wise operators.

Lambda-Notation forBit-Vectors

λ expressions are functions without a name

– 31 –

Semantics for Arithmetic Expressions

What is the output of the following program?

unsigned char number = 200;number = number + 100;printf("Sum: %d\n", number);

– 32 –




On most architectures, this is 44!

11001000 = 200+ 01100100 = 100= 00101100 = 44


– 33 –




On most architectures, this is 44!

11001000 = 200+ 01100100 = 100= 00101100 = 44


⇒ Bit-vector arithmetic uses modular arithmetic!

– 34 –

Semantics for addition, subtraction:


– 35 –

Semantics for addition, subtraction:


We can even mix the encodings:

– 36 –

Semantics for Relational Operators

Semantics for <, ≤, ≥, and so on:

Mixed encodings:

Note that most compilers don’t support comparisons with mixedencodings.

– 37 –

Complexity

Satisfiability is undecidable for an unbounded width, evenwithout arithmetic.

Complexity

– 38 –

Complexity

Satisfiability is undecidable for an unbounded width, evenwithout arithmetic.

It is NP-complete otherwise.

– 39 –

Decision Procedures Core technology for formal reasoning

Boolean SAT Pure Boolean formula

SAT Modulo Theories (SMT) Support additional logic fragments Example theories

Linear arithmetic over reals or integersFunctions with equalityBit vectorsCombinations of theories

Formula

DecisionProcedure

Satisfying solution

Unsatisfiable(+ proof)

– 40 –

SAT made a progress…

1

10

100

1000

10000

100000

1960 1970 1980 1990 2000 2010

Year

Vars

– 41 –

BV Decision Procedures:Some HistoryB.C. (Before Chaff)

String operations (concatenate, field extraction) Linear arithmetic with bounds checking Modular arithmetic

Limitations Cannot handle full range of bit-vector operations

– 42 –

BV Decision Procedures:Using SATSAT-Based “Bit Blasting”

Generate Boolean circuit based on bit-level behavior of operations

Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker

Handles arbitrary operations

Effective in Many Applications CBMC [Clarke, Kroening, Lerda, TACAS ’04] Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV

’05] CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]

– 43 –

A Simple Decision Procedure

Transform Bit-Vector Logic to Propositional LogicMost commonly used decision procedureAlso called ’bit-blasting’

– 44 –


• Transform Bit-Vector Logic to Propositional LogicMost commonly used decision procedureAlso called ’bit-blasting’

• Bit-Vector Flattening1

2

3

1. Convert propositional part as before2. Add a Boolean variable for each bit of each sub-expression(term)

3. Add constraint for each sub-expression

We denote the new Boolean variable for bit i of term t by .

17


– 45 –

What constraints do we generate for a given term?

Bit-vector Flattening

– 46 –


This is easy for the bit-wise operators.


Example for

(read x = y over bits as x y)

– 47 –


This is easy for the bit-wise operators.

We can transform this into CNF using Tseitin’s method.


Example for

(read x = y over bits as x y)

– 48 –


– 49 –

Flattening Bit-VectorArithmetic

How to flatten a + b?

– 50 –

Flattening Bit-Vector Arithmetic

How to flatten a + b?

→ we can build a circuit that adds them!

The full adder in CNF:


– 51 –


Ok, this is good for one bit! How about more?

– 52 –

Ok, this is good for one bit! How about more?

8-Bit ripple carry adder (RCA)

Also called carry chain adder

Adds l variablesAdds 6 · l clauses


– 53 –


– 54 –

Multipliers

Multipliers result in very hard formulas

Example:

CNF: About 11000 variables, unsolvable (Hard) for current SATsolvers

Similar problems with division, modulo

Q: Why is this hard?

– 55 –

Multipliers

Multipliers result in very hard formulas

Example:

CNF: About 11000 variables, unsolvable (Hard) for current SATsolvers

Similar problems with division, modulo

Q: Why is this hard?Q: How do we fix this?

– 56 –

Multipliers

– 57 –

Incremental Flattening

ϕsk : Boolean part of ϕF: set of terms that are in the encoding

– 58 –


?ϕf := ϕsk , F := ∅

No!?

UNSAT



– 59 –


?ϕf := ϕsk , F := ∅

Is ϕf SAT? Yes! - compute I

No!

UNSAT

I: set of terms that are inconsistent with the current assignment



– 60 –


?ϕf := ϕsk , F := ∅

?Is ϕf SAT? Yes! - compute I

No! I =∅? ?

UNSAT SAT

ϕsk : Boolean part of ϕF: set of terms that are in the encodingI: set of terms that are inconsistent with the current assignment


– 61 –


?ϕf := ϕsk , F := ∅

Pick F′ (I \ F )⊆F := F F∪ ′

ϕf := ϕf Constraint(F)∧

?Is ϕf SAT? Yes! -

No!?

UNSAT

6I =∅compute I

I =∅?

SAT

ϕsk : Boolean part of ϕF: set of terms that are in the encodingI: set of terms that are inconsistent with the current assignment


– 62 –


Idea: add ’easy’ parts of the formula first

Only add hard parts when needed

ϕf only gets stronger - use an incremental SAT solver


– 63 –

Incremental FlatteningIncremental Flattening

– 64 –

Incomplete Assignments

Hey: initially, we only have the skeleton!How do we know what terms are inconsistent with the currentassignment if the variables aren’t even in ϕf ?

– 65 –


Solution: guess some values for the missing variables.If you guess right, it’s good.


Hey: initially, we only have the skeleton!How do we know what terms are inconsistent with the currentassignment if the variables aren’t even in ϕf ?

– 66 –

Bit-Vector ChallengeIs there a better way than bit blasting?Requirements

Provide same functionality as with bit blasting Find abstractions based on word-level structure Improve on performance of bit blasting

Observation Must have bit blasting at core

Only approach that covers full functionality Want to exploit special cases

Formula satisfied by small valuesSimple algebraic properties imply unsatisfiabilitySmall unsatisfiable coreSolvable by modular arithmetic…

– 67 –

Iterative ApproximationIdeaIterative Approximation

UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS ’07

Use bit blasting as core technique Apply to simplified versions of formula Successive approximations until solve or show unsatisfiable

– 68 –

Iterative Approach Background: Approximating Formula

Example Approximation Techniques Underapproximating

Restrict word-level variables to smaller ranges of values Overapproximating

Replace subformula with Boolean variable

Original Formula

+Overapproximation + More solutions:

If unsatisfiable, then so is

Underapproximation−

−

Fewer solutions:Satisfying solution also satisfies

– 69 –

Starting Iterations

Initial Underapproximation (Greatly) restrict ranges of word-level variables Intuition: Satisfiable formula often has small-domain

solution

1−

– 70 –

First Half of Iteration

SAT Result for 1− Satisfiable

Then have found solution for Unsatisfiable

Use UNSAT proof to generate overapproximation 1+ (Described later)

1−If SAT, then done

1+

UNSAT proof:generate overapproximation

– 71 –

Second Half of Iteration

SAT Result for 1+ Unsatisfiable

Then have shown unsatisfiable Satisfiable

Solution indicates variable ranges that must be expandedGenerate refined underapproximation

1−

If UNSAT, then done1+

SAT:Use solution to generate refined underapproximation

2−

– 72 –

Example

:= (x = y+2) ^ (x2 > y2)

1− := (x[1] = y[1]+2) ^(x[1]2 > y[1]

2)

2− := (x[2] = y[2]+2) ^ (x[2]2 > y[2]

2)

1+ := (x = y+2)

SAT, done.

UNSATLook at proof

SATx = 2, y = 0

– 73 –

Iterative Behavior

Underapproximations Successively more precise

abstractions of Allow wider variable ranges

Overapproximations No predictable relation UNSAT proof not unique

1−

1+

2−

k−

2+

k+

– 74 –

Overall EffectSoundness

Only terminate with solution on underapproximation

Only terminate as UNSAT on overapproximation

Completeness Successive

underapproximations approach

Finite variable ranges guarantee termination

In worst case, get k−

1−

1+

2−

k−

2+

k+

SAT

UNSAT

– 75 –

Generating Over approximation

Given Underapproximation 1− Bit-blasted translation of 1−

into Boolean formula Proof that Boolean formula

unsatisfiable

Generate Overapproximation 1+ If 1+ satisfiable, must lead to

refined underapproximation

1−

1+


2−

– 76 –

Bit-Vector Formula Structure DAG representation to allow shared subformulas

x + 2 z 1

x % 26 = v

w & 0xFFFF = x

x = y

Ç

Æ:

Ç

ÆÇ

a

– 77 –

Structure of Underapproximation

Linear complexity translation to CNFEach word-level variable encoded as set of Boolean variablesAdditional Boolean variables represent subformula values

x + 2 z 1

x % 26 = v

w & 0xFFFF = x

x = y

Ç

Æ:

Ç

ÆÇ

a −

RangeConstraints

wxyz

Æ

– 78 –

Encoding Range ConstraintsExplicit

View as additional predicates in formula

Implicit Reduce number of variables in encoding Constraint Encoding 0 w 8 0 0 0 ··· 0 w2w1w0

−4 x 4 xsxsxs··· xsxsx1x0

Yields smaller SAT encodings

RangeConstraints

wx

0 w 8 −4 x 4

– 79 –

RangeConstraints

wxyz

Æ

UNSAT Proof Subset of clauses that is unsatisfiable Clause variables define portion of DAG Sub graph that cannot be satisfied with given range

constraints

x + 2 z 1

x % 26 = v

w & 0xFFFF = x

x = y

a

Ç

Æ

ÆÇ

Ç

:

– 80 –

Extracting Circuit from UNSAT Proof Subgraph that cannot be satisfied with given range

constraintsEven when replace rest of graph with unconstrained

variables

x + 2 z 1

x = y

a Æ

ÆÇ

Ç

:

b1

b2

RangeConstraints

wxyz

ÆUNSAT

– 81 –

Generated Over Approximation Remove range constraints on word-level variables Creates overapproximation

Ignores correlations between values of subformulas

x + 2 z 1

x = y

a Æ

ÆÇ

Ç

:

b1

b2

1+

– 82 –

Generated Over ApproximationAlgorithm

– 83 –

Refinement PropertyClaim

1+ has no solutions that satisfy 1−’s range constraintsBecause 1+ contains portion of 1− that was shown to be

unsatisfiable under range constraints

x + 2 z 1

x = y

a Æ

ÆÇ

Ç

:

b1

b2

RangeConstraints

wxyz

ÆUNSAT

1+

– 84 –

Refinement Property (Cont.)

Consequence Solving 1+ will expand range of some variables Leading to more exact underapproximation 2−

x + 2 z 1

x = y

a Æ

ÆÇ

Ç

:

b1

b2

1+

– 85 –

Effect of Iteration

Each Complete Iteration Expands ranges of some word-level variables Creates refined underapproximation

1−

1+

SAT:Use solution to generate refined underapproximation

2−


– 86 –

Approximation Methods

So Far Range constraints

Underapproximate by constraining values of word-level variables

Subformula eliminationOverapproximate by assuming subformula value arbitrary

General Requirements Systematic under- and over-approximations Way to connect from one to another

Goal: Devise Additional Approximation Strategies

– 87 –

Function Approximation Example

§: Prohibit Via Additional Range Constraints Gives underapproximation Restricts values of (possibly intermediate) terms

§: Abstract as f (x,y) Overapproximate as uninterpreted function f Value constrained only by functional consistency

*x

y

x

0 1 else

y

0 0 0 0

1 0 1 x

else 0 y §

– 88 –

Function Approximation Example

*x

y

x

0 1 else

y

0 0 0 0

1 0 1 x

else 0 y §

– 89 –

Results: UCLID BV vs. Bit-blasting

UCLID always better than bit blasting Generally better than other available procedures SAT time is the dominating factor

[results on 2.8 GHz Xeon, 2 GB RAM]

– 90 –

Challenges with Iterative ApproximationFormulating Overall Strategy

Which abstractions to apply, when and where How quickly to relax constraints in iterations

Which variables to expand and by how much?Too conservative: Each call to SAT solver incurs costToo lenient: Devolves to complete bit blasting.

Predicting SAT Solver Performance Hard to predict time required by call to SAT solver Will particular abstraction simplify or complicate SAT?

Combination Especially Difficult Multiple iterations with unpredictable inner loop

– 91 –

Summary: Modeling LevelsBits

Limited ability to scale Hard to apply functional abstractions

Words Allows abstracting data while precisely representing control Overlooks finite word-size effects

Bit Vectors Realistic semantic model for hardware & software Captures all details of actual operation

Detects errors related to overflow and other artifacts of finite representation

Can apply abstractions found at word-level

– 92 –

Areas of Agreement

SAT-Based Framework Is Only Logical Choice SAT solvers are good & getting better

Want to Automatically Exploit Abstractions Function structure Arithmetic properties

E.g., associativity, commutativty Arithmetic reductions

E.g., LU decomposition

Base Level Should Be SAT Semantically complete approach

– 93 –

Thank you.