Modeling Data in Formal Verification
Bits, Bit Vectors, or Words
Karam AbdElkaderBased on: Presentations form
• Randal E. Bryant - Carnegie Mellon University• Decision Procedures An Algorithmic Point of View
D.Kroening – Oxsoford Unversity, O.Strichman - Technion
– 2 –
Agenda
• Overview and Examples.
• Introduction to Bit-Vector Logic
• Syntax
• Semantics
• Decision procedures for Bit-Vector Logic• Flattening Bit-Vector Logic• Incremental Flattening• Bit-Vector Arithmetic With Abstraction
2
– 3 –
Issue How should data be modeled in formal analysis? Verification, test generation, security analysis, …
Approaches Bits: Every bit is represented individually
Basis for most CAD, model checking Words: View each word as arbitrary value
E.g., unbounded integersHistoric program verification work
Bit Vectors: Finite precision words
Captures true semantics of hardware and softwareMore opportunities for abstraction than with bits
Over View
– 4 –
Data Path
Com.Log.
1
Com.Log.
2
Bit-Level Modeling
Represent Every Bit of State Individually Behavior expressed as Boolean next-state over current state Historic method for most CAD, testing, and verification tools
E.g., model checkers
Control Logic
– 5 –
Bit-Level Modeling in Practice
Strengths Allows precise modeling of system Well developed technology
BDDs & SAT for Boolean reasoning
Limitations Every state bit introduces two Boolean variables
Current state & next state Overly detailed modeling of system functions
Don’t want to capture full details of FPU
Making It Work Use extensive abstraction to reduce bit count Hard to abstract functionality
– 6 –
Word-Level Abstraction #1:Bits → Integers
View Data as Symbolic Words Arbitrary integers
No assumptions about size or encodingClassic model for reasoning about software
Can store in memories & registers
x0x1x2
xn-1
x
– 7 –
Data Path
Com.Log.
1
Com.Log.
2
Abstracting Data BitsControl Logic
Data Path
Com.Log.
1
Com.Log.
1? ?
What do we do about logic functions?
– 8 –
Word-Level Abstraction #2:Uninterpreted Functions
For any Block that Transforms or Evaluates Data: Replace with generic, unspecified function Only assumed property is functional consistency:
a = x b = y f (a, b) = f (x, y)
ALUf
– 9 –
Abstracting Functions
For Any Block that Transforms Data: Replace by uninterpreted function Ignore detailed functionality Conservative approximation of actual system
Data Path
Control Logic
Com.Log.
1
Com.Log.
1F1 F2
– 10 –
Word-Level Modeling: History
Historic Used by theorem provers
More Recently Burch & Dill, CAV ’94
Verify that pipelined processor has same behavior as unpipelined reference model
Use word-level abstractions of data paths and memoriesUse decision procedure to determine equivalence
Bryant, Lahiri, Seshia, CAV ’02UCLID verifierTool for describing & verifying systems at word level
– 11 –
Pipeline Verification Example
Reg.File
IF/ID
InstrMem
+4
PC ID/EX
ALU
EX/WB
=
=
RdRa
Rb
Imm
Op
Adat
Control Control
Reg.File
IF/ID
InstrMem
+4
PC ID/EX
ALU
EX/WB
=
=
RdRa
Rb
Imm
Op
Adat
Control Control
Reg.File
InstrMem
+4
ALU
RdRa
Rb
Imm
Op
Adat
Control
Bdat
Reg.File
InstrMem
+4
ALU
RdRa
Rb
Imm
Op
Adat
Control
Bdat
Pipelined Processor
Reference Model
– 12 –
Abstracted Pipeline Verification
Pipelined Processor
Reference Model
Reg.File
IF/ID
InstrMem
+4
PC ID/EX
ALU
EX/WB
=
=
RdRa
Rb
Imm
Op
Adat
Control Control
Reg.File
IF/ID
InstrMem
+4
PC ID/EX
ALU
EX/WB
=
=
RdRa
Rb
Imm
Op
Adat
Control Control
F1
F2
F3
Reg.File
InstrMem
+4
ALU
RdRa
Rb
Imm
Op
Adat
Control
Bdat
Reg.File
InstrMem
+4
ALU
RdRa
Rb
Imm
Op
Adat
Control
Bdat
PC
F1
F2
F3
F1
F2
F3
– 13 –
Experience with Word-Level Modeling
Powerful Abstraction Tool Allows focus on control of large-scale system Can model systems with very large memories
Hard to Generate Abstract Model Hand-generated: how to validate? Automatic abstraction: limited success
Andraus & Sakallah, DAC 2004
Realistic Features Break Abstraction E.g., Set ALU function to A+0 to pass operand to output
Desire Should be able to mix detailed bit-level representation with
abstracted word-level representation
– 14 –
Bit Vectors: Motivating Example #1
Do these functions produce identical results?Strategy
Represent and reason about bit-level program behavior Specific to machine word size, integer representations,
and operations
int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1;}
int test_abs(int x) { return (x < 0) ? -x : x; }
– 15 –
Motivating Example #2
Is there an input string that causes value 234 to be written to address a4a3a2a1?
void fun() { char fmt[16]; fgets(fmt, 16, stdin); fmt[15] = '\0'; printf(fmt);}
Answer Yes: "a1a2a3a4%230g%n"
Depends on details of compilation But no exploit for buffer size less than 8 [Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]
– 16 –
Motivating Example #3
Is there a way to expand the program sketch to make it match the spec?
bit[W] popSpec(bit[W] x){ int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt;}
Answer W=16:
[Solar-Lezama, et al., ASPLOS ‘06]
bit[W] popSketch(bit[W] x){ loop (??) { x = (x&??) + ((x>>??)&??); } return x;}
x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f);
– 17 –
Motivating Example #4
Is pipelined microprocessor identical to sequential reference model?
Strategy Represent machine instructions, data, and state as bit vectors
Compatible with hardware description language representation Verifier finds abstractions automatically
Reg.File
IF/ID
InstrMem
+4
PC ID/EX
ALU
EX/WB
=
=
RdRa
Rb
Imm
Op
Adat
Control Control
Reg.File
IF/ID
InstrMem
+4
PC ID/EX
ALU
EX/WB
=
=
RdRa
Rb
Imm
Op
Adat
Control Control
Reg.File
InstrMem
+4
ALU
RdRa
Rb
Imm
Op
Adat
Control
Bdat
Reg.File
InstrMem
+4
ALU
RdRa
Rb
Imm
Op
Adat
Control
Bdat
Pipelined Microprocessor Sequential Reference Model
– 18 –
Decision Procedures for System-Level Software
• What kind of logic do we need for system-level software?
• We need bit-vector logic - with bit-wise operators, arithmeticoverflow
• We want to scale to large programs - must verify largeformulas
– 19 –
Decision Procedures for System-Level Software• What kind of logic do we need for system-level software?
• We need bit-vector logic - with bit-wise operators, arithmeticoverflow
• We want to scale to large programs - must verify largeformulas
• Examples of program analysis tools that generate bit-vectorformulas:
• CBMC• SATABS• SATURN (Stanford, Alex Aiken)• EXE (Stanford, Dawson Engler, David Dill)• Variants of those developed at IBM, Microsoft
– 20 –
Bit-Vector Logic: Syntax
– 21 –
formula : formula formula | ¬formula | atom∨atom : term rel term | Boolean-Identifier | term[ constant ]
rel : = | <term : term op term | identifier | ∼ term | constant |
atom?term:term |term[ constant : constant ] | ext ( term )
op : +| − | · |/|<< | >> | & | | | | ◦⊕
∼ x: bit-wise negation of xext (x): sign- or zero-extension of xx << d: left shift with distance dx ◦ y: concatenation of x and y
Bit-Vector Logic: Syntax
– 22 –
Semantics
Danger!
(x − y > 0) if and only if (x > y)
Valid over R/N, but not over the bit-vectors.(Many compilers have this sort of bug)
– 23 –
Width and Encoding
The meaning depends on the width and encoding of thevariables.
7
– 24 –
The meaning depends on the width and encoding of thevariables.
Typical encodings:
Binary encoding
Two’s complement
But maybe also fixed-point, floating-point, . . .
7
Width and Encoding
– 25 –
Examples
– 26 –
Width and Encoding
Notation to clarify width and encoding:
– 27 –
Bit-vectors Made FormalDefinition (Bit-Vector)A bit-vector is a vector of Boolean values with a given length l:
b : {0,...,l − 1} → {0,1}
– 28 –
Bit-vectors Made FormalDefinition (Bit-Vector)
The value of bit number i of x is x(i).
We also write for
Definition (Bit-Vector)A bit-vector is a vector of Boolean values with a given length l:
b : {0,...,l − 1} → {0,1}
– 29 –
Lambda-Notation for Bit-Vectors
λ expressions are functions without a name
– 30 –
Examples:• The vector of length l that consists of zeros:
• A function that inverts (flips all bits in) a bit-vector:
• A bit-wise OR:
⇒ we now have semantics for the bit-wise operators.
Lambda-Notation forBit-Vectors
λ expressions are functions without a name
– 31 –
Semantics for Arithmetic Expressions
What is the output of the following program?
unsigned char number = 200;number = number + 100;printf("Sum: %d\n", number);
– 32 –
Semantics for Arithmetic Expressions
What is the output of the following program?
unsigned char number = 200;number = number + 100;printf("Sum: %d\n", number);
On most architectures, this is 44!
11001000 = 200+ 01100100 = 100= 00101100 = 44
Semantics for Arithmetic Expressions
– 33 –
Semantics for Arithmetic Expressions
What is the output of the following program?
unsigned char number = 200;number = number + 100;printf("Sum: %d\n", number);
On most architectures, this is 44!
11001000 = 200+ 01100100 = 100= 00101100 = 44
Semantics for Arithmetic Expressions
⇒ Bit-vector arithmetic uses modular arithmetic!
– 34 –
Semantics for addition, subtraction:
Semantics for Arithmetic Expressions
– 35 –
Semantics for addition, subtraction:
Semantics for Arithmetic Expressions
We can even mix the encodings:
– 36 –
Semantics for Relational Operators
Semantics for <, ≤, ≥, and so on:
Mixed encodings:
Note that most compilers don’t support comparisons with mixedencodings.
– 37 –
Complexity
Satisfiability is undecidable for an unbounded width, evenwithout arithmetic.
Complexity
– 38 –
Complexity
Satisfiability is undecidable for an unbounded width, evenwithout arithmetic.
It is NP-complete otherwise.
– 39 –
Decision Procedures Core technology for formal reasoning
Boolean SAT Pure Boolean formula
SAT Modulo Theories (SMT) Support additional logic fragments Example theories
Linear arithmetic over reals or integersFunctions with equalityBit vectorsCombinations of theories
Formula
DecisionProcedure
Satisfying solution
Unsatisfiable(+ proof)
– 40 –
SAT made a progress…
1
10
100
1000
10000
100000
1960 1970 1980 1990 2000 2010
Year
Vars
– 41 –
BV Decision Procedures:Some HistoryB.C. (Before Chaff)
String operations (concatenate, field extraction) Linear arithmetic with bounds checking Modular arithmetic
Limitations Cannot handle full range of bit-vector operations
– 42 –
BV Decision Procedures:Using SATSAT-Based “Bit Blasting”
Generate Boolean circuit based on bit-level behavior of operations
Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker
Handles arbitrary operations
Effective in Many Applications CBMC [Clarke, Kroening, Lerda, TACAS ’04] Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV
’05] CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]
– 43 –
A Simple Decision Procedure
Transform Bit-Vector Logic to Propositional LogicMost commonly used decision procedureAlso called ’bit-blasting’
– 44 –
A Simple Decision Procedure
• Transform Bit-Vector Logic to Propositional LogicMost commonly used decision procedureAlso called ’bit-blasting’
• Bit-Vector Flattening1
2
3
1. Convert propositional part as before2. Add a Boolean variable for each bit of each sub-expression(term)
3. Add constraint for each sub-expression
We denote the new Boolean variable for bit i of term t by .
17
A Simple Decision Procedure
– 45 –
What constraints do we generate for a given term?
Bit-vector Flattening
– 46 –
What constraints do we generate for a given term?
This is easy for the bit-wise operators.
Bit-vector Flattening
Example for
(read x = y over bits as x y)
– 47 –
What constraints do we generate for a given term?
This is easy for the bit-wise operators.
We can transform this into CNF using Tseitin’s method.
Bit-vector Flattening
Example for
(read x = y over bits as x y)
– 48 –
Bit-vector Flattening
– 49 –
Flattening Bit-VectorArithmetic
How to flatten a + b?
– 50 –
Flattening Bit-Vector Arithmetic
How to flatten a + b?
→ we can build a circuit that adds them!
The full adder in CNF:
Flattening Bit-VectorArithmetic
– 51 –
Flattening Bit-VectorArithmetic
Ok, this is good for one bit! How about more?
– 52 –
Ok, this is good for one bit! How about more?
8-Bit ripple carry adder (RCA)
Also called carry chain adder
Adds l variablesAdds 6 · l clauses
Flattening Bit-VectorArithmetic
– 53 –
Bit-vector Flattening
– 54 –
Multipliers
Multipliers result in very hard formulas
Example:
CNF: About 11000 variables, unsolvable (Hard) for current SATsolvers
Similar problems with division, modulo
Q: Why is this hard?
– 55 –
Multipliers
Multipliers result in very hard formulas
Example:
CNF: About 11000 variables, unsolvable (Hard) for current SATsolvers
Similar problems with division, modulo
Q: Why is this hard?Q: How do we fix this?
– 56 –
Multipliers
– 57 –
Incremental Flattening
ϕsk : Boolean part of ϕF: set of terms that are in the encoding
– 58 –
Incremental Flattening
?ϕf := ϕsk , F := ∅
No!?
UNSAT
Incremental Flattening
ϕsk : Boolean part of ϕF: set of terms that are in the encoding
– 59 –
Incremental Flattening
?ϕf := ϕsk , F := ∅
Is ϕf SAT? Yes! - compute I
No!
UNSAT
I: set of terms that are inconsistent with the current assignment
Incremental Flattening
ϕsk : Boolean part of ϕF: set of terms that are in the encoding
– 60 –
Incremental Flattening
?ϕf := ϕsk , F := ∅
?Is ϕf SAT? Yes! - compute I
No! I =∅? ?
UNSAT SAT
ϕsk : Boolean part of ϕF: set of terms that are in the encodingI: set of terms that are inconsistent with the current assignment
Incremental Flattening
– 61 –
Incremental Flattening
?ϕf := ϕsk , F := ∅
Pick F′ (I \ F )⊆F := F F∪ ′
ϕf := ϕf Constraint(F)∧
?Is ϕf SAT? Yes! -
No!?
UNSAT
6I =∅compute I
I =∅?
SAT
ϕsk : Boolean part of ϕF: set of terms that are in the encodingI: set of terms that are inconsistent with the current assignment
Incremental Flattening
– 62 –
Incremental Flattening
Idea: add ’easy’ parts of the formula first
Only add hard parts when needed
ϕf only gets stronger - use an incremental SAT solver
Incremental Flattening
– 63 –
Incremental FlatteningIncremental Flattening
– 64 –
Incomplete Assignments
Hey: initially, we only have the skeleton!How do we know what terms are inconsistent with the currentassignment if the variables aren’t even in ϕf ?
– 65 –
Incomplete Assignments
Solution: guess some values for the missing variables.If you guess right, it’s good.
Incomplete Assignments
Hey: initially, we only have the skeleton!How do we know what terms are inconsistent with the currentassignment if the variables aren’t even in ϕf ?
– 66 –
Bit-Vector ChallengeIs there a better way than bit blasting?Requirements
Provide same functionality as with bit blasting Find abstractions based on word-level structure Improve on performance of bit blasting
Observation Must have bit blasting at core
Only approach that covers full functionality Want to exploit special cases
Formula satisfied by small valuesSimple algebraic properties imply unsatisfiabilitySmall unsatisfiable coreSolvable by modular arithmetic…
– 67 –
Iterative ApproximationIdeaIterative Approximation
UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS ’07
Use bit blasting as core technique Apply to simplified versions of formula Successive approximations until solve or show unsatisfiable
– 68 –
Iterative Approach Background: Approximating Formula
Example Approximation Techniques Underapproximating
Restrict word-level variables to smaller ranges of values Overapproximating
Replace subformula with Boolean variable
Original Formula
+Overapproximation + More solutions:
If unsatisfiable, then so is
Underapproximation−
−
Fewer solutions:Satisfying solution also satisfies
– 69 –
Starting Iterations
Initial Underapproximation (Greatly) restrict ranges of word-level variables Intuition: Satisfiable formula often has small-domain
solution
1−
– 70 –
First Half of Iteration
SAT Result for 1− Satisfiable
Then have found solution for Unsatisfiable
Use UNSAT proof to generate overapproximation 1+ (Described later)
1−If SAT, then done
1+
UNSAT proof:generate overapproximation
– 71 –
Second Half of Iteration
SAT Result for 1+ Unsatisfiable
Then have shown unsatisfiable Satisfiable
Solution indicates variable ranges that must be expandedGenerate refined underapproximation
1−
If UNSAT, then done1+
SAT:Use solution to generate refined underapproximation
2−
– 72 –
Example
:= (x = y+2) ^ (x2 > y2)
1− := (x[1] = y[1]+2) ^(x[1]2 > y[1]
2)
2− := (x[2] = y[2]+2) ^ (x[2]2 > y[2]
2)
1+ := (x = y+2)
SAT, done.
UNSATLook at proof
SATx = 2, y = 0
– 73 –
Iterative Behavior
Underapproximations Successively more precise
abstractions of Allow wider variable ranges
Overapproximations No predictable relation UNSAT proof not unique
1−
1+
2−
k−
2+
k+
– 74 –
Overall EffectSoundness
Only terminate with solution on underapproximation
Only terminate as UNSAT on overapproximation
Completeness Successive
underapproximations approach
Finite variable ranges guarantee termination
In worst case, get k−
1−
1+
2−
k−
2+
k+
SAT
UNSAT
– 75 –
Generating Over approximation
Given Underapproximation 1− Bit-blasted translation of 1−
into Boolean formula Proof that Boolean formula
unsatisfiable
Generate Overapproximation 1+ If 1+ satisfiable, must lead to
refined underapproximation
1−
1+
UNSAT proof:generate overapproximation
2−
– 76 –
Bit-Vector Formula Structure DAG representation to allow shared subformulas
x + 2 z 1
x % 26 = v
w & 0xFFFF = x
x = y
Ç
Æ:
Ç
ÆÇ
a
– 77 –
Structure of Underapproximation
Linear complexity translation to CNFEach word-level variable encoded as set of Boolean variablesAdditional Boolean variables represent subformula values
x + 2 z 1
x % 26 = v
w & 0xFFFF = x
x = y
Ç
Æ:
Ç
ÆÇ
a −
RangeConstraints
wxyz
Æ
– 78 –
Encoding Range ConstraintsExplicit
View as additional predicates in formula
Implicit Reduce number of variables in encoding Constraint Encoding 0 w 8 0 0 0 ··· 0 w2w1w0
−4 x 4 xsxsxs··· xsxsx1x0
Yields smaller SAT encodings
RangeConstraints
wx
0 w 8 −4 x 4
– 79 –
RangeConstraints
wxyz
Æ
UNSAT Proof Subset of clauses that is unsatisfiable Clause variables define portion of DAG Sub graph that cannot be satisfied with given range
constraints
x + 2 z 1
x % 26 = v
w & 0xFFFF = x
x = y
a
Ç
Æ
ÆÇ
Ç
:
– 80 –
Extracting Circuit from UNSAT Proof Subgraph that cannot be satisfied with given range
constraintsEven when replace rest of graph with unconstrained
variables
x + 2 z 1
x = y
a Æ
ÆÇ
Ç
:
b1
b2
RangeConstraints
wxyz
ÆUNSAT
– 81 –
Generated Over Approximation Remove range constraints on word-level variables Creates overapproximation
Ignores correlations between values of subformulas
x + 2 z 1
x = y
a Æ
ÆÇ
Ç
:
b1
b2
1+
– 82 –
Generated Over ApproximationAlgorithm
– 83 –
Refinement PropertyClaim
1+ has no solutions that satisfy 1−’s range constraintsBecause 1+ contains portion of 1− that was shown to be
unsatisfiable under range constraints
x + 2 z 1
x = y
a Æ
ÆÇ
Ç
:
b1
b2
RangeConstraints
wxyz
ÆUNSAT
1+
– 84 –
Refinement Property (Cont.)
Consequence Solving 1+ will expand range of some variables Leading to more exact underapproximation 2−
x + 2 z 1
x = y
a Æ
ÆÇ
Ç
:
b1
b2
1+
– 85 –
Effect of Iteration
Each Complete Iteration Expands ranges of some word-level variables Creates refined underapproximation
1−
1+
SAT:Use solution to generate refined underapproximation
2−
UNSAT proof:generate overapproximation
– 86 –
Approximation Methods
So Far Range constraints
Underapproximate by constraining values of word-level variables
Subformula eliminationOverapproximate by assuming subformula value arbitrary
General Requirements Systematic under- and over-approximations Way to connect from one to another
Goal: Devise Additional Approximation Strategies
– 87 –
Function Approximation Example
§: Prohibit Via Additional Range Constraints Gives underapproximation Restricts values of (possibly intermediate) terms
§: Abstract as f (x,y) Overapproximate as uninterpreted function f Value constrained only by functional consistency
*x
y
x
0 1 else
y
0 0 0 0
1 0 1 x
else 0 y §
– 88 –
Function Approximation Example
*x
y
x
0 1 else
y
0 0 0 0
1 0 1 x
else 0 y §
– 89 –
Results: UCLID BV vs. Bit-blasting
UCLID always better than bit blasting Generally better than other available procedures SAT time is the dominating factor
[results on 2.8 GHz Xeon, 2 GB RAM]
– 90 –
Challenges with Iterative ApproximationFormulating Overall Strategy
Which abstractions to apply, when and where How quickly to relax constraints in iterations
Which variables to expand and by how much?Too conservative: Each call to SAT solver incurs costToo lenient: Devolves to complete bit blasting.
Predicting SAT Solver Performance Hard to predict time required by call to SAT solver Will particular abstraction simplify or complicate SAT?
Combination Especially Difficult Multiple iterations with unpredictable inner loop
– 91 –
Summary: Modeling LevelsBits
Limited ability to scale Hard to apply functional abstractions
Words Allows abstracting data while precisely representing control Overlooks finite word-size effects
Bit Vectors Realistic semantic model for hardware & software Captures all details of actual operation
Detects errors related to overflow and other artifacts of finite representation
Can apply abstractions found at word-level
– 92 –
Areas of Agreement
SAT-Based Framework Is Only Logical Choice SAT solvers are good & getting better
Want to Automatically Exploit Abstractions Function structure Arithmetic properties
E.g., associativity, commutativty Arithmetic reductions
E.g., LU decomposition
Base Level Should Be SAT Semantically complete approach
– 93 –
Thank you.
Top Related