Nikolaj Bjørner Senior Researcher Microsoft Research Redmond

78
Nikolaj Bjørner Senior Researcher Microsoft Research Redmond Modern Satisfiability Modulo Theories Solvers in Program Analysis

description

Nikolaj Bjørner Senior Researcher Microsoft Research Redmond. Modern Satisfiability Modulo Theories Solvers in Program Analysis. Lectures. Wednesday 10:45–12:15 An Introduction to Z3 with Applications Thursday August 30 th 15:45–17:15 Introduction to SAT and SMT Friday 10:30–10:45 - PowerPoint PPT Presentation

Transcript of Nikolaj Bjørner Senior Researcher Microsoft Research Redmond

Page 1: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Nikolaj BjørnerSenior ResearcherMicrosoft Research Redmond

Modern Satisfiability Modulo Theories Solvers in Program Analysis

Page 2: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Lectures

Wednesday 10:45–12:15An Introduction to Z3 with Applications

Thursday August 30th 15:45–17:15Introduction to SAT and SMT

Friday 10:30–10:45 Theories and Solving Algorithms

Friday 15:45–17:15 Advanced & Summary: Quantifiers, Arrays, Fixed-points

Page 3: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Plan

I. Introduction to Z3 – An Efficient SMT SolverI. ArchitectureII. Interaction – Input formatsIII. Theories

II. Introduction to ApplicationsI. Basic examples of SMT solvingII. A selection of applications using Z3/SMT.

Page 4: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Takeaways:

• Many program analysis, verification and test tools solve problems that can be reduced to logical formulas and transformations between logical formulas at their core.

• Modern SMT solvers are a often good fit.– Handle domains found in programs directly.

• The selected examples are intended to show instances where sub-tasks are reduced to SMT/Z3.

Page 5: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3 – AN EFFICIENT SMT SOLVER

http://research.microsoft.com/projects/z3

Page 6: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

What is Z3?

TheoriesBit-Vectors

Lin-arithmetic Nonlin-arithmetic

Free (uninterpreted) functions

Arrays

Quantifiers:E-matching

OCaml

.NET

CNative

SMT-LIB

Model Generation:Finite/parametric

Simplify

Floating pointsRecursive Datatypes

Quantifiers:Super-position

Proof objects

Tactics Assumption tracking

By Leonardo de Moura, Nikolaj Bjørner, Christoph Wintersteiger

F# quote

Python

Page 7: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3 is Cutting Edge

http://smtcomp.org

Z3 is Free Software

for research

Page 8: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3: Little Engines of Proof

Congruence

Closure

SAT Solver

Quant.Instance

s

Simplex

Super-position

Arrays

User-Theorie

s

MBQI

Freely available from http://research.microsoft.com/projects/z3

Page 9: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

INPUT FORMATS

Page 10: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Input Formats

• Text:– SMT-LIB2 - main exchange format for SMT solvers– Log - low-level log for replay– Datalog - a Datalog format for the fixed-point engine– DIMACS - a format for propositional SAT

• Programmatic:– C - API functions exposed for C– Ocaml - Ocaml wrapper around C API– .NET - .NET wrapper around C API– Python - Try it on http://rise4fun.com/z3py – Scala - by Phillipe Suter

Page 11: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

A Primer on SMT-LIB2

Online interactive tutorial

http://rise4fun.com/z3/tutorial

Page 12: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

THEORIES

Page 14: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Uninterpreted functionsArithmetic (linear)

Theories

Page 15: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Uninterpreted functionsArithmetic (linear)Bit-vectors

Theories

Page 16: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-types

Theories

Page 17: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-typesArrays

Theories

Page 18: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-typesArraysPolynomial Arithmetic

Theories

Page 19: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Uninterpreted functionsArithmetic (linear)Bit-vectorsAlgebraic data-typesArraysPolynomial Arithmetic

Theories

Page 20: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

USER-INTERACTION AND GUIDANCE

Page 21: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Logical Formula

Sat/Model

Interaction

Page 22: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Logical Formula

Unsat/Proof

Page 23: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Simplify

Logical Formula

Page 24: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

ImpliedEqualities

- x and y are equal- z + y and x + z are equal

Logical Formula

Page 25: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

QuantifierEliminatio

n

Logical Formula

Page 26: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Interaction

Logical Formula

Unsat. Core

Page 27: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

APPLICATIONS OF Z3

Page 28: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

A Decision Engine for SoftwareSome Microsoft engines:- SDV: The Static Driver Verifier- PREfix: The Static Analysis Engine for C/C++.- Pex: Program EXploration for .NET.- SAGE: Scalable Automated Guided Execution - Spec#: C# + contracts- VCC: Verifying C Compiler for the Viridian Hyper-Visor- HAVOC: Heap-Aware Verification of C-code.- SpecExplorer: Model-based testing of protocol specs.- Yogi: Dynamic symbolic execution + abstraction.- FORMULA: Model-based Design- F7: Refinement types for security protocols- M3: Model Program Modeling- VS3: Abstract interpretation and Synthesis

They all use the SMT solver Z3.

Hyper-V

Page 29: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Applications

Test case generation

Verifying Compilers

Predicate Abstraction

Invariant Generation

Type Checking

Model Based Testing

Page 30: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theorem Provers/Satisfiability Checkers

A formula F is validIff

F is unsatisfiable

Theorem Prover/Satisfiability Checker

SatisfiableModel

FUnsatisfiableProof

Page 31: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Theorem Provers/Satisfiability Checkers

A formula F is validIff

F is unsatisfiable

Theorem Prover/Satisfiability Checker

SatisfiableModel

FUnsatisfiableProof

Timeout Memout

Page 32: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Verification/Analysis Tool: “Template”

Verification/AnalysisTool

Theorem Prover/Satisfiability Checker

Problem

Logical Formula

UnsatisfiableSatisfiable(Counter-example)

Page 33: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT EXAMPLEBasic

Page 34: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Is formula satisfiable modulo theory T ?

SMT solvers have specialized algorithms for

T

Satisfiability Modulo Theories (SMT)

Page 35: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

ArithmeticArray Theory Uninterpreted Functions

𝑥+2=𝑦⇒ 𝑓 (𝑠𝑒𝑙𝑒𝑐𝑡 (𝑠𝑡𝑜𝑟𝑒 (𝑎 ,𝑥 ,3 ) , 𝑦−2 ) )= 𝑓 (𝑦−𝑥+1)

Satisfiability Modulo Theories (SMT)

Page 36: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT EXAMPLEJob Shop Scheduling

Page 37: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

Machines

Jobs

P = NP? Laundry 𝜁 (𝑠 )=0⇒ 𝑠=12 +𝑖𝑟

Tasks

Page 38: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Constraints:Precedence: between two tasks of the same job

Resource: Machines execute at most one job at a time

413

2

[ 𝑠𝑡𝑎𝑟 𝑡2,2 ..𝑒𝑛𝑑2,2 ]∩ [ 𝑠𝑡𝑎𝑟 𝑡4,2 ..𝑒𝑛𝑑4,2 ]=∅

Job Shop Scheduling

Page 39: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Constraints:Encoding:

Precedence: - start time of job 2 on mach 3 - duration of job 2 on mach 3Resource:

413

2

[ 𝑠𝑡𝑎𝑟 𝑡2,2 ..𝑒𝑛𝑑2,2 ]∩ [ 𝑠𝑡𝑎𝑟 𝑡4,2 ..𝑒𝑛𝑑4,2 ]=∅

Not convex

Job Shop Scheduling

Page 40: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

Page 41: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Job Shop Scheduling

case split

case split

Efficient solvers:- Floyd-Warshal algorithm- Ford-Fulkerson algorithm

𝑧−𝑧=5 – 2 –3 – 2=−2<0

Page 42: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

EXAMPLE EXERCISESFollow-on to lectures by Hoare and Petersen

Page 43: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

True or false?

SMT-LIB2 formulation:(define-fun Max ((x Int) (y Int)) Int (if (> x y) x y))(define-fun ominus ((x Int) (y Int)) Int (Max 0 (- x y)))(declare-const x Int)(declare-const a Int)(assert (not (<= (+ (ominus x a) a) x)))(check-sat)(get-model)

Basic Arithmetic

Page 46: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Follow on Exercises

Prove the other properties for WP

Prove Axioms for Hoare and Milner calculi

Page 47: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATIONTest case generation

Page 48: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Test generation tools

SAGE

Internal. For Security Fuzzing

Runs on x86 instructions

External. For Developers

Runs on .NET code

Try it on: http://pex4fun.com

Finding security bugs before the hackers

black hat

Page 49: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATIONStatic Analysis

Integrating Z3 with PREfix static analyzer. Yannick Moy, David Selaff, B

Page 50: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

int binary_search(int[] arr, int low, int high, int key)

while (low <= high) { // Find middle value int mid = (low + high) / 2; int val = arr[mid]; if (val == key) return mid; if (val < key) low = mid+1; else high = mid-1; } return -1;}

void itoa(int n, char* s) {

if (n < 0) { *s++ = ‘-’; n = -n; } // Add digits to s ….

-INT_MIN= INT_MIN

(INT_MAX+1)/2 +(INT_MAX+1)/2

= INT_MIN

Package: java.util.ArraysFunction: binary_search

Book: Kernighan and RitchieFunction: itoa (integer to ascii)

What is wrong here?

Page 51: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

int init_name(char **outname, uint n){ if (n == 0) return 0; else if (n > UINT16_MAX) exit(1); else if ((*outname = malloc(n)) == NULL) { return 0xC0000095; // NT_STATUS_NO_MEM; } return 0;}

int get_name(char* dst, uint size) { char* name; int status = 0; status = init_name(&name, size); if (status != 0) { goto error; } strcpy(dst, name);error: return status;}

The PREfix Static Analysis Engine

C/C++ functions

Page 52: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

int init_name(char **outname, uint n){ if (n == 0) return 0; else if (n > UINT16_MAX) exit(1); else if ((*outname = malloc(n)) == NULL) { return 0xC0000095; // NT_STATUS_NO_MEM; } return 0;}

int get_name(char* dst, uint size) { char* name; int status = 0; status = init_name(&name, size); if (status != 0) { goto error; } strcpy(dst, name);error: return status;}

The PREfix Static Analysis Engine

C/C++ functions

model for function init_nameoutcome init_name_0: guards: n == 0 results: result == 0outcome init_name_1: guards: n > 0; n <= 65535 results: result == 0xC0000095outcome init_name_2: guards: n > 0|; n <= 65535 constraints: valid(outname) results: result == 0; init(*outname)

models

Page 53: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

int init_name(char **outname, uint n){ if (n == 0) return 0; else if (n > UINT16_MAX) exit(1); else if ((*outname = malloc(n)) == NULL) { return 0xC0000095; // NT_STATUS_NO_MEM; } return 0;}

int get_name(char* dst, uint size) { char* name; int status = 0; status = init_name(&name, size); if (status != 0) { goto error; } strcpy(dst, name);error: return status;}

The PREfix Static Analysis Engine

C/C++ functions

model for function init_nameoutcome init_name_0: guards: n == 0 results: result == 0outcome init_name_1: guards: n > 0; n <= 65535 results: result == 0xC0000095outcome init_name_2: guards: n > 0|; n <= 65535 constraints: valid(outname) results: result == 0; init(*outname)

path for function get_name guards: size == 0 constraints: facts: init(dst); init(size); status == 0

models

paths

warnings

pre-condition for function strcpy init(dst) and valid(name)

Page 54: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

6/26/2009 54

iElement = m_nSize;if( iElement >= m_nMaxSize ){

bool bSuccess = GrowBuffer( iElement+1 );…

}::new( m_pData+iElement ) E( element );m_nSize++;

Overflow on unsigned addition

m_nSize == m_nMaxSize == UINT_MAX

Write in unallocated

memory

iElement + 1 == 0

Code was written for

address space < 4GB

Page 55: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Using an overflown value as allocation size

ULONG AllocationSize;while (CurrentBuffer != NULL) {        if (NumberOfBuffers > MAX_ULONG / sizeof(MYBUFFER)) {             return NULL;

   }   NumberOfBuffers++;   CurrentBuffer = CurrentBuffer->NextBuffer;

}AllocationSize = sizeof(MYBUFFER)*NumberOfBuffers;UserBuffersHead = malloc(AllocationSize);

6/26/2009 55

Overflow check

Possible overflow

Increment and exit from loop

Page 56: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

LONG l_sub(LONG l_var1, LONG l_var2){ LONG l_diff = l_var1 - l_var2; // perform subtraction // check for overflow if ( (l_var1>0) && (l_var2<0) && (l_diff<0) ) l_diff=0x7FFFFFFF …

6/26/2009 56

Possible overflow

Forget corner case INT_MIN

Overflow on unsigned subtraction

Page 57: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

for (uint16 uID = 0; uID < uDevCount && SUCCEEDED(hr); uID++) {…

if (SUCCEEDED(hr)) { uID = uDevCount; // Terminates the loop

6/26/2009 57

Possible overflow

Loop does not terminate

uID == UINT_MAX

Overflow on unsigned addition

Page 58: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Using an overflown value as allocation size 

DWORD dwAlloc;dwAlloc = MyList->nElements * sizeof(MY_INFO);if(dwAlloc < MyList->nElements) … // returnMyList->pInfo = malloc(dwAlloc);

6/26/2009 58

Can overflow

Allocate less than needed

Not a proper test

Page 59: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Modular arithmetic

Bit-wise operations

1 0 1 0 1 1 0 1 1 0 0 1

1 0 1 0 1 1 0 1 1 0 0 1

=

Concatenation

1 0 1 0 1 1 [4:2] = 0 1 0

1 0 1 0 1 1

0 1 1 0 0 1

0 0 1 0 0 1=

1 0 1 0 1 1

0 1 1 0 0 1+

0 0 0 1 0 0

=

Extraction

Bit-wise and

AdditionVector

Segments

Vector Segments

Bit-precise analysis

Page 60: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Bit-precise analysis

FA

a0b0a0b1a0b2a0b3

a1b0a1b1a1b2

a2b0

HAHAHA

FA

FA

a2b1

a3b0

out0 out1 out2 out3

O(n2) clauses

SAT solving time increases exponentially. Similar for BDDs.

Brute-force enumeration + evaluation faster for 20 bits.

Method: Encode bit-vectors bit-by-bit into Boolean Satisfiability

Multiplication [out3, out2, out1, out0] = [a3, a2, a1, a0] * [b3, b2, b1, b0]

Page 61: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATIONVerified Software

Page 62: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Microsoft Verifying C Compiler

Page 63: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Hypervisor Verification (2007 – 2010)

Partners:• European Microsoft Innovation Center• Microsoft Research• Microsoft’s Windows Division• Universität des Saarlandes

co-funded by the German Ministry of Education and Research

http://www.verisoftxt.de

Page 64: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Hypervisor: Scale & Speed

• VCs have several Mega-bytes• Thousands of non ground formulas• Developers are willing to wait at most 5 min

per VC

Are you willing to wait more than 5 min for your compiler?

Page 65: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Verification Attempt Time vs.Satisfaction and Productivity

By Michal Moskal (VCC Designer and Software Verification Expert), Language quiz: “loose” or “lose” ?

Page 66: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Why did my proof attempt fail?

1. My annotations are not strong enough!weak loop invariants and/or contracts

2. My theorem prover is not strong (or fast) enough.Send “angry” email to Nikolaj and Leo.

Page 67: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

VCC Performance Trends Nov 08 – Mar 09

1

10

100

1000

Attempt to improve Boogie/Z3 interaction

Modification in invariant checking

Switch to Boogie2

Switch to Z3 v2

Z3 v2 update

Page 68: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

The Importance of Speed

Page 69: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Building VerveVerifie

d

Safe to the Last Instruction / Jean Yang & Chris Hawbliztl PLDI 2010

C# compiler

Kernel.cs

Boogie/Z3

Translator/Assembler

TAL checker

Linker/ISO generator

Verve.iso

Source file

Compilation toolVerification tool

Nucleus.bpl (x86) Kernel.obj (x86)

9 person-months

Page 70: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATIONFormal Language Theory and Security

Page 71: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Why string analysis?(motivating scenario)

Tomcat v. < 6.0.18

req = http://www.x.com/%c0%ae%c0%ae/%c0%ae%c0%ae/private/

Windows 2000 vulnerability: http://www.sans.org/security-resources/malwarefaq/wnt-unicode.php

Apache Tomcat vulnerability: http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-2938

1) security check: req must not contain "../"

2) dir = utf8decode("%c0%ae%c0%ae/%c0%ae%c0%ae/private/") = "../../private/"

access granted to "../../private/"

Analysis question:Does utf8decode reject overlong

utf8-encodings such as "%C0%AE" for '.'?

Page 72: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Symbolic Word Transducers

Relativized Formal Language Theory

Classical Word Transducers(e.g. decoding automata,

rational transductions)

Classical I/O Automata(e.g. Mealy machine)

ClassicalWord Acceptors

(NFA, DFA)

Symbolic Word Acceptors

regex matching

string transformation

Classical Word Acceptors modulo Th()

Classical Word Transducers modulo Th()

Page 73: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Rex & Bek – Symbolic RegEx & Transducers

Margus Veanes

Page 74: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Symbolic Finite Transducer (SFT)

• Classical transducer modulo a rich label theory• Core Idea: represent labels with guarded

transformation functions– Separation of concerns: finite graph / theory of labels

Concrete transitions:

p

q

Symbolic transition:

‘\x80’/“\xC2\x80”

… ‘\x7FF’/“\xDF\xBF”

q

p

x. 8016 ≤ x ≤ 7FF16/[C016|x10,6, 8016|x5,0]

guard

bitvector operations

1920transitions

Page 75: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

SMT APPLICATIONSoftware Model Checking

Page 76: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Static Driver Verifier

SLAM – part of SDV (Windows 7, 8)Z3 is used for:

Predicate abstraction Counter-example refinement

Yogi – next generation SDVZ3 is used for: Refining transition abstractions

Corral – uses reachability modulo theoriesSLAyer – checks memory safety using separation logic

Page 77: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Z3 & Static Driver Verifier: SLAM

• All-SAT– Better (more precise) Predicate Abstraction

• Unsatisfiable cores– Why the abstract path is not feasible?– Fast Predicate Abstraction

Page 78: Nikolaj  Bjørner Senior Researcher Microsoft Research  Redmond

Summary

Introduction to Z3: An Efficient SMT Solver

Introduction to Applications:

– Exemplary applications

– Tools built around Z3

Tomorrow. More technical:– Introduction to SAT solving.– Introduction to SMT solving.