Syntax-Guided Synthesis


Rajeev Alur

Joint work with R. Bodik, G. Juniwal, M. Martin, M. Raghothaman, S. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, A. Udupa


Program Verification

Does a program P meet its specification φ, where φ is written as a logical formula?

Motivation: Correctness of systems, finding bugs

Program verification is hard!

Formalizing a structured program into logical formulas

Using tools (SMT solvers) to verify whether the formalized program meets its specification.

SMT-LIB: a common standard and a library of benchmarks for SMT solvers.


Program Synthesis

Automatically synthesize a program P that satisfies a given specification φ

Can potentially have greater impact than program verification

Program synthesis is hard!

Let’s provide a syntactic template for the program – Syntax-Guided Synthesis (SyGuS)

Work on special cases already exists (e.g., Sketch, 2008)

Let’s build a common standard and benchmarks for SyGuS solvers (SYNTH-LIB)


Talk Outline

Background: SMT Solvers

Formalization of SyGuS

Solution Strategies

Conclusions + SyGuS Competition


What is SMT?

Satisfiability Modulo Theories


Magnus Madsen

Recall SAT

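The Boolean SATisfiability Problem: given a propositional formula, typically a conjunction of clauses where each clause is a disjunction of literals, is there an assignment of TRUE/FALSE to its variables that makes the formula true?

• A satisfying assignment for the example formula: A=TRUE, with the other two variables FALSE

• Literal: a variable or a negated variable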

Recall SAT

• SAT is NP-complete (no polynomial-time algorithm is known; known algorithms take exponential time in the worst case)

• Many SAT solvers exist: DPLL (1962), Chaff (2001), MiniSAT (2004)

• Some perform remarkably well in practice.
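To make the DPLL idea concrete, here is a minimal Python sketch of its core: simplify the clauses under a partial assignment, propagate unit clauses, and otherwise branch on a variable and backtrack. It assumes DIMACS-style clauses (lists of non-zero integers, where -3 is the negation of variable 3); the function name `dpll` is illustrative, and pure-literal elimination, clause learning, and the other engineering that Chaff and MiniSAT add are omitted.

```python
from typing import Dict, List, Optional

# DIMACS-style clause: 3 means variable 3 is TRUE, -3 means variable 3 is FALSE.
Clause = List[int]


def dpll(clauses: List[Clause],
         assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment (variable -> bool), or None if unsatisfiable."""
    assignment = dict(assignment or {})

    # Simplify under the current assignment: drop satisfied clauses, remove false literals.
    simplified: List[Clause] = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # clause already satisfied
        remaining = [lit for lit in clause if abs(lit) not in assignment]
        if not remaining:
            return None  # every literal is false: conflict, so backtrack
        simplified.append(remaining)
    if not simplified:
        return assignment  # all clauses satisfied

    # Unit propagation: a clause with a single literal left forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})

    # Otherwise branch on the first unassigned variable, trying TRUE then FALSE.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

For example, in `dpll([[1, -2], [-1, 3], [2]])` the unit clause forces variable 2, and propagation then forces variables 1 and 3, so no branching is needed.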

What is an SMT instance?

A logical formula built using:
– negation, conjunction, and disjunction
– theory-specific operators, e.g. from the theory of integers or the theory of bitwise operators

Q: Why not encode every formula in SAT?

A: Theory solvers have very efficient algorithms, for example:
• Graph problems: shortest path, minimum spanning tree
• Optimization: max-flow, linear programming
(just to name a few)

Q: But then, why not get rid of the SAT solver?

A: SAT solvers have been studied for a long time, and SMT solvers reuse them to handle the Boolean structure of the formula.

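SMT Solver: SAT solver + theory solver

Formula: x ≥ 3 ∧ (x ≤ 0 ∨ y ≥ 0)
• The SAT solver sees the Boolean abstraction a ∧ (b ∨ c), where a stands for x ≥ 3, b for x ≤ 0, and c for y ≥ 0.
• The SAT solver proposes a ∧ b; the theory solver checks x ≥ 3 ∧ x ≤ 0: NO. Add the clause ¬a ∨ ¬b to block this combination.
• The SAT solver proposes a ∧ c; the theory solver checks x ≥ 3 ∧ y ≥ 0: YES.
• The formula is satisfiable.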
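The lazy SAT + theory loop can be sketched in Python as follows, on the same formula with atoms a: x ≥ 3, b: x ≤ 0, c: y ≥ 0. This is an illustrative sketch, not how a production SMT solver is built: brute-force enumeration of Boolean models stands in for the SAT solver, the hypothetical `theory_consistent` only understands integer bounds `v >= c` and `v <= c`, and the blocking clause rules out the whole Boolean model, whereas real solvers usually block only the conflicting atoms (here with the clause ¬a ∨ ¬b).

```python
import itertools
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, str, int]  # (variable, ">=" or "<=", integer constant)


def theory_consistent(literals: List[Tuple[Atom, bool]]) -> bool:
    """Toy theory solver for integer bounds: is this conjunction of (negated) bounds satisfiable?"""
    lower: Dict[str, float] = {}
    upper: Dict[str, float] = {}
    for (var, op, c), positive in literals:
        # Over the integers, not(v >= c) means v <= c - 1, and not(v <= c) means v >= c + 1.
        if op == ">=":
            is_lower, bound = (True, c) if positive else (False, c - 1)
        else:
            is_lower, bound = (False, c) if positive else (True, c + 1)
        if is_lower:
            lower[var] = max(lower.get(var, float("-inf")), bound)
        else:
            upper[var] = min(upper.get(var, float("inf")), bound)
    # Consistent iff every variable's largest lower bound is at most its smallest upper bound.
    return all(lower.get(v, float("-inf")) <= upper.get(v, float("inf"))
               for v in set(lower) | set(upper))


def lazy_smt(atoms: List[Atom], cnf: List[List[int]]) -> Optional[Dict[int, bool]]:
    """Lazy SMT loop over the Boolean abstraction (atom i is Boolean variable i + 1)."""
    clauses = [list(c) for c in cnf]
    while True:
        # "SAT solver": brute-force search for a model of the Boolean abstraction.
        model = next((m for m in itertools.product([True, False], repeat=len(atoms))
                      if all(any(m[abs(l) - 1] == (l > 0) for l in c) for c in clauses)),
                     None)
        if model is None:
            return None  # abstraction is UNSAT, so the original formula is UNSAT
        # "Theory solver": check the atoms under the truth values the SAT model gave them.
        if theory_consistent(list(zip(atoms, model))):
            return {i + 1: value for i, value in enumerate(model)}
        # Theory conflict: add a clause blocking this Boolean model, then retry.
        clauses.append([-(i + 1) if value else i + 1 for i, value in enumerate(model)])
```

With `atoms = [("x", ">=", 3), ("x", "<=", 0), ("y", ">=", 0)]` and CNF `[[1], [2, 3]]`, the loop rejects the models containing a ∧ b and returns a=TRUE, b=FALSE, c=TRUE, i.e. the a ∧ c branch.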

Theories

Theory of:
– Difference Arithmetic
– Linear Arithmetic
– Arrays
– Bit Vectors
– Algebraic Datatypes
– Uninterpreted Functions

C. Barrett & S. A. Seshia, ICCAD 2009 Tutorial

Equivalence Checking of Program Fragments

  int fun1(int y) {
    int x, z;
    z = y;
    y = x;
    x = z;
    return x*x;
  }

  int fun2(int y) {
    return y*y;
  }

SMT formula φ, satisfiable iff the programs are non-equivalent:
( z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = x1*x1 ) ∧ ( ret2 = y*y ) ∧ ( ret1 ≠ ret2 )

What if we use SAT to check equivalence? Using SAT (with MiniSat):
• 32 bits for y: did not finish in over 5 hours
• 16 bits for y: 37 sec.
• 8 bits for y: 0.5 sec.

SMT formula φ′, with multiplication abstracted by an uninterpreted function sq:
( z = y ∧ y1 = x ∧ x1 = z ∧ ret1 = sq(x1) ) ∧ ( ret2 = sq(y) ) ∧ ( ret1 ≠ ret2 )

Using an EUF solver: 0.01 sec
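Bit-level reasoning must account for every combination of input bits, which is why its cost grows so quickly with the word size. The following Python sketch does the same job by brute force for 8-bit values (an assumption chosen to keep it fast): it models the uninitialized local x as an arbitrary extra input, like the unconstrained x in the SMT formula, and searches all 2^16 input pairs for a counterexample. The EUF encoding avoids this search: from x1 = z and z = y it follows that sq(x1) = sq(y) for any function sq, so ret1 ≠ ret2 is unsatisfiable.

```python
MASK = 0xFF  # 8-bit machine integers: arithmetic wraps around modulo 256


def fun1(y: int, x_init: int) -> int:
    # The uninitialized local x is modelled as an arbitrary input value x_init.
    x = x_init
    z = y
    y = x
    x = z
    return (x * x) & MASK


def fun2(y: int) -> int:
    return (y * y) & MASK


def find_counterexample():
    """Exhaustively search all 8-bit (y, x_init) pairs for an input where the results differ."""
    for y in range(256):
        for x_init in range(256):
            if fun1(y, x_init) != fun2(y):
                return (y, x_init)
    return None
```

`find_counterexample()` returns `None`, so the two fragments agree on every 8-bit input; each extra bit of word size doubles the number of values of y (and of x) to cover.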

Verification and Synthesis

Verification:
• Program verification: Does P meet spec φ?
• SMT: Is φ satisfiable?
• SMT-LIB/SMT-COMP: standard API, solver competition

Synthesis:
• Program synthesis: Find P that meets spec φ
• Syntax-Guided Synthesis
• Plan for SyGuS-Comp

Talk Outline

Formalization of SyGuS

Solution Strategies



Syntax-Guided Synthesis (SyGuS) Problem

• Fix a background theory T: fixes types and operations
• Function to be synthesized: name f along with its type
• Inputs to the SyGuS problem:
  – Specification φ (semantic constraint): a typed formula using symbols in T plus the symbol f
  – Set E of expressions given by a context-free grammar (syntactic constraint): the set of candidate expressions that use symbols in T
• Computational problem: output e in E such that φ[f/e] is valid (in theory T)

SyGuS Example

Theory QF-LIA:
• Types: integers and Booleans
• Logical connectives, conditionals, and linear arithmetic
• Quantifier-free formulas

Function to be synthesized: f (int x, int y) : int

Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y)

Candidate implementations: linear expressions
LinExp := x | y | Const | LinExp + LinExp | LinExp - LinExp

No solution exists: the specification forces f(x,y) = max(x,y), and no linear expression computes max.

SyGuS Example (Theory QF-LIA)

Function to be synthesized: f (int x, int y) : int

Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y)

Candidate implementations: conditional expressions with comparisons
Term := x | y | Const | If-Then-Else (Cond, Term, Term)
Cond := Term <= Term | Cond & Cond | ~ Cond | (Cond)

Possible solution: If-Then-Else (x ≤ y, y, x)

Solving SyGuS is hard!
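To connect the formal definition with this example, the following Python sketch encodes the specification as a predicate over x, y and the value f(x,y), and tests a candidate on a finite grid of inputs. It is only a bounded test, not a validity check in QF-LIA; the names `spec`, `holds_on`, `ite_solution` and `GRID` are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def spec(x: int, y: int, fxy: int) -> bool:
    """phi: (x <= f(x,y)) & (y <= f(x,y)) & (f(x,y) = x | f(x,y) = y)."""
    return x <= fxy and y <= fxy and (fxy == x or fxy == y)


def holds_on(candidate: Callable[[int, int], int], inputs: Iterable[Tuple[int, int]]) -> bool:
    """Check phi[f/candidate] on each input; a bounded test, not a proof of validity."""
    return all(spec(x, y, candidate(x, y)) for x, y in inputs)


def ite_solution(x: int, y: int) -> int:
    return y if x <= y else x  # If-Then-Else(x <= y, y, x)


GRID = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
```

`holds_on(ite_solution, GRID)` is `True`, while linear candidates such as `lambda x, y: x` or `lambda x, y: x + y` already fail on this grid.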

Talk Outline

Solution Strategies



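Solving SyGuS as Active Learning

[Figure: a learning algorithm and a verification oracle in a loop. Starting from initial examples I, the learning algorithm proposes a candidate expression; the verification oracle either reports success or fails and returns a counterexample to the learner.]

• Concept class: set E of expressions
• Examples: concrete input values

This loop is Counter-Example Guided Inductive Synthesis (CEGIS).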

CEGIS Example

Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y)
Set E: all expressions built from x, y, 0, 1, comparison, +, If-Then-Else

• Examples = { }: the learning algorithm proposes candidate f(x,y) = x; the verification oracle returns counterexample (x=0, y=1).
• Examples = {(x=0, y=1)}: candidate f(x,y) = y; counterexample (x=1, y=0).
• After further iterations, Examples = {(x=0, y=1), (x=1, y=0), (x=0, y=0), (x=1, y=1)}: candidate ITE(x ≤ y, y, x); the oracle reports success.
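A toy version of this CEGIS loop is sketched below in Python. Two substitutions are assumptions made to keep it self-contained: the learner scans a small hand-written candidate list (standing in for the set E) for the first candidate consistent with the examples, and the verification oracle is an exhaustive check over a small input box rather than an SMT solver. The counterexamples it finds therefore differ from those above, but the loop has the same shape.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Func = Callable[[int, int], int]


def spec(x: int, y: int, fxy: int) -> bool:
    """phi: (x <= f(x,y)) & (y <= f(x,y)) & (f(x,y) = x | f(x,y) = y)."""
    return x <= fxy and y <= fxy and (fxy == x or fxy == y)


# Hypothetical candidate pool, ordered roughly by size; stands in for the set E.
CANDIDATES: List[Tuple[str, Func]] = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("0", lambda x, y: 0),
    ("1", lambda x, y: 1),
    ("x + y", lambda x, y: x + y),
    ("ITE(x <= y, y, x)", lambda x, y: y if x <= y else x),
]


def learn(examples: List[Point]) -> Optional[Tuple[str, Func]]:
    """Learner: first candidate that satisfies the spec on every example seen so far."""
    for name, f in CANDIDATES:
        if all(spec(x, y, f(x, y)) for x, y in examples):
            return name, f
    return None


def verify(f: Func, domain: List[Point]) -> Optional[Point]:
    """Oracle: bounded exhaustive check (an SMT solver would check validity instead)."""
    for x, y in domain:
        if not spec(x, y, f(x, y)):
            return (x, y)
    return None


def cegis(domain: List[Point]) -> Tuple[Optional[str], List[Point]]:
    examples: List[Point] = []
    while True:
        candidate = learn(examples)
        if candidate is None:
            return None, examples  # no candidate in E fits the examples
        name, f = candidate
        counterexample = verify(f, domain)
        if counterexample is None:
            return name, examples  # success
        examples.append(counterexample)  # fail: learn from the counterexample
```

On the box `[(x, y) for x in range(-2, 3) for y in range(-2, 3)]` the loop proposes x, then y, then returns `"ITE(x <= y, y, x)"` after collecting two counterexamples.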

SyGuS Solutions

• CEGIS approach (Solar-Lezama, Seshia et al.)
• Related work: similar strategies for solving quantified formulas and invariant generation
• Coming up: learning strategies based on:
  – Enumerative (search with pruning): Udupa et al. (PLDI’13)
  – Symbolic (solving constraints): Gulwani et al. (PLDI’11)
  – Stochastic (probabilistic walk): Schkufza et al. (ASPLOS’13)

Enumerative Learning

Goal: find an expression consistent with a given set of concrete examples.

• Enumerate expressions in increasing size, and evaluate each expression on all concrete inputs to check consistency.
• Key optimization for efficient pruning of the search space: expressions e1 and e2 are equivalent if e1(a,b) = e2(a,b) on all concrete values (x=a, y=b) in Examples. Only one representative among equivalent subexpressions needs to be considered for building larger expressions.
• Fast and robust for learning expressions with ~15 nodes
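The enumerative strategy with this pruning can be sketched in Python as bottom-up enumeration over a simplified grammar assumed for illustration: Term := x | y | 0 | 1 | Term + Term | ITE(Cond, Term, Term), Cond := Term <= Term. Each expression is stored with its signature, the tuple of its values on the examples, and a new expression is discarded when a smaller one already has the same signature. The function name `enumerate_solution` and the `max_size` cutoff are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]
Entry = Tuple[str, tuple]  # (expression text, its values on the examples)


def spec(x: int, y: int, fxy: int) -> bool:
    """phi: (x <= f(x,y)) & (y <= f(x,y)) & (f(x,y) = x | f(x,y) = y)."""
    return x <= fxy and y <= fxy and (fxy == x or fxy == y)


def enumerate_solution(examples: List[Point], max_size: int = 9) -> Optional[str]:
    terms: Dict[int, List[Entry]] = {}  # size -> representative Term expressions
    conds: Dict[int, List[Entry]] = {}  # size -> representative Cond expressions
    seen_terms: Set[tuple] = set()
    seen_conds: Set[tuple] = set()

    def add(store: Dict[int, List[Entry]], seen: Set[tuple], size: int, text: str, sig: tuple) -> None:
        # Observational equivalence: keep only the first (smallest) expression per signature.
        if sig not in seen:
            seen.add(sig)
            store.setdefault(size, []).append((text, sig))

    for size in range(1, max_size + 1):
        if size == 1:
            add(terms, seen_terms, 1, "x", tuple(x for x, _ in examples))
            add(terms, seen_terms, 1, "y", tuple(y for _, y in examples))
            add(terms, seen_terms, 1, "0", tuple(0 for _ in examples))
            add(terms, seen_terms, 1, "1", tuple(1 for _ in examples))
        # Term + Term and Term <= Term: the two children share size - 1 nodes.
        for ls in range(1, size - 1):
            for (lt, lv), (rt, rv) in product(terms.get(ls, []), terms.get(size - 1 - ls, [])):
                add(terms, seen_terms, size, f"({lt} + {rt})", tuple(a + b for a, b in zip(lv, rv)))
                add(conds, seen_conds, size, f"{lt} <= {rt}", tuple(a <= b for a, b in zip(lv, rv)))
        # ITE(Cond, Term, Term): the three children share size - 1 nodes.
        for cs in range(1, size - 1):
            for ts in range(1, size - cs - 1):
                es = size - 1 - cs - ts
                for (ct, cv), (tt, tv), (et, ev) in product(
                        conds.get(cs, []), terms.get(ts, []), terms.get(es, [])):
                    add(terms, seen_terms, size, f"ITE({ct}, {tt}, {et})",
                        tuple(t if c else e for c, t, e in zip(cv, tv, ev)))
        # Consistency check: does some term of this size satisfy the spec on every example?
        for text, sig in terms.get(size, []):
            if all(spec(x, y, v) for (x, y), v in zip(examples, sig)):
                return text
    return None
```

On the four examples from the CEGIS run, {(0,1), (1,0), (0,0), (1,1)}, no expression of size at most 5 fits, and at size 6 the search returns `ITE(x <= y, y, x)`.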

Symbolic Learning

Use a constraint solver for both the synthesis and verification steps.

• Each production in the grammar is thought of as a component. Input and output ports of every component are typed.
• A well-typed, loop-free program comprising these components corresponds to an expression DAG from the grammar.

[Figure: components ITE (inputs Cond, Term, Term; output Term), >= (inputs Term, Term; output Cond), + (inputs Term, Term; output Term), and x, y, 0, 1 (output Term)]

Symbolic Learning

Start with a library consisting of some number of occurrences of each component.

[Figure: library nodes n1: x, n2: x, n3: y, n4: y, n5: 0, n6: 1, n7: +, n8: +, n9: >=, n10: ITE]

Synthesis constraints:
• The shape is a DAG, and types are consistent
• Spec φ[f/e] is satisfied on every concrete input in Examples

Use an SMT solver (Z3) to find a satisfying solution.

If synthesis fails, increase the number of occurrences of components in the library in an outer loop.
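In the symbolic approach the SMT solver chooses how library components are wired together. The Python sketch below shows only the constraint side of that encoding, for one fixed wiring: it checks that the wiring is acyclic and well-typed, and evaluates it on examples. The dictionary representation, the node names, and the helpers `well_formed` and `evaluate` are hypothetical; an actual encoding states the same conditions as SMT constraints over the unknown wiring choices.

```python
from typing import Dict, List, Tuple

# Component signatures, one per grammar production: name -> (input port types, output type).
COMPONENTS: Dict[str, Tuple[List[str], str]] = {
    "x": ([], "Term"), "y": ([], "Term"), "0": ([], "Term"), "1": ([], "Term"),
    "+": (["Term", "Term"], "Term"),
    ">=": (["Term", "Term"], "Cond"),
    "ITE": (["Cond", "Term", "Term"], "Term"),
}

# Hypothetical representation of one wiring: node id -> (component, ids of its input nodes).
Wiring = Dict[str, Tuple[str, List[str]]]


def well_formed(wiring: Wiring, order: List[str]) -> bool:
    """Shape is a DAG (inputs appear earlier in `order`) and port types are consistent."""
    position = {node: i for i, node in enumerate(order)}
    for node in order:
        comp, inputs = wiring[node]
        in_types, _ = COMPONENTS[comp]
        if len(inputs) != len(in_types):
            return False
        for src, expected in zip(inputs, in_types):
            if src not in position or position[src] >= position[node]:
                return False  # input not defined earlier: not a DAG in this order
            if COMPONENTS[wiring[src][0]][1] != expected:
                return False  # output type of src does not match the port type
    return True


def evaluate(wiring: Wiring, order: List[str], x: int, y: int):
    """Evaluate nodes in order; the last node in `order` is the program's output."""
    val: Dict[str, object] = {}
    for node in order:
        comp, inputs = wiring[node]
        args = [val[src] for src in inputs]
        if comp == "x":
            val[node] = x
        elif comp == "y":
            val[node] = y
        elif comp in ("0", "1"):
            val[node] = int(comp)
        elif comp == "+":
            val[node] = args[0] + args[1]
        elif comp == ">=":
            val[node] = args[0] >= args[1]
        else:  # ITE
            val[node] = args[1] if args[0] else args[2]
    return val[order[-1]]


def spec(x: int, y: int, fxy: int) -> bool:
    return x <= fxy and y <= fxy and (fxy == x or fxy == y)


# One wiring of library nodes: n10 = ITE(n9, n1, n3), with n9 = (n1 >= n3), n1 = x, n3 = y.
WIRING: Wiring = {
    "n1": ("x", []), "n3": ("y", []),
    "n9": (">=", ["n1", "n3"]),
    "n10": ("ITE", ["n9", "n1", "n3"]),
}
ORDER = ["n1", "n3", "n9", "n10"]
```

This wiring is well-formed and computes ITE(x ≥ y, x, y); rewiring the condition port of n10 to the Term node n1 makes `well_formed` return `False`.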

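Symbolic Learning: example

• Iteration 1: candidate x; learned counterexample <x=-1, y=0>
• Iteration 2: candidate is an ITE expression built from ≥, x, and y; learned counterexample <x=0, y=-1>
• Iteration 3: candidate is an ITE expression built from ≥, x, and y; no counterexample is found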

Stochastic Learning

Idea: use the Metropolis-Hastings method to find the desired expression e by a probabilistic walk on a graph whose nodes are expressions and whose edges capture single edits.

…in simple words:
• Let En be the set of expressions of size n (n picked randomly).
• For every expression e in En, set Score(e) between 0 and 1 (the extent to which e meets the spec φ):
  Score(e) = exp(-0.5 · Wrong(e)), where Wrong(e) = number of examples in I for which ¬φ[f/e] holds
• Score(e) is large when Wrong(e) is small. In the limit, expressions e with Wrong(e) = 0 are more likely to be chosen than any other expression.
• The initial candidate expression e is sampled uniformly from En.
• When Score(e) = 1, return e.
• Otherwise, pick a node v in the parse tree of e uniformly at random, and replace the subtree rooted at v with a subtree of the same size, sampled uniformly.

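Stochastic Learning

[Figure: an expression e built from +, x, y, and z, and the expression e′ obtained from it by replacing one subtree with a subtree of the same size]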

With probability min{1, Score(e′)/Score(e)}, replace e with e′. Repeat until finding e′ such that Score(e′) = 1. An outer loop is responsible for updating the expression size n.
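The walk can be sketched in Python as below, using the Score and Wrong definitions above and the min{1, Score(e′)/Score(e)} acceptance rule. This is a simplified, Metropolis-style random walk rather than an exact Metropolis-Hastings sampler: the sketch does not fix the size n or sample replacement subtrees uniformly; it draws them from a small random generator and rejects trees larger than a hypothetical cap `MAX_NODES`. All names and the tuple representation of expressions are illustrative.

```python
import math
import random
from typing import Iterator, List, Optional, Tuple

# Expressions as nested tuples: ("x",), ("y",), ("0",), ("1",),
# ("+", t1, t2), ("<=", t1, t2), ("ite", cond, t1, t2).
Expr = tuple
Point = Tuple[int, int]
MAX_NODES = 15  # hypothetical size cap for this sketch


def spec(x: int, y: int, fxy: int) -> bool:
    return x <= fxy and y <= fxy and (fxy == x or fxy == y)


def evaluate(e: Expr, x: int, y: int):
    op = e[0]
    if op == "x":
        return x
    if op == "y":
        return y
    if op in ("0", "1"):
        return int(op)
    if op == "+":
        return evaluate(e[1], x, y) + evaluate(e[2], x, y)
    if op == "<=":
        return evaluate(e[1], x, y) <= evaluate(e[2], x, y)
    return evaluate(e[2], x, y) if evaluate(e[1], x, y) else evaluate(e[3], x, y)


def score(e: Expr, examples: List[Point]) -> float:
    """Score(e) = exp(-0.5 * Wrong(e)); Wrong(e) = number of examples violating the spec."""
    wrong = sum(not spec(x, y, evaluate(e, x, y)) for x, y in examples)
    return math.exp(-0.5 * wrong)


def random_term(rng: random.Random, depth: int) -> Expr:
    if depth == 0 or rng.random() < 0.4:
        return (rng.choice(["x", "y", "0", "1"]),)
    if rng.random() < 0.5:
        return ("+", random_term(rng, depth - 1), random_term(rng, depth - 1))
    cond = ("<=", random_term(rng, depth - 1), random_term(rng, depth - 1))
    return ("ite", cond, random_term(rng, depth - 1), random_term(rng, depth - 1))


def nodes(e: Expr, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], bool]]:
    """Yield (path, is_condition) for every node of the parse tree."""
    yield path, e[0] == "<="
    for i, child in enumerate(e[1:], start=1):
        yield from nodes(child, path + (i,))


def replace(e: Expr, path: Tuple[int, ...], new: Expr) -> Expr:
    if not path:
        return new
    i = path[0]
    return e[:i] + (replace(e[i], path[1:], new),) + e[i + 1:]


def stochastic_search(examples: List[Point], seed: int = 0, max_steps: int = 5000) -> Optional[Expr]:
    rng = random.Random(seed)
    e = random_term(rng, 1)
    s = score(e, examples)
    for _ in range(max_steps):
        if s == 1.0:
            return e  # Wrong(e) = 0: e meets the spec on all examples
        # Pick a node uniformly; replace its subtree with a random subtree of the same kind.
        path, is_cond = rng.choice(list(nodes(e)))
        new = ("<=", random_term(rng, 0), random_term(rng, 0)) if is_cond else random_term(rng, 1)
        e2 = replace(e, path, new)
        if sum(1 for _ in nodes(e2)) > MAX_NODES:
            continue  # reject oversized proposals
        s2 = score(e2, examples)
        if rng.random() < min(1.0, s2 / s):  # acceptance rule min{1, Score(e')/Score(e)}
            e, s = e2, s2
    return None
```

The search may stop at `max_steps` without a solution, so callers should treat `None` as "not found yet" and rerun with another seed or more steps, which is the role the outer loop plays.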

Stochastic Learning: example

Specification: (x ≤ f(x,y)) & (y ≤ f(x,y)) & (f(x,y) = x | f(x,y) = y)
Set E: all expressions built from x, y, 0, 1, comparison, +, If-Then-Else

• Suppose n = 6; there are 768 expressions of size 6.
• e = ITE(x ≤ 0, y, x) is picked with probability 1/768.
• The condition x ≤ 0 is mutated to y ≤ 0 with probability 1/6 × 1/48, giving e′ = ITE(y ≤ 0, y, x).
• Suppose the set of concrete examples is {(-1,-4), (-1,3), (-1,2), (1,1), (1,2)}. Then Score(e) = exp(-0.5 × 2) and Score(e′) = exp(-0.5 × 4).
• So e is replaced with e′ with probability Score(e′)/Score(e) = exp(-0.5 × 2).
• Note that for e″ = ITE(x ≤ y, y, x), we have Score(e″) = 1.
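The numbers in this example can be checked mechanically. The Python sketch below counts Wrong and computes Score for e, e′ and e″ on the five examples, together with the acceptance probability min{1, Score(e′)/Score(e)}; the function names are illustrative.

```python
import math

EXAMPLES = [(-1, -4), (-1, 3), (-1, 2), (1, 1), (1, 2)]


def spec(x: int, y: int, fxy: int) -> bool:
    return x <= fxy and y <= fxy and (fxy == x or fxy == y)


def wrong(f) -> int:
    """Wrong(f): number of examples on which the spec fails."""
    return sum(not spec(x, y, f(x, y)) for x, y in EXAMPLES)


def score(f) -> float:
    return math.exp(-0.5 * wrong(f))


def e(x, y):         # e = ITE(x <= 0, y, x)
    return y if x <= 0 else x


def e_prime(x, y):   # e' = ITE(y <= 0, y, x)
    return y if y <= 0 else x


def e_dprime(x, y):  # e'' = ITE(x <= y, y, x)
    return y if x <= y else x


accept = min(1.0, score(e_prime) / score(e))  # probability that e is replaced with e'
```

Here `wrong(e)` is 2, `wrong(e_prime)` is 4 and `wrong(e_dprime)` is 0, so `accept` equals exp(-1) = exp(-0.5 × 2) up to floating-point rounding.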

Benchmarks and Implementation

• Prototype implementations of Enumerative/Symbolic/Stochastic CEGIS
• Benchmarks:
  – Bit-manipulation programs from Hacker’s Delight
  – Integer arithmetic: find max, search in a sorted array
  – Challenge problems such as computing Morton’s number
• Multiple variants of each benchmark, obtained by varying the grammar
• Results are not conclusive, as the implementations are unoptimized, but they offer a first opportunity to compare solution strategies

Evaluation: Integer Benchmarks

[Figure: Relative performance on integer benchmarks (array_search_2.sl to array_search_5.sl, max2.sl, max3.sl): approximate time in seconds, log scale from 0.01 to 1000, for Enumerative, Stochastic (median), and Symbolic]

Evaluation: Hacker’s Delight Benchmarks

[Figure: Relative performance on a sample of Hacker's Delight benchmarks (hd-01 to hd-20, grammar variants d0, d1, d5, e.g. hd-01-d0-prog.sl): approximate time in seconds, log scale from 0.01 to 1000, for Enumerative, Stochastic (median), and Symbolic]

Evaluation Summary

• Enumerative CEGIS has the best performance and solves many benchmarks within seconds
  – Potential problem: synthesis of complex constants
• Symbolic CEGIS is unable to find answers on most benchmarks
  – Caveat: Sketch succeeds on many of these
• The choice of grammar has an impact on synthesis time
  – When E is the set of all possible expressions, solvers struggle
• None of the solvers succeed on some benchmarks
  – Morton constants, search in integer arrays of size > 4
• Bottom line: improving solvers is a great opportunity for research!

Plan for SyGuS-Comp

• Proposed competition of SyGuS solvers at FLoC, July 2014
• Organizers: Alur, Fisman (Penn) and Singh, Solar-Lezama (MIT)
• Website: excape.cis.upenn.edu/Synth-Comp.html
• Mailing list: [email protected]
• Call for participation:
  – Join the discussion to finalize the synth-lib format and the competition format
  – Contribute benchmarks
  – Build a SyGuS solver
