Combining data-driven and symbolic reasoning for Invariant Synthesis in SMT (Work in Progress) Haniel Barbosa Andrew Reynolds Cesare Tinelli Clark Barrett MVD 2018 2018–09–29, Iowa City, IA, USA

Page 1: Combining data-driven and symbolic reasoning for Invariant ...clc.cs.uiowa.edu/mvd18/slides/Barbosa.pdf · Combining data-driven and symbolic reasoning for Invariant Synthesis in

Combining data-driven and symbolic reasoning for Invariant Synthesis in SMT

(Work in Progress)

Haniel Barbosa

Andrew Reynolds

Cesare Tinelli

Clark Barrett

MVD 2018

2018–09–29, Iowa City, IA, USA

SyGuS Solving

CEGIS [Solar-Lezama et al. 2006]

- Most common technique for SyGuS solving

- Specification: x ≤ f(x, y) ∧ y ≤ f(x, y)

- Expression search space:

  - Combinations of x, y, 0, 1, ≤, +, if-then-else

Learning algorithm

Verification oracle

Counterexamples = {}

Candidate: f(x, y) = x

Counterexample: f(x=0, y=1)

Combining data-driven and symbolic reasoning for Invariant Synthesis (WIP) 1 / 16

Counterexamples = {f(x=0, y=1)}

Candidate: f(x, y) = y

Counterexample: f(x=1, y=0)

Counterexamples = {f(x=0, y=1), f(x=1, y=0), f(x=0, y=0), f(x=1, y=1)}

Candidate: ITE(x ≤ y, y, x)

SUCCESS
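
The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the solver's implementation: `CANDIDATES` is a hand-picked stand-in for enumerating the grammar, and the verification oracle is replaced by a brute-force search over a small bounded domain (`DOMAIN`), where an SMT solver would normally be used. All names are hypothetical.

```python
from itertools import product

# Hand-picked candidates ordered by size (assumption: stands in for grammar enumeration).
CANDIDATES = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("x + y", lambda x, y: x + y),
    ("ite(x <= y, y, x)", lambda x, y: y if x <= y else x),
]

# Bounded input domain (assumption: replaces the SMT-based verification oracle).
DOMAIN = list(product(range(-2, 3), repeat=2))


def spec(x, y, out):
    """Specification from the slide: x <= f(x, y) and y <= f(x, y)."""
    return x <= out and y <= out


def learn(counterexamples):
    """Return the first candidate that satisfies the spec on every counterexample."""
    for name, fn in CANDIDATES:
        if all(spec(x, y, fn(x, y)) for x, y in counterexamples):
            return name, fn
    return None


def verify(fn):
    """Return an input violating the spec, or None if none exists in DOMAIN."""
    for x, y in DOMAIN:
        if not spec(x, y, fn(x, y)):
            return (x, y)
    return None


def cegis():
    counterexamples = []
    while True:
        candidate = learn(counterexamples)
        if candidate is None:
            return None, counterexamples
        name, fn = candidate
        cex = verify(fn)
        if cex is None:
            return name, counterexamples  # SUCCESS
        counterexamples.append(cex)


solution, seen = cegis()
print(solution, seen)
```

On this bounded domain the loop proposes x, then y, then ite(x <= y, y, x), mirroring the three steps on the slide, although the concrete counterexamples differ from the slide's.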

Scalability issues

For this bit-vector grammar, enumerating

- Terms of size = 1: 0.05 seconds

- Terms of size = 2: 0.6 seconds

- Terms of size = 3: 48 seconds

- Terms of size = 4: 5.8 hours

- Terms of size = 5: ??? (100+ days)
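
The growth behind these timings can be illustrated by counting terms per size. The bit-vector grammar from the slide is not reproduced in this transcript, so the sketch below assumes a hypothetical grammar with 3 leaves, 2 unary operators and 4 binary operators, and measures size as the number of nodes in a term. The numbers are illustrative only and do not correspond to the timings above.

```python
from functools import lru_cache

# Hypothetical grammar shape (assumption, not the grammar used in the talk).
LEAVES, UNARY, BINARY = 3, 2, 4


@lru_cache(maxsize=None)
def count_terms(size):
    """Number of distinct terms with exactly `size` nodes."""
    if size < 1:
        return 0
    if size == 1:
        return LEAVES
    # A unary operator applied to a term with size - 1 nodes.
    unary = UNARY * count_terms(size - 1)
    # A binary operator whose two children share the remaining size - 1 nodes.
    binary = BINARY * sum(
        count_terms(k) * count_terms(size - 1 - k) for k in range(1, size - 1)
    )
    return unary + binary


for n in range(1, 6):
    print(n, count_terms(n))
```

Even for this small grammar the count grows multiplicatively with each size step, which is why plain enumeration stops scaling after a few sizes.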

Divide-and-conquer [Alur et al. 2017]

- Generate partial solutions correct on a subset of the input

- Combine using conditionals

Only applicable for plainly separable specifications

A new framework for SyGuS solving

CegisUnif: combining CEGIS with unification

- Not limited to plainly separable specifications

- Data-driven: refinement lemmas generate data points

- Divide-and-conquer: each point yields a new function to synthesize

  - Terms assigned to functions must satisfy refinement lemmas
  - SMT solving provides term candidates through constraint solving

Learning algorithm

Verification oracle

Counterexamples = {f(x=0, y=1), f(x=1, y=0), f(x=0, y=0), f(x=1, y=1)}

Counterexamples = {f_1(x=0, y=1), f_2(x=1, y=0), f_3(x=0, y=0), f_4(x=1, y=1)}
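
The idea of treating each point as its own function and then unifying the per-point terms with a conditional can be sketched as follows. This is a rough illustration, not CVC4's procedure: the finite pools `TERMS` and `CONDS` are assumptions that replace SMT-based candidate generation, and `unify` only tries a single if-then-else split. All names are hypothetical.

```python
# Finite pools of terms and conditions (assumption: stand-ins for SMT-provided candidates).
TERMS = {"x": lambda x, y: x, "y": lambda x, y: y}
CONDS = {"x <= y": lambda x, y: x <= y, "y <= x": lambda x, y: y <= x}


def spec(x, y, out):
    """Specification from the CEGIS slides: x <= f(x, y) and y <= f(x, y)."""
    return x <= out and y <= out


def admissible(point):
    """Names of terms that satisfy the spec at this single point."""
    return {name for name, term in TERMS.items() if spec(*point, term(*point))}


def common_terms(points):
    """Terms admissible at every point of a group."""
    return set.intersection(*(admissible(p) for p in points))


def unify(points):
    """Find one term for all points, or a condition splitting them into two groups."""
    shared = common_terms(points)
    if shared:
        return sorted(shared)[0]
    for cond_name, cond in CONDS.items():
        then_pts = [p for p in points if cond(*p)]
        else_pts = [p for p in points if not cond(*p)]
        if not then_pts or not else_pts:
            continue
        then_terms, else_terms = common_terms(then_pts), common_terms(else_pts)
        if then_terms and else_terms:
            return f"ite({cond_name}, {sorted(then_terms)[0]}, {sorted(else_terms)[0]})"
    return None  # pools too small; a real solver would enlarge them


points = [(0, 1), (1, 0), (0, 0), (1, 1)]
print(unify(points))
```

Here the points (0, 1) and (1, 0) conflict, since no single term in the pool works for both, so a separating condition is needed.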

Feature synthesis

- Symbolic approach: derive a minimal number of features that separate conflicting points (i.e., those that cannot be assigned the same term)

  - Optimal fairness criteria?

    Currently: consider terms of size up to log2(#features)

- Heuristic approach: accumulate a “feature pool” and choose separating features based on the information gain heuristic for decision tree learning

  - Select features that maximize information gain
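
The heuristic approach relies on the standard information gain measure from decision tree learning. A minimal sketch is given below, assuming each data point is labeled with the term assigned to it; the points, labels and feature pool are made up for illustration and are not from the talk.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(points, labels, feature):
    """Entropy reduction obtained by splitting the points on a Boolean feature."""
    yes = [lab for p, lab in zip(points, labels) if feature(*p)]
    no = [lab for p, lab in zip(points, labels) if not feature(*p)]
    total = len(labels)
    remainder = len(yes) / total * entropy(yes) + len(no) / total * entropy(no)
    return entropy(labels) - remainder


# Hypothetical data: points (x, y) labeled with the term assigned to them.
points = [(0, 1), (1, 0), (2, 5), (4, 3)]
labels = ["y", "x", "y", "x"]

# Hypothetical feature pool.
pool = {
    "x <= y": lambda x, y: x <= y,
    "x <= 1": lambda x, y: x <= 1,
}

best = max(pool, key=lambda name: information_gain(points, labels, pool[name]))
print(best)
```

The feature x <= y separates the two labels perfectly, while x <= 1 leaves both groups mixed, so the former is selected.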

Solving Invariant synthesis with CegisUnif

Invariant Synthesis

Add(Int x, y) {
  z := x;
  i := 0;
  assume(y > 0);
  while (i < y) {
    z := z + 1;
    i := i + 1;
  }
  return z;
}

Post-condition: Result is the sum of the inputs

Verification:

z = x ∧ i = 0 ∧ y > 0 → Inv(x, y, z, i)
Inv(x, y, z, i) ∧ i < y ∧ z′ = z + 1 ∧ i′ = i + 1 → Inv(x, y, z′, i′)
Inv(x, y, z, i) ∧ i ≥ y → z = x + y
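
To make the three verification conditions concrete, the sketch below checks a candidate invariant against them by brute force over a small range of integers. The candidate z = x + i ∧ i ≤ y is an assumption chosen for illustration (the slides do not state an invariant), and a bounded check is only a sanity test, not a proof; an SMT solver would discharge the conditions for all integers.

```python
from itertools import product


def candidate_inv(x, y, z, i):
    """Hypothetical candidate invariant: z = x + i and i <= y."""
    return z == x + i and i <= y


def first_violation(inv, bound=4):
    """Return the name of the first violated condition on the bounded domain, or None."""
    values = range(-bound, bound + 1)
    for x, y, z, i in product(values, repeat=4):
        # Initiation: the invariant holds on loop entry.
        if z == x and i == 0 and y > 0 and not inv(x, y, z, i):
            return "initiation"
        # Consecution: the invariant is preserved by one loop iteration.
        if inv(x, y, z, i) and i < y and not inv(x, y, z + 1, i + 1):
            return "consecution"
        # Postcondition: the invariant and the exit condition imply z = x + y.
        if inv(x, y, z, i) and i >= y and z != x + y:
            return "postcondition"
    return None


print(first_violation(candidate_inv))
print(first_violation(lambda x, y, z, i: z == x + i))
```

Dropping the conjunct i ≤ y makes the candidate too weak: it is still inductive, but it no longer implies the post-condition at loop exit.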

Invariant Synthesis in SyGuS

- State of the art: LoopInvGen [Padhi and Millstein 2017]: data-driven loop invariant inference with automatic feature synthesis

  - Precondition inference from sets of “good” and “bad” states

    Feature synthesis for solving conflicts

  - PAC (probably approximately correct) algorithm for building candidate invariants

- “Bad” states are dependent on the model of the initial condition (no guaranteed convergence)

- No support for implication counterexamples

Invariant Synthesis with CegisUnif

- Refinement lemmas allow derivation of three kinds of data points:

  - “good points” (invariant must always hold)
  - “bad points” (invariant can never hold)
  - “implication points” (if invariant holds in first point it must hold in second)

- No need for restriction to one initial state

- Native support for implication counterexamples

- Straightforward usage of classic information gain heuristic to build candidate solutions with decision tree learning

  - SMT solver “resolves” implication counterexample points as “good” and “bad”

  - Out-of-the-box Shannon entropy
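
The effect of resolving implication points into “good” and “bad” can be illustrated with a simple propagation rule: if the first point of an implication is good, the second must be good; if the second is bad, the first must be bad. The sketch below is a stand-in for illustration only. In the approach described on this slide the resolution is done by the SMT solver, and the function name and sample states (tuples x, y, z, i for the Add program) are hypothetical.

```python
def resolve(good, bad, implications):
    """Propagate good/bad labels along implication pairs until a fixed point."""
    good, bad = set(good), set(bad)
    changed = True
    while changed:
        changed = False
        for first, second in implications:
            if first in good and second not in good:
                good.add(second)  # invariant holds at first, so it must hold at second
                changed = True
            if second in bad and first not in bad:
                bad.add(first)  # invariant fails at second, so it must fail at first
                changed = True
    if good & bad:
        raise ValueError("inconsistent data points: no invariant exists")
    return good, bad


# Hypothetical states (x, y, z, i) of the Add program.
good = {(0, 2, 0, 0)}                       # satisfies the initial condition
bad = {(0, 2, 5, 2)}                        # exits the loop with z != x + y
implications = [
    ((0, 2, 0, 0), (0, 2, 1, 1)),           # one loop iteration
    ((0, 2, 1, 1), (0, 2, 2, 2)),
    ((0, 2, 4, 1), (0, 2, 5, 2)),
]

resolved_good, resolved_bad = resolve(good, bad, implications)
print(sorted(resolved_good), sorted(resolved_bad))
```

Once every point carries a plain good or bad label, the standard Shannon entropy applies without modification.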

Preliminary results

Invariant generation for Lustre

- Test suite with 487 invariant synthesis benchmarks generated by the Kind 2 model checker from Lustre models

- We evaluate three configurations of CVC4

  - cegis: regular CEGIS

  - c unif: CegisUnif framework with symbolic solution building

  - c unif-infogain: CegisUnif framework with solution building determined by information gain heuristic

- 1800s timeout

[Figure: CPU time (s) on a log scale (10^-1 to 10^3), x-axis from 50 to 300; series: c unif-infogain, cegis, c unif]

[Figure: scatter plot on log scales (10^-1 to 10^3), cegis (x-axis) vs. c unif (y-axis)]

- + 38 / - 13

[Figure: scatter plot on log scales (10^-1 to 10^3), c unif-infogain (x-axis) vs. c unif (y-axis)]

- + 63 / - 19

[Figure: scatter plot on log scales (10^-1 to 10^3), c unif-infogain (x-axis) vs. cegis (y-axis)]

- + 73 / - 42

Invariants category from SyGuS-Comp 2018

- Test suite with 127 invariant synthesis benchmarks from numerous applications

- We evaluate three configurations of CVC4

  - cegis: regular CEGIS

  - c unif: CegisUnif framework with symbolic solution building

  - c unif-infogain: CegisUnif framework with solution building determined by information gain heuristic

- We also compare against LoopInvGen, the current winner of the invariants category in SyGuS-Comp

- 1800s timeout

[Figure: CPU time (s) on a log scale (10^-1 to 10^3), x-axis from 50 to 120; series: loopinvgen, cegis, c unif, c unif-infogain]

[Figure: three scatter plots on log scales (10^-1 to 10^3): cegis (x-axis) vs. c unif (y-axis); c unif (x-axis) vs. c unif-infogain (y-axis); cegis (x-axis) vs. c unif-infogain (y-axis)]

Future work

- Adapt ICE [Garg et al. 2016] information gain heuristics to our setting; derive new heuristics

- Extend heuristics to function synthesis [Alur et al. 2017]

- Use data to determine “relevant arguments”

  - f1(0, 0, 0, 1, 2, 1, 0) vs. f2(1, 0, 0, 5, 2, 1, 3)

  - Reducing noise: make points as similar as possible

    f′1(1, 0, 0, 1, 2, 1, 0) vs. f′2(1, 0, 0, 5, 2, 1, 0)

  - Only consider relevant arguments when synthesizing features

    Can drastically reduce search space
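
The notion of “relevant arguments” in the example above can be read off by comparing two points position by position. The sketch below only computes where two points differ; how the points are made more similar in the first place is not specified on the slide, and the function name is hypothetical.

```python
def differing_positions(p, q):
    """Indices of the argument positions at which two points disagree."""
    return [k for k, (a, b) in enumerate(zip(p, q)) if a != b]


# Points from the slide, before noise reduction.
f1 = (0, 0, 0, 1, 2, 1, 0)
f2 = (1, 0, 0, 5, 2, 1, 3)

# Points from the slide, after making them as similar as possible.
f1_reduced = (1, 0, 0, 1, 2, 1, 0)
f2_reduced = (1, 0, 0, 5, 2, 1, 0)

print(differing_positions(f1, f2))
print(differing_positions(f1_reduced, f2_reduced))
```

Before noise reduction the points differ in three arguments; afterwards only one argument distinguishes them, so features would only need to mention that argument.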

References

Alur, Rajeev, Arjun Radhakrishna, and Abhishek Udupa (2017). “Scaling Enumerative Program Synthesis via Divide and Conquer”. In: Tools and Algorithms for Construction and Analysis of Systems (TACAS). Ed. by Axel Legay and Tiziana Margaria. Vol. 10205. Lecture Notes in Computer Science, pp. 319–336.

Garg, Pranav et al. (2016). “Learning invariants using decision trees and implication counterexamples”. In: Symposium on Principles of Programming Languages. Ed. by Rastislav Bodík and Rupak Majumdar. ACM, pp. 499–512.

Padhi, Saswat and Todd D. Millstein (2017). “Data-Driven Loop Invariant Inference with Automatic Feature Synthesis”. In: CoRR abs/1707.02029. arXiv: 1707.02029.

Solar-Lezama, Armando et al. (2006). “Combinatorial sketching for finite programs”. In: Architectural Support for Programming Languages and Operating Systems (ASPLOS). Ed. by John Paul Shen and Margaret Martonosi. ACM, pp. 404–415.