Learning Symbolic Interfaces of Software Components Zvonimir Rakamarić.

Page 1

Learning Symbolic Interfaces of Software Components

Zvonimir Rakamarić

Page 2

This Work

Published at the Static Analysis Symposium (SAS) 2012

Joint work with Dimitra Giannakopoulou (NASA) and Vishwanath Raman (CMU/NASA)

Page 3

Introduction

Page 4

Motivating Example

    class Example {
        private static int x = 0;
        private static int y = 0;

        public static void init(int p, int q) { x = p; y = q; }

        public static void a() {
            if (x == 0) y = 10;
            else y = 11;
        }

        public static void b() {
            if (y == 10) assert false;
        }
    }

• init can be called unconditionally

• a can be called unconditionally

• b can be called after init only when y != 10, that is, when init was called with q != 10
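The bullets above can be checked by running the class directly. The sketch below is a minimal, runnable rendering of Example under one assumption: `assert false` is modeled as an explicit `AssertionError`, so the failure fires even when the JVM runs without `-ea`. The helper `fails` is hypothetical, added only to reset the static state and report whether a call sequence errs.

```java
public class MotivatingExample {
    private static int x = 0;
    private static int y = 0;

    public static void init(int p, int q) { x = p; y = q; }

    public static void a() {
        if (x == 0) y = 10;
        else y = 11;
    }

    public static void b() {
        // Stands in for `assert false`; thrown regardless of the -ea flag.
        if (y == 10) throw new AssertionError("b() reached with y == 10");
    }

    // Hypothetical helper: reset the static state, run the calls in order,
    // and report whether any call raised the error.
    static boolean fails(Runnable... calls) {
        x = 0;
        y = 0;
        try {
            for (Runnable call : calls) call.run();
            return false;
        } catch (AssertionError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fails(() -> init(1, 5), MotivatingExample::b));                       // false
        System.out.println(fails(() -> init(1, 10), MotivatingExample::b));                      // true
        System.out.println(fails(() -> init(0, 5), MotivatingExample::a, MotivatingExample::b)); // true
        System.out.println(fails(() -> init(1, 5), MotivatingExample::a, MotivatingExample::b)); // false
    }
}
```

The sequence <init;b> passes or fails depending only on the argument q, and <init;a;b> depending only on p, which is the parameter dependence that a plain method-name alphabet cannot express.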

Page 5

Goal

Learn temporal interfaces of software components
• Legal and illegal sequences of method calls, defined as an automaton

Why?
• Documentation
• Reverse engineering
• Model-based testing
• Regression testing
• Compositional verification
• …

Page 6

Limitations of Prior Approaches

Since method b in Example cannot be called unconditionally after init, prior approaches either treat calling b after init as an error regardless of the values of the parameters it depends on, or expect init to be partitioned manually

Page 7

Our Contribution

class Example { ...}

Page 8

Background

Page 9

Symbolic Execution

Key idea: execute programs on symbolic input values instead of concrete data

Concrete vs. symbolic
• Concrete execution: the program takes only one path, determined by the input values
• Symbolic execution: the program can take any feasible path, which gives coverage!
  • Limited by the power of the constraint solver
  • Scalability issues when faced with a large (exponential) number of paths: path explosion

Page 10

Symbolic Program State

• Symbolic values of program variables
• Path condition (PC)
  • Logical formula over symbolic inputs
  • Accumulates the constraints that inputs have to satisfy for the particular path to be executed
  • If a path is feasible, its PC is satisfiable
• Program location
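The three parts of a symbolic state can be sketched as a small immutable Java record. This is an illustration only, assuming symbolic expressions and path-condition conjuncts are plain strings (a real engine would build solver terms); the record name and its `assign` and `branch` methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SymbolicStateDemo {
    // Variable -> symbolic expression, PC as a list of conjuncts, and a program location.
    record SymbolicState(Map<String, String> values, List<String> pc, int location) {

        // Assignment: copy the value map, update one variable, move to the next location.
        SymbolicState assign(String var, String expr, int next) {
            Map<String, String> v = new TreeMap<>(values);
            v.put(var, expr);
            return new SymbolicState(v, pc, next);
        }

        // Branch: conjoin the branch condition to a copy of the PC.
        SymbolicState branch(String cond, int next) {
            List<String> p = new ArrayList<>(pc);
            p.add(cond);
            return new SymbolicState(values, p, next);
        }

        // An empty PC is printed as "true".
        String pcString() { return pc.isEmpty() ? "true" : String.join(" && ", pc); }
    }

    public static void main(String[] args) {
        SymbolicState s = new SymbolicState(new TreeMap<>(Map.of("x", "X", "y", "Y")), List.of(), 2);
        SymbolicState thenBranch = s.branch("X>Y", 3).assign("x", "X+Y", 4);
        SymbolicState elseBranch = s.branch("X<=Y", 8);
        System.out.println(thenBranch.values() + " PC: " + thenBranch.pcString()); // {x=X+Y, y=Y} PC: X>Y
        System.out.println(elseBranch.values() + " PC: " + elseBranch.pcString()); // {x=X, y=Y} PC: X<=Y
    }
}
```

Each `branch` call produces a new state rather than mutating the old one, so both successors of a conditional can coexist, as they do in a symbolic execution tree.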

Page 11

Symbolic Execution Tree

• Characterizes the execution paths constructed during symbolic execution
• Nodes are symbolic program states
• Edges are labeled with program transitions

Page 12

Example

    1) int x, y;
    2) if (x > y) {
    3)     x = x + y;
    4)     y = x - y;
    5)     x = x - y;
    6)     if (x > y)
    7)         assert false;
    8) }

Page 13

Symbolic execution tree for the example:

    x:X, y:Y, PC: true
    ├─ line 2, true:   x:X, y:Y, PC: X>Y   (SAT)
    │    line 3:       x:X+Y, y:Y, PC: X>Y
    │    line 4:       x:X+Y, y:X, PC: X>Y
    │    line 5:       x:Y, y:X, PC: X>Y
    │    ├─ line 6, true:   x:Y, y:X, PC: X>Y ∧ Y>X    (UNSAT: the assert at line 7 is unreachable)
    │    └─ line 6, false:  x:Y, y:X, PC: X>Y ∧ Y<=X   (SAT)
    └─ line 2, false:  x:X, y:Y, PC: X<=Y  (SAT)
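The SAT/UNSAT labels on the line-6 leaves can be checked mechanically. The sketch is a deliberately naive stand-in for a constraint solver, assuming a search over a small bounded integer range is good enough for illustration: finding a model proves satisfiability, but failing to find one within the bound does not by itself prove UNSAT over all integers (a real solver reasons symbolically). `findModel` is a hypothetical name.

```java
import java.util.Optional;
import java.util.function.BiPredicate;

public class PathConditionCheck {
    // Searches X, Y in [-bound, bound] for values satisfying the path condition.
    static Optional<int[]> findModel(BiPredicate<Integer, Integer> pc, int bound) {
        for (int x = -bound; x <= bound; x++)
            for (int y = -bound; y <= bound; y++)
                if (pc.test(x, y)) return Optional.of(new int[] {x, y});
        return Optional.empty();
    }

    public static void main(String[] args) {
        // After lines 3-5, x holds Y and y holds X, so the test at line 6 becomes Y > X.
        BiPredicate<Integer, Integer> assertPath = (x, y) -> x > y && y > x;  // X>Y && Y>X
        BiPredicate<Integer, Integer> skipPath = (x, y) -> x > y && y <= x;   // X>Y && Y<=X
        System.out.println(findModel(assertPath, 100).isPresent()); // false: no model in range
        System.out.println(findModel(skipPath, 100).isPresent());   // true: a model exists
    }
}
```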

Page 14

Active Automata Learning

D. Angluin, 1987: “Learning Regular Sets from Queries and Counterexamples”

The algorithm is called L*. L* learns an unknown regular language U (over alphabet $\Sigma$) and produces a minimal DFA A such that L(A) = U

The complexity of the original algorithm is $O(|\Sigma| \cdot |A|^3)$

Page 15

Active Automata Learning cont.

The L* learner communicates with a teacher using two types of queries:
• Membership queries: should word w be included in L(A)? Expected answer: yes/no
• Equivalence queries: here is a conjectured DFA A; is L(A) = U? Expected answer: yes, or no plus a counterexample

Page 16

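[Figure: the L* learner asks the teacher membership queries (a word w, answered yes/no) and equivalence queries (a conjectured DFA A, answered yes, or no plus a counterexample); the learner's output is DFA A]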

Page 17

PSYCO Algorithm

Page 18

Interface Learning with L*

L* uses a teacher to answer the following queries:
• Membership queries: does a given sequence of method calls lead to an error in the implementation?
• Equivalence queries: does a conjectured DFA capture all the behaviors of the implementation?

Page 19

Answering Membership Queries


Page 20

Running Example: the Example class from the motivating example, in which init(p, q) sets x = p and y = q, a() sets y to 10 when x == 0 and to 11 otherwise, and b() fails its assertion when y == 10

Page 21

Executing query <init;b>

    start:       p:P, q:Q, PC: true
    after init:  x:P, y:Q, PC: true
    after b:     OK,    PC: Q != 10
                 ERROR, PC: Q == 10

Both an OK path and an ERROR path are feasible, and they differ only in init's parameter q, so the query has no single yes/no answer.
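The reason this query has no yes/no answer can be reproduced concretely. This sketch replaces symbolic execution with brute-force enumeration of init's arguments over a small range (an assumption made only to keep it self-contained) and collects the set of outcomes; seeing both OK and ERROR means the answer depends on the parameters and the symbol needs refinement. `Outcome` and `outcomesInitB` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

public class QueryClassifier {
    enum Outcome { OK, ERROR }

    // Component state, reset before every execution.
    static int x, y;

    static void init(int p, int q) { x = p; y = q; }

    // Returns false where the original b() would fail its assertion.
    static boolean b() { return y != 10; }

    // Executes <init(p, q); b> for all p, q in [lo, hi] and collects the outcomes seen.
    static Set<Outcome> outcomesInitB(int lo, int hi) {
        Set<Outcome> seen = EnumSet.noneOf(Outcome.class);
        for (int p = lo; p <= hi; p++) {
            for (int q = lo; q <= hi; q++) {
                x = 0;
                y = 0;
                init(p, q);
                seen.add(b() ? Outcome.OK : Outcome.ERROR);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(outcomesInitB(0, 20)); // [OK, ERROR]: the answer depends on q
        System.out.println(outcomesInitB(0, 5));  // [OK]: q == 10 is never tried
    }
}
```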


Page 24

Refinement: Split init

    public static void init(int p, int q) { x = p; y = q; }

    // assume c: symbolic-execution statement that discards every path on which c is false
    public static void init_0(int p, int q) { assume q != 10; init(p, q); }
    public static void init_1(int p, int q) { assume q == 10; init(p, q); }

The split follows the two paths of <init;b>: OK under Q != 10 and ERROR under Q == 10.

init_0 := init[q != 10]

init_1 := init[q == 10]
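The split can be viewed as an operation on guarded alphabet symbols: refining a symbol by a condition c yields one symbol guarded by guard ∧ c and one guarded by guard ∧ ¬c. This is a sketch under the assumption that guards over init's two int parameters can be represented as Java predicates; the `GuardedSymbol` record and its `split` method are hypothetical and not taken from the PSYCO implementation.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class GuardedSymbolDemo {
    // A method name plus a guard on its parameters (p, q).
    record GuardedSymbol(String name, BiPredicate<Integer, Integer> guard) {

        // Refine by condition c: name_0 gets guard && c, name_1 gets guard && !c.
        List<GuardedSymbol> split(BiPredicate<Integer, Integer> c) {
            return List.of(
                new GuardedSymbol(name + "_0", guard.and(c)),
                new GuardedSymbol(name + "_1", guard.and(c.negate())));
        }

        boolean admits(int p, int q) { return guard.test(p, q); }
    }

    public static void main(String[] args) {
        GuardedSymbol init = new GuardedSymbol("init", (p, q) -> true);
        List<GuardedSymbol> firstSplit = init.split((p, q) -> q != 10); // init_0, init_1
        GuardedSymbol init0 = firstSplit.get(0);
        System.out.println(init0.name() + " admits (0, 5): " + init0.admits(0, 5));   // true
        System.out.println(init0.name() + " admits (0, 10): " + init0.admits(0, 10)); // false
    }
}
```

Splitting init_0 again with c = (p == 0) yields init_0_0 and init_0_1 with the guards q != 10 && p == 0 and q != 10 && p != 0, matching the second refinement.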

Page 25

Restart Learning

init is replaced by init_0 := init[q != 10] and init_1 := init[q == 10]

New learner alphabet: {init_0, init_1, a, b}

Learning restarts, reusing results from previous iterations

Page 26

Executing query <init_0;a;b>

    public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }

(a and b are unchanged.)

    start:         p:P, q:Q, PC: true
    after init_0:  x:P, y:Q, PC: true
    after a:       x:P, y:10, PC: P = 0    then b: ERROR, PC: P = 0
                   x:P, y:11, PC: P != 0   then b: OK,    PC: P != 0

Both outcomes are again feasible, now distinguished by init's parameter p.


Page 31

Refinement: Split init_0

    public static void init_0(int p, int q) { assume q != 10; x = p; y = q; }

    public static void init_0_0(int p, int q) { assume p == 0 && q != 10; init(p, q); }
    public static void init_0_1(int p, int q) { assume p != 0 && q != 10; init(p, q); }

The split follows the two paths of <init_0;a;b>: ERROR under P = 0 and OK under P != 0.

init_0_0 := init[q != 10 && p == 0]

init_0_1 := init[q != 10 && p != 0]
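After this second split, the query <init_0_0;a;b> should always fail and <init_0_1;a;b> should always succeed, while the coarser <init_0;a;b> is mixed. The check below again substitutes bounded enumeration of the parameters for symbolic execution (an assumption for self-containment), models each `assume` as skipping inputs that violate the guard, and uses hypothetical helper names.

```java
import java.util.function.BiPredicate;

public class RefinedQueryCheck {
    static int x, y;

    static void init(int p, int q) { x = p; y = q; }
    static void a() { if (x == 0) y = 10; else y = 11; }
    static boolean b() { return y != 10; } // false models the failing assertion

    // Returns "OK", "ERROR", or "MIXED" for <init[guard]; a; b> over p, q in [lo, hi].
    static String classify(BiPredicate<Integer, Integer> guard, int lo, int hi) {
        boolean sawOk = false, sawError = false;
        for (int p = lo; p <= hi; p++) {
            for (int q = lo; q <= hi; q++) {
                if (!guard.test(p, q)) continue; // the assume statement prunes this input
                x = 0;
                y = 0;
                init(p, q);
                a();
                if (b()) sawOk = true; else sawError = true;
            }
        }
        if (sawOk && sawError) return "MIXED";
        return sawError ? "ERROR" : "OK";
    }

    public static void main(String[] args) {
        System.out.println(classify((p, q) -> q != 10, -5, 15));           // MIXED (init_0)
        System.out.println(classify((p, q) -> q != 10 && p == 0, -5, 15)); // ERROR (init_0_0)
        System.out.println(classify((p, q) -> q != 10 && p != 0, -5, 15)); // OK    (init_0_1)
    }
}
```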

Page 32

Restart Learning

init_0 is replaced by init_0_0 := init[q != 10 && p == 0] and init_0_1 := init[q != 10 && p != 0]

New learner alphabet: {init_0_0, init_0_1, init_1, a, b}

Learning restarts

Page 33

Answering Equivalence Queries


Page 34

Unbounded Loops in Conjectures

Components have no (unbounded) loops, but conjectures do!

We unroll unbounded loops in conjectures a bounded number of times

Page 35

Answering Equivalence Queries

Walk the conjectured automaton and extract, up to a given depth k:
• all legal method sequences
• all illegal method sequences; for each illegal sequence of depth n, also extract its legal prefix of depth n - 1

We then use membership queries to check the outcome of each sequence. If the learner has misclassified a sequence, that sequence is a counterexample for L*
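The procedure can be sketched as a bounded breadth-first walk over the conjecture. The sketch assumes a conjecture encoded as a transition map in which a missing transition goes to a hypothetical error sink `ERR`, and a membership oracle given as a predicate that returns true when a sequence is legal in the component. Only legal sequences are extended, each illegal sequence is a legal prefix plus one call, and the depth bound k is what unrolls loops in the conjecture a bounded number of times. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

public class BoundedEquivalence {
    // Hypothetical error sink: any missing transition leads here.
    static final String ERR = "ERR";

    record Item(String state, List<String> word) {}

    // Walks the conjecture up to depth k and returns the first sequence whose
    // classification by the conjecture disagrees with the membership oracle.
    static Optional<List<String>> findCounterexample(Map<String, Map<String, String>> delta,
            String init, List<String> alphabet, int k, Predicate<List<String>> isLegal) {
        Deque<Item> frontier = new ArrayDeque<>();
        frontier.add(new Item(init, List.of()));
        while (!frontier.isEmpty()) {
            Item item = frontier.poll();
            if (item.word().size() == k) continue; // depth bound reached
            for (String sym : alphabet) {
                String next = delta.getOrDefault(item.state(), Map.of()).getOrDefault(sym, ERR);
                List<String> word = new ArrayList<>(item.word());
                word.add(sym);
                boolean conjectureLegal = !next.equals(ERR);
                if (conjectureLegal != isLegal.test(word)) return Optional.of(word);
                // Illegal sequences are not extended: they are a legal prefix plus one call.
                if (conjectureLegal) frontier.add(new Item(next, word));
            }
        }
        return Optional.empty();
    }

    // Membership oracle for Example over {init_0, init_1, b}: y starts at 0,
    // init_1 sets y to 10, init_0 sets it to a value other than 10, b fails when y == 10.
    static boolean exampleOracle(List<String> word) {
        boolean yIs10 = false;
        for (String call : word) {
            switch (call) {
                case "init_0" -> yIs10 = false;
                case "init_1" -> yIs10 = true;
                case "b" -> { if (yIs10) return false; }
                default -> throw new IllegalArgumentException(call);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sigma = List.of("init_0", "init_1", "b");
        Map<String, Map<String, String>> allLegal =
            Map.of("q0", Map.of("init_0", "q0", "init_1", "q0", "b", "q0"));
        Map<String, Map<String, String>> twoState = Map.of(
            "q0", Map.of("init_0", "q0", "init_1", "q1", "b", "q0"),
            "q1", Map.of("init_0", "q0", "init_1", "q1")); // b from q1 goes to ERR
        System.out.println(findCounterexample(allLegal, "q0", sigma, 3, BoundedEquivalence::exampleOracle)); // Optional[[init_1, b]]
        System.out.println(findCounterexample(twoState, "q0", sigma, 4, BoundedEquivalence::exampleOracle)); // Optional.empty
    }
}
```

The one-state conjecture that allows every call yields the counterexample [init_1, b]; the two-state conjecture, which tracks whether the most recent init call was init_1, survives the check up to depth 4.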

Page 36

Running Example: Depth is 2


Page 37

Running Example: Depth is 3

Page 38

Implementation and Experiments

Page 39

Architecture of PSYCO

Page 40

Implementation of PSYCO

Implemented on top of the Java PathFinder (JPF) software model checking infrastructure: http://babelfish.arc.nasa.gov/trac/jpf

PSYCO-related modules:
• jpf-psyco: interface generation for Java classes, including parameters; uses jpf-learn and jpf-jdart
• jpf-learn: implements L*
• jpf-jdart: symbolic execution in JPF (actually DART-style concolic execution)

Page 41

Experiments

    Example           Methods  k-max  k-min  Conjectures  Refinements  Alphabet  States
    SIGNATURE               5      7      2            2            0         5       4
    PIPEDOUTPUTSTREAM       4      7      2            2            1         5       3
    INTMATH                 8      1      1            1            7        16       3
    ALTBIT                  2     27      4            8            3         5       5
    CEV-FLIGHTRULE          3      3      3            3            2         5       3
    CEV                    18      3      3           10            6        24       9

k-max is the maximum exploration depth reached within one hour; k-min is the depth at which we obtained the expected interface

Automata do not change between k-min and k-max, and are k-max-full

Page 42

Summary

Page 43

Summary

• Combined automata learning and symbolic techniques for temporal interface generation
• Generated richer interfaces with symbolic method guards
• Implemented a prototype tool in Java PathFinder
• Works well on realistic examples
• Equivalence queries are a potential bottleneck

Page 44

Our Contribution cont.

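We learn 3-valued deterministic finite automata

[Figure: example 3-valued DFA with states INITIAL, ERROR, and DON’T KNOW; transitions are labeled with the guarded methods mod(p, q)[q > 0 && p >= 0], mod(p, q)[q <= 0 || p < 0], div(p, q)[q == 0], and div(p, q)[q != 0]]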

Page 45

Using 3-Valued DFA

[Figure: the same 3-valued DFA (states INITIAL and ERROR, the guarded mod and div transitions)]

Underlying solver returns “Don’t Know”

Page 46

Using 3-Valued DFA cont.

[Figure: the same 3-valued DFA with states INITIAL, ERROR, and DON’T KNOW]

Page 47

Definition of k-full Interface

• An interface is k-safe if all legal sequences in the automaton up to depth k are also legal executions of the component
• An interface is k-permissive if all illegal sequences in the automaton up to depth k also lead to errors in the component
• An interface is k-tight if all sequences up to depth k that lead to the don’t-know state in the automaton cannot be resolved in the component
• An interface that is k-safe, k-permissive, and k-tight is k-full
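Taken together, the three conditions require the interface and the component to agree on every sequence up to depth k. A minimal sketch, assuming both are given as functions from a call sequence to one of three hypothetical verdicts (LEGAL, ILLEGAL, DONT_KNOW), where the component's verdict stands for what the teacher can establish and DONT_KNOW means the solver could not decide. It enumerates all sequences of length at most k, which is exponential in k and only meant to make the definitions concrete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class KFullCheck {
    enum Verdict { LEGAL, ILLEGAL, DONT_KNOW }

    record Result(boolean safe, boolean permissive, boolean tight) {
        boolean full() { return safe && permissive && tight; }
    }

    // Compares interface and component verdicts on every sequence of length 0..k.
    static Result check(Function<List<String>, Verdict> iface,
                        Function<List<String>, Verdict> component,
                        List<String> alphabet, int k) {
        boolean safe = true, permissive = true, tight = true;
        List<List<String>> layer = new ArrayList<>();
        layer.add(new ArrayList<>());
        for (int depth = 0; depth <= k; depth++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> w : layer) {
                Verdict i = iface.apply(w);
                Verdict c = component.apply(w);
                if (i == Verdict.LEGAL && c != Verdict.LEGAL) safe = false;           // k-safe
                if (i == Verdict.ILLEGAL && c != Verdict.ILLEGAL) permissive = false; // k-permissive
                if (i == Verdict.DONT_KNOW && c != Verdict.DONT_KNOW) tight = false;  // k-tight
                if (depth < k) {
                    for (String s : alphabet) {
                        List<String> ext = new ArrayList<>(w);
                        ext.add(s);
                        next.add(ext);
                    }
                }
            }
            layer = next;
        }
        return new Result(safe, permissive, tight);
    }

    // Hypothetical component model: Example over {init_0, init_1, b};
    // b fails exactly when the most recent init call was init_1.
    static Verdict exampleComponent(List<String> w) {
        boolean yIs10 = false;
        for (String call : w) {
            if (call.equals("init_0")) yIs10 = false;
            else if (call.equals("init_1")) yIs10 = true;
            else if (yIs10) return Verdict.ILLEGAL; // the call is b
        }
        return Verdict.LEGAL;
    }

    public static void main(String[] args) {
        List<String> sigma = List.of("init_0", "init_1", "b");
        // An interface that matches the component exactly is k-full.
        System.out.println(check(KFullCheck::exampleComponent, KFullCheck::exampleComponent, sigma, 3).full()); // true
        // An interface that answers DONT_KNOW where the component has a definite error.
        Function<List<String>, Verdict> vague =
            w -> exampleComponent(w) == Verdict.ILLEGAL ? Verdict.DONT_KNOW : Verdict.LEGAL;
        System.out.println(check(vague, KFullCheck::exampleComponent, sigma, 3)); // Result[safe=true, permissive=true, tight=false]
    }
}
```

The second interface reports DONT_KNOW where the component has a definite error, so it is k-safe and k-permissive but not k-tight.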

Page 48

Guarantees of PSYCO Algorithm

Theorem: If the behavior of a component C can be characterized by an interface DFA, then PSYCO terminates with a k-full interface for C.
• The proof is in the SAS paper
• Assumes no unbounded loops/recursion in components
• Assumes no “mixed parameters”