Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink...

43
Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion, Israel
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    224
  • download

    2

Transcript of Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink...

Page 1: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Static Specification Mining Using Automata-Based Abstractions

Sharon Shoham Eran Yahav Stephen Fink

IBM T.J. Watson Research Center

Marco Pistoia

Technion, Israel

Page 2: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Finding What’s There(but is hard to find)

Sharon Shoham Eran Yahav Stephen Fink

IBM T.J. Watson Research Center

Marco Pistoia

Technion, Israel

Page 3: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Component APIs Are Complicated

There is only one thing more painful than learning fromexperience and that is not learning from experience.– Archibald MacLeish

Page 4: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Temporal API Specifications

Legal interactions with a componentWhat methods could be called at every state

finishConnect read,write

finishConnectread,write

close

close

0 1 2 3 4 5

config connect

java.nio.channels.SocketChannel (partial spec)

Page 5: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Applications

Program understanding

Regression Deviant behaviorsSpecs for verification…

Page 6: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Mining Temporal Specifications

Component-side mining Infer usage from component implementation Relies on error conditions in component implementation

Client-side mining Infer usage from existing clients using the component

Real usage scenarios << Permitted scenarios

connect; close;

close

connect; write; write; close;

connect; write; close;

connect; read; close;

Page 7: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Dynamic vs. Static Specification Mining

Dynamic Mine specification from representative executions Requires running the program (with varying inputs) Incomplete coverage of behaviors

Static Cover all client behaviors Challenging

Our approach Static client-side specification mining Bad news: this is hard Good news: we can still make it work

Page 8: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Example

How do I use a java.nio.channels.SocketChannel?

Page 9: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

void example() { Collection<SocketChannel> chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { /* ... wait for connection ... */ } if (?) { receive(sc); } else { send(sc); } } closeAll(channels); }

Collection<SocketChannel> createChannels() { List<SocketChannel> list = new

LinkedList<SocketChannel>(); list.add(createChannel(“ ", 80)); //… more channels added to list … return list; }

SocketChannel createChannel (String hostName, int port) {

SocketChannel sc = SocketChannel.open(); sc.configureBlocking(false); return sc; }

Page 10: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

void receive(SocketChannel x) { //… FileOutputStream fos = new …; ByteBuffer dst = …; int numBytesRead = 0; while (numBytesRead >= 0) { numBytesRead = x.read(dst); fos.write(dst.array()); } fos.close(); }

void send(SocketChannel x) { for (?) { … int numWritten = x.write(buf); } }

void example() { Collection<SocketChannel> chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { /* ... wait for connection ... */ } if (?) { receive(sc); } else { send(sc); } } closeAll(channels); }

void closeAll (Collection<SocketChannel> chnls) {

for (SocketChannel sc : chnls) { sc.close(); } }

Bad News

Interprocedural Flow

Flow Sensitivity

Context Sensitivity

Non-trivial aliasing

Page 11: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

void example() { Collection<SocketChannel>

chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { …

} if (?) { receive(sc); }

else { send(sc); } } closeAll(channels); }

SocketChannel createChannel (…)

{ SocketChannel sc =

SocketChannel.open(); sc.configureBlocking(false); return sc; }

void receive(SocketChannel x) { … while (numBytesRead >= 0) { numBytesRead = x.read(dst); fos.write(dst.array()); } … }}

sc=open()

cfgsc.cfg

cnccfgsc.cnc

fincnccfgsc.fin

… …

fincnccfg … finsc.fin

fincnccfg…

fin rdx.read

fincnccfg…

fin rd … rdx.read

… …

Page 12: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

finishConnect

SocketChannel Specification

read,write

finishConnectread,write

close

close

0 1 2 3 4 5config connect

(Partial specification)

Page 13: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Challenges

Dynamically allocated objects

unbounded number of objects

aliasing

objects flow through complex heap-allocated data structures

heap abstraction

Unbounded length of event sequences

event sequence observed for an object might be unbounded

event sequence abstraction

Noise

analysis imprecision and/or incorrect client programs

Noise reduction

Page 14: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Overview

Page 15: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Abstract Trace Collection

Abstract InterpretationAbstract value

• Heap abstraction: abstracts unbounded heap• Trace abstraction: abstracts unbounded sequences of

operations

Initial heap abstraction partition the heap into a fixed partition (based on allocation

site)

AS2

AS2

AS3

AS3

AS3

AS1

AS2

AS1

fincnccfg fin fincnccfg cnccfgfincnccfg …

Page 16: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

void example() { Collection<SocketChannel>

chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { …

} if (?) { receive(sc); }

else { send(sc); } } closeAll(channels); }

SocketChannel createChannel (…)

{ SocketChannel sc =

SocketChannel.open(); // AS1 sc.configureBlocking(false); return sc; }

void receive(SocketChannel x) { … while (numBytesRead >= 0) { numBytesRead = x.read(dst); fos.write(dst.array()); } … }}

sc=open() <AS1, >

sc.cfgcfg

<AS1, > <AS1, >

<AS1, >cnccfg

sc.cnccnc

<AS1, >

<AS1, >

… …

[write, connect,close, finCon,config, read]

[write, connect,close, finCon,read]

Page 17: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Heap data for an “abstract object” o• unique = true

• abstract value represents a single object

• must = {x.f}• the access path x.f must point to o

• mustNot = {y.g}• the access path y.g must not to point to o

• …

Must points-to information allows strong updates

<AS1, must: { sc }, >

Refined Heap Abstraction

sc.cfgcfg

<AS1, must : { sc }, > <AS1, >

sc=open()

Page 18: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

History Abstraction

Abstract history Automaton over-approximating unbounded event

sequences

Quotient-based abstractions for history Automata states which are equivalent w.r.t. a given

equivalence relation R are merged

<allocated at AS1, >fincnccfg

<o1, >fincnccfg

<o2, >fincnccfg

<allocated at ASk, >fincnccfg…

…<AS1, >fincnccfg

…fin read

?

Page 19: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

History Abstraction

Past-Future Abstraction (q1,q2) R[k1,k2] if q1 and q2 share both an incoming sequence of length k1 and an outgoing sequence of length k2

aa

ca

aa

bc

ca

b c

ca

bc

ca

bc

Past 1 Examples Future 1 Example

Page 20: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Abstract Semantics Initial abstract history

empty sequence automatonWhen an API method is invoked

history extended: append event and construct quotient

fincnccfgwhile (!sc.finCon) {

finfincnccfg

Past 1 equivalent} //endof while fincnccfg fincnccfg

fin

fin

fincnccfg

cfg

cnccfg

sc = open

sc.config

sc.connect

Page 21: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Are We Done?

Bounded is great, but not enough

Merge histories at control flow join points Speed up convergence

Merge all histories that have identical heap-data, and

satisfy a given merge criterion

Merge: union construction followed by quotient construction

a ab

x.a()

x.b()

if(?)

Page 22: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

endof while fincnccfg fincnccfg

fin

quo

fincnccfgfin

Example: Past Abstraction with Exterior Merge

union

fincnccfg

fincnccfg

fin

Page 23: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Recap: Abstraction Dimensions

0

[write, connect,close, finCon,config, read]

1

[write, connect,close, finCon,read]

0config

1

[write, connect,close, finCon,read]

0 1

3 42

5

6

config

close

connect

fincon

connect read

write

read

write

connect

connectclose

close

0

1

2

[write, connect,config, finCon,read]

[write, connect,finCon, read]

close

close close

Base APFocus (refined heap abstraction)

Pas

t /

Tot

alP

ast

/ E

xter

ior

close

Third dimension: different history abstraction, not shown here

Heap Abstraction

Merge Criteria

History Abstraction

0

[write, connect,close, finCon,config, read]

1

[write, connect,close, finCon,read]

0 1

3 4

2

5

6

config

close

connect

fincon

connectread

write

read

write

connect

connectclose

close

close

Page 24: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Summarization Phase: Noise

Analysis imprecisionBugs in training corpus

initV

Trace collection results

signup0 1 2 3

upinitS

initS

verifyup0 1’ 2’ 3’

upinitV

initV

verifyup0 1’ 2’ 3’

up

initV

n

k

1

verify

Real API

sign

up

initV

initS

up

verify

java.security.Signature

Page 25: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Naïve UnionNaïve Union

signup

0

1 2 3

up

initS initS

verifyup1’ 2’ 3’

initVinitV

upinitV

Trace collection results

signup0 1 2 3

upinitS

initS

verifyup0 1’ 2’ 3’

upinitV

initV

verifyup0 1’ 2’ 3’

up

initV

n

k

1

verify

verify

No noise reductionSound summary

Page 26: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Weighted UnionWeighted Union

initV

Trace collection results

signup0 1 2 3

upinitS

initS

verifyup0 1’ 2’ 3’

upinitV

initV

verifyup0 1’ 2’ 3’

up

initV

n

k

1

verify

Label each transition with number of input automata that contain it Transitions with weight < threshold are removed

signup

0

1 2 3

up

initS initS

verifyup1’ 2’ 3’

initV

up

n

k+1

initV

verify

1

Page 27: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

ClusteringClustering

signup0 1 2 3

initS

initS

initV verifyup0 1’ 2’ 3’

up

initV

1initS

n

k

1

initV

signup0 1 2 3

upinitS

initS

verifyup0 1’ 2’ 3’

up

initV

initV

verifyup0 1’ 2’ 3’

up

initV

1initS

n

k

1

Automata partitioned into clusters of “similar” automata, each cluster summarized separatelySimilarity = language inclusion

Trace collection results

Page 28: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Experimental Results

Mined various APIs from a suite of benchmarks APIs from Java libraries

• java.security.Signature, java.security.KeyAgreement,…

• Ganymed • Session, Connection, ConnectionManager,…

• FlickrAPI• Photo, Auth, …

Page 29: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

java.security.SignatureBase/Past/Total

Base/Past/Exterior

APFocus/Past/Exterior

Page 30: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Ganymed Session

Base/Past/Exterior

APFocus/Past/Exterior

(all results here are actual images produced by the tool)

Page 31: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Lessons from Experiments

Precise heap abstractions AND history abstractions needed

Pragmatics Summaries other than union do not guarantee an over-

approximation of behaviors, but still useful with timeout, trace collection result is not an over-

approximation, but still useful

Limitations Too detailed results (print, println) Scalability remains a challenge Single object vs. multiple objects specs

Page 32: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Summary

Client-side specification mining

based on flow-sensitive, context-sensitive abstract interpretation

combined domain abstracting both aliasing and event sequences

Novel family of abstractions to represent unbounded event sequences

Novel summarization algorithms

Preliminary experimental results

Page 33: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,
Page 34: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Invited Questions

1) How do you get the API in the motivation slide from the example program you showed?

2) Can you give an example of the effect of past vs. future?

3) I didn’t get merge, can you show another example?

4) Can you say when the results are precise? 5) Can you say something more about experiment

al results?6) Related Work?

Page 35: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

API in motivation slide vs. one from example

finConread,write

finConread,write

close

close

0 1 2 3 4 5config connect

0 1

3 42

5

6

config

close

connect

fincon

connect read

write

read

write

connect

connect

close

close

close

• Elements in list not known to be unique

• connect can be repeated• close can be repeated

• Read and write never happen together

•Thus kept in separate parts of the automaton

• This is not a bad result for an automated tool (and a single! example program)

• All these would be “washed away” with a sufficient number of other examples

Page 36: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Example: Past Abstraction with Exterior Merge

fincnccfgfin

then: while loop: x.read else: while loop: x.write

fincnccfgfin

rd fincnccfgfin

wr

fincnccfgfin

rdrd

fincnccfgfin

wrwr

if(?)

endof for

fincnccfgfin

rdrd

fincnccfgfin

wrwr

fincnccfgfin

No merge !

Page 37: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

endof for

fincnccfgfin

rdfincnccfgfin rd

wrfincnccfgfin wr

merge

fincnccfg

finrd

rd

wrfin

wr

fin

Example: Future Abstraction with Exterior Merge

Page 38: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

cnc

SocketChannel Specification

fincnccfgfin

wr

wr

rd

rd

cnc

cnc

cl

clcl

cl

cl

Past

fincnccfg

finrd

rd

wrfin

wr

fin

cl

wr

rd

fin

cfg

Future

In this example, different automata, but same language

Page 39: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Merge Criteria

a b

Total Merge

a

Exterior Merge

ab

b

a

ab

a

union union

a

ab

quo

a

b

quo

(past 1 history abstraction)

Page 40: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Can you say when the results are precise?

• when there exists an automaton such that the equivalence relation that we choose uniquely characterizes each states

Page 41: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Experimental Results

API stat

es

edg

es

den

s

stat

es

edg

es

den

s

stat

es

edg

es

den

s

stat

es

edg

es

den

s

stat

es

edg

es

den

s

stat

es

edg

es

den

s

Auth 2 3 1.50 2 3 1.5 2 3 1.5 2 2 1.00 2 2 1.00 2 2 1.00Channel 2 6 3.00 3 6 2.00 3 6 2.00 3 3 1.00 3 3 1.00 3 3 1.00ChannelMgr 2 11 5.50 5 18 3.60 6 19 3.17 4 7 1.75 5 9 1.80 5 9 1.80Cipher 1 5 5.00 4 14 3.50 6 12 2.00 7 10 1.43 7 10 1.43 7 10 1.43Connection 3 12 4.00 4 12 3.00 4 12 3.00 5 7 1.40 5 7 1.40 5 7 1.40KeyAgreement 2 5 2.50 4 6 1.50 4 6 1.5 4 3 0.75 4 3 0.75 4 3 0.8LineAndShape 3 12 4.00 6 15 2.50 6 15 2.50 6 8 1.33 6 8 1.33 6 8 1.33MsgDigest 1 2 2.00 2 2 1.00 2 2 1.00 2 2 1.00 2 2 1.00 2 2 1.00Photo 1 12 12.00 1 12 12.00 1 8 8.00 8 8 1.00 8 8 1.00 8 8 1.00PrintWriter 1 3 3.00 2 3 1.50 2 3 1.50 6 11 1.83 3 5 1.67 3 5 1.67Session 2 7 3.50 5 10 2.00 5 10 2.00 5 4 0.80 5 4 0.80 5 4 0.80Signature 2 8 4.00 5 12 2.40 5 12 2.40 4 6 1.50 4 6 1.50 4 6 1.50TransportMgr 9 24 2.67 2 19 9.50 8 27 3.38 9 26 2.89 9 24 2.67 9 24 2.67URLConnection 2 9 4.50 4 10 2.5 3 6 2 4 7 1.75 NA NA NA NA

Average 4.08 3.46 2.57 1.39 1.33 1.33Std dev 2.54 3.22 1.71 0.56 0.52 0.52

APF/Past/Ext APF/Fut/ExtBase/Past/Tot Base/Past/Ext Base/Fut/Ext APF/Past/Tot

Page 42: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

Japanese Toilet API

The two buttons linked together (next to the floating woman) are given the group label (well, "bottom" or "posterior"), one with the word "mild" and the other with "powerful." The icon on each button indicates a water jet.

I can't see the third character labeling the jog shuttle, but that appears to be a "flow" control for a water jet - not sure though.

There are several opportunities for mode errors here which (I hope) are mitigated by the LCD display: the button above the jog shuttle labeled "wide jet" is toggled on/off, and the "dryer" button cycles though three strengths. My experience with toilet UI (although not great) indicates that mode errors are a problem though. If that jet feels rather, er, surprising, a lack of mode data makes you reluctant to try to alter it...

Page 43: Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink IBM T.J. Watson Research Center Marco Pistoia Technion,

(Some) Related Work

Dynamic DAIKON (…) Perracota (ICSE06) DIDUCE (ICSE02) Strauss (Ammons et. al. POPL02) Whaley et. al. (ISSTA02) …

Static JIST (Alur et. al. POPL05) Whaley et. al. (ISSTA02) …