Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink...
-
date post
21-Dec-2015 -
Category
Documents
-
view
224 -
download
2
Transcript of Static Specification Mining Using Automata-Based Abstractions Sharon Shoham Eran Yahav Stephen Fink...
Static Specification Mining Using Automata-Based Abstractions
Sharon Shoham Eran Yahav Stephen Fink
IBM T.J. Watson Research Center
Marco Pistoia
Technion, Israel
Finding What’s There(but is hard to find)
Sharon Shoham Eran Yahav Stephen Fink
IBM T.J. Watson Research Center
Marco Pistoia
Technion, Israel
Component APIs Are Complicated
There is only one thing more painful than learning fromexperience and that is not learning from experience.– Archibald MacLeish
Temporal API Specifications
Legal interactions with a componentWhat methods could be called at every state
finishConnect read,write
finishConnectread,write
close
close
0 1 2 3 4 5
config connect
java.nio.channels.SocketChannel (partial spec)
Applications
Program understanding
Regression Deviant behaviorsSpecs for verification…
Mining Temporal Specifications
Component-side mining Infer usage from component implementation Relies on error conditions in component implementation
Client-side mining Infer usage from existing clients using the component
Real usage scenarios << Permitted scenarios
connect; close;
close
…
connect; write; write; close;
connect; write; close;
connect; read; close;
Dynamic vs. Static Specification Mining
Dynamic Mine specification from representative executions Requires running the program (with varying inputs) Incomplete coverage of behaviors
Static Cover all client behaviors Challenging
Our approach Static client-side specification mining Bad news: this is hard Good news: we can still make it work
Example
How do I use a java.nio.channels.SocketChannel?
void example() { Collection<SocketChannel> chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { /* ... wait for connection ... */ } if (?) { receive(sc); } else { send(sc); } } closeAll(channels); }
Collection<SocketChannel> createChannels() { List<SocketChannel> list = new
LinkedList<SocketChannel>(); list.add(createChannel(“ ", 80)); //… more channels added to list … return list; }
SocketChannel createChannel (String hostName, int port) {
SocketChannel sc = SocketChannel.open(); sc.configureBlocking(false); return sc; }
void receive(SocketChannel x) { //… FileOutputStream fos = new …; ByteBuffer dst = …; int numBytesRead = 0; while (numBytesRead >= 0) { numBytesRead = x.read(dst); fos.write(dst.array()); } fos.close(); }
void send(SocketChannel x) { for (?) { … int numWritten = x.write(buf); } }
void example() { Collection<SocketChannel> chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { /* ... wait for connection ... */ } if (?) { receive(sc); } else { send(sc); } } closeAll(channels); }
void closeAll (Collection<SocketChannel> chnls) {
for (SocketChannel sc : chnls) { sc.close(); } }
Bad News
Interprocedural Flow
Flow Sensitivity
Context Sensitivity
Non-trivial aliasing
void example() { Collection<SocketChannel>
chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { …
} if (?) { receive(sc); }
else { send(sc); } } closeAll(channels); }
SocketChannel createChannel (…)
{ SocketChannel sc =
SocketChannel.open(); sc.configureBlocking(false); return sc; }
void receive(SocketChannel x) { … while (numBytesRead >= 0) { numBytesRead = x.read(dst); fos.write(dst.array()); } … }}
sc=open()
cfgsc.cfg
cnccfgsc.cnc
fincnccfgsc.fin
… …
fincnccfg … finsc.fin
fincnccfg…
fin rdx.read
fincnccfg…
fin rd … rdx.read
… …
finishConnect
SocketChannel Specification
read,write
finishConnectread,write
close
close
0 1 2 3 4 5config connect
(Partial specification)
Challenges
Dynamically allocated objects
unbounded number of objects
aliasing
objects flow through complex heap-allocated data structures
heap abstraction
Unbounded length of event sequences
event sequence observed for an object might be unbounded
event sequence abstraction
Noise
analysis imprecision and/or incorrect client programs
Noise reduction
Overview
Abstract Trace Collection
Abstract InterpretationAbstract value
• Heap abstraction: abstracts unbounded heap• Trace abstraction: abstracts unbounded sequences of
operations
Initial heap abstraction partition the heap into a fixed partition (based on allocation
site)
AS2
AS2
AS3
AS3
AS3
AS1
AS2
AS1
fincnccfg fin fincnccfg cnccfgfincnccfg …
void example() { Collection<SocketChannel>
chnls = createChannels(); for (SocketChannel sc : chnls){ sc.connect(new …); while (!sc.finishConnect()) { …
} if (?) { receive(sc); }
else { send(sc); } } closeAll(channels); }
SocketChannel createChannel (…)
{ SocketChannel sc =
SocketChannel.open(); // AS1 sc.configureBlocking(false); return sc; }
void receive(SocketChannel x) { … while (numBytesRead >= 0) { numBytesRead = x.read(dst); fos.write(dst.array()); } … }}
sc=open() <AS1, >
sc.cfgcfg
<AS1, > <AS1, >
<AS1, >cnccfg
sc.cnccnc
<AS1, >
<AS1, >
… …
[write, connect,close, finCon,config, read]
[write, connect,close, finCon,read]
Heap data for an “abstract object” o• unique = true
• abstract value represents a single object
• must = {x.f}• the access path x.f must point to o
• mustNot = {y.g}• the access path y.g must not to point to o
• …
Must points-to information allows strong updates
<AS1, must: { sc }, >
Refined Heap Abstraction
sc.cfgcfg
<AS1, must : { sc }, > <AS1, >
sc=open()
History Abstraction
Abstract history Automaton over-approximating unbounded event
sequences
Quotient-based abstractions for history Automata states which are equivalent w.r.t. a given
equivalence relation R are merged
<allocated at AS1, >fincnccfg
<o1, >fincnccfg
<o2, >fincnccfg
<allocated at ASk, >fincnccfg…
…<AS1, >fincnccfg
…fin read
?
History Abstraction
Past-Future Abstraction (q1,q2) R[k1,k2] if q1 and q2 share both an incoming sequence of length k1 and an outgoing sequence of length k2
aa
ca
aa
bc
ca
b c
ca
bc
ca
bc
Past 1 Examples Future 1 Example
Abstract Semantics Initial abstract history
empty sequence automatonWhen an API method is invoked
history extended: append event and construct quotient
fincnccfgwhile (!sc.finCon) {
finfincnccfg
Past 1 equivalent} //endof while fincnccfg fincnccfg
fin
fin
fincnccfg
cfg
cnccfg
sc = open
sc.config
sc.connect
Are We Done?
Bounded is great, but not enough
Merge histories at control flow join points Speed up convergence
Merge all histories that have identical heap-data, and
satisfy a given merge criterion
Merge: union construction followed by quotient construction
a ab
…
x.a()
x.b()
if(?)
endof while fincnccfg fincnccfg
fin
quo
fincnccfgfin
Example: Past Abstraction with Exterior Merge
union
fincnccfg
fincnccfg
fin
Recap: Abstraction Dimensions
0
[write, connect,close, finCon,config, read]
1
[write, connect,close, finCon,read]
0config
1
[write, connect,close, finCon,read]
0 1
3 42
5
6
config
close
connect
fincon
connect read
write
read
write
connect
connectclose
close
0
1
2
[write, connect,config, finCon,read]
[write, connect,finCon, read]
close
close close
Base APFocus (refined heap abstraction)
Pas
t /
Tot
alP
ast
/ E
xter
ior
close
Third dimension: different history abstraction, not shown here
Heap Abstraction
Merge Criteria
History Abstraction
0
[write, connect,close, finCon,config, read]
1
[write, connect,close, finCon,read]
0 1
3 4
2
5
6
config
close
connect
fincon
connectread
write
read
write
connect
connectclose
close
close
Summarization Phase: Noise
Analysis imprecisionBugs in training corpus
initV
Trace collection results
signup0 1 2 3
upinitS
initS
verifyup0 1’ 2’ 3’
upinitV
initV
verifyup0 1’ 2’ 3’
up
initV
n
k
1
verify
Real API
sign
up
initV
initS
up
verify
java.security.Signature
Naïve UnionNaïve Union
signup
0
1 2 3
up
initS initS
verifyup1’ 2’ 3’
initVinitV
upinitV
Trace collection results
signup0 1 2 3
upinitS
initS
verifyup0 1’ 2’ 3’
upinitV
initV
verifyup0 1’ 2’ 3’
up
initV
n
k
1
verify
verify
No noise reductionSound summary
Weighted UnionWeighted Union
initV
Trace collection results
signup0 1 2 3
upinitS
initS
verifyup0 1’ 2’ 3’
upinitV
initV
verifyup0 1’ 2’ 3’
up
initV
n
k
1
verify
Label each transition with number of input automata that contain it Transitions with weight < threshold are removed
signup
0
1 2 3
up
initS initS
verifyup1’ 2’ 3’
initV
up
n
k+1
initV
verify
1
ClusteringClustering
signup0 1 2 3
initS
initS
initV verifyup0 1’ 2’ 3’
up
initV
1initS
n
k
1
initV
signup0 1 2 3
upinitS
initS
verifyup0 1’ 2’ 3’
up
initV
initV
verifyup0 1’ 2’ 3’
up
initV
1initS
n
k
1
Automata partitioned into clusters of “similar” automata, each cluster summarized separatelySimilarity = language inclusion
Trace collection results
Experimental Results
Mined various APIs from a suite of benchmarks APIs from Java libraries
• java.security.Signature, java.security.KeyAgreement,…
• Ganymed • Session, Connection, ConnectionManager,…
• FlickrAPI• Photo, Auth, …
…
java.security.SignatureBase/Past/Total
Base/Past/Exterior
APFocus/Past/Exterior
Ganymed Session
Base/Past/Exterior
APFocus/Past/Exterior
(all results here are actual images produced by the tool)
Lessons from Experiments
Precise heap abstractions AND history abstractions needed
Pragmatics Summaries other than union do not guarantee an over-
approximation of behaviors, but still useful with timeout, trace collection result is not an over-
approximation, but still useful
Limitations Too detailed results (print, println) Scalability remains a challenge Single object vs. multiple objects specs
Summary
Client-side specification mining
based on flow-sensitive, context-sensitive abstract interpretation
combined domain abstracting both aliasing and event sequences
Novel family of abstractions to represent unbounded event sequences
Novel summarization algorithms
Preliminary experimental results
Invited Questions
1) How do you get the API in the motivation slide from the example program you showed?
2) Can you give an example of the effect of past vs. future?
3) I didn’t get merge, can you show another example?
4) Can you say when the results are precise? 5) Can you say something more about experiment
al results?6) Related Work?
API in motivation slide vs. one from example
finConread,write
finConread,write
close
close
0 1 2 3 4 5config connect
0 1
3 42
5
6
config
close
connect
fincon
connect read
write
read
write
connect
connect
close
close
close
• Elements in list not known to be unique
• connect can be repeated• close can be repeated
• Read and write never happen together
•Thus kept in separate parts of the automaton
• This is not a bad result for an automated tool (and a single! example program)
• All these would be “washed away” with a sufficient number of other examples
Example: Past Abstraction with Exterior Merge
fincnccfgfin
then: while loop: x.read else: while loop: x.write
fincnccfgfin
rd fincnccfgfin
wr
fincnccfgfin
rdrd
fincnccfgfin
wrwr
if(?)
endof for
fincnccfgfin
rdrd
fincnccfgfin
wrwr
fincnccfgfin
No merge !
endof for
fincnccfgfin
rdfincnccfgfin rd
wrfincnccfgfin wr
merge
fincnccfg
finrd
rd
wrfin
wr
fin
Example: Future Abstraction with Exterior Merge
cnc
SocketChannel Specification
fincnccfgfin
wr
wr
rd
rd
cnc
cnc
cl
clcl
cl
cl
Past
fincnccfg
finrd
rd
wrfin
wr
fin
cl
wr
rd
fin
cfg
Future
In this example, different automata, but same language
Merge Criteria
a b
Total Merge
a
Exterior Merge
ab
b
a
ab
a
union union
a
ab
quo
a
b
quo
(past 1 history abstraction)
Can you say when the results are precise?
• when there exists an automaton such that the equivalence relation that we choose uniquely characterizes each states
Experimental Results
API stat
es
edg
es
den
s
stat
es
edg
es
den
s
stat
es
edg
es
den
s
stat
es
edg
es
den
s
stat
es
edg
es
den
s
stat
es
edg
es
den
s
Auth 2 3 1.50 2 3 1.5 2 3 1.5 2 2 1.00 2 2 1.00 2 2 1.00Channel 2 6 3.00 3 6 2.00 3 6 2.00 3 3 1.00 3 3 1.00 3 3 1.00ChannelMgr 2 11 5.50 5 18 3.60 6 19 3.17 4 7 1.75 5 9 1.80 5 9 1.80Cipher 1 5 5.00 4 14 3.50 6 12 2.00 7 10 1.43 7 10 1.43 7 10 1.43Connection 3 12 4.00 4 12 3.00 4 12 3.00 5 7 1.40 5 7 1.40 5 7 1.40KeyAgreement 2 5 2.50 4 6 1.50 4 6 1.5 4 3 0.75 4 3 0.75 4 3 0.8LineAndShape 3 12 4.00 6 15 2.50 6 15 2.50 6 8 1.33 6 8 1.33 6 8 1.33MsgDigest 1 2 2.00 2 2 1.00 2 2 1.00 2 2 1.00 2 2 1.00 2 2 1.00Photo 1 12 12.00 1 12 12.00 1 8 8.00 8 8 1.00 8 8 1.00 8 8 1.00PrintWriter 1 3 3.00 2 3 1.50 2 3 1.50 6 11 1.83 3 5 1.67 3 5 1.67Session 2 7 3.50 5 10 2.00 5 10 2.00 5 4 0.80 5 4 0.80 5 4 0.80Signature 2 8 4.00 5 12 2.40 5 12 2.40 4 6 1.50 4 6 1.50 4 6 1.50TransportMgr 9 24 2.67 2 19 9.50 8 27 3.38 9 26 2.89 9 24 2.67 9 24 2.67URLConnection 2 9 4.50 4 10 2.5 3 6 2 4 7 1.75 NA NA NA NA
Average 4.08 3.46 2.57 1.39 1.33 1.33Std dev 2.54 3.22 1.71 0.56 0.52 0.52
APF/Past/Ext APF/Fut/ExtBase/Past/Tot Base/Past/Ext Base/Fut/Ext APF/Past/Tot
Japanese Toilet API
The two buttons linked together (next to the floating woman) are given the group label (well, "bottom" or "posterior"), one with the word "mild" and the other with "powerful." The icon on each button indicates a water jet.
I can't see the third character labeling the jog shuttle, but that appears to be a "flow" control for a water jet - not sure though.
There are several opportunities for mode errors here which (I hope) are mitigated by the LCD display: the button above the jog shuttle labeled "wide jet" is toggled on/off, and the "dryer" button cycles though three strengths. My experience with toilet UI (although not great) indicates that mode errors are a problem though. If that jet feels rather, er, surprising, a lack of mode data makes you reluctant to try to alter it...
(Some) Related Work
Dynamic DAIKON (…) Perracota (ICSE06) DIDUCE (ICSE02) Strauss (Ammons et. al. POPL02) Whaley et. al. (ISSTA02) …
Static JIST (Alur et. al. POPL05) Whaley et. al. (ISSTA02) …