Mining Specifications
Glenn Ammons, Dept. Computer Science University of Wisconsin
Rastislav Bodik, Computer Science Division University of California, Berkeley
James R. Larus, Microsoft Research
POPL 2002
Motivation
Formal verification is a promising alternative to software testing.
But verifiers will be of little use without enough correctness specifications to verify.
The Assumption
Common behavior is (often) correct behavior.
If we can identify common behavior we can produce correct specifications, even from programs that contain errors.
A Program Using socket API
int s = socket(AF_INET, SOCK_STREAM, 0);
…
bind(s, &serv_addr, sizeof(serv_addr));
…
listen(s, 5);
…
while (1) {
    int ns = accept(s, &addr, &len);
    if (ns < 0) break;
    do {
        read(ns, buffer, 255);
        …
        write(ns, buffer, size);
        if (cond1) return;   /* returns without closing ns or s */
    } while (cond2);
    close(ns);
}
close(s);
An Example Trace
1 socket(domain = 2, type = 1, proto = 0, return = 7)
2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
3 listen(so = 7, backlog = 5, return = 0)
4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
9 close(fd = 8, return = 0)
10 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 10)
11 read(fd = 10, buf = 0x400320, len = 255, return = 13)
12 write(fd = 10, buf = 0x400320, len = 13, return = 13)
13 close(fd = 10, return = 0)
14 close(fd = 7, return = 0)
Design Decisions
1. Learn from traces, not from source.
• Traces contain fewer bugs.
2. Take a “vote” on what the common program behavior is.
• The high-probability core encodes the frequently followed protocol.
Mining System
[Figure: mining pipeline. Program → Tracer → Instrumented program; Run on test inputs → Traces → Flow dependence annotator → Annotated traces → Scenario extractor (with scenario seeds) → Abstract scenario strings → Automaton learner → Specifications]
The (unsolvable) Problem
• I - the set of all traces of interaction with an API or ADT.
• C ⊆ I - the set of all correct traces of interaction.
• T - an unlabelled training set of interaction traces.
Find an automaton A that generates exactly the traces in C.
Restriction 1
• C must be a regular language.
– Model checkers require finite-state specifications.
– Algorithms for learning finite-state automata are relatively well developed.
Interaction Scenarios
Trace of LinkedList(n):
malloc(return = O1) malloc(return = O2) … malloc(return = On) free(p = On) … free(p = O2) free(p = O1)
Grouped by object:
O1 { malloc(return = O1) free(p = O1) }
O2 { malloc(return = O2) free(p = O2) }
…
On { malloc(return = On) free(p = On) }
With each object renamed to Ostd:
O1 { malloc(return = Ostd) free(p = Ostd) }
O2 { malloc(return = Ostd) free(p = Ostd) }
…
On { malloc(return = Ostd) free(p = Ostd) }
The Problem – Take 2
• IS - the set of all interaction scenarios with an API or ADT that manipulate no more than k data objects.
• CS ⊆ IS - the regular set of all correct scenarios.
• TS - an unlabelled training set of interaction scenarios from IS.
Find a finite-state automaton AS that generates exactly the scenarios in CS.
Restriction 2 - Linking TS and CS
Let TS = c0, c1, … be an infinite sequence of elements from CS in which each element of CS occurs at least once.
For each n > 0: c0, c1, …, cn → ASn
If, for some N ≥ 0, ASN generates exactly the scenarios in CS and ASn = ASN for all n ≥ N, then AS0, AS1, … identifies CS in the limit.
The Probabilistic Approach
• IS – as before.
• M – a target PFSA, and PM the distribution over IS that M generates.
“Efficiently” find a PFSA M’ such that its distribution PM’ is an ε-good approximation of PM.
Tracer
1. C stdio replacement (requires recompilation)
2. Executable editing
1 socket(domain = 2, type = 1, proto = 0, return = 7)
2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
3 listen(so = 7, backlog = 5, return = 0)
4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
skeleton: interaction(attribute0, …, attributen)
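The skeleton above suggests that each trace record can be read as an interaction name plus a set of named attributes. A minimal sketch of such a reader follows; the function name `parse_interaction` and the choice to store values as integers are assumptions made for illustration, not part of the original tracer.

```python
import re

def parse_interaction(line):
    """Parse 'name(attr = value, ...)' into (name, {attr: value}).

    An optional leading event number (as printed on the slides) is skipped.
    """
    match = re.fullmatch(r"\s*(?:\d+\s+)?(\w+)\((.*)\)\s*", line)
    if match is None:
        raise ValueError(f"not an interaction: {line!r}")
    name, body = match.groups()
    attrs = {}
    for field in body.split(","):
        field = field.strip()
        if not field:
            continue
        key, value = (part.strip() for part in field.split("=", 1))
        attrs[key] = int(value, 0)  # base 0 accepts both 7 and 0x400120
    return name, attrs

event = parse_interaction(
    "4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)"
)
```

Here `event` holds the name `accept` and a dictionary of its four attributes, which is the shape the later sketches assume.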
Flow Dependence
[Figure: Traces → Dependence analysis → Untyped trace with dependencies → Type inference → Annotated traces]
Dependence Analysis
• Takes a list of attributes that define or use objects (manually created).
• Creates a flow dependence between users and definers.
Definers: socket.return, bind.so, listen.so, accept.return, close.fd
Users: bind.so, listen.so, accept.so, read.fd, write.fd, close.fd
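One way to picture the dependence analysis is sketched below. It assumes that a user depends on the most recent definer of the same value, and that an interaction's uses are resolved before its own definitions; both are simplifying assumptions for illustration, and `flow_dependences` is a hypothetical name.

```python
DEFINERS = {"socket.return", "bind.so", "listen.so", "accept.return", "close.fd"}
USERS = {"bind.so", "listen.so", "accept.so", "read.fd", "write.fd", "close.fd"}

def flow_dependences(trace):
    """trace: list of (name, {attr: value}).

    Returns ((definer_index, definer_attr), (user_index, user_attr)) pairs.
    """
    last_definer = {}  # object value -> (index, "name.attr") that last defined it
    deps = []
    for index, (name, attrs) in enumerate(trace):
        # Resolve uses first, so an attribute that both uses and defines
        # (such as bind.so) depends on the previous definer, not on itself.
        for attr, value in attrs.items():
            key = f"{name}.{attr}"
            if key in USERS and value in last_definer:
                deps.append((last_definer[value], (index, key)))
        for attr, value in attrs.items():
            key = f"{name}.{attr}"
            if key in DEFINERS:
                last_definer[value] = (index, key)
    return deps

trace = [
    ("socket", {"return": 7}),
    ("bind", {"so": 7}),
    ("listen", {"so": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("read", {"fd": 8}),
    ("close", {"fd": 8}),
]
deps = flow_dependences(trace)
```

On this shortened trace the sketch links socket to bind, bind to listen, listen to accept, and the accepted descriptor to both read and close.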
Type Inference
If there exists a flow dependency between two attributes then typing gives these attributes the same type.
Type(socket.return)=T0
Type(bind.so)=T0
Type(listen.so)=T0
Type(accept.so)=T0
Type(accept.return)=T0
Type(read.fd)=T0
Type(write.fd)=T0
Type(close.fd)=T0
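The typing rule above amounts to merging attributes into equivalence classes. A minimal sketch using union-find is shown here; the function name `infer_types` and the input format (pairs of attribute names) are assumptions, and the original tool may implement typing differently.

```python
def infer_types(dependences):
    """dependences: iterable of (attr_a, attr_b) pairs joined by a flow dependence.

    Returns {attr: type_name}; dependent attributes receive the same type.
    """
    parent = {}

    def find(attr):
        parent.setdefault(attr, attr)
        while parent[attr] != attr:
            parent[attr] = parent[parent[attr]]  # path halving
            attr = parent[attr]
        return attr

    for a, b in dependences:
        parent[find(a)] = find(b)  # merge the two classes

    names, types = {}, {}
    for attr in parent:
        root = find(attr)
        names.setdefault(root, f"T{len(names)}")
        types[attr] = names[root]
    return types

types = infer_types([
    ("socket.return", "bind.so"),
    ("bind.so", "listen.so"),
    ("listen.so", "accept.so"),
    ("accept.return", "read.fd"),
    ("accept.return", "write.fd"),
    ("accept.return", "close.fd"),
    ("listen.so", "close.fd"),  # close(fd = 7) links sockets and connections
])
```

Because close.fd is used on both the listening socket and the accepted connection, all eight attributes end up in one class, matching the single type T0 listed above.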
Scenario Extraction
[Figure: Annotated traces and scenario seeds → Extraction → scenarios → Simplification → simplified scenarios → Standardization → abstract scenario strings]
Extraction
• A scenario is a set of interactions related by flow dependences.
1 socket(domain = 2, type = 1, proto = 0, return = 7)
2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
3 listen(so = 7, backlog = 5, return = 0)
4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
9 close(fd = 8, return = 0)
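The scenario shown here is consistent with taking the seed (event 4, as marked on the next slide) together with its ancestors and descendants in the dependence graph. The sketch below works under that assumption; `extract_scenario` is a hypothetical name, the edge list is derived by hand from the example trace, and no bound on scenario size is applied.

```python
def extract_scenario(deps, seed):
    """deps: (definer_event, user_event) edges; seed: the seed event number.

    Returns the sorted events made up of the seed, its ancestors,
    and its descendants.
    """
    def reach(start, edges):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for src, dst in edges:
                if src == node and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    descendants = reach(seed, deps)
    ancestors = reach(seed, [(dst, src) for src, dst in deps])
    return sorted({seed} | ancestors | descendants)

# Hand-derived dependences for the 14-event example trace (assumed).
deps = [
    (1, 2), (2, 3), (3, 4),
    (4, 5), (4, 6), (4, 7), (4, 8), (4, 9),
    (3, 10), (10, 11), (10, 12), (10, 13),
    (3, 14),
]
scenario = extract_scenario(deps, seed=4)
```

With the first accept as seed, the second connection (events 10 to 13) and the final close of the listening socket are left out, because they are neither ancestors nor descendants of event 4.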
Simplification
Eliminate all interaction attributes that do not carry a flow dependence.
1 socket(return = 7)
2 bind(so = 7)
3 listen(so = 7)
4 accept(so = 7, return = 8) [seed]
5 read(fd = 8)
6 write(fd = 8)
7 read(fd = 8)
8 write(fd = 8)
9 close(fd = 8)
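Simplification can be sketched as a filter over attributes. The representation of "carries a flow dependence" as a set of (event index, attribute) pairs is an assumption chosen for illustration, as is the name `simplify`.

```python
def simplify(scenario, dependent):
    """scenario: list of (name, {attr: value}).
    dependent: set of (index, attr) pairs that take part in a flow dependence.

    Returns the scenario with every other attribute removed.
    """
    return [
        (name, {a: v for a, v in attrs.items() if (i, a) in dependent})
        for i, (name, attrs) in enumerate(scenario)
    ]

scenario = [
    ("bind", {"so": 7, "addr": 0x400120, "addr_len": 6, "return": 0}),
    ("accept", {"so": 7, "addr": 0x400200, "addr_len": 0x400240, "return": 8}),
]
dependent = {(0, "so"), (1, "so"), (1, "return")}
simplified = simplify(scenario, dependent)
```

Addresses, lengths, and status codes disappear, leaving only the attributes that tie interactions together.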
Standardization
1 socket(return = x0:T0) (A)
2 bind(so = x0:T0) (B)
3 listen(so = x0:T0) (C)
4 accept(so = x0:T0, return = x1:T0) [seed] (D)
5 read(fd = x1:T0) (E)
7 read(fd = x1:T0) (E)
6 write(fd = x1:T0) (F)
8 write(fd = x1:T0) (F)
9 close(fd = x1:T0) (G)
1. Naming: replaces attribute values with symbolic variables.
2. Reordering: puts interactions in a standard order (events 6 and 7 are swapped above).
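The naming step can be sketched as follows. The sketch assigns variables x0, x1, … in order of first appearance and looks up each attribute's type in a table; the function name `standardize_names` and the input shapes are assumptions, and the reordering step is not covered.

```python
def standardize_names(scenario, type_of):
    """scenario: list of (name, {attr: value}); type_of: {"name.attr": type}.

    Replaces each concrete value with a symbolic variable tagged by its type.
    """
    variables = {}  # concrete value -> symbolic variable name
    result = []
    for name, attrs in scenario:
        renamed = {}
        for attr, value in attrs.items():
            var = variables.setdefault(value, f"x{len(variables)}")
            attr_type = type_of[f"{name}.{attr}"]
            renamed[attr] = f"{var}:{attr_type}"
        result.append((name, renamed))
    return result

type_of = {
    "socket.return": "T0", "bind.so": "T0",
    "accept.so": "T0", "accept.return": "T0", "read.fd": "T0",
}
scenario = [
    ("socket", {"return": 7}),
    ("bind", {"so": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("read", {"fd": 8}),
]
named = standardize_names(scenario, type_of)
```

Two scenarios that differ only in descriptor numbers produce the same output, which is what lets the learner treat them as the same string.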
Automaton Learning
1. An off-the-shelf (OTS) learner learns a PFSA.
2. A corer removes infrequently traversed edges and converts the PFSA into an NFA.
[Figure: example PFSA with start and final states; three edges are labelled 10000 and four edges are labelled 5]
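A much simplified picture of the corer is sketched below: drop edges whose traversal count falls under a threshold, then keep only edges that still lie on some path from the start state to the final state. The fixed threshold, the state names, the edge labels, and the function name `core` are all assumptions; the original corer may use a different criterion.

```python
def core(edges, start, final, threshold):
    """edges: {(src, label, dst): traversal_count}.

    Returns the NFA transitions that survive as a set of (src, label, dst).
    """
    kept = {edge for edge, count in edges.items() if count >= threshold}

    def reachable(origin, forward):
        seen, stack = {origin}, [origin]
        while stack:
            state = stack.pop()
            for src, _label, dst in kept:
                a, b = (src, dst) if forward else (dst, src)
                if a == state and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen

    from_start = reachable(start, forward=True)
    to_final = reachable(final, forward=False)
    return {(s, l, d) for s, l, d in kept if s in from_start and d in to_final}

# Hypothetical PFSA with three hot edges and four cold ones.
pfsa = {
    ("start", "A", "q1"): 10000,
    ("q1", "B", "q2"): 10000,
    ("q2", "C", "final"): 10000,
    ("start", "X", "q3"): 5,
    ("q3", "Y", "q2"): 5,
    ("q1", "Z", "q4"): 5,
    ("q4", "W", "final"): 5,
}
nfa = core(pfsa, "start", "final", threshold=100)
```

Only the three frequently traversed edges remain, so the resulting NFA accepts the common path and rejects the rare detours.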
Specification Automaton for the Socket Protocol
[Figure: automaton whose edges carry the labels below]
socket(return = x)
bind(so = x)
listen(so = x)
accept(so = x, return = y)
read(fd = y)
write(fd = y)
close(fd = x)
close(fd = y)
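To show how such a specification could be used, the sketch below checks a scenario against an automaton built from the edge labels above. The state names and the shape of the transition table are hypothetical, since the figure's layout is not recoverable from this transcript; only the labels come from the slide.

```python
# Hypothetical transition table: (state, label) -> set of next states.
TRANSITIONS = {
    ("s0", "socket(return = x)"): {"s1"},
    ("s1", "bind(so = x)"): {"s2"},
    ("s2", "listen(so = x)"): {"s3"},
    ("s3", "accept(so = x, return = y)"): {"s4"},
    ("s4", "read(fd = y)"): {"s4"},
    ("s4", "write(fd = y)"): {"s4"},
    ("s4", "close(fd = y)"): {"s3"},
    ("s3", "close(fd = x)"): {"s5"},
}

def accepts(transitions, start, finals, word):
    """Simulate an NFA on a sequence of labels by tracking the set of live states."""
    states = {start}
    for symbol in word:
        states = set().union(*(transitions.get((s, symbol), set()) for s in states))
        if not states:
            return False
    return bool(states & finals)

good = [
    "socket(return = x)", "bind(so = x)", "listen(so = x)",
    "accept(so = x, return = y)", "read(fd = y)", "write(fd = y)",
    "close(fd = y)", "close(fd = x)",
]
leaky = good[:6] + ["close(fd = x)"]  # never closes the accepted connection

ok = accepts(TRANSITIONS, "s0", {"s5"}, good)
bad = accepts(TRANSITIONS, "s0", {"s5"}, leaky)
```

Under this assumed table, the well-behaved scenario is accepted and the one that skips close(fd = y) is rejected.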
Experimental Results
• Analyzed traces from programs that use the Xlib and X Toolkit Intrinsics libraries for the X11 windowing system.
• Traces were generated manually.
• Compare mined specification to Interclient Communication Conventions Manual (ICCCM) rules.
Experimental Results
• A small and buggy training set prevented the miner from discovering the rule.
• Solution: an expert chooses correct traces as the training set.
Benefits
• Exploits the massive programmers' effort that is reflected in the code (and nowhere else).
• Offers convenience and insights. It is easier to approve a mined formal specification than to write one.
Conclusion
• Introduced a (semi-)automatic machine-learning approach for discovering formal specifications.
• Reduced the problem to learning regular languages.
• Initial experience is promising.