Recovering Components from Executables [Cooperative Agreement HR0011-12-2-0012] Thomas Reps...
-
Upload
carmel-flynn -
Category
Documents
-
view
216 -
download
1
Transcript of Recovering Components from Executables [Cooperative Agreement HR0011-12-2-0012] Thomas Reps...
Recovering Components from Executables[Cooperative Agreement HR0011-12-2-0012]
Thomas RepsUniversity of Wisconsin
Thomas Reps JungheeLim
EvanDriscoll
VenkateshSrinivasan
TusharSharma
AdityaThakur
DARPA BET IPR 2
Project Goals
• Develop a “redeveloper’s workbench”– Aid in the harvesting of components from an
executable• identify components• make adjustments to components identified• issue queries about a component’s properties
– Queries• type information; function prototypes• side-effect “footprint”• error-triggering properties
DARPA BET IPR 4
Project Activities
• Component identification– generalized concept analysis– object traces
• Component extraction– specialization slicing– inferred invariants for function summaries– partial evaluation (planned)
• Verifying component properties– affine-inequality domain for modular arithmetic– symbolic abstraction (BET + ONR STTR)– format-compatibility checking (ONR)– stretched-Tree-IC3 (planned)
DARPA BET IPR 5
Project Activities
• Component identification– generalized concept analysis– object traces
• Component extraction– specialization slicing– inferred invariants for function summaries– partial evaluation (planned)
• Verifying component properties– affine-inequality domain for modular arithmetic– symbolic abstraction (BET + ONR STTR)– format-compatibility checking (ONR)– stretched-Tree-IC3 (planned)
DARPA BET IPR 6
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
DARPA BET IPR 7
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
Dagstuhl Seminar 2012 8
Component Identification – Attempt 1
• Aim: Identify classes/modules in stripped binaries• Key Idea: Group procedures by the data that they use
and/or modify• Methodology– Formal Concept Analysis (FCA)– procedures versus (GMOD set x GREF set)
VenkateshSrinivasan
9
Concept Analysis
Universe
{humans, whales}
{dolphins, gibbons, humans, whales}
{dolphins, gibbons} {humans, gibbons}
Generalized Concept Analysis
• Semi-lattice attributes instead of Boolean attributes– Values in range [0.0 ... 1.0]
• degree of uncertainty about an attribute’s value– Physical types
• subclass relationships– MOD/REF information
• procedure footprints
10
Footprint Example
// Linked-list operationsint ll_length(struct node* head);void ll_push(struct node** headRef, int newData);int ll_count(struct node* head, int searchFor);int ll_sum(struct node* head);void ll_reverse(struct node** headRef);
// Binary-search-tree operationsint bst_lookup(struct bnode* bnode, int target);struct bnode* bst_newbnode(int data);struct bnode* bst_insert(struct bnode* bnode, int data);struct bnode* bst_build123c();int bst_size(struct bnode* bnode);
11
1. Compile2. Build CodeSurfer/x86 project3. Compute IMOD/IREF (and GMOD/GREF) information4. Run concept analysis
Dagstuhl Seminar 2012 12
IMOD, IREF, GMOD, and GREF sets
• IMOD (IREF): the set of a-locs (abstract locations) that a procedure directly modifies (uses)
• GMOD (GREF): of a procedure is the set of a-locs that the procedure and it callees modify (use)
• Elements include– heap a-locs (heap allocated objects)– global a-locs (global objects) – local a-locs (objects allocated on the stack) by itself or
other procedures
13
Footprint Example (IMOD U IREF)
Access next fields
Access data, left, and right fields
Access data and next fields
Access left and right fields
14
Footprint Example (IMOD x IREF)
Write data and next
Read next;write next
Write next
Read next
Read data and
next
Dagstuhl Seminar 2012 15
Experiments and Results
• Experiments– 3 open source C++ apps– ground truth (classes in
application) obtained from source code
– FCA using GMOD X GREF – each class is matched
against a concept using weighted bipartite matching
• Accuracy = • - set of functions in class • – set of functions in
matched concept
Application Accuracy
TinyXML(Xml parser)
0.37
SmillaEnlarger(Bitmap enlarger)
0.21
PonyProg(Serial device programmer)
0.14
Dagstuhl Seminar 2012 16
Component Identification – Attempt 2
• Switch gears to dynamic analysis– identify classes/modules in stripped binaries– modulo code coverage
• Key idea– common idiom in object-oriented programming:
“this” pointer = 1st argument of methods of a class– record <function, 1st-argument value> pairs– use to classify sets of functions
Dagstuhl Seminar 2012 17
Object Traces
• Instrument binary to trace allocated objects and values of 1st-arguments of methods
• Group methods based on “this”-pointer values• Emit object traces, pairs <A, S> where– A is an object address– S is the sequence of methods that were passed A as the
value of the “this” pointer (1st argument)
Dagstuhl Seminar 2012 18
Challenges
• Re-use of memory– heap-allocated objects
• instrument malloc, free, new, and delete• track lifetimes
– stack-allocated objects• trace stack pointer, activation-record boundaries, call-suffix
• Static methods do not have “this” pointer as 1st argument– track all currently allocated objects and filter out methods that
have an invalid address as first argument
Dagstuhl Seminar 2012 19
Challenges
• False traces– Two or more object
traces appear concatenated because of the reuse of same stack space by compiler for different objects in different scopes
– Locate initializer and finalizer methods to split false traces
void f() {{
Foo f;…
}{
Bar b;…
}}
Stack space reused for f
and b
AR of f()
Dagstuhl Seminar 2012 20
Future Work: Class Hierarchies
• Aim: Identify class boundaries and class hierarchies from object traces
• Key Idea: Exploit order of constructor/ destructor calls for derived class objects
• Challenges:– Constructor/destructor inlining– Finding the right classes for static methods
DARPA BET IPR 21
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
Specialization Slicing
• Problem statement– Ordinary “closure slices” can have mismatches between
call sites and called procedures• different call sites have different subsets of the parameters
– Idea: specialize the called procedures– Challenge: find a minimal solution (minimal duplication)
22MinAung
23
Specialization Slicing
int g1, g2, g3;
void p(int a, int b) {
g1 = a; g2 = b; g3 = g2;}
int main() { g2 = 100; p(g2, 2); p(g2, 3); p(4, g1+g2); printf("%d", g2);}
int g1, g2;
void p(int a, int b) {
g1 = a; g2 = b; }
int main() { p( 2); p(g2, 3); p( g1+g2); printf("%d", g2);}
int g1, g2;
void p1(int b) { g2 = b; }
void p2(int a, int b) {
g1 = a; g2 = b; }
int main() { p1(2); p2(g2, 3); p1(g1+g2); printf("%d", g2);}
(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)
Closure slice Specialized slice
24
m4: g2m3: call p
m21: call printf
m5: 2 m6: g3m7: g2
m23: g2C1m2: g2=100
m8: g1
C2C3
p3: b p7: g3 p8: g2p2: a p9: g1p6: g3=g2
p4: g1=a
m22: “%d”m10: g2
m11: 3 m12: g3m13: g2m14: g1
m16: 4m15: call p
m17: g1+g2 m18: g3m19: g2m20: g1
p1: p
m1: main
m9: call p
p5: g2=b
System Dependence Graph (SDG)
25
C2
(p2, C1)
C1
(m4, ε) (m6, ε) (m8, ε)
(p7, C1) (p9, C1)(p6, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε) (m14, ε)
(p3, C2) (p8, C2)(p2, C2) (p9, C2)
(p4, C2)
C3
(m16, ε) (m18, ε) (m20, ε)
(p7, C3)(p2, C3) (p9, C3)(p6, C3)
(p4, C3)
(m5, ε) (m7, ε)
(p3, C1) (p8, C1)
(m10, ε) (m11, ε)
(m21, ε)(m23, ε)(m22, ε)
(m17, ε)
(p3, C3) (p8, C3)
(m2, ε)
(p1, C2)
(p5, C2)
(m15, ε)(m3, ε)
(p1, C1)
(p5, C1)
(m1, ε)
(m9, ε)(m19, ε)
(p1, C3)
(p5, C3)
Unrolled SDG
26
C2C1 C3
p8
m21m23m22
m17m9
p1
p4 p5p5p3
m3,
m1
p1)
m7
m15
m5 m13 m14
p3 p8p2 p9
m10 m11 m19
Specialized SDG
DARPA BET IPR 27
Specialization slice of a recursive program
(1) int g1, g2;(2)(3) void s(int a,(4) int b){(5)(6) g1 = b;(7) g2 = a;(8) }(9)(10)(11)(12) int r(int k) {(13)(14) if (k > 0) {(15) s(g1, g2);(16) r(k-1);(17) s(g1, g2);(18) }(19)(20) }(21)(22)(23)(24) int main() {(25) g1 = 1;(26) g2 = 2;(27) r(3);(28) printf("%d\n", g1);(29) }
int g1, g2;
void s_1(int b) { g1 = b;}void s_2(int a) { g2 = a;}
void r_1(int k) { if (k > 0) { s_2(g1); r_2(k-1); s_1(g2); }}void r_2(int k) { if (k > 0) { s_1(g2); r_1(k-1); s_2(g1); }}
int main() { g1 = 1; r_1(3); printf("%d\n", g1);}
Calling pattern:
(27) ((16)(16))*
Calling pattern:
(27)(16)((16)(16))*
Specialization Slicing
28
• Problem statement– Ordinary “closure slices” can have mismatches between
call sites and called procedures• different call sites have different subsets of the parameters
– Idea: specialize the called procedures– Challenge: find a minimal solution (minimal duplication)
1. In the worst case, specialization causes an exponential increase in size
2. In practice, observed a 9.4% increase
Specialization Slicing
• Problem statement– Ordinary “closure slices” can have mismatches between call
sites and called procedures• different call sites have different subsets of the parameters
– Idea: specialize the called procedures– Challenge: find a minimal solution (minimal duplication)
• Key insight– minimal solution involves solving a partitioning problem on a
certain infinite graph– problem solvable using PDSs: all node-sets in infinite graph can
be represented via FSMs– algorithm: a few automata-theoretic operations
30
31
Algorithm
Input: SDG S and slicing criterion COutput: An SDG R for the specialized slice of S with respect to C
// Create A6, a minimal reverse-deterministic automaton for the// stack-configuration slice of S with respect to C1 PS = the PDS for S2 A0 = a PS-automaton that accepts C3 A1 = Prestar[PS](A0)4 A2 = reverse(A1)5 A3 = determinize(A2)6 A4 = minimize(A3)7 A5 = reverse(A4)8 A6 = removeEpsilonTransitions(A5)
// Read out SDG R from A6
...
Automata used to hold onto sets of points in possibly infinite graph
Partition obtained by determinizing and
minimizing:Each state = set of calling
contexts for one specialized procedure
32
C2
(p2, C1)
C1
(m4, ε) (m6, ε) (m8, ε)
(p7, C1) (p9, C1)(p6, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε) (m14, ε)
(p3, C2) (p8, C2)(p2, C2) (p9, C2)
(p4, C2)
C3
(m16, ε) (m18, ε) (m20, ε)
(p7, C3)(p2, C3) (p9, C3)(p6, C3)
(p4, C3)
(m5, ε) (m7, ε)
(p3, C1) (p8, C1)
(m10, ε) (m11, ε)
(m21, ε)(m23, ε)(m22, ε)
(m17, ε)
(p3, C3) (p8, C3)
(m2, ε)
(p1, C2)
(p5, C2)
(m15, ε)(m3, ε)
(p1, C1)
(p5, C1)
(m1, ε)
(m9, ε)(m19, ε)
(p1, C3)
(p5, C3)
Unrolled SDG
Each yellow name has the same set of stack configurations {C1,C3}Such sets are infinite for recursive programs => FSMs
33
C2C1 C3
p8
m21m23m22
m17m9
p1
p4 p5p5p3
m3,
m1
p1
m7
m15
m5 m13 m14
p3 p8p2 p9
m10 m11 m19
Specialized SDG
Each yellow name has the same set of stack configurations {C1,C3}Such sets are infinite for recursive programs => FSMs
DARPA BET IPR 34
Feature Removalint add(int a,int b) { q: return a+b;} int mult(int a,int b) { int i = 0; int ans = 0; while(i < a) { c5: ans = add(ans,b); c6: i = add(i,1); } return ans;}
void tally(int& sum,int& prod,int N) { int i = 1; while(i <= N) { c2: sum = add(sum,i); c3: prod = mult(prod,i); c4: i = add(i,1); }} int main() { int sum = 0; int prod = 1; c1: tally(sum,prod,10); printf("%d ",sum); printf("%d ",prod);}
int add(int a,int b) { q: return a+b;} int mult( int b) { int i = 0; int ans = 0;
return;}
void tally(int& sum, int N) { int i = 1; while(i <= N) { c2: sum = add(sum,i); c3: mult( i); c4: i = add(i,1); }} int main() { int sum = 0;
c1: tally(sum, 10); printf("%d ",sum);
}
35
Feature Removal
int g1, g2, g3;
void p(int a, int b) {
g1 = a; g2 = b; g3 = g2;}
int main() { g2 = 100; p(g2, 2); p(g2, 3); p(4, g1+g2); printf("%d", g2);}
int g1, g2, g3;
void p(int a, int b) {
g1 = a; g2 = b; g3 = g2;}
int main() { g2 = 100; p(g2, 2); p(g2, 3); p(4, g1+g2); printf("%d", g2);}
int g1, g2;
void p1(int a) { g1 = a; }
void p2(int b) { g2 = b; g3 = g2; }
int main() { g2 = 100; p1(g2); p2(3); p1(4); }
(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)
Forwardclosure slice
Specialized slice
36
C2
(p2, C1)
C1
(m4, ε) (m6, ε) (m8, ε)
(p7, C1) (p9, C1)(p6, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε) (m14, ε)
(p3, C2) (p8, C2)(p2, C2) (p9, C2)
(p4, C2)
C3
(m16, ε) (m18, ε) (m20, ε)
(p7, C3)(p2, C3) (p9, C3)(p6, C3)
(p4, C3)
(m5, ε) (m7, ε)
(p3, C1) (p8, C1)
(m10, ε) (m11, ε)
(m21, ε)(m23, ε)(m22, ε)
(m17, ε)
(p3, C3) (p8, C3)
(m2, ε)
(p1, C2)
(m15, ε)(m3, ε)
(p1, C1)
(m1, ε)
(m9, ε)(m19, ε)
(p1, C3)
(p5, C3)(p5, C1) (p5, C2)
Unrolled SDG
37
C2
(p2, C1)
C1
(m4, ε) (m6, ε) (m8, ε)
(p7, C1) (p9, C1)(p6, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε) (m14, ε)
(p3, C2) (p8, C2)(p2, C2) (p9, C2)
(p4, C2)
C3
(m16, ε) (m18, ε) (m20, ε)
(p7, C3)(p2, C3) (p9, C3)(p6, C3)
(p4, C3)
(m5, ε) (m7, ε)
(p3, C1) (p8, C1)
(m10, ε) (m11, ε)
(m21, ε)(m23, ε)(m22, ε)
(m17, ε)
(p3, C3) (p8, C3)
(m2, ε)
(p1, C2)
(m15, ε)(m3, ε)
(p1, C1)
(m1, ε)
(m9, ε)(m19, ε)
(p1, C3)
(p5, C3)(p5, C1) (p5, C2)
Complemented Unrolled SDG
38
C2
(p2, C1)
C1
(m4, ε) (m6, ε) (m8, ε)
(p7, C1) (p9, C1)(p6, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε) (m14, ε)
(p3, C2) (p8, C2)(p2, C2) (p9, C2)
(p4, C2)
C3
(m16, ε) (m18, ε) (m20, ε)
(p7, C3)(p2, C3) (p9, C3)(p6, C3)
(p4, C3)
(m5, ε) (m7, ε)
(p3, C1) (p8, C1)
(m10, ε) (m11, ε)
(m21, ε)(m23, ε)(m22, ε)
(m17, ε)
(p3, C3) (p8, C3)
(m2, ε)
(p1, C2)
(m15, ε)(m3, ε)
(p1, C1)
(m1, ε)
(m9, ε)(m19, ε)
(p1, C3)
(p5, C3)(p5, C1) (p5, C2)
Complemented Unrolled SDG
39
C2
(p2, C1)
C1
(m4, ε) (m6, ε) (m8, ε)
(p7, C1) (p9, C1)(p6, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε) (m14, ε)
(p3, C2) (p8, C2)(p2, C2) (p9, C2)
(p4, C2)
C3
(m16, ε) (m18, ε) (m20, ε)
(p7, C3)(p2, C3) (p9, C3)(p6, C3)
(p4, C3)
(m5, ε) (m7, ε)
(p3, C1) (p8, C1)
(m10, ε) (m11, ε)
(m21, ε)(m23, ε)(m22, ε)
(m17, ε)
(p3, C3) (p8, C3)
(m2, ε)
(p1, C2)
(m15, ε)(m3, ε)
(p1, C1)
(m1, ε)
(m9, ε)(m19, ε)
(p1, C3)
(p5, C3)(p5, C1) (p5, C2)
Complemented Unrolled SDG
40
C2
(p2, C1)
C1
(m4, ε) (m8, ε)
(p9, C1)
(p4, C1)
(m12, ε)
(p7, C2)(p6, C2)
(m13, ε)
(p3, C2) (p8, C2)
C3
(m16, ε) (m20, ε)
(p2, C3) (p9, C3)
(p4, C3)
(m11, ε)
(m2, ε)
(p1, C2)
(m15, ε)(m3, ε)
(p1, C1)
(m1, ε)
(m9, ε)
(p1, C3)
(p5, C2)
Complemented Unrolled SDG
41
C2
p2
C1
m4 m8
p9
p4
m12
p7p6
m13)
p3, p8
C3
m16 m20m11
m2
p1
m15m3
p1
m1
m9
p5
Complemented Unrolled SDG
BET 42
Future Work: Partial Evaluator for Machine-Code
• Slicing has limitations– limited semantic information – i.e., just dependence edges– no evaluation/simplification
• Partial evaluation: a framework for specializing programs – software specialization, optimization, etc.
• Binding-time analysis– what patterns are foo and bar called with?
• e.g, { foo(S,S,D,D), foo(S,D,S,D), bar(S,D), bar(D,S) }
– polyvariant binding-time analysis? specialized slicing!
• Design and implement an algorithm for partial evaluation of machine code
JungheeLim
VenkateshSrinivasan
DARPA BET IPR 43
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
DARPA BET IPR 44
Possible-overflow example
char* concat(char* a, char* b){ unsigned size = strlen(a)+strlen(b)+1; char* out = (char*)malloc(size*sizeof(char)); // Possible overflow for(unsigned i = 0; i < strlen(a); i++) { out[i] = a[i]; // Potential memory corruption } for(unsigned i = 0; i < strlen(b); i++) { out[i+strlen(a)] = b[i]; // Potential memory corruption } out[i+strlen(a)] = '\0'; return out;}
BET 46
Bitvector Inequality domain
• Conventional domain for representing inequalities– polyhedra: conjunctions of linear inequalities a1 x1 + a2 x2 + . . . + ak xk ≤ c– operations on polyhedra: linear transformations• unsound for machine arithmetic• machine integers wrap while mathematical integers do not
• Solution: Bitvector Inequality Domain
TusharSharma
DARPA BET IPR 48
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
DARPA BET IPR 49
Symbolic abstraction: Who cares?
• Win, win, win• Easier/faster implementation of analysis tools
– just state concrete (actual!) semantics in logic – supply an abstract domain
• e.g., as a class that meets a specific interface
– obtain analyzer/decision-procedure
• More precise results in abstract interpretation– can identify loop and procedure summaries that are more
precise than ones obtained via conventional techniques
• Applies to interesting, non-standard logics (we think!)– separation logic: memory safety properties
Basic scenario
DARPA BET IPR 48
interfere?
50
In 1977, Cousot & Cousot gave us a beautiful theory of overapproximation
α
Universe of States
x [2,5]y [1,3]
{(x2, y1), (x5, y3)}
{(2,1), (2,2), (2,3), (3,1), (3,2), (3,3), (4,1), (4,2), (4,3), (5,1), (5,2), (5,3)}
γ
α
54
In 2004, Reps, Sagiv, and Yorsh gave us:
Symbolic Abstract Interpretation
Symbolic ConcretizationUniverse of States
60
From “Below” vs. From “Above”
• Reps, Sagiv, and Yorsh 2004: approximation from “below”• Desirable: approximation from “above”
– always have safe over-approximation in hand– can stop algorithm at any time (e.g., if taking too long)– Thakur, A. and Reps, T., A method for symbolic computation of abstract
operations. In Proc. Computer-Aided Verification (CAV), 2012
AdityaThakur
63
Stålmarck’s method (1989)
Dilemma Rule
𝑅∪{𝑣=True \} 𝑅∪{𝑣=False \}
𝑅
𝑅1′ 𝑅2
′
• Split
• Propagate
• Merge
66⊥
⊤
𝑅∪{𝑣=False \}𝑅∪{𝑣=True \}𝐴⊓𝑎1
𝐴
Stålmarck’s method for
Dilemma Rule
𝑅
𝐹 1′ 𝐹 2
′
• Split
• Propagate
• Merge
𝐴⊓𝑎2
𝐴1′ 𝐴2
′
𝛾 (𝑎1 )∪𝛾 (𝑎2 )⊇𝛾 ( 𝐴 )
68
⊤
⊥
Reasoning: Using in an SMT solver
~𝛼↓ (𝜑 )=¿ is unsatisfiable
Property verificationvia model checking:
OK if Unsat(Program Bad)
Dual use:• for abstract interpretation• Unsat/validity checking for pure logical reasoning abstract interpretation in service to logic!
[CAV 2012]
DARPA BET IPR 69
The importance of data structures
• Classic union-find– plus layers– plus least-upper bound
• Given UF1 and UF2, find the coarsest partition that is finer than UF1 and UF2
• Roughly, “confluent, partially-persistent union-find”
Array Separation Logic• Satisfiable examples elem(a1) * arr(a2) elem(a1) * arr_seg(a2,a3) * arr(a3) arr_seg(a1,a2) * arr_seg(a2,a3) * arr(a3)
• Unsatisfiable arr_seg(a1,a2) * arr_seg(a2,a1)
71
a1 a2 a3
Separation Logic Frame Rule
72
{ P } S { Q }{ P * R } S { Q * R}
“Whatever the context is, it remains unchanged”
Local Reasoning:Do not have to describe unchanged context
Property Verification via Model CheckingOK if Unsat(Program Bad)
73
ProgramCounter == 2042 [a1 a2] * list(a2)
BET 74
Extend WALi to use
• Weighted Automaton Library (WALi):– supports context-sensitive
interprocedural analysis– weights = dataflow transformers– weighted version of PDSs (a la
material on specialized slicing)
• More precise results in abstract interpretation
• Easier implementation of analysis tools
JungheeLim
AdityaThakur
BET 75
AlphaHat
• AlphaHat technique in three ways– WALi + AlphaHat (Aditya Thakur and Junghee Lim)
• ~October 2012
– Boogie + AlphaHat for source code (Akash Lal at Microsoft India)• ~November 2012
– Boogie + AlphaHat for machine code (Aditya Thakur and Junghee Lim)• ~November 2012
DARPA BET IPR 76
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
77
Goal: check format compatibility
Producer componen
t
Consumer componen
t
1. Infer output format2. Infer accepted format3. Check compatibility
DARPA BET IPREvan
Driscoll
78
Formats are strings over “types”
ID CM FG FG OS …M TIME
Header of gzip format:
short byte byte word byte byte
DARPA BET IPR
79
Current work: enhance format spec
nrows ncols pix11 pix12 pix13 pix14 pix21 pix22 pix23 …
DARPA BET IPR
80
Current work: enhance format spec
nrows ncols pix11 pix12 pix13 pix14 pix21 pix22 pix23 …
nrows
DARPA BET IPR
81
Current work: enhance format spec
nrows ncols pix11 pix12 pix13 pix14 pix21 pix22 pix23 …
nrows ncols
ncols ncols
nrows
DARPA BET IPR
82
Current work: enhance format spec
nrows ncols pix11 pix12 pix13 pix14 pix21 pix22 pix23 …ncols ncols
nrows
nrows:int ncols:int ((byte byte byte byte)ncols)nrows
Infer an automaton equivalent to:
DARPA BET IPR
84
Roadmap: Compatibility
Producer component
Consumer component
Inferred XFA Inferred XFA ?
DARPA BET IPR
85
Status
Prototype essentially done, but not well-tested. Working on performance and on finding tests.
DARPA BET IPR
86
How we do it
Exponents start as standard Kleene *,and correspond to program loops
nrows:int ncols:int ((byte byte byte)*)*
DARPA BET IPR
87
How we do it
nrows:int ncols:int ((byte byte byte)*)*
We instrument loops with trip countsWe instrument I/O calls to remember values
DARPA BET IPR
88
How we do it
nrows:int ncols:int ((byte byte byte)*)*
We instrument loops with trip countsWe instrument I/O calls to remember values
When two of these are found to always equal, replace the * with an exponent
trip countremembered I/O value
DARPA BET IPR
89
How we do it
nrows:int ncols:int ((byte byte byte)*)nrows
We instrument loops with trip countsWe instrument I/O calls to remember values
When two of these are found to always equal, replace the * with an exponent
trip countremembered I/O value
DARPA BET IPR
90
How we do it
nrows:int ncols:int ((byte byte byte)*)nrows
We instrument loops with trip countsWe instrument I/O calls to remember values
When two of these are found to always equal, replace the * with an exponent
DARPA BET IPR
91
How we do it
nrows:int ncols:int ((byte byte byte)ncols)nrows
We instrument loops with trip countsWe instrument I/O calls to remember values
When two of these are found to always equal, replace the * with an exponent
DARPA BET IPR
92
We use Daikon
Daikon identifies dynamic invariants• Hold over all test runs; might not actually be invariants• Could use statically inferred instead
We wrote our own Daikon front end for machine code• Assumes debugging information
• can we remove this restriction?• Front ends supplied with Daikon not sufficient
• checks only entry-to-exit invariants, whereas we need• loop trip-count instrumentation• I/O-to-loop-exit invariants
• Instruments program using DyninstDARPA BET IPR
93
Instrumentation remembers I/O vals
If value is returned:x = read_int(); x = __io1 = read_int();
If value is “returned” via out parameter:err = read_int(&x); err = read_int(&x);
__io2 = *(&x);
If value is passed by parameter:write_int(x); __io3 = x;
write_int(x);
DARPA BET IPR
96
Instrumentation finds trip counts
On loop entry:Set trip count to 0__trip1 = 0;
Entering loop body:Increment trip count__trip1++;
DARPA BET IPR
97
Instrumentation finds trip counts
On loop entry:Set trip count to 0__trip1 = 0;
Entering loop body:Increment trip count__trip1++;
On loop exit:Output current value of variablesInterested in invariants hereprint(__io1, __io2, …, __trip1);
DARPA BET IPR
98
We use Daikon to find I/O equalities
I/O equalitiesValue trace
Instrumentedprogram
Dakion dynamicinvariant detector
LOOP_EXIT_A__io2 = 2__io4 = 5__trip_count_A = 5
LOOP_EXIT_B__io2 = 6__io4 = 5__trip_count_B = 6
__trip_count_A = __io4 = 5__trip_count_B = __io2 = 6
DARPA BET IPR
99
We model programs as XFAs
XFAs: extended finite automata
Add separate bounded “data state” to standard FAsTransformers on transitions describe data-state changes
DARPA BET IPR
DARPA BET IPR 100
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces• component dependences (planned)
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
BET 101
Stretched-TreeIC3
• IC3 (Bradley et al.)– SAT-based symbolic model checking technique for
hardware specifications– Incrementally refines the property P, eventually producing
either an inductive strengthening of the property or a counter-example trace
BET 102
Stretched-TreeIC3 (Cont.)
• TreeIC3 (Cimatti et al.)– Software model-checking technique that adapts IC3-
inspired ideas to CFG-guided search
Unwinding CFG in an abstract space (with CEGAR)
CFG ART (Abstract Reachability Tree)
BET 103
Stretched-TreeIC3 (Cont.)
• Stretched-TreeIC3 (UW)– An extension of TreeIC3 for machine-code programs– Use trip-count-based invariant generation for better handling of loops
BET 104
Stretched-TreeIC3 (Cont.)
• We use– TSL
• Machine-code analysis framework
– CodeSurfer/x86• To obtain program variables from executables
– Daikon-Dyninst developed by Evan Driscoll• To obtain loop invariants associated with trip-count variables that
count the number of iterations taken through a loop
BET 105
Stretched-TreeIC3
• Finish the base implementation of Stretched-TreeIC3– ~ November 2012
• Integrate Daikon-Dyninst to Stretched-TreeIC3– ~ December 2012
DARPA BET IPR 106
Outline of Talk
• Review of goals• Progress (May-Oct. 2012) and plans (2013)
– Component identification• generalized concept analysis• object traces• component dependences (planned)
– Component extraction• specialization slicing• inferred invariants for function summaries• partial evaluation (planned)
– Verifying component properties• affine-inequality domain for modular arithmetic• symbolic abstraction (BET + ONR STTR)• format-compatibility checking (ONR)• stretched-Tree-IC3 (planned)
• Recap of publications/submissions• Recap of plans for 2013
DARPA BET IPR 107
Recap of publications/submissions1. Thakur, A., Elder, M., and Reps, T., Bilateral algorithms for symbolic abstraction. In Proc. Static
Analysis Symposium (SAS), Sept. 2012. http://research.cs.wisc.edu/wpis/papers/sas12-bilateral-alphahat.pdf
2. Thakur, A. and Reps, T., A generalization of Staalmarck's method. In Proc. Static Analysis Symposium (SAS), Sept. 2012. Extended version: http://research.cs.wisc.edu/wpis/papers/tr1699r.pdf
3. Thakur, A. and Reps, T., A method for symbolic computation of abstract operations. In Proc. Computer-Aided Verification (CAV), July 2012. http://www.cs.wisc.edu/wpis/papers/CAV12-Staalmarck.pdf
4. Driscoll, E., Thakur, A., and Reps, T., OpenNWA: A nested-word-automaton library (tool paper). In Proc. Computer-Aided Verification (CAV), July 2012. http://www.cs.wisc.edu/wpis/papers/CAV12-OpenNWA.pdf
5. Lim, J. and Reps. T., TSL: A system for generating abstract interpreters and its application to machine-code analysis. TR-1775, Computer Sciences Department, University of Wisconsin, Madison, WI, Oct. 2012. (Submitted for journal publication.) http://research.cs.wisc.edu/wpis/papers/tr1775.pdf
6. Aung, M., Horwitz, S., Joiner, R., and Reps, T., Specialization slicing. To be submitted for journal publication, Oct. 2012.
DARPA BET IPR 108
Papers in preparation
• Elder, M., Lim, T., Sharma, T., Andersen, T., and Reps, T., Abstract domains of affine relations. Invited by the SAS 2011 PC for accelerated refereeing for ACM TOPLAS.
• Thakur, A., Lal, A., Lim, J., and Reps, T., Alphahat and all that: Attaining most-precise abstract interpretation.
• Driscoll, E. and Reps, T., An algorithm for format-compatibility checking.
• Sharma, T., Thakur, A., and Reps, T., An abstract domain of bitvector inequalities.
DARPA BET IPR 109
Recap of plans for 2013
• Component identification– object traces class hierarchies
• Component extraction– partial evaluator for machine code
• Verifying component properties
• separation logic• WALi-based and Boogie-based invariant finding
– bitvector-inequality domain– Stretched-TreeIC3