Software Design Rules Monica Lam Stanford ...lam/hm.pdf · Software Design Rules Monica LamMonica...
Transcript of Software Design Rules Monica Lam Stanford ...lam/hm.pdf · Software Design Rules Monica LamMonica...
Software Design RulesSoftware Design RulesSoftware Design Rules
Monica LamMonica LamMonica Lam
Stanford UniversityStanford UniversityStanford University
Joint work with: Sudheendra Hangal, David Heine, Ben Livshits, Michael Martin, John Whaley
1
Software is Full of Errors
Error rate: 1-4.5 errors per 1000 linesWindows 2000
35M LOC, 63000 known bugs at time of release2 per 1000 lines
Large consumer softwareFormal specification & verification infeasible
2
Buffer Overruns
A buffer access must stay within boundsBuffer overruns are responsible for over 50% of major vulnerabilitiesMS Blaster, Slammer, Code red, nimda, …, Internet Worm, 1988Responsible for damages in billions
3
Memory Leaks
All unused memory should be freed once and only onceMemory exhaustion may cause long running programs to fail.
4
SQL Injection Errors
Database
Web serverFront end
user “Give me Bob’s credit card #”or
“Delete all records”
User may not supply SQL queries to databases directlyOne of top ten vulnerabilities
5
Happy-go-lucky SQL Query
User supplies: name, password
SELECT UserID, CreditcardFROM RecordsWHERE
Name = ‘ + name+’ AND PW =‘ + password + ’
6
Fun with SQL
“ — ”: means “the rest are comments” in SQLSELECT UserID, CreditCard
FROM RecordsWHERE:
Name = ‘bob ’ AND PW = ‘foo’Name = ‘bob’— ’ AND PW = ‘x’Name = ‘bob’ or 1=1—’ AND PW = ‘x’Name = ‘bob’; DROP Records—’ AND PW = ‘x’
7
Design Rules
Buffer overruns, memory leaks, SQL injections
Same error is often repeated many times
May be specific to a language, a class of applications, a programImportant: may be critical to securitySuccinct: governs many lines of code
8
General Practice
Design rules are implicitViolations of design rules rampant Tools: purify,
grep, emacs, program environments
9
Leverage Computer Power
Courtesy: Intel
10
Programs Don’t Grow Exponentially!
01020304050
1990 1995 2000 2005
Size of Microsoft Windows
Mill
ion
Line
s of
Cod
e
NT3.195
98NT5.0
2000XP
11
This Talk
New generation of“Computer-Aided Programming Tools”
to enforce critical software design rules
12
Custom Design Rule Checkers
Intrinsa, Coverity: C checkersBuilt-in rules for operating systemsRelatively simple analysis UnsoundFound thousands of critical errors in Windows, Linux, BSD, …
13
New-Generation Tools
Design Rules
AdvancedStatic
Analysis
DynamicDetection
& Recovery
User-Supplied, Application-
Specific
AutomaticExtraction
14
PQL: PQL: a Program Query Languagea Program Query Language
15
PQL: a Program Query Language
Design Rules
AdvancedStatic
Analysis
DynamicDetection
& Recovery
User-Supplied, Application-
Specific
AutomaticExtraction
16
PQL Query
Pattern: Illegal sequences of operations on related objectsLooks like the simplest code excerptwith pattern
Action:Print out result, halt program, or recover
17
SQL Injectionx = req.getParameter();stmt.executeQuery (x);
o1 = req.getParameter(); formal = o1;o2.f = formal;o3.g = o2.f;stmt.executeQuery(o3.g);
getParameter
executeQuery
x
PQL:
18
Basic Question
p1 and p2 point to the same object?Undecidable!
getParameter executeQueryx
p1 p2
19
Dynamic Checker
p1 and p2 point to the same object?Instrument Java byte codesgetParameter: record ID of p1’s pointeeexecuteQuery:
check if p2’s pointee has been recorded
getParameter executeQueryx
p1 p2
20
Static Checker
p1 and p2 point to same object?Pointer alias analysis
getParameter executeQueryx
p1 p2
21
Pointer Alias AnalysisPointer Alias Analysis
[Whaley & Lam, PLDI 2004]
22
Pointer Analysis
h1: v1 = new Object();h2: v2 = new Object();
v1.f = v2;v3 = v1.f;
Input RelationsvPointsTo(v1,h1)vPointsTo(v2,h2)Store(v1,f,v2)Load(v1,f,v3)
Output RelationshPointsTo(h1,f,h2)vPointsTo(v3,h2)
v1 h1
v2 h2
fv3
23
hPointsTo(h1, f, h2) :- Store(v1, f, v2),vPointsTo(v1, h1),vPointsTo(v2, h2).
v1 h1
v2 h2
f
Inference Rule in Datalog
v1.f = v2;
Stores:
24
Pointer Alias Analysis
Specified by 5 Datalog rulesCreation sitesAssignmentsStoresLoadsType filter
Apply rules until they converge
25
Method Invocations
Context insensitive is impreciseUnrealizable paths
Object id(Object x) {return x;
}
a = id(b); c = id(d);
26
Object id(Object x) {return x;
}
Object id(Object x) {return x;
}
Context Sensitivity
Context sensitivity is important for precision.Conceptually give each caller its own copy.
a = id(b); c = id(d);
27
Cloning-Based Analysis
Simple brute force technique.Clone every path through the call graph.Run context-insensitive algorithm on expanded call graph.
The catch: exponential blowup
28
Cloning is exponential!
29
Recursion
Actually, cloning is unbounded in the presence of recursive cycles.Technique: We treat all methods within a strongly-connected component as a single node.
30
Recursion
A
G
B C D
E F
A
G
B C D
E F E F E F
G G
31
Top 20 Sourceforge Java AppsNumber of Clones
1.E+001.E+021.E+041.E+061.E+081.E+101.E+121.E+141.E+16
1000 10000 100000 1000000Size of program (variable nodes)
Num
ber o
f clo
nes
1016
1012
108
104
100
32
Cloning is infeasible (?)Typical large program has ~1014 pathsIf you need 1 byte to represent a clone:
256 terabytes of storage> 12 times size of Library of Congress1GB DIMMs: $98.6 million
Power: 96.4 kilowatts (128 homes)300 GB hard disks: 939 x $250 = $234,750
Time to read sequential: 70.8 days
33
BDD comes to the rescue
Many similarities across contexts.Many copies of nearly-identical results.
BDDs (binary decision diagrams)can represent large sets of redundant data efficiently.
34
Static Checker Generation
PQL Query
Datalog
BDD code
2 lines
10 lines
1000 lines
35
BDD: Binary Decision DiagramsBDD: Binary Decision Diagrams
36
Call Graph Relation
Call graph expressed as a relation.Five edges:
Calls(A,B)Calls(A,C)Calls(A,D)Calls(B,D)Calls(C,D)
B
D
C
A
37
Call Graph RelationRelation expressed as a binary function.
A=00, B=01, C=10, D=11
01011
111100000101001001011110100011
001111
1001100x3
0111
0010011000101100100011000000fx4x2x1
B
D
C
A 00
1001
11
38
Binary Decision DiagramsGraphical encoding of a truth table.
x2
x4
x3 x3
x4 x4 x4
0 0 0 1 0 0 0 0
x2
x4
x3 x3
x4 x4 x4
0 1 1 1 0 0 0 1
x1 0 edge1 edge
39
Binary Decision DiagramsCollapse redundant nodes.
x2
x4
x3 x3
x4 x4 x4
0 0 0 0 0 0 0
x2
x4
x3 x3
x4 x4 x4
0 0 0 0
x1
11 1 1 1
40
Binary Decision DiagramsCollapse redundant nodes.
x2
x4
x3 x3
x4 x4 x4
x2
x4
x3 x3
x4 x4 x4
0
x1
1
41
Binary Decision DiagramsCollapse redundant nodes.
x2
x4
x3 x3
x2
x3 x3
x4 x4
0
x1
1
42
Binary Decision DiagramsCollapse redundant nodes.
x2
x4
x3 x3
x2
x3
x4 x4
0
x1
1
43
Binary Decision DiagramsEliminate unnecessary nodes.
x2
x4
x3 x3
x2
x3
x4 x4
0
x1
1
44
Binary Decision DiagramsEliminate unnecessary nodes.
x2
x3
x2
x3
x4
0
x1
1
45
Binary Decision Diagrams
Represent tiny and huge relations compactlySize depends on redundancy
Similar contexts have similar numberingsVariable ordering in the BDD
10 minutes or “runs out of memory”Active machine learning algorithm
46
Expanded Call GraphA
DB C
E
F G
H
0 1 2
A
DB C
E
F G
H
E E
F F GG
H H H H H
0 12
210
47
Numbering ClonesA
DB C
E
F G
H
A
DB C
E
F G
H
E E
F F GG
H H H H H
0 0 0
0 1 2
0-2 0-2
0-2 3-5
0 0 0
0 1 2
0 12
210
0 1 2 3 4 5
0
48
ExperienceContext-sensitive points-to analysis
800K byte codes in less than 20 minutes
PQL suitable for many known error patternsSQL injectionResource leakagePersistence object management
Applied to 6 large programs unknown to usFound 44 critical errors easily
49
DynamicDetection
& Recovery
AutomaticExtraction
DynamicDetection
& Recovery
AdvancedStatic
Analysis
User-Supplied, Application-
Specific
AutomaticExtraction
DynamicDetection
& Recovery
AutomaticExtraction
New-Generation Tools
Design Rules
DynamicDetection
& Recovery
AdvancedStatic
Analysis
User-Supplied, Application-
Specific
AutomaticExtraction
PQL
50
Automatic Design Rule Extraction
Too many design rules to specifyPrinciple: Most of the code is correctInconsistency = Errors
ClouseauClouseau: : a static memory leak detectora static memory leak detector
52
Ownership Model
Owner pointerObligated to delete object, orpass ownership to another pointer
Every object has an owner pointerNo memory leaks, no double deletes
53
Past Experience
Decorate each function parameter with “own” “not-own”Not enforced many mistakes
54
Assignment Statement
a = new int;
b = a;
delete a;
a = new int;
b = a;
delete b;
int
a b
int
a b
b = a own(b) = 0 own(a) = own(a’) + own(b’)
own(b),own(a)
own’(b),own’(a)
ClouseauAutomatically infers function signatures
Based on “new” “delete”Constraint: there is always 1 owning pointer
Inconsistency: errorA 125K-line commercial program
50 lines of user specification on container structure (generic data types)Root-causes 82% of memory leaked dynamicallyFound many additional errors
[Heine&Lam, PLDI 2003]
56
Clouseau
DynamicDetection
& Recovery
AutomaticExtraction
DynamicDetection
& Recovery
AdvancedStatic
Analysis
User-Supplied, Application-
Specific
AutomaticExtraction
DIDUCE
New-Generation Tools
Design Rules
DynamicDetection
& Recovery
AdvancedStatic
Analysis
AutomaticExtraction
57
DIDUCEDIDUCEDIDUCE
Dynamic Invariant Deduction ∪
Checking Engine(deduces, but sometimes incorrectly)
58
Motivation
Difficulty in finding root cause of complex errors
DIDUCE can potentially pinpoint the culprit automatically
59
Example
For line # 1234,
Times 1-1000000: 0 <= X <= 3Time 1000001: X = 0xa3d025ef
Aha! must be an error
60
DIDUCE Design
Monitors every memory access!Learns signature for the data accessed by each bytecode from correct runsReport anomalies observedAnomalies signal cause of errors
61
Experience
Pinpointed line of code causing errorsErrors in imap servers:Change triggers a latent error elsewhereMemory subsystem simulator: otherwise unknown errors
Reported corner cases of interest
62
LessonsDumb but tireless
High overhead but pain-freeUnaffected by programmers’misconception
Finds many serendipitous design rulesNeed to trigger just 1 of the rules[Hangal & Lam, ICSE 02]
63
Summary
Design Rules
AdvancedStatic
Analysis
DynamicDetection
& Recovery
User-Supplied, Application-
Specific
AutomaticExtraction
64
ConclusionsNew tools that exploit computing power to manage software complexityNew advanced analyses
Pointer alias analysis--Binary decision diagrams
Analysis available to programmersPQL: easy to express execution patterns
Automatic design rule extractionInconsistencies errors