Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori,...
-
Upload
alberta-watson -
Category
Documents
-
view
222 -
download
1
Transcript of Specification Inference for Explicit Information Flow Problems Benjamin Livshits, Aditya V. Nori,...
MerlinSpecification Inference for Explicit Information Flow Problems
Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani
Microsoft Research
Anindya BanerjeeIMDEA Software
Mining Security Specifications
• Problem: Can we automatically infer which routines in a program are sources, sinks and sanitizers?
• Technology: Static analysis + Probabilistic inference
• Applications:– Lowers false errors from tools– Enables more complete flow checking
• Results:– Over 300 new vulnerabilities discovered in 10
deployed ASP.NET applications
Motivation
Static Analysis Tools for Security
Web application vulnerabilities are a serious threat!
CAT.NET
Web Application Vulnerabilities
$username = $_REQUEST['username'];$sql = "SELECT * FROM Students WHERE username = '$username';
Propagation graph
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Propagation graphm1→ m2 iff information flows “explicitly” from m1 to m2
void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);
WriteData("Parameter " + s111); WriteData("Header " + s222); }
Specification Source
returns tainted data Sink
error to pass tainted data
Sanitizer cleanse or untaint
the input Regular nodes
propagate input to output
Every path from a source to a sink should go through a sanitizer
Any source to sink path without a sanitizer is an information flow vulnerability
Vulnerability
void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);
WriteData("Parameter " + s111); WriteData("Header " + s222); }
Information flow vulnerabilities
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Vulnerabilities?
void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);
WriteData("Parameter " + s111); WriteData("Header " + s222); }
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Information flow vulnerabilities
source
sink
source
sanitizer
void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);
WriteData("Parameter " + s111); WriteData("Header " + s222); }
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Information flow vulnerabilities
source
sanitizer
sink
source
Goal
AssumptionMost flow paths in the propagation graph
are secure
Given a propagation graph, can we infer a specification or ‘complete’ a partial specification?
Algorithms
Merlin Architecture
Initial specificati
on
Program
Final specificatio
n
Prop. graph
construction
Factor graph
construction
Probabilistic
inference
Merlin
Vulnerabilities
CAT.NET
Propagation Graph Construction
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
CAT.NET
void ProcessRequest() { string s1 = ReadData1("name"); string s2 = ReadData2("encoding"); string s11 = Prop1(s1); string s22 = Prop2(s2); string s111 = Cleanse(s11); string s222 = Cleanse(s22);
WriteData("Parameter " + s111); WriteData("Header " + s222); }
Inference?
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
source
sanitizer
sink
?ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
source
?
sink
source
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
source source
sanitizer
?
Path constraints
For every acyclic path m1 m2 … mn the probability that m1 is a source, mn is a sink, and m2, …, mn-1 are not sanitizers is very low
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Exponential number of path constraints: O(2|V|)!
ReadData1 Prop1 Cleanse WriteDatasource sink
Triple constraints
• For every triple <m1, mi, mn> such that mi
is on a path from m1 to mn, the probability that m1 is a source, mn is a sink, and mi is not a sanitizer is very low
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Cubic number of triple constraints: O(|V|3)!
ReadData1 Prop1 WriteData
ReadData1 WriteDataCleanse
source sink
source sink
Minimizing Sanitizers
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
source source
sink
Minimizing Sanitizers
For every pair of nodes m1, m2 such that m1 and m2 lie on the same path from a potential source to a potential sink, the probability that both m1 and m2 are sanitizers is low
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
ReadData1 Prop1 Cleanse WriteData
source sink
Need for probabilistic constraints
ReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
a
b
c
d
Triple constraints ¬(a ∧ ¬b ∧ d) ¬(a ∧ ¬c ∧ d)
Avoid double sanitizers ¬(b ∧ c)
a ∧ d ⇒ b a ∧ d ⇒ c ¬(b ∧ c)
Boolean formulas as probabilistic constraints
(x1∨ x2) ∧ (x1∨ ¬x3) C1 C2
f(x1, x2, x3) = fC1(x1, x2) ∧ fC2(x1, x3) fC1(x1, x2) = fC2(x1, x3) =
1 if x1∨ x2 = true0 otherwise
1 if x1∨ ¬x3 = true0 otherwise
Boolean formulas as probabilistic constraints
f(x1, x2, x3) = fC1(x1, x2) ∧ fC2(x1, x3) fC1(x1, x2) = fC2(x1, x3) =
1 if x1∨ x2 = true0 otherwise
1 if x1∨ ¬x3 = true0 otherwise
(x1∨ x2) ∧ (x1∨ ¬x3) C1 C2
p(x1, x2, x3) = fC1(x1, x2) × fC2(x1, x3)/Z Z = ∑x1, x2, x3 (fC1(x1, x2) × fC2(x1, x3))
0.90.1
0.90.1
Solution = Marginalization
• Step 1: choose xi with highest pi(xi) and set xi = true if pi(xi) is greater than a threshold, false otherwise• Step 2: recompute marginals and repeat Step 1 until all variables have been assigned
marginal
pi(xi) = ∑x1 … ∑x(i-1) ∑x(i+1) … ∑xN p(x1, … , xN)
Factor graphs: efficient computation of marginals
fC1(x1, x2) = fC2(x1, x3) =
1 if x1∨ x2 = true0 otherwise1 if x1∨ ¬x3 = true0 otherwise
Factor GraphsReadData1
Prop1 Prop2
Cleanse
WriteData
ReadData2
Probabilistic InferenceSourc
eSanitiz
erSink
ReadData1 .95 .001 .001ReadData2 .5 .5 .5Cleanse .5 .5 .5WriteData .5 .5 .85…
Source
Sanitizer
Sink
ReadData1 .95 .001 .001ReadData2 .5 .5 .5Cleanse .01 .997 .03WriteData .5 .5 .85…
ProbabilisticInference
Paths vs. Triples
TheoremPath refines Triple
Experiments
Implementation
Merlin is implemented in C# Uses CAT.NET for building the
propagation graph Uses Infer.NET for probabilistic
inference▪ http://research.microsoft.com/infernet
Experiments
10 line-of-business applications written in C# using ASP.NET
Summary of Discovered Specifications
Sources Sanitizers Sinks0
20
40
60
80
100
120
140
160
27 7
7794
32
152Original With Merlin
Summary of Discovered Vulnerabilities
-50 0 50 100 150 200 250 300 350 400
89
342
13
Original
With MerlinFalse
positives eliminate
d
Experiments - summary
10 large Web apps in .NET Time taken per app < 4 minutes New specs: 167 New vulnerabilities: 322 False positives removed: 13 Final false positive rate for CAT.NET
after Merlin < 1%
Summary
Merlin is first practical approach to infer explicit information flow specifications
Design based on a formal characterization of an approximate probabilistic constraint system
Able to successfully and efficiently infer explicit information flow specifications in large applications which result in detection of new vulnerabilities
http://research.microsoft.com/merlin