Transcript of Automated Whitebox Fuzz Testing (NDSS 2008)

Page 1:

Automated Whitebox Fuzz Testing (NDSS 2008)

Presented by:

Edmund Warner

University of Central Florida

April 7, 2011

David Molnar

UC Berkeley

dmolnar@eecs.berkeley.edu

Michael Y. Levin

Microsoft (CSE)

mlevin@microsoft.com

Patrice Godefroid

Microsoft (Research)

pg@microsoft.com

Page 2:

Acknowledgments

Figures are taken directly from the paper or original presentation slides

Some slides reused from the original presentation

Page 3:

Overview

Definition of Whitebox Fuzz Testing

The Search Algorithm

SAGE (Scalable, Automated, Guided Execution)

Test Findings

Conclusions

Page 4:

What is Whitebox Fuzz Testing?

Fuzz testing is a form of blackbox random testing

Can be remarkably effective, but there are limitations

Given the conditional statement if (x == 10) then ..., the then branch has only a 1 in 2^32 chance of being executed when x is a random 32-bit input

Can provide low code coverage

Page 5:

Whitebox Fuzz Testing

Combine fuzz testing with dynamic test generation:

Run the code with its input

Collect constraints on inputs with symbolic execution

Generate new constraints by negating the collected branch constraints one by one

Solve constraints with constraint solver

Synthesize new inputs

Page 6:

Whitebox Fuzz Testing

In theory, this approach can lead to full program path coverage

Practically, it will fall short and the search will be incomplete:

Number of execution paths in the program is huge

Symbolic execution, constraint generation, and constraint solving are necessarily imprecise

Page 7:

The Search Algorithm

With blackbox fuzzing, it is unlikely to catch the error in the paper's running example (shown below)

(5 values out of the 2^(8*4) possible 4-byte inputs)

Catching it is straightforward, however, with dynamic test generation
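For reference, the running example from the paper (its Figure 1) is the small routine below: the error is reachable only when at least three of the four input bytes match "bad!".

```c
#include <stdlib.h>

/* Running example from the paper (Figure 1): blackbox random fuzzing
 * is very unlikely to drive cnt to 3 or more, so the abort() is
 * almost never reached with random 4-byte inputs. */
void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    if (cnt >= 3) abort();   /* error */
}
```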

Page 8:

Dynamic Test Generation

For instance, we run the input “good” on the program.

We build a path constraint from the conditional statements encountered along the execution:

<i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!'>

Negate the last constraint to create a new path constraint:

<i0 != 'b', i1 != 'a', i2 != 'd', i3 = '!'>, which is solved by the new input "goo!"
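As a minimal sketch (hypothetical code, not part of SAGE), each conjunct of the "good" path constraint can be negated in turn; a trivial solver that just substitutes the byte satisfying the negated constraint yields the child inputs bood, gaod, godd, and goo!, the first generation of children described in the paper.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char seed[5] = "good";                    /* initial input         */
    const char expect[4] = { 'b', 'a', 'd', '!' };  /* bytes tested by top() */

    /* Negate constraint j (i_j != expect[j]  -->  i_j == expect[j]) and
     * "solve" it by substituting the byte that satisfies it. */
    for (int j = 0; j < 4; j++) {
        char child[5];
        memcpy(child, seed, 5);
        child[j] = expect[j];
        printf("child %d: %s\n", j, child);   /* bood, gaod, godd, goo! */
    }
    return 0;
}
```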

Page 9:

Limitations

Path Explosion:

Does not scale to large, realistic programs

Can be alleviated with different methods in the search algorithm

Imperfect Symbolic Execution:

Complex program statements (pointer manipulation)

OS and library functions (cost)

Page 10:

The Search Algorithm

Solution: Generational Search

Places the initial input in a workList

Runs the program and checks for bugs on this first execution

The workList is processed by selecting an element and expanding it; each child input produced is:

Run and checked for bugs

Assigned a score

Added to the workList

Page 11:

The Search Algorithm

More on ExpandExecution:

Tests the program with the input

Generates its path constraint (PC)

Attempts to expand (negate) the constraints in the PC

If successful, the resulting child inputs are saved for later execution

Page 12:

The Search Algorithm

What does this mean?

Given an input with its PC, it attempts to expand all constraints in the PC:

Instead of just the last one, as in a depth-first search

Or just the first one, as in a breadth-first search

A bound parameter is used to limit backtracking through parent nodes

End result: cover the largest search space in the shortest amount of time
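To make the workList and bound bookkeeping concrete, here is a toy, self-contained sketch of a generational search over the 4-byte example (hypothetical code: the path constraint is reduced to the four byte comparisons, the "solver" just flips one byte, scoring is omitted in favor of a plain stack, and the bound handling is simplified so the toy terminates).

```c
#include <stdio.h>
#include <string.h>

#define N 4
static const char expect[N] = { 'b', 'a', 'd', '!' };   /* bytes compared by top() */

typedef struct { char data[N + 1]; int bound; } Input;

/* Run the program under test and check for the bug (cnt >= 3 in top()). */
static int run_and_check(const char *in) {
    int cnt = 0;
    for (int k = 0; k < N; k++) cnt += (in[k] == expect[k]);
    return cnt >= 3;
}

/* ExpandExecution (simplified): negate each constraint at position >= bound. */
static int expand(const Input *in, Input out[N]) {
    int n = 0;
    for (int j = in->bound; j < N; j++) {
        Input child = *in;
        /* Trivial "solver": pick a byte satisfying the negated constraint. */
        child.data[j] = (in->data[j] == expect[j]) ? 'z' : expect[j];
        child.bound = j + 1;   /* simplified bound so the toy terminates */
        out[n++] = child;
    }
    return n;
}

int main(void) {
    Input worklist[64];
    int n_work = 0;
    Input seed = { "good", 0 };

    worklist[n_work++] = seed;
    while (n_work > 0) {
        Input cur = worklist[--n_work];          /* no scoring: plain LIFO stack */
        Input kids[N];
        int n = expand(&cur, kids);
        for (int i = 0; i < n; i++) {
            if (run_and_check(kids[i].data))
                printf("crash found with input %s\n", kids[i].data);
            if (n_work < 64) worklist[n_work++] = kids[i];
        }
    }
    return 0;
}
```

Run on the seed "good", this sketch reports crashing inputs such as bad! after only a handful of executions, without ever enumerating the 2^32 input space.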

Page 13:

SAGE

Scalable, Automated, Guided Execution

Can test any file-reading program running on Windows by treating bytes read from files as symbolic input.

Page 14:

Page 15:

SAGE Architecture

Instead of being source-based, SAGE uses machine-code-based instrumentation

Multitude of languages and build processes:

No need for source code or a specific compiler and build process

Slower to start, but encompasses much more

Compiler and post-build transformations:

By performing symbolic execution on the binary code that actually ships, SAGE can also detect bugs introduced by compilation and post-processing tools

Unavailability of source:

Source-based instrumentation may be difficult for self-modifying or JITed code

SAGE does not need data types or structures that are not visible at the machine-code level

Page 16:

Constraint Generation

SAGE is trace-based

Uses replay of the trace to update the concrete and symbolic stores

This allows constraints to be built on input values

For conditional jumps, it uses bitvectors to tag the EFLAGS used by the jumps
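As a purely illustrative model (hypothetical code, not SAGE's actual data structures), a symbolic-store entry can tag a machine location with the input byte it currently carries; when replay reaches a conditional jump whose EFLAGS depend on a tagged value, a constraint over that input byte is emitted.

```c
#include <stdio.h>

/* Symbolic-store entry: is the location input-dependent, and which byte? */
typedef struct { int tagged; int input_index; } SymVal;

/* Model of replaying "cmp <loc>, imm ; jcc ..." where <loc> may be tagged. */
static void on_conditional_jump(SymVal flags_src, unsigned char imm, int taken) {
    if (!flags_src.tagged) return;          /* concrete value: no constraint */
    printf("constraint: input[%d] %s 0x%02x\n",
           flags_src.input_index, taken ? "==" : "!=", imm);
}

int main(void) {
    SymVal al = { 1, 0 };                   /* AL holds input byte 0          */
    /* Replaying: cmp al, 0x62 ; jz ...  where the jump was not taken.        */
    on_conditional_jump(al, 0x62, 0);       /* prints: input[0] != 0x62       */
    return 0;
}
```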

Page 17:

Constraint Optimization

SAGE employs a number of optimization techniques to improve speed and decrease memory consumption:

Tag caching

Unrelated constraint elimination

Local constraint caching

Flip count limit

Concretization

Constraint subsumption

Constraint subsumption checks whether newly created constraints imply, or are implied by, previously recorded constraints, and eliminates the redundant ones
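A rough sketch of the subsumption idea (simplified and hypothetical, not SAGE's actual rule or data structures): before recording a new constraint, drop it if an already recorded constraint implies it.

```c
#include <stdio.h>

/* A constraint over one symbolic input byte: input[idx] OP value. */
typedef enum { EQ, NEQ } Op;
typedef struct { int idx; Op op; unsigned char val; } Constraint;

static Constraint db[256];   /* constraints recorded so far */
static int n_db = 0;

/* Does `a` imply `b`?  Two simple rules:
 *  - identical constraints imply each other;
 *  - input[k] == c implies input[k] != d for any d != c. */
static int implies(Constraint a, Constraint b) {
    if (a.idx != b.idx) return 0;
    if (a.op == b.op && a.val == b.val) return 1;
    if (a.op == EQ && b.op == NEQ && a.val != b.val) return 1;
    return 0;
}

/* Record the constraint unless it is subsumed by an existing one. */
static void add_constraint(Constraint c) {
    for (int i = 0; i < n_db; i++)
        if (implies(db[i], c)) return;            /* subsumed: drop it */
    db[n_db++] = c;
}

int main(void) {
    add_constraint((Constraint){ 0, EQ, 'b' });   /* input[0] == 'b'   */
    add_constraint((Constraint){ 0, NEQ, 'x' });  /* implied, dropped  */
    add_constraint((Constraint){ 1, NEQ, 'a' });  /* kept              */
    printf("%d constraints kept\n", n_db);        /* prints: 2         */
    return 0;
}
```

In practice this kind of optimization matters most for branches inside input-processing loops, which keep generating closely related constraints.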

Page 18:

Findings

Generational Search vs. Depth-First Search:

On the Media 1, 2, and 3 applications tested, DFS terminated in ~11 hours with nothing, while the generational search ran slightly longer and found 15 crashes in 4 buckets in Media 3.

Bogus files find few bugs

Divergences are common: ~60%

Most bugs are shallow

Impact of the block-coverage heuristic:

Adding 10407 blocks instead of 10633; not very effective in most cases

Page 19:

Conclusions

Most unique bugs are found from well-formed inputs, and within only a few generations

The sample size may be limited, but the success in finding previously missed bugs argues for the new search strategy

SAGE still needs enhancement in precision and power

Page 20:

Contributions

A critical vulnerability (the MS07-017 ANI parsing bug) was found, which had been missed by extensive blackbox fuzzing and static analysis

A new search algorithm was introduced for systematic test generation, optimized for large applications

Introduction and implementation of SAGE, which can scale to programs with hundreds of millions of instructions

Page 21:

Weaknesses

The paper itself is hard to understand in certain areas

Sometimes nondeterminism shows up in the program's coverage: same input, same program, same machine, yet different coverage

Page 22:

Improvements

Paper – more figures explaining the heuristics and rules

Nondeterminism – export input coverage results to a database to be checked so that nothing is repeated