Transcript of Automated Whitebox Fuzz Testing (NDSS 2008)

Page 1:

Automated Whitebox Fuzz Testing (NDSS 2008)

Presented by:

Edmund Warner

University of Central Florida

April 7, 2011

David Molnar

UC Berkeley

dmolnar@eecs.berkeley.edu

Michael Y. Levin

Microsoft (CSE)

mlevin@microsoft.com

Patrice Godefroid

Microsoft (Research)

pg@microsoft.com

Page 2:

Acknowledgments

Figures are taken directly from the paper or original presentation slides

Some slides reused from the original presentation

Page 3:

Overview

Definition of Whitebox Fuzz Testing

The Search Algorithm

SAGE (Scalable, Automated, Guided Execution)

Test Findings

Conclusions

Page 4:

What is Whitebox Fuzz Testing?

Fuzz testing is a form of blackbox random testing

Can be remarkably effective, but there are limitations

Given the conditional statement if (x == 10) then ..., the then branch has only a 1 in 2^32 chance of being executed when x is a random 32-bit input

Can provide low code coverage

Page 5:

Whitebox Fuzz Testing

Combine fuzz testing with dynamic test generation:

Run the code with its input

Collect constraints on inputs with symbolic execution

Generate new constraints by negating the collected branch constraints one by one

Solve constraints with constraint solver

Synthesize new inputs

Page 6:

Whitebox Fuzz Testing

In theory, this approach can lead to full program path coverage

Practically, it will fall short and the search will be incomplete:

Number of execution paths in the program is huge

Symbolic execution, constraint generation, and constraint solving are necessarily imprecise

Page 7:

The Search Algorithm

With blackbox fuzzing, it is unlikely to catch the error in the paper's running example (shown below)

(5 values out of the 2^(8*4) possible 4-byte inputs)

Catching it is straightforward, however, with dynamic test generation
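For reference, the running example from the paper (its Figure 1) is the small routine below: the error is reachable only when at least three of the four input bytes match "bad!".

```c
#include <stdlib.h>

/* Running example from the paper (Figure 1): blackbox random fuzzing
 * is very unlikely to drive cnt to 3 or more, so the abort() is
 * almost never reached with random 4-byte inputs. */
void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    if (cnt >= 3) abort();   /* error */
}
```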

Page 8:

Dynamic Test Generation

For instance, we run the input “good” on the program.

We build a path constraint from the conditional statements encountered along the execution:

<i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!'>

Negate the last constraint to create a new path constraint:

<i0 != 'b', i1 != 'a', i2 != 'd', i3 = '!'>, which is solved by the new input "goo!"
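As a minimal sketch (hypothetical code, not part of SAGE), each conjunct of the "good" path constraint can be negated in turn; a trivial solver that just substitutes the byte satisfying the negated constraint yields the child inputs bood, gaod, godd, and goo!, the first generation of children described in the paper.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char seed[5] = "good";                    /* initial input         */
    const char expect[4] = { 'b', 'a', 'd', '!' };  /* bytes tested by top() */

    /* Negate constraint j (i_j != expect[j]  -->  i_j == expect[j]) and
     * "solve" it by substituting the byte that satisfies it. */
    for (int j = 0; j < 4; j++) {
        char child[5];
        memcpy(child, seed, 5);
        child[j] = expect[j];
        printf("child %d: %s\n", j, child);   /* bood, gaod, godd, goo! */
    }
    return 0;
}
```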

Page 9:

Limitations

Path Explosion:

Does not scale to large, realistic programs

Can be alleviated with different methods in the search algorithm

Imperfect Symbolic Execution:

Complex program statements (pointer manipulation)

OS and library functions (cost)

Page 10:

The Search Algorithm

Solution: Generational Search

Places the initial input in a workList

Runs the program and checks for bugs on this first execution

The workList is processed by selecting an element and expanding it; each child input produced is:

Run and checked for bugs

Assigned a score

Added to the workList

Page 11:

The Search Algorithm

More on ExpandExecution:

Tests the program with the input

Generates its path constraint (PC)

Attempts to expand (negate) the constraints in the PC

If successful, the resulting child inputs are saved for later execution

Page 12:

The Search Algorithm

What does this mean?

Given an input with its PC, it attempts to expand all constraints in the PC:

Instead of just the last one, as in a depth-first search

Or just the first one, as in a breadth-first search

A bound parameter is used to limit backtracking through parent nodes

End result: cover the largest search space in the shortest amount of time
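To make the workList and bound bookkeeping concrete, here is a toy, self-contained sketch of a generational search over the 4-byte example (hypothetical code: the path constraint is reduced to the four byte comparisons, the "solver" just flips one byte, scoring is omitted in favor of a plain stack, and the bound handling is simplified so the toy terminates).

```c
#include <stdio.h>
#include <string.h>

#define N 4
static const char expect[N] = { 'b', 'a', 'd', '!' };   /* bytes compared by top() */

typedef struct { char data[N + 1]; int bound; } Input;

/* Run the program under test and check for the bug (cnt >= 3 in top()). */
static int run_and_check(const char *in) {
    int cnt = 0;
    for (int k = 0; k < N; k++) cnt += (in[k] == expect[k]);
    return cnt >= 3;
}

/* ExpandExecution (simplified): negate each constraint at position >= bound. */
static int expand(const Input *in, Input out[N]) {
    int n = 0;
    for (int j = in->bound; j < N; j++) {
        Input child = *in;
        /* Trivial "solver": pick a byte satisfying the negated constraint. */
        child.data[j] = (in->data[j] == expect[j]) ? 'z' : expect[j];
        child.bound = j + 1;   /* simplified bound so the toy terminates */
        out[n++] = child;
    }
    return n;
}

int main(void) {
    Input worklist[64];
    int n_work = 0;
    Input seed = { "good", 0 };

    worklist[n_work++] = seed;
    while (n_work > 0) {
        Input cur = worklist[--n_work];          /* no scoring: plain LIFO stack */
        Input kids[N];
        int n = expand(&cur, kids);
        for (int i = 0; i < n; i++) {
            if (run_and_check(kids[i].data))
                printf("crash found with input %s\n", kids[i].data);
            if (n_work < 64) worklist[n_work++] = kids[i];
        }
    }
    return 0;
}
```

Run on the seed "good", this sketch reports crashing inputs such as bad! after only a handful of executions, without ever enumerating the 2^32 input space.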

Page 13:

SAGE

Scalable, Automated, Guided Execution

Can test any file-reading program running on Windows by treating bytes read from files as symbolic input.

Page 14:

Page 15:

SAGE Architecture

Instead of being source-based, SAGE uses machine-code-based instrumentation

Multitude of languages and build processes:

No need for source code or a specific compiler and build process

Slower to start, but encompasses much more

Compiler and post-build transformations:

By performing symbolic execution on the binary code that actually ships, SAGE can also detect bugs introduced by compilation and post-processing tools

Unavailability of source:

Source-based instrumentation may be difficult for self-modifying or JITed code

SAGE does not need data types or structures that are not visible at the machine-code level

Page 16:

Constraint Generation

SAGE is trace-based

Uses replay of the trace to update the concrete and symbolic stores

This allows constraints to be built on input values

For conditional jumps, it uses bitvectors to tag the EFLAGS used by the jumps
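As a purely illustrative model (hypothetical code, not SAGE's actual data structures), a symbolic-store entry can tag a machine location with the input byte it currently carries; when replay reaches a conditional jump whose EFLAGS depend on a tagged value, a constraint over that input byte is emitted.

```c
#include <stdio.h>

/* Symbolic-store entry: is the location input-dependent, and which byte? */
typedef struct { int tagged; int input_index; } SymVal;

/* Model of replaying "cmp <loc>, imm ; jcc ..." where <loc> may be tagged. */
static void on_conditional_jump(SymVal flags_src, unsigned char imm, int taken) {
    if (!flags_src.tagged) return;          /* concrete value: no constraint */
    printf("constraint: input[%d] %s 0x%02x\n",
           flags_src.input_index, taken ? "==" : "!=", imm);
}

int main(void) {
    SymVal al = { 1, 0 };                   /* AL holds input byte 0          */
    /* Replaying: cmp al, 0x62 ; jz ...  where the jump was not taken.        */
    on_conditional_jump(al, 0x62, 0);       /* prints: input[0] != 0x62       */
    return 0;
}
```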

Page 17:

Constraint Optimization

SAGE employs a number of optimization techniques to improve speed and decrease memory consumption:

Tag caching

Unrelated constraint elimination

Local constraint caching

Flip count limit

Concretization

Constraint subsumption

Constraint subsumption checks whether newly created constraints imply, or are implied by, previously recorded constraints, and eliminates the redundant ones
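A rough sketch of the subsumption idea (simplified and hypothetical, not SAGE's actual rule or data structures): before recording a new constraint, drop it if an already recorded constraint implies it.

```c
#include <stdio.h>

/* A constraint over one symbolic input byte: input[idx] OP value. */
typedef enum { EQ, NEQ } Op;
typedef struct { int idx; Op op; unsigned char val; } Constraint;

static Constraint db[256];   /* constraints recorded so far */
static int n_db = 0;

/* Does `a` imply `b`?  Two simple rules:
 *  - identical constraints imply each other;
 *  - input[k] == c implies input[k] != d for any d != c. */
static int implies(Constraint a, Constraint b) {
    if (a.idx != b.idx) return 0;
    if (a.op == b.op && a.val == b.val) return 1;
    if (a.op == EQ && b.op == NEQ && a.val != b.val) return 1;
    return 0;
}

/* Record the constraint unless it is subsumed by an existing one. */
static void add_constraint(Constraint c) {
    for (int i = 0; i < n_db; i++)
        if (implies(db[i], c)) return;            /* subsumed: drop it */
    db[n_db++] = c;
}

int main(void) {
    add_constraint((Constraint){ 0, EQ, 'b' });   /* input[0] == 'b'   */
    add_constraint((Constraint){ 0, NEQ, 'x' });  /* implied, dropped  */
    add_constraint((Constraint){ 1, NEQ, 'a' });  /* kept              */
    printf("%d constraints kept\n", n_db);        /* prints: 2         */
    return 0;
}
```

In practice this kind of optimization matters most for branches inside input-processing loops, which keep generating closely related constraints.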

Page 18:

Findings

Generational Search vs. Depth-First Search:

On the Media 1, 2, and 3 applications tested, DFS terminated in ~11 hours with nothing, while the generational search ran slightly longer and found 15 crashes in 4 buckets in Media 3.

Bogus files find few bugs

Divergences are common: ~60%

Most bugs are shallow

Impact of the block-coverage heuristic:

Adding 10407 blocks instead of 10633; not very effective in most cases

Page 19:

Conclusions

Most unique bugs are found from well-formed inputs, and within only a few generations

The sample size may be limited, but the success in finding previously missed bugs argues for the new search strategy

SAGE still needs enhancement in precision and power

Page 20:

Contributions

A critical vulnerability (the MS07-017 ANI parsing bug) was found, which had been missed by extensive blackbox fuzzing and static analysis

A new search algorithm was introduced for systematic test generation, optimized for large applications

Introduction and implementation of SAGE, which can scale to programs with hundreds of millions of instructions

Page 21:

Weaknesses

The paper itself is hard to understand in certain areas

Sometimes nondeterminism shows up in the program's coverage: same input, same program, same machine, yet different coverage

Page 22:

Improvements

Paper – more figures explaining the heuristics and rules

Nondeterminism – export input coverage results to a database to be checked so that nothing is repeated