Automated Whitebox Fuzz Testing (NDSS 2008)
Presented by:
Edmund Warner
University of Central Florida
April 7, 2011
David Molnar
UC Berkeley
Michael Y. Levin
Microsoft (CSE)
Patrice Godefroid
Microsoft (Research)
Acknowledgments
Figures are taken directly from the paper or original presentation slides
Some slides reused from the original presentation
Overview
Definition of Whitebox Fuzz Testing
The Search Algorithm
SAGE (Scalable, Automated, Guided Execution)
Test Findings
Conclusions
What is Whitebox Fuzz Testing?
Fuzz testing is a form of blackbox random testing
Can be remarkably effective, but there are limitations
Given the branch statement:
if (x == 10) then ...
the then-branch has a 1 in 2^32 chance of executing if x is a random 32-bit input
As a result, fuzzing can yield low code coverage
Whitebox Fuzz Testing
Combine fuzz testing with dynamic test generation:
Run the code with its input
Collect constraints on inputs with symbolic execution
Generate new constraints by negating branch conditions
Solve constraints with a constraint solver
Synthesize new inputs
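A minimal sketch of these steps on the if (x == 10) example, assuming a toy program whose path constraint is a single comparison on x. The names program, negate, and solve are hypothetical, and solve is a trivial stand-in for a real constraint solver.

```python
import random

def program(x: int):
    """Toy program under test (hypothetical). Returns whether the
    then-branch ran and the path constraint observed on x."""
    if x == 10:
        return True, [("==", 10)]
    return False, [("!=", 10)]

def negate(constraint):
    """Flip a branch condition: == becomes != and vice versa."""
    op, c = constraint
    return ("!=" if op == "==" else "==", c)

def solve(constraint):
    """Stand-in for a constraint solver: returns some x satisfying (x op c)."""
    op, c = constraint
    return c if op == "==" else c + 1

x = random.getrandbits(32)            # blackbox step: random 32-bit input
hit, pc = program(x)                  # run and record the path constraint
if not hit:
    x = solve(negate(pc[-1]))         # negate the branch, solve for a new input
    hit, pc = program(x)              # rerun on the synthesized input
print(hit)  # True
```

Where random inputs almost never reach the then-branch, one symbolic run plus one solver call reaches it deterministically.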
Whitebox Fuzz Testing
In theory, this approach can lead to full program path coverage
In practice, it falls short and the search is incomplete:
The number of execution paths in the program is huge
Symbolic execution, constraint generation, and constraint solving are necessarily imprecise
The Search Algorithm
With blackbox fuzzing, the error is unlikely to be caught
(5 values out of 2^(8*4) possible 4-byte inputs)
For dynamic test generation, however, reaching it is rather simple
Dynamic Test Generation
For instance, run the program on the input “good”.
Collect a path constraint from the conditional statements encountered:
<i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!'>
Negate a constraint (here, the last) to create a new path constraint:
<i0 != 'b', i1 != 'a', i2 != 'd', i3 = '!'>
A constraint solver then produces a new input satisfying it, such as “goo!”
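The step from “good” to a new input can be sketched as follows. The program top is a hypothetical reconstruction of the running example (compare each byte with "bad!" and fail when at least three match); because every constraint here touches a single byte, the solve helper (also hypothetical) just patches bytes instead of calling a real solver.

```python
def top(inp: str):
    """Hypothetical program modeled on the running example: each byte is
    compared with "bad!", and the error fires when at least three match.
    Also returns the path constraint as (index, char, taken) triples."""
    pc = [(i, c, inp[i] == c) for i, c in enumerate("bad!")]
    crashed = sum(taken for _, _, taken in pc) >= 3
    return crashed, pc

def show(pc):
    return ", ".join(f"i{i} {'=' if t else '!='} '{c}'" for i, c, t in pc)

def solve(pc, seed: str) -> str:
    """Each constraint involves one byte, so solving is direct: start from the
    seed and patch any byte that violates its constraint."""
    out = list(seed)
    for i, c, t in pc:
        if t:
            out[i] = c
        elif out[i] == c:
            out[i] = chr((ord(c) + 1) % 256)  # any other byte value works
    return "".join(out)

crashed, pc = top("good")
print(show(pc))        # i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!'

i, c, t = pc[-1]
new_pc = pc[:-1] + [(i, c, not t)]   # negate the last constraint
child = solve(new_pc, "good")
print(child)           # goo!
```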
Limitations
Path explosion: does not scale to large, realistic programs
Can be alleviated by the choice of search algorithm
Imperfect symbolic execution: complex program statements (pointer manipulation)
Calls to OS and library functions (costly to model precisely)
The Search Algorithm
Solution: generational search
Places the initial input in a workList
Runs the program on it and checks for bugs in this first execution
The workList is processed by selecting an element and expanding it
Each child input is run and checked for bugs
Each child is assigned a score
Each child is added to the workList
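A compact sketch of the whole loop, under several assumptions: the program is the hypothetical four-byte "bad!" toy, "coverage" is a set of branch outcomes standing in for basic blocks, the score is the number of new items covered, and run, solve, expand_execution, and search are illustrative names. Setting a child's bound to j + 1 is this sketch's choice so that a child never re-flips the constraint its parent just negated.

```python
import heapq
from itertools import count

TARGET = "bad!"

def run(inp: str):
    """Run&Check on a hypothetical toy program: returns (crashed, path
    constraint, coverage). The path constraint is a list of
    (index, char, taken); coverage stands in for basic blocks."""
    pc = [(i, c, inp[i] == c) for i, c in enumerate(TARGET)]
    matches = sum(t for _, _, t in pc)
    coverage = {(i, t) for i, _, t in pc} | {("cnt", matches)}
    return matches >= 3, pc, coverage

def solve(constraints, seed: str) -> str:
    """Single-byte constraints are always satisfiable here: patch the seed."""
    out = list(seed)
    for i, c, t in constraints:
        if t:
            out[i] = c
        elif out[i] == c:
            out[i] = chr((ord(c) + 1) % 256)
    return "".join(out)

def expand_execution(inp: str, bound: int):
    """Negate every constraint PC[j] with j >= bound, keeping PC[0..j-1]."""
    _, pc, _ = run(inp)
    children = []
    for j in range(bound, len(pc)):
        i, c, t = pc[j]
        child = solve(pc[:j] + [(i, c, not t)], inp)
        # Assumption: the child's bound is j + 1, so it never re-flips PC[j].
        children.append((child, j + 1))
    return children

def search(seed: str):
    tie = count()                        # tie-breaker so the heap never compares strings
    crashed, _, covered = run(seed)
    tested = [seed]
    crashes = [seed] if crashed else []
    work = [(0, next(tie), seed, 0)]     # (-score, tie, input, bound)
    while work:
        _, _, inp, bound = heapq.heappop(work)   # highest score first
        for child, child_bound in expand_execution(inp, bound):
            crashed, _, cov = run(child)
            tested.append(child)
            if crashed:
                crashes.append(child)
            score = len(cov - covered)   # number of new "blocks" covered
            covered |= cov
            heapq.heappush(work, (-score, next(tie), child, child_bound))
    return tested, crashes

tested, crashes = search("good")
print(len(tested), sorted(crashes))
# 16 ['bad!', 'badd', 'bao!', 'bod!', 'gad!']
```

Starting from “good”, each of the 16 paths through the four branches is tested exactly once.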
The Search Algorithm
More on ExpandExecution:
Runs the program on the input
Computes its path constraint (PC)
Attempts to negate each constraint in PC and solve for a new input
If a solution exists, saves the new input for later execution
The Search Algorithm
What does this mean? Given an input with path constraint PC:
Attempts to expand all constraints in PC, instead of just the last (as in a depth-first search)
or just the first (as in a breadth-first search)
A parameter, bound, keeps each child's sub-search from backtracking above the branch where it left its parent
End result: many new tests from each symbolic execution, covering a large part of the search space quickly
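Following the slide's characterization (DFS negates only the last constraint, BFS only the first, generational search every constraint from bound onward), this sketch shows what one expansion of “good” yields under each strategy. The program is the hypothetical "bad!" toy, and children and solve are illustrative helpers.

```python
def children(pc, strategy, bound=0):
    """Which constraints one expansion negates (hypothetical helper)."""
    if strategy == "dfs":
        positions = [len(pc) - 1]          # only the last constraint
    elif strategy == "bfs":
        positions = [0]                    # only the first constraint
    else:
        positions = range(bound, len(pc))  # generational: all from bound on
    # Keep PC[0..j-1], negate PC[j], drop the rest.
    return [pc[:j] + [(pc[j][0], pc[j][1], not pc[j][2])] for j in positions]

def solve(constraints, seed):
    """Single-byte constraints: patch the seed until it satisfies them."""
    out = list(seed)
    for i, c, t in constraints:
        if t:
            out[i] = c
        elif out[i] == c:
            out[i] = chr((ord(c) + 1) % 256)
    return "".join(out)

seed = "good"
pc = [(i, c, seed[i] == c) for i, c in enumerate("bad!")]  # all four branches not taken
for strategy in ("dfs", "bfs", "generational"):
    print(strategy, [solve(p, seed) for p in children(pc, strategy)])
# dfs ['goo!']
# bfs ['bood']
# generational ['bood', 'gaod', 'godd', 'goo!']
```

One expensive symbolic execution of “good” yields four new tests under the generational strategy instead of one.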
SAGE
Scalable, Automated, Guided Execution
Can test any file-reading program running on Windows by treating bytes read from files as symbolic input.
SAGE Architecture
Instead of instrumenting source code, SAGE instruments machine code, because of:
Multitude of languages and build processes: no need for specific source, compiler, and build operations
Slower to get started, but covers much more
Compiler and post-build transformations: by performing symbolic execution on the binary code that actually ships,
SAGE can also detect bugs introduced by the compilation and post-processing tools
Unavailability of source: source-based instrumentation is difficult for self-modifying or JITed code
SAGE does not need data types or structures that are not visible at the machine-code level
Constraint Generation
SAGE is trace-based
Replays a recorded execution trace to update the concrete and symbolic stores
This lets it build constraints on input values
*For conditional jumps, it uses bitvectors to tag the EFLAGS bits the jump depends on
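To illustrate trace replay with a concrete store and a symbolic store, here is a sketch over a made-up mini-instruction set (read, const, add, jeq). This is not SAGE's x86 trace format: the real compare and conditional jump communicate through EFLAGS, which this sketch fuses into a single jeq, and additions here do not wrap at machine word width.

```python
def replay(trace, inp: bytes):
    """Replay a trace, updating a concrete store and a symbolic store,
    and return the path constraint as strings."""
    conc, sym = {}, {}          # register -> value, register -> expression over input
    constraints = []

    def expr(reg):
        # Symbolic expression if the register depends on input, else its concrete value.
        return sym.get(reg, conc[reg])

    for ins in trace:
        op = ins[0]
        if op == "read":                    # dst <- input byte k (symbolic source)
            _, dst, k = ins
            conc[dst], sym[dst] = inp[k], f"in[{k}]"
        elif op == "const":                 # dst <- constant
            _, dst, v = ins
            conc[dst] = v
            sym.pop(dst, None)
        elif op == "add":                   # dst <- a + b
            _, dst, a, b = ins
            conc[dst] = conc[a] + conc[b]
            if a in sym or b in sym:
                sym[dst] = f"({expr(a)} + {expr(b)})"
            else:
                sym.pop(dst, None)
        elif op == "jeq":                   # compare-and-branch on a == b
            _, a, b = ins
            taken = conc[a] == conc[b]
            if a in sym or b in sym:        # only input-dependent branches constrain input
                constraints.append(f"{expr(a)} {'==' if taken else '!='} {expr(b)}")
    return constraints

trace = [("read", "r1", 0), ("const", "r2", 1), ("add", "r3", "r1", "r2"),
         ("const", "r4", ord("c")), ("jeq", "r3", "r4")]
print(replay(trace, b"a"))   # ['(in[0] + 1) != 99']
```

The concrete store decides which way each branch went; the symbolic store turns that outcome into a constraint on the input bytes.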
Constraint Optimization
SAGE employs a number of optimization techniques to improve speed and decrease memory consumption:
Tag caching
Unrelated constraint elimination
Local constraint caching
Flip count limit
Concretization
Constraint subsumption**
Constraint subsumption checks whether a newly created constraint implies, or is implied by, an existing one; the implied (weaker) constraint is removed
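A sketch of subsumption, assuming every constraint is a range lo <= var <= hi on one variable so that implication is just interval containment. SAGE's actual check works on its own constraint representation; Range and add_with_subsumption are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """Hypothetical constraint form: lo <= var <= hi."""
    var: str
    lo: int
    hi: int

    def implies(self, other: "Range") -> bool:
        # Same variable and this interval lies inside the other's.
        return self.var == other.var and other.lo <= self.lo and self.hi <= other.hi

def add_with_subsumption(kept: list, new: Range) -> list:
    if any(old.implies(new) for old in kept):
        return kept                      # the new constraint adds nothing
    # Drop older constraints that the new one makes redundant.
    return [old for old in kept if not new.implies(old)] + [new]

# A loop that tests x > 0, x > 1, x > 2 on successive iterations:
pc = []
for k in range(3):
    pc = add_with_subsumption(pc, Range("x", k + 1, 255))
print(pc)   # [Range(var='x', lo=3, hi=255)]
```

Without subsumption, a loop like this would add one constraint per iteration; with it, the path constraint stays at one.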
Findings
Generational search vs. depth-first search: on the Media1, Media2, and Media3 applications tested, DFS stopped after
~11 hours without finding anything. GS ran slightly longer and found 15 crashes in 4 buckets on Media3.
Bogus (non-well-formed) seed files find few bugs
Divergences (new inputs that do not follow the expected path) are common: ~60%
Most bugs are shallow**
Impact of the block-coverage heuristic:
Adding 10407 blocks instead of 10633; not very effective in most cases
Conclusions
Most unique bugs were found from well-formed seed inputs, within a few generations
The sample size may be limited, but finding bugs that earlier testing missed suggests the new search strategy is effective
SAGE still needs improvement in precision and power
Contributions
SAGE found the critical ANI vulnerability patched in MS07-017, which had been missed by extensive blackbox testing and static analysis
A new search algorithm for systematic test generation, optimized for large applications
Introduction and implementation of SAGE, which scales to execution traces of hundreds of millions of instructions
Weaknesses
The paper itself is hard to understand in certain areas
Coverage of the program is sometimes nondeterministic:
same input, same program, same machine, different coverage
Improvements
Paper – more figures explaining the heuristics and rules
Nondeterminism – export input coverage results to a database to be checked so that nothing is repeated