HeapMD: Identifying Heap-based Bugs using Anomaly Detection

36
HeapMD: Identifying Heap- based Bugs using Anomaly Detection Trishul M. Chilimbi Microsoft Research Redmond, WA [email protected] Vinod Ganapathy University of Wisconsin Madison, WI [email protected] ASPLOS XII, October 2006 San Jose, California

description

HeapMD: Identifying Heap-based Bugs using Anomaly Detection. ASPLOS XII, October 2006 San Jose, California. A motivating example. pNewAsset = Initialize(pAssetParams); ... if (pAssetList->next != NULL) { pNewAsset->next = pAssetList->next pAssetList->next = pNewAsset }. pNewAsset. - PowerPoint PPT Presentation

Transcript of HeapMD: Identifying Heap-based Bugs using Anomaly Detection

Page 1: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

HeapMD: Identifying Heap-based Bugs using Anomaly Detection

Trishul M. ChilimbiMicrosoft Research

Redmond, WA

[email protected]

Vinod GanapathyUniversity of Wisconsin

Madison, WI

[email protected]

ASPLOS XII, October 2006San Jose, California

Page 2: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 2

A motivating examplepNewAsset = Initialize(pAssetParams);...if (pAssetList->next != NULL) {

pNewAsset->next = pAssetList->nextpAssetList->next = pNewAsset

}

… …pAssetList

pNewAsset

Page 3: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 3

A motivating examplepNewAsset = Initialize(pAssetParams);...if (pAssetList->next != NULL) {

pNewAsset->next = pAssetList->nextpAssetList->next = pNewAsset

}

… …pAssetList

pNewAsset

Page 4: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 4

Noteworthy points

• Violation of implicit data-structure invariant– Only extremal nodes must have indegree=1

• Malformed, but pointer-correct structure– Bug does not necessarily result in a crash

… …pAssetList

pNewAsset

Page 5: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 5

Key challenges

• Programmers rarely write down invariants for heap data structures

• Bugs may not be immediately apparent

Can we infer invariants for the heapand use them for bug detection?

Page 6: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 6

Presenting HeapMD

• A tool to monitor the health of the heap

Highlights of resultsHeapMD found 40 bugs (31 new)

in 5 large commercial applications

Page 7: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 7

Talk outline

• Motivation

• Stability of the heap

• Application to bug-finding

• Related work and conclusion

Page 8: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 8

Heap data

A programmer’s perspective

1. Invisible No tangible representation

2. Can be arbitrarily modified Akin to self-modifying code

3. Can have arbitrary structure Akin to programming with gotos

If this were true in practice, building large software would be untenable

Page 9: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 9

Heap data

The reality

1. Invisible No tangible representation

2. Can be arbitrarily modified Often only a small fraction is modified

3. Can have arbitrary structure In practice, structure is simple

In practice, the heap has a simple, stable structure

Page 10: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 10

Stability of pointer-valued locations

0%

20%

40%

60%

80%

100%

gzip crafty mcf parser twolf vpr gcc vortex average

Number of pointer-valued heap locations that only store

NULL during their lifetime

Number of pointer-valued heap locations that only store only one

value during their lifetime

Number of pointer-valued heap locations that only store only two

values during their lifetime

Number of pointer-valued heap locations that store more than two values during their lifetime

91.22%

Strong evidence that a large fraction of the heap is not modified

Page 11: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 11

Simple structure of the heap

• Most data structures in large programs have low in- and out-degrees– Linked lists– Trees– Hash tables

Can we quantify the simplicity and stability of the heap?

Page 12: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 12

Simple metrics suffice

• % nodes with indegree = 0 (root nodes)

• % nodes with outdegree = 0 (leaf nodes)

• % nodes with indegree = 1

• % nodes with indegree = 2

• % nodes with outdegree = 1

• % nodes with outdegree = 2

• % nodes with indegree = outdegree

Page 13: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 13

Gathering metrics: Setup

• Instrument program (x86) to track metrics

• Execute instrumented program on a set of inputs

• Gather at metric-computation points– Function entry points– Frequency is tunable: currently 1/100,000

Page 14: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 14

% indegree = outdegree% indegree = 2

% outdegree = 0 (leaves)

Page 15: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 15

Algorithm to find stable metrics

Computemetrics

Program

Calibrationinputs

Metricreports

Determinestability

Is stable for 40% inputs?

Metric unstable(discard)

Find range of metric

Stable metrics and their ranges

Check for range violation

Other 60%

Can potentially find bugs exercised by calibration inputs

Page 16: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 16

Stable metrics exist: SPECSPEC b/m # Inputs # Stable

twolf 3 6

crafty 3 2

mcf 3 4

vpr 6 1

vortex 5 1

gzip 100 2

parser 100 3

gcc 100 2

% of vertices with Indegree = OutdegreeRange = [14.2%, 17.2%]

Page 17: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 17

Stable metrics exist: Commercial

Benchmark # Inputs # Stable

Multimedia 50 2

Interactive web-app. 50 2

PC-game (simulation) 50 2

PC-game (action) 50 1

Productivity 50 2

Page 18: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 18

Extending the observation

Several degree-based metrics of the heap-graph remain stable

as the heap evolves

Stable heap metrics exist even across different development

versions of a program

In fact, we observed that …

Page 19: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 19

Metric stability across versions

Benchmark # Versions # Inputs # Stable

Multimedia 5 10 2

Interactive web-app. 5 10 2

PC-game (simulation) 5 10 2

PC-game (action) 5 10 1

Productivity 5 10 2

Page 20: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 20

Talk outline

• Motivation

• Stability of the heap

• Application to bug-finding

• Related work and conclusion

Page 21: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 21

Architecture of HeapMD

Binaryinstrumenter

input.exe

output.exe

Stable metricsand their ranges

Metricgatherer

Metricanalyzer

Calibrationinput set

Observed metrics

Calibration

Anomalydetector

“On the field”input set

Metric rangeviolations

Bug-finding

GoalIdentify a subset of

observed metrics asstable metrics

GoalCheck the metrics

identified by trainingfor range violation

Page 22: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 22

Finding bugs using HeapMD

Key intuitionRange violation Likely bug

Page 23: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 23

Recall our motivating examplepNewAsset = Initialize(pAssetParams);...if (pAssetList->next != NULL) {

pNewAsset->next = pAssetList->nextpAssetList->next = pNewAsset

}

… …pAssetList

pNewAsset

Violated invariantOnly extremal nodes

have indegree=1

Page 24: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 24

Range violation for this exampleIndegree = 1 for PC Game (Action)

0

5

10

15

20

25

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61Execution Progress

Perc

enta

ge o

f Vert

exe

s

Log the call stackfor diagnostics

Page 25: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 25

Summary of bugs foundBenchmark # Bugs # New

bugs

# Invariant

violations

Multimedia 8 6 3

Interactive web-app 10 6 5

PC-game (simulation) 9 6 2

PC-game (action) 8 8 3

Productivity 5 5 4

Total 40 31 17

Page 26: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 26

Kinds of bugs found (1)

• Erroneous insertion into linked list

% vertices with indegree = 0, 1% vertices with outdegree = 0, 1

% vertices with indegree = outdegree

Page 27: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 27

Kinds of bugs found (2)

• Shared data structure manipulation errors

Delete

??

% vertices with outdegee = 0% vertices with indegee = 2% vertices with indegree = outdegree

Page 28: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 28

Kinds of bugs found (3)

• Data structure invariants

% vertices with outdegee = 1, 2% vertices with indegee = 1, 2

Page 29: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 29

Kinds of bugs found (3)

• Data structure invariants

% vertices with indegree = outdegree % vertices with indegee = 1

See the paper for several more examples

Page 30: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 30

Comparison with SWAT[Chilimbi and Hauswirth, ASPLOS’04]

Benchmark Leaks #FP Leaks #FP Other bugs

Multimedia 4 0 2 0 6

Interactive web-app.

9 1 4 0 6

PC-Game (simulation)

4 1 3 0 6

SWAT HeapMD

Page 31: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 31

Characteristics of bugs found

• Systemic bugs: Repeated often enough to affect heap-graph metric

• HeapMD cannot find “one-off” bugs– Temporary data structure invariant violations

Page 32: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 32

False positives and negatives

• In our experiments: 0 false positives– Metrics are computed for the whole heap

rather than per data structure– Anomaly detector only looks for range

violations, not stability

• HeapMD cannot find all bugs– Bugs with no effect on heap-graph metrics– Bugs that affect heap-graph metrics, but do

not violate calibrated range

Comes at a cost: False negatives

Page 33: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 33

Talk outline

• Motivation

• Stability of the heap

• Application to bug-finding

• Related work and conclusion

Page 34: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 34

Related work• Tools for special classes of bugs

– Purify, Valgrind, SWAT, …– Not built for detecting invariant violations

• Tools to find invariants– Daikon [Ernst], DIDUCE [Hangal and Lam, ICSE 2002]

– HeapMD complements these in the types of invariants found

• Shape analysis algorithms and tools– Can find invariant violations given a

correctness specification

Page 35: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

ASPLOS'06 Chilimbi/Ganapathy - HeapMD: Identifying Heap-based Bugs using Anomaly Detection 35

Points to take home

Stable heap-graph metrics exist.Their ranges serve as invariants

Range violation Bug likely exercised

Can find non-crashing bugse.g., invariant violations

Page 36: HeapMD: Identifying Heap-based Bugs using Anomaly Detection

HeapMD: Identifying Heap-based Bugs using Anomaly Detection

Trishul M. ChilimbiMicrosoft Research

Redmond, WA

[email protected]

Vinod GanapathyUniversity of Wisconsin

Madison, WI

[email protected]

Questions?