Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory...

30
Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia June 10, 2011

description

Simple Buffer Overflow Example 3 void foo() { int local_variables; int buffer[256]; … buffer = read_input(); … return; } Return address Local variables buffer Buffer Fill New Return address Bad Local variables If read_input() reads 200 ints If read_input() reads >256 ints buffer

Transcript of Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory...

Page 1: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

Sampling Dynamic Dataflow Analyses

Joseph L. GreathouseAdvanced Computer Architecture Laboratory

University of Michigan

University of British ColumbiaJune 10, 2011

Page 2: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

2

NIST: SW errors cost U.S. ~$60 billion/year as of 2002 FBI CCS: Security Issues $67 billion/year as of 2005

>⅓ from viruses, network intrusion, etc.

Software Errors Abound

2000 2001 2002 2003 2004 2005 2006 2007 2008 (proj.)

0

3,000

6,000

9,000CERT-Cataloged Vulnerabilities

Page 3: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

3

Simple Buffer Overflow Examplevoid foo(){ int local_variables; int buffer[256]; … buffer = read_input(); … return;}

Return address

Local variables

buffer

Buffer Fill

New Return address

Bad Local variables

If read_input() reads 200 ints If read_input() reads >256 ints

buffer

Page 4: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

4

Concurrency Bugs Also Matter

if(ptr == NULL) { len=thread_local->mylen; ptr=malloc(len); memcpy(ptr, data, len);}

Thread 2mylen=large

Nov. 2010 OpenSSL Security Flaw

Thread 1mylen=small

Page 5: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

5

Concurrency Bugs Also Matter

if(ptr==NULL)if(ptr==NULL)

memcpy(ptr, data2, len2)

ptrLEAKED

TIM

E

Thread 2mylen=large

Thread 1mylen=small

len2=thread_local->mylen;ptr=malloc(len2);

len1=thread_local->mylen;ptr=malloc(len1);

memcpy(ptr, data1, len1)

Page 6: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

6

One Layer of a Solution High quality dynamic software analysis

Find difficult bugs that other analyses miss

Distribute Tests to Large Populations Low overhead or users get angry

Accomplished by sampling the analyses Each user only tests part of the program

Page 7: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

7

Dynamic Dataflow Analysis

Associate meta-data with program values

Propagate/Clear meta-data while executing

Check meta-data for safety & correctness

Forms dataflows of meta/shadow information

Page 8: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

8

a += y z = y * 75

y = x * 1024 w = x + 42 Check wCheck w

Example Dynamic Dataflow Analysis

validate(x)x = read_input() Clear

a += y z = y * 75

y = x * 1024

x = read_input()

Propagate

Associate

Input

Check aCheck a

Check zCheck z

Data

Meta-data

Page 9: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

9

Distributed Dynamic Dataflow Analysis Split analysis across large populations

Observe more runtime states Report problems developer never thought to test

Instrumented Program Potential

problems

Page 10: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

10

Taint Analysis(e.g.TaintCheck)

Dynamic Bounds Checking

FP Accuracy Verification

Symbolic Execution

Data Race Detection(e.g. Helgrind)

Memory Checking(e.g. Dr. Memory)

Problem: DDAs are Slow

10-200x2-200x

10-80x

5-50x100-500x

2-300x

Page 11: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

One Solution: Sampling

11

Lower overheads by skipping some analyses

0

25

50

75

100

Overhead

Idea

lD

etec

tion

Acc

urac

y (%

)

CompleteAnalysis

NoAnalysis

Page 12: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

Sampling Allows Distribution

12

0

25

50

75

100

Overhead

Idea

lD

etec

tion

Acc

urac

y (%

)

Developer

Beta TestersEnd Users

Many users testing at little overhead see more errors than

one user at high overhead.

Page 13: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

13

a += y z = y * 75

y = x * 1024 w = x + 42

Cannot Naïvely Sample Code

Validate(x)x = read_input()

a += y

y = x * 1024

False Positive

False Negative

w = x + 42

validate(x)x = read_input() Skip Instr.

Input

Check wCheck w

Check zCheck zSkip Instr.

Check aCheck a

Page 14: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

14

Sampling must be aware of meta-data

Remove meta-data from skipped dataflows Prevents false positives

Sample Data, not Code

Page 15: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

15

a += y z = y * 75

y = x * 1024 w = x + 42

Dataflow Sampling Example

validate(x)x = read_input()

a += y

y = x * 1024

False Negative

x = read_input() Skip Dataflow

Input

Check wCheck w

Check zCheck z

Skip Dataflow

Check aCheck a

Page 16: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

16

Mechanisms for Dataflow Sampling (1) Start with demand analysis

Demand Analysis Tool

NativeApplication

InstrumentedApplication

(e.g. Valgrind)

InstrumentedApplication

(e.g. Valgrind)

Shadow Value DetectorMeta-data

No meta-data

Page 17: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

17

Mechanisms for Dataflow Sampling (2) Remove dataflows if execution is too slow

Sampling Analysis Tool

InstrumentedApplication

NativeApplication

InstrumentedApplication

Shadow Value DetectorMeta-data

Clear meta-dataOH Threshold

Page 18: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

18

Prototype Setup Taint analysis sampling system

Network packets untrusted Xen-based demand analysis

Whole-system analysis with modified QEMU Overhead Manager (OHM) is user-controlled

Xen Hypervisor

OS and Applications

App App App…

Linux

ShadowPage Table

Admin VM

Taint Analysis QEMU

Net Stack OHM

Page 19: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

19

Benchmarks Performance – Network Throughput

Example: ssh_receive Accuracy of Sampling Analysis

Real-world Security Exploits

Name Error DescriptionApache Stack overflow in Apache Tomcat JK ConnectorEggdrop Stack overflow in Eggdrop IRC botLynx Stack overflow in Lynx web browserProFTPD Heap smashing attack on ProFTPD ServerSquid Heap smashing attack on Squid proxy server

Page 20: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

20

ssh_receive

Performance of Dataflow Sampling (1)

0 20 40 60 80 1000

5

10

15

20

25

Maximum % Time in Analysis

Thro

ughp

ut (M

B/s

) Throughput with no analysis

Page 21: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

21

netcat_receive

Performance of Dataflow Sampling (2)

0 20 40 60 80 1000

10

20

30

40

50

60

Maximum % Time in Analysis

Thro

ughp

ut (M

B/s

)

Throughput with no analysis

Page 22: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

22

ssh_transmit

Performance of Dataflow Sampling (3)

0 20 40 60 80 1000

5

10

15

20

25

Maximum % Time in Analysis

Thro

ughp

ut (M

B/s

)

Throughput with no analysis

Page 23: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

23

Accuracy at Very Low Overhead Max time in analysis: 1% every 10 seconds Always stop analysis after threshold

Lowest probability of detecting exploits

Name Chance of Detecting ExploitApache 100%Eggdrop 100%Lynx 100%ProFTPD 100%Squid 100%

Page 24: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

24

netcat_receive running with benchmark

Accuracy with Background Tasks

10% 25% 50% 75% 90%0

20

40

60

80

100

0.38 0.4

ApacheEggdropLynxProFTPDSquid

Maximum Allowed Overhead %

% C

hanc

e of

Det

ectin

g E

xplo

it

Page 25: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

25

Accuracy with Background Tasks ssh_receive running in background

10% 25% 50% 75% 90%0

20

40

60

80

100

0.1 0.7 2

ApacheEggdropLynxProFTPDSquid

Maximum % Time in Analysis

% C

hanc

e of

Det

ectin

g E

xplo

it

Page 26: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

26

Conclusion & Future WorkDynamic dataflow sampling gives users aknob to control accuracy vs. performance

Better methods of sample choices Combine static information New types of sampling analysis

Page 27: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

27

BACKUP SLIDES

Page 28: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

28

Outline

Software Errors and Security Dynamic Dataflow Analysis Sampling and Distributed Analysis Prototype System Performance and Accuracy

Page 29: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

29

Detecting Security Errors Static Analysis

Analyze source, formal reasoning+ Find all reachable, defined errors– Intractable, requires expert input,

no system state Dynamic Analysis

Observe and test runtime state+ Find deep errors as they happen– Only along traversed path,

very slow

Page 30: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.

30

Width Test