Dynamic Analysis - Simon Fraser University

Dynamic Analysis CMPT 745 Software Engineering Nick Sumner [email protected]

Transcript of Dynamic Analysis - Simon Fraser University

Page 1: Dynamic Analysis - Simon Fraser University

Dynamic Analysis

CMPT 745 Software Engineering

Nick Sumner [email protected]

Page 2: Dynamic Analysis - Simon Fraser University

2

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

Page 3: Dynamic Analysis - Simon Fraser University

3

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

– Did my program ever …?

Page 4: Dynamic Analysis - Simon Fraser University

4

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

– Did my program ever …?
– Why/how did … happen?

Page 5: Dynamic Analysis - Simon Fraser University

5

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

– Did my program ever …?
– Why/how did … happen?
– Where am I spending time?

Page 6: Dynamic Analysis - Simon Fraser University

6

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

– Did my program ever …?
– Why/how did … happen?
– Where am I spending time?
– Where might I parallelize?

Page 7: Dynamic Analysis - Simon Fraser University

7

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

– Did my program ever …?
– Why/how did … happen?
– Where am I spending time?
– Where might I parallelize?
– Tolerate errors

Page 8: Dynamic Analysis - Simon Fraser University

8

Dynamic Analysis

● Sometimes we want to study or adapt the behavior of executions of a program

– Did my program ever …?
– Why/how did … happen?
– Where am I spending time?
– Where might I parallelize?
– Tolerate errors
– Manage memory / resources.

Page 9: Dynamic Analysis - Simon Fraser University

9

e.g. Reverse Engineering

Static CFG (from e.g. Apple Fairplay):

This is the result of a control flow flattening obfuscation. [http://tigress.cs.arizona.edu/transformPage/docs/flatten/]

Page 10: Dynamic Analysis - Simon Fraser University

10

e.g. Reverse Engineering

Static CFG (from e.g. Apple Fairplay):

Dynamically Simplified CFG:

This is the result of a control flow flattening obfuscation. [http://tigress.cs.arizona.edu/transformPage/docs/flatten/]

Page 11: Dynamic Analysis - Simon Fraser University

11

How?

● Can record the execution

Page 12: Dynamic Analysis - Simon Fraser University

12

How?

● Can record the execution
– Record to a trace
– Analyze post mortem / offline
– Scalability issues: need enough space to store it

Page 13: Dynamic Analysis - Simon Fraser University

13

How?

● Can record the execution
– Record to a trace
– Analyze post mortem / offline
– Scalability issues: need enough space to store it

● Can perform analysis online

Page 14: Dynamic Analysis - Simon Fraser University

14

How?

● Can record the execution
– Record to a trace
– Analyze post mortem / offline
– Scalability issues: need enough space to store it

● Can perform analysis online
– Instrument the program to collect useful facts
– Modified program invokes code to 'analyze' itself

Page 15: Dynamic Analysis - Simon Fraser University

15

How?

● Can record the execution
– Record to a trace
– Analyze post mortem / offline
– Scalability issues: need enough space to store it

● Can perform analysis online
– Instrument the program to collect useful facts
– Modified program invokes code to 'analyze' itself

● Can do both!
– Lightweight recording
– Instrument a replayed instance of the execution

Page 16: Dynamic Analysis - Simon Fraser University

16

How?

● Can record the execution
– Record to a trace
– Analyze post mortem / offline
– Scalability issues: need enough space to store it

● Can perform analysis online
– Instrument the program to collect useful facts
– Modified program invokes code to 'analyze' itself

● Can do both!
– Lightweight recording
– Instrument a replayed instance of the execution

Some analyses only make sense online. Why?

Page 17: Dynamic Analysis - Simon Fraser University

17

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?

Page 18: Dynamic Analysis - Simon Fraser University

18

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?

Profiling is a common dynamic analysis!

Page 19: Dynamic Analysis - Simon Fraser University

19

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?
● How can we modify our program to find this?

Page 20: Dynamic Analysis - Simon Fraser University

20

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?
● How can we modify our program to find this?

BB:0

BB:1 BB:2

BB:3

?

Page 21: Dynamic Analysis - Simon Fraser University

21

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?
● How can we modify our program to find this?

BB:0

BB:1 BB:2

BB:3

count[2] += 1

x = foo()
y = bar()
...

Page 22: Dynamic Analysis - Simon Fraser University

22

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?
● How can we modify our program to find this?

BB:0

BB:1 BB:2

BB:3

count[2] += 1

x = foo()
y = bar()
...

Start:  for i in BBs: count[i] = 0

End:    for i in BBs: print(count[i])

Page 23: Dynamic Analysis - Simon Fraser University

23

Simple Idea: Basic Block Profiling

Knowing where we are spending time is useful:

● Goal: Which basic blocks execute most frequently?
● How can we modify our program to find this?

BB:0

BB:1 BB:2

BB:3

count[2] += 1

x = foo()
y = bar()
...

Start:  for i in BBs: count[i] = 0

End:    for i in BBs: print(count[i])
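Putting the pieces on this slide together, a minimal sketch of what the instrumented program boils down to, in Python. The block numbering, the count array, and the report hook are illustrative only, not tied to any particular framework:

NUM_BBS = 4
count = [0] * NUM_BBS            # one counter per basic block ("Start:" above)

def enter_bb(i):
    # Instrumentation inserted at the top of basic block i
    count[i] += 1

def report():
    # Instrumentation inserted at program exit ("End:" above)
    for i in range(NUM_BBS):
        print("BB:%d executed %d times" % (i, count[i]))

def program(take_left):
    # The original program logic, with calls to enter_bb() woven in
    enter_bb(0)                  # BB:0
    if take_left:
        enter_bb(1)              # BB:1
    else:
        enter_bb(2)              # BB:2  (x = foo(); y = bar(); ...)
    enter_bb(3)                  # BB:3

program(True)
program(False)
report()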

Page 24: Dynamic Analysis - Simon Fraser University

24

Simple Idea: Basic Block Profiling

● Big concern: How efficient is it?
– The more overhead added, the less practical the tool

Page 25: Dynamic Analysis - Simon Fraser University

25

Simple Idea: Basic Block Profiling

● Big concern: How efficient is it?
– The more overhead added, the less practical the tool

count[0] += 1…

count[1] += 1…

count[5] += 1…

count[6] += 1…

count[4] += 1…

count[2] += 1…

count[3] += 1…

Page 26: Dynamic Analysis - Simon Fraser University

26

Simple Idea: Basic Block Profiling

● Big concern: How efficient is it?
– The more overhead added, the less practical the tool

– Can we do better?

count[0] += 1…

count[1] += 1…

count[5] += 1…

count[6] += 1…

count[4] += 1…

count[2] += 1…

count[3] += 1…

Page 27: Dynamic Analysis - Simon Fraser University

27

Simple Idea: Basic Block Profiling

● Big concern: How efficient is it?
– The more overhead added, the less practical the tool

– Can we do better?

count[0] += 1…

count[1] += 1…

count[5] += 1…

count[6] += 1…

count[4] += 1…

count[2] += 1…

count[3] += 1…

count[1] = count[4] = count[2] + count[3]

Page 28: Dynamic Analysis - Simon Fraser University

28

Simple Idea: Basic Block Profiling

● Big concern: How efficient is it?
– The more overhead added, the less practical the tool

– Can we do better?

count[0] += 1…

count[1] += 1…

count[5] += 1…

count[6] += 1…

count[4] += 1…

count[2] += 1…

count[3] += 1…

count[1] = count[4] = count[2] + count[3]
count[0] = count[6] = count[1] + count[5]

Page 29: Dynamic Analysis - Simon Fraser University

29

Simple Idea: Basic Block Profiling

● Big concern: How efficient is it?
– The more overhead added, the less practical the tool

– Can we do better?

count[0] += 1…

count[1] += 1…

count[5] += 1…

count[6] += 1…

count[4] += 1…

count[2] += 1…

count[3] += 1…

count[1] = count[4] = count[2] + count[3]
count[0] = count[6] = count[1] + count[5]
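In other words, only a subset of the blocks needs real counters; the rest can be recovered afterwards from the equations above. A small illustrative sketch in Python (block numbering follows the figure; the measured values are invented):

def derive(count):
    # Only count[2], count[3], and count[5] were measured directly.
    count[1] = count[4] = count[2] + count[3]
    count[0] = count[6] = count[1] + count[5]
    return count

# e.g. measured counters from one run:
measured = {2: 7, 3: 3, 5: 10}
count = {i: measured.get(i, 0) for i in range(7)}
print(derive(count))   # {0: 20, 1: 10, 2: 7, 3: 3, 4: 10, 5: 10, 6: 20}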

Page 30: Dynamic Analysis - Simon Fraser University

30

Efficiency Tactics

● Abstraction

Page 31: Dynamic Analysis - Simon Fraser University

31

Efficiency Tactics

● Abstraction

● Identify & avoid redundant information

Page 32: Dynamic Analysis - Simon Fraser University

32

Efficiency Tactics

● Abstraction

● Identify & avoid redundant information

● Sampling

Page 33: Dynamic Analysis - Simon Fraser University

33

Efficiency Tactics

● Abstraction

● Identify & avoid redundant information

● Sampling

● Compression / encoding

Page 34: Dynamic Analysis - Simon Fraser University

34

Efficiency Tactics

● Abstraction

● Identify & avoid redundant information

● Sampling

● Compression / encoding

● Profile guided instrumentation

Page 35: Dynamic Analysis - Simon Fraser University

35

Efficiency Tactics

● Abstraction

● Identify & avoid redundant information

● Sampling

● Compression / encoding

● Profile guided instrumentation

● Thread local analysis

Page 36: Dynamic Analysis - Simon Fraser University

36

Efficiency Tactics

● Abstraction

● Identify & avoid redundant information

● Sampling

● Compression / encoding

● Profile guided instrumentation

● Thread local analysis

● Inference

Page 37: Dynamic Analysis - Simon Fraser University

37

How / When to Instrument

● Source / IR Instrumentation
– LLVM, CIL, Soot, Wala
– During (re)compilation
– Requires an analysis dedicated build

Page 38: Dynamic Analysis - Simon Fraser University

38

How / When to Instrument

● Source / IR Instrumentation
– LLVM, CIL, Soot, Wala
– During (re)compilation
– Requires an analysis dedicated build

● Static Binary Rewriting
– Diablo, DynamoRIO, SecondWrite, …
– Applies to arbitrary binaries
– Imprecise IR info, but more complete binary behavior

Page 39: Dynamic Analysis - Simon Fraser University

39

How / When to Instrument

● Source / IR Instrumentation
– LLVM, CIL, Soot, Wala
– During (re)compilation
– Requires an analysis dedicated build

● Static Binary Rewriting
– Diablo, DynamoRIO, SecondWrite, …
– Applies to arbitrary binaries
– Imprecise IR info, but more complete binary behavior

● Dynamic Binary Instrumentation
– Valgrind, Pin, Qemu (& other VMs)
– Can adapt at runtime, but less info than IR

Page 40: Dynamic Analysis - Simon Fraser University

40

How / When to Instrument

● Source / IR Instrumentation
– LLVM, CIL, Soot, Wala
– During (re)compilation
– Requires an analysis dedicated build

● Static Binary Rewriting
– Diablo, DynamoRIO, SecondWrite, …
– Applies to arbitrary binaries
– Imprecise IR info, but more complete binary behavior

● Dynamic Binary Instrumentation
– Valgrind, Pin, Qemu (& other VMs)
– Can adapt at runtime, but less info than IR

● Black Box Dynamic Analysis

Page 41: Dynamic Analysis - Simon Fraser University

41

Phases of Dynamic Analysis

In general, 2-3 phases occur:

1) Instrumentation
– Add code to the program for data collection/analysis

Page 42: Dynamic Analysis - Simon Fraser University

42

Phases of Dynamic Analysis

In general, 2-3 phases occur:

1) Instrumentation
– Add code to the program for data collection/analysis

2) Execution
– Run the program and analyze its actual behavior

Page 43: Dynamic Analysis - Simon Fraser University

43

Phases of Dynamic Analysis

In general, 2-3 phases occur:

1) Instrumentation
– Add code to the program for data collection/analysis

2) Execution
– Run the program and analyze its actual behavior

3) (Optional) Postmortem Analysis
– Perform any analysis that can be deferred until after termination

Page 44: Dynamic Analysis - Simon Fraser University

44

Phases of Dynamic Analysis

In general, 2-3 phases occur:

1) Instrumentation
– Add code to the program for data collection/analysis

2) Execution
– Run the program and analyze its actual behavior

3) (Optional) Postmortem Analysis
– Perform any analysis that can be deferred until after termination

Very, very common mistake to mix 1 & 2.

Page 45: Dynamic Analysis - Simon Fraser University

45

Static Instrumentation

1) Compile whole program to IR

foo.c  bar.c  baz.c   →   prog.ll

Page 46: Dynamic Analysis - Simon Fraser University

46

Static Instrumentation

1) Compile whole program to IR

2) Instrument / add code directly to the IR

foo.c  bar.c  baz.c   →   prog.ll   →   prog’.ll

Page 47: Dynamic Analysis - Simon Fraser University

47

Static Instrumentation

1) Compile whole program to IR

2) Instrument / add code directly to the IR

3) Generate new program that performs analysis

foo.c  bar.c  baz.c   →   prog.ll   →   prog’.ll

Page 48: Dynamic Analysis - Simon Fraser University

48

Static Instrumentation

1) Compile whole program to IR

2) Instrument / add code directly to the IR

3) Generate new program that performs analysis

4) Execute

foo.c  bar.c  baz.c   →   prog.ll   →   prog’.ll

Test Cases

Results

Page 49: Dynamic Analysis - Simon Fraser University

49

Dynamic Binary Instrumentation (DBI)

1) Compile program as usual

2) Run program under analysis framework

(Valgrind, PIN, Qemu, etc)

valgrind --tool=memcheck ./myBuggyProgram

Page 50: Dynamic Analysis - Simon Fraser University

50

Dynamic Binary Instrumentation (DBI)

1) Compile program as usual

2) Run program under analysis framework

(Valgrind, PIN, Qemu, etc)

3) Instrument & execute in same command:– Fetch & instrument each basic block individually– Execute each basic block

valgrind --tool=memcheck ./myBuggyProgram
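Conceptually, the framework sits in a fetch, instrument, execute loop. A toy, hedged sketch of that loop in Python (not any real framework's API; the "program" here is just a dict of functions standing in for basic blocks):

def make_toy_program():
    def bb0():
        print("in bb0")
        return "bb1"          # next block to run
    def bb1():
        print("in bb1")
        return None           # exit
    return {"entry": "bb0", "blocks": {"bb0": bb0, "bb1": bb1}}

def instrument(label, block, counts):
    # Analysis hook: count how often each block runs, then run it.
    def instrumented():
        counts[label] = counts.get(label, 0) + 1
        return block()
    return instrumented

def run_under_dbi(program):
    counts, cache = {}, {}
    label = program["entry"]
    while label is not None:
        if label not in cache:                                  # fetch & instrument once
            cache[label] = instrument(label, program["blocks"][label], counts)
        label = cache[label]()                                  # execute instrumented block
    return counts

print(run_under_dbi(make_toy_program()))   # {'bb0': 1, 'bb1': 1}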

Page 51: Dynamic Analysis - Simon Fraser University

Example: Test Case Reduction

Page 52: Dynamic Analysis - Simon Fraser University

52

Testing and Dynamic Analysis

● In some cases, just running a program with different inputs is enough

Page 53: Dynamic Analysis - Simon Fraser University

53

Testing and Dynamic Analysis

● In some cases, just running a program with different inputs is enough
– Carefully selected inputs can target the analysis
– The result of running the program reveals coarse information about its behavior

Page 54: Dynamic Analysis - Simon Fraser University

54

Testing and Dynamic Analysis

● In some cases, just running a program with different inputs is enough
– Carefully selected inputs can target the analysis
– The result of running the program reveals coarse information about its behavior

● Intuitively, even just testing is a dynamic analysis
– It requires no transformation
– The result is just the success or failure of tests

Page 55: Dynamic Analysis - Simon Fraser University

55

Testing and Dynamic Analysis

● In some cases, just running a program with different inputs is enough
– Carefully selected inputs can target the analysis
– The result of running the program reveals coarse information about its behavior

● Intuitively, even just testing is a dynamic analysis
– It requires no transformation
– The result is just the success or failure of tests

● But even that is interesting to consider....

Page 56: Dynamic Analysis - Simon Fraser University

56

Bug reports are problematic

● Failing inputs can be large and complex

a r h w l n y e u m g k o w h > ` p

MB? GB?

Page 57: Dynamic Analysis - Simon Fraser University

57

Bug reports are problematic

● Failing inputs can be large and complex

a r h w l n y e u m g k o w h > ` p

MB? GB? What is relevant and essential to the bug?

Page 58: Dynamic Analysis - Simon Fraser University

58

Bug reports are problematic

a r h w l n y e u m g k o w h > ` p

● Failing inputs can be large and complex

a b c   Bug 2

a b c   Bug 3

a b c   Bug 1

Page 59: Dynamic Analysis - Simon Fraser University

59

Bug reports are problematic

a r h w l n y e u m g k o w h > ` p

● Failing inputs can be large and complex

a b c   Bug 2

a b c   Bug 3

a b c   Bug 1

1) Are these reports the same bug?
2) Can we make it easier to reproduce?

Page 60: Dynamic Analysis - Simon Fraser University

60

Bug reports are problematic

a r h w l n y e u m g k o w h > ` p

● Failing inputs can be large and complex

a b c   Bug 2

a b c   Bug 3

a b c   Bug 1

a b c   Bug

Page 61: Dynamic Analysis - Simon Fraser University

61

Bug reports are problematic

a r h w l n y e u m g k o w h > ` p

● Failing inputs can be large and complex

a b c   Bug 2

a b c   Bug 3

a b c   Bug 1

a b c   Bug

1) Same? Yes!
2) Easier? Yes! And easier to understand!

Page 62: Dynamic Analysis - Simon Fraser University

62

Bug reports are problematic

a r h w l n y e u m g k o w h > ` p

● Failing inputs can be large and complex

a b c   Bug 2

a b c   Bug 3

a b c   Bug 1

a b c   Bug

Test Case Reduction: finding smaller test cases that reproduce a failure

Page 63: Dynamic Analysis - Simon Fraser University

63

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>

http://en.wikipedia.org/wiki/File:Netscape_2_logo.gif

print

Page 64: Dynamic Analysis - Simon Fraser University

64

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>

http://en.wikipedia.org/wiki/File:Netscape_2_logo.gif

print

Page 65: Dynamic Analysis - Simon Fraser University

65

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error

Page 66: Dynamic Analysis - Simon Fraser University

66

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7> = cIntuition: trial and error1) Start w/ a failing text configuration c

Page 67: Dynamic Analysis - Simon Fraser University

67

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})

Page 68: Dynamic Analysis - Simon Fraser University

68

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”

Page 69: Dynamic Analysis - Simon Fraser University

69

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

Page 70: Dynamic Analysis - Simon Fraser University

70

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'

Page 71: Dynamic Analysis - Simon Fraser University

71

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'

Smallest subset of the originalinput reproducing the failure

Page 72: Dynamic Analysis - Simon Fraser University

72

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'

Smallest subset of the originalinput reproducing the failure

Completely impractical! Why?

Page 73: Dynamic Analysis - Simon Fraser University

73

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'● Local Minimum: c : ∀ c'⊂c, c'

Page 74: Dynamic Analysis - Simon Fraser University

74

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'● Local Minimum: c : ∀ c'⊂c, c'

No subset of the result canreproduce the failure.

Page 75: Dynamic Analysis - Simon Fraser University

75

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'● Local Minimum: c : ∀ c'⊂c, c'

No subset of the result canreproduce the failure.

How does this differ from a global minimum?Is it still problematic?

Page 76: Dynamic Analysis - Simon Fraser University

76

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'● Local Minimum: c : ∀ c'⊂c, c'● 1-Minimal: c: ∀ δ ∈ c, (c-{δ})

Page 77: Dynamic Analysis - Simon Fraser University

77

Classically – Delta Debugging

<SELECT NAME="priority" MULTIPLE SIZE=7>Intuition: trial and error1) Start w/ a failing text configuration c2) Try removing subsets (Δ) of input elements ({δ})3) Failure still exists → new input is “better”4) Repeat on the new input

When do we stop? / What is our goal?● Global Minimum: c : ∀ |c'|<|c|, c'● Local Minimum: c : ∀ c'⊂c, c'● 1-Minimal: c: ∀ δ ∈ c, (c-{δ})

No one element can be removedand still reproduce the failure

Page 78: Dynamic Analysis - Simon Fraser University

78

Classically – Delta Debugging
1 2 3 4 5 6 7 8

Does binary search work?

Page 79: Dynamic Analysis - Simon Fraser University

79

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×2)

Page 80: Dynamic Analysis - Simon Fraser University

80

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×3)

So what should we do?

Page 81: Dynamic Analysis - Simon Fraser University

81

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×3)

So what should we do?

We refine the granularity

Page 82: Dynamic Analysis - Simon Fraser University

82

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×4)

Page 83: Dynamic Analysis - Simon Fraser University

83

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×5)

Page 84: Dynamic Analysis - Simon Fraser University

84

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×6)

Page 85: Dynamic Analysis - Simon Fraser University

85

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×7)

Page 86: Dynamic Analysis - Simon Fraser University

86

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×7)

And now check complements

Page 87: Dynamic Analysis - Simon Fraser University

87

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×8)

Page 88: Dynamic Analysis - Simon Fraser University

88

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)

Page 89: Dynamic Analysis - Simon Fraser University

89

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8

Page 90: Dynamic Analysis - Simon Fraser University

90

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8   (×2)

Page 91: Dynamic Analysis - Simon Fraser University

91

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8   (×3)

What's clever about how we recurse?

Page 92: Dynamic Analysis - Simon Fraser University

92

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8   (×5)

Page 93: Dynamic Analysis - Simon Fraser University

93

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8   (×5)
1 2 7 8   (×2)
So close! How many more?

Page 94: Dynamic Analysis - Simon Fraser University

94

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8   (×5)
1 2 7 8   (×2)

1 2 7 8   (×6)

Done?

Page 95: Dynamic Analysis - Simon Fraser University

95

Classically – Delta Debugging
1 2 3 4 5 6 7 8   (×9)
1 2 5 6 7 8   (×5)
1 2 7 8   (×2)

1 2 7 8   (×6)
1 7 8   (×6)

Done?

Page 96: Dynamic Analysis - Simon Fraser University

96

Classically – Delta Debugging

1) Test case to minimize

c = 1 2 3 4 5 6 7 8

ddmin(c) = ddmin2(c, 2)

Page 97: Dynamic Analysis - Simon Fraser University

97

Classically – Delta Debugging

c = 1 2 3 4 5 6 7 8

1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin(c) = ddmin2(c, 2)

Page 98: Dynamic Analysis - Simon Fraser University

98

Classically – Delta Debugging

1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

c = 1 2 3 4 5 6 7 8      Δ = 4

Δ1 Δ2 Δ3 Δ4

ddmin(c) = ddmin2(c, 2)

Page 99: Dynamic Analysis - Simon Fraser University

99

Classically – Delta Debugging

1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

c = 1 2 3 4 5 6 7 8      Δ = 4

Δ1 Δ2 Δ3 Δ4

∇1

ddmin(c) = ddmin2(c, 2)

Page 100: Dynamic Analysis - Simon Fraser University

100

Classically – Delta Debugging

1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2)=

ddmin(c) = ddmin2(c, 2)

Page 101: Dynamic Analysis - Simon Fraser University

101

Classically – Delta Debugging

ddmin(c) = ddmin2(c, 2)
1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2) =
    ddmin2(Δi, 2)               If ... (a)

Δi = {3,4}    (a)  1 2 3 4 5 6 7 8

Page 102: Dynamic Analysis - Simon Fraser University

102

Classically – Delta Debugging

ddmin(c) = ddmin2(c, 2)
1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2) =
    ddmin2(Δi, 2)               If ... (a)
    ddmin2(∇i, max(n-1,2))      If ... (b)

Δi = {3,4}    (a)  1 2 3 4 5 6 7 8
              (b)  1 2 3 4 5 6 7 8

Page 103: Dynamic Analysis - Simon Fraser University

103

Classically – Delta Debugging

ddmin(c) = ddmin2(c, 2)
1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2) =
    ddmin2(Δi, 2)               If ... (a)
    ddmin2(∇i, max(n-1,2))      If ... (b)

Δi = {3,4}    (a)  1 2 3 4 5 6 7 8
              (b)  1 2 3 4 5 6 7 8

Page 104: Dynamic Analysis - Simon Fraser University

104

Classically – Delta Debugging

ddmin(c) = ddmin2(c, 2)
1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2) =
    ddmin2(Δi, 2)               If ... (a)
    ddmin2(∇i, max(n-1,2))      If ... (b)
    ddmin2(c, min(|c|,2n))      If ... (c)   n < |c|

Δi = {3,4}    (a)  1 2 3 4 5 6 7 8
              (b)  1 2 3 4 5 6 7 8

Page 105: Dynamic Analysis - Simon Fraser University

105

Classically – Delta Debugging

ddmin(c) = ddmin2(c, 2)
1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2) =
    ddmin2(Δi, 2)               If ... (a)
    ddmin2(∇i, max(n-1,2))      If ... (b)
    ddmin2(c, min(|c|,2n))      If ... (c)   n < |c|

Δi = {3,4}    (a)  1 2 3 4 5 6 7 8
              (b)  1 2 3 4 5 6 7 8

Page 106: Dynamic Analysis - Simon Fraser University

106

Classically – Delta Debugging

ddmin(c) = ddmin2(c, 2)
1) Test case to minimize
2) Granularity (|Δ| = |c|/n)

ddmin2(c, 2) =
    ddmin2(Δi, 2)               If ... (a)
    ddmin2(∇i, max(n-1,2))      If ... (b)
    ddmin2(c, min(|c|,2n))      If ... (c)   n < |c|
    c                           otherwise

Δi = {3,4}    (a)  1 2 3 4 5 6 7 8
              (b)  1 2 3 4 5 6 7 8
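A compact Python sketch of the recursion above. The slide elides the tests behind conditions (a)-(c); this sketch fills them in with the standard choices (some Δi still fails, some complement ∇i still fails, the granularity can still be increased), and the `fails` oracle is a toy stand-in for rerunning the real test:

def split(c, n):
    # Split c into n contiguous, nearly equal chunks (the Δi).
    q, r = divmod(len(c), n)
    out, start = [], 0
    for i in range(n):
        end = start + q + (1 if i < r else 0)
        out.append(c[start:end])
        start = end
    return out

def ddmin(c, fails):
    assert fails(c), "the starting configuration must reproduce the failure"
    return ddmin2(list(c), 2, fails)

def ddmin2(c, n, fails):
    deltas = split(c, n)
    # (a) some subset Δi alone still reproduces the failure
    for d in deltas:
        if d and d != c and fails(d):
            return ddmin2(d, 2, fails)
    # (b) some complement ∇i still reproduces the failure
    for i in range(n):
        nabla = [x for j, d in enumerate(deltas) if j != i for x in d]
        if nabla != c and fails(nabla):
            return ddmin2(nabla, max(n - 1, 2), fails)
    # (c) refine the granularity if we still can
    if n < len(c):
        return ddmin2(c, min(len(c), 2 * n), fails)
    # otherwise: c is 1-minimal
    return c

# Toy oracle: the failure needs elements 1, 7, and 8 to be present together.
fails = lambda conf: {1, 7, 8} <= set(conf)
print(ddmin([1, 2, 3, 4, 5, 6, 7, 8], fails))   # -> [1, 7, 8]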

Page 107: Dynamic Analysis - Simon Fraser University

107

Classically – Delta Debugging

● Worst Case: |c|² + 3|c| tests
– All tests unresolved until maximum granularity
– Testing complement succeeds

Page 108: Dynamic Analysis - Simon Fraser University

108

Classically – Delta Debugging

● Worst Case: |c|² + 3|c| tests
– All tests unresolved until maximum granularity
– Testing complement succeeds

● Best Case: # tests ≤ 2·log₂(|c|)
– Falling back to binary search!
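As a quick sanity check of these bounds on the running example with |c| = 8: the worst case allows up to 8² + 3·8 = 88 tests, while the best case needs at most 2·log₂(8) = 6 tests.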

Page 109: Dynamic Analysis - Simon Fraser University

109

Classically – Delta Debugging

● Worst Case: |c|² + 3|c| tests
– All tests unresolved until maximum granularity
– Testing complement succeeds

● Best Case: # tests ≤ 2·log₂(|c|)
– Falling back to binary search!

● Minimality
– When will it only be locally minimal?
– When will it only be 1-minimal?

Page 110: Dynamic Analysis - Simon Fraser University

110

Classically – Delta Debugging

● Worst Case: |c|² + 3|c| tests
– All tests unresolved until maximum granularity
– Testing complement succeeds

● Best Case: # tests ≤ 2·log₂(|c|)
– Falling back to binary search!

● Minimality
– When will it only be locally minimal?
– When will it only be 1-minimal?
– Does formal minimality even matter?

Page 111: Dynamic Analysis - Simon Fraser University

111

Classically – Delta Debugging

● Observation: In practice DD may revisit elements in order to guarantee minimality

Page 112: Dynamic Analysis - Simon Fraser University

112

Classically – Delta Debugging

● Observation: In practice DD may revisit elements in order to guarantee minimality

ddmin2(∇i, max(n-1,2))
1 2 3 4 5 6 7 8

Page 113: Dynamic Analysis - Simon Fraser University

113

Classically – Delta Debugging

● Observation: In practice DD may revisit elements in order to guarantee minimality

1 2 3 4 5 6 7 8

1 2 5 6 7 8

...ddmin2(∇i, max(n-1,2))

1 2 3 4 5 6 7 8

Page 114: Dynamic Analysis - Simon Fraser University

114

Classically – Delta Debugging

● Observation: In practice DD may revisit elements in order to guarantee minimality

● If guaranteeing 1-minimality does not matter, the algorithm can drop to linear time!
– In practice this can be effective for what developers may care about

1 2 3 4 5 6 7 8

1 2 5 6 7 8

...ddmin2(∇i, max(n-1,2))

1 2 3 4 5 6 7 8

Page 115: Dynamic Analysis - Simon Fraser University

115

Classically – Delta Debugging

● Observation: In practice DD may revisit elements in order to guarantee minimality

● If guaranteeing 1-minimality does not matter, the algorithm can drop to linear time!
– In practice this can be effective for what developers may care about

1 2 3 4 5 6 7 8

1 2 5 6 7 8

...ddmin2(∇i, max(n-1,2))

1 2 3 4 5 6 7 8

Don’t get bogged down by formalism when it doesn’t serve you!

Page 116: Dynamic Analysis - Simon Fraser University

116

Test Case Reduction in Practice

● Most problems do not use DD directly for TCR.
– It provides guidance, but frequently behaves poorly

Page 117: Dynamic Analysis - Simon Fraser University

117

Test Case Reduction in Practice

● Most problems do not use DD directly for TCR.
– It provides guidance, but frequently behaves poorly

● What are the possible causes of problems?

1 2 3 4   (×3)
Monotonicity matters

Page 118: Dynamic Analysis - Simon Fraser University

118

Test Case Reduction in Practice

● Most problems do not use DD directly for TCR.
– It provides guidance, but frequently behaves poorly

● What are the possible causes of problems?

1 2 3 4   (×3)
Monotonicity matters

1 2 3 4   (×2)
Determinism matters

Page 119: Dynamic Analysis - Simon Fraser University

119

Test Case Reduction in Practice

● Most problems do not use DD directly for TCR.
– It provides guidance, but frequently behaves poorly

● What are the possible causes of problems?

1 2 3 4   (×3)
Monotonicity matters

1 2 3 4   (×2)
Determinism matters

Structure matters
(parse-tree figure for a statement like:  for i in [5, 10]: a[i] = i * 5  – nodes: LOOP, i, RANGE(5, 10), ASSIGN, INDEX, EXPR)

Page 120: Dynamic Analysis - Simon Fraser University

120

Test Case Reduction for Compilers

● Programs are highly structured, so TCR for compilers faces challenges

Page 121: Dynamic Analysis - Simon Fraser University

121

Test Case Reduction for Compilers

● Programs are highly structured, so TCR for compilers faces challenges

● What structures could we use to guide the process?

Page 122: Dynamic Analysis - Simon Fraser University

122

Test Case Reduction for Compilers

● Programs are highly structured, so TCR for compilers faces challenges

● What structures could we use to guide the process?

(parse-tree figure for a statement like:  for i in [5, 10]: a[i] = i * 5  – nodes: LOOP, i, RANGE(5, 10), ASSIGN, INDEX, EXPR)

Page 123: Dynamic Analysis - Simon Fraser University

123

Test Case Reduction for Compilers

● Programs are highly structured, so TCR for compilers faces challenges

● What structures could we use to guide the process?

● What challenges still remain?

(parse-tree figure for a statement like:  for i in [5, 10]: a[i] = i * 5  – nodes: LOOP, i, RANGE(5, 10), ASSIGN, INDEX, EXPR)
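One option the parse tree suggests is to delete whole subtrees rather than arbitrary character ranges, so every candidate stays syntactically well formed. A toy Python sketch in that spirit (the tree representation and the oracle are invented for illustration, not a specific tool's API):

def delete_at(node, path):
    # Return a copy of `node` with the subtree at `path` (a list of child indices) removed.
    label, children = node
    i, rest = path[0], path[1:]
    if rest:
        new_children = children[:i] + [delete_at(children[i], rest)] + children[i + 1:]
    else:
        new_children = children[:i] + children[i + 1:]
    return (label, new_children)

def all_paths(node, prefix=()):
    _, children = node
    for i, child in enumerate(children):
        yield list(prefix) + [i]
        yield from all_paths(child, prefix + (i,))

def reduce_tree(root, still_fails):
    # Greedily delete any whole subtree whose removal still reproduces the failure.
    changed = True
    while changed:
        changed = False
        for path in list(all_paths(root)):
            candidate = delete_at(root, path)
            if still_fails(candidate):
                root, changed = candidate, True
                break
    return root

# Toy AST for the statement sketched above; each node is ("label", [children]).
ast = ("LOOP", [("i", []),
                ("RANGE", [("5", []), ("10", [])]),
                ("ASSIGN", [("INDEX", [("a", []), ("i", [])]),
                            ("EXPR", [("i*5", [])])])])
still_fails = lambda t: "ASSIGN" in str(t)    # toy oracle: the crash needs the assignment
print(reduce_tree(ast, still_fails))          # ('LOOP', [('ASSIGN', [])])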

Page 124: Dynamic Analysis - Simon Fraser University

124

Generalizing Further

● What else could we think of as test case reduction?

Page 125: Dynamic Analysis - Simon Fraser University

125

Generalizing Further

● What else could we think of as test case reduction?– Failing traces of a program?– “ ” in a distributed system?– “ ” microservice application?

Page 126: Dynamic Analysis - Simon Fraser University

126

Generalizing Further

● What else could we think of as test case reduction?– Failing traces of a program?– “ ” in a distributed system?– “ ” microservice application?– Automatically generated test cases?

Page 127: Dynamic Analysis - Simon Fraser University

127

Generalizing Further

● What else could we think of as test case reduction?– Failing traces of a program?– “ ” in a distributed system?– “ ” microservice application?– Automatically generated test cases?– ...

Page 128: Dynamic Analysis - Simon Fraser University

128

Generalizing Further

● What else could we think of as test case reduction?– Failing traces of a program?– “ ” in a distributed system?– “ ” microservice application?– Automatically generated test cases?

● The ability to treat the program as an oracle is also very powerful

Page 129: Dynamic Analysis - Simon Fraser University

129

Generalizing Further

● What else could we think of as test case reduction?– Failing traces of a program?– “ ” in a distributed system?– “ ” microservice application?– Automatically generated test cases?

● The ability to treat the program as an oracle is also very powerful– We can get new data by running the program– This can be combined with reinforcement learning to accomplish hard tasks

Page 130: Dynamic Analysis - Simon Fraser University

130

Generalizing Further

● What else could we think of as test case reduction?– Failing traces of a program?– “ ” in a distributed system?– “ ” microservice application?– Automatically generated test cases?

● The ability to treat the program as an oracle is also very powerful– We can get new data by running the program– This can be combined with reinforcement learning to accomplish hard tasks– We saw this before when discussing test suites!

Page 131: Dynamic Analysis - Simon Fraser University

Example: Memory Safety Bugs

Page 132: Dynamic Analysis - Simon Fraser University

132

Example: Finding memory safety bugs

● Memory safety bugs are one of the most common sources of security vulnerabilities

Page 133: Dynamic Analysis - Simon Fraser University

133

Example: Finding memory safety bugs

● Memory safety bugs are one of the most common sources of security vulnerabilities

● Effects may only be visible long after invalid behavior
– This complicates comprehension & debugging

Page 134: Dynamic Analysis - Simon Fraser University

134

Example: Finding memory safety bugs

● Memory safety bugs are one of the most common sources of security vulnerabilities

● Effects may only be visible long after invalid behavior
– This complicates comprehension & debugging

● Two main types of issues:
– Spatial – Out of bounds stack/heap/global accesses
– Temporal – Use after free

Page 135: Dynamic Analysis - Simon Fraser University

135

Example: Finding memory safety bugs

● Memory safety bugs are one of the most common sources of security vulnerabilities

● Effects may only be visible long after invalid behavior
– This complicates comprehension & debugging

● Two main types of issues:
– Spatial – Out of bounds stack/heap/global accesses
– Temporal – Use after free

● We would like to automatically identify & provide assistance with high precision and low overhead
– Suitable for testing & sometimes maybe deployment!

Page 136: Dynamic Analysis - Simon Fraser University

136

Example: Finding memory safety bugs

● Most common approach – track which regions of memory are valid
– During execution!
– Updated when new memory is allocated
– Checked when pointers are accessed
– With low overhead

Page 137: Dynamic Analysis - Simon Fraser University

137

Example: Finding memory safety bugs

● Most common approach – track which regions of memory are valid
– During execution!
– Updated when new memory is allocated
– Checked when pointers are accessed
– With low overhead

● Common implementations
– Valgrind – DBI based
– AddressSanitizer – static instrumentation based

Page 138: Dynamic Analysis - Simon Fraser University

138

Example: Finding memory safety bugs

● Most common approach – track which regions of memory are valid
– During execution!
– Updated when new memory is allocated
– Checked when pointers are accessed
– With low overhead

● Common implementations
– Valgrind – DBI based
– AddressSanitizer – static instrumentation based

Note, the implementation style affects which bugs can be recognized!

Why?

Page 139: Dynamic Analysis - Simon Fraser University

139

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free

Page 140: Dynamic Analysis - Simon Fraser University

140

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

Page 141: Dynamic Analysis - Simon Fraser University

141

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks

ptr = malloc(sizeof(MyStruct));

ptr

Page 142: Dynamic Analysis - Simon Fraser University

142

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed

free(ptr);

ptr

Page 143: Dynamic Analysis - Simon Fraser University

143

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

void foo() {
  int buffer[5];
  ...
}

buffer[0]

buffer[6]

Page 144: Dynamic Analysis - Simon Fraser University

144

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

Page 145: Dynamic Analysis - Simon Fraser University

145

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

*address = ...;

instrumentation ?

Page 146: Dynamic Analysis - Simon Fraser University

146

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

*address = ...;

    – instrumentation →

if (IsPoisoned(address, size)) {
  ReportError(address, size, isWrite);
}
*address = ...;

Page 147: Dynamic Analysis - Simon Fraser University

147

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

● The tricky part is tracking & efficiently checking redzones.

Page 148: Dynamic Analysis - Simon Fraser University

148

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

● The tricky part is tracking & efficiently checking redzones.
– Instrumenting every memory access is costly!

Page 149: Dynamic Analysis - Simon Fraser University

149

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

● The tricky part is tracking & efficiently checking redzones.
– Instrumenting every memory access is costly!
– We must track all memory ... inside that same memory!

Page 150: Dynamic Analysis - Simon Fraser University

150

AddressSanitizer

● Need to track which memory is valid & check efficiently...

● Big Picture:
– Replace calls to malloc & free
– Poison memory: (create red zones)

1) around malloced chunks
2) when it is freed
3) around buffers and local variables

– Access of poisoned memory causes an error

● The tricky part is tracking & efficiently checking redzones.
– Instrumenting every memory access is costly!
– We must track all memory ... inside that same memory!

This kind of issue is common in dynamic analyses.

Page 151: Dynamic Analysis - Simon Fraser University

151

AddressSanitizer – Shadow Memory

Application Memory

Need to know whether any byte of application memory is poisoned.

Page 152: Dynamic Analysis - Simon Fraser University

152

AddressSanitizer – Shadow Memory

Application Memory Shadow Memory

● We maintain 2 views on memory

Page 153: Dynamic Analysis - Simon Fraser University

153

AddressSanitizer – Shadow Memory

Application Memory Shadow Memory

● We maintain 2 views on memory

Shadow memory of the shadow memory!

Page 154: Dynamic Analysis - Simon Fraser University

154

AddressSanitizer – Shadow Memory

Application Memory Shadow Memory

● We maintain 2 views on memory

● Shadow Memory is pervasive in dynamic analysis
– Can be thought of as a map containing analysis data

Page 155: Dynamic Analysis - Simon Fraser University

155

AddressSanitizer – Shadow Memory

Application Memory Shadow Memory

● We maintain 2 views on memory

● Shadow Memory is pervasive in dynamic analysis
– Can be thought of as a map containing analysis data
– For every bit/byte/word/chunk/allocation/page, maintain information in a compact table

Page 156: Dynamic Analysis - Simon Fraser University

156

AddressSanitizer – Shadow Memory

Application Memory Shadow Memory

● We maintain 2 views on memory

● Shadow Memory is pervasive in dynamic analysis
– Can be thought of as a map containing analysis data
– For every bit/byte/word/chunk/allocation/page, maintain information in a compact table

Where have you encountered this before?

Page 157: Dynamic Analysis - Simon Fraser University

157

AddressSanitizer – Shadow Memory

Application Memory Shadow Memory

● We maintain 2 views on memory

● Shadow Memory is pervasive in dynamic analysis
– Can be thought of as a map containing analysis data
– For every bit/byte/word/chunk/allocation/page, maintain information in a compact table
– Common in runtime support, e.g. page tables

Page 158: Dynamic Analysis - Simon Fraser University

158

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

Encoding & abstraction efficiency strategies

Page 159: Dynamic Analysis - Simon Fraser University

159

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)

Page 160: Dynamic Analysis - Simon Fraser University

160

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable

k

Page 161: Dynamic Analysis - Simon Fraser University

161

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states

Page 162: Dynamic Analysis - Simon Fraser University

162

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

(example shadow byte values from the figure: 4  6  7  0  5  3  -1  1  2)

Page 163: Dynamic Analysis - Simon Fraser University

163

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

All good = 0    All bad = -1    Partly good = # good

(example shadow byte values from the figure: 4  6  7  0  5  3  -1  1  2)
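To make the encoding concrete, a small Python sketch following the scheme described above (granule size N = 8; the helper name and the example allocation size are made up for illustration):

def shadow_bytes_for_alloc(size, N=8):
    # One shadow byte per N-byte granule of the allocation:
    #   0 -> all N bytes addressable, k in 1..N-1 -> only the first k bytes.
    full, partial = divmod(size, N)
    shadow = [0] * full
    if partial:
        shadow.append(partial)
    return shadow

# e.g. an allocation of 13 bytes covers one full granule plus 5 bytes of the next:
print(shadow_bytes_for_alloc(13))   # [0, 5]
# The surrounding redzone granules would be marked with a negative value (e.g. -1).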

Page 164: Dynamic Analysis - Simon Fraser University

164

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

● What does accessing shadow memory for an address look like? (N=8)

Page 165: Dynamic Analysis - Simon Fraser University

165

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

● What does accessing shadow memory for an address look like? (N=8)
– Preallocate a large table

Page 166: Dynamic Analysis - Simon Fraser University

166

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

● What does accessing shadow memory for an address look like? (N=8)
– Preallocate a large table
– Shadow = (address >> 3) + Offset

Page 167: Dynamic Analysis - Simon Fraser University

167

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

● What does accessing shadow memory for an address look like? (N=8)
– Preallocate a large table
– Shadow = (address >> 3) + Offset
– With PIE, Shadow = (address >> 3)

Page 168: Dynamic Analysis - Simon Fraser University

168

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

● What does accessing shadow memory for an address look like? (N=8)
– Preallocate a large table
– Shadow = (address >> 3) + Offset
– With PIE, Shadow = (address >> 3)

if (*(address>>3)) {
  ReportError(...);
}
*address = ...

Page 169: Dynamic Analysis - Simon Fraser University

169

AddressSanitizer – Shadow Memory

● Designing efficient analyses (& shadow memory) often requires a careful domain insight

● NOTE: Heap allocated regions are N byte aligned (N usually 8)
– In an N byte region, only the first k may be addressable
– Every N bytes has only N+1 possible states
– Map every N bytes to 1 shadow byte encoding state as a number

● What does accessing shadow memory for an address look like? (N=8)
– Preallocate a large table
– Shadow = (address >> 3) + Offset
– With PIE, Shadow = (address >> 3)

if (*(address>>3)) {
  ReportError(...);
}
*address = ...

Now you can also see the reason for the numerical encoding....

Page 170: Dynamic Analysis - Simon Fraser University

170

AddressSanitizer – Shadow Memory

shadow = address >> 3state = *shadowif (state != 0 && (state < (address & 7) + size)) { ReportError(...);}*address = ...

● Handling accesses of size < N (N=8)

Page 171: Dynamic Analysis - Simon Fraser University

171

AddressSanitizer – Shadow Memory

shadow = address >> 3state = *shadowif (state != 0 && (state < (address & 7) + size)) { ReportError(...);}*address = ...

● Handling accesses of size < N (N=8)

Careful construction of states can make runtime checks efficient
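The same check, modelled in a few lines of Python so the encoding can be experimented with (the shadow table is just a dict here, and the addresses and states are invented for the example):

def is_poisoned(address, size, shadow):
    # One shadow byte per 8-byte granule; mirrors the check above.
    state = shadow.get(address >> 3, 0)
    return state != 0 and state < (address & 7) + size

# Example shadow: granule 0 fully valid, granule 1 has 5 valid bytes, granule 2 is redzone.
shadow = {0: 0, 1: 5, 2: -1}
print(is_poisoned(10, 2, shadow))   # False: offset 2 + size 2 stays within the 5 valid bytes
print(is_poisoned(12, 4, shadow))   # True:  offset 4 + size 4 = 8 > 5
print(is_poisoned(16, 1, shadow))   # True:  redzone (-1 is less than any access extent)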

Page 172: Dynamic Analysis - Simon Fraser University

172

AddressSanitizer - Evaluating

● In dynamic analyses, we care about both overheads & result quality

Page 173: Dynamic Analysis - Simon Fraser University

173

AddressSanitizer - Evaluating

● In dynamic analyses, we care about both overheads & result quality

● Overheads
– Need to determine what resources are being consumed

Page 174: Dynamic Analysis - Simon Fraser University

174

AddressSanitizer - Evaluating

● In dynamic analyses, we care about both overheads & result quality

● Overheads
– Need to determine what resources are being consumed
– Memory – Shadow memory capacity is cheap, but accessed shadows matter

Page 175: Dynamic Analysis - Simon Fraser University

175

AddressSanitizer - Evaluating

● In dynamic analyses, we care about both overheads & result quality

● Overheads
– Need to determine what resources are being consumed
– Memory – Shadow memory capacity is cheap, but accessed shadows matter
– Running time – Can effectively be free for I/O bound projects; up to 25x overheads on some benchmarks

Page 176: Dynamic Analysis - Simon Fraser University

176

AddressSanitizer - Evaluating

● In dynamic analyses, we care about both overheads & result quality

● Overheads
– Need to determine what resources are being consumed
– Memory – Shadow memory capacity is cheap, but accessed shadows matter
– Running time – Can effectively be free for I/O bound projects; up to 25x overheads on some benchmarks

● Quality
– Precision & recall matter
– Where will it miss bugs? Where will it raise false alarms?

Page 177: Dynamic Analysis - Simon Fraser University

177

AddressSanitizer - Evaluating

● False negatives
– Computed pointers that are accidentally in bounds

Page 178: Dynamic Analysis - Simon Fraser University

178

AddressSanitizer - Evaluating

● False negatives
– Computed pointers that are accidentally in bounds
– Unaligned accesses that are partially out of bounds

Page 179: Dynamic Analysis - Simon Fraser University

179

AddressSanitizer - Evaluating

● False negatives
– Computed pointers that are accidentally in bounds
– Unaligned accesses that are partially out of bounds
– Use after frees with significant churn

Page 180: Dynamic Analysis - Simon Fraser University

Example: Comparing Executions

Page 181: Dynamic Analysis - Simon Fraser University

181

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

Page 182: Dynamic Analysis - Simon Fraser University

182

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison

Page 183: Dynamic Analysis - Simon Fraser University

183

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison– Debugging regressions – old vs new

Page 184: Dynamic Analysis - Simon Fraser University

184

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison– Debugging regressions – old vs new– Validating patches – old vs new

Page 185: Dynamic Analysis - Simon Fraser University

185

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison– Debugging regressions – old vs new– Validating patches – old vs new– Understanding automated repair – old vs new

Page 186: Dynamic Analysis - Simon Fraser University

186

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison– Debugging regressions – old vs new– Validating patches – old vs new– Understanding automated repair – old vs new– Debugging with concurrency – buggy vs nonbuggy schedules

Page 187: Dynamic Analysis - Simon Fraser University

187

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison– Debugging regressions – old vs new– Validating patches – old vs new– Understanding automated repair – old vs new– Debugging with concurrency – buggy vs nonbuggy schedules– Malware analysis – malicious vs nonmalicious run

Page 188: Dynamic Analysis - Simon Fraser University

188

Why compare traces or executions?

● Understanding the differences between two executions(& how some differences cause others)can help explain program behavior

● Several tasks could be made simpler by trace comparison– Debugging regressions – old vs new– Validating patches – old vs new– Understanding automated repair – old vs new– Debugging with concurrency – buggy vs nonbuggy schedules– Malware analysis – malicious vs nonmalicious run– Reverse engineering – desired behavior vs undesirable

Page 189: Dynamic Analysis - Simon Fraser University

189

How it might look

Correct Buggy

Page 190: Dynamic Analysis - Simon Fraser University

190

How it might look

x was 5 instead of 3

Correct Buggy

Page 191: Dynamic Analysis - Simon Fraser University

191

How it might look

x was 5 instead of 3

So y was 2 instead of 7

Correct Buggy

Page 192: Dynamic Analysis - Simon Fraser University

192

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executed instead of the FALSE branch

Correct Buggy

Page 193: Dynamic Analysis - Simon Fraser University

193

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executed instead of the FALSE branch
So the update of z was skipped

Correct Buggy

Page 194: Dynamic Analysis - Simon Fraser University

194

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executedinstead of the FALSE branchSo the update of z was skipped

So the incorrect value of z was printed

Correct Buggy

Page 195: Dynamic Analysis - Simon Fraser University

195

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executed instead of the FALSE branch
So the update of z was skipped

So the incorrect value of z was printed

Correct Buggy

What do we need?
● locations
● state
● flow
● causation

Page 196: Dynamic Analysis - Simon Fraser University

196

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executedinstead of the FALSE branchSo the update of z was skipped

So the incorrect value of z was printed

Correct Buggy What do we need?● locations● state● flow● causation

Page 197: Dynamic Analysis - Simon Fraser University

197

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executedinstead of the FALSE branchSo the update of z was skipped

So the incorrect value of z was printed

Correct Buggy What do we need?● locations● state● flow● causation

Page 198: Dynamic Analysis - Simon Fraser University

198

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executedinstead of the FALSE branchSo the update of z was skipped

So the incorrect value of z was printed

Correct Buggy What do we need?● locations● state● flow● causation

Page 199: Dynamic Analysis - Simon Fraser University

199

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executedinstead of the FALSE branchSo the update of z was skipped

So the incorrect value of z was printed

Correct Buggy What do we need?● locations● state● flow● causation

Page 200: Dynamic Analysis - Simon Fraser University

200

How it might look

x was 5 instead of 3

So y was 2 instead of 7

So the TRUE branch executed instead of the FALSE branch
So the update of z was skipped

So the incorrect value of z was printed

Correct Buggy

What do we need?
● locations
● state
● flow
● causation

We can construct this backward from a point of failure/difference

Page 201: Dynamic Analysis - Simon Fraser University

201

So why not just...

● Traces can be viewed as sequences...
– Why not just do LCS-based sequence alignment?

Page 202: Dynamic Analysis - Simon Fraser University

202

So why not just...

● Traces can be viewed as sequences...
– Why not just do LCS-based sequence alignment?

def foo(int c):
    if c:
        while bar():
            ...

foo(...)
baz(...)
foo(...)

Page 203: Dynamic Analysis - Simon Fraser University

203

So why not just...

● Traces can be viewed as sequences....– Why not just do LCS based sequence alignment?

def foo(int c): if c: while bar(): ...

foo(...)baz(...)foo(...)


Page 204: Dynamic Analysis - Simon Fraser University

204

So why not just...

● Traces can be viewed as sequences....– Why not just do LCS based sequence alignment?

def foo(int c): if c: while bar(): ...

foo(...)baz(...)foo(...)


Page 205: Dynamic Analysis - Simon Fraser University

205

So why not just...

● Traces can be viewed as sequences....– Why not just do LCS based sequence alignment?

def foo(int c): if c: while bar(): ...

foo(...)baz(...)foo(...)


What is marked as different?

Page 206: Dynamic Analysis - Simon Fraser University

206


So why not just...

● Traces can be viewed as sequences....– Why not just do LCS based sequence alignment?

def foo(int c): if c: while bar(): ...

foo(...)baz(...)foo(...)


What is marked as different?

Page 207: Dynamic Analysis - Simon Fraser University

207


So why not just...

● Traces can be viewed as sequences....– Why not just do LCS based sequence alignment?

def foo(int c): if c: while bar(): ...

foo(...)baz(...)foo(...)


What is marked as different?

What is intuitively different?

Page 208: Dynamic Analysis - Simon Fraser University

208


So why not just...

● Traces can be viewed as sequences....– Why not just do LCS based sequence alignment?

def foo(int c): if c: while bar(): ...

foo(...)baz(...)foo(...)


What is marked as different?

What is intuitively different?

Page 209: Dynamic Analysis - Simon Fraser University

209


So why not just...

● Traces can be viewed as sequences...
– Why not just do LCS-based sequence alignment?

def foo(int c):
    if c:
        while bar():
            ...

foo(...)
baz(...)
foo(...)

What is marked as different?

What is intuitively different?

Execution comparison must account for what a program means and does!
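To make the pitfall concrete, here is a minimal sketch (not from the lecture) of LCS-based alignment applied to call traces; the traces are hypothetical. The alignment is purely syntactic: it matches calls event by event without knowing whether the matched calls took the same path or computed the same values, which is why it can disagree with what is intuitively different.

def lcs_align(t1, t2):
    # Classic dynamic-programming LCS over two event sequences.
    n, m = len(t1), len(t2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if t1[i] == t2[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Recover the matched pairs from the DP table.
    i = j = 0
    matches = []
    while i < n and j < m:
        if t1[i] == t2[j]:
            matches.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return matches

trace1 = ["foo()", "baz()", "foo()"]   # hypothetical run 1
trace2 = ["foo()", "baz()", "foo()"]   # hypothetical run 2
print(lcs_align(trace1, trace2))       # [(0, 0), (1, 1), (2, 2)]: nothing is marked
                                       # "different", even if foo() behaved differently inside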

Page 210: Dynamic Analysis - Simon Fraser University

210

The big picture

● Fundamentally, execution comparison needs to account for

Page 211: Dynamic Analysis - Simon Fraser University

211

The big picture

● Fundamentally, execution comparison needs to account for
– Structure – How is a program organized?

Page 212: Dynamic Analysis - Simon Fraser University

212

The big picture

● Fundamentally, execution comparison needs to account for– Structure – How is a program organized?– Value – What are the values in the different executions?

Page 213: Dynamic Analysis - Simon Fraser University

213

The big picture

● Fundamentally, execution comparison needs to account for
– Structure – How is a program organized?
– Value – What are the values in the different executions?
– Semantics – How is the meaning of the program affected by the differences?

Page 214: Dynamic Analysis - Simon Fraser University

214

The big picture

● Fundamentally, execution comparison needs to account for– Structure– Value– Semantics

● We can attack these through

Page 215: Dynamic Analysis - Simon Fraser University

215

The big picture

● Fundamentally, execution comparison needs to account for– Structure– Value– Semantics

● We can attack these through– Temporal alignment

● What parts of the trace correspond?

Page 216: Dynamic Analysis - Simon Fraser University

216

The big picture

● Fundamentally, execution comparison needs to account for– Structure– Value– Semantics

● We can attack these through– Temporal alignment

● What parts of the trace correspond?

– Spatial alignment● What variables & values correspond across traces?

Page 217: Dynamic Analysis - Simon Fraser University

217

The big picture

● Fundamentally, execution comparison needs to account for– Structure– Value– Semantics

● We can attack these through– Temporal alignment

● What parts of the trace correspond?

– Spatial alignment● What variables & values correspond across traces?

– Slicing● How do differences transitively flow through a program?

Page 218: Dynamic Analysis - Simon Fraser University

218

The big picture

● Fundamentally, execution comparison needs to account for
– Structure
– Value
– Semantics

● We can attack these through
– Temporal alignment
● What parts of the trace correspond?

– Spatial alignment
● What variables & values correspond across traces?

– Slicing
● How do differences transitively flow through a program?

– Causality testing
● Which differences actually induce different behavior?

Page 219: Dynamic Analysis - Simon Fraser University

219

Temporal Alignment

● Given i1 in T1 and i2 in T2,
– when should we say that they correspond? [Xin, PLDI 2008] [Sumner, ASE 2013]

– how can we compute such relations?


Page 220: Dynamic Analysis - Simon Fraser University

220

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 221: Dynamic Analysis - Simon Fraser University

221

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 222: Dynamic Analysis - Simon Fraser University

222

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 223: Dynamic Analysis - Simon Fraser University

223

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 224: Dynamic Analysis - Simon Fraser University

224

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 225: Dynamic Analysis - Simon Fraser University

225

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 226: Dynamic Analysis - Simon Fraser University

226

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 227: Dynamic Analysis - Simon Fraser University

227

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 228: Dynamic Analysis - Simon Fraser University

228

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]


Page 229: Dynamic Analysis - Simon Fraser University

229

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

Position along a path can be maintained via a counter

Page 230: Dynamic Analysis - Simon Fraser University

230

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

Position along a path can be maintained via a counter

Only need to increment along
1) back edges
2) function calls
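A minimal sketch (hypothetical, hand-instrumented code rather than real instrumentation) of that counter: one logical clock that is bumped only at loop back edges and at call sites, so two executions that follow the same path assign identical counter values to corresponding instructions.

class PathClock:
    # Single logical counter shared by all instrumentation points.
    def __init__(self):
        self.time = 0
    def tick(self):
        self.time += 1

clock = PathClock()

def bar(n):
    return n < 3

def foo(c):
    n = 0
    while True:
        clock.tick()            # instrumentation at the call site of bar()
        if not bar(n):
            break
        n += 1
        clock.tick()            # instrumentation on the loop back edge
    return n

clock.tick()                    # instrumentation at the call site of foo()
foo(True)
# (clock.time, static instruction id) names a point on the path; a second run
# that follows the same path reaches the same values at the same instructions.
print("final logical time:", clock.time)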

Page 231: Dynamic Analysis - Simon Fraser University

231

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?


Page 232: Dynamic Analysis - Simon Fraser University

232

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?


Page 233: Dynamic Analysis - Simon Fraser University

233

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

The position in the DAG relates the paths


Page 234: Dynamic Analysis - Simon Fraser University

234

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?


Page 235: Dynamic Analysis - Simon Fraser University

235

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?

We can unwind the loop to make it logically acyclic


Page 236: Dynamic Analysis - Simon Fraser University

236

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?


Page 237: Dynamic Analysis - Simon Fraser University

237

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?


Page 238: Dynamic Analysis - Simon Fraser University

238

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?


Page 239: Dynamic Analysis - Simon Fraser University

239

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?

These are different iterations of one loop. A counter for each active loop suffices (mostly).

Page 240: Dynamic Analysis - Simon Fraser University

240

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

How can we extend the acyclic case?

1 counter per active loop + the call stack disambiguates!

Page 241: Dynamic Analysis - Simon Fraser University

241

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

– Can we efficiently represent this?


Page 242: Dynamic Analysis - Simon Fraser University

242

Temporal Alignment

● Given i1 in T1 and i2 in T2,– when should we say that they correspond? [Xin, PLDI 2008][Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

– Can we efficiently represent this?

Call stack/context – Iteration stack – Instruction ID

Page 243: Dynamic Analysis - Simon Fraser University

243

Temporal Alignment

● Given i1 in T1 and i2 in T2,
– when should we say that they correspond? [Xin, PLDI 2008] [Sumner, ASE 2013]

– how can we compute such relations?

● In the simplest case T1 and T2 may follow the same path[Mellor-Crummey, ASPLOS 1989]

● Suppose that we know the programs are acyclic?

● Now consider the case where cycles can occur... [Sumner, ASE 2013]

– Can we efficiently represent this?

Call stack/context – can be encoded/inferred
Iteration stack – can be inferred
Instruction ID
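A minimal sketch (names are hypothetical) of the representation described above: an execution point is a calling context, one iteration counter per active loop, and a static instruction ID, and two instruction instances from different runs correspond exactly when all three components match.

from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionPoint:
    call_stack: tuple       # calling context, e.g. ("main", "foo")
    iteration_stack: tuple  # one counter per active loop, innermost last
    instruction_id: int     # static instruction ID

    def corresponds_to(self, other):
        # Temporal alignment: same context, same iterations, same instruction.
        return self == other

p1 = ExecutionPoint(("main", "foo"), (2,), 7)   # 3rd iteration of foo's loop, run 1
p2 = ExecutionPoint(("main", "foo"), (2,), 7)   # the aligned point in run 2
print(p1.corresponds_to(p2))                    # True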

Page 244: Dynamic Analysis - Simon Fraser University

244

Spatial Alignment

● We must also ask what it means to compare program state across executions

Page 245: Dynamic Analysis - Simon Fraser University

245

Spatial Alignment

● We must also ask what it means to compare program state across executions
– How can we compare two integers X and Y?

3 != 5

Page 246: Dynamic Analysis - Simon Fraser University

246

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?

0xdeadbeef in T1 = 0xcafef00d in T2?

Page 247: Dynamic Analysis - Simon Fraser University

247

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?

0xdeadbeef in T1 = 0xcafef00d in T2?

If you allocated other stuff in only one run, this can be true even without ASLR!

Page 248: Dynamic Analysis - Simon Fraser University

248

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

Page 249: Dynamic Analysis - Simon Fraser University

249

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs
– We need a way to identify corresponding nodes (state elements)

Page 250: Dynamic Analysis - Simon Fraser University

250

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


Page 251: Dynamic Analysis - Simon Fraser University

251

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?

Page 252: Dynamic Analysis - Simon Fraser University

252

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?
1) list.append(value++)
2) if c:
3)     list.append(value++)
4) list.append(value++)

Page 253: Dynamic Analysis - Simon Fraser University

253

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?1) list.append(value++)2) if c:3) list.append(value++)4) list.append(value++)


Page 254: Dynamic Analysis - Simon Fraser University

254

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?1) list.append(value++)2) if c:3) list.append(value++)4) list.append(value++)


Page 255: Dynamic Analysis - Simon Fraser University

255

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?1) list.append(value++)2) if c:3) list.append(value++)4) list.append(value++)


Page 256: Dynamic Analysis - Simon Fraser University

256

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?1) list.append(value++)2) if c:3) list.append(value++)4) list.append(value++)


Page 257: Dynamic Analysis - Simon Fraser University

257

Spatial Alignment

● We must also ask what it means to compare program state across executions– How can we compare two integers X and Y?– How can we compare two pointers A and B?– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs– We need a way to identify corresponding nodes (state elements)


What are the differences?
1) list.append(value++)
2) if c:
3)     list.append(value++)
4) list.append(value++)


Page 258: Dynamic Analysis - Simon Fraser University

258

Spatial Alignment

● We must also ask what it means to compare program state across executions
– How can we compare two integers X and Y?
– How can we compare two pointers A and B?
– How can we compare allocated regions on the heap?

Should they even be compared?!?

● In practice, comparing state across executions requires comparing memory graphs
– We need a way to identify corresponding nodes (state elements)

● Again, the semantics of the program dictate the solution– Identify heap allocations by the aligned time of allocation

T1: A (from line 1)   B (from line 3)   C (from line 4)
T2: A (from line 1)   B (from line 4)
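A minimal sketch (hypothetical helper) of that idea: instead of comparing raw addresses, give every allocation a run-independent label. Here the label is (allocation site, count of allocations at that site so far), a simple stand-in for the aligned time of allocation the slide refers to; nodes with equal labels in the two runs are treated as the same memory-graph node.

from collections import defaultdict

class AllocTracker:
    # Labels each allocation with (allocation site, per-site allocation count).
    def __init__(self):
        self.counts = defaultdict(int)
        self.labels = {}
    def allocate(self, site, obj):
        self.counts[site] += 1
        self.labels[id(obj)] = (site, self.counts[site])
        return obj
    def label(self, obj):
        return self.labels[id(obj)]

def run(c, tracker):
    lst = []
    lst.append(tracker.allocate("line 1", object()))
    if c:
        lst.append(tracker.allocate("line 3", object()))
    lst.append(tracker.allocate("line 4", object()))
    return [tracker.label(o) for o in lst]

print(run(True, AllocTracker()))   # [('line 1', 1), ('line 3', 1), ('line 4', 1)]
print(run(False, AllocTracker()))  # [('line 1', 1), ('line 4', 1)]
# Comparing labels rather than addresses tells us which heap nodes correspond.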

Page 259: Dynamic Analysis - Simon Fraser University

259

Dual Slicing

● Now we can
– Identify corresponding times across executions
– Identify & compare corresponding state at those times

Page 260: Dynamic Analysis - Simon Fraser University

260

Dual Slicing

● Now we can– Identify corresponding times across executions– Identify & compare corresponding state at those times

● We can use these to enhance dynamic slicing by being aware of differences! (called dual slicing)

Page 261: Dynamic Analysis - Simon Fraser University

261

Dual Slicing

● Now we can– Identify corresponding times across executions– Identify & compare corresponding state at those times

● We can use these to enhance dynamic slicing by being aware of differences! (called dual slicing)
– Based on classic dynamic slicing
– Include transitive dependencies that differ or exist in only 1 execution

Page 262: Dynamic Analysis - Simon Fraser University

262

Dual Slicing

● Now we can– Identify corresponding times across executions– Identify & compare corresponding state at those times

● We can use these to enhance dynamic slicing by being aware of differences! (called dual slicing)– Based on classic dynamic slicing– Include transitive dependencies that differ or exist in only 1 execution

Run 1:             Run 2:
1) x ← 1           1) x ← 0
2) y ← 1           2) y ← 1
3) print(x+y)      3) print(x+y)

Page 263: Dynamic Analysis - Simon Fraser University

263

Dual Slicing

● Now we can– Identify corresponding times across executions– Identify & compare corresponding state at those times

● We can use these to enhance dynamic slicing by being aware of differences! (called dual slicing)– Based on classic dynamic slicing– Include transitive dependencies that differ or exist in only 1 execution

Run 1:             Run 2:
1) x ← 1           1) x ← 0
2) y ← 1           2) y ← 1
3) print(x+y)      3) print(x+y)

Page 264: Dynamic Analysis - Simon Fraser University

264

Dual Slicing

● Now we can– Identify corresponding times across executions– Identify & compare corresponding state at those times

● We can use these to enhance dynamic slicing by being aware of differences! (called dual slicing)
– Based on classic dynamic slicing
– Include transitive dependencies that differ or exist in only 1 execution

Run 1:             Run 2:
1) x ← 1           1) x ← 0
2) y ← 1           2) y ← 1
3) print(x+y)      3) print(x+y)
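A minimal sketch (hypothetical data structures) of dual slicing on the example above: slice backward from the aligned output in both runs, but only follow dependences whose source value differs between the runs or that exist in only one run, so the unchanged y is pruned while the differing x is kept.

# Dynamic dependence info per run: statement -> (value produced, statements used).
run_a = {1: (1, []), 2: (1, []), 3: (2, [1, 2])}   # x ← 1, y ← 1, print(x+y)
run_b = {1: (0, []), 2: (1, []), 3: (1, [1, 2])}   # x ← 0, y ← 1, print(x+y)

def dual_slice(r1, r2, start):
    slice_, work = set(), [start]
    while work:
        s = work.pop()
        if s in slice_:
            continue
        slice_.add(s)
        deps1 = set(r1[s][1]) if s in r1 else set()
        deps2 = set(r2[s][1]) if s in r2 else set()
        for d in deps1 | deps2:
            v1 = r1[d][0] if d in r1 else None
            v2 = r2[d][0] if d in r2 else None
            # Keep dependences that differ in value or exist in only one run.
            if v1 != v2 or (d in deps1) != (d in deps2):
                work.append(d)
    return sorted(slice_)

print(dual_slice(run_a, run_b, 3))   # [1, 3]: x is kept, y is pruned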

Page 265: Dynamic Analysis - Simon Fraser University

265

Dual Slicing

● The differences in dependencies capture multiple kinds of information

Page 266: Dynamic Analysis - Simon Fraser University

266

Dual Slicing

● The differences in dependencies capture multiple kinds of information
– Value-only differences


Page 267: Dynamic Analysis - Simon Fraser University

267

Dual Slicing

● The differences in dependencies capture multiple kinds of information– Value-only differences– Provenance/Source differences


Page 268: Dynamic Analysis - Simon Fraser University

268

Dual Slicing

● The differences in dependencies capture multiple kinds of information– Value-only differences– Provenance/Source differences– Missing/Extra behavior


Page 269: Dynamic Analysis - Simon Fraser University

269

Dual Slicing

● The differences in dependencies capture multiple kinds of information
– Value-only differences
– Provenance/Source differences
– Missing/Extra behavior

● Recall: Dynamic slicing could not handle execution omission, but dual slicing can!

Page 270: Dynamic Analysis - Simon Fraser University

270

Dual Slicing

● The differences in dependencies capture multiple kinds of information
– Value-only differences
– Provenance/Source differences
– Missing/Extra behavior

● Recall: Dynamic slicing could not handle execution omission, but dual slicing can!

● Dual slices can be effective for concurrent debugging & exploit analysis[Weeratunge, ISSTA 2010][Johnson, S&P 2011]
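A minimal sketch (the statement IDs and values are made up) of why omission is no longer a blind spot: when the z-update executes in only one run, comparing the two dependence sets exposes it directly, even though a plain dynamic slice of the buggy run has no edge to follow to it.

# statement -> (value, dependences used), control dependences included.
correct = {1: (7, []), 2: (True,  [1]), 3: (1, [2]), 4: (1, [3])}  # z updated at 3
buggy   = {1: (2, []), 2: (False, [1]),              4: (0, [2])}  # update of z skipped

only_in_one_run = set(correct) ^ set(buggy)
print(only_in_one_run)   # {3}: the skipped update of z shows up as a difference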

Page 271: Dynamic Analysis - Simon Fraser University

271

Adding Causation

● Now we can produce explanations exactly like our example!
– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

Correct Buggy

Page 272: Dynamic Analysis - Simon Fraser University

272

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

Page 273: Dynamic Analysis - Simon Fraser University

273

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...
2) y = ...
3) if x + y > 0:
4)     z = 0
5) else:
6)     z = 1
7) print(z)

Page 274: Dynamic Analysis - Simon Fraser University

274

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)

Correct

x = 10y = -1Truez = 0

“0”

Page 275: Dynamic Analysis - Simon Fraser University

275

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)

Correct: x = 10, y = -1, True,  z = 0, prints “0”
Buggy:   x = 0,  y = -2, False, z = 1, prints “1”

Page 276: Dynamic Analysis - Simon Fraser University

276

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Page 277: Dynamic Analysis - Simon Fraser University

277

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Page 278: Dynamic Analysis - Simon Fraser University

278

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Page 279: Dynamic Analysis - Simon Fraser University

279

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Page 280: Dynamic Analysis - Simon Fraser University

280

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Page 281: Dynamic Analysis - Simon Fraser University

281

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Dual slicing captures differences, not causes.

What does that mean here?

Page 282: Dynamic Analysis - Simon Fraser University

282

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...2) y = ...3) if x + y > 0:4) z = 05) else:6) z = 17) print(z)


Correct

x = 10 x = 0y = -1 y = -2True Falsez = 0

z = 1“0” “1”

Buggy

Page 283: Dynamic Analysis - Simon Fraser University

283

Adding Causation

● Now we can produce explanations exactly like our example!– Can answer “Why” and “Why not” questions about behavior & differences

[Ko, ICSE 2008]

– But they may still contain extra information/noise...

1) x = ...
2) y = ...
3) if x + y > 0:
4)     z = 0
5) else:
6)     z = 1
7) print(z)


Correct: x = 10, y = -1, True,  z = 0, prints “0”
Buggy:   x = 0,  y = -2, False, z = 1, prints “1”

The cost of these extra edges is high in practice! All transitive dependencies...

Page 284: Dynamic Analysis - Simon Fraser University

284

Adding Causation

● So what would we want an explanation to contain?
– This is an area with unsolved problems & open research

Page 285: Dynamic Analysis - Simon Fraser University

285

Adding Causation

● So what would we want an explanation to contain?
– This is an area with unsolved problems & open research
– What does it mean for one explanation to be better than another?


Page 286: Dynamic Analysis - Simon Fraser University

286

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider
– In general, simpler explanations are preferred

Page 287: Dynamic Analysis - Simon Fraser University

287

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?


Page 288: Dynamic Analysis - Simon Fraser University

288

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?


Page 289: Dynamic Analysis - Simon Fraser University

289

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?


What big challenges do you see with these 2 approaches?

Page 290: Dynamic Analysis - Simon Fraser University

290

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider
– In general, simpler explanations are preferred
– Minimize the “# steps”?
– Minimize the “# dependencies”?
– Minimize the “# local dependencies”?


Page 291: Dynamic Analysis - Simon Fraser University

291

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?


$\operatorname{argmin}_{s \in sd_i} |sd| \;\wedge\; E_1[\,sd(E_2)_i\,] \rightarrow sd(E_2)_{i+1} \;\wedge\; E_2[\,sd(E_1)_i\,] \rightarrow sd(E_1)_{i+1}$

Page 292: Dynamic Analysis - Simon Fraser University

292

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?


$\operatorname{argmin}_{s \in sd_i} |sd| \;\wedge\; E_1[\,sd(E_2)_i\,] \rightarrow sd(E_2)_{i+1} \;\wedge\; E_2[\,sd(E_1)_i\,] \rightarrow sd(E_1)_{i+1}$

Page 293: Dynamic Analysis - Simon Fraser University

293

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?


Page 294: Dynamic Analysis - Simon Fraser University

294

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?


Page 295: Dynamic Analysis - Simon Fraser University

295

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?


Page 296: Dynamic Analysis - Simon Fraser University

296

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?


Page 297: Dynamic Analysis - Simon Fraser University

297

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider– In general, simpler explanations are preferred– Minimize the “# steps”?– Minimize the “# dependencies”?– Minimize the “# local dependencies”?

● There are currently unknown trade-offs between tractability, intuitiveness, and correctness


Page 298: Dynamic Analysis - Simon Fraser University

298

Adding Causation

● So what would we want an explanation to contain?– This is an area with unsolved problems & open research– What does it mean for one explanation to be better than another?

● There are several things we could consider
– In general, simpler explanations are preferred
– Minimize the “# steps”?
– Minimize the “# dependencies”?
– Minimize the “# local dependencies”?

● There are currently unknown trade-offs between tractability, intuitiveness, and correctness


Even local blame is actually challenging

Page 299: Dynamic Analysis - Simon Fraser University

299

Adding Causation

● Causation is often framed via “alternate worlds” & “what if” questions...
– We can answer these causality questions by running experiments!

Page 300: Dynamic Analysis - Simon Fraser University

What Should We Blame?


Trial

Page 301: Dynamic Analysis - Simon Fraser University

What Should We Blame?


Trial

Page 302: Dynamic Analysis - Simon Fraser University

What Should We Blame?

x = 5, y = 4, z = 3        x = 5, y = 4, z = 1

Trial: substitute y = 4 from one run into the other

Page 303: Dynamic Analysis - Simon Fraser University

What Should We Blame?

x = 5y = 4z = 3

x = 5y = 4z = 1

y = 4

?

Trial

Page 304: Dynamic Analysis - Simon Fraser University

What Should We Blame?

x = 5, y = 4, z = 3        x = 5, y = 4, z = 1

Trial: substitute y = 4 from one run into the other

What does this patched run even mean?
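A minimal sketch (hypothetical harness, reusing the x/y/z example from the earlier slides) of the experiment being drawn here: re-run the program, but force one variable to the value it had in the other run, and check whether the behavioral difference follows.

def program(x, y, z, patch=None):
    # Toy stand-in for the program under study; `patch` is a hypothetical
    # hook that overwrites part of the state at the aligned point.
    state = {"x": x, "y": y, "z": z}
    if patch:
        state.update(patch)
    if state["x"] + state["y"] > 0:
        state["z"] = 0
    else:
        state["z"] = 1
    return state["z"]

correct = program(10, -1, 3)                 # behaves correctly: 0
buggy   = program(0, -2, 3)                  # misbehaves: 1
trial   = program(0, -2, 3, patch={"y": -1}) # buggy run, y taken from the correct run
print(correct, buggy, trial)
# If the trial still misbehaves, y alone does not explain the difference;
# if it flips to the correct behavior, y is (part of) a cause.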

Page 305: Dynamic Analysis - Simon Fraser University

Example – Altered Meaning

1) x ← input()
2) y ← input()
3) z ← input()
4) if y+z > 10:
5)     y ← 5
6) else: y ← y+1
7) print(y)

Correct: x ← 0, y ← 7, z ← 3, if False: … else: y ← 8, print(8)
Buggy:   x ← 1, y ← 3, z ← 6, if False: … else: y ← 4, print(4)

Page 306: Dynamic Analysis - Simon Fraser University

Example – Altered Meaning

What should we blame here?

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy

Page 307: Dynamic Analysis - Simon Fraser University

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

Example – Altered Meaning

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy Trial

Page 308: Dynamic Analysis - Simon Fraser University

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

x ← 0y ← 7z ← 3

Example – Altered Meaning

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy Trial

Page 309: Dynamic Analysis - Simon Fraser University

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

x ← 0y ← 7z ← 3if False:

else: y←8print(8)

Example – Altered Meaning

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy Trial

Page 310: Dynamic Analysis - Simon Fraser University

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

y ← 7

Example – Altered Meaning

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy Trial

Page 311: Dynamic Analysis - Simon Fraser University

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

x ← 1y ← 7z ← 6

Example – Altered Meaning

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy Trial

Page 312: Dynamic Analysis - Simon Fraser University

x ← 0y ← 7z ← 3if False:

else: y ← 8print(8)

x ← 1y ← 7z ← 6if True: y ← 5

print(5)

Example – Altered Meaning

1)x ← input()2)y ← input()3)z ← input()4)if y+z > 10:5) y ← 56)else: y ← y+17)print(y)

x ← 1y ← 3z ← 6if False:

else: y ← 4print(4)

CorrectBuggy Trial

Page 313: Dynamic Analysis - Simon Fraser University

Example – Altered Meaning

1) x ← input()
2) y ← input()
3) z ← input()
4) if y+z > 10:
5)     y ← 5
6) else: y ← y+1
7) print(y)

● New control flow unlike the original runs

● Occurs in a large portion of real bugs

Correct: x ← 0, y ← 7, z ← 3, if False: … else: y ← 8, print(8)
Buggy:   x ← 1, y ← 3, z ← 6, if False: … else: y ← 4, print(4)
Trial:   x ← 1, y ← 7, z ← 6, if True: y ← 5, print(5)

Page 314: Dynamic Analysis - Simon Fraser University

Dual Slicing

1) x ← input()
2) y ← input()
3) z ← input()
4) if y+z > 10:
5)     y ← 5
6) else: y ← y+1
7) print(y)

Extract:
2) y ← input()
6) y ← y+1
7) print(y)

Correct: x ← 0, y ← 7, z ← 3, if False: … else: y ← 8, print(8)
Buggy:   x ← 1, y ← 3, z ← 6, if False: … else: y ← 4, print(4)

Page 315: Dynamic Analysis - Simon Fraser University

Example – Extracted Meaning

2) y ← input()
6) y ← y+1
7) print(y)

Correct: y ← 7, y ← 8, print(8)
Buggy:   y ← 3, y ← 4, print(4)

Page 316: Dynamic Analysis - Simon Fraser University

y ← 7 y ← 7y ← 4print(4)

Example – Extracted Meaning

y ← 7y ← 8print(8)

2)y ← input()6)y ← y+17)print(y)

CorrectBuggy Trial

Page 317: Dynamic Analysis - Simon Fraser University

Example – Extracted Meaning

2) y ← input()
6) y ← y+1
7) print(y)

Correct: y ← 7, y ← 8, print(8)
Buggy:   y ← 3, y ← 4, print(4)
Trial (buggy run with y taken from the correct run): y ← 7, y ← 8, print(8)

Trial can now correctly blame y
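A minimal sketch (hypothetical) of the extracted trial: only the statements in the dual slice (2, 6, 7) are kept, so substituting y's input from the correct run can no longer be distorted by the altered control flow of the full program.

# Reduced program extracted from the dual slice: 2) y ← input(), 6) y ← y+1, 7) print(y)
def extracted(y_input):
    y = y_input      # statement 2
    y = y + 1        # statement 6
    return y         # statement 7 (the print)

correct = extracted(7)   # 8, as in the correct run
buggy   = extracted(3)   # 4, as in the buggy run
trial   = extracted(7)   # the buggy run, but with y's input taken from the correct run
print(correct, buggy, trial)   # 8 4 8 – the trial matches the correct run, so blame y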

Page 318: Dynamic Analysis - Simon Fraser University

318

Adding Causation

● Causation is often framed via “alternate worlds” & “what if” questions...– We can answer these causality questions by running experiments!

● We perform these causality tests in both directions in order to collect symmetric information

Page 319: Dynamic Analysis - Simon Fraser University

319

Adding Causation

● Causation is often framed via “alternate worlds” & “what if” questions...– We can answer these causality questions by running experiments!

● We perform these causality tests in both directions in order to collect symmetric information
– How did the buggy run behave differently than the correct one?
– How did the correct run behave differently than the buggy one?
– These questions do not have the same answer!

Page 320: Dynamic Analysis - Simon Fraser University

320

Adding Causation

● Causation is often framed via “alternate worlds” & “what if” questions...– We can answer these causality questions by running experiments!

● We perform these causality tests in both directions in order to collect symmetric information– How did the buggy run behave differently than the correct one?– How did the correct run behave differently than the buggy one?– These questions do not have the same answer!

● In practice, there are additional issues, and even defining causation in this context needs further research.

Page 321: Dynamic Analysis - Simon Fraser University

321

Adding Causation

● Causation is often framed via “alternate worlds” & “what if” questions...
– We can answer these causality questions by running experiments!

● We perform these causality tests in both directions in order to collect symmetric information
– How did the buggy run behave differently than the correct one?
– How did the correct run behave differently than the buggy one?
– These questions do not have the same answer!

● In practice, there are additional issues, and even defining causation in this context needs further research.
– Did we want to blame only y in the example?
– Pruning blame on y is necessary in many real cases – can such pruning be refined?
– Can it be done without execution? With a stronger statistical basis?

Page 322: Dynamic Analysis - Simon Fraser University

Summing Up

Page 323: Dynamic Analysis - Simon Fraser University

323

Key Challenges

● Identifying the information you care about
– Dynamic dependence? Valid memory? Just the execution outcome?

Page 324: Dynamic Analysis - Simon Fraser University

324

Key Challenges

● Identifying the information you care about
– Dynamic dependence? Valid memory? Just the execution outcome?

● Collecting that information efficiently
– abstraction, encoding, compression, sampling, ...
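One way to read the “collect efficiently” bullet is shown in this minimal sketch (hypothetical profiler, not from the slides): sample events instead of recording all of them, and keep only an encoded summary (per-site counts) rather than a full trace.

import random
from collections import Counter

class SampledProfiler:
    # Records roughly 1 out of every `period` events, as counts per site.
    def __init__(self, period=100, seed=0):
        self.period = period
        self.counts = Counter()
        self.rng = random.Random(seed)
    def event(self, site):
        if self.rng.randrange(self.period) == 0:
            self.counts[site] += 1

prof = SampledProfiler(period=10)
for i in range(10_000):
    prof.event("loop body")          # hot site
    if i % 100 == 0:
        prof.event("rare branch")    # cold site
print(prof.counts)   # rough relative frequencies at a fraction of the recording cost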

Page 325: Dynamic Analysis - Simon Fraser University

325

Key Challenges

● Identifying the information you care about
– Dynamic dependence? Valid memory? Just the execution outcome?

● Collecting that information efficiently
– abstraction, encoding, compression, sampling, ...

● Selecting which executions to analyze
– Existing test suite? Always-on runtime? Directed test generation?
– How does underapproximation affect your conclusions?
– Can you still achieve your objective in spite of it?

Page 326: Dynamic Analysis - Simon Fraser University

326

Key Challenges

● Identifying the information you care about
– Dynamic dependence? Valid memory? Just the execution outcome?

● Collecting that information efficiently
– abstraction, encoding, compression, sampling, ...

● Selecting which executions to analyze
– Existing test suite? Always-on runtime? Directed test generation?
– How does underapproximation affect your conclusions?
– Can you still achieve your objective in spite of it?

● Doing some of the work ahead of time
– What can you precompute to improve all of the above?

Page 327: Dynamic Analysis - Simon Fraser University

327

Summary

● Analyze the actual/observed behaviors of a program

Page 328: Dynamic Analysis - Simon Fraser University

328

Summary

● Analyze the actual/observed behaviors of a program

● Modify or use the program’s behavior to collect information

Page 329: Dynamic Analysis - Simon Fraser University

329

Summary

● Analyze the actual/observed behaviors of a program

● Modify or use the program’s behavior to collect information

● Analyze the information online or offline

Page 330: Dynamic Analysis - Simon Fraser University

330

Summary

● Analyze the actual/observed behaviors of a program

● Modify or use the program’s behavior to collect information

● Analyze the information online or offline

● The precise configuration must be tailored to the objectives & insights
– Compiled vs DBI
– Online vs Postmortem
– Compressed, Encoded, Samples, ...
– ...