Safe and Efficient Instrumentation

Paradyn Project. Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2010. Andrew Bernat.


Page 1: Safe and Efficient Instrumentation

Paradyn Project

Paradyn / Dyninst Week
Madison, Wisconsin
April 12-14, 2010

Safe and Efficient Instrumentation

Andrew Bernat

Page 2: Safe and Efficient Instrumentation

Binary Instrumentation

• Instrumentation modifies the original code
    • Moves original code
    • Allocates new memory
    • Overwrites original code

• This affects the behavior of:
    • Moved code
    • Code that references moved code
    • Code that references changed memory

Page 3: Safe and Efficient Instrumentation

Sensitivity Models

• A program is sensitive to a particular modification if that modification changes the program’s behavior

• Current binary instrumenters rely on fixed sensitivity models

• Compensating for sensitivity imposes overhead

Example (compensating for a moved ret):

Original:
    ret

Emulated:
    pop %eax
    call addr_translate
    jmp %eax

Page 4: Safe and Efficient Instrumentation

Efficiency vs Sensitivity

[Figure: instrumenters placed by efficiency against the sensitivity they handle. Sensitivity axis: Conventional Code, Optimized Code, Malware. Tools shown: Pin, Valgrind, …; Dyninst; the Safe and Efficient Approach.]

Page 5: Safe and Efficient Instrumentation

How do we do this?

• Formal model of code relocation
    • Visible behavior
    • Instruction sensitivity
    • External sensitivity

• Implementation in Dyninst
    • Analysis phase
    • Transformation phase

• Performance Results

Page 6: Safe and Efficient Instrumentation

Three Questions

• What program behavior do we wish to preserve?

• How does modification affect instructions?

• How do instructions change program behavior?


Page 7: Safe and Efficient Instrumentation

Approach

• Preserve visible behavior
    • Relationship of input to output

• Identify sensitive instructions
    • Those whose behavior is changed

• Emulate only externally sensitive instructions
    • Those whose sensitivity affects visible behavior

Page 8: Safe and Efficient Instrumentation

Visible Behavior

• Intuition: we can change anything that does not affect the output of the program

• Formalization: in terms of denotational semantics
    • Briefly: two programs P, P’ are equivalent if:

Page 9: Safe and Efficient Instrumentation

Visibly Equivalent Programs

[Figure: the Original Binary maps input X to output Y. The Instrumented Binary maps input X + A to output Y + B, where A is the Instrumentation Input and B is the Instrumentation Output.]
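The relationship in the figure can be sketched as a small executable check. This is a toy model only: `original`, `instrumented`, and `visibly_equivalent` are hypothetical names, the "binaries" are plain Python functions, and the check covers only the inputs it is given rather than proving equivalence.

```python
def original(x):
    """Toy 'original binary': maps program input X to program output Y."""
    return [v * 2 for v in x]


def instrumented(x, a):
    """Toy 'instrumented binary': same computation plus a counter.

    `a` stands in for the instrumentation input A (here: is counting enabled).
    Returns (Y, B): the program output and the instrumentation output.
    """
    count = 0
    out = []
    for v in x:
        if a:
            count += 1  # instrumentation work, invisible to the program output
        out.append(v * 2)
    return out, count


def visibly_equivalent(inputs, a):
    """Check, on the given inputs only, that the program output Y is unchanged."""
    return all(instrumented(x, a)[0] == original(x) for x in inputs)
```

For example, `instrumented([1, 2, 3], True)` yields the same program output as `original([1, 2, 3])` together with a separate instrumentation output (the count).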

Page 10: Safe and Efficient Instrumentation

Sensitivity

• What does instrumentation change?
    • Addresses of instructions
    • Contents of memory
    • Shape of the address space

• Sensitive instructions are directly affected
    • Access the PC (and are moved)
    • Read modified memory
    • Test allocated memory

Page 11: Safe and Efficient Instrumentation

Sensitivity Examples

Call/Return pair:

main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret

worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Jumptable:

jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx

get_pc_thunk:
    mov (%esp), %ebx
    ret

Self-Unpacking Code (Simplified):

protect:
    call initialize
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)

Page 12: Safe and Efficient Instrumentation

External Sensitivity

• An instruction is externally sensitive if it causes a visible change in behavior
    • Approximation: or changes control flow

• This requires:
    • The sensitive instruction must produce different values
    • These differences must reach an instruction that affects output (or control flow)
    • … and change its behavior
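The second requirement (differences must reach an instruction that affects output or control flow) can be sketched as a reachability check over a def-use graph. The graph below is a hand-written, hypothetical encoding of the jump table example, and `forward_slice` and `may_be_externally_sensitive` are illustrative names, not Dyninst APIs. The check is deliberately conservative: reaching such an instruction is necessary but not sufficient, since the instruction's behavior must also actually change.

```python
from collections import deque

# Hypothetical def-use graph: each instruction maps to the instructions
# that consume a value it defines.
DEF_USE = {
    "call get_pc_thunk": ["mov (%esp), %ebx"],
    "mov (%esp), %ebx": ["add $(offset), %ebx"],
    "add $(offset), %ebx": ["mov (%ebx, %eax, 4), %ecx"],
    "mov (%ebx, %eax, 4), %ecx": ["jmp *%ecx"],
    "jmp *%ecx": [],
}

# Instructions assumed to affect output or control flow.
SINKS = {"jmp *%ecx"}


def forward_slice(graph, start):
    """Return every instruction reachable from `start`, in breadth-first order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        insn = queue.popleft()
        for user in graph.get(insn, []):
            if user not in seen:
                seen.add(user)
                order.append(user)
                queue.append(user)
    return order


def may_be_externally_sensitive(graph, start, sinks):
    """True if the forward slice from `start` reaches any sink instruction."""
    return any(insn in sinks for insn in forward_slice(graph, start))
```

An instruction whose slice never reaches a sink can be left unmodified under this approximation; one that does reach a sink still needs the further analysis described in the third bullet.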

Page 13: Safe and Efficient Instrumentation

Program Modification

[Figure: program modification pipeline. Labels: Original Binary (Code); Analysis; Compensation; Modified Binary (Code, Relocated Code).]

Page 14: Safe and Efficient Instrumentation

Analysis Phase

• Identify sensitive instructions
    • InstructionAPI: used and defined sets

• Determine affected instructions
    • DepGraphAPI: forward slice

• Analyze effects of modification
    • SymEval: symbolic expansion of the slice
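As a rough illustration of the last step, the sketch below symbolically expands a straight-line slice into string expressions. It is a stand-in written for this transcript, not the SymEval interface: the `expand` function, the three operation names, and the encoding of the jump table slice are all assumptions.

```python
def expand(slice_ops):
    """Symbolically evaluate a straight-line slice.

    Each entry is (destination, operation, arguments). Arguments that have
    no known value yet are treated as free symbols.
    """
    env = {}
    for dest, op, args in slice_ops:
        vals = [env.get(a, a) for a in args]
        if op == "copy":
            env[dest] = vals[0]
        elif op == "add":
            env[dest] = f"({vals[0]} + {vals[1]})"
        elif op == "load":
            env[dest] = f"mem[{vals[0]} + 4*{vals[1]}]"
        else:
            raise ValueError(f"unknown operation: {op}")
    return env


# Hypothetical encoding of the jump table slice.
JUMPTABLE_SLICE = [
    ("top_of_stack", "copy", ["ret_addr"]),  # call get_pc_thunk pushes the return address
    ("ebx", "copy", ["top_of_stack"]),       # mov (%esp), %ebx
    ("ebx", "add", ["ebx", "0x42"]),         # add $0x42, %ebx
    ("ecx", "load", ["ebx", "eax"]),         # mov (%ebx, %eax, 4), %ecx
]
```

Expanding `JUMPTABLE_SLICE` leaves `ecx` as an expression over `ret_addr`, which makes explicit that the indirect jump target depends on the return address pushed by the moved call.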

Page 15: Safe and Efficient Instrumentation

Analysis Example: Call/Return Pair

Call/Return pair:

main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret

worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Sensitivity: call (moved, uses PC)

Slice:
    call
    ret

Symbolic Expansion:
    call:
    ret:

Page 16: Safe and Efficient Instrumentation

Analysis Example: Jumptable

Jumptable:

jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx

get_pc_thunk:
    mov (%esp), %ebx
    ret

Sensitivity: call (moved, uses PC)

Slice:
    call
    mov (%esp), %ebx
    add $0x42, %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx

Symbolic Expansion:
    call:
    ret:
    jmp:

Page 17: Safe and Efficient Instrumentation

Analysis Example: Unpacking Code

Self-Unpacking Code (Simplified):

protect:
    call initialize
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)

Sensitivity: call (moved, uses PC)

Slice:
    call initialize
    pop %esi
    mov (%esi, %ebx, 4), %eax
    call unpack
    …

Symbolic Expansion:
    call:
    pop:
    mov:

Page 18: Safe and Efficient Instrumentation

Compensation Phase

• Generates the relocated code

• Two approaches:
    • Instruction transformation
    • Group transformation

Page 19: Safe and Efficient Instrumentation

Instruction Transformation

• Emulate each externally sensitive instruction
    • Replace some instructions (e.g., calls) with sequences

• Straightforward to implement

• Some sequences impose high overhead
    • e.g., address translation

Example:

Original:
    ret

Emulated:
    pop %eax
    call addr_translate
    jmp %eax
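The emulated return can be modelled in a few lines. This is a sketch under stated assumptions: `ADDR_MAP` is a made-up table from original to relocated addresses, and `addr_translate` here is a plain dictionary lookup standing in for whatever the real translation routine does. It assumes the stack holds original return addresses, so every return pays for a lookup, which is the overhead the slide refers to.

```python
# Hypothetical mapping from original code addresses to relocated addresses.
ADDR_MAP = {
    0x8048100: 0x9000040,
    0x8048200: 0x9000110,
}


def addr_translate(addr):
    """Return the relocated address, or `addr` itself if that code was not moved."""
    return ADDR_MAP.get(addr, addr)


def emulated_ret(stack):
    """Model of: pop %eax; call addr_translate; jmp %eax.

    Pops the original return address from `stack` and returns the address
    at which execution continues.
    """
    eax = stack.pop()          # pop %eax
    eax = addr_translate(eax)  # call addr_translate
    return eax                 # jmp %eax
```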

Page 20: Safe and Efficient Instrumentation

Group Transformation

• Emulate the behavior of a group of instructions
    • Motivating example: thunks

• Open questions:
    • Which instructions are included in the group?
    • How is the replacement sequence determined?
    • Current status: hand-crafted templates

Page 21: Safe and Efficient Instrumentation

Transformation: Call/Return Pair

Original Code:

main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret

worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Relocated Code (identical to the original):

main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret

worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Page 22: Safe and Efficient Instrumentation

Transformation: Jumptable

Original Code:

jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx

get_pc_thunk:
    mov (%esp), %ebx
    ret

Relocated Code:

jumptable:
    push %ebp
    mov %esp, %ebp
    mov $(orig_ret_addr), %ebx
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
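A small model shows why the replacement preserves the jump table base. The addresses and the value of `OFFSET` below are invented for illustration, and the 5-byte call length assumes a 32-bit near call with a 32-bit relative displacement.

```python
OFFSET = 0x42              # stands in for $(offset)
ORIG_CALL = 0x8048106      # hypothetical address of the call in the original code
RELOC_CALL = 0x9000020     # hypothetical address of the call if moved unchanged
CALL_LEN = 5               # assumed length of the call instruction


def table_base_via_thunk(call_addr):
    """Model of: call get_pc_thunk; mov (%esp), %ebx; ret; add $(offset), %ebx."""
    ret_addr = call_addr + CALL_LEN  # return address pushed by the call
    ebx = ret_addr                   # the thunk copies it from the stack
    return ebx + OFFSET


def table_base_relocated(orig_ret_addr):
    """Model of: mov $(orig_ret_addr), %ebx; add $(offset), %ebx."""
    return orig_ret_addr + OFFSET


original_base = table_base_via_thunk(ORIG_CALL)
naive_base = table_base_via_thunk(RELOC_CALL)
compensated_base = table_base_relocated(ORIG_CALL + CALL_LEN)
```

Moving the thunk call unchanged yields a different base (`naive_base`), while loading the original return address as a constant reproduces `original_base` without any call, return, or address translation.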

Page 23: Safe and Efficient Instrumentation

Transformation: Unpacking Code

Original Code:

protect:
    call initialize
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpack_base)

Relocated Code:

protect:
    jmp initialize
    …
initialize:
    mov $(orig_addr), %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp $0x42, %ebx
    jnz loop_top
    jmp $(unpacked_base)

Page 24: Safe and Efficient Instrumentation

Results

Percentage of PC-Sensitive Instructions (32-bit, GCC, static analysis)

Type of Binary        % PC Sensitive    % Externally Sensitive    % Unanalyzable
Executable (a.out)    9.0%              1.1%                      6.6%
Library (.so)         7.9%              6.9%                      9.1%

Instrumentation Overhead (go, 32-bit, 12.3s base time)

                          Dyninst          S&E (no memory)    S&E (memory)
go (uninstrumented)       21.3s (73.2%)    12.4s (0.8%)       15.0s (22.0%)
go (basic block count)    23.4s (90.2%)    16.3s (32.5%)      19.5s (58.5%)
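The percentages in the overhead table are consistent with overhead computed relative to the 12.3s base time. A quick check, where `overhead_pct` is a helper name introduced here and the times are copied from the table:

```python
BASE_TIME = 12.3  # seconds, from the table caption


def overhead_pct(run_time, base_time=BASE_TIME):
    """Overhead of `run_time` relative to `base_time`, in percent, to one decimal."""
    return round((run_time - base_time) / base_time * 100, 1)


# Run times in seconds, copied from the table.
TIMES = {
    "go (uninstrumented)": {"Dyninst": 21.3, "S&E (no memory)": 12.4, "S&E (memory)": 15.0},
    "go (basic block count)": {"Dyninst": 23.4, "S&E (no memory)": 16.3, "S&E (memory)": 19.5},
}

OVERHEADS = {
    row: {tool: overhead_pct(t) for tool, t in cols.items()}
    for row, cols in TIMES.items()
}
```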

Page 25: Safe and Efficient Instrumentation

Future Work

• Memory sensitivity and compensation
    • Improved pointer analysis
    • Useful user intervention?

• Investigate group transformations
• Widen range of input binaries
• Expand supported platforms

Page 26: Safe and Efficient Instrumentation

Questions?


Page 27: Safe and Efficient Instrumentation

ASProtect code loop

8049756: call 8049761

8049761: mov EDX, ECX
8049763: pop EDI
8049764: push EAX
8049765: pop ESI
8049766: add EDI, 2183
804976c: mov ESI, EDI
804976e: push 0
8049773: jz 804977c

8049779: adc DH, 229

804977c: pop EBX
804977d: mov EAX, 2015212641

8049782: mov ECX, EBX(EDI)
8049785: jmp 804979c

804979c: add ECX, 1586986316
80497a2: xor ESI, 314333756
80497a8: xor ECX, 594915733
80497ae: jmp 80497c3

80497c3: sub ECX, 594948778
80497c9: sub ESI, 64260
80497ce: push ECX, ESP
80497cf: mov EAX, 884377321
80497d4: pop EBX(EDI)
80497d7: jmp 80497ed

80497ed: adc AL, 100
80497f0: sub EBX, 1595026050
80497f6: xor EAX, 34778
80497fb: add EBX, 1595026046
8049801: call 804980c

804980c: mov AX, 2783
8049810: pop ESI
8049811: cmp EBX, 4294965344
8049817: jnz 8049834

804981d: or ESI, 839181910
8049823: jmp 8049847

8049834: mov ESI, 1287570375
8049839: jmp 8049782

Page 28: Safe and Efficient Instrumentation

Emulation Examples

Original                        Emulated

add %eax, %ebx                  add %eax, %ebx

jnz 0xf3e                       jnz 0xe498d3

call fprintf                    push $804391
                                jmp fprintf

mov (%esi, %ebx, 4), %eax       lea (%esi, %ebx, 4), %eax
                                call mem_addr_translate
                                mov (%eax), %eax

ret                             pop %eax
                                call addr_translate
                                jmp %eax