Safe and Efficient Instrumentation
Paradyn Project
Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2010
Andrew Bernat
Binary Instrumentation
• Instrumentation modifies the original code
  • Moves original code
  • Allocates new memory
  • Overwrites original code
• This affects the behavior of:
  • Moved code
  • Code that references moved code
  • Code that references changed memory
Sensitivity Models
• A program is sensitive to a particular modification if that modification changes the program’s behavior
• Current binary instrumenters rely on fixed sensitivity models
• Compensating for sensitivity imposes overhead
Example: ret emulated as
  pop %eax
  call addr_translate
  jmp %eax
Efficiency vs Sensitivity
[Figure: efficiency plotted against sensitivity. Labels: Malware, Optimized Code, Conventional Code; Pin, Valgrind, …; Dyninst; Safe and Efficient Approach]
How do we do this?
• Formal model of code relocation
  • Visible behavior
  • Instruction sensitivity
  • External sensitivity
• Implementation in Dyninst
  • Analysis phase
  • Transformation phase
• Performance Results
Three Questions
• What program behavior do we wish to preserve?
• How does modification affect instructions?
• How do instructions change program behavior?
Approach
• Preserve visible behavior
  • Relationship of input to output
• Identify sensitive instructions
  • Those whose behavior is changed
• Emulate only externally sensitive instructions
  • Those whose sensitivity affects visible behavior
Visible Behavior
• Intuition: we can change anything that does not affect the output of the program
• Formalization: in terms of denotational semantics
  • Briefly: two programs P, P’ are equivalent if: [equation not preserved in this transcript]
Visibly Equivalent Programs
[Figure: the original binary takes input X and produces output Y. The instrumented binary takes input X + A and produces output Y + B, where A is the instrumentation input and B is the instrumentation output.]
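One informal way to read the diagram, offered only as a sketch: treat a program as a function from input to output, let the instrumented program take extra input A and emit extra output B, and call the two visibly equivalent when the original part of the output is unchanged. The function names and the tagging of outputs below are assumptions made for illustration; this is not the denotational definition from the talk.

```python
def original(x):
    """Toy 'original binary': maps input X to output Y."""
    return [("prog", v * 2) for v in x]


def instrumented(x, a):
    """Toy 'instrumented binary': same computation plus a counter.

    `a` stands in for the instrumentation input A (here: whether to report).
    """
    out = []
    count = 0
    for v in x:
        count += 1                      # instrumentation: count iterations
        out.append(("prog", v * 2))     # original behavior, unchanged
    if a:
        out.append(("instr", count))    # instrumentation output B
    return out


def visibly_equivalent(x, a):
    """Compare outputs after projecting away the instrumentation's output."""
    y = original(x)
    y_plus_b = instrumented(x, a)
    return [o for o in y_plus_b if o[0] == "prog"] == y


print(visibly_equivalent([1, 2, 3], True))
```

The projection step is the design choice that matters here: the instrumented program is allowed to produce more output, as long as what the original program produced is still there and unchanged.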
Sensitivity
• What does instrumentation change?
  • Addresses of instructions
  • Contents of memory
  • Shape of the address space
• Sensitive instructions are directly affected
  • Access the PC (and are moved)
  • Read modified memory
  • Test allocated memory
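The first two sensitivity conditions can be sketched as a simple check over a toy instruction record. The `Insn` type, the addresses, and the sets of moved and modified locations are all hypothetical; the third condition (testing allocated memory) is left out because it needs a model of the address space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    addr: int
    uses_pc: bool = False            # e.g. a call pushes its return address
    reads: frozenset = frozenset()   # memory addresses the instruction reads


def sensitivity(insn, moved_addrs, modified_mem):
    """Return the reasons (possibly none) why `insn` is sensitive."""
    reasons = []
    if insn.uses_pc and insn.addr in moved_addrs:
        reasons.append("accesses the PC and is moved")
    if insn.reads & modified_mem:
        reasons.append("reads modified memory")
    return reasons


# Hypothetical addresses; the instruction text follows the slides' examples.
code = [
    Insn("call get_pc_thunk", 0x1000, uses_pc=True),
    Insn("add $(offset), %ebx", 0x1005),
    Insn("mov (%esi, %ebx, 4), %eax", 0x100B, reads=frozenset({0x1000})),
]
moved = {i.addr for i in code}   # assume the whole block was relocated
modified = {0x1000}              # assume the old call site was overwritten

for insn in code:
    print(insn.text, "->", sensitivity(insn, moved, modified) or "not sensitive")
```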
Sensitivity Examples
Call/Return pair:
  main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
  worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Jumptable:
  jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
  get_pc_thunk:
    mov (%esp), %ebx
    ret

Self-Unpacking Code (Simplified):
  protect:
    call initialize
  initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
  loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)
External Sensitivity
• An instruction is externally sensitive if it causes a visible change in behavior
  • Approximation: or changes control flow
• This requires:
  • The sensitive instruction must produce different values
  • These differences must reach an instruction that affects output (or control flow)
  • … and change its behavior
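The three requirements can be illustrated dynamically with a toy machine: load the same program at two base addresses and compare output and control flow. This is a sketch under strong assumptions (a made-up five-opcode instruction set, control flow compared as instruction indices); the approach in the talk is a static analysis, not a pair of test runs.

```python
def run(program, base):
    """Run a toy program loaded at `base`; return (output, index trace)."""
    stack, regs, output, trace = [], {}, [], []
    pc = 0  # index into `program`; the address of index i is base + i
    while True:
        trace.append(pc)
        op, *args = program[pc]
        if op == "call":
            stack.append(base + pc + 1)   # return address depends on base
            pc = args[0]
        elif op == "ret":
            pc = stack.pop() - base
        elif op == "pop":
            regs[args[0]] = stack.pop()
            pc += 1
        elif op == "out":
            output.append(regs[args[0]])  # visible behavior
            pc += 1
        elif op == "halt":
            return output, trace
        else:
            raise ValueError(op)


def externally_sensitive(program, old_base=0x1000, new_base=0x9000):
    """True if moving the program changes its output or control flow."""
    return run(program, old_base) != run(program, new_base)


# A call/return pair: the pushed address differs, but only ret consumes it.
call_ret = [("call", 2), ("halt",), ("ret",)]
# The pushed address is popped into a register and written to the output.
leak_pc = [("call", 1), ("pop", "eax"), ("out", "eax"), ("halt",)]

print(externally_sensitive(call_ret), externally_sensitive(leak_pc))
```

In both programs the call is sensitive, since it pushes a different value after the move. Only in the second does that difference reach an instruction that affects output.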
Program Modification
[Figure: Original Binary (Code) → Analysis → Compensation → Modified Binary (Code, Relocated Code)]
Analysis Phase
• Identify sensitive instructions
  • InstructionAPI: used and defined sets
• Determine affected instructions
  • DepGraphAPI: forward slice
• Analyze effects of modification
  • SymEval: symbolic expansion of the slice
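The step from used and defined sets to a forward slice can be sketched for straight-line code as a simple taint propagation. This does not use InstructionAPI or DepGraphAPI; the used/defined sets below are written by hand, `stack_top` is a made-up name for the memory word at (%esp), and the instructions are listed in execution order with the thunk inlined.

```python
def forward_slice(insns, start, seeds):
    """Indices of instructions that depend on the values in `seeds`.

    insns: list of (text, used_set, defined_set) in execution order.
    seeds: locations whose values change because insns[start] was moved.
    """
    tainted = set(seeds)
    in_slice = [start]
    for i in range(start + 1, len(insns)):
        _text, uses, defs = insns[i]
        if tainted & uses:
            in_slice.append(i)
            tainted |= defs    # conservative: every result becomes tainted
        else:
            tainted -= defs    # overwritten with an unaffected value
    return in_slice


jumptable = [
    ("push %ebp",                 {"ebp", "esp"},       {"esp", "stack_top"}),
    ("mov %esp, %ebp",            {"esp"},              {"ebp"}),
    ("call get_pc_thunk",         {"esp", "pc"},        {"esp", "stack_top"}),
    ("mov (%esp), %ebx",          {"esp", "stack_top"}, {"ebx"}),
    ("ret",                       {"esp", "stack_top"}, {"esp", "pc"}),
    ("add $(offset), %ebx",       {"ebx"},              {"ebx"}),
    ("mov (%ebx, %eax, 4), %ecx", {"ebx", "eax"},       {"ecx"}),
    ("jmp *%ecx",                 {"ecx"},              {"pc"}),
]

# The call is the sensitive instruction; the pushed return address is the seed.
for i in forward_slice(jumptable, 2, {"stack_top"}):
    print(jumptable[i][0])
```

A real slicer also has to follow branches and reason about memory aliasing, which this sketch ignores.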
Analysis Example: Call/Return Pair
Call/Return pair:
  main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
  worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Sensitivity: call (moved, uses PC)
Slice:
  call
  ret
Symbolic Expansion:
  call:
  ret:
Analysis Example: Jumptable
Sensitivity: call (moved, uses PC)
Slice:
  call
  mov (%esp), %ebx
Symbolic Expansion:
  call:
  ret:
  jmp:

Jumptable:
  jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
  get_pc_thunk:
    mov (%esp), %ebx
    ret

  add $0x42, %ebx
  mov (%ebx, %eax, 4), %ecx
  jmp *%ecx
Analysis Example: Unpacking Code
Sensitivity: call (moved, uses PC)
Slice:
  call initialize
  pop %esi
  mov (%esi, %ebx, 4), %eax
  call unpack
  …
Symbolic Expansion:
  call:
  pop:
  mov:

Self-Unpacking Code (Simplified):
  protect:
    call initialize
    …
  initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
  loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)
Compensation Phase
• Generates the relocated code
• Two approaches:
  • Instruction transformation
  • Group transformation
Instruction Transformation
• Emulate each externally sensitive instruction
  • Replace some instructions (e.g., calls) with sequences
• Straightforward to implement
• Some sequences impose high overhead
  • e.g., address translation

Example: ret becomes
  pop %eax
  call addr_translate
  jmp %eax
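A minimal sketch of what the three-instruction sequence does, assuming the stack holds original-code return addresses and that `addr_translate` is a lookup from original to relocated addresses. The slides do not show the body of `addr_translate`, so the dictionary lookup and the addresses are assumptions.

```python
def addr_translate(orig_addr, relocation_map):
    """Map an original-code address to its relocated copy, if it was moved."""
    return relocation_map.get(orig_addr, orig_addr)


def emulated_ret(stack, relocation_map):
    """Model of: pop %eax; call addr_translate; jmp %eax.

    Returns the address that control transfers to.
    """
    eax = stack.pop()                            # pop %eax
    eax = addr_translate(eax, relocation_map)    # call addr_translate
    return eax                                   # jmp %eax


# Hypothetical addresses: one return site was relocated, one was not.
relocation_map = {0x8048010: 0x9000010}
print(hex(emulated_ret([0x8048010], relocation_map)))
print(hex(emulated_ret([0x8049999], relocation_map)))
```

The overhead noted on the slide comes from doing this lookup on every emulated return, where the original code executed a single ret.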
Group Transformation
• Emulate the behavior of a group of instructions
  • Motivating example: thunks
• Open questions:
  • Which instructions are included in the group?
  • How is the replacement sequence determined?
  • Current status: hand-crafted templates
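As an illustration of what a hand-crafted template for the thunk case might look like, the sketch below matches a call to a two-instruction get-PC thunk and replaces the whole group with a single mov of the original return address. The matching rules, the data layout, and the address are assumptions; this is not the template format used in Dyninst.

```python
import re

THUNK_LOAD = re.compile(r"mov \(%esp\), (%\w+)")


def apply_thunk_template(code, thunks, orig_ret_addr):
    """Replace `call <thunk>` with `mov $<original return address>, <reg>`.

    code: instruction strings of the function being relocated.
    thunks: {name: [instruction strings]} for candidate thunk functions.
    orig_ret_addr: {index in code: original return address of that call}.
    """
    out = []
    for i, insn in enumerate(code):
        call = re.fullmatch(r"call (\w+)", insn)
        body = thunks.get(call.group(1)) if call else None
        if body and len(body) == 2 and body[1] == "ret":
            load = THUNK_LOAD.fullmatch(body[0])
            if load:
                out.append(f"mov ${orig_ret_addr[i]:#x}, {load.group(1)}")
                continue
        out.append(insn)   # anything that does not match is left unchanged
    return out


jumptable = [
    "push %ebp",
    "mov %esp, %ebp",
    "call get_pc_thunk",
    "add $(offset), %ebx",
    "mov (%ebx, %eax, 4), %ecx",
    "jmp *%ecx",
]
thunks = {"get_pc_thunk": ["mov (%esp), %ebx", "ret"]}

# 0x8048405 is a hypothetical original return address for the call at index 2.
for line in apply_thunk_template(jumptable, thunks, {2: 0x8048405}):
    print(line)
```

Compared with emulating the call and the ret separately, the group replacement needs no address translation at run time, which is the motivation the slide gives for thunks.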
Transformation: Call/Return Pair
Original Code:
  main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
  worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Relocated Code:
  main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
  worker:
    push %ebp
    mov %esp, %ebp
    …
    ret
Transformation: Jumptable
Original Code:
  jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
  get_pc_thunk:
    mov (%esp), %ebx
    ret

Relocated Code:
  jumptable:
    push %ebp
    mov %esp, %ebp
    mov $(orig_ret_addr), %ebx
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
Transformation: Unpacking Code
Original Code:
  protect:
    call initialize
    …
  initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
  loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)

Relocated Code:
  protect:
    jmp initialize
    …
  initialize:
    mov $(orig_addr), %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
  loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)
Results
Type of Binary        % PC Sensitive   % Externally Sensitive   % Unanalyzable
Executable (a.out)    9.0%             1.1%                     6.6%
Library (.so)         7.9%             6.9%                     9.1%

Percentage of PC-Sensitive Instructions (32-bit, GCC, static analysis)
                          Dyninst          S&E (no memory)   S&E (memory)
go (uninstrumented)       21.3s (73.2%)    12.4s (0.8%)      15.0s (22.0%)
go (basic block count)    23.4s (90.2%)    16.3s (32.5%)     19.5s (58.5%)

Instrumentation Overhead (go, 32-bit, 12.3s base time)
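The percentages in the table are consistent with overhead computed as (time - base) / base against the 12.3s base time. A short check, using only the figures above (the dictionary keys are labels chosen here for readability):

```python
BASE = 12.3  # seconds, base time from the table caption


def overhead_pct(seconds, base=BASE):
    """Overhead relative to the base time, as a percentage with one decimal."""
    return round((seconds - base) / base * 100, 1)


times = {
    ("go (uninstrumented)", "Dyninst"): 21.3,
    ("go (uninstrumented)", "S&E (no memory)"): 12.4,
    ("go (uninstrumented)", "S&E (memory)"): 15.0,
    ("go (basic block count)", "Dyninst"): 23.4,
    ("go (basic block count)", "S&E (no memory)"): 16.3,
    ("go (basic block count)", "S&E (memory)"): 19.5,
}

for (row, tool), seconds in times.items():
    print(f"{row:24} {tool:16} {seconds:5.1f}s  {overhead_pct(seconds):5.1f}%")
```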
Future Work
• Memory sensitivity and compensation
  • Improved pointer analysis
  • Useful user intervention?
• Investigate group transformations
• Widen range of input binaries
• Expand supported platforms
Questions?
ASProtect code loop
8049756: call 8049761

8049761: mov EDX, ECX
8049763: pop EDI
8049764: push EAX
8049765: pop ESI
8049766: add EDI, 2183
804976c: mov ESI, EDI
804976e: push 0
8049773: jz 804977c

8049779: adc DH, 229

804977c: pop EBX
804977d: mov EAX, 2015212641

8049782: mov ECX, EBX(EDI)
8049785: jmp 804979c

804979c: add ECX, 1586986316
80497a2: xor ESI, 314333756
80497a8: xor ECX, 594915733
80497ae: jmp 80497c3

80497c3: sub ECX, 594948778
80497c9: sub ESI, 64260
80497ce: push ECX, ESP
80497cf: mov EAX, 884377321
80497d4: pop EBX(EDI)
80497d7: jmp 80497ed

80497ed: adc AL, 100
80497f0: sub EBX, 1595026050
80497f6: xor EAX, 34778
80497fb: add EBX, 1595026046
8049801: call 804980c

804980c: mov AX, 2783
8049810: pop ESI
8049811: cmp EBX, 4294965344
8049817: jnz 8049834

804981d: or ESI, 839181910
8049823: jmp 8049847

8049834: mov ESI, 1287570375
8049839: jmp 8049782
Emulation Examples
Original instruction          Emulated sequence
add %eax, %ebx                add %eax, %ebx
jnz 0xf3e                     jnz 0xe498d3
call fprintf                  push $804391
                              jmp fprintf
mov (%esi, %ebx, 4), %eax     lea (%esi, %ebx, 4), %eax
                              call mem_addr_translate
                              mov (%eax), %eax
ret                           pop %eax
                              call addr_translate
                              jmp %eax