The Deconstruction of Dyninst

Post on 23-Feb-2016


Matthew LeGendre and Nathan Rosenblum
Paradyn Project

Paradyn / Dyninst Week
Madison, Wisconsin
April 12-14, 2010

The Deconstruction of Dyninst: ProcControlAPI, ParsingAPI, and Binary Analysis

This Talk

o Componentization
  o Overview
  o Principles
  o Challenges
  o Techniques
o ProcControlAPI
o ParsingAPI
o Binary Analysis Components

Dyninst and the Components

[Figure: DyninstAPI and its components: SymtabAPI, InstructionAPI, StackwalkerAPI, ProcControlAPI, ParsingAPI, DepGraphAPI, SymEval, CodeGen, BinaryPatching, AST, Binary, Process. Legend: existing component, new component, proposed.]

Guiding Principles

o Clean Abstractions
  o Hide complexity
  o Necessary for portability
o Portability
  o Same interface across multiple systems
  o System differences not visible
o Sharing
  o Quickly build new tools

Challenges

o Interface and Abstractions
  o Finding the right level of detail
  o Building general components
  o Allowing inter-operability between components
o Implementation
  o Integrating back into DyninstAPI

Challenges – The Right Abstractions

o Ask “What will a user do with our tool?”
  o e.g., “Why does someone need dynamic linking info from SymtabAPI?”
o Get the right level of detail in interface
  o Don’t want to provide “thin wrapper” libraries.
    o e.g., SymtabAPI vs. libbfd or libelf
  o High level interface allows platform independence

Challenges – General Components

o Did we ask the right questions?
  o DyninstAPI:
    o Operates third-party
    o Focuses on instrumentation
    o Used in HPC and security
  o Embedded systems, software testing, …?
o Hard to anticipate users’ needs
  o E.g., StackwalkerAPI written in C++

Challenges – Interoperable components

o Easy to combine components we built:
  o StackwalkerAPI depends on SymtabAPI
  o ParsingAPI depends on InstructionAPI
o Harder to combine components between groups:
  o Abstraction mismatch between MRNet and LaunchMON
o How strong should dependencies between components be?

Challenges – Put it back together

[Figure: "Ideal" and "Actual" views of DyninstAPI; box labels: InstructionAPI, SymtabAPI, ProcControlAPI, Control Flow API, IA-64/Sparc Instruction Parsing]

o Re-integration can temporarily leave two implementations.
o Extra maintenance costs

Examples of What We Did Right

o Function abstraction in SymtabAPI
o Level of detail in InstructionAPI
o Flexibility in StackwalkerAPI
o Hiding complexity in ProcControlAPI

SymtabAPI Function Abstraction

“Why do users look up code symbols?”
Because they care about functions!

[Figure: a Function groups several Symbols with a Type Signature, a Code Range, and Local Variables]

InstructionAPI Level of Detail

o Instructions are very platform dependent
  o Don’t want to hide all platform details
  o Don’t want to show all platform details
o Targeting binary analysis
o Common operations platform independent
  o Read/Write sets
  o Bind/Eval
  o …

StackwalkerAPI’s Flexibility

o StackwalkerAPI easily customized
o Plug-in interface
  o 1st vs. 3rd party Stackwalking
  o Custom symbol access layer
  o Custom frame stepping routines
o Defaults provided for common case

Achievements

o More flexibility within our tools
  o Static binary rewriter benefited from componentization
o DyninstAPI more maintainable
  o Fine-grained testing
  o Easier internal learning curve
o Opportunity to spring-clean code base.

Achievements

o Easier to adopt
  o More users
  o HPCToolkit, STAT, Libra, CBI, Cray APT, Open|SpeedShop, TAU, …
o Quickly develop new end user tools
  o STAT uses MRNet, StackwalkerAPI, SymtabAPI and LaunchMON

ProcControlAPI

o A component library for controlling and monitoring processes.

[Figure: a Controller Process uses the ProcControlAPI Library to control/query several Target Processes and to receive events from them]

Example Operations

o Control Processes
  o Read/Write memory and registers
  o Pause/Resume threads
  o Insert Breakpoints
o Receive Events
  o Thread creation/destruction
  o System calls (fork/exec/…)
  o Signals

ProcControlAPI Use Cases

o StackwalkerAPI
  o Read stack memory from target processes
  o Get list of loaded libraries
o DyninstAPI
  o Write instrumentation to processes
  o Handle process events (fork/exec)
  o Monitor threads
o Debugger
  o Everything

ProcControlAPI Goals

o Platform Independent Interface
  o Implemented On: Linux, Windows, BlueGene, AIX, Solaris, VxWorks, FreeBSD
o Simple Interface (also powerful)
  o High-level abstractions in interface. E.g.,
    o Breakpoints
    o Inferior RPCs

Complexities

[Figure, built up across slides: User API Calls, the OS Debug Interface, and the Event Pipe all feed Process Control]

o Handle multiple inputs
  o Add threads for each input! (Event Generator Threads)
o Event handling may block
  o Add thread for event handling! (Event Handler Thread)
o May be handling multiple events
  o Add multiple event handler threads! (Event Handler Threads)
o Linux allows only one thread to call ptrace
  o Add dedicated ptrace thread! (Ptrace Thread)
o Don’t want handlers delivering callbacks
  o Send callbacks to user thread for delivery!
o Linux 2.4 wants waitpid and ptrace on same thread
  o Merge ptrace and OS generator threads! (Merged Ptrace & Generator Thread)
o Windows wants continue calls on generator thread
  o Send continue events to generator!

More Complexities

o Some Linux versions kernel panic if children are forked by a non-ptracer thread.
o Need to handle multiple target processes.
o Need to handle multi-threaded target processes.
o BlueGene has high-latency debug interface

Interface

o Hide complexity
  o User sees only one thread
  o High level abstractions
o Consistent across platforms
o Two primary interfaces:
  o Query/Control target process.
  o Receive and handle events

Process Class

o Handle to target process
  o Create and attach to processes
  o Access process memory
  o Insert breakpoints
  o Track library loads/unloads
  o Track thread creation and destruction
  o Stop/Continue all threads
  o Detach from/terminate target process

Thread Class

o Handle to thread in target process
  o Stop/Continue individual thread
  o Get/Set registers
  o Single step
o Thread represents OS level handle for threads
  o May also provide thread library information

Inferior RPCs

o Insert and invoke code in target process

    (gdb) call getpid()
    $1 = 14218
    (gdb)

o User provides machine code to run.
o ProcControlAPI allocates memory, saves registers, runs code, restores registers and memory.
o Returns result of iRPC

Events

o Can register callbacks for process events:
  o Fork
  o Exec
  o Thread Create
  o Thread Destroy
  o Library Load
  o Library Unload
  o Breakpoint
  o RPC Completion
  o Exit
  o Crash
  o Single Step
  o Signal
o Some events can have pre/post times
  o E.g., pre-exit and post-exit

Callbacks

o Events are delivered via callbacks
  o User registers callbacks for interesting events
  o Only certain ProcControlAPI functions will trigger callbacks
o Restrictions on callback functions
  o Cannot call anything that would recursively trigger more callbacks.
  o This prevents races

Notification

o Want to deliver callback on user thread
o User thread may be busy or blocked
o Notify user that event is pending via Callback or FD

[Figure: timeline between User Code and ProcControlAPI for a Fork Event: the user code is in read(...), then calls handleEvents(), which invokes on_fork_callback() { … }]
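The delivery scheme on this slide can be sketched as a queue that internal threads fill and the user thread drains. This is only an illustration of the idea: `EventQueue` and `post` are hypothetical names, not the ProcControlAPI interface, and the FD-based notification path is left out.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical sketch, not the ProcControlAPI interface: callbacks are
// queued by an internal thread and run only on the thread that calls
// handleEvents().
class EventQueue {
public:
    // Called by an internal (handler) thread when an event is ready.
    void post(std::function<void()> callback) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(callback));
    }

    // Called by the user thread; delivers every pending callback and
    // returns how many were delivered.
    int handleEvents() {
        int delivered = 0;
        for (;;) {
            std::function<void()> callback;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty()) break;
                callback = std::move(pending_.front());
                pending_.pop();
            }
            callback();  // runs outside the lock, on the caller's thread
            ++delivered;
        }
        return delivered;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};
```

Running the callback outside the lock lets a callback post further events without deadlocking, although the slides note that the real library restricts what a callback may call.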

Implementation

o Internally threaded
  o One thread to receive events (Generator)
  o One thread to handle events (Handler)
  o One thread to perform ptrace calls on Linux
  o User thread to handle events, receive callbacks, or trigger operations.
o Event handling is a state machine to avoid recursive events.

Implementation

[Figure: User API Calls and the OS Debug Interface feed ProcControlAPI, which runs a User Thread, Event Generator Threads, an Event Handler Thread, and a Ptrace Thread]

o Use state machine to keep one handler thread.
o No support for “problematic” OSs (Linux 2.4)
o May expand thread model for parallelism on BlueGene

Current Status

o Currently on Linux/x86, Linux/x86_64
o Next platforms are Linux/ppc, BlueGene, and FreeBSD.
o Windows, AIX, and Solaris support to follow.
o Beta release available upon request.

Intermission

Binary parsing

[Figure: a binary with functions _lock_foo, main, and foo]

dynamic instrumentation, debugger, static binary analysis tools, malware analysis, binary editor/rewriter, …

Familiar territory

Benjamin Schwarz, Saumya Debray, and Gregory R. Andrews. Disassembly of executable code revisited. 2002

Cristina Cifuentes and K. John Gough. Decompilation of binary programs. 1995

Richard L. Sites, Anton Chernoff, Matthew B. Kirk, Maurice P. Marks, and Scott G. Robinson. Binary translation. 1993.

Henrik Theiling. Extracting safe and precise control flow from binaries. 2000.

Ramkumar Chinchani and Eric van den Berg. A fast static analysis approach to detect exploit code inside network flows. 2005.

J. Troger and C. Cifuentes. Analysis of virtual method invocation for binary translation. 2002.

Laune C. Harris and Barton P. Miller. Practical analysis of stripped binary code. 2005.

Christopher Kruegel, William Robertson, Fredrik Valeur, and Giovanni Vigna. Static disassembly of obfuscated binaries. 2004.

Nathan Rosenblum, Xiaojin Zhu, Barton P. Miller, and Karen Hunt. Learning to analyze binary computer code. 2008.

Amitabh Srivastava and Alan Eustace. ATOM: a system for building customized program analysis tools. 1994.

Barton Miller, Jeffrey Hollingsworth, and Mark Callaghan. Dynamic Program Instrumentation for Scalable Performance Tools. 1994.

We’ve been down this road…

the DYNINST binary parser

o recursive traversal parsing
  o non-contiguous functions
  o code sharing
  o non-returning functions
o “gap” parsing heuristics
  o preamble scanning handles stripped binaries
o probabilistic code models
  o learn to recognize function entry points
  o very accurate gap parsing

What makes a parsing component?

[Figure: raw binary bits go into the Parsing API]

1. Binary code source abstraction
2. Simple, intuitive representation: functions, blocks, edges
3. Platform independence supported by previous Dyninst components (InstructionAPI, SymtabAPI)

Flexible code sources

[Figure: a binary code object]

Parser code source requirements:

o code location
o code data
o access to code bytes (unsigned char * buf: 41 56 49 89 fe 41 55 …)
o function hints & names (main, foo, bar, baz)
o a few (optional) facts
  o pointer width
  o external linkage
  o PLT

Code source contract

Nine mandatory methods:

o bool isValidAddress
o bool isExecutableAddress
o void * getPtrToInstruction
o void * getPtrToData
o unsigned getAddressWidth
o bool isCode
o bool isData
o Address codeOffset
o Address codeLength

SymtabAPI implementation in 232 lines (including optional hints, function names)

Any binary code object that can be memory mapped can be parsed
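As a rough illustration of how small such a contract is, the sketch below declares the nine methods and implements them for a single in-memory buffer. The slide gives only return types and names, so the parameter lists, the `Address` typedef, and the `BufferCodeSource` class are assumptions made for this example, not the ParsingAPI declarations.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Address = std::uint64_t;  // assumed; the slide names the type only

// The nine methods named on the slide. Parameter lists are assumptions.
class CodeSource {
public:
    virtual ~CodeSource() = default;
    virtual bool isValidAddress(Address a) const = 0;
    virtual bool isExecutableAddress(Address a) const = 0;
    virtual void* getPtrToInstruction(Address a) const = 0;
    virtual void* getPtrToData(Address a) const = 0;
    virtual unsigned getAddressWidth() const = 0;
    virtual bool isCode(Address a) const = 0;
    virtual bool isData(Address a) const = 0;
    virtual Address codeOffset() const = 0;
    virtual Address codeLength() const = 0;
};

// Hypothetical code source: one executable region held in memory.
class BufferCodeSource : public CodeSource {
public:
    BufferCodeSource(Address base, std::vector<unsigned char> bytes)
        : base_(base), bytes_(std::move(bytes)) {}

    bool isValidAddress(Address a) const override {
        return a >= base_ && a - base_ < bytes_.size();
    }
    bool isExecutableAddress(Address a) const override { return isValidAddress(a); }
    void* getPtrToInstruction(Address a) const override {
        if (!isValidAddress(a)) return nullptr;
        return const_cast<unsigned char*>(bytes_.data() + (a - base_));
    }
    void* getPtrToData(Address) const override { return nullptr; }  // no data region
    unsigned getAddressWidth() const override {
        return static_cast<unsigned>(sizeof(void*));
    }
    bool isCode(Address a) const override { return isValidAddress(a); }
    bool isData(Address) const override { return false; }
    Address codeOffset() const override { return base_; }
    Address codeLength() const override { return bytes_.size(); }

private:
    Address base_;
    std::vector<unsigned char> bytes_;
};
```

A parser written against `CodeSource` never needs to know whether the bytes came from an ELF file, a process image, or a test buffer like this one.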

Simple control flow interface

o Functions (contain Blocks)
  o start addr.
  o extents
o Blocks (joined by Edges)
  o start addr.
  o end addr.
  o in edges
  o out edges
o Edges
  o src
  o targ
  o type

Views of control flow

Walking a control flow graph, starting here:

    while(!work.empty()) {
        Block *b = work.pop();

        /* do something with b */

        edgeiter eit = b->out().begin();
        while(eit != b->out().end()) {
            work.push(*eit++);
        }
    }

What if we only want intraprocedural edges?
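The slide's loop is shorthand. A self-contained version of the same worklist walk might look like the sketch below, which pushes each edge's target block and adds a visited set so that cycles terminate. `Block` and `Edge` here are simplified stand-ins, not the ParsingAPI classes.

```cpp
#include <cassert>
#include <set>
#include <stack>
#include <vector>

// Hypothetical stand-ins for the talk's CFG objects.
struct Block;
struct Edge {
    Block* src;
    Block* targ;
};
struct Block {
    int id;
    std::vector<Edge*> out;  // outgoing edges
};

// Worklist traversal from `start`. Unlike the slide's loop, it pushes
// each edge's target block and keeps a visited set so that cycles
// terminate. Returns the ids of the blocks in visiting order.
std::vector<int> reachable(Block* start) {
    std::vector<int> order;
    std::set<Block*> seen{start};
    std::stack<Block*> work;
    work.push(start);
    while (!work.empty()) {
        Block* b = work.top();
        work.pop();
        order.push_back(b->id);  // "do something with b"
        for (Edge* e : b->out) {
            if (seen.insert(e->targ).second) work.push(e->targ);
        }
    }
    return order;
}
```

Marking a block as seen when it is pushed, rather than when it is popped, keeps any block from entering the worklist twice.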

Edge predicates

Walking a control flow graph:

    while(!work.empty()) {
        Block *b = work.pop();

        /* do something with b */

        IntraProc pred;
        edgeiter eit = b->out().begin(&pred);
        while(eit != b->out().end()) {
            work.push(*eit++);
        }
    }

Edge Predicates

o Tell iterator whether Edge argument should be returned
o Composable (and, or)
o Examples:
  o Intraprocedural
  o Single function context
  o Direct branches only
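One way to picture composable predicates is as small function objects combined by `And` and `Or` wrappers. The sketch below is an assumption about the shape of such a design; the `Edge` fields, the `DirectOnly` predicate, and the `filter` helper are invented for illustration, and only the name `IntraProc` comes from the slide.

```cpp
#include <cassert>
#include <vector>

// Hypothetical edge record; field names are illustrative.
enum class EdgeType { Direct, Indirect, Call };
struct Edge {
    int src;
    int targ;
    EdgeType type;
    bool interproc;  // true when the edge leaves the function
};

// A predicate tells the iterator whether an edge should be returned.
struct EdgePredicate {
    virtual ~EdgePredicate() = default;
    virtual bool operator()(const Edge& e) const = 0;
};
struct IntraProc : EdgePredicate {
    bool operator()(const Edge& e) const override { return !e.interproc; }
};
struct DirectOnly : EdgePredicate {
    bool operator()(const Edge& e) const override { return e.type == EdgeType::Direct; }
};
// Composition: both predicates must accept the edge.
struct And : EdgePredicate {
    And(const EdgePredicate& a, const EdgePredicate& b) : a_(a), b_(b) {}
    bool operator()(const Edge& e) const override { return a_(e) && b_(e); }
    const EdgePredicate& a_;
    const EdgePredicate& b_;
};
// Composition: either predicate may accept the edge.
struct Or : EdgePredicate {
    Or(const EdgePredicate& a, const EdgePredicate& b) : a_(a), b_(b) {}
    bool operator()(const Edge& e) const override { return a_(e) || b_(e); }
    const EdgePredicate& a_;
    const EdgePredicate& b_;
};

// Returns the edges accepted by `pred`, in their original order.
std::vector<Edge> filter(const std::vector<Edge>& edges, const EdgePredicate& pred) {
    std::vector<Edge> kept;
    for (const Edge& e : edges)
        if (pred(e)) kept.push_back(e);
    return kept;
}
```

Because `And` and `Or` are themselves predicates, they nest, so a tool can build a filter such as "intraprocedural and direct" without the iterator knowing about either condition.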

Extensible CFG objects

o ParseAPI Function: simple, only need to represent control flow graph
o Dyninst image_func: complex, handles instrumentation, liveness, relocation, etc.
  o image_func derives from Function
o Special callback points during parsing

    parse parse parse
    unresBranchNotify(insn)
    [derived class does stuff]
    parse parse parse

o Factory interface for CFG objects
  o The parser calls a custom factory's mkfunc(), which returns an image_func as a (Function*)
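The factory and callback mechanism can be sketched in a few lines. Only the names `mkfunc` and `unresBranchNotify` come from the slide; the signatures, the `ToolFunction` and `ToolFactory` classes, and the `parseOne` stand-in for the parser are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical minimal CFG function object; the parser needs only this much.
struct Function {
    explicit Function(unsigned long addr) : addr(addr) {}
    virtual ~Function() = default;
    // Callback point: the parser reports a branch it could not resolve.
    virtual void unresBranchNotify(unsigned long /*insnAddr*/) {}
    unsigned long addr;
};

// Factory the parser calls whenever it needs a new function object.
struct Factory {
    virtual ~Factory() = default;
    virtual std::unique_ptr<Function> mkfunc(unsigned long addr) {
        return std::make_unique<Function>(addr);
    }
};

// A tool's richer function type (in the spirit of Dyninst's image_func).
struct ToolFunction : Function {
    using Function::Function;
    void unresBranchNotify(unsigned long insnAddr) override {
        unresolved.push_back(insnAddr);  // derived class does stuff
    }
    std::vector<unsigned long> unresolved;
};
struct ToolFactory : Factory {
    std::unique_ptr<Function> mkfunc(unsigned long addr) override {
        return std::make_unique<ToolFunction>(addr);
    }
};

// Stand-in for the parser: builds a function through the factory and
// fires the callback once, as if it had met an unresolvable branch.
std::unique_ptr<Function> parseOne(Factory& factory, unsigned long entry) {
    std::unique_ptr<Function> f = factory.mkfunc(entry);
    f->unresBranchNotify(entry + 4);
    return f;
}
```

The parser only ever sees `Function*`, so a tool gets its own object type into the CFG without the parser depending on it.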

What’s in the box?*

* box to be released soon

o Binary Parser
  o recursive descent parsing
  o speculative gap parsing
  o cross platform: x86, x86-64, PPC, IA64, SPARC
o Control Flow Graph Representation
  o graph interface
  o extensible objects for easy tool integration
  o exports Dyninst InstructionAPI interface
o SymtabAPI-based Code Source
  o cross-platform
  o supports ELF, PE, XCOFF formats

Status

[Figure: progress timeline: conception, code refactoring, interface design, Dyninst re-integration (major test case)]

other major test case: compiler provenance (come tomorrow!)

Intermission

Binary Analysis Components

[Figure: components: Instruction Semantics, Alias Analysis, AST Simplification, DepGraphAPI, SymEval]

Instruction Semantics

o Instructions into semantic ASTs
o Built with ROSE

[Figure: InstructionAPI and ROSE feed Instruction Semantics; add 4,%eax becomes the AST eax = eax + 4]

Alias Analysis

o Identify stack and global variables
o Plug-in more sophisticated analysis

    push %ebp
    mov %esp,%ebp
    sub $12,%esp
    lea 4(%ebp),%eax
    mov $12,(%eax)
    mov 16(%esp),(%ebx)
    mov 4(%ebp),0x8084100
    leave
    ret

    push %ebp
    mov %esp,%ebp
    sub $12,%esp
    lea 4(%ebp),%eax
    mov $12,(%eax)
    mov -8(%esp),%ebp
    mov 4(%ebp),0x8084100
    leave
    ret

[Figure: memory operands in the listing are annotated Local 1, Param 1, Global, or Unknown]

AST Simplification

    mov $10,%ebx
    shl $2,%ebx
    add $2,%ebx
    add %ebx,%eax
    mov $5,(%eax)

[Figure: the per-instruction ASTs ebx = 10, ebx = ebx << 2, and ebx = ebx + 2 simplify to the single AST *(eax + 42) = 5]
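The arithmetic behind the slide's result is (10 << 2) + 2 = 42. A minimal constant-folding pass over a toy AST reproduces it; the `Expr` type and helper names below are invented for this sketch and are not the SymEval data structures.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal expression AST: a constant, a named input, or a binary op.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
    char op;           // 'c' constant, 'v' variable, '+' add, '<' shift left
    long value;        // used when op == 'c'
    std::string name;  // used when op == 'v'
    ExprPtr lhs, rhs;  // used for binary operators
};

ExprPtr constant(long v) { return std::make_shared<Expr>(Expr{'c', v, "", nullptr, nullptr}); }
ExprPtr variable(const std::string& n) { return std::make_shared<Expr>(Expr{'v', 0, n, nullptr, nullptr}); }
ExprPtr binary(char op, ExprPtr l, ExprPtr r) { return std::make_shared<Expr>(Expr{op, 0, "", l, r}); }

// Constant folding: collapse any operator whose operands are both constants.
ExprPtr simplify(const ExprPtr& e) {
    if (e->op == 'c' || e->op == 'v') return e;
    ExprPtr l = simplify(e->lhs);
    ExprPtr r = simplify(e->rhs);
    if (l->op == 'c' && r->op == 'c') {
        return constant(e->op == '+' ? l->value + r->value : l->value << r->value);
    }
    return binary(e->op, l, r);
}

std::string show(const ExprPtr& e) {
    if (e->op == 'c') return std::to_string(e->value);
    if (e->op == 'v') return e->name;
    return "(" + show(e->lhs) + (e->op == '+' ? " + " : " << ") + show(e->rhs) + ")";
}

// Store address for the slide's sequence, with each register use
// replaced by the AST of its most recent definition.
ExprPtr storeAddress() {
    ExprPtr ebx = constant(10);                // mov $10,%ebx
    ebx = binary('<', ebx, constant(2));       // shl $2,%ebx
    ebx = binary('+', ebx, constant(2));       // add $2,%ebx
    return binary('+', variable("eax"), ebx);  // add %ebx,%eax
}
```

Here `show(storeAddress())` yields `(eax + ((10 << 2) + 2))` and `show(simplify(storeAddress()))` yields `(eax + 42)`, the address written by the final `mov $5,(%eax)`.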

DepGraphAPI

o Build Control Dependence Graphs (CDG)
  o “Why am I executed?”
o Build Data Dependence Graphs (DDG)
  o “Where do my inputs come from?”
  o “Where do my outputs go?”
o Already beta released in Dyninst 6.0

DepGraphAPI - Slicing

o Derive slices from CDG and DDG
o New features to build slices on-the-fly.
  o Cheaper for small numbers of slices

Full listing:

    sub $4,%esp
    inc %edx
    pop %ebx
    mov 0x1000,%ecx
    lea 10(%ebx),%eax
    add $2,%ecx
    call %eax

Slice for call %eax:

    sub $4,%esp
    pop %ebx (also updates %esp)
    lea 10(%ebx),%eax
    call %eax
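The slice on this slide can be reproduced with a simple backward walk over register definitions and uses. The sketch below handles only data dependences in straight-line code, with register sets written by hand; it is not how DepGraphAPI is implemented, and `Insn`, `backwardSlice`, and `slideExample` are names invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Insn {
    std::string text;
    std::set<std::string> defs;  // registers written
    std::set<std::string> uses;  // registers read
};

// Backward data-dependence slice over straight-line code: walk upward
// from the criterion, keeping each instruction that defines a register
// still needed, and replacing what it defines with what it uses.
std::vector<std::string> backwardSlice(const std::vector<Insn>& code, std::size_t criterion) {
    std::set<std::string> needed = code[criterion].uses;
    std::vector<std::string> slice{code[criterion].text};
    for (std::size_t i = criterion; i-- > 0;) {
        const Insn& insn = code[i];
        bool relevant = false;
        for (const std::string& r : insn.defs)
            if (needed.erase(r) > 0) relevant = true;
        if (!relevant) continue;
        needed.insert(insn.uses.begin(), insn.uses.end());
        slice.insert(slice.begin(), insn.text);
    }
    return slice;
}

// The slide's example; register sets are filled in by hand, and the
// call is modeled as reading only %eax.
std::vector<Insn> slideExample() {
    return {
        {"sub $4,%esp", {"esp"}, {"esp"}},
        {"inc %edx", {"edx"}, {"edx"}},
        {"pop %ebx", {"ebx", "esp"}, {"esp"}},
        {"mov 0x1000,%ecx", {"ecx"}, {}},
        {"lea 10(%ebx),%eax", {"eax"}, {"ebx"}},
        {"add $2,%ecx", {"ecx"}, {"ecx"}},
        {"call %eax", {}, {"eax"}},
    };
}
```

The `pop` pulls `sub $4,%esp` into the slice because it reads `%esp`, which is why the slide annotates that instruction with `%esp`.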

Example: Jump Tables

Where can the jump go? What is %rax’s final value?

    ...
    cmp $0xa,%eax
    ja 0x804c900
    inc %ecx
    lea 0x1920(%rip),%rdx
    mov (%rdx,%rax,4),%rax
    add %rdx,%rax
    jmpq *%rax

1. Slice from jump for relevant instructions

    cmp $0xa,%eax
    ja 0x804c900
    lea 0x1920(%rip),%rdx
    mov (%rdx,%rax,4),%rax
    add %rdx,%rax
    jmpq *%rax

2. Run alias analysis to simplify

[Figure: the operand of lea 0x1920(%rip),%rdx is annotated Data@804e100]

3. Use Instruction semantics to convert to ASTs

    rdx = Data@804e100
    rax = *(rdx + rax x 4)
    rax = rdx + rax

4. Use AST Simplification to get single AST

    rip = Data@804e100 + *(Data@804e100 + rax x 4)

o Single Unknown Input
o Reference to RO data
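With a single unknown input bounded by the `cmp $0xa` / `ja` check, the jump targets can be enumerated by evaluating the final AST for each allowed index. The sketch below assumes 4-byte signed table entries relative to the table base, which the scale factor of 4 suggests but the slides do not state; `jumpTargets` and the sample table values are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Evaluates target = base + table[index] for every index allowed by the
// bounds check (cmp $0xa / ja means index <= 0xa, unsigned).
// Assumes 4-byte signed table entries relative to the table base.
std::vector<std::uint64_t> jumpTargets(std::uint64_t base,
                                       const std::vector<std::int32_t>& table,
                                       std::uint32_t maxIndex) {
    std::vector<std::uint64_t> targets;
    for (std::uint32_t i = 0; i <= maxIndex && i < table.size(); ++i) {
        targets.push_back(base + static_cast<std::int64_t>(table[i]));
    }
    return targets;
}
```

Because the table lives in read-only data, a static analysis can read it directly from the binary instead of the `std::vector` used here.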

Status of Analysis Components

o Work in progress
  o Build public interfaces
  o Finish implementations
o Currently being used for
  o Concolic execution
  o Safe code relocation
  o Jump tables
  o Fingerprinting
o DepGraphAPI in beta

Conclusions

o ProcControlAPI & ParsingAPI
  o New components
  o Available for friendly beta
o Binary Analysis
  o Set of components for binary analysis
  o Still under development.

Questions?
