Extracting File Formats from Executables

Junghee Lim, Thomas Reps and Ben Liblit
University of Wisconsin-Madison

13th Working Conference on Reverse Engineering (WCRE 2006)
Oct. 26, 2006

http://www.cs.wisc.edu/~junghee/WCRE2006.ppt

Page 2: Data Format (File Format)

• Goal: automatically extract a specification of a program’s output format
  – E.g., something similar to the file-format specification for gzip
• FFE (File Format Extractor)
  – Input: an executable without source code or documentation
  – Output: a representation of the output data format (e.g., a regular expression)

[Figure: the gzip output format as a sequence of (size, value) fields: (1, 0x1F), (1, 0x8B), (1, 0x08), (1, Top), (Top, Top), (Top, Top), (4, Top), (4, Top), (4, Top), (1, Top), (1, Top); two elements are marked with *.]
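
One way to read the (size, value) notation is as a list of fields, each with a byte width and either a constant value or Top (unknown). The sketch below is only an illustration of that reading, not FFE's representation: `Field`, `TOP`, `GZIP_PREFIX`, and `matches_prefix` are hypothetical names, little-endian decoding is an assumption, and only the four leading fixed-size fields from the figure are checked.

```python
from dataclasses import dataclass
from typing import Optional

TOP = None  # stands for "Top": not known statically


@dataclass(frozen=True)
class Field:
    size: Optional[int]   # width in bytes, or TOP
    value: Optional[int]  # constant value, or TOP


# Leading fields from the figure: three constant bytes, then one unknown byte.
GZIP_PREFIX = [Field(1, 0x1F), Field(1, 0x8B), Field(1, 0x08), Field(1, TOP)]


def matches_prefix(data: bytes, fields) -> bool:
    """Check that data starts with the given fixed-size fields."""
    pos = 0
    for f in fields:
        if f.size is TOP:
            raise ValueError("this sketch only handles fixed-size fields")
        chunk = data[pos:pos + f.size]
        if len(chunk) < f.size:
            return False
        # Little-endian decoding is an assumption of this sketch.
        if f.value is not TOP and int.from_bytes(chunk, "little") != f.value:
            return False
        pos += f.size
    return True
```

A field whose value is Top accepts any bytes of the right width, which is what makes the extracted format an over-approximation.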

Page 3: Gzip specification vs. our structure

[Figure: the gzip file-format specification next to the extracted structure: (1, 0x1F), (1, 0x8B), (1, 0x08), (1, Top), (Top, Top), (Top, Top), (4, Top), (4, Top), (4, Top), (1, Top), (1, Top); two elements are marked with *.]

Page 4: Usage Scenarios

• Reuse components of a tool chain
  – COTS (Commercial Off-The-Shelf) products
• Detect malware
  – Recover output format (= network-communication pattern) from captured malware
  – Detect variants in the wild by detecting network traffic with that pattern
• Characterize what a program computes/creates
• Find inconsistencies between specifications and implementations

Page 5: Programming Styles

1. Individual writes, e.g.
   - gzip
   - compress95
   - png2ico
2. Bulk writes, e.g.
   - tar
   - cpio

Page 6: What are the Steps?

• Disassemble executable
• Recover
  – Interprocedural CFG
  – Variables (and their sizes)
  – Possible values of variables
• Construct Hierarchical Finite-State Machine (HFSM)
• Annotate HFSM with size/value information
• [Construct regular expression]
  – Perform in-line expansion
• [Validation]
  – Regular exp. → flex spec. → recognizer
  – Examples → recognizer → success/failure

Page 8: What are the Steps?

[Figure: an HFSM with three procedures, foo (nodes 1 to 4, two "call bar" sites), bar (nodes 5 to 8, two "call baz" sites) and baz (nodes 9 and 10), next to the equivalent flat FSM obtained by in-line expansion: 1 2 5 6 9 10 7 9 10 8 3 5’ 6’ 9’ 10’ 7’ 9’ 10’ 8’ 4.]
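
The relationship between the HFSM and the flat FSM in the figure can be sketched as a recursive expansion. This is a simplified illustration, not the authors' algorithm: each procedure is modelled as a straight-line list of nodes (no branches or loops), the choice of which nodes are call sites is read off the flat sequence in the figure, the program is assumed to be free of recursion, and the names `HFSM` and `inline` are made up for the example.

```python
# Each procedure is a list of steps: an int is a plain node,
# ("call", node, callee) is a node that calls another procedure.
HFSM = {
    "foo": [1, ("call", 2, "bar"), ("call", 3, "bar"), 4],
    "bar": [5, ("call", 6, "baz"), ("call", 7, "baz"), 8],
    "baz": [9, 10],
}


def inline(hfsm, proc):
    """Flatten proc by copying each callee's nodes in at every call site.

    Assumes the call graph has no recursion; otherwise this never terminates.
    """
    flat = []
    for step in hfsm[proc]:
        if isinstance(step, tuple):
            _, node, callee = step
            flat.append(node)                   # the call-site node itself
            flat.extend(inline(hfsm, callee))   # a fresh copy of the callee
        else:
            flat.append(step)
    return flat
```

Calling `inline(HFSM, "foo")` visits bar's nodes twice, which corresponds to the primed copies (5’, 6’, ...) in the figure; the sketch does not rename the second copy.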


Page 11: What are the Steps?

Well-known concepts from formal-language theory
• but we use varying-sized alphabet symbols


Page 13: Individual writes

[Figure: the example code is compiled into an executable, which is then disassembled.]

Page 14: The disassembled code for our example

User-supplied information
• Library function, or
• Wrapped library function

Output functions
  sub_401050 (put_byte) : void put_byte(char c);
  sub_401075 (put_long) : void put_long(int n);
  sub_4010E4 (writes)   : void writes(char* str, int size);

  401120 sub_401120 proc near ; type
  401120   push ebp
  401121   mov ebp, esp
  401123   sub esp, 0Ch
  401126   mov eax, [ebp-4]
  401129   mov [ebp-8], eax
  40112C   cmp [ebp-8], 0
  401130   jz short loc_40113A
  401132   cmp [ebp-8], 1
  401136   jz short loc_401147
  401138   jmp short loc_401152
  40113A loc_40113A:
  40113A   mov eax, [ebp-4]
  40113D   mov [esp], eax
  401140   call sub_401050
  401145   jmp short loc_401152
  401147 loc_401147:
  401147   mov eax, [ebp-4]
  40114A   mov [esp], eax
  40114D   call sub_401050
  401152 loc_401152:
  401152   leave
  401153   retn

  401154 sub_401154 proc near ; chksum
  401154   push ebp
  401155   mov ebp, esp
  401157   sub esp, 8
  40115A   mov eax, [ebp-4]
  40115D   mov [esp], eax
  401160   call sub_401075
  401165   leave
  401166   retn

  401167 sub_401167 proc near ; fill_data
  401167   push ebp
  401168   mov ebp, esp
  40116A   sub esp, 8
  40116D loc_40116D:
  40116D   cmp [ebp-1], 0
  401171   jz short loc_401181
  401173   movsx eax, [ebp-1]
  401177   mov [esp], eax
  40117A   call sub_401050
  40117F   jmp short loc_40116D
  401181 loc_401181:
  401181   leave
  401182   retn

  401183 sub_401183 proc near ; main
  401183   push ebp
  401184   mov ebp, esp
  401186   sub esp, 28h
  401189   and esp, 0FFFFFFF0h
  40118C   mov eax, 0
  401191   add eax, 0Fh
  401194   add eax, 0Fh
  401197   shr eax, 4
  40119A   shl eax, 4
  40119D   mov [ebp-14h], eax
  4011A0   mov eax, [ebp-14h]
  4011A3   call sub_401200
  4011A8   call __main
  4011AD   mov eax, [ebp-10h]
  4011B0   mov [esp], eax
  4011B3   call sub_401075
  4011B8   mov eax, [ebp-0Ch]
  4011BB   mov [esp], eax
  4011BE   call sub_401075
  4011C3   mov [esp+4], 4
  4011CB   mov eax, [ebp-8]
  4011CE   mov [esp], eax
  4011D1   call sub_4010E4
  4011D6   call sub_401120
  4011DB   call sub_401167
  4011E0   mov eax, [ebp-4]
  4011E3   mov [esp], eax
  4011E6   call sub_401075
  4011EB   call sub_401154
  4011F0   mov eax, 0
  4011F5   leave
  4011F6   retn

Output operations
  401140, 40114D, 401160, 40117A, 4011B3, 4011BE, 4011D1, 4011E6
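
Given the user-supplied set of output functions, the output operations are the call instructions that target one of them. The sketch below illustrates that selection on an excerpt of the listing for main; it is a text-matching illustration only (FFE works on the recovered CFG, not on text), and `CALL_RE`, `LISTING`, and `output_operations` are hypothetical names.

```python
import re

# User-supplied output functions from the slide (address name -> source name).
OUTPUT_FUNCTIONS = {
    "sub_401050": "put_byte",
    "sub_401075": "put_long",
    "sub_4010E4": "writes",
}

# Excerpt of the disassembly of main, one instruction per line.
LISTING = """\
4011C3 mov [esp+4], 4
4011CB mov eax, [ebp-8]
4011CE mov [esp], eax
4011D1 call sub_4010E4
4011D6 call sub_401120
4011DB call sub_401167
4011E0 mov eax, [ebp-4]
4011E3 mov [esp], eax
4011E6 call sub_401075
4011EB call sub_401154
"""

CALL_RE = re.compile(r"^([0-9A-F]{6})\s+call\s+(\S+)$")


def output_operations(listing, output_functions):
    """Return (address, function name) for every call to an output function."""
    ops = []
    for line in listing.splitlines():
        m = CALL_RE.match(line.strip())
        if m and m.group(2) in output_functions:
            ops.append((m.group(1), output_functions[m.group(2)]))
    return ops
```

On this excerpt the calls at 4011D6, 4011DB and 4011EB are skipped because their targets (type, fill_data, chksum) are ordinary procedures, not output functions.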

Page 15: HFSM for our example

[Figure: the HFSM for the example, one FSM per procedure, with the following call nodes.]

main (sub_401183)
  4011B3 call sub_401075 (put_long)
  4011BE call sub_401075 (put_long)
  4011D1 call sub_4010E4 (writes)
  4011D6 call sub_401120 (type)
  4011DB call sub_401167 (fill_data)
  4011E6 call sub_401075 (put_long)
  4011EB call sub_401154 (chksum)

type (sub_401120)
  401140 call sub_401050 (put_byte)
  40114D call sub_401050 (put_byte)

fill_data (sub_401167)
  40117A call sub_401050 (put_byte)

chksum (sub_401154)
  401160 call sub_401075 (put_long)
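
As an illustration of how such an HFSM could be turned into a regular expression over sized symbols, the sketch below encodes each procedure as a small expression tree and expands calls in line. The tree shapes are one reading of the control flow in the disassembly (type writes a byte on either of two branches or nothing, fill_data writes bytes in a loop), the writes call is given size 4 from its argument, and all helper names (`seq`, `alt`, `star`, `call`, `out`, `to_regex`) are invented for the example.

```python
def seq(*xs):
    return ("seq", xs)


def alt(*xs):
    return ("alt", xs)


def star(x):
    return ("star", x)


def call(name):
    return ("call", name)


def out(size):
    return ("out", size)   # one output operation writing `size` bytes


EMPTY = ("empty",)          # a path that writes nothing

PROCS = {
    "main": seq(out(4), out(4), out(4), call("type"),
                call("fill_data"), out(4), call("chksum")),
    "type": alt(out(1), out(1), EMPTY),
    "fill_data": star(out(1)),
    "chksum": out(4),
}


def to_regex(node, procs):
    """Render a tree as text, replacing each call by the callee's body."""
    kind = node[0]
    if kind == "out":
        return f"[{node[1]}]"
    if kind == "empty":
        return "<empty>"
    if kind == "call":
        return to_regex(procs[node[1]], procs)
    if kind == "seq":
        return "".join(to_regex(x, procs) for x in node[1])
    if kind == "alt":
        return "(" + "|".join(to_regex(x, procs) for x in node[1]) + ")"
    if kind == "star":
        return "(" + to_regex(node[1], procs) + ")*"
    raise ValueError(kind)
```

Here `[4]` stands for a four-byte symbol and `[1]` for a one-byte symbol, which is the sense in which the alphabet symbols have varying sizes.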

Page 16: HFSM for gzip

[Figure: the HFSM recovered for gzip, one FSM per procedure; entry nodes shown include 4051b4, 404366, 404145, 403d20, 40510c, 4059c8, 408281, 4057a5 and 404f0e, and many of the call nodes are "call 4056df".]

- 12 FSMs
- 64 nodes
- 36 call-sites

Page 17: A fragment of the call graph of gzip


Page 19: Regular Expression for gzip

[Figure: the regular expression for gzip as a sequence of (size, value) fields: (1, 0x1F), (1, 0x8B), (1, 0x08), (1, Top), (Top, Top), (Top, Top), (4, Top), (4, Top), (4, Top), (1, Top), (1, Top); two elements are marked with *.]

If the HFSM is too complicated and there is no recursion, in-line expand it to create a regular expression.
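
The "no recursion" precondition amounts to checking that the call graph has no cycle. A minimal sketch of such a check, using a depth-first search with three colours, is given below; `has_recursion` and the example graphs are hypothetical, and the slides do not say how FFE performs this test.

```python
def has_recursion(calls):
    """calls maps each procedure name to the list of procedures it calls."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {p: WHITE for p in calls}

    def visit(p):
        color[p] = GREY
        for q in calls.get(p, ()):
            state = color.get(q, WHITE)
            if state == GREY:      # q is already on the current call path
                return True
            if state == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in list(calls))
```

For the running example, `has_recursion({"main": ["type", "fill_data", "chksum"], "type": [], "fill_data": [], "chksum": []})` is false, so in-line expansion terminates.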

Page 20: Augmenting an HFSM with VSA and ASI information

[Figure: Organization of CodeSurfer/x86. IDA Pro disassembles the executable and builds CFGs; a connector feeds the CodeSurfer back-end, which includes VSA* and ASI*; the File Format Extractor (FFE/x86) sits alongside CodeSurfer/x86.]

* VSA (Value Set Analysis)
A combined numeric-analysis and pointer-analysis algorithm that determines an over-approximation of the set of numeric values and addresses that each abstract memory location holds at each program point. (G. Balakrishnan and T. Reps, “Analyzing memory accesses in x86 executables”, CC04)

* ASI (Aggregate Structure Identification)
A unification-based, flow-insensitive algorithm to identify a program’s arrays and structs. (G. Ramalingam et al., “Aggregate structure identification and its application to program analysis”, POPL99) (G. Balakrishnan and T. Reps, “Recovery of variables and heap structure in x86 executables”, TR-1533, Comp. Sci. Dept., UW-Madison, 2005)

Page 21: Value Set Analysis (VSA)

Output function
  void put_long(int n) {
    put_short(n&0xffff);
    put_short((ulong)n >> 16);
  }

Output operation
  push 12345678h
  call put_long

[Figure: the stack at the call; the four bytes at esp hold 78h, 56h, 34h, 12h.]
  size:4
  LookupVSA(esp-4x8, 4)=12345678h

Output function
  void writes(char* c, uint len) {
    for(int i=0; i<len; i++) {
      outbuf[outcnt++]=(uchar)(c[i]);
      if(outcnt==OUTBUFSIZE) flush_outbuf();
    }
  }

Output operation
  mov ebx, 1000
  ...
  push 4
  push ebx
  call writes

[Figure: memory at addresses 1000 to 1003 holds a, b, c, d; the stack at the call holds 1000 at esp and 4 above it.]
  size:4
  LookupVSA(*(esp-4*8))=“abcd”
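
The two lookups on this slide can be mimicked with a toy abstract store. In the sketch below a plain Python set stands in for a value set (real VSA uses a richer abstract domain), locations are written as ("esp", offset) pairs or plain addresses, and `lookup` and `annotate_writes` are hypothetical helpers, not the LookupVSA operation of CodeSurfer/x86.

```python
TOP = "Top"


def lookup(store, loc):
    """Return the single value known at loc, or TOP if it is not unique."""
    values = store.get(loc)
    if values is None or len(values) != 1:
        return TOP
    return next(iter(values))


def annotate_writes(store, memory):
    """Size and value for a call to writes(char* c, uint len).

    Assumes the pointer argument is at ("esp", 0) and the length at ("esp", 4).
    """
    ptr = lookup(store, ("esp", 0))
    length = lookup(store, ("esp", 4))
    if ptr == TOP or length == TOP:
        return length, TOP
    data = []
    for addr in range(ptr, ptr + length):
        byte = lookup(memory, addr)
        if byte == TOP:
            return length, TOP
        data.append(byte)
    return length, bytes(data)
```

With the stack and memory contents from the figure, the call to writes is annotated with size 4 and value "abcd"; if the pointer could take two different values, the value degrades to Top while the size stays 4.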

Page 26: Before / After

[Figure: the same extracted structure shown twice. Before: each field has a size (4, 2, 1 or Top) and an unknown value (?). After: each value is either a constant (for example size:4 value:40, size:2 value:1, size:1 value:0) or Top. Several groups of fields are marked with *.]

Page 27: Aggregate Structure Identification (ASI)

ASI output:

[14][14] call sendto call sendto...

Page 28: Experiments

• gzip
  – GNU data-compression program
• png2ico
  – converts PNG files to Windows icon-resource files
• ping
  – sends ICMP ECHO_REQUEST packets to a host to see if the host is reachable via the network

Page 29: gzip

[Figure: the structure extracted for gzip as (size, value) fields: (1, 0x1F), (1, 0x8B), (1, 0x08), (1, Top), (Top, Top), (Top, Top), (4, Top), (4, Top), (4, Top), (1, Top), (1, Top); two elements are marked with *.]

Page 30: png2ico (1)

• Usage scenario
  – Find inconsistencies between specifications and implementations

Page 31: png2ico (2)

[Figure: the structure extracted for png2ico as (size, value) fields, with several groups marked with *. Fields shown: (4, 40), (4, Top), (4, 0), (4, Top), (4, Top), (2, 1), (2, Top), (4, 0), (4, Top), (4, 0), (4, 0), (4, 0); (Top, Top), (1, 0), (Top, Top); (2, 0), (2, 1), (2, Top), (1, Top), (1, Top), (1, Top), (1, 0), (2, 0), (4, Top), (4, Top), (2, Top).]

Page 32: png2ico (2)

[Figure: the structure extracted for png2ico, as on page 31, with one field flagged "bug?".]

Page 33: png2ico (3)

• We found an inconsistency between the file-format specification for Windows icons and the converter png2ico
  – png2ico → regular exp. → flex spec. → recognizer
  – Windows icon files → recognizer → failure!
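
The validation step can be approximated in a few lines by compiling a list of (size, value) fields into a byte-level regular expression. In this sketch Python's `re` module stands in for the flex-generated recognizer used in the talk; the three fields are taken from the png2ico figure and assumed to be consecutive, little-endian encoding is an assumption, and `field_regex` and `build_recognizer` are hypothetical names.

```python
import re

TOP = None  # stands for "Top": any value


def field_regex(size, value):
    """Byte pattern for one fixed-size field."""
    if value is TOP:
        return b".{%d}" % size                  # any `size` bytes
    # Little-endian encoding is an assumption of this sketch.
    return re.escape(value.to_bytes(size, "little"))


def build_recognizer(fields):
    """Compile a sequence of (size, value) fields into one pattern."""
    pattern = b"".join(field_regex(s, v) for s, v in fields)
    return re.compile(pattern, re.DOTALL)       # DOTALL: "." also matches 0x0A


# Three fields from the figure: size:2 value:0, size:2 value:1, size:2 value:Top.
recognizer = build_recognizer([(2, 0), (2, 1), (2, TOP)])
```

`recognizer.fullmatch(data)` returns a match object for accepted inputs and `None` otherwise, which plays the role of the success/failure outcome on the slide.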

Page 34: png2ico (4)

[Figure: the structure extracted for png2ico, as on page 31, shown with the source line responsible for one of the constant fields.]

writeWord(outfile,0); // wPlanes

Page 35: ping (1)

[Figure: the HFSM for ping, showing FSMs for main and catcher (each with entry and exit nodes) whose nodes call pinger and catcher, together with an expression built from repeated pinger steps (marked with * and ?).]

The HFSM gives a hint about the behavior of ping.

Page 36: ping (2)

typedef struct icmp {
  uint8 icmp_type;       /* type of message, see below */
  uint8 icmp_code;       /* type sub code */
  uint16 icmp_checksum;  /* ones complement cksum of struct */
#define icmp_cksum icmp_checksum
  union {
    uint8 ih_pptr;              /* ICMP_PARAMPROB */
    struct in_addr ih_gwaddr;   /* ICMP_REDIRECT */
    struct ih_idseq {
      uint16 icd_id;
      uint16 icd_seq;
    } ih_idseq;
    int ih_void;
    /* ICMP_UNREACH_NEEDFRAG – Path MTU Discovery (RFC1191) */
    struct ih_pmtu {
      uint16 ipm_void;
      uint16 ipm_nextmtu;
    } ih_pmtu;
    struct ih_rtradv {
      uint8 irt_num_addrs;
      uint8 irt_wpa;
      uint16 irt_lifetime;
    } ih_rtradv;
  } icmp_hun;
#define icmp_pptr icmp_hun.ih_pptr
  ...
  union {
    struct id_ts {
      uint32 its_otime;
      uint32 its_rtime;
      uint32 its_ttime;
    } id_ts;
    struct id_ip {
      struct ip idi_ip;
      /* options and then 64 bits of data */
    } id_ip;
    struct icmp_ra_addr id_radv;
    uint32 id_mask;
    char id_data[1];
  } icmp_dun;
#define icmp_otime icmp_dun.id_ts.its_otime
  ...
} icmp_t;

[Figure: the expression over pinger steps from page 35, with the structure extracted for a pinger packet as (size, value) fields: (1, Top), (1, Top), (2, Top), (2, Top), (2, Top).]
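
The five extracted field sizes (1, 1, 2, 2, 2) line up with the leading members of the struct when the ih_idseq arm of the first union is in use: icmp_type, icmp_code, icmp_checksum, icd_id, icd_seq. The sketch below packs such a header with Python's `struct` module to make the sizes concrete; it is an illustration only, the helper name is hypothetical, and the checksum is left as a caller-supplied placeholder, not computed.

```python
import struct

# type (1 byte), code (1), checksum (2), identifier (2), sequence number (2),
# in network byte order.
ICMP_ECHO_HEADER = struct.Struct("!BBHHH")

ICMP_ECHO_REQUEST = 8  # ICMP type of an echo request; its code is 0


def echo_request_header(ident, seq, checksum=0):
    """Pack an echo-request header; the checksum is not computed here."""
    return ICMP_ECHO_HEADER.pack(ICMP_ECHO_REQUEST, 0, checksum, ident, seq)
```

The packed header is 8 bytes long, the sum of the five field sizes in the figure.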

Page 37: Conclusion

- A technique for extracting an over-approximation of a program’s output data format, including
  - a way to extract a preliminary structure for the output data format
  - a way to elaborate the structure by annotating it with information about possible output values and sizes

Page 38: Over-Approximation?

• Yes, modulo . . .
  – All operations must append to the output
    • No tracking of file-pointer rewind, seek, . . .
  – Multiple different formats in a program
  – Signals and exceptions ignored
    • In principle, could use the same technique used in the MOPS tool

Page 39: Possible Future Work

- Automatic detection of output functions
- Other operation sequences → other formats
  – Input operations
  – Network-communication operations
- Adoption of a learning technique for refining output formats

Page 40: Thank you!

Clarifications?

Page 42: Identifying Output Operations

• IDAPro disassembler identifies library output procedures

• Typically, inspect the call graph to choose which application procedures should be considered output wrappers
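
A simple aid for that inspection is to list the application procedures that call a library output procedure directly, since those are natural wrapper candidates. The sketch below shows the idea on a made-up call graph; `wrapper_candidates`, the graph, and the use of fwrite as the library procedure are all assumptions for illustration, and the final choice is still left to the user.

```python
def wrapper_candidates(call_graph, library_outputs):
    """Procedures with at least one direct call to a library output procedure."""
    outputs = set(library_outputs)
    return sorted(
        proc for proc, callees in call_graph.items()
        if outputs & set(callees)
    )


# Hypothetical call graph: put_byte wraps fwrite, put_long is built on put_byte.
EXAMPLE_GRAPH = {
    "main": ["put_long", "fill_data"],
    "fill_data": ["put_byte"],
    "put_long": ["put_byte"],
    "put_byte": ["fwrite"],
}
```

Here `wrapper_candidates(EXAMPLE_GRAPH, {"fwrite"})` reports only put_byte; whether to also treat put_long as an output function is a judgment made by looking at the call graph.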