ByteWeight: Learning to Recognize Functions in Binary Code
Tiffany Bao, Jonathan Burket, Maverick Woo, Rafael Turner, David Brumley
Carnegie Mellon University
USENIX Security ’14
Binary Analysis
Decompiler, Control Flow Integrity (CFI), Binary Reuse, Malware Analysis, Vulnerability Signature Generation, …
[Figure: a raw binary bit stream annotated with function information (Function 1, Function 2, Function 3)]
Can we automatically and accurately recover function information from stripped binaries?
Example: GCC
Source Code

    #include <stdio.h>
    int fac(int x){
        if (x == 1)
            return 1;
        else
            return x * fac(x - 1);
    }

    void main(int argc, char **argv){
        printf("%d", fac(10));
    }
Example: GCC
-O0: Default

    08048443 <main>:
        push %ebp
        mov %esp,%ebp
        and $0xfffffff0,%esp
        sub $0x10,%esp
        …

    0804841c <fac>:
        push %ebp
        mov %esp,%ebp
        sub $0x18,%esp
        cmpl $0x1,0x8(%ebp)
        jne 804842f <fac+0x13>
        mov $0x1,%eax
        …
Example: GCC
-O1: Optimize
-O2: Optimize Even More

    08048330 <main>:
        mov $0x1,%edx
        mov $0xa,%eax
        lea 0x0(%esi),%esi
        …
        push %ebp
        mov %esp,%ebp
        and $0xfffffff0,%esp
        sub $0x10,%esp
        …

    0804841c <fac>:
        push %ebx
        sub $0x18,%esp
        mov 0x20(%esp),%ebx
        mov $0x1,%eax
        cmp $0x1,%ebx
        …
Current Industry Solution: IDA

    #include<stdio.h>
    #include<string.h>
    #define MAX 128

    void sum(char a[MAX], char b[MAX]){
        printf("%s + %s = %d\n", a, b, atoi(a) + atoi(b));
    }

    void sub(char a[MAX], char b[MAX]){
        printf("%s - %s = %d\n", a, b, atoi(a) - atoi(b));
    }

    void assign(char a[MAX], char b[MAX]){
        char pre_b[MAX];
        strcpy(pre_b, b);
        strcpy(b, a);
        printf("b is changed from %s to %s\n", pre_b, b);
    }

    int main(){
        void (*funcs[3]) (char x[MAX], char y[MAX]);
        int f;
        char a[MAX], b[MAX];
        funcs[0] = sum;
        funcs[1] = sub;
        funcs[2] = assign;
        scanf("%d %s %s", &f, &a, &b);
        (*funcs[f])(a, b);
        return 0;
    }

IDA Misses (the slide marks three functions in this listing with this label)
Function Identification Problems
Given a stripped binary, return
1. A list of function start addresses – “Function Start Identification (FSI) Problem”
2. A list of function (start, end) pairs – “Function Boundary Identification (FBI) Problem”
3. A list of functions as sets of instruction addresses – “Function Identification (FI) Problem”
ByteWeight
A machine learning + program analysis approach to function identification
Training:
1. Create a model of function start patterns using supervised machine learning
Usage:
1. Use the trained model to match function starts in stripped binaries (Function Start Identification)
2. Use program analysis to identify all bytes associated with a function (Function Identification)
3. Calculate the minimum and maximum addresses of each function (Function Boundary Identification)
Function Start Identification
1. Previous approaches
2. Our approach
Previous Work: Rosenblum et al. [1]
Method: Select instruction idioms up to length 4; learn idiom parameters; label test binaries
Entry Idiom: push ebp | * | mov esp,ebp
Prefix Idiom: ret | int3
[Figure: byte positions b1 b2 b3 b4 b5 b6 b7 b8; value shown: 0.84]
“Feature (idiom) selection for all three data sets (1,171 binaries) consumed over 150 compute-days of machine computation”
[1] N. E. Rosenblum, X. Zhu, B. P. Miller, and K. Hunt. Learning to Analyze Binary Computer Code. In Proceedings of the 23rd National Conference on Artificial Intelligence (2008), AAAI, pp. 798–804.
ByteWeight: Lighter (Linear) Method
[Figure: pipeline diagram. Training: Training Binaries → Extraction → Extracted Sequences → Weight Calculation → Weighted Sequences → Tree Generation → Weighted Prefix Tree. Function identification (stage labels as shown): Classification, CFG Recovery, RFCR, Function Boundary Identification. Data labels: Testing Binary, Function Start, Function Bytes, Function Boundary.]
Step 1: Extract All ≤ K-length Sequences

    0000000100000e3b <func_1>:
    55                      push %rbp
    48 89 e5                mov %rsp,%rbp
    48 83 ec 10             sub $0x10,%rsp
    89 7d fc                mov %edi,-0x4(%rbp)
    89 75 f8                mov %esi,-0x8(%rbp)
    8b 55 f8                mov -0x8(%rbp),%edx
    8b 45 fc                mov -0x4(%rbp),%eax
    89 c6                   mov %eax,%esi
    48 8d 3d c0 00 00 00    lea 0xc0(%rip),%rdi
    b8 00 00 00 00          mov $0x0,%eax
    e8 86 00 00 00          callq 100000ee8
    c9                      leaveq
    c3                      retq

Bytes
• 55
• 55 48
• 55 48 89
• 55 48 89 e5
• …

Instructions
• push %rbp
• push %rbp; mov %rsp,%rbp
• push %rbp; mov %rsp,%rbp; sub $0x10,%rsp
• push %rbp; mov %rsp,%rbp; sub $0x10,%rsp; mov %edi,-0x4(%rbp)
• …
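The byte-level variant of Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `extract_prefixes` and the choice K = 4 are assumptions made for the example, and the bytes are those of func_1 from the slide.

```python
# Bytes of func_1, copied from the slide's listing.
FUNC_1 = bytes.fromhex(
    "55 48 89 e5 48 83 ec 10 89 7d fc 89 75 f8 8b 55 f8 8b 45 fc "
    "89 c6 48 8d 3d c0 00 00 00 b8 00 00 00 00 e8 86 00 00 00 c9 c3"
)


def extract_prefixes(code, k):
    """Return every prefix of `code` of length 1..k (hypothetical helper)."""
    return [code[:n] for n in range(1, min(k, len(code)) + 1)]


# K = 4 is an arbitrary value chosen for this example.
for seq in extract_prefixes(FUNC_1, 4):
    print(seq.hex(" "))
```

The instruction-level variant would take prefixes of the disassembled instruction list instead of the raw bytes.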
Step 2: Weight Sequences

    0000000100000e3b <_func_1>:
    55                      push %rbp
    48 89 e5                mov %rsp,%rbp
    48 83 ec 10             sub $0x10,%rsp
    89 7d fc                mov %edi,-0x4(%rbp)
    89 75 f8                mov %esi,-0x8(%rbp)
    8b 55 f8                mov -0x8(%rbp),%edx
    8b 45 fc                mov -0x4(%rbp),%eax
    89 c6                   mov %eax,%esi
    48 8d 3d c0 00 00 00    lea 0xc0(%rip),%rdi
    b8 00 00 00 00          mov $0x0,%eax
    e8 86 00 00 00          callq 100000ee8
    c9                      leaveq
    c3                      retq

    0000000100000e64 <_func_2>:
    55                      push %rbp
    48 89 e5                mov %rsp,%rbp
    48 83 ec 16             sub $0x16,%rsp
    89 7d fc                mov %edi,-0x4(%rbp)
    89 75 f8                mov %esi,-0x8(%rbp)
    8b 55 f8                mov -0x8(%rbp),%edx
    8b 45 fc                mov -0x4(%rbp),%eax
    89 c6                   mov %eax,%esi
    48 8d 3d a6 00 00 00    lea 0xa6(%rip),%rdi
    b8 00 00 00 00          mov $0x0,%eax
    e8 5d 00 00 00          callq 100000ee8
    c9                      leaveq
    c3                      retq

push %rbp → 55
score: 2 / (2 + 2) = 0.5 (byte 55 begins both functions and also appears once inside each, in 8b 55 f8)
Step 2: Weight Sequences (same two functions)
push %rbp; mov %rsp,%rbp → 55 48 89 e5
score: 2 / (2 + 0) = 1.0
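The two scores can be reproduced with a short sketch. It assumes the training data consists only of the two functions laid out back to back, that a sequence is matched at every byte offset, and that the score is (matches at function starts) / (all matches); the function name `weight` is hypothetical.

```python
# Bytes of the two training functions, copied from the slide.
FUNC_1 = bytes.fromhex(
    "55 48 89 e5 48 83 ec 10 89 7d fc 89 75 f8 8b 55 f8 8b 45 fc "
    "89 c6 48 8d 3d c0 00 00 00 b8 00 00 00 00 e8 86 00 00 00 c9 c3"
)
FUNC_2 = bytes.fromhex(
    "55 48 89 e5 48 83 ec 16 89 7d fc 89 75 f8 8b 55 f8 8b 45 fc "
    "89 c6 48 8d 3d a6 00 00 00 b8 00 00 00 00 e8 5d 00 00 00 c9 c3"
)
TRAINING = FUNC_1 + FUNC_2
STARTS = {0, len(FUNC_1)}  # offsets of the known function starts


def weight(seq, code, starts):
    """Fraction of the occurrences of `seq` in `code` that sit at a function start."""
    pos = neg = 0
    for off in range(len(code) - len(seq) + 1):
        if code[off:off + len(seq)] == seq:
            if off in starts:
                pos += 1
            else:
                neg += 1
    return pos / (pos + neg) if pos + neg else 0.0


print(weight(bytes.fromhex("55"), TRAINING, STARTS))           # 2 / (2 + 2)
print(weight(bytes.fromhex("55 48 89 e5"), TRAINING, STARTS))  # 2 / (2 + 0)
```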
Step 3: Generate Weighted Prefix Tree
push %rbp → 2/(2+2) = 0.5
push %rbp; mov %rsp,%rbp → 2/(2+0) = 1.0
...
[Figure: weighted prefix tree]
push %rbp (55): 2 / (2 + 2) = 0.5
    mov %rsp,%rbp (48 89 e5): 2 / (2 + 0) = 1.0
        sub $0x10,%rsp (48 83 ec 10): 1 / (1 + 0) = 1.0
            …
        sub $0x16,%rsp (48 83 ec 16): 1 / (1 + 0) = 1.0
    …
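One way to picture the tree generation step is a nested-dictionary trie keyed by instruction bytes. This is a sketch under assumptions: the node layout (`weight` and `children` keys) and the name `build_tree` are invented for illustration, and the input weights are the four values shown on the slide.

```python
# Weighted sequences from the slide; each key is a tuple of instruction encodings.
WEIGHTED = {
    ("55",): 0.5,
    ("55", "48 89 e5"): 1.0,
    ("55", "48 89 e5", "48 83 ec 10"): 1.0,
    ("55", "48 89 e5", "48 83 ec 16"): 1.0,
}


def build_tree(weighted):
    """Insert every weighted sequence into a trie; each node stores its weight."""
    root = {"weight": None, "children": {}}
    for seq, w in weighted.items():
        node = root
        for token in seq:
            node = node["children"].setdefault(
                token, {"weight": None, "children": {}}
            )
        node["weight"] = w
    return root


def dump(node, depth=0):
    """Print the tree with one level of indentation per instruction."""
    for token, child in node["children"].items():
        print("    " * depth + f"{token}: {child['weight']}")
        dump(child, depth + 1)


tree = build_tree(WEIGHTED)
dump(tree)
```

Sharing prefixes this way means a lookup touches at most one node per instruction in the candidate sequence.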
Classification
[Figure: the weighted prefix tree (push %rbp (55); mov %rsp,%rbp (48 89 e5); sub $0x10,%rsp (48 83 ec 10); sub $0x16,%rsp (48 83 ec 16); …) matched against the test binary. Sequences labeled on the slide: 55; 55 48 89 e5; 55 48 83 ec 60. Weights shown: 0.0, 0.5, 1.0.]
Test Binary
00 00 00 00 e8 5d 00 00 00 c9 c3 55 48 89 e5 48 83 ec 60 48 8d 05 9f ff ff ff 48 89 45 b8 48 8d 05 bd ff ff ff 48 89 45 c0 48 8d 4d a8 48 8d 55 ac 48 8d 45 a4
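A rough sketch of the matching step: at each byte offset of the test binary, take the weight of the longest learned sequence that matches there, and report the offset as a function start if the weight exceeds a threshold. The flat lookup table stands in for walking the prefix tree, and `THRESHOLD = 0.5` with a strict comparison is an assumption; the slides do not state the cutoff.

```python
# Test binary bytes from the slide.
TEST = bytes.fromhex(
    "00 00 00 00 e8 5d 00 00 00 c9 c3 55 48 89 e5 48 83 ec 60 48 8d 05 9f "
    "ff ff ff 48 89 45 b8 48 8d 05 bd ff ff ff 48 89 45 c0 48 8d 4d a8 48 "
    "8d 55 ac 48 8d 45 a4"
)

# Tree nodes from Step 3, flattened to "byte prefix -> weight".
WEIGHTS = {
    bytes.fromhex("55"): 0.5,
    bytes.fromhex("55 48 89 e5"): 1.0,
    bytes.fromhex("55 48 89 e5 48 83 ec 10"): 1.0,
    bytes.fromhex("55 48 89 e5 48 83 ec 16"): 1.0,
}
MAX_LEN = max(len(seq) for seq in WEIGHTS)
THRESHOLD = 0.5  # assumed value, not taken from the slides


def score(code, off):
    """Weight of the longest table entry that is a prefix of code[off:]."""
    best = 0.0
    for n in range(1, MAX_LEN + 1):
        w = WEIGHTS.get(code[off:off + n])
        if w is not None:
            best = w  # a longer match overrides a shorter one
    return best


starts = [off for off in range(len(TEST)) if score(TEST, off) > THRESHOLD]
print([hex(off) for off in starts])
```

Here the byte 55 inside `48 8d 55 ac` scores only 0.5, while the 55 48 89 e5 after `c9 c3` scores 1.0 even though the following `sub $0x60,%rsp` was never seen in training.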
Normalization (Optional)
[Figure: weighted prefix tree before and after normalization. Node labels read (matches at function starts, matches elsewhere, weight): 1 0 1.0; 4 6 0.4; 2 6 0.25; 2 0 1.0.]
Immediate operands are replaced by patterns:
sub $0x10,%rsp (48 83 ec 10), sub $0x16,%rsp (48 83 ec 16) → sub $0x[1-9a-f][0-9a-f]*,%rsp
jne 0x12345678 (0f 85 1c 01 00 00) → jne 0x[0-9a-f]*
Tree nodes shown: push %rbp (55); mov %rsp,%rbp (48 89 e5); sub $0x10,%rsp (48 83 ec 10); sub $0x16,%rsp (48 83 ec 16); …
Test binary:
00 00 00 00 e8 5d 00 00 00 c9 c3 55 48 89 e5 48 83 ec 60 48 8d 05 9f ff ff ff 48 89 45 b8 48 8d 05 bd ff ff ff 48 89 45 c0 48 8d 4d a8 48 8d 55 ac 48 8d 45 a4
Matched sequences: 55 48 89 e5; 48 83 ec 60 (sub $0x60,%rsp)
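The effect of normalization can be illustrated with a small sketch that rewrites disassembled instructions to the two patterns shown on the slide. The rule table and the helper `normalize` are assumptions for illustration; a real normalizer would cover many more instruction forms.

```python
import re
from collections import Counter

# (compiled regex, label) pairs; the labels copy the patterns on the slide.
RULES = [
    (re.compile(r"sub \$0x[1-9a-f][0-9a-f]*,%rsp"), "sub $0x[1-9a-f][0-9a-f]*,%rsp"),
    (re.compile(r"jne 0x[0-9a-f]*"), "jne 0x[0-9a-f]*"),
]


def normalize(instr):
    """Map an instruction to its pattern label, or return it unchanged."""
    for pattern, label in RULES:
        if pattern.fullmatch(instr):
            return label
    return instr


# The two training instructions collapse into a single tree node ...
seen = Counter(normalize(i) for i in ["sub $0x10,%rsp", "sub $0x16,%rsp"])
print(seen)

# ... and the unseen test instruction now maps to that same node.
print(normalize("sub $0x60,%rsp") in seen)
```

Merging the two `sub` nodes gives a combined count of 2, which may be what the 2 0 1.0 label in the figure reflects.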
Function (Boundary) Identification
Identify all bytes associated with a function, and extract the lowest and highest addresses
ByteWeight: Function (Boundary) Identification
[Figure: ByteWeight pipeline diagram (training and function identification stages), including the stage labeled Control Flow Graph Recovery]
1. Recursive disassembly, using Value Set Analysis [2] to resolve indirect jumps.
2. Recursive Function Call Resolution (RFCR): add any call target as a function start.
[2] G. Balakrishnan. WYSINWYX: What You See Is Not What You Execute. PhD thesis, University of Wisconsin-Madison, 2007.
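The two steps above can be sketched on a toy program. Everything here is hypothetical: instructions are one address unit long, the program is a hand-written dictionary rather than real machine code, and indirect jumps (the part that needs Value Set Analysis) are not modeled at all.

```python
# Toy program: address -> (mnemonic, direct target or None).
PROGRAM = {
    0: ("push", None),
    1: ("call", 10),
    2: ("jne", 4),
    3: ("mov", None),
    4: ("ret", None),
    10: ("push", None),
    11: ("call", 20),
    12: ("ret", None),
    20: ("ret", None),
}


def recover(program, seed_starts):
    """Recursive disassembly from each start; call targets become new starts."""
    functions = {}
    pending = list(seed_starts)
    while pending:
        start = pending.pop()
        if start in functions or start not in program:
            continue
        body, work = set(), [start]
        while work:
            addr = work.pop()
            if addr in body or addr not in program:
                continue
            body.add(addr)
            op, target = program[addr]
            if op == "ret":
                continue                         # end of this path
            if op == "call":
                pending.append(target)           # RFCR: call target is a function start
                work.append(addr + 1)            # execution resumes after the call
            elif op == "jmp":
                work.append(target)              # unconditional: target only
            elif op == "jne":
                work.extend([target, addr + 1])  # conditional: both successors
            else:
                work.append(addr + 1)            # plain fall-through
        functions[start] = body
    return functions


functions = recover(PROGRAM, [0])
for start, body in sorted(functions.items()):
    print(start, sorted(body))
```

Seeding with only address 0 still finds the functions at 10 and 20, because each is reached through a call.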
ByteWeight: Function (Boundary) Identification
[Figure: pipeline diagram, with Control Flow Graph Recovery producing the function bytes]
Example, function F1:
Function bytes: instr1, instr2, instr3, instr6, instr10, instr12, …, instr100, instr101.
Function boundary: (instr1, instr101)
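The F1 example reduces to taking the minimum and maximum of the recovered set. In this sketch the instruction indices from the slide stand in for addresses, and the helper name `boundary` is hypothetical; whether the upper bound should be the start or the last byte of the final instruction is not stated on the slide, so the sketch simply uses the largest element.

```python
# Instruction indices of F1 as listed on the slide (the elided ones are omitted).
F1 = {1, 2, 3, 6, 10, 12, 100, 101}


def boundary(instr_addrs):
    """Return the (lowest, highest) address in a function's instruction set."""
    return (min(instr_addrs), max(instr_addrs))


print(boundary(F1))
```

Note that the boundary covers addresses the function does not own (for example 4 and 5), which is why function identification and function boundary identification are treated as separate problems.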
Experiment Results
Compilers: GCC, ICC, and MSVS
Platforms: Linux and Windows
Optimizations: O0 (Od), O1, O2, and O3 (Ox)
Training Performance
ByteWeight:
– 10-fold cross-validation, 2200 binaries
– 6.1 days to train from all platforms and all compilers, including logging
Rosenblum et al.:
– ??? (They reported 150 compute days for one step of training, but did not report total time or make their training implementation available.)
  • training data and code both unavailable
Precision and Recall
$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$
[Figure: overlapping sets Tool and Truth; the overlap is TP, the Tool-only region is FP, the Truth-only region is FN]
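For function start identification, both metrics follow directly from two sets of addresses. This is a minimal sketch; the addresses are made up for the example and the function name `precision_recall` is hypothetical.

```python
def precision_recall(tool, truth):
    """Compute (precision, recall) from the tool's output and the ground truth."""
    tp = len(tool & truth)   # reported and real
    fp = len(tool - truth)   # reported but not real
    fn = len(truth - tool)   # real but not reported
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical function start addresses.
tool = {0x10, 0x20, 0x30, 0x40}
truth = {0x10, 0x20, 0x50}
precision, recall = precision_recall(tool, truth)
print(precision, recall)
```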
Function Start Identification: Comparison with Rosenblum et al.
[Figure: two bar charts, Precision and Recall (axis 0 to 1), for GCC and ICC binaries. Series: Rosenblum et al., ByteWeight (K=3), ByteWeight (no norm), ByteWeight.]
Function Start Identification: Existing Binary Analysis Tools
[Figure: two bar charts, Precision and Recall (axis 0 to 1), for ELF x86, ELF x86-64, PE x86, and PE x86-64 binaries. Series: Naïve, Dyninst, BAP, IDA, ByteWeight (no RFCR), ByteWeight.]
Function Boundary Identification: Existing Binary Analysis Tools
[Figure: two bar charts, Precision and Recall (axis 0 to 1), for ELF x86, ELF x86-64, PE x86, and PE x86-64 binaries. Series: Naïve, Dyninst, BAP, IDA, ByteWeight (no RFCR), ByteWeight.]
Summary: ByteWeight
Machine-learning based approach
– Creates a model of function start patterns using supervised machine learning
– Matches model on new samples
– Uses program analysis to identify all bytes associated with a function
– Faster and more accurate than previous work
Thank You
Our experiment VM is available at: http://security.ece.cmu.edu/byteweight/
Tiffany Bao