Similarity of Binaries through re-Optimization
ByYaniv David, Nimrod Partush & Eran Yahav
Technion, Israel
Motivation
2
Developer
httpdOpenSSL
Motivation
3
SecurityResearcher
?
?
?
?
OpenSSL
Problem Definition
4
mov x0, x20mov x20, 3add x0, x0, 1sub x21, x21, x0cmn x21, 2
...
Procedure ππ
lea r15, [rax+1]sub r13, r15xor rax, raxadd rax, 3cmp r13, -2
...
Procedure ππ
mov edi, 10mov esi, 489mov edx, 0
...
Procedure ππ
mov rbp, rdimov r12d, esicmp dword ptr [rbp+64], 0jz short loc_143E
...
Procedure ππ
push r12push rbxpush rbpsub rsp, 10
...
Procedure π|π»|β¦
Corpus π»
lea r15, [rax+1]sub r13, r15xor rax, raxadd rax, 3cmp r13, -2
...
Procedure π
π» β₯ 106
Procedure ππ
gcc 4.8βO0
icc 15.0.3βO3
CLang 3.4βOs
CLang4
βO1
icc 15.0.3βO3
gcc 4.6βOs
Challenge
lea r15, [rax+1]sub r13, r15xor rax, raxadd rax, 3cmp r13, -2
mov x0, x20mov x20, 3add x0, x0, 1sub x21, x21, x0cmn x21, 2
5
OpenSSLβsdtls1_buffer_message()
icc 15.0.3 βO3
gcc 4.8 βO0
Our ApproachFor finding Similarity of Binaries
6
Our Approach: What
7
push r12push rbxpush rbp
...
lea r15, [rax+1]sub r13, r15xor rax, raxadd rax, 3cmp r13, -2
Query π: dtls1_buffer_message()Compiler: icc 15.0.3 βO3Architecture:
Query ππ: dtls1_buffer_message()Compiler: gcc 4.8 βO0Architecture:
mov x0, x20mov x20, 3add x0, x0, 1sub x21, x21, x0cmn x21, 2
...
=...
πΉπππ ππππ’π
...
...
Our Approach: What
8
push r12push rbxpush rbp
...
Query ππ: dtls1_buffer_message()Compiler: gcc 4.8 βO0Architecture:
mov x0, x20mov x20, 3add x0, x0, 1sub x21, x21, x0cmn x21, 2
...
...
πΉπππ ππππ’π
...
...
lea r15, [rax+1]sub r13, r15xor rax, raxadd rax, 3cmp r13, -2
Query π: dtls1_buffer_message()Compiler: icc 15.0.3 βO3Architecture:
Our Approach: What
9
push r12push rbxpush rbp
...
Procedure ππ: unrelated()
Compiler: icc 15.0.3 βO3Architecture:
push r12push rbxpush rbp
...
...
=...
lea r15, [rax+1]sub r13, r15xor rax, raxadd rax, 3cmp r13, -2
...
πΉπππ ππππ’π
Query π: dtls1_buffer_message()Compiler: icc 15.0.3 βO3Architecture:
Our Approach: How
β’ Decompose procedure to fragments β’ Transform fragments to canonical form
β’ Count shared fragments while weighing in their statistical significance
10
tmp0 = register0 + 1register1 = tmp0tmp1 = register2 β tmp0register2 = tmp1tmp2 = tmp1 - 2
mov x0, x20add x0, x0, 1sub x21, x21, x0cmn x21, 2
lea r15, [rax+1]sub r13, r15cmp r13, -2
tmp0 = register0 + 1register1 = tmp0tmp1 = register2 β tmp0register2 = tmp1tmp2 = tmp1 - 2
=
Decomposing Assembly Procedures
β’ The procedure is broken at basic block levelβ’ Block ordering is ignored
11
push r12push rbxpush rbpsub rsp, 10
...
mov edi, 10...
lea r15, [rax+1]sub r13, r15
...
...
πΉπππ ππππ’π
β’ We use slicing to break basic blocks into separate data-independent computations:
mov x0, x20mov x20, 3add x0, x0, 1sub x21, x21, x0cmn x21, 2
Slice
Slicing Basic Blocks
mov x0, x20add x0, x0, 1sub x21, x21, x0cmn x21, 2
slice 1
mov x20, 3 slice 2
12
β’ βOut-of-contextβ re-optimization
Optimize
Moving to Canonical Form (re-Optimizing)
13
Lift
mov x0, x20add x0, x0, 1sub x21, x21, x0cmn x21, 2
t0 = load x20store t0, x0t1 = load x0t2 = 1t3 = add t1, t2store t3, x0
...
LLVM
t0 = load x20t1 = add t0, 1store t1, x0
...
t0 = load r0t1 = add t0, 1store t1, r1
...
Normalize
Procedure Representation
14
dtls1_buffer_message()
{
,
, β¦ }
R(dtls1_buffer_message)=
t0 = load r0t1 = add t0, 1store t1, r1t2 = load r2t3 = sub t2, t1store t3, r2
push r12push rbxpush rbpsub rsp, 10
...
mov edi, 10...
lea r15, [rax+1]sub r13, r15
...
...
πΉπππ ππππ’π
t0 = load r0t1 = sub t0, 10t2 = load r1t3 = add t2, 3t3 = mul t3, t1store t3, r2
Computing Similarity
β’Reminder:
πππππππππ‘π¦(π, π‘) = |π π β© π π‘ |
15
push r12push rbxpush rbpsub rsp, 10
dtls1_buffer_message(), icc 15.0.3, βO3
unrelated(), icc 15.0.3, βO3
push r12push rbxpush rbpsub rsp, 10
=
Statistical Significance
β’ Given a canonical fragment s, we need to determine its significance.
β’ We estimate π with a bound, random sample of procedures π βa βGlobal Contextβ
Prπ(π ) =
| π β π π β π (π) }|
|π|Set of all
procedures, in existence
π
ππ
16
Computing Similarity
πππππππππ‘π¦(π, π‘) = π β
π π β©π π‘
1
πππ(π )
A sum ranging over the shared canonical
fragments
Of the inversed probability of each
fragment
17
A common fragment with π·ππ·(π) =
π
πwill
contribute 5
A distinctive fragment with π·π
π·(π) = π. πππ
will contribute 1000
EvaluationPrototype Implementation: GitZ
18
1. Procedure πππSimilarity: 170.34
2. Procedure πππSimilarity: 168.91
3. Procedure ππππSimilarity: 130.41
4. Procedure ππππ,πππSimilarity: 101.11
5. Procedure πππ,πππSimilarity: 13.19
β¦
GitZ Output
19
SecurityResearcher
Procedure π: OpenSSLβs dtls1_buffer_message()
Evaluation Corpus
β’ Real-world code packagesβ’ OpenSSL, , , , ,QEMU, wget, ffmpeg, Coreutilsβ’ Containing ~11,000 procedures
β’ Compiled to:β’ x86_64 with CLang 3.{4,5}, gcc 4.{6,8,9} and icc {14,15}
β’ ARM-64 with aarch64-gcc 4.8 and aarch64-Clang 4
β’ Optimization levels -O{0,1,2,3,s}
β’ Corpus size: π = 45 β ~11,000 = ~500,000
β’ Queries: 9 procedures from notable CVEs
20
x 9
x 5
1. Procedure πππSimilarity: 170.34
2. Procedure πππSimilarity: 168.91
3. Procedure ππππSimilarity: 130.41
4. Procedure ππππ,πππSimilarity: 101.11
5. Procedure πππ,πππSimilarity: 13.19
β¦
500,000. Procedure πππSimilarity: 0.0
GitZ Accuracy
β’ Here, we report accuracy as the number of FPs ranked above the lowest TP
21
Negative
Positive
#ππ· = π
ππ·π =π
πππ, πππ
Positive
Positive
Negative
Negative
Negative
Negative
Negative
GitZ Usefulness
β’ How useful is GitZ in the vulnerability search scenario?
# Alias CVE #FPs FP Rate
1 Heartbleed 2014-0160 52 0.000104
2 Shellshock 2014-6271 0 0
3 Venom 2015-3456 0 0
4 Clobberin' 2014-9295 0 0
5 Shellshock #2 2014-7169 0 0
6 WS-snmp 2011-0444 0 0
7 wget 2014-4877 0 0
8 ffmpeg 2015-6826 0 0
9 WS-statx 2014-8710 0 0
22
Time
15m 17m 16m 16m 12m 14m 10m 17m 18m
β’ How well does GitZ scale? Over 500K corpus 72 Cores
0.12s for query-target pair, single
core
GitZ Scalability
Evaluating Solution Components
β’ How does each component of our solution affect the accuracy of GitZ?
23
Query Corpus Size |π| #Positives
Heartbleed 10000 45
Evaluating the Global Context
1
10
100
1 101 201 301 401 501 601 701 801 901
#FP
Size of π· (in # Procedures)24
Reminder: π·ππ·(π) =
| πβπ· πβ πΉ(π) }|
|π·|
Evaluating Solution VectorsFalse Positive Rate
25
Canonical w/GC
CanonNormLLVM
CanonLLVMw/ GC
CanonLLVM
NormLLVMw/ GC
NormLLVM
NormVEX w/
GC
NormVEX
CROC 0.9780.9270.8750.8510.9540.9080.9430.905
0
0.02
0.04
0.06
0.08
0.10
0.12
Evaluating Solution VectorsFalse Positive Rate
26
Canonical w/GC
CanonNormLLVM
CanonLLVMw/ GC
CanonLLVM
NormLLVMw/ GC
NormLLVM
LLVMw/ GC
LLVMFragme
ntsCROC 0.0220.0730.1250.1490.0460.0920.1220.153
0
0.02
0.04
0.06
0.08
0.10
0.12
Lifted (Only) LLVMFragments
Evaluating Solution Vectors
27
Canonical w/GC
CanonNormLLVM
CanonLLVMw/ GC
CanonLLVM
NormLLVMw/ GC
NormLLVM
LLVMw/ GC
LLVMFragme
ntsCROC 0.0220.0730.1250.1490.0460.0920.1220.153
LLVM
No Global Context
With Global Context
LLVM w/Global Context
False Positive Rate
0
0.02
0.04
0.06
0.08
0.10
0.12
Evaluating Solution Vectors
28
Canonical w/GC
CanonNormLLVM
CanonLLVMw/ GC
CanonLLVM
NormLLVMw/ GC
NormLLVM
LLVMw/ GC
LLVMFragme
ntsCROC 0.0220.0730.1250.1490.0460.0920.1220.153
No Global Context
With Global Context
LLVM LLVM w/ GC Normalized
LLVM Fragments
False Positive Rate
0
0.02
0.04
0.06
0.08
0.10
0.12
Evaluating Solution Vectors
30
Canonical w/GC
CanonNormLLVM
CanonLLVMw/ GC
CanonLLVM
NormLLVMw/ GC
NormLLVM
LLVMw/ GC
LLVMFragme
ntsCROC 0.0220.0730.1250.1490.0460.0920.1220.153
No Global Context
With Global Context
LLVM LLVM w/ GC
NormalizedLLVM Canonical
FragmentsOptimized
LLVM
False Positive Rate
0
0.02
0.04
0.06
0.08
0.10
0.12
Take Aways
β’A procedure can be identified using a set of statistically significant fragmentsβ’ The statistical data can be collected over a relatively
small set
β’Applying an optimizer βout-of-contextβ is useful at transforming fragments to canonical formβ’ A form that allows finding similarity
32
Questions
β’ Canonical Form: The Good
β’ Canonical Form: The Bad
β’ Previous Work
β’ BinDiff
β’ More Experiments!
β’ Youβre over-thinking this
β’ Out-of-Context?
β’ Limitations
β’ Evaluating Binary Classifiers
β’ All v. All
β’ Is Pr(s) a probability even?
β’ The Global Context
Canonical Form: The Good
34
mov rax, -2sub rbx, rax
add r12, 2
t0 = load r0t1 = add 2, t0store t1, r0
t0 = load r0t1 = add 2, t0store t1, r0
Canonical
Canonical
=
add rax, 1add rbx, 2add rbx, rax
add rbx, 2add rax, 1add rax, rbx
t0 = load r0t1 = load r1t2 = add t0, t1t3 = add t1, 3store t3, r1
t0 = load r0t1 = add 2, t0store t1, r0
Canonical
Canonical
=The Bad:
Canonical Form: The Bad
35
add rax, 1add rbx, 2
add rbx, rax
add rax, 1add rbx, 2
add rax, rbx
t0 = load r0t1 = load r1t2 = add t0, t1t3 = add t1, 3
store t3, r1
Canonical
Canonical
β t0 = load r0t1 = load r1t2 = add t0, t1t3 = add t1, 3
store t3, r0
Previous Work
36
BinDiff
The Global Context
β’Sampled over canonical fragments, from procedures of all archs\compilers\optimizationsβ’A new arch\compiler\optimization?
β’ We are somewhat future proof due to optimizationβ’ Even if a compiler decides to do things a bit different, it should
arrive at the same canonical form
β’ Entirely new behaviors will require a (partial) resamplingβ’ For instance: using the stack in offsets of 13 O_0
38
All v. All
39
Limitations
40
Evaluating Binary Classifiers
β’ The Receiver Operating Characteristic (ROC) is a widespread method for evaluating a binary classifier, by plotting the ratio of TPs to FPs, for all the different thresholds
β’ The Area Under Curve is then summed and value between 0-1 is produced. Our results were > .96.
β’ ROC means βhow well did we cover all true positives, before we encounter false positivesβ
β’ We used Concentrated ROC, an adaptation for huge corpora, which further βpunishesβ highly ranked FPs
Jaccard Index?
β’ Major difference: does not take statistical significance into account, at all.
42
π½ π΄, π΅ =|π΄ β© π΅|
|π΄ βͺ π΅|
Is Pr(π ) a probability even?
β’ Prπ(π ) is a probability over the sample space of π
β’ π is a βmultisetβ of all canonical fragments in existenceβ’ | π β π π β π (π) }| counts the occurrences of π in πβ’ Pr π 1 + Pr π 2 + Pr π 3 + Pr π 4 =
3
7+
2
7+ 1
7+ 1
7= 1
β’ Prπ(π ) is an estimation of π, which betters as π growsβ’ As we evaluated
Prπ(π ) =
| π β π π β π (π) }|
|π|
ππππ
ππ
ππππππ
ππ
ππ
ππππ
πΎ
Youβre over-thinking this
β’Why not just run the binary and get a version string??β’ Sometimes a lib is embeddedβ’ You canβt always easily run (different arch,
dependencies, etc.)β’ Running can put you in unnecessary riskβ’ Purely staticβ’ Weβve seen cases where the version string is not
maintained correctly!
Out-of-Context?
β’ In context: The slice is surrounded with:β’ Other instructions from the blockβ’ Other blocksβ’ Other procedures
The optimizer here must account for the surroundings, and cannot easily βcut-throughβ unrelated operations
β’ Out-of-context: Just the slice. The optimizer can easily extract a concise canonical fragment, that can be matched with semantically equivalent fragments from other procedures.
More Experiments!
Scenario Queries Targets FP Rate
Cross-(Optimization β¨Version) x86_64
gcc 4.{6,8,9} βO* gcc 4.{6,8,9} βO* 0.001
Scenario Queries Targets FP Rate
Cross-Optimization ARM-64
aarch64-gcc 4.8 βO* aarch64-gcc 4.8 βO* 0
Scenario Queries Targets FP Rate
Cross-Compiler x86_64
πΆππππππππ π₯86 βO1 πΆππππππππ π₯86 βO1 0.002
46
GitZ Accuracy: Cross-Compiler/Optimization/Architecture
Scenario Queries Targets FP Rate
Cross-ArchitectureLow Optimization
(πΆππππππππ π₯86 β¨
πΆππππππππ π΄π π) βO1(πΆππππππππ π₯86 β¨
πΆππππππππ π΄π π) βO10.006
Scenario Queries Targets FP Rate
Cross-ArchitectureStandard Optimization
(πΆππππππππ π₯86 β¨
πΆππππππππ π΄π π) βO2(πΆππππππππ π₯86 β¨
πΆππππππππ π΄π π) βO20.005
Scenario Queries Targets FP Rate
Cross-ArchitectureHeavy Optimization
(πΆππππππππ π₯86 β¨
πΆππππππππ π΄π π) βO3(πΆππππππππ π₯86 β¨
πΆππππππππ π΄π π) βO30.004
47
Top Related