
Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization

Steven H. H. Ding, Data Mining and Security Lab, School of Information Studies, McGill University, Montreal, Canada
Benjamin C. M. Fung, Data Mining and Security Lab, School of Information Studies, McGill University, Montreal, Canada
Philippe Charland, Mission Critical Cyber Security Section, Defence R&D Canada – Valcartier, Quebec, Canada



Page 2

Reverse engineer

Manual analysis

Reverse engineering


Did anyone analyze something similar before? Is it a library function?

f1  f2  f3

LDR R3,[R11,#sct]
LDR R2,[R3,#0xC]
LDR R3,[R11,#applet_no]
CMP R2,R3
BEQ loc_DFD0
LDR R3,[R11,#sct]
LDR R3,[R3]
STR R3,[R11,#sct]
loc_DFC0
LDR R3,[R11,#sct]
CMP R3,#0
BNE loc_DFA0

Disassemble

A binary file

Page 3

With Kam1n0


LDR R3,[R11,#sct]
LDR R2,[R3,#0xC]
LDR R3,[R11,#applet_no]
CMP R2,R3
BEQ loc_DFD0
LDR R3,[R11,#sct]
LDR R3,[R3]
STR R3,[R11,#sct]
loc_DFC0
LDR R3,[R11,#sct]
CMP R3,#0
BNE loc_DFA0

Commented assembly function

LDR R3,[R11,#sct]
LDR R2,[R3,#0xC]
LDR R3,[R11,#applet_no]
CMP R2,R3
BEQ loc_DFD0
LDR R3,[R11,#sct]
LDR R3,[R3]
STR R3,[R11,#sct]
loc_DFC0
LDR R3,[R11,#sct]
CMP R3,#0
BNE loc_DFA0

Labeled library function
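The lookup on this slide, retrieving a previously commented or labeled function for an unknown one, can be pictured as nearest-neighbour search over function vectors. The sketch below is a simplified brute-force stand-in written for illustration only; it is not Kam1n0's actual indexing, and the repository entries, labels, and 4-dimensional vectors are all hypothetical.

```python
import math

# Hypothetical repository of previously analyzed functions: label -> function vector.
# Real vectors would come from a learned model; these are hand-picked for illustration.
REPOSITORY = {
    "strcpy (libc)": [0.9, 0.1, 0.0, 0.2],
    "memcpy (libc)": [0.7, 0.6, 0.1, 0.0],
    "sub_DFA0 (commented by an analyst)": [0.0, 0.2, 0.9, 0.4],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query, k=2):
    """Brute-force top-k search: rank every repository entry by cosine similarity."""
    ranked = sorted(REPOSITORY.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [(label, cosine(query, vec)) for label, vec in ranked[:k]]


# An unknown function whose (hypothetical) vector lies close to strcpy's entry.
for label, similarity in search([0.85, 0.15, 0.05, 0.2]):
    print(f"{label}: {similarity:.3f}")
```

Here the strcpy entry ranks first, so its label would be suggested for the unknown function. A linear scan is fine for a toy repository; a large one would call for an approximate index instead.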

Page 4

Type I: Exact clone


0x1FE69C0+ PUSH ebp
0x1FE69C1+ MOV ebp,esp
0x1FE69C3+ MOV ecx,[ebp+arg_0]
0x1FE69C6+ PUSH ebx
0x1FE69C7+ MOV ebx,[ebp+arg_8]
0x1FE69CA+ PUSH esi
0x1FE69CB+ MOV esi,ecx
0x1FE69CD+ AND ecx,0FFFFh
0x1FE69D3+ SHR esi,10h
0x1FE69D6+ CMP ebx,1
0x1FE69D9+ +JNZ loc_1FE6A0C

0x1FE69C0+ PUSH ebp
0x1FE69C1+ MOV ebp,esp
0x1FE69C3+ MOV ecx,[ebp+arg_0]
0x1FE69C6+ PUSH ebx
0x1FE69C7+ MOV ebx,[ebp+arg_8]
0x1FE69CA+ PUSH esi
0x1FE69CB+ MOV esi,ecx
0x1FE69CD+ AND ecx,0FFFFh
0x1FE69D3+ SHR esi,10h
0x1FE69D6+ CMP ebx,1
0x1FE69D9+ +JNZ loc_1FE6A0C
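For Type I clones, plain fingerprinting is enough once addresses are removed. The sketch below assumes a hypothetical listing format of one "address mnemonic operands" entry per line, with the address as the first whitespace-separated field, and hashes the remaining instruction text with SHA-256.

```python
import hashlib


def strip_addresses(listing):
    """Drop the leading address field (assumed to be the first whitespace-separated token)."""
    return [line.split(maxsplit=1)[1] for line in listing if line.strip()]


def fingerprint(listing):
    """SHA-256 over the address-free instruction text; equal digests indicate an exact clone."""
    text = "\n".join(strip_addresses(listing))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = ["0x1FE69C0 PUSH ebp", "0x1FE69C1 MOV ebp,esp", "0x1FE69C3 MOV ecx,[ebp+arg_0]"]
# The same instructions at a different (hypothetical) base address.
relocated = ["0x0401000 PUSH ebp", "0x0401001 MOV ebp,esp", "0x0401003 MOV ecx,[ebp+arg_0]"]
# One renamed register and argument: no longer an exact clone.
renamed = ["0x0401000 PUSH ebp", "0x0401001 MOV ebp,esp", "0x0401003 MOV eax,[ebp+msg_0]"]

print(fingerprint(original) == fingerprint(relocated))  # True
print(fingerprint(original) == fingerprint(renamed))    # False
```

The second comparison shows why exact matching stops working as soon as identifiers change, which is the Type II case.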

Page 5

Type II: Syntactically equivalent


0x1FE05B0+ PUSH ebp
0x1FE05B1+ MOV ebp,esp
0x1FE05B3+ MOV ecx,[ebp+arg_0]
0x1FE05B6+ PUSH ebx
0x1FE05B7+ MOV ebx,[ebp+arg_8]
0x1FE05BA+ PUSH esi
0x1FE05BB+ MOV esi,ecx
0x1FE05BD+ AND ecx,0FFFFh
0x1FE05B3+ SHR esi,10h
0x1FE05B6+ CMP ebx,1
0x1FE05B9+ +JNZ loc_1FE05BC

0x1FE69C0+ PUSH ebp
0x1FE69C1+ MOV ebp,esp
0x1FE69C3+ MOV eax,[ebp+msg_0]
0x1FE69C6+ PUSH edx
0x1FE69C7+ MOV edx,[ebp+msg_1]
0x1FE69CA+ PUSH esi
0x1FE69CB+ MOV esi,eax
0x1FE69CD+ AND eax,0FFFFh
0x1FE69D3+ SHR esi,10h
0x1FE69D6+ CMP edx,1
0x1FE69D9+ +JNZ loc_1FE6A0C
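Type II clones defeat a text fingerprint because registers and symbol names change. A common remedy, sketched here as an illustration and not as Asm2Vec's or Kam1n0's method, is to abstract operands before comparing: registers become REG, memory references MEM, jump labels LOC, and everything else IMM. The register set and operand rules are simplifying assumptions that only cover the 32-bit x86 snippets on this slide.

```python
# Assumed register set: only the 32-bit general-purpose x86 registers.
X86_REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize(instruction):
    """Abstract operands: registers -> REG, memory -> MEM, labels -> LOC, anything else -> IMM."""
    mnemonic, _, rest = instruction.strip().partition(" ")
    abstract = []
    for operand in (op.strip() for op in rest.split(",") if op.strip()):
        if operand.startswith("["):
            abstract.append("MEM")
        elif operand.lower() in X86_REGISTERS:
            abstract.append("REG")
        elif operand.startswith("loc_"):
            abstract.append("LOC")
        else:
            abstract.append("IMM")
    if not abstract:
        return mnemonic.upper()
    return mnemonic.upper() + " " + ",".join(abstract)


# The two listings from the Type II slide, addresses and '+' markers removed.
func_a = ["PUSH ebp", "MOV ebp,esp", "MOV ecx,[ebp+arg_0]", "PUSH ebx", "MOV ebx,[ebp+arg_8]",
          "PUSH esi", "MOV esi,ecx", "AND ecx,0FFFFh", "SHR esi,10h", "CMP ebx,1", "JNZ loc_1FE05BC"]
func_b = ["PUSH ebp", "MOV ebp,esp", "MOV eax,[ebp+msg_0]", "PUSH edx", "MOV edx,[ebp+msg_1]",
          "PUSH esi", "MOV esi,eax", "AND eax,0FFFFh", "SHR esi,10h", "CMP edx,1", "JNZ loc_1FE6A0C"]

print(func_a == func_b)  # False: the raw text differs
print([normalize(i) for i in func_a] == [normalize(i) for i in func_b])  # True
```

The trade-off is that such coarse abstraction also discards information that could tell unrelated functions apart.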

Page 6

Type III: Minor modification


0x1FE05B0+ PUSH ebp
0x1FE05B1+ MOV ebp,esp
+
+
0x1FE05B7+ MOV ebx,[ebp+arg_8]
0x1FE05BA+ PUSH esi
0x1FE05BB+ MOV esi,ecx
0x1FE05BD+ AND ecx,0FFFFh
0x1FE05B3+ MOV eax,ecx
0x1FE05B6+ SHR esi,10h
0x1FE05B9+ CMP ebx,1
0x1FE05C1+ +JNZ loc_1FE05BC

0x1FE69C0+ PUSH ebp
0x1FE69C1+ MOV ebp,esp
0x1FE69C3+ MOV eax,[ebp+msg_0]
0x1FE69C6+ PUSH edx
0x1FE69C7+ MOV edx,[ebp+msg_1]
0x1FE69CA+ PUSH esi
0x1FE69CB+ MOV esi,eax
0x1FE69CD+ AND eax,0FFFFh
0x1FE69D3+ SHR esi,10h
0x1FE69D6+ CMP edx,1
0x1FE69D9+ +JNZ loc_1FE6A0C
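Type III clones add, drop, or change a few instructions, so normalized sequences are no longer equal, but a similarity score can still be high. One simple choice, assumed here purely for illustration, is the ratio from Python's difflib.SequenceMatcher over the normalized instructions of the two listings on this slide (the '+' gap lines are left out). The normalizer repeats a hypothetical operand abstraction: registers to REG, memory to MEM, labels to LOC, other operands to IMM.

```python
from difflib import SequenceMatcher

# Assumed register set: only the 32-bit general-purpose x86 registers.
X86_REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize(instruction):
    """Abstract operands: registers -> REG, memory -> MEM, labels -> LOC, anything else -> IMM."""
    mnemonic, _, rest = instruction.strip().partition(" ")
    abstract = []
    for operand in (op.strip() for op in rest.split(",") if op.strip()):
        if operand.startswith("["):
            abstract.append("MEM")
        elif operand.lower() in X86_REGISTERS:
            abstract.append("REG")
        elif operand.startswith("loc_"):
            abstract.append("LOC")
        else:
            abstract.append("IMM")
    return mnemonic.upper() + (" " + ",".join(abstract) if abstract else "")


def similarity(a, b):
    """difflib ratio 2*M/T over normalized instruction sequences, in [0, 1]."""
    return SequenceMatcher(None, [normalize(i) for i in a], [normalize(i) for i in b]).ratio()


# The two listings from the Type III slide (the first one has two instructions missing
# and one extra MOV).
func_a = ["PUSH ebp", "MOV ebp,esp", "MOV ebx,[ebp+arg_8]", "PUSH esi", "MOV esi,ecx",
          "AND ecx,0FFFFh", "MOV eax,ecx", "SHR esi,10h", "CMP ebx,1", "JNZ loc_1FE05BC"]
func_b = ["PUSH ebp", "MOV ebp,esp", "MOV eax,[ebp+msg_0]", "PUSH edx", "MOV edx,[ebp+msg_1]",
          "PUSH esi", "MOV esi,eax", "AND eax,0FFFFh", "SHR esi,10h", "CMP edx,1", "JNZ loc_1FE6A0C"]

score = similarity(func_a, func_b)
print(f"{score:.3f}")  # 0.857
```

Nine instructions match in order across listings of 10 and 11 instructions, giving 2 * 9 / 21. Order-sensitive matching like this is exactly what compiler optimization and obfuscation tend to break.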

Page 7

[Figure panels: original, clone]

Page 8

Obfuscation and Optimization - Challenges


Page 9

Obfuscation and Optimization - Problems

•  P1: The relationships among assembly tokens
    •  xmm0 (an SSE register) vs. SSE operations such as movaps
    •  fclose vs. fopen
    •  strcpy vs. memcpy

•  P2: Token combination weights
    •  Reverse engineers look for 'interesting patterns' (higher weight).
    •  Regular, random, or repeated patterns are not interesting (lower weight).

•  Sounds so familiar from NLP!
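P2 resembles the term-weighting problem in text retrieval. As an NLP analogy only (this is not how Asm2Vec itself assigns importance), the sketch below applies inverse document frequency to whole instructions across three hypothetical functions: the prologue and epilogue instructions present in every function get weight zero, while rarer instructions such as the movaps or the strcpy call stand out.

```python
import math

# Hypothetical functions, each reduced to a list of instruction strings.
FUNCTIONS = {
    "f_zero": ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"],
    "f_copy": ["push ebp", "mov ebp, esp", "call strcpy", "pop ebp", "ret"],
    "f_simd": ["push ebp", "mov ebp, esp", "movaps xmm0, [esi]", "pop ebp", "ret"],
}


def inverse_document_frequency(functions):
    """IDF = log(N / df): an instruction present in every function gets weight 0."""
    n = len(functions)
    df = {}
    for body in functions.values():
        for instr in set(body):  # count each instruction at most once per function
            df[instr] = df.get(instr, 0) + 1
    return {instr: math.log(n / count) for instr, count in df.items()}


weights = inverse_document_frequency(FUNCTIONS)
for instr in sorted(weights, key=weights.get, reverse=True):
    print(f"{weights[instr]:.3f}  {instr}")
```

Fixed weights like these ignore the relationships in P1 (strcpy and memcpy are simply different strings here), which is the gap a learned representation is meant to close.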


Page 10

Learning English

1) The cat ____ on the mat.

A: food   B: sat   C: sitting   D: is speaking
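The quiz captures the intuition behind context-prediction models such as word2vec and the paragraph vector: a word is characterized by the contexts it appears in. A count-based sketch of the same idea, on a made-up four-sentence corpus, scores each option by how often its first word follows "cat" and its last word precedes "on". Neural models learn this from dense vectors rather than raw bigram counts; the corpus and scoring rule here are assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a chair",
    "the cat is sleeping",
]

# Count adjacent word pairs (bigrams) across the corpus.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    bigrams.update(zip(words, words[1:]))


def score(left, candidate, right):
    """How well a (possibly multi-word) candidate fits between its left and right context."""
    tokens = candidate.split()
    return bigrams[(left, tokens[0])] + bigrams[(tokens[-1], right)]


choices = ["food", "sat", "sitting", "is speaking"]
best = max(choices, key=lambda c: score("cat", c, "on"))
print(best)  # sat
```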


Page 11

Paragraph Vector (p2vec):


king - man + woman = queen
bad - good = maniacal_killer*

* Example collected from Andreas Mueller (@amuellerml)
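The analogy works because relationships become vector offsets: the word whose vector is closest, by cosine similarity, to vec(king) - vec(man) + vec(woman) is queen. The 3-dimensional vectors below are hypothetical and hand-picked so that the offset lands exactly on queen; learned embeddings only approximate this.

```python
import math

# Hypothetical hand-picked 3-d vectors; real embeddings are learned and much larger.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # queen
```

Excluding the input words is a standard convention in analogy evaluation; without it, the nearest vector is often one of the inputs.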

Page 12

Asm2Vec:
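The model appears on this slide only as a diagram. A minimal forward-pass sketch, loosely following the formulation described in the Asm2Vec paper as understood here: each instruction is represented by its operation embedding concatenated with the average of its operand embeddings, and the function vector, averaged with the two neighbouring instructions, is used to predict the tokens of the middle instruction. The embedding size, vocabulary, and random initialization are hypothetical; the paper's sequence generation over the control-flow graph and its negative-sampling training are not shown, and a full softmax is used instead.

```python
import math
import random

D = 4  # hypothetical token embedding size; instructions and functions live in 2 * D dims
random.seed(7)


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


VOCAB = ["mov", "push", "ebp", "esp", "ecx", "ebx", "[ebp+arg_0]"]
TOKEN_VEC = {t: rand_vec(D) for t in VOCAB}       # input embedding per token
OUTPUT_VEC = {t: rand_vec(2 * D) for t in VOCAB}  # output embedding used for prediction
THETA_F = rand_vec(2 * D)                         # vector of the function being learned


def instruction_vec(operation, operands):
    """Operation embedding concatenated with the mean of the operand embeddings."""
    mean = [0.0] * D
    for operand in operands:
        mean = [m + v / len(operands) for m, v in zip(mean, TOKEN_VEC[operand])]
    return TOKEN_VEC[operation] + mean


def context_vec(theta, prev_instr, next_instr):
    """Average of the function vector and the two neighbouring instruction vectors."""
    p, n = instruction_vec(*prev_instr), instruction_vec(*next_instr)
    return [(t + a + b) / 3 for t, a, b in zip(theta, p, n)]


def token_probabilities(delta):
    """Full softmax over the vocabulary (a simplification of negative sampling)."""
    scores = {t: sum(o * d for o, d in zip(OUTPUT_VEC[t], delta)) for t in VOCAB}
    top = max(scores.values())
    exp = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exp.values())
    return {t: e / total for t, e in exp.items()}


# Predict the tokens of "MOV ecx,[ebp+arg_0]" from "MOV ebp,esp" and "PUSH ebx".
delta = context_vec(THETA_F, ("mov", ["ebp", "esp"]), ("push", ["ebx"]))
probs = token_probabilities(delta)
target = ["mov", "ecx", "[ebp+arg_0]"]
loss = -sum(math.log(probs[t]) for t in target) / len(target)
print(f"context dim = {len(delta)}, loss = {loss:.3f}")
```

Training would lower this loss by gradient descent on THETA_F, TOKEN_VEC, and OUTPUT_VEC over many instruction windows; after training, THETA_F serves as the function's representation for clone search.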


Page 13

t-SNE Visualization


Page 14

t-SNE Visualization


Page 15

Evaluation (Quantitative)


Page 16

Evaluation (Quantitative)


Page 17

Evaluation (Case Studies)


Vulnerability retrieval

Page 18

Evaluation (Case Studies)


Page 19

Asm2Vec (IEEE S&P 2019)
+ Robust against obfuscation and optimization.
+ Outperforms the most recent dynamic approach.
+ Static approach: efficient and scalable.
- Binary diffing (interpretability?)
- Static approach: cannot recognize jump tables, etc.
- Assembly code must come from the same processor family.


Page 20

The Kam1n0 2.x Binary Analysis Platform


Page 21

Subgraph clone


Page 22

Sym1n0


Page 23

Thank you. Questions?