DEEPSEC 2013: Malware Datamining And Attribution

Malware Attribution Theory, Code and Result

Greg Hoglund explained at Black Hat 2010 that the development environments malware authors use leave traces in the code, and that these traces can be used to attribute malware to an individual or a group of individuals. The attribution does not have the precision of a name, date of birth and address, but it does provide evidence: an arrested suspect's computer can be analysed and compared with the "tool marks" on the collected malware sample.

Transcript of DEEPSEC 2013: Malware Datamining And Attribution

Page 1: DEEPSEC 2013: Malware Datamining And Attribution

Malware Attribution Theory, Code and Result

Page 2: DEEPSEC 2013: Malware Datamining And Attribution

Who am I?

• Michael Boman, M.A.R.T. project

• Have been “playing around” with malware analysis “for a while”

• Working for FireEye

• This is a HOBBY project that I use my SPARE TIME to work on

Page 3: DEEPSEC 2013: Malware Datamining And Attribution

Agenda

Theory behind Malware Attribution

Code to conduct Malware Attribution analysis

Result of analysis

Page 4: DEEPSEC 2013: Malware Datamining And Attribution

Theory

Page 5: DEEPSEC 2013: Malware Datamining And Attribution

• Malware Attribution: tracking cyber spies - Greg Hoglund, Blackhat 2010

http://www.youtube.com/watch?v=k4Ry1trQhDk

Page 6: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do?

[Diagram: a spectrum with Binary at one end and Human at the other, with an arrow labelled “Move this way”]

Page 7: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do?

[Diagram: attribution spectrum running from Binary to Human]

• Blacklists

• Net Recon

• Command and Control

• Developer Fingerprints

• Tactics, Techniques, Procedures

• Social Cyberspace

• DIGINT

• Physical Surveillance

• HUMINT

Page 10: DEEPSEC 2013: Malware Datamining And Attribution

[Diagram: the Binary-to-Human attribution spectrum, with these additional labels]

• Actions / Intent

• Installation / Deployment

• CNA (spreader) / CNE (search & exfil tool)

• COMS

• Defensive / Anti-forensic

• Exploit

• Shellcode

• DNS, Command and Control Protocol, Encryption

Page 12: DEEPSEC 2013: Malware Datamining And Attribution

Steps

• Step 0: Gather malware

• Step 1: Extract metadata from binary

• Step 2: Store metadata and binary in MongoDB

• Step 3: Analyze collected data

Page 14: DEEPSEC 2013: Malware Datamining And Attribution

Step 1: Extract metadata from binary

Page 15: DEEPSEC 2013: Malware Datamining And Attribution

Development Steps

Core “backbone” sourcecode

Tweaks & Mods

3rd party sourcecode

3rd party libraries

Compiler

Runtime libraries

Time

Paths

MAC Address

Malware

Packing

Machine Binary

Source

Page 18: DEEPSEC 2013: Malware Datamining And Attribution

Step 1: Extract metadata from binary

• Hashes (for sample identification)

• md5, sha1, sha256, sha512, ssdeep etc.

• File type / Exif / PEiD

• Compiler / Packer etc.

• PE Headers / Imports / Exports etc.

• VirusTotal results

• Tags
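The cryptographic hashes in the list above can be computed with Python's standard library alone. This is a minimal sketch, not the project's code: the name hash_sample is made up for illustration, and ssdeep is left out because it needs a third-party binding.

```python
import hashlib

# Digest algorithms named on the slide that hashlib provides out of the box.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def hash_sample(data: bytes) -> dict:
    """Return a {algorithm: hex digest} mapping for one sample's raw bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}
```

Calling hash_sample(open(path, "rb").read()) on a sample gives the identifiers that can later serve as lookup keys.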

Page 19: DEEPSEC 2013: Malware Datamining And Attribution

Identifying compiler / packer

• PEiD

• Python

• peutils.SignatureDatabase().match_all() (peutils ships with the pefile package)

Page 20: DEEPSEC 2013: Malware Datamining And Attribution

PE Header information
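The slide itself is a screenshot, so the exact fields the project records are not visible here. As a hedged illustration of what PE header information looks like, the sketch below reads the COFF file header with the standard struct module. The function name and the returned keys are assumptions made for this example; a real pipeline would more likely use a PE parsing library.

```python
import struct
from datetime import datetime, timezone

# A few well-known values of the COFF Machine field.
MACHINE_NAMES = {0x014C: "i386", 0x8664: "AMD64"}


def read_coff_header(data: bytes) -> dict:
    """Parse the COFF file header that follows the 'PE\\0\\0' signature."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    (machine, section_count, timestamp, _symbols_ptr, _symbol_count,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": section_count,
        "timestamp": timestamp,
        # TimeDateStamp is seconds since the Unix epoch; it can be forged.
        "compile_time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "optional_header_size": optional_size,
        "characteristics": characteristics,
    }
```

The compile time is only as trustworthy as the toolchain that wrote it, so it is best treated as one data point among several.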

Page 21: DEEPSEC 2013: Malware Datamining And Attribution

VirusTotal Results

Page 22: DEEPSEC 2013: Malware Datamining And Attribution

Tags

• User-supplied tags to identify sample source and behavior

• analyst / analyst-system supplied

Page 23: DEEPSEC 2013: Malware Datamining And Attribution

Step 2: Store metadata and binary in MongoDB

Page 24: DEEPSEC 2013: Malware Datamining And Attribution

Components

• Modified VXCage server

• Collects a lot more metadata than the original

• Stores malware & metadata in MongoDB instead of FS / ORDBMS
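To make the storage step more concrete, here is a sketch of what a per-sample metadata document might look like before it is handed to MongoDB. Every field name below is an assumption made for illustration and is not taken from the modified VXCage schema. The sketch only builds the document; inserting it would need a MongoDB driver, which is left out to keep the example dependency-free.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_sample_document(data: bytes, tags, signatures) -> dict:
    """Build a JSON-serialisable metadata document for one sample.

    Field names are hypothetical; sha256 is used as the document key so
    that re-submitting the same binary maps to the same record.
    """
    return {
        "_id": hashlib.sha256(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "size": len(data),
        "tags": sorted(set(tags)),             # de-duplicated analyst tags
        "peid_signatures": list(signatures),   # compiler / packer matches
        "added": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    print(json.dumps(build_sample_document(b"abc", ["demo"], []), indent=2))
```

MongoDB caps a single document at 16 MB, so the binary itself would typically live outside this document (GridFS is the usual mechanism); whether the project does so is not stated on the slide.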

Page 25: DEEPSEC 2013: Malware Datamining And Attribution

VXCage REST API

• /malware/add

• Add sample

• /malware/get/<filehash>

• Download sample. If no local sample, search other repos

• /malware/find

• Search for sample by md5, sha256, ssdeep, tag, date

• /tags/list

• List tags
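The endpoints above could be driven from a small client. The sketch below only builds the requests and does not send them. The base URL, the HTTP methods and the form-encoded body are assumptions for illustration (the slide lists paths and search keys, not methods or encodings), so check them against the server code before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical server address

# Search keys listed on the slide for /malware/find.
SEARCH_KEYS = {"md5", "sha256", "ssdeep", "tag", "date"}


def get_request(filehash: str) -> Request:
    """Request that would download one sample by its hash."""
    return Request(f"{BASE_URL}/malware/get/{quote(filehash, safe='')}",
                   method="GET")


def find_request(**criteria: str) -> Request:
    """Request that would search samples; only slide-listed keys allowed."""
    unknown = set(criteria) - SEARCH_KEYS
    if unknown:
        raise ValueError(f"unsupported search keys: {sorted(unknown)}")
    body = urlencode(criteria).encode("ascii")
    return Request(f"{BASE_URL}/malware/find", data=body, method="POST")


def tags_request() -> Request:
    """Request that would list all known tags."""
    return Request(f"{BASE_URL}/tags/list", method="GET")
```

Passing any of these objects to urllib.request.urlopen would perform the call against a running server.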

Page 26: DEEPSEC 2013: Malware Datamining And Attribution

Step 3: Analyze collected data

Page 27: DEEPSEC 2013: Malware Datamining And Attribution

Identifying development environments

• Compiler / Linker / Libraries

• Strings

• Paths

• PE Translation header

• Compile times

• Number of times the software has been built
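Strings and paths are among the easier environment traces to pull out of a binary. The following sketch extracts printable ASCII runs and anything that looks like a Windows path ending in .pdb, since debug paths often reveal a developer's directory layout. The regular expressions and function names are illustrative choices, not the project's implementation, and wide (UTF-16) strings are ignored for brevity.

```python
import re

# A Windows-style path ending in .pdb, e.g. C:\Users\dev\proj\bot.pdb
PDB_PATH = re.compile(rb"[A-Za-z]:\\[\x20-\x7e]*?\.pdb", re.IGNORECASE)


def printable_strings(data: bytes, min_len: int = 6) -> list:
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def pdb_paths(data: bytes) -> list:
    """Return embedded debug-symbol paths, a common build-environment trace."""
    return [match.group().decode("ascii") for match in PDB_PATH.finditer(data)]
```

A user name or project folder that recurs in such paths across samples is the kind of “tool mark” the talk is concerned with.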

Page 28: DEEPSEC 2013: Malware Datamining And Attribution

Cataloging behaviors

• Packers

• Encryption

• Anti-debugging

• Anti-VM

• Anti-forensics

Page 29: DEEPSEC 2013: Malware Datamining And Attribution

Result

Page 30: DEEPSEC 2013: Malware Datamining And Attribution

Have I seen you before?

• Detects similar malware (based on SSDEEP fuzzy hashing)

Page 31: DEEPSEC 2013: Malware Datamining And Attribution

Different MD5, 100% ssdeep match

Page 32: DEEPSEC 2013: Malware Datamining And Attribution

SSDEEP Analysis (3007)

Page 34: DEEPSEC 2013: Malware Datamining And Attribution

SSDEEP Analysis (851)

Page 35: DEEPSEC 2013: Malware Datamining And Attribution

Challenges

• Party handshake problem:

• 707k samples analyzed and counting (resulting in over 250 billion compares!)

• Need a better target (pre-)selection
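The handshake arithmetic, and one possible pre-selection, can be sketched as follows. The pre-selection relies on a property of ssdeep: two hashes are only comparable when their block sizes are equal or one is twice the other. Grouping by block size is offered here as one idea for cutting the number of comparisons, not as the approach the project uses, and the function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def block_size(ssdeep_hash: str) -> int:
    """ssdeep hashes look like 'blocksize:hash:hash'; return the block size."""
    return int(ssdeep_hash.split(":", 1)[0])


def candidate_pairs(hashes):
    """Yield only pairs whose block sizes are equal or differ by a factor of 2."""
    by_block = defaultdict(list)
    for value in hashes:
        by_block[block_size(value)].append(value)
    for size, group in by_block.items():
        # Same block size: every pair within the group is comparable.
        yield from combinations(group, 2)
        # Double block size: compare each member against the next group up.
        for other in by_block.get(size * 2, ()):
            for value in group:
                yield (value, other)
```

For exactly 707,000 samples pair_count returns 249,924,146,500, which matches the slide's figure of over 250 billion for a collection slightly larger than that.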

Page 36: DEEPSEC 2013: Malware Datamining And Attribution

What compilers / packers are common?

1. "Borland Delphi 3.0 (???)", 54298

2. "Microsoft Visual C++ v6.0", 33364

3. "Microsoft Visual C++ 8", 28005

4. "Microsoft Visual Basic v5.0 - v6.0", 26573

5. "UPX v0.80 - v0.84", 22353
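A ranking like the one above is a frequency count over the signature names stored for each sample. The sketch below does this in memory with collections.Counter; the input shape (one list of signature names per sample) and the function name are assumptions for illustration, and on a large collection the same count would more likely be run inside the database.

```python
from collections import Counter


def top_signatures(signature_lists, n: int = 5) -> list:
    """Return the n most common signature names with their counts."""
    counts = Counter(
        name for signatures in signature_lists for name in signatures
    )
    return counts.most_common(n)
```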

Page 37: DEEPSEC 2013: Malware Datamining And Attribution

Are there any unidentified packers?

• How to identify a packer

• A PE section that is empty in the binary (no raw data on disk) but is marked both writable and executable
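The heuristic above translates directly into a check on section header fields. In this sketch the Section class is a stand-in for whatever a PE parser returns and is an assumption of the example; the two flag values are the standard IMAGE_SCN_MEM_EXECUTE and IMAGE_SCN_MEM_WRITE section characteristics.

```python
from dataclasses import dataclass

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000


@dataclass
class Section:
    """Minimal stand-in for a parsed PE section header (hypothetical type)."""
    name: str
    size_of_raw_data: int
    characteristics: int


def looks_like_unpack_target(section: Section) -> bool:
    """True if the section has no data on disk but is writable and executable."""
    wanted = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE
    return (section.size_of_raw_data == 0
            and section.characteristics & wanted == wanted)


def suspicious_sections(sections) -> list:
    """Names of all sections matching the heuristic."""
    return [s.name for s in sections if looks_like_unpack_target(s)]
```

A section like this is a plausible landing area for code unpacked at run time, which is why it hints at a packer even when no signature matches.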

Page 38: DEEPSEC 2013: Malware Datamining And Attribution

How common are anti-debugging techniques?

• 31,622 out of 531,182 PE binaries use IsDebuggerPresent (about 6%)

• Packed executables are not counted

Page 39: DEEPSEC 2013: Malware Datamining And Attribution

Analysis Coverage

Core “backbone” sourcecode

Tweaks & Mods

3rd party sourcecode

3rd party libraries

Compiler

Runtime libraries

Time

Paths

MAC Address

Malware

Packing

Machine Binary

Source

Page 40: DEEPSEC 2013: Malware Datamining And Attribution

Future

Page 41: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do in the future

[Diagram: the Binary-to-Human attribution spectrum]

Expand scope of analysis: + network, + memory, + OS changes, + behavior

Page 42: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do in the future

• More automation

• More modular design

• Solve the “Big Data” issue I am getting myself into (Hadoop?)

• More pretty graphs

Page 43: DEEPSEC 2013: Malware Datamining And Attribution

Thank you

• Michael Boman

• @mboman

• http://blog.michaelboman.org

• Code available at https://github.com/mboman/vxcage