Mining Specifications of Malicious Behavior Mihai Christodorescu (work done at University of...


Mining Specifications of Malicious Behavior

Mihai Christodorescu, IBM Research (work done at University of Wisconsin)

Somesh Jha, University of Wisconsin

Christopher Kruegel, Technical University Vienna

ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 2

Wide Spectrum of Detectors

• Static detectors:
– Semantics-aware malware detection [Christodorescu et al. 2005]
– Model checking-based malware detection [Kinder et al. 2005]
• Dynamic/hybrid detectors, host IDS:
– Behavior-based spyware detection [Kirda et al. 2006]
– Shadow Honeypots [Anagnostakis et al. 2005]
– …

Misuse Detection

Distinct techniques, fundamentally similar…

They all require high-quality specifications of malicious behavior.

……

Current Specification Generation

• Specifications are manually developed by experts.

Two issues:
– Time consuming
– Error prone

[Figure: ? → Spec]

Problem 1: Specification Delay

Time from appearance of new malware to availability of specification:
• Manual analysis of the binary
• Testing of the specification

Anti-virus industry: 4-18 hours to generate a new specification.

window of vulnerability

Problem 2: Spec. Imprecision

Too general = false positives → angry users

Too specific = false negatives → infected machines

Our Solution

MINIMAL: a technique for mining malicious specifications

• Automatic
• Flexible specification language
• Fast
• Performs well (compared to a human expert)

Specification Language

What’s In a Specification?

Requirements for obfuscation resilience:

1. Describe only relevant operations

2. Capture dependencies where present

3. Preserve independence of operations

Specifying Malicious Operations

• We chose system calls

– Compatible with specifications for behavior-based detectors

– Define interface between trusted OS and untrusted programs

• Mining algorithm is not restricted to the system-call interface.

Specifying Malicious Constraints

• Program operations are insufficient to distinguish malicious from benign.

• We need to capture relations between operations:

F=open(“file”) ; read(F,buf) ; send(S,buf)

Constraints = logical formulas over system-call arguments
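To make the open/read/send example concrete, here is a minimal sketch of checking such constraints over a recorded trace. The `Call` record and the `leaks_file` helper are hypothetical names introduced only for illustration; they are not part of MINIMAL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    name: str           # system-call name
    args: tuple         # argument values as observed in the trace
    ret: object = None  # return value, if any

def leaks_file(trace):
    """True if some open -> read -> send chain shares a handle and a buffer."""
    for i, o in enumerate(trace):
        if o.name != "open":
            continue
        for j in range(i + 1, len(trace)):
            r = trace[j]
            # Constraint 1: read uses the handle returned by open.
            if r.name != "read" or r.args[0] != o.ret:
                continue
            for s in trace[j + 1:]:
                # Constraint 2: send transmits the buffer filled by read.
                if s.name == "send" and s.args[1] == r.args[1]:
                    return True
    return False

# Same three operations in both traces; only the argument relations differ.
suspicious = [Call("open", ("file",), ret=3),
              Call("read", (3, "buf")),
              Call("send", (7, "buf"))]
harmless = [Call("open", ("file",), ret=3),
            Call("read", (3, "buf")),
            Call("send", (7, "other"))]
```

Both traces contain the same operations, so only the constraints on the arguments tell them apart, which is the point of the slide.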

A Sample Specification (Malspec)

• Mass-mailing malware:

Operations (graph nodes):
S := process_name()
Z := open(S2)
Y := read(Z2)
X := socket()
connect(X2)
send(X3, “EHLO”)
send(X4, “DATA”)
send(X5, T)

Constraints (graph edges):
S = S2
Z = Z2
X = X2
X2 = X3
X3 = X4
X4 = X5
StringEqual(T, Base64(Y))
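As a rough illustration, the constraints of the sample malspec can be evaluated against concrete values observed in a trace. This sketch assumes the edge labels of the figure are equalities between variables (X = X2, X2 = X3, and so on) plus StringEqual(T, Base64(Y)); the `EQUALITIES` table and the `satisfies` helper are hypothetical names.

```python
import base64

# Assumed equality constraints between malspec variables.
EQUALITIES = [("S", "S2"), ("Z", "Z2"), ("X", "X2"),
              ("X2", "X3"), ("X3", "X4"), ("X4", "X5")]

def satisfies(binding):
    """binding maps variable names to observed values (Y and T are bytes)."""
    if any(binding[a] != binding[b] for a, b in EQUALITIES):
        return False
    # StringEqual(T, Base64(Y)): the data sent is the Base64 encoding
    # of the data read from the program's own file.
    return binding["T"] == base64.b64encode(binding["Y"])

payload = b"MZ"  # made-up file contents
binding = {"S": "worm.exe", "S2": "worm.exe",
           "Z": 5, "Z2": 5,
           "X": 9, "X2": 9, "X3": 9, "X4": 9, "X5": 9,
           "Y": payload, "T": base64.b64encode(payload)}
```

A binding in which the socket handles differ, or in which the sent data is not the encoded file contents, fails the check even though the same system calls occur.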

Mining Algorithm

The Specification Mining Problem

Known malware + known benign programs → MINIMAL → specification of malicious behavior

[Figure: the output is a malspec such as the mass-mailing sample specification.]

The Basic Mining Operation

Step 1: Compute dependence graphs for the known malware and the known benign program.

Step 2: Compute graph difference, yielding the minimal malspec.

[Figure: malware dependence graph and benign dependence graph, with nodes labeled by system call; the minimal malspec is the subgraph with nodes O, O, O, Q, M, C, P.]

Multi-Program Mining

Maximal union of malspecs:

[Figure: the malspec with nodes O, O, O, Q, M, C, P vs. the smaller malspecs with nodes O, Q, P and O, O, M, C.]

System-Call Dependence Graph

• We use a dynamic analysis to construct the dependence graph
– Static analysis too imprecise on binary code
• Steps:
1. Collect system-call trace
2. Infer dependencies between system calls
3. Construct (an underapproximation of the) dependence graph

Step 1
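The three steps can be sketched as follows, assuming each trace entry already lists the values a call defines and the values it uses. The tuple layout and the `build_dependence_graph` name are assumptions made for this illustration, not the tool's actual data model.

```python
def build_dependence_graph(trace):
    """trace: list of (name, defs, uses); returns a set of (i, j) edges.

    An edge (i, j) means call j uses a value most recently defined by call i.
    """
    last_def = {}  # value -> index of the call that last defined it
    edges = set()
    for j, (name, defs, uses) in enumerate(trace):
        for value in uses:
            if value in last_def:
                edges.add((last_def[value], j))
        for value in defs:
            last_def[value] = j
    return edges

# F = open("file") ; read(F, buf) ; send(S, buf)
trace = [
    ("open", {"F"}, set()),
    ("read", {"buf"}, {"F"}),
    ("send", set(), {"S", "buf"}),
]
edges = build_dependence_graph(trace)
```

Only dependences visible in the observed values are recorded, which is one reason the resulting graph underapproximates the program's real dependences.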

Discovering Dependences

NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )

NtQueryValueKey( 372, "ComputerName", Full, {TitleIdx=0, Type=1, Name="ComputerName", Data="Z..."}, 108, 76 )

NtClose( 372 )

Def-use dependences

Substring dependences

Step 1
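In the trace above, handle 372 links the three calls by def-use, while "ComputerName" occurs inside "ActiveComputerName". The sketch below illustrates the second kind of link under a simplifying assumption: two calls are related when a string argument of one occurs inside a string argument of the other. The function name and the `min_len` cutoff are hypothetical.

```python
def substring_dependences(trace, min_len=4):
    """trace: list of (name, [string arguments]).

    Returns (i, j, s) triples where s is a string of at least min_len
    characters shared, as a substring, between calls i < j.
    """
    deps = []
    for j, (_, later_args) in enumerate(trace):
        for i in range(j):
            for a in trace[i][1]:
                for b in later_args:
                    short, long_ = sorted((a, b), key=len)
                    if len(short) >= min_len and short in long_:
                        deps.append((i, j, short))
    return deps

trace = [
    ("NtOpenKey", ["ActiveComputerName"]),
    ("NtQueryValueKey", ["ComputerName"]),
]
deps = substring_dependences(trace)
```

The length cutoff is there to avoid linking calls through very short, coincidental matches.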

Discovering Local Constraints

• Access to well-defined resources:
– Windows registry
– Access to self
– System files/directories

Step 1

NtCreateFile ( …, { …, "I-Worm.Mydoom.l.exe", … }, … )
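One way to picture a local constraint is as a label attached to a single argument. The sketch below is only an illustration: the label names, the registry prefix test, and the default system directories are all assumptions rather than details taken from the talk.

```python
def local_constraint(arg, process_name,
                     system_dirs=("c:\\windows", "c:\\winnt")):
    """Classify one string argument of a system call (hypothetical labels)."""
    a = arg.lower()
    if a.endswith(process_name.lower()):
        return "IsSelf"          # the program touches its own executable
    if a.startswith("\\registry\\"):
        return "IsRegistryKey"   # Windows registry access
    if a.startswith(system_dirs):
        return "IsSystemPath"    # system files/directories
    return None

label = local_constraint("I-Worm.Mydoom.l.exe", "I-Worm.Mydoom.l.exe")
```

With such labels, the NtCreateFile call above would be recorded as an access to self rather than as an access to one particular file name.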

Graph Differencing

Problem: Find the smallest subgraph of malicious operations that does not appear in any benign graph.

Solution: Minimal Contrast Subgraph

[Ting, Bailey SDM 2006]

Step 2
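For intuition only, here is a brute-force sketch of the problem statement on tiny graphs. It is not the algorithm of Ting and Bailey: it enumerates edge subsets of the malware graph by increasing size and keeps those that cannot be embedded, with matching labels, into the benign graph. Isolated vertices are ignored, and all names are hypothetical.

```python
from itertools import combinations, permutations

def embeds(edges, labels, b_edges, b_labels):
    """True if the labeled edge set maps injectively into the benign graph."""
    nodes = sorted({n for e in edges for n in e})
    for image in permutations(b_labels, len(nodes)):
        f = dict(zip(nodes, image))
        if all(labels[n] == b_labels[f[n]] for n in nodes) and \
           all((f[u], f[v]) in b_edges for u, v in edges):
            return True
    return False

def minimal_contrast_edge_sets(m_edges, m_labels, b_edges, b_labels):
    found = []
    for k in range(1, len(m_edges) + 1):
        for subset in combinations(sorted(m_edges), k):
            s = set(subset)
            if any(c <= s for c in found):
                continue  # contains a smaller contrast set, so not minimal
            if not embeds(s, m_labels, b_edges, b_labels):
                found.append(s)
    return found

m_labels = {0: "open", 1: "read", 2: "send"}
m_edges = {(0, 1), (1, 2)}
b_labels = {0: "open", 1: "read", 2: "write"}
b_edges = {(0, 1), (1, 2)}
result = minimal_contrast_edge_sets(m_edges, m_labels, b_edges, b_labels)
```

Here the read → send edge is the only minimal contrast, since open → read also occurs in the benign graph. The permutation search is factorial in the number of nodes, which is why exhaustive search does not scale to real dependence graphs.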

Minimal Contrast Subgraphs

• Idea: Minimal contrast subgraphs and maximal common edge sets are duals.

• Vertex and edge labels (i.e., system calls and constraint formulas) help the search.

Step 2

Mining Contrast Subgraphs

• Size of graphs: 100K-1.5M nodes, similar for edges

• Worst-case complexity: O(N!)

Step 2

[Figure: malware dependence graph and benign dependence graph.]

Heuristics Reduce Problem Size

• Normalize dependence graph

– Replace system-call sequences with shorter equivalents

• Eliminate disconnected subgraphs

• Eliminate trivial subgraphs

[see paper for details]

Evaluation

Evaluating MINIMAL

• Goals:
– Compare MINIMAL malspecs with those from human expert
– Use mined malspecs with behavior-based detector

Experimental Setup

• Trace collection in Windows 2000:
– Malware samples run with no user input (cf. expected execution model)
– Benign samples run with normal user input
– Execution for 1 or 2 minutes
• 16 malware samples:
– Netsky, MyDoom, Beagle
• 6 benign programs:
– Firefox, Thunderbird, installers

MINIMAL vs. Human Expert

Average success rate: 77.26%
Average mining time: 8 minutes

Malware     MINIMAL malspecs    Behavioral features as given by Symantec website
Netsky.A    5                   7
Netsky.H    5                   6
Bagle.C     7                   12
MyDoom.E    7                   10

Mined Malspecs for Netsky.A

MINIMAL malspecs

Create mutex

Self-installation

Modify boot sequence

Terminate antivirus

Email self as ZIP file

Copy self to network drive

Limitations of MINIMAL

• Sensitive to test environment
– Malicious behavior might not be observed during tracing.
• Underapproximation of dependence graph
– Complex constraints are not discovered.
• Sensitive to test-set selection
– Not all differences are malicious behaviors.

Future Work

Mining Specifications of Malicious Behavior

Mihai Christodorescu
mihai@cs.wisc.edu

Here Be Dragons!

Mining Malspecs from Malware

Dynamic differential analysis

Malware system-call trace

Benign program system-call trace

Malspec:

Specifying Malicious Behavior

• Example
• Now manual
• NEED: automatic


A C Q A O O O Q C F O Q O M C P F O M C

1. Only Relevant Operations

• System calls

Approach: Collect system-call traces.

2. Dependences

3. Independence of Operations

Good Specifications

• One can write specifications satisfying the requirements.

• The algorithm to generate specifications must write specifications satisfying the requirements.

Misuse-Detection Fundamentals

• Database of specifications (“signatures”) defines what is malicious.

[Figure: unknown executable, malware detector with its signature database (SigDB), and protected computers.]

Failures of Current Detectors

• Byte signatures are not rich enough to capture malicious behavior.

Hackers evade detection through obfuscation.

[Figure: the set of all mass-mailing worms, compared with an underapproximating byte signature and an overapproximating byte signature.]

Behavior-Based Detectors

• New detection techniques use higher level specifications
– Semantics-Aware Malware Detection (2005)
– Model Checking-based Malware Detection (2005)
– Behavior-based Spyware Detection (2006)

These still depend on the quality of the specifications!

MINIMAL: Malware-Mining Algorithm

Monitor system calls during execution

Known malware

NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )

System-call trace: A C Q A O O O Q C F O Q O M C P F O M C

Step 1

MINIMAL: Malware-Mining Algorithm

Discover dependences between syscalls

System-call trace: A C Q A O O O Q C F O Q O M C P F O M C

[Figure: data dependence graph over the system calls in the trace.]

Step 2

MINIMAL: Malware-Mining Algorithm

Find malicious–benign graph difference

[Figure: malware dependence graph and benign dependence graph; the malspec is the subgraph of the malware graph with nodes O, O, O, Q, M, C, P.]

Step 3

Dependence Graph Normalization

• Many sequences of system calls are equivalent:

read( …, 1 ) read( …, 1 ) read( …, 1 ) read( …, 1 ) read( …, 1 ) ≡ read( …, 5 )

Aggregation replaces such sequences with a canonical sequence. (see paper for details)

Step 2
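The read example can be sketched as a single pass that merges consecutive reads on the same handle. This covers only that one equivalence and assumes a simplified (name, handle, byte count) record; the aggregation rules in the paper may differ.

```python
def normalize_reads(trace):
    """trace: list of (name, handle, nbytes); merges adjacent reads per handle."""
    out = []
    for name, handle, nbytes in trace:
        prev = out[-1] if out else None
        if prev and name == "read" and prev[0] == "read" and prev[1] == handle:
            # Fold this read into the previous one on the same handle.
            out[-1] = ("read", handle, prev[2] + nbytes)
        else:
            out.append((name, handle, nbytes))
    return out

five_reads = [("read", 3, 1)] * 5
canonical = normalize_reads(five_reads)
```

After normalization, a program that reads five bytes one at a time and a program that reads them in one call produce the same node, so the difference no longer shows up in the graph comparison.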

MINIMAL Specs in Detection

• Missed malspecs:
– Due to unrecovered dependences (e.g., ZIP compression of data)
– Due to incompleteness of trace data (e.g., certain behaviors did not execute)
• Using mined malspecs in semantics-aware malware detection:
Netsky.A malspec → Netsky.D, E, F, …

Conclusions

• The mining of malicious behavior can be automated

• Mined malspecs compare well with those from human experts

• Mining time significantly reduced over manual specification