Post on 29-Jan-2016
Mining Specifications of Mining Specifications of Malicious BehaviorMalicious Behavior
Mihai Christodorescu (work done atUniversity of Wisconsin)
Somesh Jha University of Wisconsin
Christopher Kruegel Technical University Vienna
IBM Research
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 2
Wide Spectrum of DetectorsWide Spectrum of Detectors
• Static detectors:
Semantics-aware malware detection [Christodorescu et al. 2005]
Model checking-based malware detection[Kinder et al. 2005]
Behavior-based spyware detection[Kirda et al. 2006]
Shadow Honeypots [Anagnostakis et al. 2005]
•Dynamic/hybrid detectors, host IDS:
……
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 3
Misuse DetectionMisuse Detection
Distinct techniques, fundamentally similar…
They all require high-quality specifications of malicious behavior.
……
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 4
Current Specification Current Specification GenerationGeneration• Specifications are manually developed
by experts.
Two issues:– Time consuming– Error prone
? Spec
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 5
Problem 1: Problem 1: Specification Specification DelayDelayTime from appearance of new malware
to availability of specification:• Manual analysis of the binary• Testing of the specification
Anti-virus industry: 4-18 hours to generate a new specification.
window of vulnerability
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 6
Problem 2: Problem 2: Spec. ImprecisionSpec. Imprecision
Too general = false positives Angry users
Infected machinesToo specific = false negatives
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 7
Our SolutionOur Solution
MINIMAL: a technique for mining malicious specifications
• Automatic• Flexible specification language• Fast• Performs well (compared to a human
expert)
Specification LanguageSpecification Language
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 9
What’s In a Specification?What’s In a Specification?
Requirements for obfuscationresilience:
1. Describe only relevant operations
2. Capture dependencies where present
3. Preserve independence of operations
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 10
Specifying Malicious Specifying Malicious OperationsOperations• We chose system calls
– Compatible with specifications for behavior-based detectors
– Define interface between trusted OS and untrusted programs
• Mining algorithm is not restricted to the system-call interface.
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 11
Specifying Malicious Specifying Malicious ConstraintsConstraints• Program operations are insufficient to
distinguish malicious from benign.
• We need to capture relations between operations:
F=open(“file”) ; read(F,buf) ; send(S,buf)
Constraints = logical formulas over system-call arguments
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 12
A Sample Specification A Sample Specification (Malspec)(Malspec)• Mass-mailing malware:
Y))T,Base64(l(StringEqua
send(X4,“DATA”)
X:=socket()
connect(X2)
send(X3,“EHLO”)
Y:=read(Z2)
send(X5,T)
Z:=open(S2)
S:=process_name()
2XX
32 XX
43 XX
54 XX
2SS
2ZZ
Mining AlgorithmMining Algorithm
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 14
The Specification Mining The Specification Mining ProblemProblem
Y))T,Base64(l(StringEqua
send(X4,“DATA”)
X:=socket()
connect(X2)
send(X3,“EHLO”)
Y:=read(Z2)
send(X5,T)
Z:=open(S2)
S:=process_name()
2XX
32 XX
43 XX
54 XX
2SS
2ZZ
Known malware
Known benign programsMINIMAL
Specification of malicious behavior
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 15
The Basic Mining OperationThe Basic Mining Operation
Known malware
Known benign program
C Q A
A O O O Q C
Q M C P
O F
O
F M C
A
Malware dependence graph
O O A A B O
R R R M M A A
R RC C R
A W
O W
C C
Benign dependence graph
O O O Q
M C P
Minimalmalspec
Compute dependence graphsStep 1 Compute graph differenceStep 2
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 16
Multi-Program MiningMulti-Program Mining
Maximal union of malspecs:
O O O Q
M C Pvs.
O Q
P
O O
M C
O O O Q
M C P
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 17
System-Call Dependence System-Call Dependence GraphGraph• We use a dynamic analysis to construct
the dependence graph– Static analysis too imprecise on binary
code
• Steps:1. Collect system-call trace2. Infer dependencies between system calls3. Construct (an underapproximation of the)
dependence graph
Step 1
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 18
Discovering DependencesDiscovering Dependences
NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )
NtQueryValueKey( 372, "ComputerName", Full, {TitleIdx=0, Type=1,Name="ComputerName", Data="Z...“}, 108, 76 )
NtClose( 372 )
Def-UseDependenc
es
SubstringDependenc
es
Step 1
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 19
Discovering Local ConstraintsDiscovering Local Constraints
• Access to well-defined resources:– Windows registry– Access to self– System files/directories
Step 1
NtCreateFile ( …, { …, "I-Worm.Mydoom.l.exe", … }, … )
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 20
Graph DifferencingGraph Differencing
Problem:Find the smallest subgraph of malicious operations that does not appear in any benign graph.
Solution:Minimal Contrast Subgraph
[Ting, Bailey SDM 2006]
Step 2
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 21
Minimal Contrast SubgraphsMinimal Contrast Subgraphs
• Idea:Minimal contrast subgraphs andmaximal common edge setsare duals.
• Vertex and edge labels (i.e., system calls and constraint formulas) help the search.
Step 2
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 22
Mining Contrast SubgraphsMining Contrast Subgraphs
• Size of graphs:100K-1.5M nodes, similar for edges
• Worst-case complexity: O(N!)
Step 2
C Q A
A O O O Q C
Q M C P
O F
O
F M C
AMalware dependence graph
O O A A B O
R R R M M A A
R RC C R
A W
O W
C C
Benign dependence graph
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 23
Heuristics Reduce Problem Heuristics Reduce Problem SizeSize• Normalize dependence graph
– Replace system-call sequences with shorter equivalents
• Eliminate disconnected subgraphs
• Eliminate trivial subgraphs
[see paper for details]
EvaluationEvaluation
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 25
Evaluating Evaluating MMINIINIMMALAL
• Goals:– Compare MINIMAL malspecs with those
from human expert– Use mined malspecs with behavior-based
detector
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 26
Experimental SetupExperimental Setup
• Trace collection in Windows 2000:– Malware samples run with no user input
(cf. expected execution model)– Benign samples run with normal user input– Execution for 1 or 2 minutes
• 16 malware samples:– Netsky, MyDoom, Beagle
• 6 benign programs:– Firefox, Thunderbird, installers
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 27
MMINIINIMMALAL vs. Human Expert vs. Human Expert
Average success rate: 77.26%Average mining time: 8 minutes
MMINIINIMMALAL malspecs
Netsky.A 5 7
Netsky.H 5 6
Bagle.C 7 12
MyDoom.E
7 10
Behavioral features as
given by Symantec website.
Behavioral features as
given by Symantec website.
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 28
Mined Malspecs for Netsky.AMined Malspecs for Netsky.A
MMINIINIMMALAL malspecs
Create mutex
Self-installation
Modify boot sequence
Terminate antivirus
Email self as ZIP file
Copy self to network drive
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 29
Limitations of Limitations of MMINIINIMMALAL
• Sensitive to test environment– Malicious behavior might not be observed
during tracing.
• Underapproximation of dependence graph– Complex constraints are not discovered.
• Sensitive to test-set selection– Not all differences are malicious behaviors.
Future WorkFuture Work
Mining Specifications of Mining Specifications of Malicious BehaviorMalicious Behavior
Mihai Christodorescumihai@cs.wisc.edu
Here Be Dragons!Here Be Dragons!
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 32
Mining Malspecs from Mining Malspecs from MalwareMalwareDynamic differential analysis
Malware system-call trace
Benign program system-call trace
Malspec:
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 33
Specifying Malicious BehaviorSpecifying Malicious Behavior
• Example• Now manual• NEED: automatic
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 34
A C Q A O O O Q C F O Q O M C P F O M C
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 35
1.1. Only Relevant OperationsOnly Relevant Operations
• System calls
Approach: Collect system-call traces.
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 36
2.2. DependencesDependences
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 37
3.3. Independence of Independence of OperationsOperations
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 38
Good SpecificationsGood Specifications
• One can write specifications satisfying the requirements.
• The algorithm to generate specifications must write specifications satisfying the requirements.
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 39
Misuse-Detection Misuse-Detection FundamentalsFundamentals
• Database of specifications (“signatures”) defines what is malicious.
Protected computerProtected computerProtected computer
Malware detector
Protected computerUnknown executable Protected computer
SigDB
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 40
• Byte signatures are not rich enough to capture malicious behavior.
Hackers evade detection through obfuscation.
Failures of Current DetectorsFailures of Current Detectors
All mass-mailing worms
Underapproximating byte signature
Overapproximating byte signature
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 41
Behavior-Based DetectorsBehavior-Based Detectors
• New detection techniques use higher level specificationsSemantics-Aware Malware Detection (2005)Model Checking-based Malware Detection (2005)Behavior-based Spyware Detection (2006)
These still depend on the quality of the specifications!
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 42
MMINIINIMMALAL: Malware-Mining : Malware-Mining AlgorithmAlgorithm
Known malware
NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName",
0x40, 0, 0} )
Monitor system calls during Monitor system calls during executionexecution
System-call traceA C Q A O O O Q C F O Q O M C P F O M C
Step 1
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 43
MMINIINIMMALAL: Malware-Mining : Malware-Mining AlgorithmAlgorithm
Discover dependences between Discover dependences between syscallssyscalls
System-call traceA C Q A O O O Q C F O Q O M C P F O M C
Data dependence
graph
C Q A
A O O O Q C
Q M C P
O F
O
F M C
A
Step 2
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 44
MMINIINIMMALAL: Malware-Mining : Malware-Mining AlgorithmAlgorithm
Find malicious–benign graph Find malicious–benign graph differencedifference
C Q A
A O O O Q C
Q M C P
O F
O
F M C
A
Malspec
Malware dependence graph
O O O Q
M C P
O O A A B O
R R R M M A A
R RC C R
A W
O W
C C
Benign dependence graph
Step 3
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 45
Dependence Graph Dependence Graph NormalizationNormalization• Many sequences of system calls are
equivalent:
Aggregation replaces such sequences with a canonical sequence. (see paper for details)
Step 2
read( …, 1 )read( …, 1 )read( …, 1 )read( …, 1 )read( …, 1 )
read( …, 5 )≡
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 46
MMINIINIMMALAL Specs in Detection Specs in Detection
• Missed malspecs:– Due to unrecovered dependences
(e.g., ZIP compression of data)– Due to incompleteness of trace data
(e.g., certain behaviors did not execute)
• Using mined malspecs in semantics-aware malware detection:Netsky.A malspec Netsky.D, E, F, …
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 47
ConclusionsConclusions
• The mining malicious of behavior can be automated
• Mined malspecs compare well with those from human experts
• Mining time significantly reduced over manual specification