MutantX-S: Scalable Malware Clustering Based on Static Features
Xin Hu, IBM T.J. Watson Research Center; Sandeep Bhatkar and Kent Griffin, Symantec Research Labs; Kang G. Shin, University of Michigan
2015. 04. 21
박종화
Computer Security & Operating Systems (OS) Lab.
Index
1. Motivation
2. Architecture
3. Generic Unpacking Algorithm
4. Feature Extraction
5. Prototype-Based Clustering
6. Evaluation
Motivation
Why cluster malware? Automatic analysis and labeling of the large number of incoming malware samples is currently lacking.
How can this huge influx of new samples be processed efficiently and labeled accurately?
[Figure: malware samples grouped into Family 1, Family 2, Family 3, and Family 4]
One possible solution is to automatically cluster malware samples, in order to:
- Prioritize limited resources
- Avoid analyzing samples that have already been analyzed
- Label new incoming samples by association
- Generalize previous detection and mitigation strategies to new variants
Architecture
MutantX-S is a framework developed to automatically cluster malware. It does so by analyzing a program's static features (assembly code).
Process:
1. Preprocess
2. Feature Extraction
3. Clustering
Generic Unpacking Algorithm
- Exploits an inherent property of the unpacking process: a packed binary has to write the unpacked code into some memory space and then transfer control to the modified memory locations to continue execution.
- Tracks memory accesses using the no-execute (NX) support in modern x86 CPUs and operating systems.
Step 1: The packed malware (unpacker code plus packed data) is loaded into process memory, and all of its memory pages are made executable but non-writable (W = 0, X = 1).
Step 2: When the unpacker code writes unpacked code into one of these pages, the memory write to a non-writable page raises a write (W) exception.
Step 3: On the write exception, the page is marked as dirty and its permissions are flipped to writable but non-executable (W = 1, X = 0), so the write can proceed.
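Step 4: When unpacking finishes, the pages written by the unpacker are all W = 1, X = 0. Control then transfers to the unpacked code in one of these dirty pages, which raises an execute (X) exception. At that point the process memory image is dumped, and the resulting unpacked malware is passed to a disassembler.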
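The unpacking procedure above can be modelled as a small state machine. The sketch below is a toy, user-space simulation under stated assumptions: it does not touch real page tables or NX bits, the names `UnpackTracker`, `write`, and `execute` are hypothetical, and it ignores complications a real tracker must handle (for example, unpacker code that writes to its own page, or multi-layer packing). It only shows the W/X logic: a write to a non-writable page marks it dirty and flips it to W = 1, X = 0, and the first jump into such a page triggers the memory dump.

```python
from dataclasses import dataclass, field
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on typical x86 systems


@dataclass
class Page:
    writable: bool = False   # W bit: initially non-writable
    executable: bool = True  # X bit: initially executable
    dirty: bool = False      # set once the unpacker has written to the page
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))


class UnpackTracker:
    """Toy model of W/X page tracking used to detect the end of unpacking."""

    def __init__(self, num_pages: int) -> None:
        # All pages start executable but non-writable (W = 0, X = 1).
        self.pages = [Page() for _ in range(num_pages)]
        self.dump: Optional[bytes] = None

    def write(self, addr: int, value: int) -> None:
        page = self.pages[addr // PAGE_SIZE]
        if not page.writable:
            # Simulated write (W) exception: mark dirty, flip to W = 1, X = 0.
            page.dirty = True
            page.writable, page.executable = True, False
        page.data[addr % PAGE_SIZE] = value

    def execute(self, addr: int) -> bool:
        """Simulate a control transfer; return True if it ends unpacking."""
        page = self.pages[addr // PAGE_SIZE]
        if not page.executable:
            # Simulated execute (X) exception on a dirty page: the unpacked
            # code is about to run, so dump the whole memory image.
            self.dump = b"".join(bytes(p.data) for p in self.pages)
            return True
        return False


# Usage: page 0 holds the unpacker, page 1 receives the unpacked code.
tracker = UnpackTracker(num_pages=3)
tracker.execute(0)  # unpacker runs normally, no exception
for offset, byte in enumerate(b"\x55\x89\xe5"):
    tracker.write(PAGE_SIZE + offset, byte)  # unpacker writes code into page 1
unpacked = tracker.execute(PAGE_SIZE)  # jump into page 1 raises the X exception
```

In this model the dump happens exactly once, at the first execute fault, which is what makes the approach generic: it relies only on the write-then-execute property and not on any particular packer.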
Feature Extraction
MutantX-S uses IDA Pro to disassemble a malware program into a sequence of machine instructions, which are then used for feature extraction.
Malware samples are compared for similarity based on their disassembled instruction sequences.
MutantX-S uses the opcodes of the instructions because:
- Opcodes generalize well to represent variants of a malware family.
- Opcode sequences offer a better representation of instruction semantics.
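As a rough illustration of why opcodes generalize across variants, the sketch below keeps only the mnemonic of each instruction and discards operands. It assumes a simplified `address: mnemonic operands` text listing; real IDA Pro output uses a different, richer format, and the function name `extract_opcodes` is hypothetical.

```python
import re

# Assumed listing format (hypothetical, not IDA Pro's exact output):
#   "<hex address>: <mnemonic> [operands] [; comment]"
INSTR_RE = re.compile(r"^\s*[0-9A-Fa-f]+:\s+([A-Za-z][A-Za-z0-9]*)")


def extract_opcodes(listing: str) -> list[str]:
    """Return the opcode (mnemonic) of each instruction, dropping operands."""
    opcodes = []
    for line in listing.splitlines():
        match = INSTR_RE.match(line)
        if match:
            opcodes.append(match.group(1).lower())
    return opcodes


variant_a = """
00401000: push ebp
00401001: mov  ebp, esp
00401003: xor  eax, eax   ; clear return value
00401005: pop  ebp
00401006: retn
"""
# Same instruction structure with different registers and addresses.
variant_b = """
00402000: push ebx
00402001: mov  ebx, esp
00402003: xor  ecx, ecx
00402005: pop  ebx
00402006: retn
"""

print(extract_opcodes(variant_a))  # ['push', 'mov', 'xor', 'pop', 'retn']
print(extract_opcodes(variant_a) == extract_opcodes(variant_b))  # True
```

Register renaming and relocation change the operands and addresses but leave the opcode sequence intact, so both variants map to the same feature input.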
N-gram analysis is used to embed the features into feature vectors:
- The number of dimensions D determines the complexity.
- D increases exponentially with N in the N-gram: D = |O|^N, where |O| is the number of different opcodes.
Hashing kernel:
- Reduces the dimensionality of the feature vector
- Saves both storage and computation overhead
- Incurs only a small penalty on the feature-vector distance
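A minimal sketch of the hashing-kernel idea: instead of allocating one dimension per distinct opcode N-gram, each N-gram is hashed into one of 2^b buckets, so the vector has a fixed size regardless of N. CRC32 as the hash, unit-length normalization, Euclidean distance, the default N = 4, and the function names are all assumptions for illustration; the slides do not specify them.

```python
import math
import zlib
from collections import Counter


def hashed_ngram_vector(opcodes: list[str], n: int = 4, bits: int = 12) -> Counter:
    """Count opcode n-grams, hashing each one into one of 2**bits dimensions."""
    mask = (1 << bits) - 1
    vec: Counter = Counter()
    for i in range(len(opcodes) - n + 1):
        gram = " ".join(opcodes[i:i + n])
        # CRC32 is used here only as a stable, deterministic hash (assumption).
        vec[zlib.crc32(gram.encode()) & mask] += 1
    return vec


def normalize(vec: Counter) -> dict[int, float]:
    """Scale the sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def distance(a: dict[int, float], b: dict[int, float]) -> float:
    """Euclidean distance between two sparse vectors."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


sample_1 = ["push", "mov", "sub", "mov", "call", "add", "pop", "retn"]
sample_2 = ["push", "mov", "sub", "mov", "call", "add", "xor", "retn"]
v1 = normalize(hashed_ngram_vector(sample_1, n=2))
v2 = normalize(hashed_ngram_vector(sample_2, n=2))
print(len(v1), distance(v1, v2))
```

Storing only the non-zero buckets keeps each vector small, and two distinct N-grams that collide in a bucket merely add their counts, which is the "small penalty" on distance mentioned above.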
Prototype-Based Clustering
Prototypes are selected iteratively; the process repeats until the distance from every data point to its nearest prototype is smaller than a predefined threshold Pmax.
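The slide only states the stopping rule. One common way to realize it is farthest-first selection: start from one point, repeatedly promote the point farthest from its nearest prototype, and stop once that distance drops below Pmax. The sketch below assumes this strategy, Euclidean distance, and dense tuples as input; the function name `extract_prototypes` is hypothetical, and the paper's exact procedure (including how prototypes are later turned into final clusters) may differ.

```python
import math
from typing import Sequence


def extract_prototypes(points: Sequence[Sequence[float]], p_max: float):
    """Farthest-first prototype selection (assumed strategy).

    Returns (prototype indices, index of each point's nearest prototype).
    """
    if p_max <= 0:
        raise ValueError("p_max must be positive")
    prototypes = [0]                 # assumption: start from the first point
    nearest = [0] * len(points)      # nearest prototype of each point
    dist = [math.dist(p, points[0]) for p in points]
    while True:
        far = max(range(len(points)), key=lambda i: dist[i])
        if dist[far] < p_max:
            # Every point is now closer than p_max to some prototype.
            return prototypes, nearest
        prototypes.append(far)
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < dist[i]:
                dist[i], nearest[i] = d, far


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
protos, assignment = extract_prototypes(points, p_max=1.0)
print(protos)      # [0, 4, 2]
print(assignment)  # [0, 0, 2, 2, 4]
```

Each iteration costs one pass over the data, so the total cost grows with (number of prototypes) x (number of samples) rather than with the square of the number of samples, which is why prototype-based approaches scale to large data sets.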
Evaluation
Data sets:
- Reference data set: 4,821 samples
- Large data set: 132,234 samples
System configuration:
- Intel Core i7 3.0 GHz CPU, 12 GB memory
Clustering Accuracy and Running Time
Comparison with existing clustering methods:
- MutantX-S: less than 30 s
- Hierarchical clustering: 51.3 s (precision 0.82)
- k-means: 32.3 s (precision 0.75)
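The slide reports precision without defining it. A commonly used definition for malware clustering is: for each cluster, count the samples that carry the cluster's most frequent reference label, sum over all clusters, and divide by the total number of samples. The sketch below assumes that definition (the deck does not confirm it); `cluster_precision`, the sample names, and the family labels are made up for illustration.

```python
from collections import Counter, defaultdict


def cluster_precision(assignment: dict[str, int], labels: dict[str, str]) -> float:
    """Share of samples whose reference label is the majority label of their cluster."""
    members = defaultdict(list)
    for sample, cluster_id in assignment.items():
        members[cluster_id].append(labels[sample])
    majority_total = sum(Counter(ls).most_common(1)[0][1] for ls in members.values())
    return majority_total / len(assignment)


assignment = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1}
labels = {"s1": "FamA", "s2": "FamA", "s3": "FamB", "s4": "FamB", "s5": "FamB"}
print(cluster_precision(assignment, labels))  # 0.8
```

Precision alone rewards splitting: putting every sample in its own cluster gives 1.0, so it is usually reported together with a complementary measure such as recall.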
Impact of Hash Size
In practice, a 12-bit hash function was found to be a good compromise, reducing the time and memory requirements by over 80% while still maintaining good accuracy.
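To get a feel for the trade-off, the sketch below hashes the same set of opcode 4-grams at several hash sizes and counts how many distinct 4-grams end up sharing a bucket. The opcode vocabulary and stream are synthetic, CRC32 stands in for the unspecified hash function, and `occupied_buckets` is a hypothetical helper; the printed numbers are not the paper's measurements.

```python
import random
import zlib


def occupied_buckets(ngrams: set[str], bits: int) -> int:
    """Number of distinct hash buckets the n-grams fall into at a given hash size."""
    mask = (1 << bits) - 1
    return len({zlib.crc32(g.encode()) & mask for g in ngrams})


random.seed(0)
# Synthetic data for illustration only: 50 hypothetical opcodes, random stream.
vocab = [f"op{i}" for i in range(50)]
stream = [random.choice(vocab) for _ in range(20_000)]
ngrams = {" ".join(stream[i:i + 4]) for i in range(len(stream) - 3)}

print("exact dimensions |O|**N =", len(vocab) ** 4)
for bits in (8, 12, 16, 20):
    used = occupied_buckets(ngrams, bits)
    print(bits, "bits:", 1 << bits, "dimensions,",
          len(ngrams) - used, "of", len(ngrams), "4-grams share a bucket")
```

Fewer bits mean a smaller vector but more collisions; the 12-bit choice reported above is the point where the authors judged the accuracy loss from collisions acceptable.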
Thank you!