Fighting advanced malware using machine learning (English)
![Page 1: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/1.jpg)
FFRI, Inc.
Fourteenforty Research Institute, Inc.
FFRI, Inc. http://www.ffri.jp
@PacSec 2013
Fighting advanced malware using machine learning
Junichi Murakami, Director of Advanced Development
![Page 2: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/2.jpg)
Who am I?
• Security researcher at FFRI, Inc.
– Malware and vulnerability analysis with RCE (reverse code engineering)
– Kernel development on both Windows and Linux
• Speaker at
– BlackHat USA/JP, RSA, PacSec, AVAR, etc.
• *NOT* a holder of an MS/PhD degree in CS/Math
– Just a user of machine learning
![Page 3: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/3.jpg)
The Data Science Venn Diagram (in security)
http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.jpg
![Page 4: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/4.jpg)
I am, and this talk is
http://www.niemanlab.org/images/drew-conway-data-science-venn-diagram.jpg
![Page 5: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/5.jpg)
Agenda
• Background
• Our approach
• Future work
– Computers vs. Man
– Applying to real-time protection
• Conclusion
![Page 6: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/6.jpg)
Background: malware and its detection
[Diagram labels: increasing malware (generators, obfuscators); targeted attacks (unknown malware); limitation of pattern matching; other methods: reputation, heuristic, behavioral, machine learning, big data]
![Page 7: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/7.jpg)
Limitation of signature matching
• Evaluated the TPR (true positive rate) of 11 AV products using Metascan
• Used fresh malware (not WildList malware)
• Prepared 2 test sets from different sources and periods
– test-1: 1,000 samples
– test-2: 900 samples
Pattern | Reputation | Heur. | ML | Bhvr.
![Page 8: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/8.jpg)
Limitation of signature matching
[Chart: TPR and FNR (0% to 100%) of AV products A through K on test-1 and test-2, plus the average]
![Page 9: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/9.jpg)
Advantage of (cloud) reputation
• The concept is the same as signature matching (blacklisting)
• Endpoints don't have to keep HEAVY patterns anymore
• A new pattern is easy to propagate to the other endpoints
[Diagram: 1. query, 2. check, 3. response]
![Page 10: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/10.jpg)
Disadvantage of (cloud) reputation
[Diagram: 1. query, 2. check, 3. response]
• "Detectable" means that someone has already been attacked
• What if you are the first victim?
• How effective is it against a "Targeted Attack"?
![Page 11: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/11.jpg)
Advantage of heuristic/behavioral detection
• Based on predetermined characteristics or behaviors
– OpenProcess -> WriteProcessMemory -> CreateRemoteThread
– Registering itself at auto-start extensibility points
– Disabling Windows Firewall, etc.
• Provides generic logic to detect malware
• Signature-free
(eliminates regular scanning and updates)
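The first behavior listed on this slide can be expressed as a simple ordered-subsequence check over an API trace. The following is a minimal sketch, not the logic of any real product: the function name `contains_in_order` and the sample trace are invented for illustration.

```python
from typing import Iterable, Sequence

# Behavior named on the slide: a classic remote code injection sequence.
INJECTION_PATTERN = ("OpenProcess", "WriteProcessMemory", "CreateRemoteThread")


def contains_in_order(trace: Iterable[str], pattern: Sequence[str]) -> bool:
    """Return True if the APIs in `pattern` occur in `trace` in that order.

    Other calls may appear in between (subsequence match, not substring).
    """
    position = 0
    for api in trace:
        if position < len(pattern) and api == pattern[position]:
            position += 1
    return position == len(pattern)


# Hypothetical trace, invented for illustration only.
trace = ["LdrLoadDll", "OpenProcess", "NtClose",
         "WriteProcessMemory", "CreateRemoteThread"]
print(contains_in_order(trace, INJECTION_PATTERN))
```

A rule like this is generic (it does not depend on a file signature), but benign tools such as debuggers can produce the same sequence, which illustrates why false positives are hard to avoid.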
![Page 12: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/12.jpg)
Disadvantage of heuristic/behavioral detection
• Difficult to avoid false positives completely
• Or the user is asked to decide whether an action should be allowed or not
(user dependent)
• Malware has to be analyzed and the logic updated continuously
(Not a human task, more suitable for computers)
![Page 13: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/13.jpg)
Heuristic Behavioral Test - July 2012
http://chart.av-comparatives.org/chart1.php
• AV-Comparatives has published H/B Test results since 2012
• Behavioral detection hardly contributes to detection (avg: 4.8%)
![Page 14: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/14.jpg)
Our approach
• Behavior-based detection powered by machine learning
• Not new, but more practical for the industry
• Easy to try and automate using the open source tools below
– Cuckoo Sandbox
– Jubatus
![Page 15: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/15.jpg)
Machine learning-based detection
• Most of the research is done in academia
• Basically, it is a classification problem (task)
• Research mainly focuses on a combination of the factors below
– Features: static information, dynamic information, hybrid
– Algorithms: SVM, naive Bayes, perceptron, etc.
– Evaluation: TPR/FPR, ROC curve, accuracy, precision, etc.
• Some good results have been reported
![Page 16: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/16.jpg)
Overview
[Diagram: the original data set (malware, benign) goes through dynamic analysis in Cuckoo Sandbox, feature selection, and conversion to feature vectors; the training set is used to train Jubatus and the test set is used to test Jubatus, which yields TPR: X% and FPR: Y%]
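The two figures produced at the end of this pipeline, TPR and FPR, can be computed from the test labels and the classifier's predictions. This is a minimal sketch: the function name `rates` and the toy data are assumptions for illustration, not part of the talk's tooling.

```python
def rates(labels, predictions, positive="malware"):
    """Return (TPR, FPR) for paired ground-truth labels and predictions."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(labels, predictions):
        if truth == positive:
            if predicted == positive:
                tp += 1  # malware correctly detected
            else:
                fn += 1  # malware missed
        else:
            if predicted == positive:
                fp += 1  # benign file wrongly flagged
            else:
                tn += 1  # benign file correctly passed
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malware and one benign sample")
    return tp / (tp + fn), fp / (fp + tn)


# Toy data, invented for illustration only.
labels = ["malware"] * 4 + ["benign"] * 4
predictions = ["malware", "malware", "malware", "benign",
               "benign", "benign", "benign", "malware"]
tpr, fpr = rates(labels, predictions)
print(f"TPR: {tpr:.0%}, FPR: {fpr:.0%}")
```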
![Page 17: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/17.jpg)
How many samples should we use?
• This is a matter of "confidence interval" theory
• It depends on how large a margin of error we accept
• All of the hit rates below are 1%
– 1/100 (N=100)
– 10/1,000 (N=1,000)
– 100/10,000 (N=10,000)
• The confidence with which each can be called "1%" differs
– Each of them has a different margin of error
![Page 18: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/18.jpg)
How many samples should we use?
• We can estimate the margin of error based on N
[Chart: margin of error of the hit rate (95% confidence), 0.0% to 6.0%, versus N from 100 to 1,000,000]
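To make the dependence on N concrete, here is a minimal sketch that uses the normal approximation for a proportion, margin = z * sqrt(p(1-p)/N) with z = 1.96 for 95% confidence. The choice of this formula is an assumption; the slides do not say which interval method was used, and the approximation is rough when the number of hits is very small (for example 1 out of 100).

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def margin_of_error(hit_rate, n, z=Z_95):
    """Half-width of the normal-approximation confidence interval."""
    return z * math.sqrt(hit_rate * (1.0 - hit_rate) / n)


# The same observed 1% hit rate at different sample sizes.
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"N={n:>9,}: 1% +/- {margin_of_error(0.01, n):.3%}")
```

Because the margin shrinks with the square root of N, a tenfold reduction in error costs a hundredfold increase in samples.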
![Page 19: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/19.jpg)
Malware and benign files
• Randomly sampled from files we collected ourselves
– Malware: 15,000 (5,000 = training, 10,000 = testing)
– Benign: 15,000 (5,000 = training, 10,000 = testing)
• *Random* is very important
– Different periods (choose 15,000/N samples from every day)
– Different sources
– File type and malware type are ignored
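The per-day sampling rule above can be sketched as follows. It assumes that N is the number of collection days and that each file is tagged with its collection day. The function name `sample_per_day` and the file list are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_day(files, total, seed=None):
    """Pick about total/N files from each day, N = number of distinct days.

    `files` is an iterable of (day, path) pairs. Days with fewer files than
    the quota contribute everything they have.
    """
    rng = random.Random(seed)
    by_day = defaultdict(list)
    for day, path in files:
        by_day[day].append(path)
    if not by_day:
        return []
    per_day = total // len(by_day)
    chosen = []
    for day in sorted(by_day):
        pool = by_day[day]
        chosen.extend(rng.sample(pool, min(per_day, len(pool))))
    return chosen


# Hypothetical collection: 3 days with 10 files each.
files = [(day, f"{day}-{i}") for day in ("d1", "d2", "d3") for i in range(10)]
picked = sample_per_day(files, total=6, seed=0)
print(picked)
```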
![Page 20: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/20.jpg)
Cuckoo Sandbox - http://www.cuckoosandbox.org/
• Open source automated malware analysis system
– Sends malware into a virtual machine from a host
– Executes the malware inside the virtual machine
– Monitors and saves its behavior at runtime
• API calls, network activity, VT results, etc.
![Page 21: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/21.jpg)
API calls
![Page 22: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/22.jpg)
Trends of API calls
• Most of the samples finished calling APIs within 1s
(or kept calling APIs) -> Used only the APIs called within the first 5s
[Chart: number of samples which finished calling APIs (0 to 10,000) by elapsed time (1s, 5s, 10s, 15s, 30s, 45s, 60s, 60s+), malware vs. goodware]
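The 5-second cut-off could be applied to a trace as in the minimal sketch below. The trace format, a list of (elapsed seconds, API name) pairs, is an assumption made for illustration and is not Cuckoo Sandbox's actual report layout.

```python
def calls_within(trace, limit_seconds=5.0):
    """Keep only the API names called within the first `limit_seconds`.

    `trace` is a list of (elapsed_seconds, api_name) pairs, where elapsed
    time is measured from the start of the analysis (hypothetical format).
    """
    return [api for elapsed, api in trace if elapsed <= limit_seconds]


# Hypothetical trace, invented for illustration only.
trace = [(0.1, "LdrLoadDll"), (0.4, "NtOpenFile"),
         (4.9, "NtWriteFile"), (12.0, "NtDelayExecution")]
print(calls_within(trace))
```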
![Page 23: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/23.jpg)
Jubatus - http://jubat.us/en/
• Distributed online machine learning framework
– Distributed: scalable
– Online: not batch, continuous learning
• Open source, LGPL v2.1. Latest release is 0.4.5 (22/07/2013)
• Developed by Preferred Infrastructure, Inc. and NTT Software Innovation Center
• Supports various machine learning tasks
– Classification, regression, recommendation, anomaly detection
• Easy to use (many language bindings, feature converter, etc.)
![Page 24: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/24.jpg)
Feature selection and conversion to FV (feature vector)
Call sequence:
NtCreateProcess
NtOpenFile
NtWriteFile
NtWriteFile
NtOpenFile
NtWriteFile
NtWriteFile
...
3-grams:
NtOpenFileNtWriteFileNtWriteFile
NtWriteFileNtWriteFileNtOpenFile
NtWriteFileNtOpenFileNtWriteFile
...
Feature selection and conversion to FV (feature vector)
[Diagram: each 3-gram is labeled with the class of the sample it came from (malware or benign) and used to train Jubatus]
– NtCreateProcessNtCloseNtClose, label:benign
– NtOpenFileNtWriteFileNtWriteFile, label:malware
![Page 26: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/26.jpg)
(Image) Training internal
Calculate and update each FV's weight based on its frequency and label
(the details depend on the algorithm, called AROW; don't ask me :-)
[Illustration: a table of example weights (between 0.0 and 0.75) under the labels malware and benign, one row per 3-gram, such as "NtClose NtClose NtClose", "NtWriteFile LdrLoadDll NtDelayExecution", "NtCreateProcess NtClose NtClose", "NtAllocateVirtualMemory ... ..."]
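For readers who do want a rough idea of the update, the sketch below implements the textbook binary form of AROW (Adaptive Regularization of Weight Vectors) with a diagonal covariance. It is an illustrative assumption, not Jubatus's multi-class implementation: the class name `DiagonalAROW`, the parameter `r`, and the toy feature vectors are all invented here.

```python
class DiagonalAROW:
    """Binary AROW with a per-feature (diagonal) variance.

    Labels are +1 (malware) and -1 (benign). Features are sparse dicts
    mapping a feature name (e.g. an API 3-gram) to a value.
    """

    def __init__(self, r=1.0):
        self.r = r          # regularization: larger means smaller updates
        self.mean = {}      # current weight of each feature
        self.variance = {}  # remaining uncertainty of each weight

    def score(self, x):
        return sum(self.mean.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        margin = y * self.score(x)
        if margin >= 1.0:
            return  # already classified with enough margin
        confidence = sum(self.variance.get(f, 1.0) * v * v
                         for f, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta
        for f, v in x.items():
            var = self.variance.get(f, 1.0)
            # Uncertain weights move more; each update reduces uncertainty.
            self.mean[f] = self.mean.get(f, 0.0) + alpha * y * var * v
            self.variance[f] = var - beta * var * var * v * v


# Toy feature vectors, invented for illustration only.
x_malware = {"NtOpenFile_NtWriteFile_NtWriteFile": 1.0}
x_benign = {"NtCreateProcess_NtClose_NtClose": 1.0}

model = DiagonalAROW()
for _ in range(3):
    model.update(x_malware, +1)
    model.update(x_benign, -1)
print(model.score(x_malware), model.score(x_benign))
```

Being an online algorithm, it updates the weights one sample at a time, which matches the "not batch, continuous learning" property described for Jubatus.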
![Page 27: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/27.jpg)
Testing results
• N of N-gram
– 3~5-gram > 2-gram = 6-gram
• Best: 3-gram
– TPR: 72.33% [71.58 ~ 73.07% (95% confidence)]
– FPR: 0.77% [0.60 ~ 0.99% (95% confidence)]
• The result above is one example
– Many combinations of features are available
(we used only the API name and its sequence)
![Page 28: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/28.jpg)
Demo
![Page 29: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/29.jpg)
Future work
![Page 30: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/30.jpg)
Dumping training model
• http://blog.jubat.us/2013/06/classifier.html (Japanese only)
Investigating weight parameters of classifier
jubalocal_storage_dump.cpp
https://gist.github.com/t-abe/5746333
![Page 31: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/31.jpg)
Indicators of malware likeness in API 3-gram
[foo@nolife classifier]$ ./dump --input model --label "malware"
0.181128 api_call$VirtualProtectEx_VirtualProtectEx_VirtualProtectEx@space#log_tf/bin
0.142254 api_call$RegOpenKeyExA_NtOpenKey_NtOpenKey@space#log_tf/bin
0.137144 api_call$NtReadFile_NtReadFile_NtFreeVirtualMemory@space#log_tf/bin
0.134443 api_call$LdrLoadDll_LdrGetProcedureAddress_VirtualProtectEx@space#log_tf/bin
0.130287 api_call$LdrLoadDll_RegOpenKeyExA_NtOpenKey@space#log_tf/bin
0.130287 api_call$DeviceIoControl_LdrLoadDll_RegOpenKeyExA@space#log_tf/bin
0.122363 api_call$VirtualProtectEx_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin
0.102545 api_call$NtFreeVirtualMemory_LdrGetDllHandle_NtCreateFile@space#log_tf/bin
0.102485 api_call$RegCloseKey_RegCloseKey_RegCloseKey@space#log_tf/bin
0.0983165 api_call$NtReadFile_NtFreeVirtualMemory_LdrLoadDll@space#log_tf/bin
0.0966545 api_call$NtSetInformationFile_NtReadFile_NtFreeVirtualMemory@space#log_tf/bin
0.094639 api_call$NtMapViewOfSection_NtFreeVirtualMemory_NtOpenKey@space#log_tf/bin
0.0933827 api_call$NtFreeVirtualMemory_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin
0.0905402 api_call$DeviceIoControl_DeviceIoControl_NtWriteFile@space#log_tf/bin
0.0903766 api_call$DeviceIoControl_NtWriteFile_NtWriteFile@space#log_tf/bin
0.0884724 api_call$RegOpenKeyExW_RegOpenKeyExW_LdrGetDllHandle@space#log_tf/bin
0.0853282 api_call$LdrLoadDll_LdrLoadDll_LdrLoadDll@space#log_tf/bin
...
![Page 32: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/32.jpg)
Indicators of goodware likeness in API 3-gram
[foo@nolife classifier]$ ./dump --input model --label "goodware"
0.268353 api_call$LdrGetDllHandle_LdrGetDllHandle_ExitProcess@space#log_tf/bin
0.268353 api_call$LdrGetDllHandle_ExitProcess_NtTerminateProcess@space#log_tf/bin
0.259838 api_call$NtWriteFile_LdrGetDllHandle_LdrGetDllHandle@space#log_tf/bin
0.25887 api_call$NtWriteFile_NtWriteFile_LdrGetDllHandle@space#log_tf/bin
0.135514 api_call$NtOpenFile_NtOpenFile_NtCreateFile@space#log_tf/bin
0.122445 api_call$DeviceIoControl_LdrLoadDll_LdrGetProcedureAddress@space#log_tf/bin
0.12242 api_call$DeviceIoControl_DeviceIoControl_LdrGetDllHandle@space#log_tf/bin
0.119231 api_call$GetSystemMetrics_LdrLoadDll_NtCreateMutant@space#log_tf/bin
0.115319 api_call$DeviceIoControl_LdrGetDllHandle_LdrGetProcedureAddress@space#log_tf/bin
0.109306 api_call$LdrGetProcedureAddress_NtOpenKey_LdrLoadDll@space#log_tf/bin
0.105579 api_call$NtReadFile_NtReadFile_NtReadFile@space#log_tf/bin
0.104565 api_call$NtCreateFile_NtCreateFile_NtWriteFile@space#log_tf/bin
0.103304 api_call$RegOpenKeyExA_LdrGetDllHandle_LdrGetProcedureAddress@space#log_tf/bin
0.10306 api_call$VirtualProtectEx_RegOpenKeyExA_LdrGetDllHandle@space#log_tf/bin
0.100701 api_call$NtFreeVirtualMemory_NtFreeVirtualMemory_GetSystemMetrics@space#log_tf/bin
...
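Dump lines like the ones above can be turned back into (weight, 3-gram) pairs for further inspection. The following is a minimal sketch: the prefix and suffix are read off the output shown on these slides, and splitting on underscores assumes that no API name itself contains one. The function name `parse_dump_line` is hypothetical.

```python
PREFIX = "api_call$"
SUFFIX = "@space#log_tf/bin"


def parse_dump_line(line):
    """Split one dump line into (weight, (api1, api2, api3))."""
    weight_text, feature = line.split(None, 1)
    feature = feature.strip()
    if not (feature.startswith(PREFIX) and feature.endswith(SUFFIX)):
        raise ValueError(f"unexpected feature name: {feature!r}")
    ngram = feature[len(PREFIX):-len(SUFFIX)]
    return float(weight_text), tuple(ngram.split("_"))


# First line of the malware dump shown on the slide.
line = ("0.181128 api_call$VirtualProtectEx_VirtualProtectEx_"
        "VirtualProtectEx@space#log_tf/bin")
print(parse_dump_line(line))
```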
![Page 33: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/33.jpg)
Computer vs. Man
• "VirtualProtectEx_VirtualProtectEx_VirtualProtectEx"
looks related to malware
• How about "RegOpenKeyExA_NtOpenKey_NtOpenKey"?
• Computers might recognize indicators which humans can't
(extremely strong left-brain player)
• Why don't we cooperate with machines?
![Page 34: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/34.jpg)
Using computers
• Generate models using computers
• Humans check them and infer new logic
(using our right brain)
• ML-based detection is sometimes difficult to control
– Cannot specify strict conditions for detection
"It is detected because ML said so!"
• A hybrid of ML-based and logic-based detection would be powerful
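One possible reading of the hybrid idea is sketched below: explicit rules are checked first and give an explainable reason, and the ML score is used only when no rule fires. The slides do not specify how the two would be combined, so the function `hybrid_verdict`, the rule, the score, and the threshold are all assumptions for illustration.

```python
def hybrid_verdict(trace, ml_score, rules=(), threshold=0.0):
    """Return (verdict, reason) from explicit rules plus an ML score.

    `rules` is a sequence of (name, predicate) pairs; each predicate takes
    the API trace and returns True when its strict condition is met.
    """
    for name, predicate in rules:
        if predicate(trace):
            return "malware", f"rule:{name}"  # explainable detection
    if ml_score > threshold:
        return "malware", "ml"  # "because ML said so"
    return "benign", "none"


# Hypothetical rule derived by a human after inspecting model weights.
def triple_virtual_protect(trace):
    run = ["VirtualProtectEx"] * 3
    return any(trace[i:i + 3] == run for i in range(len(trace) - 2))


rules = [("triple_virtual_protect", triple_virtual_protect)]
trace = ["LdrLoadDll", "VirtualProtectEx", "VirtualProtectEx",
         "VirtualProtectEx"]
print(hybrid_verdict(trace, ml_score=-0.3, rules=rules))
```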
![Page 35: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/35.jpg)
Applying to real-time protection
• Using static information as features
– We can check a file before its execution
– The performance depends on the features
• Using dynamic information as features
– The malware has already been executed
– Sometimes detection would come too late
– The hybrid detection above might also be useful
from this perspective
![Page 36: Fighting advanced malware using machine learning (English)](https://reader033.fdocuments.net/reader033/viewer/2022052315/555c4203d8b42a0b038b4edc/html5/thumbnails/36.jpg)
Conclusion
• Traditional pattern matching has reached its limit
• Current behavioral detection hardly contributes to detection
• By applying ML to behavioral detection
– TPR is improved
– Computers recognize new features which humans can't
– We should make use of them