MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models
-
Upload
emiliano-de-cristofaro -
Category
Technology
-
view
140 -
download
0
Transcript of MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models
![Page 1: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/1.jpg)
MaMaDroid: Detecting Android Malware by Building Markov Chains of
Behavioral Models
![Page 2: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/2.jpg)
Android & Malware
Android market share is growing…In 2016, 85% of smartphone sales
…and so is the interest of cybercriminalsBypassing two-factor authenticationStealing sensitive information, etc.
2
![Page 3: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/3.jpg)
Current Defenses
Can’t use complex on-device operationsLimited battery and memory resources
Google’s centralized analysisNot perfect, after-the-factMany apps installed outside Play Store
Lots of research in the field! However…Permission-based models prone to false positiveRelying on API calls frequently used by malware needs constant, costly retraining
3
![Page 4: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/4.jpg)
Our Idea
Rely on the sequence of abstracted calls1. Sequence captures the behavioral model2. Abstraction provides resilience to API changes
Intuition: malware uses calls for different actions and in different order than benign apps
E.g. android.media.MediaRecorder used by any app with permission to record audioOnly using it after calls to getRunningTasks(), which allows to record conversations, may suggest maliciousness
4
![Page 5: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/5.jpg)
Overview
5
![Page 6: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/6.jpg)
Call Graph Extraction
Based on static analysisGiven an apk, extract call graphs
ToolsSoot (Java optimization and analysis framework)FlowDroid (ensures contexts & flows preserved)
6
![Page 7: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/7.jpg)
7
![Page 8: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/8.jpg)
Call Graph
8
![Page 9: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/9.jpg)
Overview
9
![Page 10: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/10.jpg)
Sequence Extraction
Soot gives the sequence of functions that are potentially called by the program, but…
Each execution could take a specific branch of the graph and only execute a subset of the calls
When running example multiple times…Execute() may be followed by different calls, e.g., getShell() only in try or getShell() + getMessage() in catch
10
![Page 11: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/11.jpg)
Sequence Extraction (cnt’d)
We proceed as follows…1. Identify set of entry nodes2. Enumerate reachable paths3. Output set of all paths as the sequences of API calls
11
![Page 12: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/12.jpg)
Abstraction
PackagesUsing the list of 243 packages (as of API level 24) + 95 from the Google APIPackages defined by developers à “self-defined”If we can’t tell what its class implements à “obfuscated”
Families9 families: android, google, java, javax, xml, apache, junit, json, domPlus self-defined and obfuscated
12
![Page 13: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/13.jpg)
Example
13
![Page 14: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/14.jpg)
Overview
14
![Page 15: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/15.jpg)
Markov Chain
Memoryless modelsProb. transitioning from a state to another only depends on the current state
Represented as a set of nodesEach corresponding to a different state, and a set of edges labeled with the probability of transition.
Sum of all probabilities associated to all edges from any node is exactly 1
15
![Page 16: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/16.jpg)
Markov-chain based modeling
Building the Markov ChainsFrom the sequence of abstracted API calls, each package/family is a state, transition is the probability of moving from one to another
16
![Page 17: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/17.jpg)
Feature Extraction
For each app:Feature vector = probabilities of transitioning from one state to another in the Markov chainWith families, 11 possible states à 121 possible transitions in each chainWith packages, 340 states à 115,600 transitions
Principal Component Analysis (PCA)Standard way to reduce/refine features
17
![Page 18: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/18.jpg)
Overview
18
![Page 19: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/19.jpg)
Classification
Build a classifier using the extracted featuresEach app labeled as benign or malware
Can use a few standard algorithms for this task…Random Forests1-NN, 3-NNSVMMaybe deep learning?
19
![Page 20: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/20.jpg)
Datasets
20
![Page 21: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/21.jpg)
How many API calls?
21
![Page 22: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/22.jpg)
Android/Google family calls?
22
![Page 23: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/23.jpg)
Evaluation
(1) Accuracy of classification on benign and malicious samples developed around the same time
(2) Robustness to the evolution of malware as well as of the Android framework (using older datasets for training and newer ones for testing and vice-versa)
23
![Page 24: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/24.jpg)
Same Year
24
family
package
![Page 25: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/25.jpg)
Training on older samples
25
family
package
![Page 26: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/26.jpg)
Training on newer samples
26
family
package
![Page 27: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/27.jpg)
MaMaDroid vs DroidAPIMiner
27
![Page 28: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/28.jpg)
Case Studies (2016/newbenign)
False Positives (164 samples)Most of them “dangerous permissions”E.g., SMS permissions not clear why requested
False Negatives (114 samples)Actually not classified as malware by VirusTotal, might actually be legitimateMost of them adware
28
![Page 29: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/29.jpg)
Evasion
Repackaging benign appsDifficult to embed malicious code while keeping similar Markov chain, viceversa is also hard
Imitating Markov chainsLikely ineffective
Obfuscation/ManglingStill captured by the [obfuscated] abstraction
More in the paper…29
![Page 30: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/30.jpg)
Limitations
Classification is memory hungry
Soot is buggy, we lose ~4% of the samples
Limits of static analysis only methods
30
![Page 31: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/31.jpg)
Future Work
Further investigate resilience to evasionFocus on repackaged malicious appsInjection of API calls to mess with Markov chains
EnhancementsFine-grained abstractions (e.g., class)Seed with dynamic analysis
31
![Page 32: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/32.jpg)
Thank you!
32
Paper to appear at NDSS 2017:E. Mariconti, L. Onwuzurike, P. Andriotis,E. De Cristofaro, G. Ross, G. Stringhini.MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Model
![Page 33: MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models](https://reader030.fdocuments.net/reader030/viewer/2022021502/589a3b211a28ab8c588b5327/html5/thumbnails/33.jpg)
Thank you!
33