"Fast detection of Android malware with...

The talk covers the use of machine learning algorithms to detect malicious Android applications. I will explain how Yandex designed a high-performance tool for this task based on MatrixNet, and demonstrate the cases in which analytical methods of malware detection help block many simple samples of malicious code. We will then discuss how such methods can be improved to detect more sophisticated malware.

Transcript of "Fast detection of Android malware with...

2. Fast detection of Android malware. Yury Leonychev
3. Introduction
4. Android application (APK)
   • Manifest (AndroidManifest.xml)
   • Code (classes.dex and native libraries)
   • Meta information (META-INF)
   • Resources (files and resources.arsc)
5. Brief list of tools for APK analysis
   • Androguard (the ultimate tool by @adesnos and others), used by VirusTotal, APKInspector, etc.
   • SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
   • TaintDroid (guys from Intel, Penn State University, and Duke University)
   • DroidBox (dynamic analysis by Patrik Lantz), used by ApkScan
6. Is this all? Really?
   • http://www.apk-analyzer.net
   • http://anubis.iseclab.org
   • http://apkscan.nviso.be
7. Our task is more complex: a malware detector
8. Methods of malware detection: static analysis
   • Advantages: an APK has predictable content; application behavior can be learned by simply reading the file; checks are safe
   • Limitations: can be ineffective against sophisticated malware and obfuscation techniques; we cannot be certain, since we don't execute the app
9. Methods of malware detection: dynamic analysis
   • Advantages: clear results and interpretation; open-source solutions are available
   • Limitations: not fast (enough); can be detected and bypassed; a big ecosystem requires a big infrastructure
10. Methods of malware detection: signature analysis
   • Advantages: effective for known malware; commercial solutions are available
   • Limitations: signature databases require regular (and frequent) updates; not effective for new malware; do you have a team of virus analysts?
11. Methods of malware detection: the most efficient way seems to be a hybrid solution
12. MatrixNet: What is The Matrix?
13. Why can we use machine learning? Abstract task description:
   • We have a set of objects (APK files), and we should divide it into two subsets: malware and normal
   • For every element of the set we can compute a predictable number of features
   • The subsets are just the result of a simple classification task, so we can try to choose effective features
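An APK is a ZIP archive, so the four parts listed on slide 4 can be located with Python's standard zipfile module. A minimal sketch, assuming the usual entry names (AndroidManifest.xml, classes*.dex, native libraries under lib/, META-INF/, resources.arsc and res/); the function name and the category labels are hypothetical, not part of the author's tool.

```python
import io
import zipfile
from collections import defaultdict


def group_apk_entries(apk_bytes: bytes) -> dict:
    """Sort the entries of an APK (a ZIP file) into the four parts of an app.

    The category labels are illustrative only, not an official API.
    """
    parts = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                parts["manifest"].append(name)
            elif name.endswith(".dex") or (name.startswith("lib/") and name.endswith(".so")):
                # Dalvik bytecode (classes.dex, classes2.dex, ...) and native libraries
                parts["code"].append(name)
            elif name.startswith("META-INF/"):
                # Signature and certificate files
                parts["meta"].append(name)
            else:
                # resources.arsc, res/, assets/ and everything else
                parts["resources"].append(name)
    return dict(parts)


if __name__ == "__main__":
    # Build a tiny fake APK in memory to show the grouping.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for entry in ["AndroidManifest.xml", "classes.dex", "lib/armeabi/libfoo.so",
                      "META-INF/MANIFEST.MF", "resources.arsc", "res/layout/main.xml"]:
            z.writestr(entry, b"")
    print(group_apk_entries(buf.getvalue()))
```

On a real file the same function can be called with the bytes read from disk, e.g. `group_apk_entries(open(path, "rb").read())`. Note that AndroidManifest.xml inside an APK is binary XML, so reading permissions from it needs a dedicated parser such as the ManifestParser described later.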
14. What is MatrixNet? MatrixNet is an implementation of the gradient boosted decision trees algorithm. It differs a bit from the standard algorithm:
   • It uses oblivious trees
   • It accounts for the sample count in each leaf
15. Why is MatrixNet powerful?
   • It is a machine learning algorithm for classification tasks
   • A key feature of this method is its resistance to overfitting
16. MatrixNet post-learning optimization
17. MatrixNet post-learning optimization. Copyright 2013 by Sidney Harris.
18. How does it work? Offline learning process:
   • Choosing features
   • Choosing samples
   • Manual classification (malware or not)
   • Learning on the combined set of apps
   • Calculating mistakes
19. Features. What kind of features to use:
   • Permissions
   • URIs in strings and other resources
   • Adware library usage
   • Obfuscation methods
20. Samples and classification
   • Malware applications: VirusTotal feed; samples from malicious sites
   • Normal applications: manual testing; trusted developers; Yandex applications
21. Formula (diagram: features, normal and malware samples, MatrixNet learning, and the resulting formula with feature weights and feature costs)
22. Measuring mistakes (diagram: Formula 1 with features cost 1 through Formula N with features cost N, each evaluated on normal and malware samples; the goal is a formula with a cool confusion matrix and low cost)
23. Analyzer architecture: "Fine! I'll go build my own casino, with blackjack and big data"
24. Main parts: Parsers, Analyzers, Oracle, Report
25. Parsers in depth
   • Parsers (input: APK): ManifestParser, ResourceParser, MetaInfoParser, ClassesParser
   • Analyzers: PermissionAnalyzer, PackageAnalyzer, URLAnalyzer, ReflectionAnalyzer
   • Reports: XHTMLReporter, JSONReporter
   • Oracle: MatrixNet
26. ManifestParser: avoids some obfuscation methods, e.g. HEUR:Backdoor.AndroidOS.Obad.a
27. ManifestParser
28. ClassesParser
   • Parser for DEX files
   • Internal DEX disassembler
   • Call graph builder
   • Embeds real function and variable names into the disassembly listing
   • Builds a list of used procedures and functions
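Slide 14 says MatrixNet uses oblivious trees: every node on the same depth level applies the same (feature, threshold) test, so a tree of depth d is just d tests plus a table of 2^d leaf values, and the leaf index is the bit string of test outcomes. The sketch below shows only how such an ensemble could be evaluated, with the "score > 0 means malware" convention from the confusion-matrix slide. MatrixNet's training procedure and its per-leaf sample-count accounting are not shown, and the trees and features are made-up numbers for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObliviousTree:
    # One (feature index, threshold) pair per depth level; every node on that
    # level applies the same test, which is what makes the tree "oblivious".
    splits: List[Tuple[int, float]]
    # 2 ** len(splits) leaf values, indexed by the bit pattern of test outcomes.
    leaves: List[float]

    def predict(self, x: Sequence[float]) -> float:
        index = 0
        for feature, threshold in self.splits:
            # Each level appends one bit: 1 if the feature exceeds the threshold.
            index = (index << 1) | int(x[feature] > threshold)
        return self.leaves[index]


def score(trees: List[ObliviousTree], x: Sequence[float]) -> float:
    """Boosted ensemble: the score is the sum of the outputs of all trees."""
    return sum(tree.predict(x) for tree in trees)


def is_malware(trees: List[ObliviousTree], x: Sequence[float]) -> bool:
    # Sign convention from the confusion-matrix slide: score > 0 means malware.
    return score(trees, x) > 0


# Made-up ensemble over two hypothetical features:
# x[0] = number of requested permissions, x[1] = number of reflection calls.
TREES = [
    ObliviousTree(splits=[(0, 5.0), (1, 0.0)], leaves=[-1.0, -0.2, 0.3, 1.0]),
    ObliviousTree(splits=[(1, 3.0)], leaves=[-0.5, 0.8]),
]
```

For x = [2, 0] both trees land in their first leaf, so the score is -1.0 + (-0.5) = -1.5 and `is_malware(TREES, [2, 0])` is False; for x = [8, 4] they land in leaves 3 and 1, giving 1.0 + 0.8 = 1.8 and a malware verdict. Because each level has a single test, evaluation is a fixed number of comparisons per tree, which suits a fast classifier.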
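Slide 19 lists permissions, URIs in strings and resources, adware library usage and obfuscation as feature sources. A minimal sketch of turning already-extracted permissions, strings and class names into a numeric feature vector; the "sensitive" permission subset, the http(s)-only URL regex, the AirPush package prefix and all feature names are illustrative assumptions, not the features used in the talk.

```python
import re
from typing import Dict, Iterable

# Illustrative subset of permissions often treated as sensitive (an assumption).
SENSITIVE_PERMISSIONS = {
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.RECEIVE_BOOT_COMPLETED",
}

# Hypothetical ad-library package prefixes; AirPush is mentioned in the talk.
ADWARE_PREFIXES = ("com.airpush.",)

# Only http(s) URLs, a simplification of "URIs in strings and other resources".
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_features(permissions: Iterable[str], strings: Iterable[str],
                     class_names: Iterable[str]) -> Dict[str, int]:
    permissions = set(permissions)
    # Distinct URLs found anywhere in the string pool / resources.
    urls = {m.group(0) for s in strings for m in URL_RE.finditer(s)}
    return {
        "permission_count": len(permissions),
        "sensitive_permission_count": len(permissions & SENSITIVE_PERMISSIONS),
        "url_count": len(urls),
        "adware_library": int(any(c.startswith(ADWARE_PREFIXES) for c in class_names)),
    }
```

For example, two permissions including SEND_SMS, a string containing the same URL twice, and a class under `com.airpush.` yield counts of 2, 1, 1 and an adware flag of 1. Counting distinct URLs rather than occurrences is a design choice that keeps the feature stable when a string is duplicated.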
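Slides 24 and 25 describe the analyzer as parsers feeding analyzers, whose features go to an oracle (MatrixNet) and then to reporters. A minimal sketch of that chaining; only the stage roles come from the slides. The function names, the dict-based data passing, the toy linear oracle standing in for the MatrixNet formula and the JSON layout are all hypothetical.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stage signatures: parsers turn raw APK data into parsed facts,
# analyzers turn facts into numeric features, the oracle turns features into a score.
Parser = Callable[[dict], dict]
Analyzer = Callable[[dict], Dict[str, float]]
Oracle = Callable[[Dict[str, float]], float]
Reporter = Callable[[Dict[str, float], float], str]


def manifest_parser(apk: dict) -> dict:
    # Stand-in for ManifestParser: here the "APK" is already a dict.
    return {"permissions": apk.get("permissions", [])}


def permission_analyzer(facts: dict) -> Dict[str, float]:
    # Stand-in for PermissionAnalyzer: one numeric feature.
    return {"permission_count": float(len(facts.get("permissions", [])))}


def toy_oracle(features: Dict[str, float]) -> float:
    # Placeholder for the MatrixNet formula: an arbitrary linear rule.
    return features.get("permission_count", 0.0) - 5.0


def json_reporter(features: Dict[str, float], score: float) -> str:
    verdict = "malware" if score > 0 else "normal"
    return json.dumps({"features": features, "score": score, "verdict": verdict},
                      sort_keys=True)


def analyze(apk: dict, parsers: List[Parser], analyzers: List[Analyzer],
            oracle: Oracle, reporter: Reporter) -> str:
    facts: dict = {}
    for parse in parsers:
        facts.update(parse(apk))
    features: Dict[str, float] = {}
    for analyzer in analyzers:
        features.update(analyzer(facts))
    return reporter(features, oracle(features))
```

Calling `analyze({"permissions": ["a"] * 7}, [manifest_parser], [permission_analyzer], toy_oracle, json_reporter)` returns a JSON report with a score of 2.0 and a "malware" verdict. Keeping every stage a plain function makes it easy to add parsers or analyzers, or to swap the reporter between JSON and XHTML, without touching the oracle.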
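ClassesParser (slide 28) starts from the DEX file format. As a minimal illustration, not the author's parser, this reads the fixed 0x70-byte `header_item` as laid out in the Android "Dalvik Executable format" documentation: the magic and version, then the sizes and offsets of the string, type, proto, field, method and class-definition tables that a disassembler and call-graph builder would walk next. Only little-endian files are handled.

```python
import struct
from typing import Dict

DEX_HEADER_SIZE = 0x70          # header_item is always 0x70 bytes
ENDIAN_CONSTANT = 0x12345678    # value of endian_tag in little-endian files


def parse_dex_header(data: bytes) -> Dict[str, object]:
    """Read the table sizes and offsets from a DEX header_item."""
    if len(data) < DEX_HEADER_SIZE:
        raise ValueError("too short to be a DEX file")
    magic = data[:8]
    # The magic is b"dex\n", a three-digit version such as b"035", and a NUL byte.
    if magic[:4] != b"dex\n" or magic[7] != 0:
        raise ValueError("bad DEX magic")
    # After magic (8 bytes), checksum (4) and SHA-1 signature (20) come
    # twenty little-endian uint32 fields, starting at offset 32.
    fields = struct.unpack_from("<20I", data, 32)
    names = ("file_size", "header_size", "endian_tag", "link_size", "link_off",
             "map_off", "string_ids_size", "string_ids_off", "type_ids_size",
             "type_ids_off", "proto_ids_size", "proto_ids_off", "field_ids_size",
             "field_ids_off", "method_ids_size", "method_ids_off",
             "class_defs_size", "class_defs_off", "data_size", "data_off")
    header: Dict[str, object] = dict(zip(names, fields))
    if header["endian_tag"] != ENDIAN_CONSTANT:
        raise ValueError("unsupported byte order")
    header["version"] = magic[4:7].decode("ascii")
    return header
```

Applied to the bytes of a `classes.dex`, the returned `method_ids_size` and `class_defs_size` say how many method references and class definitions the disassembler has to process; the `*_off` values are the file offsets where those tables begin.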
29. ClassesParser: disassembler (https://github.com/tracer0tong/de)
   Example:
   ./de.py test1.dex.dat
   [[0, 'sget-object v0, {type} [{class}].{field} // field@2225'],
    [2, 'invoke-virtual v0 @13970 // {class}->{method}'],
    [5, 'move-result-object v0'],
    [6, 'check-cast v0, [{type_name}] // type@0958'],
    [8, 'return-object v0']]
30. ReflectionAnalyzer: java.lang.reflect.*
   • Classes: Field, Method, etc.
   • Functions: getClass(), getDeclaredField(), etc.
31. ReflectionAnalyzer output
   • Report: There is some reflections usage:
       1@android.app.Activity->getContentResolver calls: 598@java.lang.Class->forName
       2@android.app.Activity->onActivityResult calls: 598@java.lang.Class->forName
   • The number of reflection calls is a feature.
32. Service architecture: Nginx, Gunicorn, Flask, Celery, MongoDB (the diagram shows this stack twice)
33. Case study
34. Let's try it on... the Yandex.Store application feed:
   • More than 50K Android applications
   • More than 200 new or updated apps per week
   • Open to developers (no strict manual verification)
35. Performance. Check timing: ~2 ms, ~0.25 s, ~4.5 min
36. Performance. Number of checks
   • More than 16,000 applications checked in 1 hour on 1 cluster node
37. Confusion matrix
                     Predicted malware (score > 0)   Predicted normal (score < 0)
    Actual malware   485 (97%)                       15 (3%)
    Actual normal    25 (5%)                         475 (95%)
38. (Un)predictable results
   • Applications with the malicious adware library AirPush were classified as malware
   • But the first version had no special features for adware
39. Conclusion: It's alive, alive!
40. It works!
   • Analytic methods work fine for detecting Android mobile malware
   • Machine learning is not rocket science, but a cool and effective instrument
   • Open API coming soon.
41. Thanks for your attention
42. Yury Leonychev, Application Security Engineer, yleonychev@yandex-team.ru, tracer0tong, Yandex LLC 2013
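Slides 30 and 31 describe the ReflectionAnalyzer: it looks for calls into java.lang.reflect.* and reflective lookups such as Class.forName, reports each caller/callee pair, and uses the number of such calls as a feature. A minimal sketch over call-graph edges; the edge format ("Class->method" strings without the numeric method IDs shown on slide 31) and the exact list of java.lang.Class methods are assumptions.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (caller, callee), e.g. ("a.B->run", "java.lang.Class->forName")

# Following slide 30: anything in java.lang.reflect.* counts as reflection,
# plus reflective lookups via Class/Object. This method list is illustrative.
REFLECTION_PACKAGE = "java.lang.reflect."
REFLECTIVE_METHODS = {
    "java.lang.Class->forName",
    "java.lang.Class->getDeclaredField",
    "java.lang.Class->getDeclaredMethod",
    "java.lang.Class->getMethod",
    "java.lang.Object->getClass",
}


def is_reflection_call(callee: str) -> bool:
    return callee.startswith(REFLECTION_PACKAGE) or callee in REFLECTIVE_METHODS


def reflection_calls(edges: Iterable[Edge]) -> List[Edge]:
    """Call-graph edges whose callee uses reflection."""
    return [(caller, callee) for caller, callee in edges if is_reflection_call(callee)]


def reflection_report(edges: Iterable[Edge]) -> str:
    # One line per edge, loosely mirroring the report format on slide 31.
    return "\n".join(f"{caller} calls: {callee}" for caller, callee in reflection_calls(edges))


def reflection_feature(edges: Iterable[Edge]) -> int:
    # The number of reflection calls, used as a feature.
    return len(reflection_calls(edges))


EDGES = [
    ("android.app.Activity->getContentResolver", "java.lang.Class->forName"),
    ("android.app.Activity->onActivityResult", "java.lang.Class->forName"),
    ("com.example.Main->onCreate", "android.util.Log->d"),
    ("com.example.Main->hook", "java.lang.reflect.Method->invoke"),
]
```

With the sample edges, `reflection_feature(EDGES)` is 3: the two `forName` calls from the slide plus one `Method.invoke`, while the logging call is ignored.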
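The percentages in the confusion matrix on slide 37 are per-row rates: each actual class has 500 samples (485 + 15 malware, 25 + 475 normal). A short calculation with those counts gives the usual summary metrics, treating malware as the positive class; the metric definitions are standard, the counts are the slide's.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Summary metrics for a binary malware classifier (malware = positive class)."""
    return {
        "detection_rate": tp / (tp + fn),          # recall, true positive rate
        "false_positive_rate": fp / (fp + tn),     # normal apps flagged as malware
        "precision": tp / (tp + fp),               # share of malware verdicts that are right
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts from the confusion-matrix slide.
metrics = confusion_metrics(tp=485, fn=15, fp=25, tn=475)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

This gives a detection rate of 0.970, a false-positive rate of 0.050, precision of about 0.951 (485 of 510 malware verdicts) and accuracy of 0.960 (960 of 1,000 samples).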