
Automated poisoning attacks and defenses in malware detection systems: An adversarial machine learning approach

Sen Chen a,b, Minhui Xue c,d, Lingling Fan a,b, Shuang Hao e, Lihua Xu a,*, Haojin Zhu d, Bo Li f

a East China Normal University, Shanghai, China
b Nanyang Technological University, Singapore
c New York University Shanghai, Shanghai, China
d Shanghai Jiao Tong University, Shanghai, China
e University of Texas at Dallas, USA
f University of California, Berkeley, USA

ARTICLE INFO

Article history:
Received 17 April 2017
Received in revised form 31 October 2017
Accepted 12 November 2017
Available online

ABSTRACT

The evolution of mobile malware poses a serious threat to smartphone security. Today, sophisticated attackers can adapt by maximally sabotaging machine-learning classifiers via polluting training data, rendering most recent machine learning-based malware detection tools (such as DREBIN, DROIDAPIMINER, and MAMADROID) ineffective. In this paper, we explore the feasibility of constructing crafted malware samples; examine how machine-learning classifiers can be misled under three different threat models; then conclude that injecting carefully crafted data into training data can significantly reduce detection accuracy. To tackle the problem, we propose KUAFUDET, a two-phase learning enhancing approach that learns mobile malware by adversarial detection. KUAFUDET includes an offline training phase that selects and extracts features from the training set, and an online detection phase that utilizes the classifier trained by the first phase. To further address the adversarial environment, these two phases are intertwined through a self-adaptive learning scheme, wherein an automated camouflage detector is introduced to filter the suspicious false negatives and feed them back into the training phase. We finally show that KUAFUDET can significantly reduce false negatives and boost the detection accuracy by at least 15%. Experiments on more than 250,000 mobile applications demonstrate that KUAFUDET is scalable and can be highly effective as a standalone system.

© 2017 Elsevier Ltd. All rights reserved.

Keywords:

Malware detection

Adversarial machine learning

Poisoning attacks

Manipulation

KUAFUDET

We would like to thank Pwnzen Infotech Inc. for providing us with a copy of mobile malware to conduct the study, especially Pwnzen Infotech Inc. co-founder Zhushou Tang for exchanging helpful industry experience. This work was supported in part by the National Natural Science Foundation of China, under Grants 61502170, 61272444, 61411146001, U1401253, and U1405251, and in part by the Science and Technology Commission of Shanghai Municipality under Grant 13ZR1413000.

* Corresponding author. E-mail address: [email protected] (L. Xu).

https://doi.org/10.1016/j.cose.2017.11.007
0167-4048/© 2017 Elsevier Ltd. All rights reserved.


1. Introduction

Over the last decade, software development has witnessed a massive shift toward mobile applications. With the growth of mobile applications and their users, security and privacy concerns are increasingly becoming the focus of great concern to various stakeholders. For instance, more and more users store personal data on their mobile devices, even carrying out financial transactions such as online banking and shopping from their smartphones. Some of these data can be very sensitive. Consequently, hackers can reap substantial financial gain from such sensitive data and thus find mobile devices to be lucrative targets.

It is not surprising that the demand for tools that automatically analyze and detect malicious applications has also grown. Most of the researchers' and practitioners' efforts in this area target the Android platform, which holds the largest share of the mobile market. There has been a plethora of research in malware detection for Android. Static and dynamic analyses are two generic techniques, primarily implemented by two approaches: signature-based (Schlegel et al., 2011; Zhou et al., 2013; Zhou and Jiang, 2012; Zhou et al., 2012) and behavior-based (Graziano et al., 2015; Rasthofer et al., 2016; Tam et al., 2015; Wu et al., 2014; Yan and Yin, 2012). Information flow analysis-based approaches (Arzt et al., 2014; Enck et al., 2014; Gordon et al., 2015; Li et al., 2015; Wong and Lie, 2016) have also been proposed to detect Android malware. We note that machine learning is one of the most promising techniques for detecting mobile malware (Aafer et al., 2013; Arp et al., 2014; Avdiienko et al., 2015; Chen et al., 2016a; Dash et al., 2016; Fan et al., 2016; Feizollah et al., 2017; Idrees et al., 2017; Meng et al., 2016; Rasthofer et al., 2014; Yang et al., 2014, 2015; Zhang et al., 2014). However, machine learning approaches also have a weakness: they are susceptible to adversarial countermeasures by attackers aware of their use. First, through reverse-engineering, attackers may become aware of the classifiers and their parameters and use this knowledge to evade detection. Second, more sophisticated attackers can actively tamper with the classifiers by injecting well-crafted data into training data. Therefore, with Android's open-source kernel policy, malware writers can gain an in-depth understanding of the mobile platform and intentionally alter the training set to reduce or eliminate its detection efficacy.

To our knowledge, most up-to-date works using machine learning mainly focus on detection accuracy and assume that feature extraction happens in an ideal environment (Chen et al., 2016c; Mariconti et al., 2017). No evasion techniques were considered in the feature space when using machine learning-based detection approaches. In this paper, we consider a threat model within a specific class of attacks, named poisoning attack, in which the attacker is assumed to control a subset of samples or inject additional seeds at will in order to mislead the learning algorithm. For example, in malware detection, a sophisticated attacker may have a good command of the whole training set and deliberately inject poisoning patterns to compromise the performance of the classifiers, which become more prone to misclassify malicious applications as benign ones. This threat model conceptually underlies adversarial machine learning: it involves gradients of the function f represented by the learned model (e.g., SVM, logistic regression, K-Nearest Neighbor) in order to evaluate it on an input x. Attackers can then fully automatically identify individual input features that can be perturbed to achieve misclassification (Papernot et al., 2016).

To test the ramifications of these causative attacks, we develop an adversarial model with three types of attackers, according to different degrees of attack aggressiveness, to simulate real-world attacks. To do this, we adopt a customized adversarial crafting algorithm, characterized by the aggressiveness of attacks, to generate the crafted camouflage samples. The crafting steps are restricted in three ways: (i) to preserve the functionality of the modified application, we only add or remove features; (ii) we add a restricted number of features; for simplicity, we therefore decide to add entries to the AndroidManifest.xml and Smali files; (iii) since obscuring semantic features is much more challenging than confusing syntax features, we only use syntax features to craft samples. In spite of these restrictions in crafting, we achieve a significantly high misclassification rate on malicious applications when using the 564 original non-robust features. To further perform a longitudinal comparison, we also apply our poisoning attack to DREBIN (Arp et al., 2014), DROIDAPIMINER (Aafer et al., 2013), and MAMADROID (Mariconti et al., 2017), the three most recent machine-learning detection systems in academia. We thus validate that the threat model and the poisoning attack are indeed viable in malware detection. We conjecture that almost all state-of-the-art machine-learning malware detection systems suffer from the poisoning attack we exhibit in this paper.

To handle these adversarial attacks, we propose KUAFUDET, a learning enhancing defense system with adversarial detection that includes an offline training phase, which selects and extracts contributing features from the training set for preprocessing, and an online detection phase, which utilizes the classifier trained by the first phase. Compared to existing work, these two phases act together, through a self-adaptive learning scheme, as an iterative adversarial detection process. Additionally, we introduce camouflage detection for verifying false alarms to protect against poisoning attacks. Using similarity analysis, the camouflage detection is applied iteratively to defend against malicious data distortion. Concretely, we train on 16,000 equally distributed Android application samples, which are downloaded from the Contagio Mobile Website,1 Pwnzen Infotech Inc., and DREBIN (Arp et al., 2014). All 195 robust features are extracted using static analysis on the given application, pruned by information gain. We further evaluate the results on 4000 applications. Our best detection classifier achieves up to 96% accuracy without an adversarial environment, and gains at least 15% accuracy when coping with the most powerful attackers, along with low false negatives. Furthermore, we conduct an empirical evaluation on our test set: we select 1000 malware samples out of the set of 10,400 malicious samples and scan them using KUAFUDET and other industrial malware detecting tools, such as Kaspersky and McAfee encapsulated in VirusTotal.2 The coverage of KUAFUDET significantly outperforms these top-of-the-line antivirus systems. Finally, we perform the entire process of KUAFUDET, using real-time streaming, on a server with 16 GB memory, a quad-core i7-4800MQ at 3.6 GHz, and 1 TB hard drives, and show that KUAFUDET is scalable and efficient.

1 http://contagiominidump.blogspot.hk/.
2 https://www.virustotal.com/.

In this paper, we make the following four key contributions:

1. We propose that poisoning attacks can be carried out by three types of attackers in the real world, of weak, strong, and sophisticated degrees. We hold evidence that the real-world mobile malware dataset truly reflects the three types of attackers we define.

2. We adopt a customized adversarial crafting algorithm, semantically characterized by the aggressiveness of attacks, to generate the crafted camouflage samples using syntax features, so as to largely simulate real-world attacks.

3. We show that our poisoning attack is able to mislead DROIDAPIMINER (Aafer et al., 2013), DREBIN (Arp et al., 2014), and MAMADROID (Mariconti et al., 2017), the three most recent machine-learning detection systems in academia.

4. We propose a two-phase iterative adversarial-based detection, termed KUAFUDET, wherein similarity-based filtering is used to identify the false negatives that are the camouflaged malicious applications, further reinforcing the resilience of the malware detection system.

Our experiments show that attackers can poison features while preserving maliciousness, and verify that the resulting fake variants with poisoned features impair discriminative classifiers and succeed in lowering the detection score in a test environment. Other main findings are as follows:

• We observe that different feature categories have different impacts on crafted camouflage samples: API-call features lead to greater perturbation than permission features.

• We emulate the feature extraction for all types of features that DREBIN used and find that DREBIN-style feature extraction is substantially more computationally complex and does not necessarily boost the accuracy.

• We find that in the data-imbalanced (benign-malicious ratio) environment, the accuracy of KUAFUDET gradually degrades as we put in more benign applications, but the accuracy still remains relatively high.

• We find that similarity-based filtering analysis and relabeling perform well against adversarial attacks.

To the best of our knowledge, this is the first paper to accommodate a newly designed two-phase adversarial machine learning mechanism into mobile malware detection to limit the possibility of mimicry and poisoning attacks, and further propose a learning enhancing system through adversarial detection of Android malware.

The rest of the paper is organized as follows. Section 2 defines the research problem. Section 3 presents the motivations and challenges. Section 4 provides the system overview, followed by the implementation shown in Section 5. Section 6 summarizes the experimental evaluation. Section 7 discusses limitations. Section 8 surveys related work. Finally, Section 9 concludes the paper.

1.1. A note on ethics

In this paper, we are very aware of the potential impact of disclosing malicious apps, which could be exploited by other malicious third parties. In particular, in order to illustrate this methodology, the collection of mobile malware used and crafted strictly followed the Privacy Policy of Pwnzen Infotech Inc. and conformed to the non-disclosure agreement (NDA) of Pwnzen Infotech Inc. Furthermore, to facilitate research on mobile malware detection, we make the malicious Android applications (except those from Pwnzen Infotech Inc.) used in the paper publicly available to other researchers at http://nsec.sjtu.edu.cn/kuafuDet/kuafuDet.html, but no attempt was made to provide data from Pwnzen Infotech Inc. to people outside of our research group because of intellectual property. Only authorized Pwnzen Infotech Inc. employees, using internal computer systems of Pwnzen Infotech Inc., can have access to that dataset. Finally, we informed the team at Pwnzen Infotech Inc. of the potential newly discovered malicious apps in order to help Pwnzen Infotech Inc. improve the quality of its products and services. We believe this study performs an important public service, as it shows that even state-of-the-art antivirus systems are somewhat fallible. Our ultimate goal is to inform developers and users of such potential poisoning attacks, so that more comprehensive countermeasures can be taken in the future.

2. Problem definition: Adversarial machine learning

We denote a sample set by $\{(x_i, y_i) \in (\mathcal{X}, \mathcal{Y})\}_{i=1}^{n}$, where $x_i$ is the $i$th malware sample vector, of which each component exhibits a selected feature: if $x_i$ has the $j$th feature, then $x_{ij} = 1$; otherwise $x_{ij} = 0$. Here $y_i \in \{0, 1\}$, $n$ is the total number of malware samples, and $\mathcal{X} \subseteq \{0, 1\}^m$ is an $m$-dimensional feature space. In this paper, we consider binary classifiers with only two output classes, where the attacker crafts the malware dataset to evade detection and hence achieve his goals. The attacker tries to move malware samples ($y_i = 1$) in any direction by adding a non-zero displacement feature vector $\delta_i$ to $x_i$. For example, attackers may add good attributes to mobile malware to evade binary classifiers. We note that attackers will not be able to modify legitimate benign applications, since an honest author has no interest in having his benign application classified as malware. Hence, crafting an adversarial sample $x^*$, misclassified by the function $f$ (where $f: x \mapsto y = f(x)$), from a sample $x$ can be formalized as the following problem (Szegedy et al., 2014):

$$x^* = x + \delta_x \quad \text{s.t.} \quad f(x + \delta_x) \neq f(x), \tag{1}$$

where $\delta_x$ is the minimal perturbation yielding misclassification and $f$ can be the corresponding softmax function.
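To make this representation concrete, the following minimal sketch (ours, not the authors' code; the feature names are illustrative) maps an app's extracted feature set to the binary vector $x$ used throughout this section:

```python
import numpy as np

# A tiny illustrative feature list; the paper selects m = 195 features.
FEATURES = ["SEND_SMS", "READ_PHONE_STATE", "INTENT.ACTION.DELETE",
            "TelephonyManager.getDeviceId", "URL.openConnection"]

def vectorize(app_features, feature_list=FEATURES):
    """x_ij = 1 iff sample i exhibits the j-th selected feature."""
    return np.array([1 if f in app_features else 0 for f in feature_list])

x = vectorize({"SEND_SMS", "URL.openConnection"})  # -> array([1, 0, 0, 0, 1])
```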

The goal of adversarial sample crafting in malware detection is to mislead the detection system, causing the classification of a particular application to change according to the attacker's wishes. In this paper, we only focus on the poisoning attack that results in malicious behavior being misclassified as benign (false negatives), because Inter-Component Communication (ICC) analysis is used to reduce false positives (Li et al., 2015; Octeau et al., 2016). We also assume the attacker has full access to the classifier used, and can inject as many variants' features as possible at will into the given classifier. For this reason, following Eq. (1), we further denote $x_{ij}^{\max}\,(= 1)$ and $x_{ij}^{\min}\,(= 0)$ as the maximum and minimum values that the $j$th feature of the $i$th sample can take. Then a poisoning attack can be characterized as follows:

$$-C_f \left(x_{ij} - x_{ij}^{\min}\right) \le \delta_{ij} \le C_f \left(x_{ij}^{\max} - x_{ij}\right), \quad \text{i.e.,} \quad -C_f\, x_{ij} \le \delta_{ij} \le C_f \left(1 - x_{ij}\right), \quad \forall j \in [1, m], \tag{2}$$

where $C_f \in [0, 1]$ controls the aggressiveness of attacks: $C_f = 0$ indicates no attacks, while $C_f = 1$ indicates the most aggressive attacks. To test the ramifications of causative attacks and clearly elaborate the challenges, we develop an adversarial model with three types of attackers with the corresponding $C_f$ values.

Weak attacker ($C_f = 0.33$). Our weak attacker is not aware of the statistical properties of the training features or labels at all. This attacker simply fakes additional labels with random binary features to poison the training dataset.

Strong attacker ($C_f = 0.67$). Our strong attacker is aware of the features we use for training and can have access to our ground-truth dataset (which comes from public sources). This attacker can manipulate partial features in the training data. However, this attacker is resource constrained and cannot manipulate any mobile application statistics, which would require more time. The strong attacker crafts features by randomly selecting publicly available Android malware and then faking additional labels, so that the partial training labels can become nearly identical.

Sophisticated attacker ($C_f = 1$). Our strongest attacker, named the sophisticated attacker, has full knowledge of our training feature set. Additionally, this attacker has sufficient time and economic resources to create arbitrary mobile application statistics. Therefore, the sophisticated attacker can fully manipulate almost all training features, which creates scenarios where relatively benign mobile applications and real-world malicious mobile applications appear to have nearly identical attributes at the training phase.
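The constraint of Eq. (2) is easy to state in code. The following minimal sketch (our illustration, not the authors' implementation) computes the admissible per-feature perturbation range for each attacker type, treating $C_f$ as the fraction of the feasible change the attacker may use:

```python
import numpy as np

# Aggressiveness levels of the three attacker types.
CF = {"weak": 0.33, "strong": 0.67, "sophisticated": 1.0}

def perturbation_bounds(x, attacker):
    """Eq. (2) for binary features (x_min = 0, x_max = 1):
    -Cf * x_ij <= delta_ij <= Cf * (1 - x_ij)."""
    cf = CF[attacker]
    lower = -cf * x          # how far a present feature may be suppressed
    upper = cf * (1.0 - x)   # how far an absent feature may be raised
    return lower, upper

x = np.array([1, 0, 1, 0], dtype=float)
lo, hi = perturbation_bounds(x, "strong")
# lo = [-0.67, 0, -0.67, 0], hi = [0, 0.67, 0, 0.67]
```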

To do this, we adopt the adversarial crafting algorithm (Papernot et al., 2016) based on the Jacobian matrix

$$J_f = \frac{\partial f(x)}{\partial x} = \left[ \frac{\partial f_i(x)}{\partial x_j} \right]_{i \in \{0,1\},\, j \in [1,m]},$$

where $f_0(x)$ outputs the degree to which $x$ is benign and $f_1(x)$ outputs the degree to which $x$ is malicious, with $f_0(x) + f_1(x) = 1$. To craft an adversarial sample, we use 73 benign features and 102 malicious features detailed in Section 5.1. We then take mainly two steps. In the first step, we compute the gradient of $f$ with respect to $x$ to estimate the direction in which a perturbation in $x$ would change $f$'s output. In the second step, we choose a perturbation $\delta$ of $x$ with maximal positive gradient into our target class (benign, $y_i = 0$), denoted $y'$, and we customize the adversarial crafting algorithm (Papernot et al., 2016) according to our adversarial model with three types of attackers ($C_f$), to indicate the probability (Eq. (2)) of adding a specific feature. After computing the gradient, we iteratively choose a target feature of which the gradient is the largest for our target class and then update its value in $x$ to obtain our new input vector. We then re-update the gradient and repeat this process until either (i) we reach the bounded allowed changes (loop bound) or (ii) we successfully achieve a misclassification.

We did not attempt to formalize malware detection as an optimization problem, say, maximizing accuracy at the lowest resource cost with minimal adversarial perturbations, because simply using optimization may not be aligned with how we semantically understand human malicious behaviors and motivations. For future research, we wish to generate an exact mapping rule between machine-crafted mobile malware and ready-to-use malicious apps in the wild, and ultimately wish to provide a foundation for developing sustainably-secure anti-malware systems in the face of dynamic cyber-maneuvers.

3. Motivations and challenges

In this section, to highlight our contributions, we motivate our malware detection model by incorporating the adversarial environment witnessed in the real world and then review several challenges of our work.

3.1. Evolutionary chain

Fig. 1 describes an evolutionary chain of mobile malware advancements observed along the timeline, ranging from the seed explosion to the recent adversarial attack. Earlier menaces of malware mainly pertained to compromised SMS-related functions. With every new technology comes abuse, and the Android market is no exception. Since 2014, malware samples have easily exploited Android vulnerabilities, which heavily drives an arms race for the adversarial detection of mobile malware. As the nature of the attack shifts from small-scale and low-tech toward large-scale and skilled, additional efforts will have to be directed at taking targeted strategies to detect stubborn malware.

Fig. 1 – A timeline of Android malware.

3.2. Adversarial samples

In previous work (Chen et al., 2016a), a number of malware samples in the dataset were misclassified as benign. We zoom in on these misclassified samples and witness several real-world cases in this dataset that reflect adversarial attacks. Each observation corresponds to one of the three types of attackers defined above.

Weak attacker. Embedding a good portion of benign code into a malicious app (e.g., manifest attributes and non-logical code in Java code). Felt et al. (2011) show that more than half of Android applications are overprivileged, for example by misusing the AndroidManifest.xml configuration. We carefully analyze the false negative samples using mapping relations between permissions and API calls, and find that some declared permissions remain unused at the development stage. We can see from Fig. 2 that part of the code in the AndroidManifest file of Polaris Office is misclassified as benign. Although permissions #1 to #7 are declared, they have not been used at all, since the app does not demonstrate corresponding behaviors. These permissions, extracted as features for training classifiers, weakly mislead the classification outcomes. As shown in Fig. 3, we delete the redundant code (#1–#7) that is irrespective of the logical behavior of the sample. After we repeat the classification process, surprisingly the sample is classified as malicious.

Strong attacker. Hiding a good portion of malicious code in other formats in the application package. Some alternative malicious applications use techniques such as dynamic loading to hide a good portion of malicious code in files of other formats. Usually these malicious code blocks include sensitive API calls, and API calls used as features by most machine-learning classifiers will dramatically lead to misclassification. For example, SecondLock behaves the same as "egdata" does. Some malicious code blocks or executable files are hidden in files of other formats, such as jar, so, jpg, and data. These files contain sensitive API calls, such as DownloadManager.enqueue and DownloadManager.query, which are hidden by the malware. The "assets" folder of SecondLock contains a png format file, which is not a standard png file and can be dynamically loaded. It can prevent the application from updating or automatically download other malicious applications. As shown in Fig. 3, we take two steps to correct the classification results. In step one, we remove five unused permissions; the result of classification moves toward the hyperplane, though the final result remains benign. In step two, we add some sensitive API calls that are hidden in the png file (e.g., DownloadManager.enqueue and DownloadManager.query). This particular sample is finally classified as malicious.

Sophisticated attacker. Embedding benign logic code in Java code and dynamic code loading with reflection. A few malicious applications add benign logic code to their source code. Benign logic code can be executed without any effect on malicious behaviors and is used to obscure the feature extraction process so as to clone benign applications. This is similar to "testing code." By embedding benign logic code, the sophisticated attacker can add any code blocks or any combination of various techniques to mislead the machine-learning classifiers, making the classifiers less robust. Specifically, after constructing the Activity transition relations for WhatsApp, a repackaged malware from a third party rather than the official version, we find that there exist embedded activities standing alone with some methods, neither being affected by any other activities nor showing up in the system logic. Fig. 4 shows a code segment of WhatsApp with an embedded activity: the getMemoryLimited() method initiates system calls to getMemoryClass(), getLargeMemoryClass(), etc. However, these system calls, extracted as features for training, cannot reflect the system logic of the sample, which seriously misleads the classification outcomes. We therefore remove such embedded benign logic code and retrain the classifiers. As shown in Fig. 3, once we delete such code step by step, the malware sample is exposed. Fig. 3 also exemplifies that in the adversarial environment, the attack process is changeable and dynamic.

Fig. 2 – A case of permissions overprivileged.

Fig. 3 – The process from benign to malicious: the black dots refer to malicious apps, the white dots refer to benign apps, and the gray dots are originally malware but misclassified as benign due to the camouflage code injected in the adversarial region. The gray dots turn darker with the deletion of camouflage code and finally turn all black, exposing their malicious nature. The arrows direct toward the transformation process.


The alternative approach that a sophisticated attacker may take is dynamic code loading. Dynamic code loading usually utilizes the reflection mechanism to modify the runtime behavior of applications. It provides the ability for sophisticated attackers to add malicious behaviors (malicious features) without having to change the original application, hence misleading the machine learning classifiers. For example, as depicted in Fig. 5, sophisticated attackers can incorporate two malicious methods in the class MaliciousMethodsInDex (lines #1–#6) – sendDeviceInfo and sendCredential – and instantiate them via DEXCLASSLOADER at runtime. Since the malicious code is loaded at runtime and is not part of the application source, it is a challenge to classify such samples as malware through machine-learning classifiers.

3.3. Challenges

Class imbalance. We aim to train a classifier that produces binary predictions: each mobile application is classified as either benign or malicious. If there are significantly more applications in one class than in the other, this biases the output of supervised machine-learning algorithms. Prior research treats this simply by using 49 different malware families (Zhou and Jiang, 2012). In consequence, our dataset exhibits a modest class imbalance. We first define 217 Android malware families and then classify them into eight categories, namely Expense, Fraud, Payment, Privacy, Remote, Rogue, Spread, and System (see Table 1), to present a systematic characterization of existing Android malware.

The eight malware categories cover almost all existing Android malicious behaviors. We observe that it is common for a malware family to belong to multiple categories in parallel. This phenomenon indicates that a malicious family is not limited to a single malicious behavior. As shown in Table 1, Privacy and Fraud occupy the highest proportions among all. Therefore, we are able to find a reasonable distribution of real-world malware during our data acquisition. As the ratio of malware to benign apps in the real world is highly imbalanced, this class imbalance usually represents a significant challenge for reducing false negatives.

Quality of ground-truth dataset. Prior work on malware datasets focused on validating approaches using small, outdated datasets, which were then used as ground truth for training high-performance classifiers. In contrast, no comprehensive dataset of real-world malware is available. We employ as ground truth the set of malware from five different platforms. In particular, the set obtained from Pwnzen Infotech Inc. contains the most recent malware in the real world. However, we acknowledge that this dataset does not cover all platforms uniformly.

4. System overview

In this section, we provide a high-level overview of our system design.

Fig. 4 – A case of Activity embedded with benign API calls.

Fig. 5 – A loading example of using reflection (demonstrated as dynamic pseudo code).


4.1. Key ideas

In putting our approach into practice for massive-scale detection, we aim to achieve two important goals: one is to identify and filter the suspicious false negatives (i.e., the malicious applications that are camouflaged); the other is to achieve accuracy and scalability at the same time. To achieve these two goals, we propose two key techniques: similarity-based filtering and two-phase iterative adversarial detection, as shown below.

(i) An automated similarity-based approach to filter suspicious false negatives. In general, attackers have two characteristics: first, attackers may acquire dual intrinsic attributes of applications, thus we assume the suspicious ones are those with strong reflection on both malicious features and benign features; second, in our threat model, attackers may have a certain level of knowledge of the training set, thus we decouple the similarity metrics from the machine-learning classifiers. Based on these two characteristics, if the trained classifier could incorporate the similarities across the applications in the training set to lead to a further fine-grained detection, the learning system would be periodically enhanced with these newly discovered malware and suspicious false negatives.

(ii) A two-phase iterative adversarial detection approach to achieve accuracy and scalability. False negatives can be reduced based on our understanding of an attacker's threat level. We thus design a self-adaptive learning (SAL) scheme, in which the suspicious false negatives, as well as the identified malicious applications, are fed back to the training process for a desired trade-off between accuracy and scalability.

4.2. Overall architecture of KUAFUDET

The overall architecture of KUAFUDET is shown in Fig. 6 and comprises two intertwined phases. In the Training Model phase, KUAFUDET extracts features from labeled applications based on our combined set of contributing features, trains the classification model offline, and updates classifiers at certain intervals. In the Online Detection phase, KUAFUDET classifies large sets of online Android applications (from multiple online Android markets) into two categories, benign and malicious. Meanwhile, KUAFUDET, through the Self-adaptive Learning scheme, discovers new information from both the identified malware and the suspicious false negatives filtered by the Camouflage Detector, and incorporates them into Training to stabilize the detection accuracy (Fig. 7).

Table 1 – Malware category.

Malware category    Percentage
Privacy             47.1%
Fraud               33.9%
Rogue               20.1%
Spread              20.1%
System              11.1%
Remote              10.6%
Expense             10.1%
Payment             6.3%

Note: We take a look at malware categories with miscellaneous datasets. We find that many Android malware families belong to multiple malware categories, so the sum of percentages is greater than 100%.

Fig. 6 – The KUAFUDET framework through adversarial detection.

Fig. 7 – Choosing very benign and malicious outcomes.


5. System design

In retrospect, the quality of the discriminative classifier is the key to the accuracy of malware detection. On one hand, when the classifier is trained once and used for all time, it cannot keep up with new malware. On the other hand, aggressive attackers may obfuscate their representations in terms of contributing features to impair discriminative classifiers, which may lead to high false negatives, i.e., malicious applications evading detection. In order to perform accurate and scalable adversarial detection, our proposed adversarial detection approach contains two phases, training and detection, intertwined by the self-adaptive learning (SAL) scheme. In particular, we conduct similarity-based analysis in the Camouflage Detector to filter the suspicious false negatives.

The implementation of KUAFUDET involves the following steps:

1. In the feature selection stage, we decompile APKs to generate Smali code via Apktool,3 and extract 195 out of 564 features using manual pruning cross-checked with information gain (a minimal sketch of this step follows the list).

2. In the training stage, we use different machine-learning classifiers, such as Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN), based on the 195-dimensional features we selected.

3. In the camouflage detection stage, we perform similarity-based filtering to identify the false negatives that are the camouflaged malicious applications.
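As a rough illustration of step 1, the following sketch (ours; `apktool d` is the standard Apktool decompile command, while the directory layout and feature strings are illustrative) decompiles an APK and scans the manifest and Smali files for selected feature strings:

```python
import subprocess
from pathlib import Path

def decompile(apk_path, out_dir):
    """Decompile an APK into Smali code with Apktool
    (`apktool d <apk> -f -o <dir>`); assumes apktool is on PATH."""
    subprocess.run(["apktool", "d", apk_path, "-f", "-o", out_dir], check=True)

def extract_syntax_features(out_dir, feature_strings):
    """Scan AndroidManifest.xml and all Smali files for the selected
    feature strings (permissions, intents, hardware, API calls)."""
    present = set()
    root = Path(out_dir)
    files = [root / "AndroidManifest.xml"] + list(root.rglob("*.smali"))
    for f in files:
        text = f.read_text(errors="ignore")
        present.update(s for s in feature_strings if s in text)
    return present

# decompile("app.apk", "out")
# feats = extract_syntax_features("out", {"SEND_SMS", "getDeviceId"})
```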

5.1. Feature selector

The features considered in this study are classified into two categories: syntax features ($S_{\{PERM, INT, H, API\}}$) and semantic features ($S'_{\{Sequence\}}$).

5.1.1. Syntax features

Through closely examining more than 250,000 applications from various sources (breakdowns shown in Section 6.1), we notice that malicious applications tend to have drastically different permissions, intents, hardware features, and API calls, which supports the assumption that malicious applications are distinguishable from benign ones. To facilitate reading, we give a coarse-grained description of the syntax features used in this paper.

• Permission ($S_{\{PERM\}}$): Each APK has an AndroidManifest file in its root directory, which is an essential profile including information about the application. Android OS must process this profile before it runs any installation. The profile declares which permissions the application must have in order to access protected parts of the API and interact with other applications. It also declares the permissions that others are required to have in order to interact with the application's components.

• Intent ($S_{\{INT\}}$): Communication between different components is mainly through intents, which can be regarded as the "medium" where information about massive asynchronous data exchange and calls to different components is shared between components and applications.

• Hardware ($S_{\{H\}}$): Requests for access to specific hardware of the smartphone, such as NFC and GPS, should be declared in the manifest file, since the combination of such hardware modules may have a harmful impact on the phone.

• API Call ($S_{\{API\}}$): Android API call monitoring, based on reverse engineering, can monitor API calls such as sending SMS, accessing user location, and reading the device ID. The Android platform provides a framework API that applications can use to interact with the underlying Android OS. The framework API consists of a core set of packages and classes. Most applications use a fairly large number of API calls.

Here, we use statistical metrics-driven manual pruning (Chen et al., 2016a) with information gain to cross-check the feature selection. Although information gain facilitates automatic feature selection, it ignores the class information and distribution of the features; when such features are used to detect malware, the performance drops dramatically. For example, READ_INPUT_STATE (resp. ACTION.SET_WALLPAPER), which corresponds to Permission (resp. Intent), exhibits higher information gain than many others but relates to only a small subset of malware. Such highly specific features are undesirable for classification. In summary, we generate 175 types of syntax features:

$$\sum \#\{S_{PERM, INT, H, API}\} = \underbrace{\#\{S_{PERM}\}}_{61} + \underbrace{\#\{S_{INT}\}}_{12} + \underbrace{\#\{S_{H}\}}_{5} + \underbrace{\#\{S_{API}\}}_{97} = 175.$$
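For reference, the following sketch (ours, under the usual definition of information gain; not the authors' pruning pipeline) scores one binary feature column against the class labels, after which overly specific features such as READ_INPUT_STATE would still be pruned manually:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label list."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_col, labels):
    """IG(Y; X_j) = H(Y) - H(Y | X_j) for one binary feature column."""
    n = len(labels)
    h_cond = 0.0
    for v in (0, 1):
        subset = [y for x, y in zip(feature_col, labels) if x == v]
        if subset:
            h_cond += len(subset) / n * entropy(subset)
    return entropy(labels) - h_cond

# Rank all 564 candidate features by information_gain, then manually
# prune the highly specific ones, keeping 195 robust features.
```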

5.1.2. Semantic features

The semantic feature ($S'_{\{Sequence\}}$) represents malicious behaviors that occur sequentially, which are extracted via static analysis.

3 http://ibotpeaches.github.io/Apktool/.

Table 2 – Features.

Syntax features                                                                   Semantic features
Permission          Intent and hardware          API Call                         Sequence
READ_PHONE_STATE    INTENT.ACTION.DELETE         URL.openConnection               (chmod 777, Runtime, getRuntime, exec)
WRITE_SMS           INTENT.ACTION.GET_CONTENT    TelephonyManager.getDeviceId     (getDeviceId, URL, openConnection)
INSTALL_PACKAGES    HARDWARE.TOUCHSCREEN         PackageManager.checkPermission   (DownloadManager, Uri, Request, enqueue)
…                   …                            …                                …


For instance, the sequence (DownloadManager, Uri, Request, enqueue) in Table 2 indicates that a download request must follow a certain order: construct a request object and transfer the URL of the file to the enqueue method to finish the download process. These sequences can characterize interesting malicious behaviors that cannot be captured by the syntax features and can reflect the malicious behaviors more explicitly for a large number of apps, for the purpose of training classifiers. We de facto take several sensitive behaviors into consideration, such as "Send SMS," "Request for chmod," "Uninstall application," "Get location," "Get wifi info," and "Start http connection." We then extract the sequences of key strings that reflect interesting malicious behaviors; for example, requesting chmod is described as the sequence of "chmod 777," Runtime, getRuntime, and exec. We define 20 types of semantic features for detecting malware:

$$\sum \#\{S'_{Sequence}\} = 20.$$

As new Android malware is generated, these semantic features can be extended. To characterize each of the applications using static analysis, we generate a final set of 195 out of 564 types of features, as partially shown in Table 2. In summary, we use 195-dimensional feature vectors for the study (breakdowns shown in Table A.11).
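One plausible realization of such a sequence feature (ours; the paper's static analysis may differ) is an ordered, non-contiguous search for the key strings in the decompiled code:

```python
def matches_sequence(code_text, sequence):
    """A semantic feature fires when its key strings appear in order,
    e.g. ("chmod 777", "Runtime", "getRuntime", "exec") for
    'Request for chmod'."""
    pos = 0
    for key in sequence:
        pos = code_text.find(key, pos)  # search only past the previous hit
        if pos == -1:
            return False
        pos += len(key)
    return True

# matches_sequence(code, ("DownloadManager", "Uri", "Request", "enqueue"))
```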

5.1.3. Justifying feature selection

In DREBIN, the feature set contains thousands of arbitrary strings, chosen by developers, that appear in the manifest file or in the disassembled code of the app. In particular, DREBIN extracts eight types of features: hardware components, requested permissions, app components, filtered intents, restricted API calls, used permissions, suspicious API calls, and network addresses. These massive features act as noise, misleading the classifier. We readily emulate the feature extraction for all types of features that DREBIN used, since the DREBIN authors provide the feature vectors of their own dataset for evaluation by other researchers. As shown in Table 3, we use the DREBIN dataset (Arp et al., 2014) and the massive DREBIN-used features, and simulate its algorithm. We conclude that extracting DREBIN-used features is a substantially more computationally complex process than our feature selection, due to the sheer number of features extracted. In fact, these features do not necessarily boost the accuracy. Our approach also validates that the 195 types of features used in KUAFUDET are proper and will not trigger the classical curse of dimensionality.

5.2. Machine learning classifiers

With the 195-dimensional features that result from our feature selector, we utilize a number of popular algorithms widely used in security contexts, including Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN). We leverage existing implementations of these algorithms in WEKA (Hall et al., 2009). In particular, SVMs seek to determine the maximum-margin hyperplane separating the classes of malicious and benign applications. When a hyperplane cannot perfectly separate the binary-class samples based on the features we feed in, we tune parameters such as the regularization penalty and non-negative slack variables. We also perform multiple rounds of stratified random sampling due to the data imbalance stated in Section 3.3. Random Forest (RF) and K-Nearest Neighbor (KNN) are tuned in an analogous manner.
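The paper uses WEKA; as a rough Python analog (ours, with illustrative hyperparameters), the same three classifiers can be set up and evaluated with stratified cross-validation as follows:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# X: (n_samples, 195) binary feature matrix; y: 0 = benign, 1 = malicious.
classifiers = {
    "SVM": SVC(C=1.0, kernel="linear"),           # C: regularization penalty
    "RF": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

def evaluate(X, y):
    """5-fold stratified cross-validation (cross_val_score stratifies
    automatically for classifiers), mirroring the repeated stratified
    sampling used to cope with class imbalance."""
    return {name: cross_val_score(clf, X, y, cv=5).mean()
            for name, clf in classifiers.items()}
```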

5.3. Camouflage detector

To further discover camouflage in malware, we manually pick a fair number of applications that are farthest from the classification hyperplane on the very benign side and on the very malicious side, respectively. In particular, each of the three machine-learning algorithms provides a corresponding distance to the classification hyperplane. These hand-picked applications are the most benign and most malicious predictions and are updated as the training set updates. We assume these applications have not been poisoned by any malicious third parties. We then measure the similarity between the training set and the selected most benign applications, and likewise the selected most malicious applications after classification. By further tuning the similarity threshold, we relabel the camouflage malware in the training set as malicious samples to make the classifier robust. Moreover, based on similarity analysis, we are able to identify the camouflage malware among the false negatives. Our similarity-based approach is based on extracted robust features and those non-bypassed samples that are farthest from the hyperplane.

5.3.1. Measuring similarity

We use the Jaccard index, Jaccard-weight similarity, and Cosine similarity to measure the similarity of applications. The similarity indices are characterized by two vectors A and B, where A represents the feature vectors of the applications in the training set and B represents the feature vectors of the hand-picked (most benign or most malicious) applications. The similarity indices used in this paper are the following:

Jaccard index: The Jaccard index, denoted by J(·), is defined as the size of the intersection divided by the size of the union of the sample sets A and B:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}.$$

The Jaccard index is not accurate enough because it does not reflect the actual differences in frequencies. For example, an API may be used 10 times in one application and 100 times in another, but the Jaccard distance treats them equally.

Jaccard-weight similarity: The Jaccard-weight similarity is defined as follows and is computed in two steps.

Table 3 – Comparison of consequential features (using the DREBIN dataset).

Detection tool with features used            Accuracy
195-dimensional features used in KUAFUDET    96.55%
564-dimensional features used in KUAFUDET    95.80%
5000 features used in DREBIN                 94.05%
500,000 features used in DREBIN              93.90%


Step 1. The weight $J_{f_k}$ of the $k$th component of the feature vector is defined as the percentage of apps that exhibit that feature over the total number of apps.
Step 2. For any two apps a ∈ A and b ∈ B, we consider whether both $k$th components of the feature vectors are non-zero, that is, $a_k = b_k = 1$, where $a_k$ and $b_k$ denote the $k$th components of the feature vectors of apps a and b, respectively. We compute the Jaccard-weight similarity W(a, b) as follows:

$$W(a, b) = \frac{\sum_{k=1}^{n} J_{f_k} \, \mathbb{1}(a_k = b_k = 1)}{\sum_{k=1}^{n} J_{f_k}},$$

where n = 175 is the dimension of the features and $\mathbb{1}(\cdot)$ is the indicator function.

Cosine similarity: The cosine similarity, denoted by cos(θ), is defined as the dot product of the two vectors A and B divided by the product of their magnitudes:

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|}.$$

We use the Jaccard index (J), Jaccard-weight similarity (W), and Cosine similarity (C) to measure the similarity of applications. If the similarity between two applications exceeds a certain threshold, the application will be selected as a malware candidate and fed back to the training process for further fine-grained detection. We want to select as many malware candidates as possible for periodically retraining the classifiers. To be specific, a low threshold likely leads to high false negatives, while a high threshold leads to low false negatives. As shown in Fig. 8, during our experiments, we empirically choose parameters $J_{T_1} < J < J_{T_2}$, $W_{T_1} < W < W_{T_2}$, and $C_{T_1} < C < C_{T_2}$ as corresponding thresholds and then take the union of the three outcomes for picking the camouflage malware, as sketched below. From an attacker's perspective, in order to evade the detection, the fraction of the two sets A and B must be below a given threshold $0 < p < 1$ for the Jaccard index: $|A \cap B| \le p \times |A|$ and $|A \cap B| \le p \times |B|$. An optimal attack strategy is to schedule a group of accounts according to the set of such action sets A or B that has the maximum cardinality, so as to minimize the probability that two accounts are caught. But finding A with the maximum cardinality remains an open problem in intersection set theory (Brunk, 2009), which poses a limitation to the attacker.
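For concreteness, the three measures and the union rule can be sketched as follows (ours; `jf` holds the per-feature weights from Step 1, and the threshold values are the empirical parameters of Fig. 8):

```python
import numpy as np

def jaccard(a, b):
    """J(A, B): |intersection| / |union| of two binary feature vectors."""
    inter = np.sum((a == 1) & (b == 1))
    union = np.sum((a == 1) | (b == 1))
    return inter / union if union else 0.0

def jaccard_weight(a, b, jf):
    """W(a, b): feature weights jf[k] count only where a_k = b_k = 1."""
    hit = (a == 1) & (b == 1)
    return np.sum(jf[hit]) / np.sum(jf)

def cosine(a, b):
    """cos(theta): dot product over the product of magnitudes."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def is_camouflage_candidate(a, b, jf, thr):
    """Union of the three measures against empirical thresholds,
    e.g. thr = {"J": 0.6, "W": 0.6, "C": 0.8} (hypothetical values)."""
    return (jaccard(a, b) > thr["J"]
            or jaccard_weight(a, b, jf) > thr["W"]
            or cosine(a, b) > thr["C"])
```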

6. Experimental evaluation

We evaluate KUAFUDET with applications downloaded from different popular third-party Android markets, as well as in real industrial environments such as Pwnzen Infotech Inc. The goals are to evaluate our system in terms of: (i) the robustness of our detection under three attacks; (ii) the capability of accurately detecting malicious applications; (iii) the efficiency and scalability of real-time analysis, and adaptability to new Android malware; and (iv) the detection coverage.

6.1. Experimental dataset

As mentioned earlier, most studies lack a large number of data samples. We fulfill this need by presenting the first large collection of 252,900 Android application samples, including 10,400 malicious samples, which covers the majority of existing to recent ones, as shown in Table 4. Specifically, the 252,900 APK files we collected consist of 242,500 benign applications downloaded from the Google Play Store, and 10,400 malicious APK files, of which 1260 have been validated in Zhou and Jiang (2012) and the remainder are downloaded from the Contagio Mobile Website (340 APKs), Pwnzen Infotech Inc. (4500 APKs), and Arp et al. (2014) (4300 APKs). Our malicious applications include all varieties of threats for Android, such as phishing, trojans, spyware, and root exploits. In the following, we randomly select various portions of benign and malicious apps (various ratios of # benign vs. # malicious) for different experimental goals.

Fig. 8 – Similarity analysis.

Table 4 – Datasets for adversarial detection of Android malware.

Source                              Universal   Data-driven analysis   Training   Test   Comparison
Benign                              242,500     10,000                 8000       2400   0
Malicious
  Pwnzen Infotech Inc.              4500        4500                   3500       1000   600
  Zhou and Jiang (2012)             1260        1000                   1000       260    150
  Arp et al. (2014)                 4300        4200                   3200       700    150
  Contagio                          340         300                    300        40     100
Apps                                252,900     20,000                 16,000     4000   1000


Specifically, we select 1000 malware samples out of the set of 10,400 malicious samples and scan them using KUAFUDET and other industrial malware detecting tools. For comparison, the 1000 samples contain both benchmarks from before 2014 and the most recent datasets; more than half are the most recent malware.

Finally, we measure the efficiency and scalability of KUAFUDET, performing the entire process of KUAFUDET, using real-time streaming, on a server with 16 GB memory, a quad-core i7-4800MQ at 3.6 GHz, and 1 TB hard drives.

6.2. Experimental results

For a meaningful comparison, we list the results of training different classifiers with respect to the false negative (FN) rate and accuracy. The FN rate is the fraction of malicious instances that are classified as benign applications. Accuracy measures the fraction of predictions the classifier gets correct. Because we use our classifier as a tool for prioritizing the response to Android malware disclosures, we focus on improving the accuracy and reducing the false negatives.
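In terms of the confusion-matrix counts (malicious = positive), these two metrics can be computed as follows (the numbers below are illustrative, not the paper's results):

```python
def fn_rate(tp, fn):
    """Fraction of malicious instances classified as benign."""
    return fn / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts on a 4000-app test split:
print(fn_rate(tp=950, fn=50))                    # 0.05
print(accuracy(tp=950, tn=2890, fp=110, fn=50))  # 0.96
```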

6.2.1. Evaluation on attacks against the detection

Our collected dataset (16,000 samples as a training set and 4000 samples as a test set, shown in Table 4) serves as a benchmark for evaluating the robustness of Android malware detection systems. As mentioned, DROIDAPIMINER, DREBIN, and MAMADROID are the three most recent Android malware detection systems. Since Support Vector Machine (SVM) is the only algorithm jointly used by all three detection systems, to conduct a fair comparison, we adopt SVM to investigate the misclassification rate of the three detection systems together with ours (when the adversarial detection mechanism is not included) under poisoning attacks. We first show that by poisoning their training sets, it is possible to mislead their classifiers, along with ours; we then analyze the robustness of our discriminative classifiers against the three distinct attack strategies.

(i) Misclassification of Machine Learning Detection Systems.

To perform a longitudinal study, we first apply our poisoning attack to DROIDAPIMINER, DREBIN, MAMADROID, and KUAFUDET without adversarial detection (Without AD). Specifically, Without AD means that the camouflage detector component is disabled. We mimic sophisticated attacks to investigate how ineffective these systems become under our poisoning attack. We assume control over a subset of samples and automatically generate the crafted camouflage samples as follows: (i) we can only add or remove features, and we must preserve the utility of the modified application, which we achieve by only adding features from the benign set, and only those that do not interfere with the functionality of the application; (ii) we can add only a restricted number of features. We thus validate that the adversarial attack is indeed viable in security-critical domains.

More specifically, we customize the adversarial crafting algorithm (Papernot et al., 2016) according to our adversarial model, to indicate the probability of adding or removing a specific feature. For machine-learning mechanisms that are based on syntax features, such as DROIDAPIMINER and DREBIN, we can directly apply our customized adversarial crafting algorithm; for machine-learning mechanisms that mainly consider semantic and behavioral features, such as MAMADROID, which relies on application behavior using Markov chain modeling, we instead target crafting its features in terms of call sequences. Specifically, we first extract a set of call sequences that are only frequently used by benign samples, and then add them to the malicious samples, to mimic our sophisticated attack.

As shown in Table 5, we obtain 80.05%, 75.20%, and 68.95% misclassification rates (i.e., FN) on DROIDAPIMINER, DREBIN, and MAMADROID, respectively. We show that our poisoning attack on SVM is successfully validated on all three machine-learning tools.

We now discuss the misclassification rates of the systems in detail:

• Our sophisticated attackers are able to mislead the machine learning detection systems.

• MAMADROID relies on transitional call sequences, rather than single API calls, to train its classifier, so merely inserting syntax features, as we did for poisoning the other systems, is not considered a successful attack by our crafting algorithm. We thereby manipulate its feature space through call sequences.

• MAMADROID achieves lower misclassification rates than DREBIN and DROIDAPIMINER under the sophisticated attacks, at the price of more computational time per application due to call graph construction and feature extraction. Furthermore, MAMADROID requires a sizable amount of memory when performing classification because of its large feature sets and the extraction of the call graph.

• KUAFUDET (Without AD) can also be attacked through sophisticated attacks, and its misclassification rate (62.60%) indicates that it still suffers from adversarial samples.

• DROIDAPIMINER, DREBIN, and MAMADROID can be thwarted if we embed native code (as a strong attacker defined in our threat model) or use dynamic code loading with reflection (as a sophisticated attack), because malicious code is loaded or determined at runtime. Attackers can pollute training data with large-scale crafted samples through these techniques.

In conclusion, the state-of-the-art machine learning-based malware detection systems can be misled by the poisoning attacks we exhibit in this paper. By their nature, classifiers based on syntax features are more vulnerable than those based on semantic features. On the other hand, semantic feature extraction requires higher computational cost than syntax feature extraction. Hence, statistically robust features,

Table 5 – Misclassification rate comparison of adversarial detection.

Detection Tool DroidAPIMiner Drebin MaMaDroid KuafuDet (Without AD)

Misclassification rate (FN) 80.05% 75.20% 68.95% 62.60%


pruned by information gain, are adopted in KUAFUDET to accommodate scalable, generic, and large-scale malware detection. In addition, KUAFUDET provides specific mechanisms to counter the dynamic loading and native code embedding poisonings: it parses the native code and dynamic code from the package, and then extracts the corresponding features to preserve the robustness of the classifiers.
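As an illustrative sketch of this pruning step, the snippet below uses scikit-learn's mutual_info_classif as the information gain score and keeps the top-k features; the helper name is ours, with k = 195 matching the size of our final feature set.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def prune_by_information_gain(X, y, k=195):
    """Keep the k features with the highest information gain, using
    mutual information between each binary feature and the class label
    as the score."""
    scores = mutual_info_classif(X, y, discrete_features=True)
    keep = np.argsort(scores)[::-1][:k]   # indices of the top-k features
    return X[:, keep], keep
```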

In summary, we conjecture that almost all state-of-the-art machine learning-based malware detection systems suffer from the poisoning attack we exhibited in this paper.

(ii) Robustness of KUAFUDET.

Here, we analyze the robustness of our discriminative classifiers when encountering three distinct attack strategies. The first attack strategy is to launch a causative attack without any knowledge of the training data or ground truth; this weak attacker in principle amounts to injecting noise into the system. The second attack strategy corresponds to the strong attacker, who manipulates only part of the features in the training set. The third and most aggressive attacker we consider is the sophisticated attacker, who can manipulate almost all training features to launch a sophisticated attack, creating scenarios where relatively benign mobile applications and real-world malicious mobile applications appear to have nearly identical attributes at the training stage.
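The three strategies can be summarized by the following toy simulation; it is a sketch under stated assumptions (binary integer feature matrix, arbitrary flip and addition counts), not our evaluation harness, and the names poison and benign_only_idx are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def poison(X, y, attacker, benign_only_idx, frac=0.2):
    """Poison a fraction of the malicious training samples (X is a binary
    integer feature matrix; y holds 1 for malware, 0 for benign).
    'weak' flips random features (noise); 'strong' sets a random subset
    of benign-only features; 'sophisticated' sets (almost) all of them."""
    Xp = X.copy()
    mal = np.flatnonzero(y == 1)
    victims = rng.choice(mal, size=int(frac * len(mal)), replace=False)
    for i in victims:
        if attacker == 'weak':
            flip = rng.choice(Xp.shape[1], size=5, replace=False)
            Xp[i, flip] ^= 1                       # random noise injection
        elif attacker == 'strong':
            # assumes at least 10 benign-only features exist
            subset = rng.choice(benign_only_idx, size=10, replace=False)
            Xp[i, subset] = 1                      # partial manipulation
        elif attacker == 'sophisticated':
            Xp[i, benign_only_idx] = 1             # near-benign profile
    return Xp
```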

As shown in Table 6, the weak attacker is not able to force the accuracy of our malware detection below 90%. This suggests that discriminative classifiers can be relatively robust to this type of random noise-based attack. When dealing with the strong attacker, performance degrades to approximately 90% accuracy. The sophisticated attacker can cause the accuracy to drop to approximately 65% by injecting thousands of crafted samples into the training set (green bar in Fig. 9). The sophisticated attacker represents a practical upper bound for the accuracy loss that a realistic attacker can inflict on our detection system. We see that injecting carefully crafted data into training data can significantly reduce detection accuracy. However, with the help of adversarial detection, holistic performance improves by at least 15% accuracy with respect to each listed classifier. Hence, the performance of our adversarial detection remains above the baseline levels listed in Meng et al. (2016) even for our strongest attackers, owing to the use of similarity-based filtering to increase classifier robustness. The analysis of false negatives has an analogous interpretation (see Fig. 10). As shown in Table 6, the KNN algorithm outperforms the other algorithms in the adversarial environment because of its higher resistance to random classification noise, which is aligned with the conclusion drawn from recent research (Wang et al., 2017).

Fig. 11 displays scatter-plots of system accuracy ((TP + TN)/(TP + TN + FP + FN)) as a function of the size of randomly crafted datasets, where TP and TN denote the number of samples correctly classified as malicious and benign, respectively, and FP and FN indicate the number of samples mistakenly identified as malicious and benign, respectively. To ease presentation, the plots are fitted by Loess curves with 95% confidence interval bands that depict the upper and lower confidence bounds for all points within the range of the data, making them especially useful for comparing groups for which no theoretical models exist. As can be seen from Fig. 11, as the dataset size grows, the accuracy drops slightly. This is somewhat expected, as more potential evasion takes place. Furthermore, as discussed earlier, KUAFUDET offers significantly better robustness of detection accuracy, regardless of either the size of the dataset or the type of classifier applied. The analysis of the robustness of recall has an analogous interpretation.
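For reference, the two metrics reported throughout this section follow directly from the confusion matrix, as in this minimal sketch:

```python
def fn_rate_and_accuracy(y_true, y_pred):
    """FN rate over all malicious samples and overall accuracy,
    matching the definitions used in this section."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fn_rate = fn / (tp + fn)                       # malicious classified benign
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return fn_rate, accuracy
```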

Table 6 – The performance of adversarial detection.

Conventional detection    SVM       RF        KNN
FN                        4.90%     2.50%     3.40%
Accuracy                  94.95%    96.35%    95.80%

                          SVM                              RF                               KNN
Attacker                  Weak     Strong   Sophisticated  Weak     Strong   Sophisticated  Weak     Strong   Sophisticated

Without AD
FN                        8.60%    49.80%   62.60%         5.60%    41.80%   55.90%         5.90%    26.20%   45.40%
Accuracy                  93.10%   72.50%   64.30%         94.80%   76.40%   67.85%         94.55%   84.40%   72.00%

Within AD
FN                        5.80%    10.00%   12.60%         4.70%    11.20%   14.50%         4.10%    9.20%    11.60%
Accuracy                  94.50%   92.40%   89.30%         95.65%   92.00%   88.85%         95.45%   92.90%   90.45%

Fig. 9 – Detection accuracy.


6.2.2. Evaluation on accuracy
We hereby compare our work with previous work with respect to accuracy. We opt to apply a large portion of the dataset used in KUAFUDET, which integrates the datasets of DREBIN (Arp et al., 2014) and STORMDROID (Chen et al., 2016a), in order to evaluate accuracy performance. Because the datasets used in DROIDAPIMINER and MAMADROID are not publicly available, to be fair, we do not evaluate those systems on a dataset they never used, which would give us an asymmetric advantage over them. Hence, only DREBIN and STORMDROID are considered. As shown in Table 7, our accuracy rate (96.35%) clearly outperforms the accuracy rates of STORMDROID (93.80%) (Chen et al., 2016a) and DREBIN (93.90%) (Arp et al., 2014), let alone on our combined dataset.

As evidenced by Table 7, we achieve the highest accuracy because of the feature selection and similarity-based filtering.

6.2.3. Robustness of imbalanced data
Our experiments aim to evaluate the robustness of KUAFUDET when the malware-to-benign ratio becomes imbalanced in the adversarial environment. To evaluate the robustness of KUAFUDET in the data-imbalanced environment, we first apply 4000 malicious and 4000 benign apps (i.e., a 1:1 ratio) for training our classifiers, and gradually add benign ones to achieve different ratios up to 1:50. To be specific, we conduct experiments with the ratios 1:1, 1:5, 1:10, 1:20, and 1:50, using 10-fold cross-validation. As shown in Table 8, the accuracy degrades as we approach the real-world ratio of malware to benign apps, but it still remains above 90%. With the self-adaptive scheme, we nevertheless show the capability to tackle the problem of the imbalanced data ratio.
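A minimal sketch of this ratio study, assuming a random forest classifier and hypothetical feature matrices X_mal and X_ben_pool, is as follows:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def ratio_experiment(X_mal, X_ben_pool, ratios=(1, 5, 10, 20, 50)):
    """Fix the malicious side and grow the benign side to reach
    ratios 1:1 up to 1:50, scoring each setting with 10-fold CV."""
    results = {}
    for r in ratios:
        n_ben = min(len(X_mal) * r, len(X_ben_pool))
        X = np.vstack([X_mal, X_ben_pool[:n_ben]])
        y = np.concatenate([np.ones(len(X_mal)), np.zeros(n_ben)])
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        results[f"1:{r}"] = cross_val_score(clf, X, y, cv=10).mean()
    return results
```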

6.2.4. Evaluation on time cost, scalability, and adaptability
To support high-performance malware detection, KUAFUDET is designed to run on top of the open-source distributed stream-processing engine Storm.4 KUAFUDET supports large-scale detection of a data stream by a set of worker units that connect to each other, forming a topology: a submitted application is first disassembled to extract its features; then the metrics-driven pruning and information gain analysis are run, and the two-phase iterative adversarial detection is finally activated. Each operation here is delegated to a worker unit on the topology, and all the data associated with the application are in a single stream. Running on top of the Storm stream processor, KUAFUDET is tested on the platform of Pwnzen Infotech Inc. We show that the efficiency of our tool is unaffected by the size of the test group. As shown in Table 9, the average detection time per application is less than 3 seconds, which is indeed capable of scaling up to massive datasets.
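The following single-process sketch mirrors the pipeline stages; in the deployed system each stage is delegated to a Storm worker unit, and disassemble and extract_features are hypothetical helpers standing in for the actual components:

```python
import numpy as np

def detect_stream(apk_stream, classifier, feature_index):
    """Single-process sketch of the per-application pipeline; in the
    deployed system each stage below runs as a Storm worker unit.
    disassemble() and extract_features() are hypothetical helpers."""
    for apk in apk_stream:
        smali, manifest = disassemble(apk)          # stage 1: disassembly
        raw = extract_features(smali, manifest)     # stage 2: feature extraction
        x = np.asarray(raw)[feature_index]          # stage 3: pruned feature vector
        yield apk, classifier.predict([x])[0]       # stage 4: classification
```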

6.2.5. Evaluation on coverage

4 http://storm.apache.org/.

Fig. 10 – Detection false negatives.

Fig. 11 – The robustness of KUAFUDET (accuracy).

Table 7 – Accuracy comparison.

Detection Tool Accuracy # Malware

KUAFUDET      96.35%    10,400
DREBIN        93.90%     5,560
STORMDROID    93.80%     3,620

Table 8 – Results for different malware-to-benign app ratios.

Ratio 1:1 1:5 1:10 1:20 1:50

Accuracy 96.40% 96.15% 95.80% 94.60% 93.75%

Table 9 – Time cost of KUAFUDET in units of seconds.

# APKs Total time AVG time/APK

200     518      2.59
400     1066     2.67
600     1578     2.63
800     2097     2.62
1000    2778     2.78


To circumvent the over-fitting issue and to better understand the coverage of KUAFUDET, we randomly sample 1000 malicious applications from 217 Android malware families in our dataset, covering almost all existing Android malicious behaviors. We scan them using KUAFUDET and well-known industrial malware detection tools, such as Kaspersky and McAfee, encapsulated in VirusTotal. Although the coverage of KUAFUDET, with the combined top features, is 96.20%, better than what can be achieved by any individual scanner, including such top-of-the-line antivirus systems as ESET-NOD32 (79.50%), McAfee (75.50%), Ikarus (72.50%), Kaspersky (72.10%), and Avira (69.30%), industrial tools deal with millions of applications, many of which are zero-day. We argue that the purpose of comparing with industrial tools is to understand the different emphases in academia and industry. The breakdown of the coverage study is presented in Table 10.

Since KUAFUDET decouples the similarity-based filtering from the machine-learning classifiers, it enables us to periodically enhance the learning system. Moreover, KUAFUDET also considers an attacker threat dimension, which makes the whole system design completely adaptable to new malware.

7. Discussion

Our study is limited in five ways as discussed in the following.

(i) The granularity of classifiers. The hyperplane between benign and malicious can be blurred and subjective; it depends on specific security requirements and use cases whether a pattern is really benign or malicious. For example, if individual users root their own devices and use game hacking applications, game developers are very likely to treat them as malicious because they bypass the in-app purchase. In practice, such apps are defined as "grayware", with no clear distinction between the benign and the malicious. Grayware is now becoming a great threat to mobile devices since attackers achieve more profit this way. KUAFUDET is a generic and coarse-grained architecture for classifying applications with high accuracy. For such grayware, KUAFUDET can be tuned for a specific detection.

(ii) The limitation of decompilation technologies. We extract features from the manifest and Smali files that are successfully decompiled. However, in our experiments we find that a few APK files cannot be decompiled successfully. In this situation, we change the decompiling tool in our experiments. For future study, we will explore the possibility of using reinforcement techniques to prevent the APKs from being reverse-engineered. This would increase the difficulty of unpacking the original APK for attackers.

(iii) The scarcity of empirical samples. Although we find some case studies reflecting the different attacker levels for machine learning classifiers, we still lack a huge number of samples to scrutinize. We note that attackers might be able to deliberately force the benign and malicious access patterns to co-occur in one log, such as triggering the benign pattern first and then launching the attack, though we have not observed this in the wild. Dealing with this perhaps requires diluting the poisoned logs and possibly requires human analysts to contribute external knowledge. We hope in future study that using game theory will benefit our interpretation of the attackers' psychology, so as to take targeted strategies to detect stubborn malware.

(iv) The limitation of selected features. Although the 195 features are representative and extracted by using statistical metrics-driven manual pruning with information gain in our experiments, new malicious behaviors might disturb the feature space, making the system less effective.

(v) The limitation of the similarity-based approach. (1) Attackers can somehow approximate our similarity-based approach by inferring the similarity thresholds used. However, it is actually difficult to infer the selected samples that are used to calculate the thresholds. Furthermore, the thresholds will change as the selected samples update over time, so attackers must take great effort to exploit them. (2) KUAFUDET, through a self-adaptive learning scheme, discovers new information from both the identified malware and the suspicious false negatives filtered by the camouflage detector (see the sketch below). We acknowledge that this process could cause false positives. For example, SMS-related applications are benign applications, but they also have sensitive behaviors that are typical of Android malware.
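As an illustration of point (v), the sketch below filters suspicious false negatives by similarity to a malware profile; the cosine measure, the single centroid, and the fixed threshold are illustrative assumptions rather than the deployed design:

```python
import numpy as np

def filter_suspicious(X_pred_benign, malware_centroid, threshold=0.8):
    """Flag samples classified as benign whose cosine similarity to a
    malware profile exceeds a threshold; flagged samples are candidate
    false negatives to feed back into retraining."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return [i for i, x in enumerate(X_pred_benign)
            if cos(np.asarray(x), malware_centroid) > threshold]
```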

8. Related work

Contemporary machine learning-based techniques typically model the detection problem as a binary classification problem. Together with system analysis techniques, the malicious behaviors can be studied and employed to increase detection performance, especially for mobile applications in the wild.

8.1. Machine learning-based detection

Arp et al. (2014) built the DREBIN system, which works with a massive feature set extracted from the manifest file and the app's source code and trains an SVM classifier for malware detection. Although DREBIN has accommodated thousands of features with impressive performance results, it suffers from two challenges: first, the malware is outdated and well recorded

Table 10 – Coverage comparison.

Detection Tool Percentage

KUAFUDET      96.20%
ESET-NOD32    79.50%
McAfee        75.50%
Ikarus        72.50%
Kaspersky     72.10%
Avira         69.30%
VIPER         67.50%
Qihoo-360     62.30%
Symantec      40.40%


in malware detection tools; second, a comprehensive coverage of different attacking and evasion techniques is missing.

DROIDAPIMINER (Aafer et al., 2013) mainly extracts the top 169 API calls, which are used more frequently in the malware than in the benign set, package-level information, as well as some dangerous parameter information, as features to analyze a large corpus of Android malware. Because of the evolution of both malware and the Android API, it requires constant retraining on the most common calls.

Most recently, Chen et al. (2016c) suggest the use of semantic features of mobile apps to retain classifier value over time, building on the intuition that certain semantic attributes of mobile malware are invariant. Experiments verify that the incorporation of semantic features can significantly improve the performance of Android malware classification. Deo et al. (2016) propose to assess the quality of binary classification by using probabilistic predictors. Although both consider retraining, the adversarial environment is missing.

MUDFLOW (Avdiienko et al., 2015) argues that the patterns of sensitive information flows in malware are statistically different from those in benign apps. From an application, MUDFLOW uses static analysis to extract the flow paths, and these flow paths are then mapped to a feature vector used in a specific classifier. DroidSIFT (Zhang et al., 2014) is unique in designing features in terms of the generation of API dependency graphs G for each app and the construction of the feature vector of the app. The features represent the similarity of the graphs G to the database of graphs of known benign apps and malware apps. Finally, the feature vectors are used in anomaly detection. However, the dataset for detection is not large enough, yielding low effectiveness and confidence in classification. DroidMat (Wu et al., 2012) uses a static feature-based mechanism, which considers static information (e.g., permissions, intent messages, API calls) for detecting Android malware. It uses the K-means algorithm to enhance the malware modeling capability and the KNN algorithm to classify apps as benign or malicious. However, it does not extract semantic features for training and does not take the adversarial environment into consideration. Shabtai et al. (2012) presented a host-based malware detection system that continuously monitors mobile devices to detect malicious data using a supervised machine-learning anomaly detection technique. It focuses on host-based malware, while our approach focuses on mobile malware detection in an adversarial environment. Most recently, MAMADROID (Mariconti et al., 2017) is built to maintain resilience to API changes, but it requires a large amount of memory when performing classification and a substantial amount of time per app.

8.2. Evasion techniques

Currently, studies of machine-learning security in adversarial settings mainly focus on spam email detection (Biggio et al., 2014; Brückner et al., 2012; Debarr et al., 2013; Wang et al., 2014; Zhang et al., 2016). Recently, many statistical adversarial models have been proposed to construct effective adversarial samples, such as those using deep neural networks (Grosse et al., 2016; Li and Li, 2016; McDaniel et al., 2016; Papernot et al., 2016; Shen et al., 2016). From a generic perspective, Wang et al. (2016) utilized the notion of topological spaces and oracles to explain why an adversarial sample can bypass a classifier, and they derived a necessary and sufficient condition to determine the robustness of classifiers in an adversarial environment. Goodfellow et al. (2014) explained and generated adversarial samples for adversarial training to reduce test error. However, none of these studies focused on the specific causation leading to evasion in the mobile malware context, nor did they show the feasibility of these adversarial samples in the wild.

For conventional malware evasion, one straightforward evasion technique is to repackage a benign app with small snippets of malicious code added to several classes. Moreover, attackers could also use reflection, dynamic code loading, or native code (Poeplau et al., 2014). Such attempts to escape detection are likely to be deemed suspicious. Among them, DroidChameleon (Rastogi et al., 2014) integrates three types of transformation techniques and generates obfuscated mobile malware. Mystique (Meng et al., 2016) develops a framework to automatically generate malware covering four attack features and two evasion features to obfuscate the generated malware. For the general defense, Cao and Yang (2015) presented a proof-of-concept machine unlearning prototype that can rapidly forget data to regain privacy, security, and usability. The current paper is an extension of a poster (Chen et al., 2016b).

In summary, previous works either conduct evasion techniques without considering the feature space or only use machine learning-based approaches. With our experiments and real-world case studies, it is obvious that attackers can also poison features while preserving maliciousness, and our experiments verified that the resulting fake variants with poisoned features impaired discriminative classifiers and succeeded in lowering the detection score in a test environment. To the best of our knowledge, this is one of the first papers to accommodate adversarial machine learning in mobile malware detection. We are also the first to show the possibility of defending against adversarial attacks on mobile malware to the greatest extent.

9. Conclusion

We reviewed several challenges for the malware detection problem. We showed how conventional machine learning classifiers can fail against determined attackers. Based on these insights, we designed and evaluated three types of attackers targeting the training phase to poison our detection. Through simulation, we presented practical bounds for the accuracy loss under each attacker. To address this threat, we proposed our detection system, KUAFUDET, and showed that it significantly reduces false negatives and boosts the detection accuracy by at least 15%.

We argue that it is essential to inform researchers considering how attackers will adapt to conventional detection, as well as to inform developers working on the next generation of malware detection systems. We conjecture that the arms race will be over only when the effectiveness of early detection sufficiently increases the cost of infection.


Appendix A. Syntax and Semantic Features

Table A.11 – 175 syntax features and 20 semantic features for training classifiers.

PERMISSION (61 features):
ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, ACCESS_LOCATION_EXTRA_COMMANDS, ACCESS_NETWORK_STATE, ACCESS_WIFI_STATE, AUTHENTICATE_ACCOUNTS, BATTERY_STATS, BLUETOOTH, BROADCAST_SMS, BROADCAST_STICKY, CALL_PHONE, CAMERA, CHANGE_COMPONENT_ENABLED_STATE, CHANGE_CONFIGURATION, CHANGE_NETWORK_STATE, CHANGE_WIFI_MULTICAST_STATE, CHANGE_WIFI_STATE, CLEAR_APP_CACHE, CONTROL_LOCATION_UPDATES, DELETE_CACHE_FILES, DELETE_PACKAGES, DEVICE_POWER, DISABLE_KEYGUARD, EXPAND_STATUS_BAR, FLASHLIGHT, GET_PACKAGE_SIZE, GET_TASKS, INSTALL_PACKAGES, INTERNET, KILL_BACKGROUND_PROCESSES, MODIFY_PHONE_STATE, MOUNT_UNMOUNT_FILESYSTEMS, NFC, PERSISTENT_ACTIVITY, PROCESS_OUTGOING_CALLS, READ_CALL_LOG, READ_CONTACTS, READ_EXTERNAL_STORAGE, READ_LOGS, READ_PHONE_STATE, READ_PROFILE, READ_SMS, RECEIVE_BOOT_COMPLETED, RECEIVE_MMS, RECEIVE_SMS, RECEIVE_WAP_PUSH, RECORD_AUDIO, RESTART_PACKAGES, SEND_SMS, SET_WALLPAPER, SET_WALLPAPER_HINTS, STATUS_BAR, SYSTEM_ALERT_WINDOW, UPDATE_DEVICE_STATS, USE_CREDENTIALS, VIBRATE, WAKE_LOCK, WRITE_APN_SETTINGS, WRITE_SETTINGS, WRITE_SMS, WRITE_EXTERNAL_STORAGE

INTENT (12 features):
action.DELETE, action.GET_CONTENT, action.MAIN, action.PICK, action.SEND, action.SET_WALLPAPER, action.VIEW, category.BROWSABLE, category.DEFAULT, category.HOME, category.INFO, category.LAUNCHER

HARDWARE (5 features):
camera, camera.autofocus, sensor.accelerometer, telephony, touchscreen

API CALL (97 features):
URL.openConnection, URL.openStream, URL.getContent, TelephonyManager.getCallState, TelephonyManager.getCellLocation, TelephonyManager.getDeviceId, TelephonyManager.getDeviceSoftwareVersion, TelephonyManager.getNeighboringCellInfo, TelephonyManager.getNetworkCountryIso, TelephonyManager.getNetworkOperator, TelephonyManager.getNetworkOperatorName, TelephonyManager.getNetworkType, TelephonyManager.getPhoneType, TelephonyManager.getSimCountryIso, TelephonyManager.getSimOperator, TelephonyManager.getSimOperatorName, TelephonyManager.getSimSerialNumber, TelephonyManager.getSimState, TelephonyManager.getSubscriberId, TelephonyManager.getVoiceMailNumber, TelephonyManager.hasIccCard, TelephonyManager.isNetworkRoaming, SmsManager.divideMessage, SmsManager.getDefault, SmsManager.sendDataMessage, SmsManager.sendMultipartTextMessage, SmsManager.sendTextMessage, HttpURLConnection.disconnect, HttpURLConnection.getContentEncoding, HttpURLConnection.getPermission, HttpURLConnection.getRequestMethod, HttpURLConnection.getResponseCode, HttpURLConnection.getResponseMessage, HttpURLConnection.usingProxy, ContentResolver.bulkInsert, ContentResolver.getType, ContentResolver.openAssetFileDescriptor, ContentResolver.query, ContentResolver.registerContentObserver, ContentResolver.update, ContentResolver.delete, Runtime.getRuntime, Runtime.exec, Runtime.addShutdownHook, Runtime.maxMemory, URLConnection.addRequestProperty, URLConnection.connect, URLConnection.getConnectTimeout, URLConnection.getContent, URLConnection.getContentType, URLConnection.getDefaultUseCaches, URLConnection.getPermission, URLConnection.getURL, URLConnection.setConnectTimeout, URLConnection.setReadTimeout, ActivityManager.getLargeMemoryClass, ActivityManager.getRunningAppProcesses, ActivityManager.isLowRamDevice, ActivityManager.killBackgroundProcesses, ActivityManager.restartPackage, BluetoothAdapter.cancelDiscovery, BluetoothAdapter.getAddress, BluetoothAdapter.getBondedDevices, BluetoothAdapter.getRemoteDevice, BluetoothSocket.connect, DownloadManager.enqueue, DownloadManager.query, LocationManager.addGpsStatusListener, LocationManager.addNmeaListener, LocationManager.addProximityAlert, LocationManager.addTestProvider, LocationManager.clearTestProviderLocation, LocationManager.getBestProvider, LocationManager.getGpsStatus, LocationManager.getLastKnownLocation, LocationManager.requestLocationUpdates, LocationManager.sendExtraCommand, WifiManager.addNetwork, WifiManager.calculateSignalLevel, WifiManager.createWifiLock, WifiManager.disconnect, WifiManager.enableNetwork, WifiManager.getConfiguredNetworks, WifiManager.getConnectionInfo, WifiManager.getDhcpInfo, WifiManager.getScanResults, WifiManager.getWifiState, WifiManager.isWifiEnabled, WifiManager.removeNetwork, WifiManager.saveConfiguration, WifiManager.setWifiEnabled, NotificationManager.cancel, NotificationManager.notify, PackageManager.checkPermission, PowerManager.isInteractive, PowerManager.isScreenOn, PowerManager.newWakeLock

SEMANTIC (20 features):
*Install application*, *Uninstall application*, *Get installed packages*, *Monitor URI*, *Download file*, *Get location*, *Read SD card*, *Write SD card*, *Request for chmod*, *Start http connection*, *Send Sms*, *Receive Sms*, *Delete Sms*, *Intercept Sms receiver*, *Get wifi info*, *Get Logs*, *Get Class loader*, *Get contacts*, *Get account*, *Get phone type/Sim serial number/device id/subscriber id/IMSI*


References

Aafer Y, Du W, Yin H. DroidAPIMiner: mining API-level features for robust malware detection in Android. In: Security and privacy in communication networks. Springer; 2013. p. 86–103.

Arp D, Spreitzenbarth M, Hubner M, Gascon H, Rieck K. DREBIN: effective and explainable detection of Android malware in your pocket. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2014.

Arzt S, Rasthofer S, Fritz C, Bodden E, Bartel A, Klein J, et al. FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In: ACM SIGPLAN notices, vol. 49. ACM; 2014. p. 259–69.

Avdiienko V, Kuznetsov K, Gorla A, Zeller A, Arzt S, Rasthofer S, et al. Mining apps for abnormal usage of sensitive data. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol. 1. IEEE; 2015. p. 426–36.

Biggio B, Fumera G, Roli F. Security evaluation of pattern classifiers under attack. IEEE Trans Knowl Data Eng 2014;26(4):984–96.

Brunk F. Intersection problems in combinatorics [Ph.D. thesis]. University of St Andrews; 2009.

Brückner M, Kanzow C, Scheffer T. Static prediction games for adversarial learning problems. J Mach Learn Res 2012;13(Sep):2617–54.

Cao Y, Yang J. Towards making systems forget with machine unlearning. In: Security and privacy (SP), 2015 IEEE symposium on. IEEE; 2015. p. 463–80.

Chen S, Xue M, Tang Z, Xu L, Zhu H. StormDroid: a streaminglized machine learning-based system for detecting Android malware. In: Proceedings of the 11th ACM on Asia conference on computer and communications security. ACM; 2016a. p. 377–88.

Chen S, Xue M, Xu L. Poster: towards adversarial detection of mobile malware. In: Proceedings of the 22nd annual international conference on mobile computing and networking. ACM; 2016b. p. 415–16.

Chen W, Aspinall D, Gordon AD, Sutton C, Muttik I. More semantics more robust: improving Android malware classifiers. In: Proceedings of the 9th ACM conference on security & privacy in wireless and mobile networks. ACM; 2016c. p. 147–58.

Dash SK, Suarez-Tangil G, Khan S, Tam K, Ahmadi M, Kinder J, et al. DroidScribe: classifying Android malware based on runtime behavior. Mobile Security Technologies (MoST 2016) 2016;7148:1–12.

Debarr D, Sun H, Wechsler H. Adversarial spam detection using the randomized Hough transform-support vector machine. In: Machine Learning and Applications (ICMLA), 2013 12th international conference on, vol. 1. IEEE; 2013. p. 299–304.

Deo A, Dash SK, Suarez-Tangil G, Vovk V, Cavallaro L. Prescience: probabilistic guidance on the retraining conundrum for malware detection. In: Proceedings of the 2016 ACM workshop on artificial intelligence and security. ACM; 2016. p. 71–82.

Enck W, Gilbert P, Han S, Tendulkar V, Chun B-G, Cox LP, et al. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS) 2014;32(2):5.

Fan L, Xue M, Chen S, Xu L, Zhu H. Poster: accuracy vs. time cost: detecting Android malware through Pareto ensemble pruning. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. ACM; 2016. p. 1748–50.

Feizollah A, Anuar NB, Salleh R, Suarez-Tangil G, Furnell S. AndroDialysis: analysis of Android intent effectiveness in malware detection. Computers & Security 2017;65:121–34.

Felt AP, Chin E, Hanna S, Song D, Wagner D. Android permissions demystified. In: Proceedings of the 18th ACM conference on computer and communications security. ACM; 2011. p. 627–38.

Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572; 2014.

Gordon MI, Kim D, Perkins JH, Gilham L, Nguyen N, Rinard MC. Information flow analysis of Android applications in DroidSafe. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2015.

Graziano M, Canali D, Bilge L, Lanzi A, Balzarotti D. Needles in a haystack: mining information from public dynamic analysis sandboxes for malware intelligence. In: 24th USENIX Security Symposium (USENIX Security 15). 2015. p. 1057–72.

Grosse K, Papernot N, Manoharan P, Backes M, McDaniel P. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435; 2016.

Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 2009;11:10–18.

Idrees F, Rajarajan M, Conti M, Chen T, Rahulamathavan Y. PIndroid: a novel Android malware detection system using ensemble learning methods. Computers & Security 2017; doi:10.1016/j.cose.2017.03.011.

Li L, Bartel A, Bissyandé TF, Klein J, Le Traon Y, Arzt S, et al. IccTA: detecting inter-component privacy leaks in Android apps. In: Proceedings of the 37th international conference on software engineering, vol. 1. IEEE Press; 2015. p. 280–91.

Li X, Li F. Adversarial examples detection in deep networks with convolutional filter statistics. arXiv preprint arXiv:1612.07767; 2016.

Mariconti E, Onwuzurike L, Andriotis P, De Cristofaro E, Ross G, Stringhini G. MAMADROID: detecting Android malware by building Markov chains of behavioral models. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2017.

McDaniel P, Papernot N, Celik ZB. Machine learning in adversarial settings. IEEE Security & Privacy 2016;14(3):68–72.

Meng G, Xue Y, Mahinthan C, Narayanan A, Liu Y, Zhang J, et al. Mystique: evolving Android malware for auditing anti-malware tools. In: Proceedings of the 11th ACM on Asia conference on computer and communications security. ACM; 2016. p. 365–76.

Octeau D, Jha S, Dering M, McDaniel P, Bartel A, Li L, et al. Combining static analysis with probabilistic models to enable market-scale Android inter-component analysis. In: ACM SIGPLAN notices, vol. 51. ACM; 2016. p. 469–84.

Papernot N, McDaniel P, Jha S, Fredrikson M, Celik ZB, Swami A. The limitations of deep learning in adversarial settings. In: Security and Privacy (EuroS&P), 2016 IEEE European symposium on. IEEE; 2016. p. 372–87.

Poeplau S, Fratantonio Y, Bianchi A, Kruegel C, Vigna G. Execute this! Analyzing unsafe and malicious dynamic code loading in Android applications. In: NDSS, vol. 14. 2014. p. 23–6.

Rasthofer S, Arzt S, Bodden E. A machine-learning approach for classifying and categorizing Android sources and sinks. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2014.

Rasthofer S, Arzt S, Miltenberger M, Bodden E. Harvesting runtime values in Android applications that feature anti-analysis techniques. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2016.


Rastogi V, Chen Y, Jiang X. Catch me if you can: evaluating Android anti-malware against transformation attacks. IEEE Transactions on Information Forensics and Security 2014;9(1):99–108.

Schlegel R, Zhang K, Zhou X, Intwala M, Kapadia A, Wang X. Soundcomber: a stealthy and context-aware sound Trojan for smartphones. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS), vol. 11. 2011. p. 17–33.

Shabtai A, Kanonov U, Elovici Y, Glezer C, Weiss Y. Andromaly: a behavioral malware detection framework for Android devices. Journal of Intelligent Information Systems 2012;38(1):161–90.

Shen S, Tople S, Saxena P. AUROR: defending against poisoning attacks in collaborative deep learning systems. In: Proceedings of the 32nd annual conference on computer security applications. ACM; 2016. p. 508–19.

Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, et al. Intriguing properties of neural networks. In: Proceedings of the 2014 international conference on learning representations. Computational and Biological Learning Society. 2014.

Tam K, Khan SJ, Fattori A, Cavallaro L. CopperDroid: automatic reconstruction of Android malware behaviors. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2015.

Wang B, Gao J, Qi Y. A theoretical framework for robustness of (deep) classifiers under adversarial noise. arXiv preprint arXiv:1612.00334; 2016.

Wang F, Liu W, Chawla S. On sparse feature attacks in adversarial learning. In: Data Mining (ICDM), 2014 IEEE international conference on. IEEE; 2014. p. 1013–18.

Wang Y, Jha S, Chaudhuri K. Analyzing the robustness of nearest neighbors to adversarial examples. arXiv preprint arXiv:1706.03922; 2017.

Wong MY, Lie D. IntelliDroid: a targeted input generator for the dynamic analysis of Android malware. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2016.

Wu C, Zhou Y, Patel K, Liang Z, Jiang X. AirBag: boosting smartphone resistance to malware infection. In: Proceedings of the annual symposium on Network and Distributed System Security (NDSS). 2014.

Wu D-J, Mao C-H, Wei T-E, Lee H-M, Wu K-P. DroidMat: Android malware detection through manifest and API calls tracing. In: Information security (Asia JCIS), 2012 seventh Asia joint conference on. IEEE; 2012. p. 62–9.

Yan LK, Yin H. DroidScope: seamlessly reconstructing the OS and Dalvik semantic views for dynamic Android malware analysis. In: Presented as part of the 21st USENIX security symposium (USENIX Security 12). 2012. p. 569–84.

Yang C, Xu Z, Gu G, Yegneswaran V, Porras P. DroidMiner: automated mining and characterization of fine-grained malicious behaviors in Android applications. In: European symposium on research in computer security. Springer; 2014. p. 163–82.

Yang W, Xiao X, Andow B, Li S, Xie T, Enck W. AppContext: differentiating malicious and benign mobile app behaviors using context. In: Software Engineering (ICSE), 2015 IEEE/ACM 37th IEEE international conference on, vol. 1. IEEE; 2015. p. 303–13.

Zhang F, Chan PP, Biggio B, Yeung DS, Roli F. Adversarial feature selection against evasion attacks. IEEE Transactions on Cybernetics 2016;46(3):766–77.

Zhang M, Duan Y, Yin H, Zhao Z. Semantics-aware Android malware classification using weighted contextual API dependency graphs. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. ACM; 2014. p. 1105–16.

Zhou W, Zhou Y, Grace M, Jiang X, Zou S. Fast, scalable detection of piggybacked mobile applications. In: Proceedings of the third ACM conference on data and application security and privacy. ACM; 2013. p. 185–96.

Zhou Y, Jiang X. Dissecting Android malware: characterization and evolution. In: Security and privacy (SP), 2012 IEEE symposium on. IEEE; 2012. p. 95–109.

Zhou Y, Wang Z, Zhou W, Jiang X. Hey, you, get off of my market: detecting malicious apps in official and alternative Android markets. In: NDSS, vol. 25. 2012. p. 50–2.

Sen Chen is pursuing his Ph.D. degree at the School of Computer Science and Software Engineering of East China Normal University, focusing primarily on areas of smartphone security, Android malware, vulnerability, and program analysis. He has received the MobiCom 2016 Travel Grant Award. He is currently serving as a visiting scholar in the Cyber Security Lab at Nanyang Technological University. He is advised by Professors Lihua Xu (ECNU) and Yang Liu (NTU).

Minhui Xue is pursuing his Ph.D. degree at the School of Computer Science and Software Engineering of East China Normal University. He is also serving as a visiting scholar at the Courant Institute of Mathematical Sciences and the Tandon School of Engineering at New York University, as well as a research assistant at New York University Shanghai, advised by Professor Keith W. Ross (NYU). Previously, he received a Bachelor of Science degree in the field of fundamental mathematics from East China Normal University in July 2013. His current research interests are in data-driven analysis of online social networks and privacy.

Lingling Fan is pursuing her Ph.D. degree at the School of Computer Science and Software Engineering of East China Normal University, focusing on software testing, model checking, and Android application analysis. She is interested in software testing and analysis, aiming at bug revelation and bug localization, and malware detection. She received a Bachelor of Science degree in the field of computer science and technology from ECNU, as an excellent graduate student of Shanghai, and received an Excellent Student award from ECNU. She is currently advised by Professors Lihua Xu (ECNU) and Yang Liu (NTU).

Shuang Hao is an assistant professor in the Department of Computer Science at the University of Texas at Dallas. He is broadly interested in all aspects of network and system security. His work follows a measurement- and data-driven approach to characterize and detect critical security issues in large-scale systems. His current research focuses on anomaly detection, underground economics, DNS analysis, and web and mobile security. He obtained his Ph.D. in Computer Science at the Georgia Institute of Technology, and he did a postdoctorate at the University of California, Santa Barbara.

Lihua Xu is an associate professor at the School of Computer Science and Software Engineering, East China Normal University. She received her Ph.D. and M.Sc. degrees from the University of California at Irvine. Her current research focuses on software engineering, automated software analysis and testing, and mobile security.

Haojin Zhu is currently a professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. He received his B.Sc. degree (2002) from Wuhan University (China) and his M.Sc. degree (2005) from Shanghai Jiao Tong University (China), both in computer science, and his Ph.D. in Electrical and Computer Engineering from the University of Waterloo (Canada) in 2009. His current research interests include network security and data privacy. He serves as an associate/guest editor of the IEEE Internet of Things Journal, IEEE Wireless Communications, IEEE Network, and Peer-to-Peer Networking and Applications.

Bo Li is currently a postdoctoral fellow in the EECS department at UC Berkeley, working with Prof. Dawn Song. She will join the EECS department at the University of Illinois at Urbana-Champaign as an assistant professor in June 2018. Her research focuses on both theoretical and practical aspects of machine learning, security, privacy, game theory, social networks, and adversarial deep learning.
