4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious...

28
CS 621 - Paper Presentation Ziqi Chen Rajarshi Das Zixin Kong Checking App Behavior Against App Descriptions 1

Transcript of 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious...

Page 1: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

CS 621 - Paper Presentation

Ziqi ChenRajarshi DasZixin Kong

Checking App Behavior Against App Descriptions

1

Page 2: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Main ContributionHow do we know a program does what it claims to do?

Research Impact:- Computer Systems- Interesting and novel use of Natural Language Processing.- Usable Privacy

https://www.usableprivacy.org/

2

Page 3: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Success story

3

Page 4: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Topic Modeling in 2 mins

4

Page 5: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Next few slides from Prof. Blei’s talk slideshttp://www.cs.columbia.edu/~blei/talks/Blei_Topic_Modeling_Workshop_2013.pdfhttp://www.cs.columbia.edu/~blei/talks/Blei_Science_2008.pdf

5

Page 6: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

6

Pois oning by lee-cream. No chemist ce rtni nly would $Uppo$e that the ""me

poison exists in all s,imp le of ice-cream . which have produced untoward sy mptoms in man. Minern l poi­son.sJ copper, rend, nn,enic, and mercury, have aJl been found in ice cream. In some instio.nces: these bave be n used with c1;mi11al in te nt . In other cases their presence ha s been acGidental. Likewise , th.at va.uil1a. is sometimes tho bearer, nt least, of the poi~ son, is well known to a ll chemisus. D1·. Bartl ey's idea that tho poisonous properties of th e cream w hieh he e:,:amioe d were due to putrid gelntino is cert<Uoly n 1-n.tional theory. The poisonous pri.ncip le might in tbia caae n.riso from the do,compos:ition of the gt,la.tine; or with the gelatine there may be introduc ed into tho milk o. fe1·mont , by the growth of which a. poison ia. pr oduced.

But in the cream wWch I exo.mi.ned, none of th e above sour ces of tbe poisoning exi•ted. Th ere were uo mineral poisons present. No ge latin e of any kind had been us• d in making the crea m. The vanilla used wo.s shown to be not poisouo us. This sbow iug ,vas mo.de, not by~ chcm ico.1 anal .vsisl \'1hich might not bave been conc lusive, but Mr. N vie and I drank of the 110.nillo. c.xt,ract which was US@d1 an d no ill re­sul ts followed. St ill , from t his cream we isolated tbe so.mo poison which I had bef ore found in poiBon~ ous ch ease (Zeitsclwlft fi1t· p hy1tw/()(Jisc1.. cliem w, x,

RNA Editing and the Evolution of Parasites

Larry Simpson and Dmitri A. Maslov

!~t!::;!'up?,;~~~ JeM the it8tUM txtan l liM3t;e el t"ubr,,ut­i,c crg;inimw wnlrc mnodiorulria ( r •­Wlrllln tbc: k.inrttoplft!itidJ, dierto ii~ t .. ,;i INjtx- groups. the poor{r ltudJed bod.-.tu..,Js... C1'11'tobUd1, which cotlMlit cl ~ m:,-liv­in11 •nd pllS:..tic al t., and the bclm kno,wn tr,}'tMO!iOfllflll~ ... tudi are obligate parmita(2) .

Perhip ~ of me 111D1111tT ~ tbc lr, pa11~.)rntni d linl!11!,"', tMM! cells pcat!tl

9n'fflll nlcpe emetic fea­lurct { :tcoompflnya'I; Per­specDYc by Nilscn)--anc a( ~kh i. RNA «dltlrtl: of mi­ttx:hmidnal mrucripu. ll u, RNA ,eJ111n., iutwtloo ()-7) ctHtU cptll tt adin9 li: 11t16 In ~cryp~" 1J¥ lruelloo fot O(:etiriontil i.ldt.-tiuo) ol uridine !U} rea fact ai a t"ew i:oc,;:lflc lilll« Midun u~ wJ. mg repan r:J an mRNA (5'· cdil il'lf:) (If M tiluhipM' Ip,,!•

clhc 1it11S thn:ugtiou t die

ll'IRNA ~ 1Htli ti~ _)· Tot!

m,I, but there • d.dAKfeiea'll!:M ,:in die tl.A­l"uft of Lht. ~1oiary pM1uitic hoit . The Min· ~ndinft Gm" model (10, I I I AIUIIS dun rhe 1tu1iJ1ol Jlfl1"3Mtbm ..... in t..lllfi 1,:ul of r.tt­Pirnb rian inftndi~ s. Cioe\-olution o{ P'A""he Aro:f h,oe;t W-Ou1d ba"'fl ltJ to a ..Je daribtJt iOII of uyJ:OnoeomaxiJ Ill ilVIKtl

and lccctict. LI\ th li u~. di.l}MCli C IIN C)ffN (111.r.ernating invt'm!brnte and \'fftc• hn~ hnlu) e"(ll,'Cd Irita ~ * t($1 k o( the: l((flliktk>t1 by KJIM' lit-m.ipler.uu mJ d tfflllll of the ab1lnf ro fttd on die blood

·­ "'""

Chaotic Beetles Chatlos Godfray and Mlc:llael Hassell

IOR111,-1Hn~bU)'lo'l1.iorthc~ttt1[\I: •ttl: of May 111 d'tll! mnJ.l lf70.. IJ) tN.I: dw rorulomoo dvnamu:, cl annMb aoJ rl .. 3 an M t'ic«alnq:lv n'""lt'"C Thu C\.,cqrk-.x •t~· ;,nw• frt•m 1wo ,1111..:t""' Th!! wr~ ",:h (If 111ll~111 il'h 1h11t V,~MiilUH• •W n;Jt,rr..l <&:ommu"ll) (WO','ilk fl m~1,iLl ,,f .liffttcl\l p.u.:lw1,p fot" ~ t&l lhtttlKI. b1irh J... ~1'1) rinil in.hr« 1k .. o\nJ t'\'tn ln IAAmcd parulnii,;:in,. t~ nonhn61r f~cl.: pr., cc~, rrnml m JII mnurnl rcp.Jl,t1crn rnn n:,ult 11'1 e<rmr*'I J.,-r..11n1C l_.d,_.. . ..,.._ NJl'l.irJI

l"'l'Ul:1tll•f1• c:1n ~" 1W l"l'TWlU•I l-..Cl(l+tory d-,1'1.J<t"Jll""*''J 1.lt.•.._1l,tl ,,und ,.,r.,'-'t"r,,,..J tYpo::,c1,~tw-s.t:mlrlv,1v1,1ln,1ii1l.:<onJ,, .. ,ru.lf SuLh du.-ltl~ J\'l'IItffilo. Wtte c.......-.m,'ll, ill n~ rn1C'~ch.cn1hisw..1u1Jlu\>ttm("YLU'5rlftllf,... c1J1...,mforthe~mm1...Jc.ofUffV1 111011clrururn1'9:IUl'Ca.0n~l!t5ld th• 1,w.c.CoKimunoftlil. (l)f'W\·ulethem,.,,i

Thir~•••,rlllr°""*',.*"d~ l'f04hll ~rir~f'.I,~ .f4Ka8gha W"/Pl u.; £.. !Olli ".IIU(dhlC"'

convaic1n.i;: C\'lii-na 10 dnte ol ccai11ln ,,frnnmic1 :,r,J ch:ic.. m I b1ol11;1c.,.a r,~lauvn--o( the iloor bt:ijtll', T,iklii1m C«it"11('1un (~t'C fli:urd,

lthJll<fW"'"!lfXl~ 1114'1yJ1f­llu1h 11.:..i:-tUo.d\HrntC' o.1mrlu dJNIIIIO.ll'lf"'f'WthNllnth.e llclJ.B<,usverJm,wre.illdu ... anc1ill7 11.ui::tllMU\ll pq,uL.a.t100 will ,ur,cr1lcmlly rurmMr n '!tnbl,torqd1Cf'.~1LM1\.111buf­kt.,,Jl-ythrnr'fTl111lrJ1"'1:omJ'T-1u1lt.otu~.., ,·1,J"1,tT'l("'I tr, 1111 Sfc<.ci. Gk«t ll l~llt MrM,ifl hlDC' ;.:,rlc:ti, .i.i1?11.-wib: 1.-1ox, ln:m nunl,ne.u milllhem.11.u c.>1nbcuitd101Jeru1f-i,the1ell ra~.agrnrurcidclutoL lnrhasc ~,dl,10Uc1rJ1ts:«innci::mc to br on ''>..trJJ'U'! 'M.trJcti:n: (Ul'IOl.ai\-'Ufflt'UVi:(,l.J("l.l, .. . ,d, lr.11:tJl ,1rouun1 ;mJ ll!:11<:t' 111~111111~1' .Jnrn;ll"l'l'l, 11.t ,hty

m,-,.,,;:1'r\l!1Th,;.··~•'t.10,-.i(rht'111n;,a,:r,1,.wt:1,l Jdji,,:tt1 1 1r .. Jt>.:.t ,11lts. .,r.e-l't,llltdarc,n. ch.en ~1t1Lh.eJ And L1IJcJ, jjo,) thillll o, btrome,; ,m­p......,,;i,k:1urttd..:.1 tw11r.:.t('q'Uhl:loodtruwe1 Into the fi..111e. Tu 1,1rmyth (lf the mllcln11 1h11r ,,.,es rtit' to tlw uuem,e ~ruiU\'il'I' ta ln1u;d o;:011il..lllom c.in be ~ii rn;uh ft!IDt1c.tllT elllltWll\il dw l111p.11W¥ l'XPI'"

Cl~m ;n~ -.II.ff , TI11IBurtQ11fG TJ'lbo. \IY'le~embt.11 "'~C "{IOU~!,on lty­r.am.cSoltlonl!'IOlnlOUnl rtc11ro,,bahrn1tallertld m•IT'Altleffli1o;:Alrnodel

ll('M,whicha,pP"IIJ\~fot",;:/ia ­

,ot11.° d\'T1:l!ll~ :U-J l'K'ttt""1·

tll'\'.l'(ffl'O'\OUf'Th.t-rolla•'\'."-'ffl 11Uf'I)· i<1t('l1'f'I• lo l""'-•111.,1~ M-

1r .... 1,.- \ 11111\'.11 ........ , .. ... , lil'rj'l­• 11U)', t'<l'IL...-.rnl._f1<IIUTi,,....M:­rlrC'S1lll.l,, lf\J l<.ADC"C.Jl't.Jl,lnt .:ha.:.1:i.:r,:f111lmlonh11,•ebeer\ IJcruliitJ (,om,e lllKas, ro drnts, ;inJ ffiOlt CCIC'IYIJ\C-

1~ . hum.in d\llJl.co.l du· ~ · "(",•.hltllw"41't11l 1n1lo.iifli­(,,.lltt-., i,nl1JJ1o1 #ltlY h",.J Je:CfW'.-.,11:l>ll ~'lO. (Jl.

AnflllfflW1l.,tflJlpn:,;r,:h l1 hl riililmelttl:!t' porul..r11on ll'IOOek w1th d»:J from rui:urM papulmmu...J ihm«alJ'.<lff mm rmkuon, with th, ofr­mumc, 111th.: hdJ.Tlmt~'ffl­nll:1JI( hn,b:1'T1J:.unn ic 1>Qru· ],n1yi.l'lr«Mll~..,..J ,'°'ll'(>.ll"f t,,r1.,1,l-f,1,;.; I ,.,t~lnc.K 11·1 ra· n1IIW(N.o:,,;i,111ml1>t,.C"',.-..,·~c,c-

5CIF.NCE • \'CC.. US • 17 IANUAR.'Y 1997

Page 7: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Topics as summary

7

human evolution disease computer genome evolutionary host models

dna species bacteria information genetic organisms diseases data genes life resistance computers

sequence origin bacterial system gene biology new network

molecular groups strains systems sequencing phylogenetic control model

map living infectious parallel information diversity malaria methods

genetics group parasite networks mapping new parasites software project two united new

sequences common tuberculosis simulations

Page 8: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evolution of topics over time

8

"Theoretical Physics" "Neuroscience"

FORCE OXYGEN

NEAVE:: .. :: ..

1880 1900 1920 1940 1960 1980 2000 1880 1900 1920 1940 1960 1980 2000

Page 9: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Latent Dirichlet Allocation (LDA)

A document exhibits multiple topics

9

Page 10: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Latent Dirichlet Allocation (LDA)

Each document is a random mixture of corpus wide topics

Each word is drawn from one of those topics

10

Page 11: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

11

Table 1: Topics mined from Android Apps Id Assigned Name 0 "personalize "

2

3

4 5

6

7

8

9

10

11

12 13

14 15

16 17 18

19

20

21

22 23 24 25

26

27

28

29

'game and cheat sheets " "money' '

"tv"

"music" holidays" and

religion "navigation and trave l" "language "

"share "

'weather and stars '

'1iles and video "

"photo and social'

"cars ' "design and art "

"food and recipes" "personalize "

"hea lth" "travel' "kids and bodies"

ringtones anc:I sound ' "game "

"search and browse " "batt le games " "settings and utils " "sports " "wallpapers "

"connection "

"policies and ads"

"popu lar media "

"puzzle and card games "

slot , mach ine, money, poker, currenc , market , t rade , stock , casino co in, finance tv, chan nel, countr i, live, watch , germani, na­tion , bbc, newspap music, song , radio, play, player, listen chr istmas, halloween , santa , year , holiday, is­lam, god map, inform, track, gps, navig , travel

language , word , english, learn , german , translat emai l, ad , support , facebook , share , tw itter, rate , suggest weather , forecas t, locate , temperatur , map , city, light file , down load , v ideo , media, support , man ­age , share , view, search photo , friend, facebook, share , love, twitter, pictur , chat , messag , galleri , hot , send socia l car, race , speed , dr ive, vehicl , bike, track life, peopl , natur , form , feel , learn , art , design , uniqu , effect , modern recip , cake , chicken, cook , food theme , launcher , download , install, icon, menu weight, bodi , exerc ise, diet , workout , medic citi , guid , map, travel, flag , count ri , attract kid , anim, color , girl, babi , pictu r, fun , draw, des ign , learn sound , rington , alarm , notif , music

game, pla i, graphic , fun, jump, level, ball, 3d , score search, icon , delet , bookmark, link, homepag , shortcut, browse r story , game , monster , zomb i, war , batt le screen , set , widget , phone, batteri team , footba ll, leagu , player, spo rt, basketbal wa llpap, live, home , screen , background , menu device , connect , network , wif i, blootooth , in­ternet , remot , server live, ad , home , applovi n, notif , data , polic i, pri­vacy, share , airpush , advertis ser i, video , film, album , movi , mus ic, award , star , fan , show, gangnam , top, bieber game, plai , level, puzzl , playe r, score , chal­leng , card

Table 3: Clusters of applications. "Size" is the number of appli­cations in the respective cluster. "Most Important Topics" list the three most prevalent topics; most important(> 10%) shown in bold. Topics less than 1 % not listed.

Id Assigned Name "sharing"

2 "puzzle and card games"

3 "memory puzzles"

4 "music"

5 "music vide os"

6 "religious wallpapers"

7 "language"

8 "cheat sheets"

9 "utils"

10 "sports game"

11 "battle games"

12 "navigation and travel"

13 "money"

14 "kids"

15 "personalize'

16 "connection"

17 "health"

18 "weather"

19 "sports"

20 "files and videos"

21 "search and browse"

22 "advertisements" 23 "design and arl"

24 "car games"

25 "Iv live"

26 "adult photo"

27 "adult wallpapers"

Size 1,453

953

1,069

714

773

367

602

785

1,300

1,306

953

1,273

589

1,001

304

823

669

282

580

679

363

380 978

449

500

828

543

Most Important Topics share (53%) , settings and utils, navigation and travel puzzle and card games (78%) , share, game puzzle and card games (40%) , game (12%), share music (58%), share, settings and utils popular media (44%), holidays and religion (20%) , share holidays and religion (56%). de­sign and art, wallpapers language (67%), share, settings and utils game and cheat sheets (76%), share, popular media settings and utils (62%). share, connectio n game (63%), battle games, puzzle and card games battle games (60%). game (11 %), design and art navigation and travel (64%), share, travel money (57%) , puzzle and card games, settings and utils kids and bodies (62%). share, puzzle and card games personalize (71 %), wallpapers (15%), settings and utils connection (63%), settings and utils, share health (63%), design and art, share weather and stars (61%), set­tings and utlls (11 %), navigation and travel sports (62%) , share, popular me­dia files and videos (63%), share, settings and utils search and browse (64%) , game, puzzle and card games policies and ads (97%) design and art (48%), share, game cars (51%), game, puzzle and card games tv (57%) , share, navigation and travel photo and social (59%) , share, settings and utils wallpapers (51 %), share, kids

Page 12: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Implementation

•App collectionCHABADA collect 22,500+ Android applications from Google Play Store

• Identify TopicsUsing Latent Dirichlet Allocation (LDA) on the app descriptions todefine 30 topics (Eg: music, money…)

12

Page 13: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Implementation

•Cluster AppsCHABADA classifies apps into 32 clusters using K-Means algorithm withits related topics

•Used APIs In each cluster, CHABADA identifies sensitive APIs each app staticallyaccesses

13

Page 14: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Implementation

•OutliersCHABADA identifies outliers in the APIs clusters using unsupervisedone-class SVM anomaly classification

14

Page 15: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation

•Outlier DetectionRQ1: Can our technique effectively identify anomalies (i.e., mismatches between description and behavior) in Android applications?

•Malware DetectionRQ2: Can our technique be used to identify malicious Android applications?

15

Page 16: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation – Outlier Detection

• Experiment data- 32 clusters with an entire set of 22,521 applications

• Experiment - CHABADA- Partitioning and training: 9 subsets for training and 1 subset for

testing; run 10 times- Manual assessment: 3 categories (malicious, dubious, benign)

16

Page 17: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation – Outlier Detection

• Result:- Top 5 outliers are identified from each cluster; 160 outliers out of

22,521 applications- Top outliers, as produced by CHABADA, contain 26% malware;

additional 13% dubious apps- 39% of the top 5 outliers require additional scrutiny by app store

managers or end users

17

Page 18: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation – Malware Detection• Experiment data- Original set of “benign” apps (2,238) and reduced set of “malicious”

apps (172)• Experiment:- Run OC-SVM as a classifier that decides whether an element would

be part of the same distribution or not• Comparison:- Classification using topic clusters- Classification without clustering- Classification using given categories

19

Page 19: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation – Malware Detection

• Classification using topic clusters

Predicted as malicious Predicted as benign

Malicious apps 96.5 (56%) 75.5 (44%)

Benign apps 353.9 (16%) 1,884.4 (84%)

Table 1: Checking APIs and descriptions within topic clusters (CHABADA)

In our sample, even without knowing existing malware patterns, CHABADA detects the majority of malware as such.

20

Page 20: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation – Malware Detection

• Classification without clustering

Predicted as malicious Predicted as benignMalicious apps 41 (24%) 131 (76%)

Benign apps 334.9 (15%) 1,903.1 (85%)

Table 2: Checking APIs and descriptions in one single cluster

Classifying without clustering yields more false negatives.

21

Page 21: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Evaluation – Malware Detection

• Classification using given categories

Predicted as malicious Predicted as benignMalicious apps 81.6 (47%) 90.4 (53%)

Benign apps 356.9 (16%) 1,881.1 (84%)

Table 3: Checking APIs and descriptions within Google Play Store categories

Clustering by description topics is superior to clustering by given categories

22

Page 22: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Classification using topic clustersPredicted as malicious Predicted as benign

Malicious apps 96.5 (56%) 75.5 (44%)Benign apps 353.9 (16%) 1,884.4 (84%)

Classification without clusteringPredicted as malicious Predicted as benign

Malicious apps 41 (24%) 131 (76%)Benign apps 334.9 (15%) 1,903.1 (85%)

Classification using given categoriesPredicted as malicious Predicted as benign

Malicious apps 81.6 (47%) 90.4 (53%)Benign apps 356.9 (16%) 1,881.1 (84%) 23

Page 23: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Discussion

1. LDA is definitely not the best way to do this.

24

LDA only assigns probability mass to very few words in a topic and there is a high chance that a word representing malicious behavior will be left out.

Page 24: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Discussion

2. Problem with source of data

25

Page 25: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Discussion

3. Anomalous behavior definition not robust

26

Sending text messages to alert users of bad weather might be ok but not be present in the description

Page 26: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Discussion

4. Sensitivity to hyper-params

27

Page 27: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

Discussion

5. What next? How do you plan to let the consumer’s know?

28

Page 28: 4. Checking App Behavior Against App Descriptions€¦ · 4 "music" 5 "music videos" 6 "religious wallpapers" 7 "language" 8 "cheat sheets" 9 "utils" 10 "sports game" 11 "battle games"

29

Thank you!