CS 621 - Paper Presentation
Ziqi Chen, Rajarshi Das, Zixin Kong
Checking App Behavior Against App Descriptions
Main Contribution
How do we know a program does what it claims to do?
Research Impact:
- Computer Systems
- Interesting and novel use of Natural Language Processing
- Usable Privacy
https://www.usableprivacy.org/
Success story
Topic Modeling in 2 mins
Next few slides are from Prof. Blei’s talk slides:
http://www.cs.columbia.edu/~blei/talks/Blei_Topic_Modeling_Workshop_2013.pdf
http://www.cs.columbia.edu/~blei/talks/Blei_Science_2008.pdf
[Figure: scanned article pages shown as example documents: "Poisoning by Ice-cream"; "RNA Editing and the Evolution of Parasites" (Larry Simpson and Dmitri A. Maslov); "Chaotic Beetles" (Charles Godfray and Michael Hassell). Page footer: Science, 17 January 1997.]
Topics as summary
Topic 1: human, genome, dna, genetic, genes, sequence, gene, molecular, sequencing, map, information, genetics, mapping, project, sequences
Topic 2: evolution, evolutionary, species, organisms, life, origin, biology, groups, phylogenetic, living, diversity, group, new, two, common
Topic 3: disease, host, bacteria, diseases, resistance, bacterial, new, strains, control, infectious, malaria, parasite, parasites, united, tuberculosis
Topic 4: computer, models, information, data, computers, system, network, systems, model, parallel, methods, networks, software, new, simulations
Evolution of topics over time
[Figure: two panels, "Theoretical Physics" and "Neuroscience", plotting topic words (e.g. FORCE, OXYGEN) over time; x-axis: 1880 to 2000.]
Latent Dirichlet Allocation (LDA)
A document exhibits multiple topics
Latent Dirichlet Allocation (LDA)
Each document is a random mixture of corpus-wide topics
Each word is drawn from one of those topics
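The two statements above describe LDA's generative story, which can be sketched in a few lines. This is only an illustration under stated assumptions: the two toy topics, the prior `alpha`, and the function names are hypothetical, and real LDA runs this story in reverse, inferring topics from observed documents.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document following LDA's generative story.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  Dirichlet prior over topics (one value per topic)
    """
    # 1. Each document gets its own random mixture over the corpus-wide topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Each word first picks a topic from the document's mixture ...
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. ... and is then drawn from that topic's word distribution.
        vocab = list(topics[z])
        word = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]
        words.append(word)
    return theta, words

# Two toy topics (hypothetical values, for illustration only).
TOPICS = [
    {"music": 0.5, "song": 0.3, "radio": 0.2},
    {"map": 0.4, "gps": 0.3, "travel": 0.3},
]

rng = random.Random(0)
theta, doc = generate_document(TOPICS, alpha=[0.5, 0.5], n_words=8, rng=rng)
print(theta, doc)
```

A small `alpha` (below 1) tends to produce documents dominated by one topic; larger values give more even mixtures.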
Table 1: Topics mined from Android Apps

Id  Assigned Name             Most Representative Words (stemmed)
0   "personalize"             (words missing)
1   "game and cheat sheets"   (words missing)
2   "money"                   slot, machine, money, poker, currenc, market, trade, stock, casino, coin, finance
3   "tv"                      tv, channel, countri, live, watch, germani, nation, bbc, newspap
4   "music"                   music, song, radio, play, player, listen
5   "holidays and religion"   christmas, halloween, santa, year, holiday, islam, god
6   "navigation and travel"   map, inform, track, gps, navig, travel
7   "language"                language, word, english, learn, german, translat
8   "share"                   email, ad, support, facebook, share, twitter, rate, suggest
9   "weather and stars"       weather, forecast, locate, temperatur, map, city, light
10  "files and video"         file, download, video, media, support, manage, share, view, search
11  "photo and social"        photo, friend, facebook, share, love, twitter, pictur, chat, messag, galleri, hot, send, social
12  "cars"                    car, race, speed, drive, vehicl, bike, track
13  "design and art"          life, peopl, natur, form, feel, learn, art, design, uniqu, effect, modern
14  "food and recipes"        recip, cake, chicken, cook, food
15  "personalize"             theme, launcher, download, install, icon, menu
16  "health"                  weight, bodi, exercise, diet, workout, medic
17  "travel"                  citi, guid, map, travel, flag, countri, attract
18  "kids and bodies"         kid, anim, color, girl, babi, pictur, fun, draw, design, learn
19  "ringtones and sound"     sound, rington, alarm, notif, music
20  "game"                    game, plai, graphic, fun, jump, level, ball, 3d, score
21  "search and browse"       search, icon, delet, bookmark, link, homepag, shortcut, browser
22  "battle games"            story, game, monster, zombi, war, battle
23  "settings and utils"      screen, set, widget, phone, batteri
24  "sports"                  team, football, leagu, player, sport, basketbal
25  "wallpapers"              wallpap, live, home, screen, background, menu
26  "connection"              device, connect, network, wifi, bluetooth, internet, remot, server
27  "policies and ads"        live, ad, home, applovin, notif, data, polici, privacy, share, airpush, advertis
28  "popular media"           seri, video, film, album, movi, music, award, star, fan, show, gangnam, top, bieber
29  "puzzle and card games"   game, plai, level, puzzl, player, score, challeng, card
Table 3: Clusters of applications. "Size" is the number of applications in the respective cluster. "Most Important Topics" lists the three most prevalent topics; the most important (> 10%) are shown with their percentage. Topics below 1% are not listed.

Id  Assigned Name             Size    Most Important Topics
1   "sharing"                 1,453   share (53%), settings and utils, navigation and travel
2   "puzzle and card games"   953     puzzle and card games (78%), share, game
3   "memory puzzles"          1,069   puzzle and card games (40%), game (12%), share
4   "music"                   714     music (58%), share, settings and utils
5   "music videos"            773     popular media (44%), holidays and religion (20%), share
6   "religious wallpapers"    367     holidays and religion (56%), design and art, wallpapers
7   "language"                602     language (67%), share, settings and utils
8   "cheat sheets"            785     game and cheat sheets (76%), share, popular media
9   "utils"                   1,300   settings and utils (62%), share, connection
10  "sports game"             1,306   game (63%), battle games, puzzle and card games
11  "battle games"            953     battle games (60%), game (11%), design and art
12  "navigation and travel"   1,273   navigation and travel (64%), share, travel
13  "money"                   589     money (57%), puzzle and card games, settings and utils
14  "kids"                    1,001   kids and bodies (62%), share, puzzle and card games
15  "personalize"             304     personalize (71%), wallpapers (15%), settings and utils
16  "connection"              823     connection (63%), settings and utils, share
17  "health"                  669     health (63%), design and art, share
18  "weather"                 282     weather and stars (61%), settings and utils (11%), navigation and travel
19  "sports"                  580     sports (62%), share, popular media
20  "files and videos"        679     files and videos (63%), share, settings and utils
21  "search and browse"       363     search and browse (64%), game, puzzle and card games
22  "advertisements"          380     policies and ads (97%)
23  "design and art"          978     design and art (48%), share, game
24  "car games"               449     cars (51%), game, puzzle and card games
25  "tv live"                 500     tv (57%), share, navigation and travel
26  "adult photo"             828     photo and social (59%), share, settings and utils
27  "adult wallpapers"        543     wallpapers (51%), share, kids
Implementation
• App collection: CHABADA collects 22,500+ Android applications from the Google Play Store
• Identify topics: CHABADA applies Latent Dirichlet Allocation (LDA) to the app descriptions to define 30 topics (e.g. music, money…)
Implementation
• Cluster apps: CHABADA groups apps into 32 clusters using the K-means algorithm on their related topics
• Used APIs: in each cluster, CHABADA identifies the sensitive APIs each app statically accesses
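The clustering step can be sketched with a plain K-means (Lloyd's algorithm). This is a minimal illustration, assuming each app is represented by its vector of topic probabilities; the app names, the three-topic vectors, and the helper names are hypothetical, and the slides do not say which K-means implementation was used.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, assignment list)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each app goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical topic probabilities per app: [music, navigation, game]
apps = {
    "radio_player": [0.90, 0.05, 0.05],
    "song_library": [0.80, 0.10, 0.10],
    "gps_tracker":  [0.05, 0.90, 0.05],
    "city_maps":    [0.10, 0.85, 0.05],
    "card_puzzle":  [0.05, 0.05, 0.90],
    "jump_ball":    [0.10, 0.10, 0.80],
}
names = list(apps)
_, labels = kmeans([apps[n] for n in names], k=3)
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(sorted(clusters.values()))
```

Clustering on topic vectors rather than on store categories means an app lands next to apps that describe themselves similarly, which is what makes "unusual API usage for this kind of app" a meaningful question.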
Implementation
• Outliers: CHABADA identifies outliers in the API usage within each cluster using unsupervised one-class SVM anomaly classification
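The idea of ranking apps by how unusual their API usage is within a cluster can be shown with a much simpler stand-in. Note that the sketch below is not a one-class SVM: it swaps in a frequency-based deviation score purely for illustration. The cluster contents and API names are hypothetical.

```python
def api_frequencies(cluster):
    """Fraction of apps in the cluster that use each API."""
    apis = sorted({api for used in cluster.values() for api in used})
    n = len(cluster)
    return {api: sum(api in used for used in cluster.values()) / n for api in apis}

def outlier_score(used, freq):
    """Higher when an app uses APIs that are rare in its cluster
    or omits APIs that are common in it."""
    return sum((1 - f) if api in used else f for api, f in freq.items())

def rank_outliers(cluster, top=5):
    """Return the names of the `top` most unusual apps, most unusual first."""
    freq = api_frequencies(cluster)
    scores = {app: outlier_score(used, freq) for app, used in cluster.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical "weather" cluster: app name -> sensitive APIs it accesses.
weather_cluster = {
    "forecast_now": {"INTERNET", "LOCATION"},
    "city_weather": {"INTERNET", "LOCATION"},
    "rain_radar":   {"INTERNET", "LOCATION"},
    "sunny_days":   {"INTERNET"},
    "storm_alerts": {"INTERNET", "LOCATION", "SEND_SMS", "READ_CONTACTS"},
}
print(rank_outliers(weather_cluster, top=2))
```

An actual one-class SVM learns a boundary around the "normal" API usage of a cluster and can capture combinations of features; the score above only looks at each API independently.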
Evaluation
• Outlier detection
RQ1: Can our technique effectively identify anomalies (i.e., mismatches between description and behavior) in Android applications?
• Malware detection
RQ2: Can our technique be used to identify malicious Android applications?
Evaluation – Outlier Detection
• Experiment data
- 32 clusters with an entire set of 22,521 applications
• Experiment - CHABADA
- Partitioning and training: 9 subsets for training and 1 subset for testing; run 10 times
- Manual assessment: 3 categories (malicious, dubious, benign)
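The partitioning scheme (9 subsets for training, 1 for testing, repeated 10 times) can be sketched as follows. This assumes a plain random split over all apps; whether the split is done per cluster is not stated on the slide, and the app identifiers are placeholders.

```python
import random

def ten_fold_splits(apps, folds=10, seed=0):
    """Yield (training, testing) lists; each subset is the test set exactly once."""
    shuffled = list(apps)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled apps into `folds` subsets of near-equal size.
    subsets = [shuffled[i::folds] for i in range(folds)]
    for i in range(folds):
        testing = subsets[i]
        training = [app for j, s in enumerate(subsets) if j != i for app in s]
        yield training, testing

apps = [f"app_{n}" for n in range(22521)]  # placeholder identifiers only
splits = list(ten_fold_splits(apps))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Because every app appears in exactly one test subset, each app is scored once by a model that never saw it during training.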
Evaluation – Outlier Detection
• Result:
- Top 5 outliers are identified from each cluster; 160 outliers out of 22,521 applications
- Top outliers, as produced by CHABADA, contain 26% malware; additional 13% dubious apps
- 39% of the top 5 outliers require additional scrutiny by app store managers or end users
Evaluation – Malware Detection
• Experiment data
- Original set of “benign” apps (2,238) and reduced set of “malicious” apps (172)
• Experiment:
- Run OC-SVM as a classifier that decides whether an element would be part of the same distribution or not
• Comparison:
- Classification using topic clusters
- Classification without clustering
- Classification using given categories
Evaluation – Malware Detection
• Classification using topic clusters
                  Predicted as malicious   Predicted as benign
Malicious apps    96.5 (56%)               75.5 (44%)
Benign apps       353.9 (16%)              1,884.4 (84%)
Table 1: Checking APIs and descriptions within topic clusters (CHABADA)
In our sample, even without knowing existing malware patterns, CHABADA detects the majority of malware as such.
Evaluation – Malware Detection
• Classification without clustering
                  Predicted as malicious   Predicted as benign
Malicious apps    41 (24%)                 131 (76%)
Benign apps       334.9 (15%)              1,903.1 (85%)
Table 2: Checking APIs and descriptions in one single cluster
Classifying without clustering yields more false negatives.
Evaluation – Malware Detection
• Classification using given categories
                  Predicted as malicious   Predicted as benign
Malicious apps    81.6 (47%)               90.4 (53%)
Benign apps       356.9 (16%)              1,881.1 (84%)
Table 3: Checking APIs and descriptions within Google Play Store categories
Clustering by description topics is superior to clustering by given categories
Classification using topic clusters
                  Predicted as malicious   Predicted as benign
Malicious apps    96.5 (56%)               75.5 (44%)
Benign apps       353.9 (16%)              1,884.4 (84%)

Classification without clustering
                  Predicted as malicious   Predicted as benign
Malicious apps    41 (24%)                 131 (76%)
Benign apps       334.9 (15%)              1,903.1 (85%)

Classification using given categories
                  Predicted as malicious   Predicted as benign
Malicious apps    81.6 (47%)               90.4 (53%)
Benign apps       356.9 (16%)              1,881.1 (84%)
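The percentages in the three tables can be recomputed from the raw counts, which makes the comparison explicit: the detection rate changes a lot between configurations while the false positive rate barely moves. The counts below are copied from the tables; the fractional values are presumably averages over the ten runs (an assumption, the slides do not say so), and the function and variable names are illustrative.

```python
# Counts per configuration: (malicious flagged, malicious missed,
#                            benign flagged, benign passed).
RESULTS = {
    "topic clusters":     (96.5, 75.5, 353.9, 1884.4),
    "without clustering": (41.0, 131.0, 334.9, 1903.1),
    "given categories":   (81.6, 90.4, 356.9, 1881.1),
}

def rates(tp, fn, fp, tn):
    detection_rate = tp / (tp + fn)       # share of malware flagged
    false_positive_rate = fp / (fp + tn)  # share of benign apps flagged
    return detection_rate, false_positive_rate

summary = {name: rates(*counts) for name, counts in RESULTS.items()}
for name, (detected, false_alarm) in summary.items():
    print(f"{name:20s} detection {detected:.0%}  false positives {false_alarm:.0%}")
```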
Discussion
1. LDA is definitely not the best way to do this.
LDA concentrates most of a topic's probability mass on very few words, so there is a high chance that a word representing malicious behavior will be left out.
Discussion
2. Problem with source of data
Discussion
3. Anomalous behavior definition not robust
Sending text messages to alert users of bad weather might be legitimate behavior even though it is not mentioned in the description.
Discussion
4. Sensitivity to hyper-params
Discussion
5. What next? How do you plan to let the consumers know?
Thank you!