Twitter Content-based Spam Filtering - CISIS 2013
Igor Santos Igor Miñambres-Marcos Carlos Laorden Patxi Galán-García Aitor Santamaría-Ibirika Pablo G. Bringas
Detecting spammer accounts
Content-based analysis
[Figure: spam tweets (TweetSpike) and ham tweets (legitimate accounts), messages m1 to m11, are used to build the models; testing tweets t1, t2 and t3 are each assigned a probability of being legitimate or spam]
Dynamic Markov Chain (DMC)
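DMC (usually expanded as Dynamic Markov Compression, after Cormack and Horspool) predicts one bit at a time from a finite-state model that grows by cloning heavily used states, so it gradually learns longer contexts. The following is a minimal, hypothetical Python sketch of DMC used as a classifier: one model is trained per class and a tweet is assigned to the class whose model encodes it in the fewest bits. The class and function names, the thresholds `min_edge` and `min_rest`, the pseudo-count `prior`, the toy training strings, and the reading of "adaptation" as "the model keeps updating while it scores the test tweet" are all assumptions, not the authors' WEKA implementation.

```python
import copy
import math


class DMCModel:
    """Bit-level Dynamic Markov Compression model with state cloning (sketch)."""

    def __init__(self, min_edge=2.0, min_rest=2.0, prior=0.2):
        self.min_edge = min_edge  # visits on an edge before its target may be cloned
        self.min_rest = min_rest  # visits the target must also receive via other edges
        self.prior = prior        # pseudo-count added to both bit counts
        # Initial model: a binary tree over the 8 bits of a byte (states 1..255);
        # the last bit of every byte returns to the root, i.e. an order-0 byte model.
        self.counts = [[0.0, 0.0] for _ in range(256)]
        self.next = [[1, 1] for _ in range(256)]
        for j in range(1, 128):
            self.next[j] = [2 * j, 2 * j + 1]
        self.state = 1

    def p_one(self):
        """Probability that the next bit is 1 in the current state."""
        n0, n1 = self.counts[self.state]
        return (n1 + self.prior) / (n0 + n1 + 2 * self.prior)

    def update(self, bit):
        """Follow the edge for `bit`, cloning its target state when warranted."""
        cur = self.state
        nxt = self.next[cur][bit]
        edge = self.counts[cur][bit]
        total = self.counts[nxt][0] + self.counts[nxt][1]  # approx. visits into nxt
        if edge >= self.min_edge and total - edge >= self.min_rest:
            ratio = edge / total
            clone = len(self.counts)
            # The clone takes the share of nxt's counts that arrived via this edge.
            self.counts.append([c * ratio for c in self.counts[nxt]])
            self.counts[nxt] = [c * (1 - ratio) for c in self.counts[nxt]]
            self.next.append(list(self.next[nxt]))
            self.next[cur][bit] = clone
            nxt = clone
        self.counts[cur][bit] += 1
        self.state = nxt

    def _bits(self, text):
        # Most significant bit first, byte by byte.
        for byte in text.encode("utf-8"):
            for i in range(7, -1, -1):
                yield (byte >> i) & 1

    def train(self, text):
        self.state = 1
        for bit in self._bits(text):
            self.update(bit)

    def code_length(self, text, adapt=False):
        """Bits needed to encode text; adapt=True keeps updating (and cloning)."""
        self.state = 1
        bits = 0.0
        for bit in self._bits(text):
            p1 = self.p_one()
            bits -= math.log2(p1 if bit else 1.0 - p1)
            if adapt:
                self.update(bit)
            else:
                self.state = self.next[self.state][bit]
        return bits


def classify(text, models, adapt=False):
    """Pick the label whose model encodes the text in the fewest bits."""
    best_label, best_bits = None, math.inf
    for label, model in models.items():
        # Adaptive scoring changes the model, so score with a throwaway copy.
        scorer = copy.deepcopy(model) if adapt else model
        bits = scorer.code_length(text, adapt=adapt)
        if bits < best_bits:
            best_label, best_bits = label, bits
    return best_label


# Toy training strings, for illustration only.
spam_model, ham_model = DMCModel(), DMCModel()
spam_model.train("win free followers now! click here for free followers")
ham_model.train("meeting the team at the office today, then a walk in the park")
models = {"spam": spam_model, "ham": ham_model}
print(classify("free followers, click now!", models))
print(classify("a walk in the park today", models, adapt=True))
```

Practical DMC coders usually cap the number of states and reset the model when the cap is reached; this sketch lets the state list grow without bound, which is only reasonable for short texts.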
Prediction by Partial Match (PPM)
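PPM predicts each byte from the longest previously seen context of up to `order` bytes and escapes to shorter contexts when the byte has not yet been seen in the current one. A minimal, hypothetical Python sketch follows, assuming escape method C without exclusions, a maximum order of 3, and classification by picking the class model that encodes the tweet in the fewest bits. The names `PPMModel` and `classify`, the toy training strings, and the reading of "adaptation" as "counts keep being updated while the test tweet is scored" are assumptions rather than the authors' implementation.

```python
import copy
import math


class PPMModel:
    """Byte-level PPM model using escape method C, without exclusions (sketch)."""

    def __init__(self, order=3):
        self.order = order
        self.counts = {}  # context (bytes) -> {symbol (int): count}

    def _update(self, history, symbol):
        # Count the symbol in every suffix of the history, up to self.order bytes.
        for k in range(min(self.order, len(history)) + 1):
            table = self.counts.setdefault(history[len(history) - k:], {})
            table[symbol] = table.get(symbol, 0) + 1

    def _prob(self, history, symbol):
        # Start at the longest context and escape to shorter ones when needed.
        p = 1.0
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                continue  # context never seen: drop to a shorter one at no cost
            total = sum(table.values())
            distinct = len(table)
            if symbol in table:
                return p * table[symbol] / (total + distinct)
            p *= distinct / (total + distinct)  # escape probability, method C
        return p / 256  # order -1: uniform over all byte values

    def train(self, text):
        data = text.encode("utf-8")
        for i, symbol in enumerate(data):
            self._update(data[max(0, i - self.order):i], symbol)

    def code_length(self, text, adapt=False):
        """Bits needed to encode text; adapt=True updates counts while scoring."""
        data = text.encode("utf-8")
        bits = 0.0
        for i, symbol in enumerate(data):
            history = data[max(0, i - self.order):i]
            bits -= math.log2(self._prob(history, symbol))
            if adapt:
                self._update(history, symbol)
        return bits


def classify(text, models, adapt=False):
    """Pick the label whose model encodes the text in the fewest bits."""
    best_label, best_bits = None, math.inf
    for label, model in models.items():
        # Adaptive scoring changes the counts, so score with a throwaway copy.
        scorer = copy.deepcopy(model) if adapt else model
        bits = scorer.code_length(text, adapt=adapt)
        if bits < best_bits:
            best_label, best_bits = label, bits
    return best_label


# Toy training strings, for illustration only.
spam_model, ham_model = PPMModel(), PPMModel()
spam_model.train("win free followers now! click here for free followers")
ham_model.train("meeting the team at the office today, then a walk in the park")
models = {"spam": spam_model, "ham": ham_model}
print(classify("free followers, click now!", models))
print(classify("a walk in the park today", models, adapt=True))
```

Method C sets the escape probability in a context to d/(n + d), where n is the number of symbols counted there and d the number of distinct ones. Full PPM coders also apply exclusions, which this sketch omits, so its probabilities can sum to slightly less than one; that wastes a few bits but does not invalidate the comparison between class models.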
Classifier                              Acc. (%)  Sp    Sr    F-Measure  AUC
Random Forest N=50                      96.42     0.98  0.94  0.96       0.99
DMC without Adaptation                  95.99     0.96  0.95  0.96       0.99
Random Forest N=10                      95.96     0.97  0.94  0.95       0.99
PPM without Adaptation                  94.80     0.97  0.91  0.94       0.99
Naive Bayes Multinomial Word Frequency  94.94     0.95  0.93  0.94       0.98
Bayes K2                                94.12     0.99  0.88  0.93       0.98
DMC with Adaptation                     93.11     0.94  0.90  0.92       0.98
C4.5                                    95.79     0.98  0.92  0.95       0.97
KNN K=3                                 93.71     0.97  0.89  0.93       0.97
SVM PVK                                 95.81     0.97  0.93  0.95       0.96
PPM with Adaptation                     76.50     0.78  0.69  0.72       0.86
Naive Bayes                             72.72     0.64  0.89  0.75       0.76
Acc. = accuracy; Sp = spam precision; Sr = spam recall; AUC = area under the ROC curve.
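The table columns can be computed from a classifier's labels and spam scores roughly as follows. This is a minimal sketch, assuming spam is the positive class, Sp = TP/(TP+FP), Sr = TP/(TP+FN), F-Measure is the harmonic mean of Sp and Sr, and AUC is computed as the fraction of (spam, ham) pairs in which the spam tweet gets the higher score (equivalent to the area under the ROC curve). The function name `evaluate`, the 0.5 threshold, and the six labels and scores in the usage line are made-up toy values, not results from this work.

```python
def evaluate(y_true, scores, threshold=0.5):
    """Return accuracy (%), spam precision, spam recall, F-measure and AUC.

    y_true holds 1 for spam and 0 for ham; scores are spam probabilities.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    acc = 100.0 * (tp + tn) / len(y_true)
    sp = tp / (tp + fp) if tp + fp else 0.0
    sr = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * sp * sr / (sp + sr) if sp + sr else 0.0
    # AUC as the share of (spam, ham) pairs ranked correctly; ties count half.
    spam = [s for s, t in zip(scores, y_true) if t == 1]
    ham = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in spam for b in ham)
    auc = wins / (len(spam) * len(ham))
    return acc, sp, sr, f, auc


acc, sp, sr, f, auc = evaluate([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.4, 0.2, 0.1])
print(f"Acc. {acc:.2f}  Sp {sp:.2f}  Sr {sr:.2f}  F {f:.2f}  AUC {auc:.2f}")
# → Acc. 83.33  Sp 1.00  Sr 0.67  F 0.80  AUC 0.89
```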
A new, public dataset of Twitter spam to serve for evaluation
Adaptation of content-based spam filtering to Twitter
A new compression-based text filtering library for the ML tool WEKA
Enhance this approach using social network features
Add semantic capabilities by studying the linguistic relationships