Collective Classification for Spam Filtering - CISIS 2011

Carlos Laorden


Presentation of the paper "Collective Classification for Spam Filtering" at the CISIS 2011 international conference.

Transcript of Collective Classification for Spam Filtering - CISIS 2011

Page 1: Collective Classification for Spam Filtering - CISIS 2011

Carlos Laorden

Page 2

WHAT YOU GOT, THEN?

SPAM, EGG, SPAM, SPAM, BACON AND SPAM.

SPAM, SPAM, SPAM, BAKED BEANS AND SPAM.

ANYTHING WITHOUT SPAM?

I DON’T LIKE SPAM!!

UGH!

Page 3

Meet the real SPiced hAM

Page 4

Monty Python’s Flying Circus

Page 5

Something that repeats and repeats until it becomes annoying

Page 6

It is a real problem for Information Security

Page 7

Billions of daily losses in productivity

Page 8

Infected computers

Page 9

Stolen credentials

Page 10

Page 11

We must fight

Page 12

Anti-spam methods

Pre-sending

New protocols

Post-sending

Increase sending costs

Increase risks for spammers

E-mail sender

E-mail content

E-mail content

Page 13

Usually supervised approaches

Page 14

Significant labelling work is needed

Page 15

Usually supervised approaches

Link structure among documents

Page 16

Collective Classification
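
The idea built up over the previous slides (labelled e-mails are expensive, but documents are related to each other) can be illustrated with a toy example. The sketch below is not the method evaluated in the paper. It is a minimal nearest-neighbour illustration written for this transcript, in which unlabelled messages receive provisional labels and those provisional labels are allowed to influence the labels of other unlabelled messages. The names `bag_of_words`, `cosine` and `collective_knn`, the word-overlap similarity, and the parameters `k` and `rounds` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Turn a message into a sparse term-count vector (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def collective_knn(labelled, unlabelled, k=3, rounds=5):
    """Label `unlabelled` texts using labelled texts AND each other.

    labelled:   list of (text, label) pairs
    unlabelled: list of texts
    Returns one predicted label per unlabelled text.
    """
    known = [(bag_of_words(t), y) for t, y in labelled]
    pending = [bag_of_words(t) for t in unlabelled]
    guesses = [None] * len(pending)

    for _ in range(rounds):
        new_guesses = []
        for i, doc in enumerate(pending):
            # Candidate neighbours: every labelled message, plus every other
            # unlabelled message that got a provisional label last round.
            pool = known + [
                (other, guess)
                for j, (other, guess) in enumerate(zip(pending, guesses))
                if j != i and guess is not None
            ]
            nearest = sorted(pool, key=lambda p: cosine(doc, p[0]), reverse=True)[:k]
            votes = Counter(label for _, label in nearest)
            new_guesses.append(votes.most_common(1)[0][0])
        if new_guesses == guesses:  # labels stopped changing
            break
        guesses = new_guesses
    return guesses


if __name__ == "__main__":
    train = [("cheap pills buy now", "spam"), ("meeting agenda for monday", "ham")]
    inbox = ["buy cheap pills", "monday meeting notes", "pills now cheap offer"]
    print(collective_knn(train, inbox, k=1))
```

With only two labelled messages, the first round behaves like ordinary nearest-neighbour classification. From the second round on, each unlabelled message can also be pulled towards the provisional label of a similar unlabelled message, which is the "collective" part of the illustration. A toy loop like this is not guaranteed to converge, which is why the number of rounds is capped.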

Page 17

Empirical evaluation

Datasets: SpamAssassin, Ling Spam

[Figure: example text "this word has no interest"; diagram relating terms t1 to t3 and documents D1 to D11]

Page 18

[Figure: Precision (y-axis, 0.65 to 1.00) for Collective KNN (k=10), Collective Forest, Collective Woods and Random Woods; x-axis from 10% to 90%]

Page 19

[Figure: Recall (y-axis, 0.10 to 1.00) for Collective KNN (k=10), Collective Forest, Collective Woods and Random Woods; x-axis from 10% to 90%]

Page 20

[Figure: AUC (y-axis, 0.55 to 1.00) for Collective KNN (k=10), Collective Forest, Collective Woods and Random Woods; x-axis from 10% to 90%]
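
For readers unfamiliar with the three measures shown in the result plots (precision, recall and AUC), the following is a minimal sketch of how they can be computed for a spam filter. It uses the standard definitions and is not taken from the paper. The function names, the `positive="spam"` convention and the example labels and scores are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred, positive="spam"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores, positive="spam"):
    """Area under the ROC curve, computed as the fraction of (spam, ham)
    pairs in which the spam message gets the higher score; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one spam and one ham example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    truth = ["spam", "spam", "ham", "ham"]      # made-up ground truth
    predicted = ["spam", "ham", "spam", "ham"]  # made-up filter decisions
    spam_scores = [0.9, 0.4, 0.6, 0.1]          # made-up spam probabilities
    print(precision_recall(truth, predicted))
    print(auc(truth, spam_scores))
```

Precision penalises legitimate mail wrongly flagged as spam, recall penalises spam that slips through, and AUC summarises how well the filter's scores rank spam above legitimate mail regardless of the decision threshold.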

Page 21

Suitable for dealing with the large amount of unclassified spam e-mails

Page 22

Will we see the END of spam?

Page 23

95%

Page 24

“Solution to spam”

Cut their billing systems?

Page 25

Page 26

Page 27

Page 28
