Anomaly-based Spam Filtering - SECRYPT 2011

Carlos Laorden

Page 1: Anomaly-based Spam Filtering - SECRYPT 2011

Carlos Laorden

Page 2: Anomaly-based Spam Filtering - SECRYPT 2011

WHAT YOU GOT, THEN?

SPAM, EGG, SPAM, SPAM, BACON AND SPAM.

SPAM, SPAM, SPAM, BAKED BEANS AND SPAM.

ANYTHING WITHOUT SPAM?

I DON’T LIKE SPAM!!

UGH!

Page 3: Anomaly-based Spam Filtering - SECRYPT 2011

Meet the real SPiced hAM

Page 4: Anomaly-based Spam Filtering - SECRYPT 2011

Monty Python’s Flying Circus

Page 5: Anomaly-based Spam Filtering - SECRYPT 2011

Something that repeats and repeats until it becomes annoying

Page 6: Anomaly-based Spam Filtering - SECRYPT 2011

It is a real problem for Information Security

Page 7: Anomaly-based Spam Filtering - SECRYPT 2011

Billions in daily productivity losses

Page 8: Anomaly-based Spam Filtering - SECRYPT 2011

Infected computers

Page 9: Anomaly-based Spam Filtering - SECRYPT 2011

Stolen credentials

Page 10: Anomaly-based Spam Filtering - SECRYPT 2011
Page 11: Anomaly-based Spam Filtering - SECRYPT 2011

We must fight

Page 12: Anomaly-based Spam Filtering - SECRYPT 2011

Anti-spam methods

Pre-sending
  New protocols
  Increase sending costs
  Increase risks for spammers

Post-sending
  E-mail sender
  E-mail content

Page 13: Anomaly-based Spam Filtering - SECRYPT 2011

Usually supervised approaches

Page 14: Anomaly-based Spam Filtering - SECRYPT 2011

Significant labelling work is needed

Page 15: Anomaly-based Spam Filtering - SECRYPT 2011

Page 16: Anomaly-based Spam Filtering - SECRYPT 2011

But is this possible?

Page 17: Anomaly-based Spam Filtering - SECRYPT 2011

I mean, is this possible...

Page 18: Anomaly-based Spam Filtering - SECRYPT 2011
Page 19: Anomaly-based Spam Filtering - SECRYPT 2011

YES

Page 20: Anomaly-based Spam Filtering - SECRYPT 2011

Anomaly Detection

Page 21: Anomaly-based Spam Filtering - SECRYPT 2011

[Figure: e-mails from the SpamAssassin and Ling Spam corpora, with words of no interest set aside, represented as documents D1 to D11 in a vector space whose axes are the terms t1, t2 and t3]
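The vector-space picture on this slide (terms t1 to t3 as axes, e-mails D1 to D11 as points) can be sketched in a few lines. This is a minimal illustration that assumes plain term counts over a fixed vocabulary; the slides do not say which term weighting was used, and `vectorize` and the toy vocabulary are hypothetical names chosen for the example.

```python
from collections import Counter

def vectorize(text, vocabulary):
    """Map an e-mail body to a vector of term counts, one slot per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical vocabulary standing in for the axes t1, t2, t3.
vocabulary = ["spam", "egg", "bacon"]

d1 = vectorize("Spam egg spam spam bacon and spam", vocabulary)
d2 = vectorize("egg and bacon", vocabulary)

print(d1)  # [4, 1, 1]
print(d2)  # [0, 1, 1]
```

Words outside the vocabulary ("and" here) are simply ignored, which is one simple way to drop words that carry no interest.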

Page 22: Anomaly-based Spam Filtering - SECRYPT 2011

Anomaly detection: is d > threshold?

Page 23: Anomaly-based Spam Filtering - SECRYPT 2011

Manhattan distance

Euclidean distance
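For reference, both measures can be written directly from their standard definitions: Manhattan distance sums the absolute coordinate differences, and Euclidean distance is the square root of the summed squared differences. The function names and the example vectors below are illustrative, not taken from the slides.

```python
import math

def manhattan(a, b):
    """Sum of absolute differences between the coordinates of a and b."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(manhattan([4, 1, 1], [0, 1, 1]))  # 4
print(euclidean([3, 0], [0, 4]))        # 5.0
```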

Page 24: Anomaly-based Spam Filtering - SECRYPT 2011

Anomaly detection: which d?

Page 25: Anomaly-based Spam Filtering - SECRYPT 2011

Minimum distance

Maximum distance

Mean distance
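Since an e-mail is compared against many known e-mails, the individual distances have to be reduced to a single deviation value. A minimal sketch of the three combination rules named on this slide, assuming the distances have already been computed; `combine` is a hypothetical helper name.

```python
from statistics import mean

def combine(distances, rule):
    """Reduce a list of distances to one deviation value using the named rule."""
    rules = {"minimum": min, "maximum": max, "mean": mean}
    return rules[rule](distances)

# Hypothetical distances from one e-mail to three known e-mails.
distances = [2.0, 5.0, 11.0]

print(combine(distances, "minimum"))  # 2.0
print(combine(distances, "maximum"))  # 11.0
print(combine(distances, "mean"))     # 6.0
```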

Page 26: Anomaly-based Spam Filtering - SECRYPT 2011

Minimum distance
Maximum distance
Mean distance

Manhattan distance
Euclidean distance

Page 27: Anomaly-based Spam Filtering - SECRYPT 2011

10 different thresholds
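The slides do not say how the ten thresholds were chosen. One plausible way to produce such a set, shown purely as an assumption, is to space them evenly between the smallest and largest deviation observed; `candidate_thresholds` and its arguments are hypothetical.

```python
def candidate_thresholds(low, high, count=10):
    """Return `count` evenly spaced thresholds from low to high, both included."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]

# Assumed range of observed deviations, for illustration only.
print(candidate_thresholds(0.0, 9.0))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```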

Page 28: Anomaly-based Spam Filtering - SECRYPT 2011

Anomaly detection

d < threshold
d > threshold
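The decision itself is a single comparison of the deviation d against the threshold. The sketch below assumes that the reference set holds legitimate e-mails, so an e-mail lying farther away than the threshold is flagged as spam; the slide does not state which class is treated as normal. It also assumes Euclidean distance with the minimum rule, and `classify` and the toy vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(vector, reference_vectors, threshold):
    """Flag the e-mail as spam when its deviation from the reference set exceeds the threshold."""
    # Assumption: deviation is the minimum distance to any reference e-mail.
    deviation = min(euclidean(vector, ref) for ref in reference_vectors)
    return "spam" if deviation > threshold else "legitimate"

# Assumed reference set of legitimate e-mails as term-count vectors.
legitimate = [[0, 1, 1], [0, 2, 1]]

print(classify([4, 1, 1], legitimate, threshold=2.0))  # spam
print(classify([0, 1, 2], legitimate, threshold=2.0))  # legitimate
```

Raising the threshold flags fewer e-mails and lowering it flags more, which is why several thresholds are worth comparing.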

Page 29: Anomaly-based Spam Filtering - SECRYPT 2011
Page 30: Anomaly-based Spam Filtering - SECRYPT 2011

Minimum distance
Maximum distance
Mean distance

Manhattan distance
Euclidean distance

10 thresholds

Page 31: Anomaly-based Spam Filtering - SECRYPT 2011

Results

Page 32: Anomaly-based Spam Filtering - SECRYPT 2011

SpamAssassin

           Manhattan                     Euclidean
           Prec.    Rec.     F-Meas.     Prec.    Rec.     F-Meas.
Mean       91.03%   92.85%   91.93%      76.14%   97.77%   85.61%
Maximum    69.61%   99.89%   82.05%      72.99%   97.66%   83.54%
Minimum    95.40%   93.86%   94.62%      92.10%   94.00%   93.04%
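The F-measure column is consistent with the standard harmonic mean of precision and recall, F = 2PR / (P + R). A short check against two rows of the SpamAssassin table (Manhattan, mean and minimum rules); `f_measure` is an illustrative helper name.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    return 2 * precision * recall / (precision + recall)

# Manhattan distance, mean rule: Prec. 91.03%, Rec. 92.85%
print(round(f_measure(91.03, 92.85), 2))  # 91.93

# Manhattan distance, minimum rule: Prec. 95.40%, Rec. 93.86%
print(round(f_measure(95.40, 93.86), 2))  # 94.62
```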

Page 33: Anomaly-based Spam Filtering - SECRYPT 2011

Page 34: Anomaly-based Spam Filtering - SECRYPT 2011

Ling Spam

           Manhattan                     Euclidean
           Prec.    Rec.     F-Meas.     Prec.    Rec.     F-Meas.
Mean       79.18%   73.54%   76.26%      92.82%   91.58%   92.20%
Maximum    76.23%   74.29%   75.25%      85.95%   79.29%   82.49%
Minimum    65.82%   74.38%   69.84%      87.51%   93.13%   90.23%

Page 35: Anomaly-based Spam Filtering - SECRYPT 2011

Page 36: Anomaly-based Spam Filtering - SECRYPT 2011

Suitable for coping with the volume of unclassified spam e-mails

Page 37: Anomaly-based Spam Filtering - SECRYPT 2011
Page 38: Anomaly-based Spam Filtering - SECRYPT 2011
Page 39: Anomaly-based Spam Filtering - SECRYPT 2011

Will we see the END of spam?

Page 40: Anomaly-based Spam Filtering - SECRYPT 2011

95%

Page 41: Anomaly-based Spam Filtering - SECRYPT 2011

“Solution to spam”

Cut their billing systems?

Page 42: Anomaly-based Spam Filtering - SECRYPT 2011
Page 43: Anomaly-based Spam Filtering - SECRYPT 2011
Page 44: Anomaly-based Spam Filtering - SECRYPT 2011
Page 45: Anomaly-based Spam Filtering - SECRYPT 2011
