Spamato

Page 1: Spamato

SP.a.M/\TØ

Spamato: An Extendable Spam Filter System

by Keno Albrecht, Nicolas Burri, Roger Wattenhofer

Page 2: Spamato

Spamato - Keno Albrecht - Second Conference on Email and Anti-Spam - July 21 & 22, 2005

Motivation

• Countless number of different spam filters
  – Google: 1,740,000 hits (not spam filters)
  – Freshmeat/Sourceforge: 404/420 projects
  – Several "once-only" research projects

• Client-side filtering (vs. server-side)
  – Email Client Add-On: Outlook (Express), …
  – Proxy: Mediator between Client and Server
  – Stand-alone: Proprietary “email clients”

Page 3: Spamato

Project Goal

• Build an extendable spam filter system to…
  – ease the development of filters; provide a filter container
  – help implement tools for common tasks
  – support as many email clients as possible

• Encourage filter developers to use our framework

Page 4: Spamato

Subject: Free Spam Filter System
To: [email protected]
From: [email protected]

Dear Spam Filter Developer,

This is your once-in-a-lifetime opportunity to use the free spam filter system Spamato. Spamato aims to bring a practical, easy-to-use, and effective spam filter technology to the user’s desktop. It has been designed to be used primarily as an add-on for several email clients. The combination of multiple filtering techniques leads to a high spam detection rate and a low false-positive rate. It offers a variety of features that simplify your life as a spam filter developer.

Do not reinvent the wheel!
Write your filter in an instant!

Use Spamato!
Visit our homepage at http://www.spamato.net. To unsubscribe click here.

The Spamato-Team

Page 5: Spamato

Architecture

Java
• platform independent

Depending on Add-on:
• Visual Basic
• JavaScript
• …

Page 6: Spamato

Filtering Process

Emails are processed in five phases:

(1) Initialization

(2) Pre-Check

(3) Check

(4) Decision

(5) Post-Check
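
The five phases can be read as a small pipeline. The following is a minimal Python sketch of that flow, not Spamato's actual (Java) API: the `Filter` class, its method names, the fixed scores, and the averaging `global_decision` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    """Hypothetical filter with one hook per phase (names are assumptions)."""
    name: str
    spam_probability: float = 0.0   # fixed score, stands in for real logic
    vetoes: bool = False
    seen: list = field(default_factory=list)

    def pre_check(self, msg: str) -> bool:
        return self.vetoes           # True = veto further processing

    def check(self, msg: str) -> float:
        return self.spam_probability

    def post_check(self, msg: str, is_spam: bool) -> None:
        self.seen.append((msg, is_spam))   # e.g. learn, collect statistics


def global_decision(scores):
    """Assumed decision rule: a mean score above 0.5 counts as spam."""
    if not scores:
        return False
    return sum(scores) / len(scores) > 0.5


def process(msg, filters):
    # (1) Initialization: the email client has handed msg to the base.
    # (2) Pre-Check: every filter is asked; a single veto stops processing.
    vetoes = [f.pre_check(msg) for f in filters]
    if any(vetoes):
        return None                  # message is ignored
    # (3) Check: every filter rates the message.
    scores = [f.check(msg) for f in filters]
    # (4) Decision: combine the individual results.
    is_spam = global_decision(scores)
    # (5) Post-Check: report the global decision back to the filters.
    for f in filters:
        f.post_check(msg, is_spam)
    return is_spam
```

In this sketch `process` returns `None` for a vetoed message so that the caller can tell "ignored" apart from "not spam".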

Page 7: Spamato

Filtering Process

(1) Initialization

• Email client receives email, forwards it to Spamato, and waits for check result.

[Figure: the Spamato Base hands msg to Filter 1, Filter 2, …, Filter N in each phase and returns isSpam(msg)]

Pre-Check: Filter 1 … Filter N each run PreCheck(msg) and return veto1(msg), veto2(msg), …, vetoN(msg).
Checkpoint PreCheck: veto(msg) = veto1(msg) || veto2(msg) || … || vetoN(msg); if veto(msg) == true, ignore this msg.
Check: Filter 1 … Filter N each run Check(msg) and return isSpam1(msg), isSpam2(msg), …, isSpamN(msg).
Decision: isSpam(msg) = globalDecision(isSpam1(msg), isSpam2(msg), …, isSpamN(msg))
Post Check: isSpam(msg) is passed on to Filter 1, Filter 2, …, Filter N.

Page 8: Spamato

Filtering Process

(2) Pre-Check

• Veto against further processing (Configuration, Sender-whitelist)
• Gain information for other plugins (URL extractor)
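
As a sketch of the two Pre-Check uses named on this slide (vetoing and gathering information), assume a message is a plain dict with `from` and `body` keys. That representation, the function names, and the whitelist entry are hypothetical; Spamato's own message type is not shown in the slides.

```python
import re

WHITELIST = {"alice@example.org"}          # hypothetical trusted senders
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def whitelist_pre_check(msg: dict) -> bool:
    """Veto (return True) when the sender is whitelisted."""
    return msg["from"].lower() in WHITELIST


def url_extractor_pre_check(msg: dict) -> bool:
    """Never vetoes; only stores extracted URLs for other plugins."""
    msg["urls"] = URL_PATTERN.findall(msg["body"])
    return False


def veto(msg: dict, pre_checks) -> bool:
    # veto(msg) = veto1(msg) || veto2(msg) || ... || vetoN(msg)
    # All pre-checks run, so information-gathering plugins are not skipped.
    results = [pre_check(msg) for pre_check in pre_checks]
    return any(results)
```

Running every pre-check before combining the results is a design choice of this sketch: a short-circuiting OR would skip the URL extractor whenever an earlier plugin vetoes.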


Page 9: Spamato

Filtering Process

(3) Check

• Each filter calculates the spam probability


Page 10: Spamato

Filtering Process

(4) Decision

• The overall spam probability is calculated and returned to the email client
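
The slides do not define `globalDecision`. Purely as an illustration, two simple candidates are sketched here: a mean of the per-filter probabilities and a majority vote, each compared against an assumed threshold of 0.5. Neither is claimed to be the rule Spamato uses.

```python
def mean_decision(probabilities, threshold=0.5):
    """Spam if the average per-filter probability exceeds the threshold."""
    return sum(probabilities) / len(probabilities) > threshold


def majority_decision(probabilities, threshold=0.5):
    """Spam if more than half of the filters individually say spam."""
    votes = sum(1 for p in probabilities if p > threshold)
    return votes > len(probabilities) / 2
```

The two rules can disagree: for the scores 0.95, 0.4 and 0.45, one very confident filter pulls the mean above the threshold, while only one of three filters votes for spam.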


Page 11: Spamato

Filtering Process

(5) Post-Check

• Learn from global decision
• Collect statistics
• Play sound
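
A post-check hook might look like the following sketch, which only collects statistics. The class and method names are hypothetical; a learning filter would update its model in the same hook.

```python
from collections import Counter


class StatisticsCollector:
    """Hypothetical post-check plugin that counts global decisions."""

    def __init__(self):
        self.counts = Counter()

    def post_check(self, msg: str, is_spam: bool) -> None:
        # Called once per message with the global decision.
        self.counts["spam" if is_spam else "ham"] += 1

    def spam_ratio(self) -> float:
        """Share of processed messages that were classified as spam."""
        total = sum(self.counts.values())
        return self.counts["spam"] / total if total else 0.0
```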


Page 12: Spamato

Filters

• Bayesianato: Naïve Bayesian-based filter

• Ruleminator: Rule-based filter

• Razor(Ephemeral): Hash-based filter
  » Vipul’s Razor: http://razor.sourceforge.net

• URL-based filters:
  – Domainator: Search engine (“Google”) filter
  – Earlgrey: Our collaborative multi-domain filter
  – Razor(Whiplash): Collaborative single-domain filter
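
To make the naïve Bayesian entry in this list concrete, here is a toy scorer with Laplace smoothing. It is a generic textbook sketch, not Bayesianato's implementation; the whitespace tokenizer, the training data format, and the equal class priors are assumptions.

```python
import math
from collections import Counter


def train(messages):
    """Count token occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
    return counts


def spam_probability(text, counts):
    """P(spam | text) under naive Bayes with equal class priors."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_total = sum(counts[True].values())
    ham_total = sum(counts[False].values())
    log_odds = 0.0
    for token in text.lower().split():
        # Laplace (add-one) smoothing avoids zero probabilities.
        p_spam = (counts[True][token] + 1) / (spam_total + vocab_size)
        p_ham = (counts[False][token] + 1) / (ham_total + vocab_size)
        log_odds += math.log(p_spam / p_ham)
    # Convert the summed log-odds back into a probability.
    return 1 / (1 + math.exp(-log_odds))
```

A text without any tokens yields 0.5 here, because no evidence shifts the equal priors.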

Page 13: Spamato

URL/URI/Domain Filtering

• About 70,000 spam emails investigated
  – ~76% with at least one domain, of which…
    • ~20% with more than one distinct domain
    • ~2% with ten or more distinct domains

• Spammers obfuscate their messages for the (sole) purpose of misleading URL filters!

• How to handle “fake” (including ham) domains? How to find “spam” domains?
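
Counting the distinct domains in a message, as in the figures above, can be sketched as follows. The regular expression and the choice to treat the full host name as the "domain" are simplifying assumptions; the slides do not say how the authors extracted domains.

```python
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def distinct_domains(body: str) -> set:
    """Return the set of lower-cased host names of all URLs in body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        host = urlsplit(url).hostname   # None if the URL has no host
        if host:
            hosts.add(host)
    return hosts
```

A URL filter that only looks up the first domain it finds would be easy to mislead with "fake" domains, which is why the sketch returns the whole set.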

Page 14: Spamato

URL-Filters in Comparison

         D       E       R/W     NOT     ONLY
D        -       26.5%   1.1%    27.3%   0.6%
E        11.7%   -       2.5%    42.1%   2.0%
R/W      25.2%   41.4%   -       3.1%    15.6%

(D: Domainator, E: Earlgrey, R/W: Razor/Whiplash)

26.5% (1.1%) of all spam messages were identified by the Domainator but not by the Earlgrey (Razor/Whiplash) filter. 27.3% of all spam messages were not identified by the Domainator, and 0.6% of all spam messages were identified solely by it.
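
As a consistency check on the table, each filter's detection rate is 100% minus its NOT value, and for any two filters A and B the difference between "A but not B" and "B but not A" must equal the difference of their detection rates. The short script below, with the table values typed in by hand and an assumed rounding tolerance, checks this.

```python
# Percentages copied from the table; keys are (row, column).
NOT = {"D": 27.3, "E": 42.1, "R/W": 3.1}
BUT_NOT = {
    ("D", "E"): 26.5, ("D", "R/W"): 1.1,
    ("E", "D"): 11.7, ("E", "R/W"): 2.5,
    ("R/W", "D"): 25.2, ("R/W", "E"): 41.4,
}

# Detection rate of each filter: everything it did not miss.
detected = {name: round(100 - missed, 1) for name, missed in NOT.items()}


def consistent(a: str, b: str, tolerance: float = 0.15) -> bool:
    """Check (a but not b) - (b but not a) == detected[a] - detected[b]."""
    lhs = BUT_NOT[(a, b)] - BUT_NOT[(b, a)]
    rhs = detected[a] - detected[b]
    return abs(lhs - rhs) <= tolerance
```

The derived detection rates are 72.7% for the Domainator, 57.9% for Earlgrey, and 96.9% for Razor/Whiplash, and all three filter pairs pass the check within the tolerance.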

Page 15: Spamato

Conclusion & Future Work

• Spamato eases the implementation and deployment of spam filters and tools. It can be used with all email clients. It is open source.

• A multi-faceted (URL-) filtering approach is reasonable.

• TODO:
  – Integration of more filters and improved analysis tools
  – Decision module (dynamic weighting of filter results)
  – Trust system for collaborative filters

Page 16: Spamato

Thank you!

Questions? Comments?

(Un)[email protected]@spamato.net
http://www.spamato.net
http://sf.net/projects/spamato