Evade Hard Multiple Classifier Systems

Battista Biggio, Giorgio Fumera, Fabio Roli
Pattern Recognition and Applications Group, Department of Electrical and Electronic Engineering, University of Cagliari, Italy
ECAI / SUEMA 2008, Patras, Greece, July 21st - 25th

Description

Multiple classifier systems are widely used in security applications like biometric personal authentication, spam filtering, and intrusion detection in computer networks. Several works experimentally showed their effectiveness in these tasks. However, their use in such applications is motivated only by intuitive and qualitative arguments. In this work we give a first possible formal explanation of why multiple classifier systems are harder to evade, and therefore more secure, than a system based on a single classifier. To this end, we exploit a theoretical framework recently proposed to model adversarial classification problems. A case study in spam filtering illustrates our theoretical findings.

Transcript of Evade Hard Multiple Classifier Systems

Page 1: Evade Hard Multiple Classifier Systems


Battista Biggio, Giorgio Fumera, Fabio Roli

Pattern Recognition and Applications Group
Department of Electrical and Electronic Engineering
University of Cagliari, Italy


ECAI / SUEMA 2008, Patras, Greece, July 21st - 25th

Evade Hard Multiple Classifier Systems

Page 2: Evade Hard Multiple Classifier Systems


About me

• Pattern Recognition and Applications Group
  http://prag.diee.unica.it
  – DIEE, University of Cagliari, Italy

• Contact
  – Battista Biggio, Ph.D. student

[email protected]

Page 3: Evade Hard Multiple Classifier Systems


Pattern Recognition and Applications Group

• Research interests
  – Methodological issues
    • Multiple classifier systems
    • Classification reliability
  – Main applications
    • Intrusion detection in computer networks
    • Multimedia document categorization, spam filtering
    • Biometric authentication (fingerprint, face)
    • Content-based image retrieval


Page 4: Evade Hard Multiple Classifier Systems


Why are we working on this topic?

• MCSs are widely used in security applications, but...
  – there is a lack of theoretical motivation
• Only a few theoretical works address machine learning for adversarial classification
• Goal of this (ongoing) work
  – To give some theoretical background to the use of MCSs in security applications

Page 5: Evade Hard Multiple Classifier Systems


Outline

• Introducing the problem
  – Adversarial classification
• A study on MCSs for adversarial classification
  – MCS hardening strategy: adding classifiers trained on different features
  – A case study in spam filtering: SpamAssassin

Page 6: Evade Hard Multiple Classifier Systems


Adversarial Classification

• Adversarial classification
  – An intelligent, adaptive adversary modifies patterns to defeat the classifier
    • e.g., spam filtering, intrusion detection systems (IDSs)
• Goals
  – How to design adversary-aware classifiers?
  – How to improve classifier hardness of evasion?

Dalvi et al., Adversarial Classification, 10th ACM SIGKDD Int. Conf., 2004

Page 7: Evade Hard Multiple Classifier Systems


Definitions

• Instance space: X = {X_1, ..., X_N}, where each X_i is a feature; instances x ∈ X (e.g., emails)
• Two-class problem:
  – positive/malicious patterns (+)
  – negative/innocent patterns (−)
• Classifier: c ∈ C, a concept class (e.g., linear classifiers), with decision function C : X → {+, −}
• Adversarial cost function: W : X × X → ℝ (e.g., more legible spam is better)

[Figure: the feature space (X_1, X_2) with an instance x and the two decision regions + and −.]

Dalvi et al., 2004
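As a minimal illustration of the definitions above (not part of the original slides), the setting can be sketched in Python: instances are feature vectors, the classifier is a map C : X → {+, −}, and the adversarial cost W measures how much a modification costs the adversary. The linear weights and the L1 cost below are hypothetical choices.

    # Minimal sketch of the Dalvi et al. setting (hypothetical numbers).
    # Instances x are feature vectors in X = {X_1, ..., X_N}.

    def C(x, w, b):
        """Linear classifier C : X -> {+, -}."""
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        return '+' if score >= 0 else '-'

    def W(x, x_prime):
        """Adversarial cost W : X x X -> R, here the L1 distance
        between the original and the modified instance."""
        return sum(abs(a - b) for a, b in zip(x, x_prime))

    x = [1.0, 0.0]          # original (malicious) instance
    x_prime = [0.2, 0.0]    # candidate camouflage
    print(C(x, w=[1.0, 1.0], b=-0.5), W(x, x_prime))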

Page 8: Evade Hard Multiple Classifier Systems


Adversarial cost function

• The cost is related to
  – the adversary's effort
    • e.g., using a different server for sending spam
  – the attack's effectiveness
    • more legible spam is better!

Example
• Original spam message: BUY VIAGRA!
  – Easily detected by the classifier
• Slightly modified spam message: BU-Y V1@GR4!
  – It can evade the classifier and still be effective
• No longer legible spam (ineffective message): B--Y V…!
  – It can evade several systems, but who will still buy viagra?

Page 9: Evade Hard Multiple Classifier Systems


A framework for adversarial classification

• Problem formulation
  – Two-player game: Classifier vs. Adversary
    • Utility and cost functions for each player
    • The Classifier chooses a decision function C(x) at each ply
    • The Adversary chooses a modification function A(x) to evade the classifier
• Assumptions in Dalvi et al., 2004
  – Perfect information
    • The Adversary knows the classifier's discriminant function C(x)
    • The Classifier knows the Adversary's strategy A(x) for modifying patterns
  – Actions
    • The Adversary can only modify malicious patterns at operation phase (the training process is untainted)

Dalvi et al., 2004

Page 10: Evade Hard Multiple Classifier Systems


In a nutshell

Classifier's task: choose a new decision function to minimise the expected risk.

Adversary's task: choose minimum-cost modifications to evade the classifier.

[Figure: the (+/−) decision regions before and after each player's move.]

Lowd & Meek, Adversarial Learning, 11th ACM SIGKDD Int. Conf., 2005

Page 11: Evade Hard Multiple Classifier Systems


Adversary's strategy

[Figure: the feature space (x1, x2) with the original spam instance x ("BUY VIAGRA!") in the region C(x) = +, its minimum-cost camouflages x', x'', x''' (e.g., "BUY VI@GRA!") lying just inside the region C(x) = −, and camouflages whose cost is too high (e.g., "B--Y V…!") further away.]
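A possible reading of the adversary's strategy above, as a rough Python sketch (not from the slides): among a set of candidate camouflages, pick the cheapest one that the classifier labels as negative, discarding those whose cost exceeds the maximum the adversary is willing to pay. The candidate set, classifier and cost function below are toy assumptions.

    # Sketch of the adversary's strategy: among candidate camouflages,
    # pick the minimum-cost one that is classified as negative.
    def best_camouflage(x, candidates, classify, cost, max_cost):
        evading = [xp for xp in candidates
                   if classify(xp) == '-' and cost(x, xp) <= max_cost]
        return min(evading, key=lambda xp: cost(x, xp)) if evading else None

    # Toy 1-D example: instances with feature >= 0.5 are classified '+'.
    classify = lambda xp: '+' if xp[0] >= 0.5 else '-'
    cost = lambda x, xp: abs(x[0] - xp[0])
    print(best_camouflage([1.0], [[0.6], [0.4], [0.1]], classify, cost, max_cost=0.7))
    # -> [0.4]: the cheapest modification that crosses into the region C(x) = '-'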

Page 12: Evade Hard Multiple Classifier Systems


Classifier’s strategy

• The Classifier knows A(x) [perfect information]
  – Adversary-aware classifier
• Dalvi et al. showed that an adversary-aware classifier can perform significantly better

[Figure: the adversary-aware classifier moves its decision boundary between C(x) = + and C(x) = −, so that the camouflage x' of one spam instance x is now detected, while another camouflage may still evade.]
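One hedged way to sketch the adversary-aware decision described above (an illustration, not Dalvi et al.'s exact algorithm): with perfect information about A(x), the classifier can also flag an apparently innocent x whenever x could have been produced by A from some malicious instance. All names and numbers below are hypothetical.

    # Sketch of an adversary-aware decision under perfect information.
    def adversary_aware_classify(x, base_classify, A, malicious_candidates):
        if base_classify(x) == '+':
            return '+'
        # Flag x if it could be the camouflage A(z) of a known malicious z.
        if any(A(z) == x for z in malicious_candidates):
            return '+'
        return '-'

    base = lambda x: '+' if x >= 0.5 else '-'
    A = lambda z: round(z - 0.6, 1)   # the adversary's known modification strategy
    print(adversary_aware_classify(0.4, base, A, malicious_candidates=[1.0]))  # '+'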

Page 13: Evade Hard Multiple Classifier Systems


Goals of this work

• Analysis of a widely used strategy for hardening MCSs
  – Using different sets of heterogeneous and redundant features [Giacinto et al. (2003), Perdisci et al. (2006)]
• Only heuristic and qualitative motivations have been given so far
• Using the described framework, we give a more formal explanation of the effectiveness of this strategy

Page 14: Evade Hard Multiple Classifier Systems


An example of the considered strategy

• Biometric verification system

[Figure: block diagram of a multimodal biometric verification system; face, voice and fingerprint matchers process the claimed identity, and a decision rule outputs genuine/impostor.]

Page 15: Evade Hard Multiple Classifier Systems


Another example of the considered strategy

• Spam filtering (SpamAssassin, http://spamassassin.apache.org)

[Figure: block diagram of a spam filter; black/white list, URL filter, signature filter, header analysis and content analysis modules feed a combiner Σ, which assigns the class legitimate/spam.]

Page 16: Evade Hard Multiple Classifier Systems


Applying the framework to the spam filtering case

• Cost for the Adversary

[Figure: SpamAssassin-like filter; each module (black/white list, signature filter, text classifier, header analysis, keyword filters, ...) outputs a score s_i, the scores are summed (s = Σ s_i), and the message is labelled spam if s ≥ 5, legitimate otherwise. For "BUY VIAGRA!" the scores are s1 = 0.2, s2 = 0, s3 = 0, s4 = 2.5, sN = 3, so s = 5.7 and the message is detected. For the camouflaged "BUY VI@GR4!" the keyword test no longer fires (sN = 0), so s = 2.7 < 5 and the message evades the filter.]

Working assumption: changing "VIAGRA" to "VI@GR4" costs 3!
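The decision rule sketched in the figure above can be written down directly. A small Python sketch using the slide's numbers (the adversary cost is the slide's working assumption, not a measured quantity):

    # SpamAssassin-style decision from the slide: each test i gives a
    # score s_i, the filter sums them and flags the message as spam
    # when s >= 5.
    threshold = 5.0

    scores_original   = [0.2, 0.0, 0.0, 2.5, 3.0]   # "BUY VIAGRA!"  -> s = 5.7
    scores_obfuscated = [0.2, 0.0, 0.0, 2.5, 0.0]   # "BUY VI@GR4!"  -> s = 2.7

    def is_spam(scores):
        return sum(scores) >= threshold

    print(is_spam(scores_original))    # True  (5.7 >= 5): detected
    print(is_spam(scores_obfuscated))  # False (2.7 < 5): the filter is evaded
    adversary_cost = 3.0               # assumed cost of changing "VIAGRA" to "VI@GR4"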

Page 17: Evade Hard Multiple Classifier Systems


Applying the framework to the spam filtering case

[Figure: the same filter on an image spam message ("AFM Continues to Climb. Big News On Horizon | UP 50% This Week — Aerofoam Metals Inc., Symbol: AFML, Price: $0.10, UP AGAIN, Status: Strong Buy"). As plain text the message scores s1 = 3.2, s2 = 0, s3 = 0, sN = 2.5, so s = 5.7 and it is detected. When the text is embedded into an image, the text classifier is evaded (sN = 0) and s = 3.2 < 5, so the message passes. Adding an image analysis module (sN+1 = 3) brings the total back to s = 6.2 ≥ 5, and the message is detected again.]

Now both the text and the image classifier must be evaded to evade the filter!
• Evading the text classifier costs 2.5
• Evading the image analysis module costs 3.0
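A rough sketch of the hardening argument above, assuming (as an illustration, not stated on the slide) that per-module evasion costs simply add up: once the image analysis module is added, the adversary has to pay for evading both modules.

    # Hypothetical additive cost model over modules trained on
    # different features: evading the whole filter requires evading
    # each module, so the costs accumulate.
    evasion_cost = {'text_classifier': 2.5, 'image_analysis': 3.0}

    def total_evasion_cost(modules_to_evade):
        return sum(evasion_cost[m] for m in modules_to_evade)

    print(total_evasion_cost(['text_classifier']))                    # 2.5
    print(total_evasion_cost(['text_classifier', 'image_analysis']))  # 5.5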

Page 18: Evade Hard Multiple Classifier Systems


Forcing the adversary to surrender

• Hardening the system by adding modules can make evasion too costly for the adversary
  – In the end, the adversary's optimal strategy becomes not fighting!

“The ultimate warrior is one who wins the war by forcing the enemy to surrender without fighting any battles”

The Art of War, Sun Tzu, 500 BC

Page 19: Evade Hard Multiple Classifier Systems


Experimental Setup

• SpamAssassin
  – 619 tests
  – includes a text classifier (naive Bayes)
• Data set: TREC 2007 spam track
  – 75,419 e-mails (25,220 ham, 50,199 spam)
  – We used the first 10,000 e-mails (taken in chronological order) to train the SpamAssassin naive Bayes classifier.

Page 20: Evade Hard Multiple Classifier Systems


Experimental Setup

• Adversary
  – Cost simulated at score level
    • Manhattan distance between test scores
  – Maximum cost fixed
    • Rationale: higher-cost modifications would make the spam message no longer effective/legible
• Classifier
  – We did not take into account the computational cost of adding tests
• Performance measure
  – Expected utility
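A minimal sketch of the simulated adversary cost described above, assuming it is computed as the Manhattan (L1) distance between the SpamAssassin score vectors before and after the modification, with a fixed maximum cost. The score vectors below are hypothetical.

    import numpy as np

    def adversary_cost(scores_before, scores_after):
        # Manhattan (L1) distance between the two test-score vectors.
        return float(np.abs(np.asarray(scores_before) - np.asarray(scores_after)).sum())

    max_cost = 5.0
    s_before = [0.2, 0.0, 2.5, 3.0]
    s_after  = [0.2, 0.0, 2.5, 0.0]
    c = adversary_cost(s_before, s_after)
    print(c, c <= max_cost)   # 3.0 True: this modification is within the maximum cost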

Page 21: Evade Hard Multiple Classifier Systems


Experimental Results (maximum cost = 1)

Page 22: Evade Hard Multiple Classifier Systems


Experimental Results (maximum cost = 5)

Page 23: Evade Hard Multiple Classifier Systems


Will spammers give up?

• Spammer economics
  – Goal: temporarily beat enough of the filters to get some mail through and generate a quick profit
  – As filter accuracy increases, spammers simply send larger quantities of spam in order to get the same amount of mail through
    • the cost of sending spam is negligible with respect to the achievable profit!
• Is it feasible to push the accuracy of spam filters up to the point where only ineffective spam messages can pass through?
  – Otherwise spammers won't give up!

Page 24: Evade Hard Multiple Classifier Systems


Future work

• Theory of adversarial classification
  – Extend the model to more realistic situations
• Investigating other defence strategies
  – We are expanding the framework to model information hiding strategies [Barreno et al. (2006)]
    • Possible implementation: randomising the placement of the decision boundary

“Keep the adversary guessing. If your strategy is a mystery, it cannot be counteracted. This gives you a significant advantage”

The Art of War, Sun Tzu, 500 BC

Page 25: Evade Hard Multiple Classifier Systems


Thank you!

• Contacts
  – [email protected]
  – [email protected]
  – [email protected]
