Spamming Botnets: Signatures and Characteristics

Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008.

Presented by: Arnold Perez

Outline

- Introduction
- Goals
- AutoRE
  - Challenges
  - Design
  - Results
- Botnet characteristics
- Contributions
- Weaknesses

Introduction

- Botnets are commonly used for profit
  - Botnets are rented out to spammers
- Botnets can send spam emails at a large scale
  - Can transmit thousands of emails in a short duration
  - Individual bots are difficult to detect and blacklist

Goals

- Understand the behaviors of botnets from the perspective of large email servers that are popular targets
  - Identify botnet characteristics and trends
  - Track sending behavior and content patterns
- Develop a framework (AutoRE) that identifies botnet hosts by generating botnet spam signatures from emails

AutoRE

- Motivated by the recent success of signature-based worm and virus detection systems
  - Botnet spam emails are often sent in an aggregate fashion, resulting in content prevalence similar to worm propagation
- Focuses primarily on URLs embedded in the email

AutoRE Challenges

Spammers often add random, legitimate URLs to content in order to increase the perceived legitimacy of emails

Spammers use URL obfuscation techniques to evade detection

AutoRE Design

- Input
  - Set of unlabeled email messages
- Output
  - Set of spam URL signatures
    - Complete URL string
    - URL regular expression
  - List of botnet host IP addresses

Composed of three modules:
- URL preprocessor
  - Extracts URLs and other relevant fields and groups them according to web domain
- Group selector
  - Selects the URL groups with the highest degree of burstiness in sending times
- RegEx generator
  - Extracts signatures by processing one group at a time

URL Pre-Processing

- Extracts:
  - URL string
  - Source server IP address
  - Email sending time
- Partitions the URLs into groups based on web domains
  - Emails from the same spam campaign always advertise the same product or service from the same domain
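
The extraction and grouping step above can be sketched as follows. This is a minimal illustration, not AutoRE's implementation: the `(body, sender_ip, sent_at)` input tuples, the simple `URL_RE` pattern, and the use of the URL's host name as the "web domain" are all assumptions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit

# Assumption: a deliberately simple URL pattern, not AutoRE's actual extractor.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


@dataclass(frozen=True)
class UrlRecord:
    url: str           # complete URL string
    sender_ip: str     # source server IP address
    sent_at: datetime  # email sending time


def preprocess(emails):
    """Group extracted URL records by web domain.

    `emails` is an iterable of (body, sender_ip, sent_at) tuples, a
    hypothetical input format. An email whose body contains URLs from several
    domains contributes one record to each of those domain groups.
    """
    groups = defaultdict(list)
    for body, sender_ip, sent_at in emails:
        for url in URL_RE.findall(body):
            # Assumption: the (lowercased) host name stands in for the "web domain".
            domain = urlsplit(url).hostname
            if domain:
                groups[domain].append(UrlRecord(url, sender_ip, sent_at))
    return dict(groups)
```

Because an email is added to every domain group it mentions, the same message can appear in several groups, which is the situation the group selector has to handle.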

URL Group Selection

- Each email may belong to more than one group
- Use the bursty property of botnet email traffic
  - Select the group that exhibits the strongest temporal correlation across a large set of distributed senders
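
The slides do not give the exact burstiness measure, so the sketch below uses a hypothetical stand-in: a group's score is the largest number of distinct sender IPs observed inside any single time window (24 hours by default, an assumed parameter). Each group is a list of `(sender_ip, sent_at)` pairs.

```python
from datetime import datetime, timedelta


def burst_score(records, window=timedelta(hours=24)):
    """Hypothetical burstiness score for one URL group.

    `records` is a list of (sender_ip, sent_at) pairs. The score is the largest
    number of distinct sender IPs seen inside any single time window of width
    `window`: many senders in a short interval means a bursty, distributed group.
    """
    events = sorted((sent_at, ip) for ip, sent_at in records)
    best = 0
    start = 0
    for end in range(len(events)):
        # Slide the left edge forward until the window spans at most `window`.
        while events[end][0] - events[start][0] > window:
            start += 1
        distinct_senders = {ip for _, ip in events[start:end + 1]}
        best = max(best, len(distinct_senders))
    return best


def select_group(groups: dict[str, list[tuple[str, datetime]]], window=timedelta(hours=24)) -> str:
    """Return the domain whose URL group has the highest burst score."""
    return max(groups, key=lambda domain: burst_score(groups[domain], window))
```

A group whose 30 senders all appear within half an hour scores 30, while a group whose senders are spread days apart scores 1, so the former is selected first.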

Signature Generation and Botnet Identification

- Two types of signatures
  - Complete URL based signatures
  - Regular expression signatures
- Signature criteria
  - Distributed
  - Bursty
  - Specific

- Distributed
  - The total number of Autonomous Systems (ASes) spanned by the source IP addresses must be at least 20
- Bursty
  - The set of matching URLs must be sent within 5 days
- Specific
  - Complete URLs are specific by definition
  - For regular expressions, entropy reduction is used as the test: the probability of a random string matching the signature must be at most 1/(2^90)
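
The three criteria can be expressed as simple checks. The thresholds (20 ASes, 5 days, 90 bits) come from the slides; everything else is an assumption. The `ip_to_as` lookup is hypothetical, and because the slides do not say how entropy reduction is computed, `entropy_reduction_bits` is a crude stand-in that assumes a 64-character URL alphabet and describes a regex as a list of literal and character-class segments.

```python
import math
from datetime import datetime, timedelta

MIN_AS_COUNT = 20                # "Distributed": at least 20 ASes (from the slides)
MAX_SPAN = timedelta(days=5)     # "Bursty": matching URLs sent within 5 days
MIN_ENTROPY_REDUCTION_BITS = 90  # "Specific": random match probability <= 2**-90
ALPHABET_SIZE = 64               # assumption: size of the URL character alphabet


def is_distributed(sender_ips, ip_to_as):
    """`ip_to_as` is a hypothetical IP -> AS-number lookup table."""
    return len({ip_to_as[ip] for ip in sender_ips}) >= MIN_AS_COUNT


def is_bursty(send_times: list[datetime]) -> bool:
    return max(send_times) - min(send_times) <= MAX_SPAN


def entropy_reduction_bits(segments):
    """Crude estimate of how many bits a regex pins down.

    Segments are ("lit", text) for a literal, or ("class", class_size, min_len)
    for a character class repeated at least min_len times. Each literal
    character fixes log2(ALPHABET_SIZE) bits; each class position fixes
    log2(ALPHABET_SIZE / class_size) bits.
    """
    bits = 0.0
    for seg in segments:
        if seg[0] == "lit":
            bits += len(seg[1]) * math.log2(ALPHABET_SIZE)
        else:
            _, class_size, min_len = seg
            bits += min_len * math.log2(ALPHABET_SIZE / class_size)
    return bits


def is_specific(segments):
    return entropy_reduction_bits(segments) >= MIN_ENTROPY_REDUCTION_BITS
```

Under this estimate, a 90-bit reduction is the same statement as a random-match probability of 2^-90, since each bit halves the chance of a random match.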

Automatic URL Regular Expression Generation

Signature Tree Construction

- Constructs a keyword-based signature tree where each node corresponds to a substring, with the root of the tree set to the domain name
- Keywords are the most frequent substrings that are both bursty and distributed
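
A simplified version of the tree construction might look like the sketch below. It is not the paper's algorithm: candidate keywords are restricted to delimiter-separated tokens rather than arbitrary substrings, and the bursty/distributed test is replaced by a hypothetical `min_support` count. At each node the most frequent unused token becomes a child, and each root-to-leaf path is a candidate keyword-based signature.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    keyword: str                 # substring this node adds (the domain at the root)
    urls: list                   # URLs containing every keyword on the path so far
    children: list = field(default_factory=list)


def tokens(url):
    """Assumption: candidate keywords are delimiter-separated tokens, in order."""
    return list(dict.fromkeys(t for t in re.split(r"[/?&=.:]+", url) if t))


def build_tree(node, used, min_support):
    """Greedily attach the most frequent unused token as a child, recursively.

    `min_support` is a hypothetical stand-in for the bursty/distributed checks.
    """
    remaining = list(node.urls)
    while remaining:
        counts = Counter(t for u in remaining for t in tokens(u) if t not in used)
        if not counts:
            break
        keyword, support = counts.most_common(1)[0]
        if support < min_support:
            break
        matched = [u for u in remaining if keyword in tokens(u)]
        child = Node(keyword, matched)
        node.children.append(child)
        build_tree(child, used | {keyword}, min_support)
        # URLs not containing this keyword are left for the next sibling.
        remaining = [u for u in remaining if keyword not in tokens(u)]
    return node


def leaf_paths(node, path=()):
    """Each root-to-leaf keyword path is one candidate keyword-based signature."""
    if not node.children:
        return [path]
    return [p for c in node.children for p in leaf_paths(c, path + (c.keyword,))]
```

For example, the three URLs `http://pills.example/shop/item1/view.php` through `.../item3/view.php` produce the path `("shop", "view", "php")` under the root `pills.example`, while a lone `http://pills.example/other` never reaches the support threshold.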

Regular Expression Generation

- Detailing
  - Returns a domain-specific regular expression using the keyword-based signature
- Generalization
  - Returns a more general, domain-agnostic regular expression by merging very similar domain-specific expressions
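
The detailing step can be sketched as follows: keep the keywords literally, and replace each gap between them with a character class plus a length range observed across the group's URLs. The gap-encoding rules here (ASCII ranges for digits and letters, escaped literals for other characters, a literal when every URL has the same gap) are assumptions made for illustration.

```python
import re
import string


def char_class(chars):
    """Character class covering every observed character: whole ASCII ranges
    for digits and letters, individual escapes for anything else."""
    parts = []
    if any(c in string.digits for c in chars):
        parts.append("0-9")
    if any(c in string.ascii_lowercase for c in chars):
        parts.append("a-z")
    if any(c in string.ascii_uppercase for c in chars):
        parts.append("A-Z")
    alnum = string.ascii_letters + string.digits
    others = sorted({c for c in chars if c not in alnum})
    return "[" + "".join(parts) + "".join(re.escape(c) for c in others) + "]"


def split_on_keywords(url, keywords):
    """Split `url` into the gaps before, between, and after the ordered keywords."""
    gaps, pos = [], 0
    for kw in keywords:
        i = url.index(kw, pos)  # raises ValueError if the URL lacks the keyword
        gaps.append(url[pos:i])
        pos = i + len(kw)
    gaps.append(url[pos:])
    return gaps


def detail(keywords, urls):
    """Build a domain-specific regex from a keyword signature and its URLs."""
    ordered = sorted(keywords, key=urls[0].index)  # keyword order in the URL
    per_url = [split_on_keywords(u, ordered) for u in urls]
    pieces = []
    for j, gap_values in enumerate(zip(*per_url)):
        if len(set(gap_values)) == 1:
            pieces.append(re.escape(gap_values[0]))  # constant gap: keep literally
        else:
            lo = min(map(len, gap_values))
            hi = max(map(len, gap_values))
            quant = f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"
            pieces.append(char_class("".join(gap_values)) + quant)
        if j < len(ordered):
            pieces.append(re.escape(ordered[j]))
    return "".join(pieces)
```

For the URLs `http://pills.example/shop/item1/view.php`, `.../item2/...`, and `.../item3/...` with keywords `shop`, `view`, `php`, this yields `http://pills\.example/shop[0-9a-z/]{7}view\.php`, which still matches `item9` but not `item10`.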

Datasets and Results

- Based on randomly sampled Hotmail email messages from three months:
  - November 2006
  - June 2007
  - July 2007
- A total of 5,382,460 sampled emails
  - Pre-classified as either spam or non-spam by human users (labels not used by the filter; used for validation purposes only)

AutoRE Results

- Identified:
  - 7,721 botnet spam campaigns
  - 580,466 spam messages
  - 340,050 distinct botnet host IP addresses
  - 5,916 ASes

- The majority of the campaigns belong to the CU (complete URL) category
- 100% increase in July 2007 when compared to Nov 2006
- Spam volume increased 50% in the same time period
- The total number of botnet IPs does not increase proportionally, suggesting that each botnet is being used more aggressively

False Positive Rate

False positive rate = (number of non-spam emails matching a signature) / (total number of non-spam emails)
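
The formula above translates directly into code. This sketch assumes each email is given as its list of URLs plus the human-assigned spam label, and that a non-spam email counts as a false positive when any of its URLs fully matches any signature. A complete-URL signature can be written as `re.compile(re.escape(url))`.

```python
import re


def false_positive_rate(emails, signatures):
    """`emails` is a list of (urls, is_spam) pairs using the human labels;
    `signatures` is a list of compiled regexes."""
    non_spam = [urls for urls, is_spam in emails if not is_spam]
    if not non_spam:
        return 0.0
    flagged = sum(
        1 for urls in non_spam
        if any(sig.fullmatch(u) for sig in signatures for u in urls)
    )
    return flagged / len(non_spam)
```

With one signature and three non-spam emails, one of which carries a matching URL, the rate is 1/3. Spam emails do not enter the calculation at all.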

Ability to Detect Future Spam

- Experiment: apply signatures derived in Nov 2006 and June 2007 to the emails collected in July 2007
- Nov 2006 signatures are not useful
  - Indicates that spam URL patterns evolve over time
- June 2007 signatures are highly effective
  - RE signatures are more robust than CU signatures over time

Regular Expressions vs Keyword Conjunctions

- Identical spam detection rates
- The difference is in the false positive rate
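
One plausible reason the false positive rates differ: a keyword conjunction only requires every keyword to appear somewhere in the URL, whereas a regular expression also fixes keyword order and the shape and length of the gaps. The keywords, regex, and URLs below are hypothetical examples, not signatures from the paper.

```python
import re

KEYWORDS = ("shop", "view", "php")  # hypothetical keyword-based signature
REGEX = re.compile(r"http://pills\.example/shop[0-9a-z/]{7}view\.php")  # hypothetical regex signature


def matches_keywords(url, keywords=KEYWORDS):
    """Keyword conjunction: every keyword appears anywhere, in any order."""
    return all(k in url for k in keywords)


def matches_regex(url, regex=REGEX):
    """Regex signature: the whole URL must fit the learned structure."""
    return regex.fullmatch(url) is not None


spam_url = "http://pills.example/shop/item4/view.php"
benign_url = "http://pills.example/view/shop-faq.php"  # same words, different structure
```

Both signatures catch `spam_url`, but only the keyword conjunction also flags `benign_url`, so the two can have the same detection rate while differing in false positives.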

Domain-Specific vs Domain-Agnostic Signatures

- Generalization effectively preserves the stable structures of polymorphic URLs while removing the volatile domain substrings
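
A minimal sketch of generalization, assuming the simplest notion of "very similar": two domain-specific regexes are merged when they become identical after the escaped domain is replaced by a generic host pattern. Both that merge rule and the `GENERIC_HOST` pattern are assumptions.

```python
import re
from collections import defaultdict

GENERIC_HOST = r"[^/]+"  # assumption: stand-in pattern for any host name


def generalize(domain_regexes):
    """`domain_regexes` maps domain -> domain-specific regex string.

    Returns a dict mapping each domain-agnostic regex to the domains merged into it.
    """
    merged = defaultdict(list)
    for domain, rx in domain_regexes.items():
        # Plain string replacement of the escaped domain (the volatile part).
        generic = rx.replace(re.escape(domain), GENERIC_HOST, 1)
        merged[generic].append(domain)
    return dict(merged)
```

Two campaigns that differ only in their domain (`pills.example` and `meds.example`) collapse into one pattern that also matches the same URL structure on a domain never seen before, which is the property the slide describes.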

Botnet Characteristics

The distribution of IP addresses indicates that the botnet menace is a global phenomenon, with China, Korea, France, and the USA having a significant number of IP addresses

- When viewed individually, botnet hosts do not exhibit distinct sending patterns
  - The content of the emails is quite different even though the target web pages are the same
- 50% of botnet spam campaigns have a standard deviation in sending time of less than 1.81 hours, while 90% have a standard deviation of less than 24 hours
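
The statistic above can be computed per campaign as shown below. This is a sketch under assumptions: each campaign is a list of sending timestamps, and the population standard deviation (`statistics.pstdev`) is used because the slides do not say which variant was applied.

```python
import statistics
from datetime import datetime


def send_time_stddev_hours(send_times: list[datetime]) -> float:
    """Standard deviation of one campaign's sending times, in hours.

    Assumption: population standard deviation (statistics.pstdev).
    """
    t0 = min(send_times)
    hours = [(t - t0).total_seconds() / 3600 for t in send_times]
    return statistics.pstdev(hours)


def fraction_below(campaigns: list[list[datetime]], threshold_hours: float) -> float:
    """Fraction of campaigns whose sending-time standard deviation is under the threshold."""
    devs = [send_time_stddev_hours(times) for times in campaigns]
    return sum(d < threshold_hours for d in devs) / len(devs)
```

Calling `fraction_below(campaigns, 1.81)` and `fraction_below(campaigns, 24)` on a set of campaigns gives the two fractions quoted on the slide for that data.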

- Similar number of recipients per email
- Share a constant connection rate
  - Most likely due to rate control seen in botnet software
- A large number of campaigns share the same domain-agnostic regular expression signatures
  - Same botnets participating in multiple spam campaigns

Contributions

- AutoRE, a framework that automatically generates URL signatures for spamming botnet detection
- Several important findings about botnet spam
  - Botnet hosts are spread across the Internet
  - No distinctive pattern when viewed individually
  - Botnet host sending patterns

Weaknesses

- The AutoRE system analyzes batches of emails after they have all been received
  - It would be better to do this in real time, so that emails could be stopped once a campaign has been identified and a signature created
- The AutoRE system needs a large volume of email to work effectively
  - It cannot be used on individual inboxes; it must be placed between the ISP and the incoming email

- I was hoping to take the characteristics found in the paper and use them in my own project
  - The paper shows that botnet spam cannot be identified by looking at hosts individually; the AutoRE system works on group behavior

References

"Spamming Botnets: Signatures and Characteristics". Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008.