Spamming Botnets: Signatures and Characteristics
Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008.
Presented by: Arnold Perez
Outline
  Introduction
  Goals
  AutoRE
  Challenges
  Design
  Results
  Botnet characteristics
  Contributions
  Weaknesses
Introduction
Botnets are commonly used for profit
  Botnets are rented out to spammers
Botnets can send spam emails at a large scale
  Can transmit thousands of emails in a short duration
  Difficult to detect and blacklist individual bots
Goals
Understand the behavior of botnets from the perspective of large email servers that are popular targets
Identify botnet characteristics and trends
  Track sending behavior and content patterns
Develop a framework (AutoRE) that identifies botnet hosts by generating botnet spam signatures from emails
AutoRE
Motivated by the recent success of signature-based worm and virus detection systems
Botnet spam emails are often sent in an aggregate fashion, resulting in content prevalence similar to worm propagation
Focuses primarily on URLs embedded in the email
AutoRE Challenges
Spammers often add random, legitimate URLs to content in order to increase the perceived legitimacy of emails
AutoRE Challenges
Spammers use URL obfuscation techniques to evade detection
AutoRE Design
Input
  Set of unlabeled email messages
Output
  Set of spam URL signatures
    Complete URL string
    URL regular expression
  List of botnet host IP addresses
AutoRE Design
Composed of three modules
URL preprocessor
  Extracts URLs and other relevant fields and groups them according to web domain
Group selector
  Selects the URL group with the highest degree of burstiness in sending times
RegEx generator
  Extracts signatures by processing one group at a time
URL Pre-Processing
Extracts
  URL string
  Source server IP address
  Email sending time
Partitions URLs into groups based on web domains
  Emails from the same spam campaign always advertise the same product or service from the same domain
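The preprocessing step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the email layout (a dict with `body`, `sender_ip`, and `sent_at` keys), the URL pattern, and the use of the URL hostname as the "web domain" are all hypothetical choices.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumption: a simple pattern for http/https URLs in plain-text bodies.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def group_urls_by_domain(emails):
    """Group (url, sender_ip, sent_at) tuples by the URL's hostname.

    emails: iterable of dicts with 'body', 'sender_ip', 'sent_at' keys
    (a hypothetical layout chosen for this sketch).
    """
    groups = defaultdict(list)
    for email in emails:
        for url in URL_PATTERN.findall(email["body"]):
            try:
                domain = urlparse(url).hostname
            except ValueError:
                continue  # skip malformed URLs
            if domain is None:
                continue
            groups[domain].append((url, email["sender_ip"], email["sent_at"]))
    return dict(groups)
```

An email with several URLs lands in several groups, which is why the next step has to choose among them.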
URL Group Selection
Each email may belong to more than one group
Use the bursty property of botnet email traffic
  Select the group that exhibits the strongest temporal correlation across a large set of distributed senders
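One way to make "strongest temporal correlation across distributed senders" concrete is to score each group by the largest number of distinct sender IPs seen in any fixed time window. The slides do not give the actual metric, so the scoring function, the one-hour window, and the tuple layout below are assumptions made for illustration.

```python
from collections import defaultdict

def burstiness_score(entries, window_seconds=3600):
    """Largest number of distinct sender IPs within one time window.

    entries: list of (url, sender_ip, sent_at) with sent_at in seconds
    (hypothetical layout). The metric itself is an assumption.
    """
    buckets = defaultdict(set)
    for _url, sender_ip, sent_at in entries:
        buckets[int(sent_at // window_seconds)].add(sender_ip)
    return max((len(ips) for ips in buckets.values()), default=0)

def select_burstiest_group(groups, window_seconds=3600):
    """Return the domain whose group has the highest score, or None if empty."""
    if not groups:
        return None
    return max(groups, key=lambda d: burstiness_score(groups[d], window_seconds))
```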
Signature Generation and Botnet Identification
Two types of signatures
  Complete URL based signatures
  Regular expression signatures
Signature criteria
  Distributed
  Bursty
  Specific
Signature Generation and Botnet Identification
Distributed
  Total number of Autonomous Systems (ASes) spanned by the source IP addresses must be at least 20
Bursty
  The set of matching URLs must be sent within 5 days
Specific
  Complete URLs are specific by definition
  For regular expressions, entropy reduction is used as the test
  Probability of a random string matching the signature is at most 1/(2^90)
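The three criteria can be read as a simple acceptance check. The sketch below uses the thresholds stated on the slide (20 ASes, 5 days, 90 bits); the function name and its inputs are hypothetical, and the entropy reduction is taken as a precomputed number because the slides do not describe how it is calculated.

```python
FIVE_DAYS_SECONDS = 5 * 24 * 3600

def meets_signature_criteria(as_numbers, send_times, entropy_reduction_bits,
                             min_as=20, max_span_seconds=FIVE_DAYS_SECONDS,
                             min_entropy_bits=90):
    """Check the distributed, bursty, and specific criteria for one signature.

    as_numbers: AS number of each matching sender (hypothetical input).
    send_times: sending times in seconds of the matching emails.
    entropy_reduction_bits: supplied by the caller (assumption); a complete
    URL signature could pass float("inf") since it is specific by definition.
    """
    distributed = len(set(as_numbers)) >= min_as
    bursty = bool(send_times) and (max(send_times) - min(send_times)) <= max_span_seconds
    specific = entropy_reduction_bits >= min_entropy_bits
    return distributed and bursty and specific
```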
Automatic URL Regular Expression Generation
Signature Tree Construction
Constructs a keyword-based signature tree where each node corresponds to a substring, with the root of the tree set to the domain name
Keywords are the most frequent substrings that are both bursty and distributed
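To illustrate the "most frequent substring" idea, the brute-force sketch below counts, for each candidate substring, how many URL paths contain it and returns the winner. It is a simplification under stated assumptions: it covers only the frequency part of keyword selection (the bursty and distributed checks are omitted), the minimum keyword length of 4 is arbitrary, and a real system would use a more efficient substring index.

```python
from collections import Counter

def most_frequent_keyword(url_paths, min_len=4):
    """Return (keyword, count): the substring contained in the most paths.

    Ties are broken in favor of longer keywords, then lexicographic order.
    Brute force, for illustration only.
    """
    counts = Counter()
    for path in url_paths:
        seen = set()  # count each substring once per path
        for start in range(len(path)):
            for end in range(start + min_len, len(path) + 1):
                seen.add(path[start:end])
        counts.update(seen)
    if not counts:
        return None, 0
    return max(counts.items(), key=lambda kv: (kv[1], len(kv[0]), kv[0]))
```

Applied recursively to the URLs that contain the chosen keyword, this kind of selection would produce the child nodes of the tree.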
Regular Expression Generation
Detailing
  Returns a domain-specific regular expression using the keyword-based signature
Generalization
  Returns a more general domain-agnostic regular expression by merging very similar domain-specific expressions
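A rough sketch of the detailing idea: keep the fixed keyword literal and summarize the varying part of the URLs as a character class with length bounds. The helper names, the choice of character classes, and the example prefix are hypothetical, and the generalization (merging) step is not shown.

```python
import re

def detail_variable_segment(samples):
    """Summarize sample strings as a fragment such as [a-z]{7,10}."""
    if not samples or not all(samples):
        raise ValueError("need non-empty sample strings")
    chars = set("".join(samples))
    parts = []
    if any("a" <= c <= "z" for c in chars):
        parts.append("a-z")
    if any("A" <= c <= "Z" for c in chars):
        parts.append("A-Z")
    if any("0" <= c <= "9" for c in chars):
        parts.append("0-9")
    # Any other character is listed literally, escaped for safety.
    others = sorted(c for c in chars if not (c.isascii() and c.isalnum()))
    parts.extend(re.escape(c) for c in others)
    lengths = [len(s) for s in samples]
    lo, hi = min(lengths), max(lengths)
    quant = f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"
    return f"[{''.join(parts)}]{quant}"

def build_domain_specific_regex(fixed_prefix, samples):
    """Literal keyword prefix followed by the summarized variable part."""
    return re.escape(fixed_prefix) + detail_variable_segment(samples)
```

For example, with the made-up prefix `/deal/?42&` and samples `alphabeta`, `gammadelta`, `epsilon`, the variable part becomes `[a-z]{7,10}`.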
Datasets and Results
Based on randomly sampled Hotmail email messages from three months
  November 2006
  June 2007
  July 2007
Total of 5,382,460 sampled emails
Pre-classified as either spam or non-spam by a human user (labels not used by the filter; used for validation purposes only)
AutoRE Results
Identified 7,721 botnet spam campaigns
  580,466 spam messages
  340,050 distinct botnet host IP addresses
  5,916 ASes
AutoRE Results
Majority of the campaigns belong to the CU (complete URL) category
  100% increase in July 2007 when compared to Nov 2006
Spam volume increased 50% in the same time period
Total number of botnet IPs does not increase proportionally, suggesting that each botnet is being used more aggressively
False Positive Rate
Rate = (number of non-spam emails matching a signature) / (total number of non-spam emails)
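The rate above could be computed as follows. This is a minimal sketch: the input layout (one list of URLs per non-spam email) and the choice to require a full-string match are assumptions, and the signature used in the usage note is invented.

```python
import re

def false_positive_rate(signatures, non_spam_urls_per_email):
    """Fraction of non-spam emails with at least one URL matching a signature.

    signatures: regular expression strings (complete URLs can be passed
    through re.escape first).
    non_spam_urls_per_email: one list of URLs per non-spam email (assumed layout).
    """
    total = len(non_spam_urls_per_email)
    if total == 0:
        return 0.0
    compiled = [re.compile(s) for s in signatures]
    hits = sum(
        1
        for urls in non_spam_urls_per_email
        if any(p.fullmatch(u) for p in compiled for u in urls)
    )
    return hits / total
```

With one made-up signature and four non-spam emails, one of which contains a matching URL, the function returns 0.25.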
Ability to Detect Future Spam
Experiment
  Apply signatures derived in Nov 2006 and June 2007 to the emails collected in July 2007
Nov 2006 signatures are not useful
  Indicates that spam URL patterns evolve over time
June 2007 signatures are highly effective
  RE signatures are more robust than CU signatures over time
Regular Expressions vs Keyword Conjunctions
Identical spam detection rates
Difference is in the false positive rate
Domain-Specific vs Domain-Agnostic Signatures
Generalization effectively preserves the stable structures of polymorphic URLs while removing the volatile domain substrings
Botnet Characteristics
Distribution of IP addresses indicates that the botnet menace is a global phenomenon, with China, Korea, France, and the USA each having a significant number of IP addresses
Botnet Characteristics
When viewed individually, botnet hosts do not exhibit distinct sending patterns
Content in emails is quite different even though the target web pages are the same
50% of botnet spam campaigns have a sending-time standard deviation of less than 1.81 hours, while 90% have a standard deviation of less than 24 hours
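The standard-deviation statistic can be reproduced per campaign with the standard library. The sketch assumes the deviation is taken over email sending times given in seconds; the function names are hypothetical.

```python
from statistics import pstdev

def sending_time_std_hours(send_times_seconds):
    """Population standard deviation of one campaign's sending times, in hours."""
    if not send_times_seconds:
        raise ValueError("campaign has no emails")
    return pstdev(send_times_seconds) / 3600.0

def fraction_below(std_hours_per_campaign, threshold_hours):
    """Fraction of campaigns whose standard deviation is below the threshold."""
    if not std_hours_per_campaign:
        return 0.0
    below = sum(1 for s in std_hours_per_campaign if s < threshold_hours)
    return below / len(std_hours_per_campaign)
```

Calling `fraction_below` with thresholds of 1.81 and 24 hours would give the two fractions quoted on the slide for a given dataset.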
Botnet Characteristics
Similar number of recipients per email
Share a constant connection rate
  Most likely due to rate control seen in botnet software
Large number of campaigns share the same domain-agnostic regular expression signatures
  Same botnets participating in multiple spam campaigns
Contributions
AutoRE, a framework that automatically generates URL signatures for spamming botnet detection
Several important findings about botnet spam
  Botnet hosts are spread across the Internet
  No distinctive pattern when viewed individually
  Botnet host sending patterns
Weaknesses
The AutoRE system analyzes batches of emails after they have all been received
  It would be better to do this in real time, stopping email once a campaign has been identified and a signature created
The AutoRE system needs a lot of emails to work effectively
  It cannot be used on individual inboxes; it must be placed between the ISP and the incoming email
Weaknesses
I was hoping to use the characteristics found in the paper in my own project
  The paper shows that botnet spam cannot be identified by looking at hosts individually. The AutoRE system works on group behavior.
References
"Spamming Botnets: Signatures and Characteristics". Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov. SIGCOMM, 2008.