Email Spam Filtering Computer Security Seminar

06/16/22. Email Spam Filtering, Computer Security Seminar. N. Muthiyalu Jothir, 271120, Media Informatics.


  • Email Spam Filtering: Computer Security Seminar. N. Muthiyalu Jothir, 271120, Media Informatics

    Email Spam Filtering - Muthiyalu Jothir

  • Agenda: What is Spam? Statistics. Who Benefits from It? Spam Filtering Techniques. Combining Filters. Conclusion.

  • What is Spam?

    Spam: unsolicited email, sent as identical or nearly identical messages to thousands (or millions) of recipients.

    Caution! SPAM (Spiced Ham) is a popular American canned meat brand.

  • Problem

    With a tiny investment, a spammer can send over 100,000 bulk emails per hour.

    Junk mail wastes storage and transmission bandwidth.

    ISPs must invest to handle it, a cost that we absorb as the ISPs' customers.

    Spam is a problem because the cost is forced onto us, the recipients.

  • Statistics

  • Who benefits from Spam?

    Financial firms (e.g. mortgage)

    Lead generators (gain 2% of the loan value per customer's data)

    Spammers (share the profit with lead generators)

    Recipient

    Diagram labels: "Information about interested customers"; "Recipient replies here"

  • Spam Control Techniques

    Fight-back techniques: reporting spam to the ISP, fight-back filters, slow senders, law (???), etc.

    Filtering techniques: challenge-response filtering, blacklists and whitelists, content-based filters (rule-based and Bayesian).

  • Reporting Spam to ISPs

    The original spam solution.

    Legitimate ISPs respond to such complaints, and spammers are kicked off.

    Disadvantages: spammers disguise themselves, and naive users cannot interpret the email headers.

  • Filters that Fight Back (FFB)

    The majority of spam contains links to web pages.

    Spam filters could automatically retrieve the URLs and crawl back to those pages, which would increase the load on the spammer's server.

    If all the spam receivers do this at the same time, the server might crash, which raises the cost of spamming.

    Caution! FFB usually works with blacklists (of malicious servers) in order to avoid attacking innocent servers.

  • Filtering Techniques

  • Spam vs. Ham

    Care to be taken in any spam filtering technique:

    Even if all the spam were allowed to pass through, not a single legitimate mail should be filtered out.

    False positive: legitimate mail classified as spam.

    The lowest possible false positive rate is desired.

    Caution: check your junk folder before deleting.

    Don't blindly trust your spam filter.

  • Challenge-Response Filtering

    Unknown senders receive an auto-reply message asking them to verify themselves.

    Senders are "challenged" to type in a word that is hidden within a graphic or a sound file.

    Mail is forwarded to the receiver's inbox only after a successful response.

    This technique filters almost all spam: no spammer would be interested in taking the extra effort to prove themselves. Commercial product: spamarrest.

    Disadvantage: this technique is rude.

    Sometimes senders don't reply to the challenge, or forget to.
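
    The challenge-response flow on this slide can be sketched roughly as follows. This is a minimal illustration, not how any commercial product works: the class name `ChallengeResponseGate`, the random token standing in for the hidden word, and the choice to remember a sender after one successful response are all assumptions.

```python
import secrets


class ChallengeResponseGate:
    """Hold mail from unknown senders until they answer a challenge (sketch)."""

    def __init__(self, known_senders):
        self.known = set(known_senders)
        self.pending = {}  # challenge token -> (sender, held message)

    def receive(self, sender, message):
        """Return ('inbox', message) for known senders, else ('challenge', token)."""
        if sender in self.known:
            return ("inbox", message)
        # Assumption: a random token stands in for the word hidden in a graphic.
        token = secrets.token_hex(4)
        self.pending[token] = (sender, message)
        return ("challenge", token)

    def respond(self, token):
        """Release the held message if the response matches a pending challenge."""
        held = self.pending.pop(token, None)
        if held is None:
            return None  # wrong response: nothing is delivered
        sender, message = held
        self.known.add(sender)  # assumption: verified senders are trusted afterwards
        return message
```

    A message whose challenge is never answered simply stays in `pending`, which mirrors the disadvantage noted above: legitimate senders who forget to reply are never delivered.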

  • Blacklists and Whitelists

    Blacklists are lists of misbehaving servers or known spammers, collected by several sites.

    The sender ID in the email is compared with the blacklist.

    Whitelists are complementary to blacklists and contain the addresses of trusted contacts.

    Use blacklists and whitelists for first-level filtering (before applying content checks), not as the only tool for making a decision.

    Disadvantage: prone to misconfiguration, with legitimate servers unable to get off a list to which they were incorrectly added.
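
    As a rough sketch of first-level filtering, the function below checks the sender against both lists and hands everything else on to the content checks. The function name, the three return labels, matching on either the full address or the domain, and giving the whitelist precedence are all assumptions made for illustration.

```python
def first_level_filter(sender, whitelist, blacklist):
    """Return 'accept', 'reject', or 'check_content' for a sender address."""
    sender = sender.strip().lower()
    domain = sender.rsplit("@", 1)[-1]
    # Assumption: the whitelist wins if a sender appears on both lists.
    if sender in whitelist or domain in whitelist:
        return "accept"
    if sender in blacklist or domain in blacklist:
        return "reject"
    # Unknown senders are not decided here; content-based filters take over.
    return "check_content"


if __name__ == "__main__":
    trusted = {"friend@example.org"}
    blocked = {"spam.example"}
    for address in ("friend@example.org", "x@spam.example", "someone@example.net"):
        print(address, "->", first_level_filter(address, trusted, blocked))
```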

  • Content-Based Filters

    It is not a good idea to filter mail based on blacklists alone.

    Wiser decision: consider the actual content of the email.

    Almost all successful spam filters use this technique.

    Major types: rule-based and Bayesian.

  • Rule Based Filters

    Rule-based filters use static rules to decide whether a mail is spam or not.

    Rules could look for: words and phrases, lots of uppercase characters, exclamation points, special characters, web links, HTML messages, background colors, crazy Subject lines, etc.

  • Rule Based Filters

    Rules are given scores, based on their importance.

    Incoming mails are parsed and checked for known malicious patterns.

    A total score is calculated over the triggered rules.

    If final score > threshold, classify as spam. Otherwise, classify as legitimate mail.

    The threshold is decided by the user.
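
    The scoring scheme on this slide can be sketched as below. The rule names, patterns, scores, and the default threshold of 3.0 are hypothetical values chosen for illustration; they are not taken from any real filter.

```python
import re

# Hypothetical rules: (name, compiled pattern, score reflecting importance).
RULES = [
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.5),
    ("LONG_UPPERCASE_RUN", re.compile(r"\b[A-Z]{5,}\b"), 1.0),
    ("CONTAINS_WEB_LINK", re.compile(r"https?://", re.IGNORECASE), 0.5),
    ("PHRASE_FREE_MONEY", re.compile(r"\bfree money\b", re.IGNORECASE), 2.0),
]


def score_message(text):
    """Return (total score, names of triggered rules) for one message."""
    triggered = [(name, score) for name, pattern, score in RULES if pattern.search(text)]
    total = sum(score for _, score in triggered)
    return total, [name for name, _ in triggered]


def classify(text, threshold=3.0):
    """Classify as spam only if the total score exceeds the user's threshold."""
    total, _ = score_message(text)
    return "spam" if total > threshold else "ham"


if __name__ == "__main__":
    print(classify("FREE MONEY!!! Visit http://example.com"))
    print(classify("See you at lunch tomorrow."))
```

    Lowering `threshold` catches more spam at the price of more false positives, which is why the slide leaves that choice to the user.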

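  • Rule Based Filters

    SpamAssassin, a popular spam filtering product, uses rule-based filtering.

    Perl regular expressions (regex) are used for pattern checking.

    Example rules:

    header __LOCAL_FROM_NEWS From =~ /[email protected]\.com/i

    body __LOCAL_SALES_FIGURES /\bMonthly Sales Figures\b/

    # Assumed meta rule: the slide scores LOCAL_NEWS_SALES_FIGURES but does not show how it is defined.
    meta LOCAL_NEWS_SALES_FIGURES (__LOCAL_FROM_NEWS && __LOCAL_SALES_FIGURES)

    score LOCAL_NEWS_SALES_FIGURES 0.8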

  • Rule Based Filters

    Advantages: easy to implement; no training required.

    Disadvantages: static rules are too general; spammers find new ways to deceive the rules.

  • Bayesian Filters

    Bayesian filters are the latest in spam filtering technology and the most successful.

    Bayes classifiers have been used extensively in the field of pattern recognition.

    Given an unlabeled example, the classifier calculates the most likely classification, together with its probability.

  • Bayesian Filters

    Steps in Bayes filtering: training, validation, implementation.

    Training starts with two collections of mail: one of spam and one of legitimate mail.

    For every word in these emails, the filter calculates a spam probability based on the proportion of spam occurrences.

    Bayesian filters are quite accurate, and adapt automatically as spam evolves.

    False positives are minimized by Bayesian filtering because it considers evidence of innocence as well as evidence of spam.
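
    A minimal sketch of the training step, assuming a simple tokenizer and counting each word at most once per mail. The function names, the tokenizer, and the clamping of probabilities to [0.01, 0.99] are assumptions for illustration; both collections are assumed to be non-empty.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase tokens in one mail."""
    return set(re.findall(r"[a-z0-9$']+", text.lower()))


def train(spam_mails, ham_mails):
    """Estimate a spam probability for every word seen in either collection."""
    spam_counts = Counter(w for mail in spam_mails for w in tokenize(mail))
    ham_counts = Counter(w for mail in ham_mails for w in tokenize(mail))
    probabilities = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[word] / len(spam_mails)  # share of spam mails containing it
        ham_rate = ham_counts[word] / len(ham_mails)     # share of ham mails containing it
        p = spam_rate / (spam_rate + ham_rate)
        # Clamp so that a single word can never be treated as absolute proof.
        probabilities[word] = min(0.99, max(0.01, p))
    return probabilities


if __name__ == "__main__":
    table = train(["cheap pills now", "cheap loans"], ["meeting now", "lunch plans"])
    for word in ("cheap", "now", "lunch"):
        print(word, table[word])
```

    Words seen only in legitimate mail end up near 0, which is the "evidence of innocence" mentioned above; retraining on newly labeled mail is how such a table would adapt as spam evolves.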

  • Bayesian Filtering

    Bayes probability:

    $\Pr(\text{spam} \mid \text{words}) = \dfrac{\Pr(\text{spam}) \cdot \Pr(\text{words} \mid \text{spam})}{\Pr(\text{words})}$

    A mail whose probability is closer to 1 is classified as spam; one closer to 0 is classified as ham.

    0.5 is set as the threshold.
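
    The Bayes formula on this slide can be evaluated as sketched below. The sketch assumes the words are conditionally independent (the usual "naive" Bayes simplification, which the slide does not state) and expands Pr(words) as Pr(words | spam) Pr(spam) + Pr(words | ham) Pr(ham). The function names and the per-word likelihood tables are hypothetical.

```python
import math


def spam_posterior(words, p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """Pr(spam | words) via Bayes' rule under a naive independence assumption."""
    # Work in log space so that long messages do not underflow to zero.
    log_spam = math.log(p_spam)
    log_ham = math.log(1.0 - p_spam)
    for word in words:
        # Words missing from either table carry no evidence and are skipped.
        if word in p_word_given_spam and word in p_word_given_ham:
            log_spam += math.log(p_word_given_spam[word])
            log_ham += math.log(p_word_given_ham[word])
    # Pr(words) = Pr(words|spam) Pr(spam) + Pr(words|ham) Pr(ham)
    shift = max(log_spam, log_ham)
    numerator = math.exp(log_spam - shift)
    denominator = numerator + math.exp(log_ham - shift)
    return numerator / denominator


def classify(words, p_word_given_spam, p_word_given_ham, threshold=0.5):
    """Apply the 0.5 threshold from the slide to the posterior."""
    posterior = spam_posterior(words, p_word_given_spam, p_word_given_ham)
    return "spam" if posterior > threshold else "ham"


if __name__ == "__main__":
    likely_spam = {"cheap": 0.8, "meeting": 0.1}  # hypothetical Pr(word | spam)
    likely_ham = {"cheap": 0.2, "meeting": 0.6}   # hypothetical Pr(word | ham)
    print(spam_posterior(["cheap"], likely_spam, likely_ham))
    print(classify(["cheap", "meeting"], likely_spam, likely_ham))
```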

  • Neural Network for Training

    [Figure: neural network structure]

  • Neural Networks for Training

    A neural network is used to train the spam filter (rule-based or Bayesian); it is not itself a filter.

    Input: words, rules, etc.

    It is trained over multiple samples of the user's mail (both spam and ham).

    The weights of the links are altered until the desired output is obtained.

  • Supervised Learning

    Supervised learning: training with a teacher signal.

    Train the system until the edge weights are optimized and no longer change.

    Caution! Take care not to overtrain the network.
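
    To make "altering the weights until the desired output is obtained" concrete, here is a minimal sketch using a single neuron (a perceptron), the simplest possible stand-in for the network in the talk. The feature encoding, learning rate, and epoch cap are assumptions; the slides do not specify the actual network or learning rule.

```python
def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100):
    """Adjust weights until every training sample yields its teacher label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):  # the cap keeps training from running unbounded
        changed = False
        for features, target in zip(samples, labels):
            activation = bias + sum(w * x for w, x in zip(weights, features))
            output = 1 if activation > 0 else 0
            error = target - output  # teacher signal minus actual output
            if error != 0:
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
                bias += learning_rate * error
                changed = True
        if not changed:  # weights no longer change: training is done
            break
    return weights, bias


def predict(weights, bias, features):
    """Return 1 for spam, 0 for ham."""
    return 1 if bias + sum(w * x for w, x in zip(weights, features)) > 0 else 0


if __name__ == "__main__":
    # Hypothetical binary features: [has_link, many_exclamations, known_sender]
    mails = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
    teacher = [1, 1, 0, 0]  # 1 = spam, 0 = ham
    w, b = train_perceptron(mails, teacher)
    print([predict(w, b, m) for m in mails])
```

    The loop stops as soon as a full pass leaves the weights unchanged. Guarding against overtraining in practice usually also means checking performance on mail that was held out of training, which this sketch does not do.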

  • Combining Spam Filters

    Goal: the combined filter aims to improve on the individual filters' performance.

    Combined Filter = Original Filter (OF) + Received Filter (RF)

    Maximum gain: when the received filter contains some feature sets not found in the original filter.

    E.g. Original Filter = {Share Market, Higher Studies}; Received Filter = {Share Market, Job Alerts}

  • Challenges

    Decisions (spam/ham) are made by both filters individually.

    Decisions agree: no problem.

    Disagreement: due to the difference in feature sets.

    Challenges: How do we select the correct decision or filter? Who selects it?

  • Filter Selector (FS)

    Training phase: the FS predicts the unique features (e.g. words) of the RF.

    Parse the emails of the training set and extract the features.

    This yields a bag of (predicted) features for the RF.

    A text similarity comparison is made between the current e-mail's features and the feature sets of the filters.
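
    One way to picture the selector's final step: when the two filters disagree, trust the one whose feature set looks more like the current e-mail. The sketch below uses a plain count of shared tokens as a stand-in for the similarity measure, and falls back to the original filter on a tie; both choices, along with the function name, are assumptions rather than the method from the talk.

```python
def select_verdict(email_tokens, of_verdict, rf_verdict, of_features, rf_features):
    """Pick the final spam/ham verdict from the original (OF) and received (RF) filters."""
    if of_verdict == rf_verdict:
        return of_verdict  # decisions agree: no problem
    email = set(email_tokens)
    # Assumption: similarity is the number of features shared with the e-mail.
    of_overlap = len(email & set(of_features))
    rf_overlap = len(email & set(rf_features))
    # Assumption: ties fall back to the original filter.
    return rf_verdict if rf_overlap > of_overlap else of_verdict


if __name__ == "__main__":
    original = {"share", "market", "higher", "studies"}
    received = {"share", "market", "job", "alerts"}
    mail = ["new", "job", "alerts", "today"]
    print(select_verdict(mail, "ham", "spam", original, received))
```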

  • Algorithm Flowchart

    [Figure: flowchart with a training phase and a final verdict]

  • TF IDF Similarity Measure

    Commonly used in information retrieval applications.

    More frequent words would be key to accurate classification of emails.

    The FS-predicted feature set is unique.

    Query/document retrieval procedure: the two documents are the feature sets, and the query is the current email.
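
    A minimal sketch of the query/document comparison, treating the two feature sets as documents and the current email as the query. The smoothed IDF formula, computing document frequencies over all three token lists, the cosine measure, and the tie-break toward the original filter are assumptions; the talk does not give its exact TF-IDF variant.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per token list, using a smoothed IDF."""
    n = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        term_freq = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_filter(email_tokens, of_features, rf_features):
    """Return 'RF' if the email is more similar to the received filter, else 'OF'."""
    of_vec, rf_vec, query_vec = tfidf_vectors([of_features, rf_features, email_tokens])
    return "RF" if cosine(query_vec, rf_vec) > cosine(query_vec, of_vec) else "OF"


if __name__ == "__main__":
    original = ["share", "market", "higher", "studies"]
    received = ["share", "market", "job", "alerts"]
    print(closest_filter(["job", "alerts", "job", "market"], original, received))
```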

  • Experiments & Results

  • Conclusion

    We discussed the techniques to kill spam.

    We compared the various techniques.

    So far, Bayesian filtering seems to be reliable.

    We discussed a new approach to combining filters.

    Future work: learning techniques for the Filter Selector; better similarity measures.

  • Thank You
