Detecting Phishing in Emails

Detecting Phishing in Emails Srikanth Palla Ram Dantu University of North Texas, Denton


Page 1: Detecting Phishing in Emails

Detecting Phishing in Emails

Srikanth Palla

Ram Dantu

University of North Texas, Denton

Page 2: Detecting Phishing in Emails

What is Phishing?

Phishing is a form of online identity theft.

It employs both social engineering and technical subterfuge.

It targets consumers' personal identity data and financial account credentials such as credit card numbers, account usernames, passwords and social security numbers.

Social-engineering schemes use 'spoofed' e-mails to lead consumers to counterfeit websites.

- Anti-Phishing Working Group (APWG)

Page 3: Detecting Phishing in Emails

Phishing Tactics

Hijacking reputable brand names

Creating a plausible premise

Redirecting URLs

Collecting confidential information through emails

Page 4: Detecting Phishing in Emails

Do we need to restrict Phishing attacks?

Page 5: Detecting Phishing in Emails

The Statistics…

Sources: Anti Phishing Working Group

Page 6: Detecting Phishing in Emails

Problems with Current Spam Filtering Techniques

Current spam filters focus on analyzing the content of an email.

The majority of phishers obfuscate their email content to bypass email filters.

The filters label an email as BULK and expect the recipient to decide on the authenticity of the email source.

Current spam filters have a high degree of false positives.

Page 7: Detecting Phishing in Emails

Methodology

Our method examines:

The header of the email (not the content)

The social network of the recipient

The credibility of the source

It classifies phishers as:

Prospective Phishers

Recent Phishers

Suspects

Serial Phishers

Page 8: Detecting Phishing in Emails

Traffic Profile

The following figure describes the incoming email traffic profiles based on the number of recipients and how often they receive the message.

[Figure: traffic profile. Axes: number of recipients in an enterprise vs. frequency of emails arriving. The plot spans productivity gain to productivity loss. Labels: Legitimate; Optional; Annoyance/Counterfeit/Nuisance; Individual Discussions; Discussion Threads; Business Discussions; Professional Discussions; Professional/Business; Announcements; Good News; News Groups; Personal Club Invitations; Strangers; Telemarketing; Phishing.]

Page 9: Detecting Phishing in Emails

Email Corpus Traffic Profile

Our analysis requires the recipient's sent-email folder.

The emails provided in the TREC evaluation toolkit are spam and non-spam emails.

We require a mix of legitimate and phishing emails to evaluate our filter.

We analyzed a live corpus of 13,843 emails collected over 2.5 years. This corpus has a mix of legitimate, spam and phishing emails. The different categories of emails are shown in the figure.

Page 10: Detecting Phishing in Emails

Experimental Setup

We deployed our classifier on a recipient’s local machine running an IMAP proxy and Thunderbird (MUA).

All of the recipient’s emails were fed directly into our classifier by the proxy.

Our classifier periodically scans the user’s mailbox files for new incoming emails.

DNS-based header analysis, social network analysis and wantedness analysis were performed on each email.

The end result is that each email is tagged as Phishing, Opt-out, Socially distinct or Socially close.

Page 11: Detecting Phishing in Emails

Architecture

The architecture model of our classifier consists of three analyses followed by a classification step:

Step 1: DNS-based header analysis

Step 2: Social network analysis

Step 3: Wantedness analysis

Step 4: Classification

[Figure: architecture of the classifier. Blocks: Mail Box; DNS-based Header Analysis; Social Network Analysis; Wantedness Analysis; Classification Based on Wantedness and Credibility; User Feedback. Output tags: Phishing, Opt-outs, socially close, socially distinct.]

Page 12: Detecting Phishing in Emails

Step 1: DNS-based Header Analysis

Stage 1: We validate the information provided in the email header: the hostname position of the sender, the mail server and the relays in the rest of the path. We divide the entire corpus into two buckets:

Bucket 1: emails that are valid for DNS lookups.

Bucket 2: emails that are not valid for DNS lookups.

Stage 2: We perform a DNS lookup on the hostname given in each Received: line of the header and match the IP address returned against the IP address that the relays stored next to the hostname during the SMTP authorization process. Bucket 1 is further divided into a trusted bucket and an untrusted bucket.

We pass Bucket 2 and both the trusted and untrusted buckets to the Social Network Analysis phase for further analysis.
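The Stage 2 check can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The function names, the regular expression, the three labels and the injectable `resolve` parameter are all assumptions made for the example. Real Received: lines vary widely in format, and this pattern only handles the common "from host (... [IP])" shape.

```python
import re
import socket
from email import message_from_string

# Assumed shape of a Received: line: "from <host> (... [<ipv4>]) by ...".
RECEIVED_RE = re.compile(
    r"from\s+(?P<host>[\w.-]+)\s.*?\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]", re.S
)

def system_resolver(hostname):
    """Return the set of IPv4 addresses that DNS reports for hostname."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:  # covers socket.gaierror and socket.herror
        return set()

def check_received_lines(raw_email, resolve=system_resolver):
    """Return 'invalid', 'trusted' or 'untrusted' (hypothetical labels)."""
    msg = message_from_string(raw_email)
    hops = []
    for line in msg.get_all("Received", []):
        match = RECEIVED_RE.search(line)
        if match:
            hops.append((match.group("host"), match.group("ip")))
    if not hops:
        # Assumption: no usable hostname/IP pair means "not valid for DNS lookups".
        return "invalid"
    for host, recorded_ip in hops:
        # The recorded IP must be among the addresses DNS returns for the host.
        if recorded_ip not in resolve(host):
            return "untrusted"
    return "trusted"
```

Passing a custom `resolve` function (for example, a dictionary lookup) makes it possible to try the check without network access.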

Page 13: Detecting Phishing in Emails

Step 2: Social Network Analysis

Each of the three buckets received from the DNS-based header analysis (Bucket 2, the untrusted bucket and the trusted bucket) is treated with rules formulated by analyzing the emails in the receiver's “sent” folder.

For instance:

All emails from trusted domains will be removed

Familiarity to the sender’s community

Familiarity to the path traversed

The rules can be built as per the recipient’s email filtering preferences.
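One way to derive a "trusted domains" rule from the sent folder is sketched below. It assumes that a domain counts as familiar once the recipient has written to any address in it. That criterion, the function names and the input format (a list of dicts holding "To" and "Cc" header strings) are illustrative assumptions, not details given in the slides.

```python
from email.utils import getaddresses, parseaddr

def domain_of(address):
    """Lower-cased domain part of an address, or '' if there is none."""
    addr = parseaddr(address)[1]
    if "@" not in addr:
        return ""
    return addr.rsplit("@", 1)[1].lower()

def trusted_domains_from_sent(sent_headers):
    """Collect every domain that appears in To/Cc of the sent emails."""
    domains = set()
    for headers in sent_headers:
        fields = headers.get("To", []) + headers.get("Cc", [])
        for _name, addr in getaddresses(fields):
            domain = domain_of(addr)
            if domain:
                domains.add(domain)
    return domains

def is_socially_trusted(sender, trusted_domains):
    """True if the sender's domain is one the recipient has written to."""
    return domain_of(sender) in trusted_domains
```

Rules for familiarity with the sender's community or with the path traversed could be added in the same style, according to the recipient's preferences.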

Page 14: Detecting Phishing in Emails

Classification of Trusted and Untrusted Senders

[Figure: classification tree. The email corpus (size: 13843) is split by DNS lookup into valid (size: 13087) and invalid (size: 1875). Each branch is split into socially trusted and socially untrusted. Untrusted emails (size: 563) lead to Phishers and Opt-outs; trusted emails (size: 13280) lead to Socially Wanted and Socially Unwanted.]

Page 15: Detecting Phishing in Emails

Step 3: Wantedness Analysis

Measuring the sender's credibility (ρ):

We believe the credibility of a sender depends on the nature of their recent emails.

If the recent emails sent by the sender are legitimate, their credibility increases.

If the recent emails from the sender are fraudulent, their fraudulency increases.

Page 16: Detecting Phishing in Emails

Credibility Drops As Time Progresses for Untrusted Senders

[Figure: credibility ρ of untrusted senders as a function of elapsed time ΔT.]

Page 17: Detecting Phishing in Emails

Computing Credibility

Belief: $\tau = \dfrac{1}{\Delta T(\text{legitimate emails})}$ ... (1)

Disbelief: $\hat{\tau} = \dfrac{1}{\Delta T(\text{fraudulent emails})}$ ... (2)

Credibility: $\rho = \dfrac{\tau}{\hat{\tau}} = \dfrac{\Delta T(\text{fraudulent emails})}{\Delta T(\text{legitimate emails})}$ ... (3)

Fraudulency: $\hat{\rho} = 1 - \rho$ ... (4)

ΔT(legitimate emails) is the average time period of all legitimate emails w.r.t. the most recent email.

ΔT(fraudulent emails) is the average time period of all fraudulent emails w.r.t. the most recent email.
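A small worked example may help. It reads the slide's equations as belief τ = 1/ΔT(legitimate emails), disbelief τ̂ = 1/ΔT(fraudulent emails), credibility ρ = τ/τ̂ and fraudulency ρ̂ = 1 − ρ. That reading is an interpretation of the formulas above. The function names and the timestamps (in days) are made up for illustration. The ratio is not clamped, because the slides do not say how out-of-range values are handled.

```python
from statistics import mean

def mean_age(timestamps, most_recent):
    """Average gap between each email and the most recent email."""
    return mean([most_recent - t for t in timestamps])

def credibility(legit_times, fraud_times):
    """Return (credibility, fraudulency) for one sender.

    Both lists must be non-empty and use the same time unit.
    """
    most_recent = max(list(legit_times) + list(fraud_times))
    dt_legit = mean_age(legit_times, most_recent)
    dt_fraud = mean_age(fraud_times, most_recent)
    if dt_legit == 0 or dt_fraud == 0:
        raise ValueError("average time period is zero; the ratio is undefined")
    belief = 1 / dt_legit       # tau
    disbelief = 1 / dt_fraud    # tau-hat
    rho = belief / disbelief    # equals dt_fraud / dt_legit
    return rho, 1 - rho

# Hypothetical sender: legitimate emails on days 10 and 50,
# fraudulent emails on days 80 and 100 (the most recent email).
rho, fraudulency = credibility([10, 50], [80, 100])
```

Here ΔT(legitimate emails) = 70 and ΔT(fraudulent emails) = 10, so the credibility is 10/70 and the fraudulency is 60/70: the sender's recent emails are mostly fraudulent, so the credibility is low.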

Page 18: Detecting Phishing in Emails

Credibility of Untrusted Senders

[Figure: credibility value (y-axis, 0 to 0.8) of untrusted senders (x-axis, 0 to 120), with points O1, O2 and O3 and a threshold separating Phishers from Opt-outs. Annotations: low-credibility domains (e.g., www.ebay.com, www.paypal.com, etc.) and high-credibility domains.]

Page 19: Detecting Phishing in Emails

Measuring Recipient’s Wantedness

Tolerance (α+) for a sender is higher if the recipient reads their emails and stores them for a longer period.

Intolerance (β-) for a sender is higher if the recipient deletes their emails without reading them.

Page 20: Detecting Phishing in Emails

Measuring Wantedness

Recipient's Tolerance: $\alpha \propto T_{rd}$

Recipient's Tolerance: $\alpha \propto \dfrac{1}{\Delta T(\text{legitimate emails})}$

Recipient's Intolerance: $\beta \propto T_{urd}$

Recipient's Intolerance: $\beta \propto \dfrac{1}{\Delta T(\text{fraudulent emails})}$

Wantedness: $\chi_R = \dfrac{\text{Tolerance}}{\text{Intolerance}} = \dfrac{\alpha}{\beta} = \dfrac{T_{rd}}{T_{urd}} \cdot \dfrac{\Delta T(\text{fraudulent emails})}{\Delta T(\text{legitimate emails})}$

Unwantedness: $\gamma_R = 1 - \chi_R$

ΔT(legitimate emails) is the average time period of all legitimate emails w.r.t. the most recent email.

ΔT(fraudulent emails) is the average time period of all fraudulent emails w.r.t. the most recent email.

T_rd is the average storage time period of all the read emails.

T_urd is the average storage time period of all unread emails.
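The wantedness measure can be illustrated with a short sketch. It assumes that tolerance grows with T_rd and shrinks with ΔT(legitimate emails), that intolerance grows with T_urd and shrinks with ΔT(fraudulent emails), and that the two factors combine by multiplication. The combination rule is an assumption. Only the ratio Wantedness = Tolerance / Intolerance and Unwantedness = 1 − Wantedness are taken from the slide. All names and numbers are hypothetical.

```python
def wantedness(t_read, t_unread, dt_legit, dt_fraud):
    """Return (wantedness, unwantedness) of a sender for one recipient.

    t_read   : average storage time of the read emails (T_rd)
    t_unread : average storage time of the unread emails (T_urd)
    dt_legit : average time period of legitimate emails w.r.t. the most recent
    dt_fraud : average time period of fraudulent emails w.r.t. the most recent
    All arguments must be positive and share one time unit.
    """
    if min(t_read, t_unread, dt_legit, dt_fraud) <= 0:
        raise ValueError("all time periods must be positive")
    tolerance = t_read / dt_legit        # alpha (assumed combination)
    intolerance = t_unread / dt_fraud    # beta (assumed combination)
    chi = tolerance / intolerance
    return chi, 1 - chi

# Hypothetical values in days: read emails kept for 30, unread for 60.
chi, gamma = wantedness(t_read=30, t_unread=60, dt_legit=10, dt_fraud=5)
```

With these numbers the tolerance is 3 and the intolerance is 12, giving a wantedness of 0.25 and an unwantedness of 0.75.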

Page 21: Detecting Phishing in Emails

Wantedness of Trusted Senders

Page 22: Detecting Phishing in Emails

Classification

Classification of Phishers: Credibility Vs Phishing Frequency

Classification of Trusted Senders: Credibility Vs Wantedness

Page 23: Detecting Phishing in Emails

Classification of Phishers

[Figure: fraudulency (y-axis, 0 to 1) vs. phishing frequency (x-axis, 0 to 30). Regions: Prospective Phishers; Suspects; Recent Phishers; High Risk; Phishers Under Review.]

Page 24: Detecting Phishing in Emails

Classification of Trusted Senders

[Figure: credibility (y-axis, 0 to 1) vs. wantedness (x-axis, 0 to 1). Regions: Spammers, Phishers, Telemarketers; High Risk; Strangers; Socially Distinct; Socially Close; Opt-Ins; Family, Friends, etc.]

Page 25: Detecting Phishing in Emails

Summary of Results

Analysis | # of emails | False Positives | False Negatives | Precision

Corpus-I

DNS Analysis | 11968 | 260 | 0 | 85%

{[DNS Analysis] + [Social Network Analysis]} | 2548 | 3 | 5 | 95.6%

{[DNS Analysis] + [Social Network Analysis] + [Wantedness Analysis]} | 563 (Domains) | 3 | 1 | 98.4%

Corpus-II

DNS Analysis | 756 | 5 | 0 | 90.4%

{[DNS Analysis] + [Social Network Analysis]} | 59 | 0 | 0 | 93.75%

{[DNS Analysis] + [Social Network Analysis] + [Wantedness Analysis]} | 148 | 1 | 0 | 99.2%

Precision is the percentage of messages that were classified as phishing that actually are phishing
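As a quick illustration of this definition, precision can be computed from the number of messages correctly flagged as phishing (true positives) and the number wrongly flagged (false positives). The counts below are hypothetical and are not taken from the results table.

```python
def precision_percent(true_positives, false_positives):
    """Percentage of messages flagged as phishing that really are phishing."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no messages were classified as phishing")
    return 100.0 * true_positives / flagged

# Hypothetical counts: 200 messages flagged, 3 of them legitimate.
example = precision_percent(true_positives=197, false_positives=3)
```

In this made-up case 197 of the 200 flagged messages are phishing, so the precision is 98.5%.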

Page 26: Detecting Phishing in Emails

Conclusions

Phishers use special software to conceal the path taken by their emails to reach the recipient. Most of the time, the path length is a single hop.

Our classifier can be used in conjunction with any existing spam filtering technique to restrict spam and phishing emails.

Rather than labeling an email as BULK, we further classify senders, based on their credibility and wantedness, as:

Prospective phishers

Suspects

Recent phishers

Serial phishers

We classified two different email corpora with precisions of 98.4% and 99.2%, respectively.