PhishNet : Predictive Blacklisting to Detect Phishing Attacks

30
PhishNet: Predictive Blacklisting to Detect Phishing Attacks Pawan Prakash Manish Kumar Ramana Rao Kompella Minaxi Gupta Purdue University, Indiana University INFOCOM (March, 2010) 2010/8/03 1

description

PhishNet : Predictive Blacklisting to Detect Phishing Attacks. Pawan Prakash Manish Kumar Ramana Rao Kompella Minaxi Gupta Purdue University, Indiana University INFOCOM (March, 2010) . Outline. Introduction Component1 Component2 Evaluation Related Work Conclusion. Phishing Attacks. - PowerPoint PPT Presentation

Transcript of PhishNet : Predictive Blacklisting to Detect Phishing Attacks

Page 1: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

1

PhishNet: Predictive Blacklisting to

Detect Phishing AttacksPawan PrakashManish Kumar

Ramana Rao Kompella Minaxi Gupta

Purdue University, Indiana UniversityINFOCOM (March, 2010)

2010/8/03

Page 2: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

2

Outline

• Introduction• Component1• Component2• Evaluation• Related Work• Conclusion

2010/8/03

Page 3: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

3

Phishing Attacks

• Simplicity and Ubiquity of the Web• Attract several miscreants– lure innocent to revealing sensitive information

• Above all. Today, such an miscreants attack of common and increasing by day is Phishing– http://www.antifishing.org/events/events.html– APWG’s anti-phishing contents of work

2010/8/03

Page 4: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

4

Popular Solution

• Add additional feature s within an Internet browser

• Often provided by a mechanism is known as Blacklisting– Simple to design and easy to implement

• Major problem: Incompleteness– cyber-criminals are extremely savvy so that they are

easy to evade blacklists

2010/8/03

Page 5: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

5

Observation

• Malicious URLs do often tend to occur in groups that are close to each other either syntactically or semantically– www1.rogue.com, www2.rogue.com– two URLs with hostnames resolves to the same IP

address

2010/8/03

Page 6: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

6

Implication

• First, discover new sources of maliciousness in and around the original blacklist entries and add them into blacklist

• Second, exact match implementation of a blacklist to an approximate match that is aware of several of the legal mutations that often exist within these URLs

2010/8/03

Page 7: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

7

PhishNet

• Comprise two major components:A. a URL prediction component

B. an approximate URL matching component

2010/8/03

Page 8: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

8

Outline

• Introduction• Component1• Component2• Evaluation• Related Work• Conclusion

2010/8/03

Page 9: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

9

Predicting Malicious URLs

• Predicting new URLs from existing blacklist entries– e.g., PhishTank, http://www.phishtank.com/index.php

• Use five heuristics for generating new URLs• Basic idea– combine pieces of known phishing URLs(parent) from

a blacklist to generate new URLs(child)

• Then, test the existence of these child URLs using a verification process

2010/8/03

Page 10: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

10

Heuristics• H1, Replacing TLD (Top Level Domain)– find such variants of original blacklist entries obtained

by changing the TLDs– use 3210 effective TLDs (eq, co.in)

• H2, IP address equivalence– URLs have same IP address are grouped together into

clusters– create new URLs by considering all combinations of

hostnames and pathnames

2010/8/03

Page 11: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

11

Heuristics (Conti.)

• H3, Directory structure similarity– URLs with similar directory structure are grouped

together– build new URLs by exchanging the filenames among

URLs belonging to the same group

• H4, Query string substitution– build new URLs by exchanging the query strings

among URLs

2010/8/03

Page 12: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

12

Heuristics (Conti.)

• H5, Brand name equivalence– phishers often target multiple brand name using the

same URL structure– build new URLs by substituting brand names

occurring in phishing URLs with other brand names

2010/8/03

Page 13: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

13

Verificaiton

• Eliminate URLs that are either non-existent or are non-phishing sites– conduct DNS lookup– establish a connection to the corresponding server– initiate a HTTP GET request to obtain content from the

server– if request is successful, use publicly available detection

tool

2010/8/03

Page 14: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

14

Outline

• Introduction• Component1• Component2• Evaluation• Related Work• Conclusion

2010/8/03

Page 15: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

15

Approximate Matching

• Determine whether a given URL is a phishing site or not

• Perform approximate match of a given URL to the entries in the blacklist by first breaking the input URL into four different entities– IP address– hostname– directory structure– brand name

2010/8/03

Page 16: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

16

Approximate Matching (Conti.)

2010/8/03

Page 17: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

17

Approximate Matching (Conti.)

• M1: Matching IP address– drect match– assign a normalized score based on the number of

blacklist entries that map to a given IP address– IP address IPi is common to ni URLs– scores computing as following:

2010/8/03

Page 18: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

18

Approximate Matching (Conti.)

• M2: Matching hostname– classify between WHS (Web Host Service)and non-

WHS

2010/8/03

Page 19: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

19

Approximate Matching (Conti.)

• M2: Matching hostname– A. Matching WHSes:• if match succeeds, confidence score is computed using

(1), on the number of URLs that have the same primary domain

– B. Matching non-WHSes:• based on syntactic similarity across labels• If match succeeds, confidence score is computed

using(1), ni referring to the number of URLs that match a given regular expression

2010/8/03

Page 20: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

20

Approximate Matching (Conti.)

• M3: Matching directory structure– If match succeeds, confidence score is computed

using(1), ni representing the number of URLs corresponding to a directory structure in the hash map

• M4: Matching brand names– If match succeeds, confidence score is computed

using(1), ni being the number of occurrences of the brand name

2010/8/03

Page 21: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

21

Outline

• Introduction• Component1• Component2• Evaluation• Related Work• Conclusion

2010/8/03

Page 22: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

22

Predicting Malicious URLs

• Collected URLs over a period of 24 days starting from 2nd July 2009 to 25th July 2009

• Generated almost 1.55 million child URLs from the approximately 6,000 parent URLs

• About 34489 out of 1.55 million could be fetched (%2), be compared with parent URL using page similarity tool– http://www.webconfs.com– greater than 90% similarity are reported as our new

malicious URLs2010/8/03

Page 23: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

23

Approximate Matching

• Effectiveness and the URL processing time• Use data from four sources• The experimental setup consists of two phases

—training and testing• For evaluation, use the following weight to

different normalized scores :– W(M1) = 1.0– W(M2) = 1.0– W(M3) = 1.5– W(M4) = 1.5

2010/8/03

Page 24: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

24

Approximate Matching

2010/8/03

Page 25: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

25

Approximate Matching

2010/8/03

14% of URLs Gen

80% of 14% of URLs Gen(11.2% of URLs Gen)

2.2% of 14% of URLs Gen(2% of URLs Gen)

Page 26: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

26

Outline

• Introduction• Component1• Component2• Evaluation• Related Work• Conclusion

2010/8/03

Page 27: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

27

Related Work

• APWG regularly publishes facts and figures about phishing such as list of TLDs and brand names targeted, trends in phishing URL structure

• Highly Predictive blacklisting– Rely on tens of thousands of features based on extra

information from outside sources such as WhoIS, registrar information

2010/8/03

Page 28: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

28

Outline

• Introduction• Component1• Component2• Evaluation• Related Work• Conclusion

2010/8/03

Page 29: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

29

Conclusion

• Blacklisting is the most common technique to defend against phishing attacks

• PhishNet suffers from low false positives and is remarkably effective at flagging new URLs that were not part of the original blacklist

2010/8/03

Page 30: PhishNet : Predictive Blacklisting to Detect Phishing Attacks

30

THANK YOU

2010/8/03