Email Address Harvesting

Produced in cooperation with: HP Technology Forum & Expo 2009. © 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Email Address Harvesting. Michael Lamont, Senior Software Engineer. June 17, 2009.

description

HP Tech Forum 2009 presentation covering some of the ways spammers harvest email addresses on the Internet (and how you can prevent it), including an in-depth look at three commonly used software packages.

Transcript of Email Address Harvesting

Page 1: Email Address Harvesting

Produced in cooperation with: HP Technology Forum & Expo 2009

© 2009 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice

Email Address Harvesting Michael Lamont

Senior Software Engineer

June 17, 2009

Page 2: Email Address Harvesting

Overview

• What is email address harvesting?

• How do spammers do it?

• What can you do about it?

• Examples of harvesting software

Page 3: Email Address Harvesting

Mandatory Definition Slide

• Email address harvesting is the process used by spammers to extract email addresses from public sources.

• Common sources:

− Web sites

− Newsgroups

− Mailing lists

− Chat rooms

Page 4: Email Address Harvesting

Mandatory “How Bad Is It?” Slide

• FTC: 86% of all email addresses posted on web pages receive spam.

• FTC: 93% of all email addresses used in newsgroups receive spam.

• PSC honeypot record: Address received spam 4 minutes after being included in a newsgroup post.

Page 5: Email Address Harvesting

Address Lists

• Spammers use address harvesting to build giant lists of addresses to send spam to.

• Most lists have 1-20 million addresses.

• Spammers sell/share their lists, so being on even just one list will get you a lot of spam.

Page 6: Email Address Harvesting

Evolution Of The Address List

• Somebody (probably not even a spammer) harvests addresses from various sources.

• A “good” harvester scrubs the list.

• The harvester sells the list to lots of spammers.

• Once your address is on a list, it’s going to be on one or more lists forever.

Page 7: Email Address Harvesting

Harvesting From Web Sites

• Spammers usually use a spider program to scrape addresses off of web pages.

Page 8: Email Address Harvesting

Harvesting From Web Sites

Page 9: Email Address Harvesting

Harvesting From Web Sites

• Web directories make it easy to get lots of addresses

Page 10: Email Address Harvesting

Harvesting From Web Sites


Page 11: Email Address Harvesting

UseNet Newsgroups

• Spider programs exist to extract these addresses as well.

• Email addresses are splattered all over:

− Message headers

− Signatures

− Attributions

Page 12: Email Address Harvesting

Mailing Lists

• Lots of list manager software provides a list of every email address on a list.

• Spammers are happy to join a mailing list temporarily to get access to a list of subscribers.

• Some clever spammers repost an innocuous newbie question copied from the list archives with a read-receipt request attached; every receipt that comes back reveals a subscriber’s address.

Page 13: Email Address Harvesting

3rd Party Mailing Lists

• Organizations you’ve given your address to pass it on to 3rd parties (usually for profit).

• Example: Auto insurance quote

• Initial sale of list might be aboveboard, but lists have a way of trickling down to less desirable senders.

Page 14: Email Address Harvesting

Web Browser Holes

• Newer browsers have eliminated most of these, but they’re still common in older browsers.

• Extraction of the email address from the From request header (exposed to CGI scripts as HTTP_FROM) that the browser sends to the web server.

• JavaScript to extract email address from browser’s configuration.

Page 15: Email Address Harvesting

Web Browser Holes

• Force browser to fetch an image on a page by anonymous FTP.

− Most browsers use the configured email address as the password.

• JavaScript action that sends an email message in the background on page load.

Page 16: Email Address Harvesting

Chat Rooms

• Web bots monitor chat rooms and extract user names.

• Lots of providers (AOL, Yahoo) use the same profile names for both chat rooms and email.

• IRC used to be fertile harvesting ground, but less savvy users have largely abandoned it.

Page 17: Email Address Harvesting

Domain Contacts

• Every registered domain name has one or more contact addresses.

• Addresses are publicly accessible (WHOIS)

• Addresses are almost always valid and read by a real person on a regular basis.

Page 18: Email Address Harvesting

Guessing

• Spammers “guess together” a list of email addresses.

• The addresses are tested against one or more email servers.

• Valid addresses are added to a list of addresses to be spammed.

• Usually referred to as directory harvesting.

Page 19: Email Address Harvesting

CAN-SPAM

• The federal CAN-SPAM Act explicitly addresses email address harvesting, treating the use of harvested addresses to send unlawful email as an aggravated violation.

• Some providers of the harvesting software have ceased and desisted, but harvesting has actually increased.

• Like most legal solutions, CAN-SPAM is severely constrained by jurisdictional boundaries.

Page 20: Email Address Harvesting

Harvesting Prevention

• The harder it is for spammers to get your address, the harder it is for them to spam you.

• “I don’t care – my spam filter is awesome. Bring it on!”

• No filter is 100% accurate

• Filtering still places load on filtering system and/or email server.

Page 21: Email Address Harvesting

Prevention Methods

• Reformatting addresses

• Web forms

• JavaScript-generated mailto links

• Graphical addresses

• Throwaway addresses

Page 22: Email Address Harvesting

Reformatting Addresses

• Prevents harvesting from web pages and newsgroups.

• Simple examples include inserting bogus strings into the address to make it invalid:

[email protected]

[email protected]
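A minimal sketch of the bogus-string idea, assuming a marker of "NOSPAM" inserted in front of the domain. The function names and the marker are hypothetical, chosen only for illustration; any string a human reader would recognize and remove works the same way.

```javascript
// Hypothetical helper: insert a bogus marker in front of the domain so the
// published address is undeliverable until a human removes the marker.
function mungeAddress(address, marker = "NOSPAM") {
  const at = address.lastIndexOf("@");
  if (at < 1) throw new Error("not an email address: " + address);
  return address.slice(0, at + 1) + marker + "." + address.slice(at + 1);
}

// The reverse step, which a human correspondent performs by hand.
function unmungeAddress(munged, marker = "NOSPAM") {
  return munged.replace("@" + marker + ".", "@");
}

console.log(mungeAddress("jdoe@hp.com"));          // jdoe@NOSPAM.hp.com
console.log(unmungeAddress("jdoe@NOSPAM.hp.com")); // jdoe@hp.com
```

A harvester that copies the munged string verbatim ends up with an address that bounces, while a person can see what to delete.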

Page 23: Email Address Harvesting

Reformatting Addresses

• Writing the address out longhand can prevent harvesters from recognizing it as an email address:

jdoe at hp dot com

• Inserting extra whitespace can also help:

jdoe @ hp.com

jdoe @ hp.com
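The two rewrites above can be sketched as small helpers. The regular expression is a deliberately naive pattern, assumed here to stand in for the kind of matching a simple harvester might do; real spamware may well be smarter.

```javascript
// Write an address out longhand, e.g. "jdoe at hp dot com".
function toLonghand(address) {
  return address.replace("@", " at ").replace(/\./g, " dot ");
}

// Pad the @ sign with whitespace, e.g. "jdoe @ hp.com".
function padAtSign(address) {
  return address.replace("@", " @ ");
}

// Naive address pattern (an assumption, not taken from any real harvester).
const NAIVE_ADDRESS = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

const variants = [
  "jdoe@hp.com",
  toLonghand("jdoe@hp.com"),
  padAtSign("jdoe@hp.com"),
];
for (const text of variants) {
  console.log(JSON.stringify(text), NAIVE_ADDRESS.test(text));
}
// "jdoe@hp.com" true
// "jdoe at hp dot com" false
// "jdoe @ hp.com" false
```

Only the unmodified address matches the pattern; both reformatted variants slip past it while staying readable to a person.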

Page 24: Email Address Harvesting

Reformatting Addresses

• Characters written as HTML numeric character references (for example, &#114; for “r”) are decoded by most web clients, but not by most spamware:

jdoe@p&#114;ocess&#046;com
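A minimal sketch of producing such an encoding, assuming zero-padded decimal character references. The function names are hypothetical, and the sample address is only an illustration.

```javascript
// Encode every character as a zero-padded decimal HTML character reference.
function toCharacterReferences(text) {
  return Array.from(text, (ch) =>
    "&#" + String(ch.codePointAt(0)).padStart(3, "0") + ";"
  ).join("");
}

// Encode only the characters listed in charsToEncode; leave the rest as-is.
function encodeSome(text, charsToEncode) {
  return Array.from(text, (ch) =>
    charsToEncode.includes(ch) ? toCharacterReferences(ch) : ch
  ).join("");
}

console.log(encodeSome("jdoe@process.com", "r."));
// jdoe@p&#114;ocess&#046;com
console.log(toCharacterReferences("@"));
// &#064;
```

A browser renders the encoded text as the plain address, while a harvester that pattern-matches the raw page source sees no dot (and, if the @ is encoded too, no @ sign).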

Page 25: Email Address Harvesting

Web Forms

• Provide an HTML form for web site visitors to enter a message.

• When the form is submitted, the CGI script mails the message to the appropriate recipient.

• Avoids displaying the actual address anywhere on the site.

• Can still be abused, but it’s relatively difficult to do.
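The form approach can be sketched as follows. This is not the CGI script from the talk: the recipient keys, the addresses, and the function names are hypothetical, and the actual mail-sending step is left out. The line-break check assumes that header injection is one of the abuses in question.

```javascript
// Server-side lookup table: the page only ever contains the keys,
// never the addresses. Keys and addresses here are hypothetical.
const RECIPIENTS = {
  sales: "sales@example.com",
  support: "support@example.com",
};

// Reject CR/LF so a visitor cannot smuggle extra mail headers into a field.
function isSafeHeaderValue(value) {
  return typeof value === "string" && !/[\r\n]/.test(value);
}

// Turn submitted form fields into a message object for whatever mailer
// the site uses.
function buildMessage(fields) {
  const known = Object.prototype.hasOwnProperty.call(RECIPIENTS, fields.recipient);
  if (!known) throw new Error("unknown recipient key");
  if (!isSafeHeaderValue(fields.from) || !isSafeHeaderValue(fields.subject)) {
    throw new Error("line breaks are not allowed in header fields");
  }
  return {
    to: RECIPIENTS[fields.recipient],
    replyTo: fields.from,
    subject: fields.subject,
    body: String(fields.body || ""),
  };
}

const message = buildMessage({
  recipient: "sales",
  from: "visitor@example.org",
  subject: "Question",
  body: "Hello",
});
console.log(message.to); // sales@example.com
```

Because the form submits only the key "sales", nothing in the HTML gives a harvester the real address.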

Page 26: Email Address Harvesting

Web Forms

Page 27: Email Address Harvesting

JavaScript Generated mailtos

• Use JavaScript to dynamically generate mailto: link when the link is clicked.

<A HREF='javascript:void(window.location="mail"+"to:"+"jdoe"+"@"+"hp"+"."+"com")'>Click here to mail John Doe</A>
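The same idea can be sketched with the assembly logic moved out of the link itself. The function name and the element id "mail-jdoe" are hypothetical; the browser wiring is skipped when the script runs outside a browser.

```javascript
// Assemble a mailto: URL from fragments so the complete address never
// appears as a single string in the page source.
function assembleMailto(user, domainLabels) {
  return "mail" + "to:" + user + "@" + domainLabels.join(".");
}

// Browser-only wiring: attach the handler to a link with the hypothetical
// id "mail-jdoe". Nothing happens here outside a browser.
if (typeof document !== "undefined") {
  const link = document.getElementById("mail-jdoe");
  if (link) {
    link.addEventListener("click", (event) => {
      event.preventDefault();
      window.location.href = assembleMailto("jdoe", ["hp", "com"]);
    });
  }
}

console.log(assembleMailto("jdoe", ["hp", "com"])); // mailto:jdoe@hp.com
```

The address is built only when a visitor clicks, so a harvester that reads the HTML without executing scripts never sees it in one piece.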

Page 28: Email Address Harvesting

Graphical Addresses

• Displaying all or part of an email address as a graphical image will throw off most harvesting software.

• No known harvesting software is OCR-capable.

− Anecdotal reports of at least one large spam organization trying to develop accurate OCR harvesters

Page 29: Email Address Harvesting

Graphical Address Complexity

• Graphical @ sign:

− Probably sufficient to throw off most harvesters.

− Username and hostname are still in close proximity.

− Works easily for multiple users/multiple domains.

jdoe [image of @ sign] hp.com

Page 30: Email Address Harvesting

Graphical Address Complexity

• Graphical @hostname:

− Should prevent any harvester from working.

− Requires a different image for each email domain.

jdoe [image of @hp.com]

Page 31: Email Address Harvesting

Graphical Address Complexity

• Graphical everything:

− For the truly paranoid.

− Completely unreadable by harvesters unless they’re OCR-enabled.

− Requires either a lot of images or a script that can dynamically generate them.

Page 32: Email Address Harvesting

Throwaway Addresses

• Many people create an email account that they use only for web pages and newsgroups.

• Some software products go further and let you create an alias for every occasion.

• You still need a static address for business cards, resumes, etc.
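A minimal sketch of per-occasion aliases, assuming plus-addressing ("user+tag@domain"). Whether a given mail server accepts the "+tag" suffix is an assumption, and the function names are hypothetical; the alias products mentioned above may work quite differently.

```javascript
// Build a per-occasion alias such as "jdoe+usenet-posts@hp.com".
function makeAlias(user, domain, occasion) {
  const tag = occasion
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into dashes
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
  if (!tag) throw new Error("occasion must contain a letter or digit");
  return user + "+" + tag + "@" + domain;
}

// Recover the tag from a recipient address, or null if there is none.
function aliasTag(recipient) {
  const match = /^[^+@]+\+([^@]+)@/.exec(recipient);
  return match ? match[1] : null;
}

console.log(makeAlias("jdoe", "hp.com", "Usenet posts")); // jdoe+usenet-posts@hp.com
console.log(aliasTag("jdoe+usenet-posts@hp.com"));        // usenet-posts
```

When spam arrives at a tagged alias, the tag shows where the address was picked up, and that one alias can be retired without touching the static address.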

Page 33: Email Address Harvesting

Harvesting Software

• Tons of specialized software (spamware) used by spammers to harvest addresses.

• Most spamware is developed in Eastern Europe and Asia.

• We’re going to look at several of the most popular packages.

Page 34: Email Address Harvesting

List Harvester

• Harvests addresses from web sites.

• “Targeted” harvesting - in theory, the harvested email addresses have something in common.

• Appears to be based in China.

• http://www.listharvester.com

• Price: $699 US

Page 35: Email Address Harvesting

List Harvester - Method

• Performs a search for one or more keywords on the user’s choice of search engine.

• Parses every site returned by the search engine in order, looking for addresses and links.

• Follows links to other pages and parses them for addresses as well.
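To see what the parsing step finds on a page you publish, a self-audit can be sketched as below. It works on an HTML string only, with no fetching or crawling. The patterns are naive assumptions for illustration and are not taken from List Harvester.

```javascript
// Naive patterns, assumed for illustration; robust parsing of real pages
// would need a proper HTML parser.
const ADDRESS = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const HREF = /href\s*=\s*["']([^"']+)["']/gi;

// Return the addresses and outgoing links visible in one page's HTML source.
function scanPage(html) {
  const addresses = [...new Set(html.match(ADDRESS) || [])];
  const links = [...html.matchAll(HREF)]
    .map((m) => m[1])
    .filter((url) => !url.toLowerCase().startsWith("mailto:"));
  return { addresses, links };
}

const sample =
  '<a href="mailto:jdoe@hp.com">Mail John</a> ' +
  '<a href="/contact.html">Contact</a> jdoe at hp dot com';
console.log(scanPage(sample));
// { addresses: [ 'jdoe@hp.com' ], links: [ '/contact.html' ] }
```

In the sample, the mailto: link exposes the address, while the longhand form on the same page goes unnoticed.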

Page 36: Email Address Harvesting

List Harvester

• Start screen:

Page 37: Email Address Harvesting

List Harvester

• Search terms entry:

Page 38: Email Address Harvesting

List Harvester

• Search parameters:

Page 39: Email Address Harvesting

List Harvester

• Search filters:

Page 40: Email Address Harvesting

List Harvester

• Parsing engine options:

Page 41: Email Address Harvesting

List Harvester

• Saving list of extracted addresses:

Page 42: Email Address Harvesting

List Harvester

• Harvesting in progress:

Page 43: Email Address Harvesting

Atomic Email Hunter

• Harvests addresses from web sites.

• Either scans an entire web site for addresses or performs a “targeted search” like List Harvester.

• Based in Russia, most likely Moscow.

• http://www.massmailsoftware.com/

• Price: $79.85 US

Page 44: Email Address Harvesting

Atomic Email Hunter

• Start screen:

Page 45: Email Address Harvesting

Atomic Email Hunter

• Web download settings:

Page 46: Email Address Harvesting

Atomic Email Hunter

• Address filtering settings:

Page 47: Email Address Harvesting

Atomic Email Hunter

• Run:

Page 48: Email Address Harvesting

Atomic Email Hunter

• Results:

Page 49: Email Address Harvesting

Fast Newsgroups Extractor

• Harvests addresses from newsgroups.

• Has a companion web site extractor that’s very similar to Atomic Email Hunter.

• Based in Russia, most likely Moscow.

• http://www.lencom.com

• Price: $79.00 US

Page 50: Email Address Harvesting

Fast Newsgroups Extractor - Method

• Lets user select one or more newsgroups to extract content from.

• Downloads multiple messages simultaneously from the NNTP server.

• Extracts addresses from the downloaded messages.

• Has the ability to limit downloaded messages to those that contain certain text in the subject.

Page 51: Email Address Harvesting

Fast Newsgroups Extractor

• Start screen:

Page 52: Email Address Harvesting

Fast Newsgroups Extractor

• News server setup:

Page 53: Email Address Harvesting

Fast Newsgroups Extractor

• Newsgroup list download:

Page 54: Email Address Harvesting

Fast Newsgroups Extractor

• Newsgroup selection:

Page 55: Email Address Harvesting

Fast Newsgroups Extractor

• Harvesting job setup:

Page 56: Email Address Harvesting

Fast Newsgroups Extractor

• Run:

Page 57: Email Address Harvesting

Quick Review

• We talked about:

− What email address harvesting is

− What data sources are harvested

− How you can protect your addresses

− 3 software packages used by spammers to harvest addresses

Page 58: Email Address Harvesting
