
CCIED Looking Forward

Context

• After four years…
  – 50+ papers, multiple awards, significant advances on state of the art, two new workshops, lots of tech transfer, many students trained, etc.
• But… we didn’t stop Internet worms, let alone malware, let alone cybercrime… nor did anyone else. At best, moved it around a bit.
  – By any meaningful metric things are worse than when we started…
• Mistake: looking at this primarily as a technical problem

Key threat transformations of the 21st century

• Efficient large-scale compromises
  – Internet communications model
  – Software homogeneity
  – User naïveté/fatigue
• Centralized control
  – Cheap scalability for criminal applications (e.g. spam, info theft, DDoS, etc.)
• Profit-driven applications
  – Commodity resources (IP, bandwidth, storage, CPU)
  – Unique resources (PII/credentials, CD-Keys, address book, etc.)

Emergence of Economic Drivers

• In last five years, emergence of profit-making malware
  – Anti-spam efforts force spammers to launder e-mail through compromised machines (starts with MyDoom.A, SoBig)
  – “Virtuous” economic cycle transforms nature of threat
• Commoditization of compromised hosts
  – Fluid third-party exchange market (millions of hosts)
    » Raw bots (range from pennies to dollars)
    » Value added tier: SPAM proxying (more expensive)
• Innovation in both host substrate and its uses
  – Sophisticated infection and command/control networks: platform
  – SPAM, piracy, phishing, identity theft, DDoS are all applications

DDoS for sale

• Emergence of economic engine for Internet crime
  – SPAM, phishing, spyware, etc.
• Fluid third party markets for illicit digital goods/services
  – Bots ~$0.5/host, special orders, value added tiers
  – Cards, malware, exploits, DDoS, cashout, etc.

Botnet Spammer Rental Rates

September 2004 postings to SpecialHam.com, Spamforum.biz

• 3.6 cents per bot week

>20-30k always online SOCKs4, url is de-duped and updated
> every 10 minutes. 900/weekly, Samples will be sent on
> request. Monthly payments arranged at discount prices.

• 6 cents per bot week

>$350.00/weekly - $1,000/monthly (USD)
>Type of service: Exclusive (One slot only)
>Always Online: 5,000 - 6,000
>Updated every: 10 minutes

• 2.5 cents per bot week

>$220.00/weekly - $800.00/monthly (USD)
>Type of service: Shared (4 slots)
>Always Online: 9,000 - 10,000
>Updated every: 5 minutes

Bot Payloads

Structural asymmetries

• Defenders reactive, attackers proactive
  – Defenses public, attacker develops/tests in private
  – Arms race where best case for defender is to “catch up”
• New defenses expensive, new attacks cheap
  – Defenses sunk costs/business model, attacker agile and not tied to particular technology
• Minimal deterrent effect
  – Functional anonymity on the Internet; very hard to fix
• Defenses hard to measure, attacks easy to measure
  – Few security metrics (no “evidence-based” security), attackers measure monetization which drives attack quality

Example: brief history of the spam arms race

Anti-spam action

1. Real-time IP blacklisting

2. Clean up open relays/proxies

3. Content-based learning

4. Site takedown

5. CAPTCHAs


Spammer response

1. Send via open relays/proxies

2. Delivery via compromised botnets

3. Content chaff, polymorphic spam generators, img spam

4. Fast-flux redirect and transparent proxies

5. CAPTCHA outsourcing, OCR-based breaking

The problem

• We think about this in terms of technical means for securing computer systems
• Most of 50-100B IT budget on cyber security is spent on securing the end host
  – AV, firewalls, IDS, encryption, etc…
  – Single most expensive front to secure
  – Single hardest front to secure
• But individual end hosts are not that valuable to the bad guys?
  – Maybe $1.50? Even less in bulk…
• We need to focus on their economic bottlenecks
• Which means we need to understand their economics

Internet Criminal Economics

• Our experience so far
  – Underground market analysis [CCS 07]
  – Spam [USEC ‘07, LEET ‘08/’09, CCS ’08]
• Where we’re going
  – In-depth analysis of market enablers
  – Large-scale analysis of vertical markets
  – Technical defenses based on market enablers
  – Empirical defense assessment (“evidence-based security”)

Elements of the Internet “underground economy”

• Acquisition of illicit digital goods
  – Tier-1 goods (e.g. credit card data, paypal, etc.)
    » Directly valued in “real world”; single step liquidity
  – Tier-2 goods (e.g. bots, malware, $ services)
    » Valued only in UE, rented for service, or used to produce value in scam
• Trade/Sale in such goods
  – On-line markets and market enablers (IRC/Web Forums)
• Scams (capital investment to extract new value)
  – Combine digital goods with value creation strategy
  – SPAM, phishing, DDoS extortion, pump/dump, etc.
• Liquidation of goods (cash out)
  – Indirect: SPAM/Adware (potentially legal), click fraud, pump/dump, gambling
  – Direct: cash out (WU, eGold, WebMoney), wire transfer, card “tracking”, mules/drops

Example

• Scammer runs phishing campaign
  – Buys phishing kit from software specialist
  – Buys mailing list
  – Buys bots for mail relay or rents remailing net
  – Buys host(s) for phishing server
  – Gets credit cards plus PII & CVV2 info (“fulls”)
• Trade fulls on-line for money or other digital goods
• Can use to buy physical goods
  – Drop/remailer: launders physical goods
• Cashier will cash-out fulls for percentage of take
  – E.g. WU: drop receives cash, confirmer “fakes” true owner

Market data collection

• Market is a public channel active on independent IRC networks (#ccpower)
• Common channel activity and admin. creates unified market
• IRC log dataset (2.4GB)
  – 13 million public messages
  – From Jan. ’06 to Aug. ’06
• Think QVC, not NASDAQ

[Figure: market data collection across IRC networks; diagram labels: Market, S, CC, M, Msgs, Dataset]

Market Activity

• 1. Posting advertisements
  – Sales and want ads for goods and services
• 2. Posting sensitive personal information
  – Full personal information freely pasted to channel
  – Establishes credibility
• Unstructured quasi-English
  – Need automatic techniques to identify ads and sensitive data

”have hacked hosts, mail lists, php mailer send to all inbox”

[Figure: market channel; labels: Market, S, Buy, Sell, & Trade]

“i have verified paypal accounts with good balance…and i can cashout paypals”

Name: Phil Phished
Address: 100 Scammed Ln
Phone: 555-687-5309
Card Num: 4123 4567 8901 2345
Exp: 10/09 CVV: 123
SSN: 123-45-6789

What’s on the market? Financial instruments

i sell CVV2s at $0.90, hacked hosts at $8, paypals at 8, fullz at $10, and wells fargo logins. IM me at XXXX DO NOT ASK FOR TESTS OR FREE CARDS. Thank you :)

What’s on the market? Financial services

i am boa cashout and wellsfargo including chase

westernunion confirmer can confirm males and females have drops in usa I AM VERIFIED MSG ME

looking good and legit drop from USA for stuff (laptop, mobile phones, TV plasma etc)

Goods

[Figure: ad type (goods) as a percentage of labeled data. Phishing shopping list: Hacked Host Sale (3%), Mailer Sale (3%), Scam Page Sale (1.5%), Email List Sale (2%)]

courtesy Jason Franklin

Some high bits

• Value of “goodwill data”
  – 87k unique credit cards (w/valid Luhn and BIN #)
    » Estimated $427.50 exposure per card ≈ $37M
  – Declared value of bank accounts = $54M
  – But these are only the public numbers, not trades
• Reputation
  – Few miscreants will deal with unknown buyers/sellers
  – New entrants establish reputation by providing free samples or services
    » Post raw credit card, bank account, etc.
  – Poor behavior is systematically reported
    » #rippers channel
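The card count above is limited to numbers that pass a Luhn checksum and carry a valid BIN. A minimal sketch of the Luhn check and of the exposure arithmetic follows. The function names are illustrative, the BIN lookup is left out, and reading $427.50 as a per-card estimate is an assumption (it is the reading under which 87k cards come to roughly $37M).

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits in card_number pass the Luhn checksum."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk the digits right to left, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def estimated_exposure(unique_cards: int, loss_per_card: float) -> float:
    """Total exposure under an assumed flat loss per card."""
    return unique_cards * loss_per_card


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(estimated_exposure(87_000, 427.50))
```

A checksum only filters out mistyped or made-up numbers; it says nothing about whether an account is live, which is why the slide pairs it with a BIN check.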

Leads to many questions…

• Vertical integration vs open markets?
  – How much is each? How much transparency?
• Who dominates market volume?
  – A small number of bigger players? A large number of small players?
• What dominates value creation in each segment?
• Can we use market data to directly value threat risk?
• Where are the bottlenecks?
  – Cashout? Market friction (reputation issues)
  – Which bottlenecks amenable to technical means vs economic means/state power.
• All unknown… and fairly critical

Vertical market segment: Spam-based marketing

• 100B+ spam e-mails sent per day [Ironport]
  – Most focused on product/service advertising
  – Some as vector for malware, etc.
  – >$1B in direct costs [IDC], larger indirect costs 10-100x
• Range of enablers
  – Botnet-based mail delivery, spamming software, address list, redirection infrastructure, hosting infrastructure, payment processing, fulfillment
• Direct marketing business model
  – Cost of delivery < marginal revenue * conversion rate
  – Only works because someone is buying?
• Very little empirical data on any of this…

Courtesy Stuart Brown, modernlifisrubbish.co.uk

Anatomy of a modern pharma spam campaign

Anderson, Fleizach, Savage and Voelker, Spamscatter: Characterizing Internet Scam Hosting Infrastructure, USENIX Security 2007.

Spamscatter

• Goal: Measure and analyze Internet scam hosting infrastructure
• Mine spam for URLs to scam sites hosting ad
  – Probe machines hosting the scams over time
  – Follow all redirections (separate redirection infrastructure from hosting infrastructure)
  – Render pages and cluster sites based on image similarity (image shingling)
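The slide names image shingling but gives no detail, so the following is only a simplified sketch of the general idea, not the Spamscatter implementation. It assumes a rendered page is available as a grid of grayscale pixel values, cuts it into fixed-size tiles, hashes each tile, and compares two pages by the Jaccard similarity of their tile-hash sets. The tile size, the hash function and all names are assumptions.

```python
import hashlib
from typing import List, Set

Image = List[List[int]]  # rows of grayscale pixels; stand-in for a rendered page


def shingles(image: Image, tile: int = 2) -> Set[str]:
    """Hash every non-overlapping tile x tile block of the image."""
    result: Set[str] = set()
    height = len(image)
    width = len(image[0]) if image else 0
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            block = bytes(
                image[y][x] & 0xFF
                for y in range(top, top + tile)
                for x in range(left, left + tile)
            )
            result.add(hashlib.sha256(block).hexdigest())
    return result


def similarity(a: Image, b: Image, tile: int = 2) -> float:
    """Jaccard similarity of the two images' shingle sets."""
    sa, sb = shingles(a, tile), shingles(b, tile)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
    page_b = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 9, 9], [2, 2, 9, 9]]
    print(similarity(page_a, page_b))
```

Two sites would then be placed in the same cluster when their similarity exceeds some threshold; choosing that threshold is left open here.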

Spam Campaign Lifetime
How long do spam campaigns last for a scam?

• Spam campaigns relatively short
  – 88% last < 20 hours
  – 8% > 2 days
• On average...
  – 12 hours of spam
  – Scam site up 1 week

April 21, 2023

< 20 hours

< 2 days

Scam Lifetime & Stability
How long are scams active, and how reliable are the hosts?

• Scam sites long-lived
  – 50+% lifetime as long as probe time (1 week)
• Multiple hosts extend scam lifetime
• Web servers and hosts have same lifetime
  – Hosts likely blocked
• Overall availability high
  – 97% downloads successful

Shared Infrastructure
To what extent do multiple scams share infrastructure?

• Substantial sharing
  – 38% of scams share IP with another scam
  – 10 IPs hosted 10 or more scams
• Reasons?
  – Same scammer, multiple scams
  – Or, sites rented to multiple scammers...

Looking inside spam campaigns

• Virtually all analysis of spam is from standpoint of recipient
  – How many received, from whom, content of msg, etc?
• We really care much more about standpoint of spammer
  – How many sent, how many delivered, to whom, for how long, sent how, what kind of countermeasures, how many site visits in response, how many conversions, how much cost, how much revenue?
  – But generally not visible, except to spammer
• Approach: botnet infiltration
  – Spam sent via botnets, botnets have trust problem wrt compromised hosts
  – Instrumented botnet host offers window into spam operations

Storm

• Storm is a well-known peer-to-peer botnet
• Storm has a hierarchical architecture
  – Workers perform tasks (send spam, launch DDoS attacks, etc.)
  – Proxies organize workers, connect to HTTP proxies
  – Master servers controlled directly by botmaster
• Workers and proxies are compromised hosts (bots)
  – Use a Distributed Hash Table protocol (Overnet) for rendezvous
  – Roughly 20,000 active bots at any time in April [Kanich08]
• Master servers run in “bullet-proof” hosting centers
  – Communicate with proxies and workers via command and control (C&C) protocol over TCP

Kanich, Levchenko, Enright, Voelker and Savage, The Heisenbot Uncertainty Problem: Challenges in Separating Bots from Chaff, LEET 2008.

Storm architecture

[Figure: Storm hierarchy, top to bottom: botmaster (“Dr. Evil”), master servers, proxy bots, worker bots]

Storm spam campaigns

• Workers request “updates” to send spam [Kreibich08]
  – Dictionaries: names, domains, URLs, etc.
  – Email templates for producing polymorphic spam
    » Macros instantiate fields: %^Fdomains^% from domains dict
  – Lists of target email addresses (batches of 500-1000 at a time)
• Workers immediately act on these updates
  – Create a unique message for each email address
  – Send the message to the target
  – Report the results (success, failure) back to proxies
  – Send harvested e-mail addresses
• Many campaign types
  – Self-propagation malware, pharmaceutical, stocks, phishing, …

Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, On the Spam Campaign Trail, LEET 2008.

Storm templates

Example Storm spam template and instantiation

Macro expansion to insert target email address

Received: from %^C0%^P%^R2-6^%:qwertyuiopasdfghjklzxcvbnm^%.%^P%^R2-6^%:qwertyuiopasdfghjklzxcvbnm^%^% ([%^C6%^I^%.%^I^%.%^I^%.%^I^%^%]) by %^A^% with Microsoft SMTPSVC(%^Fsvcver^%); %^D^%
From: <%^Fnames^%@%^Fdomains^%>
To: <%^0^%>
Subject: Say hello to bluepill!
<%^Fpharma_links^%>

Received: from auz.xwzww ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain2.com

Received: from auz.xwzww ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain1.com

Received: from auz.xwzww ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain2.com
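The template-plus-dictionary expansion shown above can be sketched in a few lines. This is not Storm's actual macro engine: it handles only the two macro forms whose meaning the slides state, %^Fname^% (take an entry from the dictionary called name) and %^0^% (the target address), and leaves every other macro untouched. Picking dictionary entries uniformly at random is an assumption, and the function name and sample addresses are made up for illustration.

```python
import random
import re

FIELD_MACRO = re.compile(r"%\^F(\w+)\^%")  # e.g. %^Fdomains^%
TARGET_MACRO = "%^0^%"                      # target e-mail address


def instantiate(template, dictionaries, target, rng=None):
    """Expand dictionary and target macros in a Storm-style template."""
    if rng is None:
        rng = random.Random()

    def pick(match):
        # Replace the macro with one entry from the named dictionary.
        return rng.choice(dictionaries[match.group(1)])

    expanded = FIELD_MACRO.sub(pick, template)
    return expanded.replace(TARGET_MACRO, target)


if __name__ == "__main__":
    template = (
        "From: <%^Fnames^%@%^Fdomains^%>\n"
        "To: <%^0^%>\n"
        "Subject: Say hello to bluepill!\n"
        "%^Fpharma_links^%"
    )
    dictionaries = {
        "names": ["eduardo", "katie", "chris"],
        "domains": ["example.com", "example.net"],
        "pharma_links": ["spammerdomain1.com", "spammerdomain2.com"],
    }
    print(instantiate(template, dictionaries, "user@example.org"))
```

Because each message draws fresh dictionary entries, two messages from the same template rarely match byte for byte, which is what makes the spam polymorphic.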

Storm in action

1224704030~!pharma_links~!spammerdomain1.comspammerdomain2.comspammerdomain3.com…

1224720409~!names~!eduardorafaelkatierachrisjohnny…

[email protected]@[email protected]@icir.org...

Received: from dkjs.sgdsz ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain3.com

Data Collection: C&C Crawler

Data Collection: Proxy Operation

Data Collection: Summary

• Crawler-based dataset
  – Nov 20 2007 to Nov 11 2008
  – 492,491 C&C requests (to 2,794 proxies)
  – 536,607 templates (23% unique)
• Proxy dataset
  – March 9 2008 to April 02 2008
  – 94,335 workers
  – 813,655 templates (52% unique)
  – 1,212,971 harvested addresses (49% unique)
• Harvest injection dataset
  – April 26 2008 to May 6 2008
  – 1,820,360 harvested addresses (50% unique)
  – 87,846 marker addresses injected
  – 1,957 markers targeted (2.2%)
  – 1,017 spams delivered to markers

Kreibich, Kanich, Levchenko, Enright, Voelker, Paxson and Savage, Spamcraft: An Inside Look At Spam Campaign Orchestration, LEET 2009.

Who gets spammed?

Campaigns: The Big Picture

• Long campaigns use few types
• Stock scams took a break
• Others don't last, but have many types (types ~ instances)

Domain Use & Usability

• JwSpamSpy

• 557 pharma 2LDs, 94% on blacklist

• Average use 5.6 days

• Shortest use is single dictionary

• Longest is 86 days

• 12.9 domains per hour

• Registration -> use: 21 days

• Use -> block: 18 minutes

• Registrations in batches used at the same time
• Domains are abandoned after being blocked
• No more .cn, shorter time to use, longer use

Address Sourcing

• 10,000 addresses sampled from harvests and target lists
• Web searches on Google
• Only available on infected machines:
  – 76% of harvested addresses
  – 87% of targeted addresses
• Web crawling for addresses unlikely

Affiliate linkage

• Evidence of pharma affiliate scheme
  – Web server error message leaked into dictionaries
  – 21 days, Nov 20 2007 – Feb 11 2008

<div style="padding-left:165px;padding-top:40px;"><img src="img/logo.gif" border="0" alt="Spamit.com"></div>

<div style="padding-bottom:3px;padding-top:26px; font-size: 14px;"><br /><strong>The system is temporary busy, try to access it later. No data can be lost.</strong></div>

<div>Copyright © SpamIt.com 2007, All rights reserved.</div>

Estimating spam profits

• Key basic inequality:
  (Delivery Cost) < (Conversion Rate) x (Marginal Revenue)
• We have some handle on two of these
  – Delivery cost to send spam
    » Outsourced cost: retail purchase price < $70/M addrs
    » In-house cost: development/management labor
  – Marginal revenue
    » Average pharma sale of $100, affiliate commissions ≈ 50%
• Conversion rate is hard to measure directly
• We provided first empirical measurement of conversion
• By rewriting requests sent through proxies under our control
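The inequality above can be turned into a break-even conversion rate. The sketch below plugs in the slide's own figures, treating the $70 bound as the actual price. It assumes that "$70/M addrs" means dollars per million addresses, that each address receives one message, and that the spammer's marginal revenue is the sale price times the affiliate commission. The function name is illustrative.

```python
def breakeven_conversion_rate(cost_per_million, sale_price, commission):
    """Smallest conversion rate at which delivery cost equals expected revenue.

    Rearranges (delivery cost) < (conversion rate) x (marginal revenue)
    on a per-message basis.
    """
    cost_per_message = cost_per_million / 1_000_000
    revenue_per_conversion = sale_price * commission
    return cost_per_message / revenue_per_conversion


if __name__ == "__main__":
    rate = breakeven_conversion_rate(70.0, 100.0, 0.5)
    print(f"break-even conversion rate: {rate:.7%}")
    print(f"i.e. one sale per {1 / rate:,.0f} messages sent")
```

Under these assumptions a spammer buying delivery at retail needs roughly one sale per 700,000 messages to break even, which is why the conversion rate is the number worth measuring.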

Modifying template links

[Figure: dictionary entries spammerdomain.com, spammerdomain2.com and spammerdomain3.com are replaced by newdomain1.com, newdomain2.com and newdomain3.com]

Original message:

Received: from dkjs.sgdsz ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
spammerdomain3.com

Rewritten message:

Received: from dkjs.sgdsz ([132.233.197.74]) by dsl-189-188-79-63.prod-infinitum.com.mx with Microsoft SMTPSVC(5.0.2195.6713); Wed, 6 Feb 2008 16:33:44 -0800
From: <[email protected]>
To: <[email protected]>
Subject: Say hello to bluepill!
newdomain2.com

Measuring click-through

• Create two sites that mirror actual sites in spam
  – E-card (self-propagation) and pharmaceutical
  – Replace dictionaries with URLs to our sites
• E-card (self-prop) site
  – Link to benign executable that POSTs to our server
  – Log all POSTs to track downloads and executions
• Pharma site
  – Log all accesses up through clicks on “purchase”
  – Track the contents of shopping carts
• Strive for verisimilitude to remove bias (spam filtering)
  – Site content is similar, URLs have same format as originals, …

Measuring Delivery

• Create various test email accounts
  – At Web mail providers: Hotmail, Yahoo!, Gmail
  – Behind a commercial spam filtering appliance
  – As SMTP sinks: accept every message delivered
• Put email addresses in Storm target delivery lists
• Log all emails delivered to these addresses
  – Both labeled as spam (“Junk E-mail”) and in inbox

Ethical context

• Consequentialism
• First, do no harm (users no worse off than before)
  – We do not send any spam
    » Proxies are relays, worker bots send spam
  – We do not enable additional spam to be sent
    » Workers would have connected to some other proxy
  – We do not enable spam to be sent to additional users
    » Users are already on target lists, only add control addresses
• Second, reduce harm where possible
  – Our pharma sites don’t take credit card info
  – Our e-card sites don’t export malicious code

Legal context

• Warning: IANAL
• CAN-SPAM
  – Subject to strong definition of “initiator”; we don’t fit it
• ECPA
  – Our proxy is directly addressed by worker bots (“party to” communication carve out)
• CFAA
  – We do not contact worker bots, they contact us (“unauthorized access”?)
  – We do not cause any information to be extracted or any fundamentally new activity to take place
  – Hard to find a good theory of damages (functionally indistinguishable -- consequentialism)

But…

• In this kind of work there is little precedent
• No agency to get permission; no way to get indemnity
• Lawyers tend to say “I believe this activity has low risk of…”
• We worked with two different lawyers to make sure
• Thus, we communicate our activities to a lot of people
  – Security researchers in industry, academia
  – Affected network operators/registrars
  – Law enforcement
  – FTC

Spam pipeline

Stage         Pharma             E-card (a)         E-card (b)
Sent          347.5M             83.6M              40.1M
MTA           82.7M (24%)        21.1M (25%)        10.1M (25%)
Inbox         ---                ---                ---
Visits        10,522 (0.003%)    3,827 (0.005%)     2,721 (0.005%)
Conversions   28 (0.000008%)     316 (0.00037%)     225 (0.00056%)

Pharma: 12 M spam emails for one “purchase”

E-card: 1 in 10 visitors execute the binary
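The headline ratios follow directly from the pipeline counts. The sketch below recomputes them. Attributing the 347.5M series to the pharma campaign is an inference from the "12 M emails per purchase" line and the 28 purchases reported elsewhere in the deck, and the dictionary layout and function names are illustrative.

```python
PHARMA = {"sent": 347_500_000, "mta": 82_700_000, "visits": 10_522, "conversions": 28}
ECARD = {"sent": 83_600_000, "mta": 21_100_000, "visits": 3_827, "conversions": 316}


def stage_rates(funnel):
    """Fraction of sent messages that survive to each later stage."""
    sent = funnel["sent"]
    return {stage: count / sent for stage, count in funnel.items() if stage != "sent"}


def emails_per_conversion(funnel):
    """How many messages are sent for each conversion."""
    return funnel["sent"] / funnel["conversions"]


if __name__ == "__main__":
    for name, funnel in (("pharma", PHARMA), ("e-card", ECARD)):
        rates = stage_rates(funnel)
        print(name, {stage: f"{rate:.6%}" for stage, rate in rates.items()})
        print(name, f"{emails_per_conversion(funnel):,.0f} emails per conversion")
    # Share of e-card site visitors who went on to run the binary.
    print(f"{ECARD['conversions'] / ECARD['visits']:.1%}")
```

Dividing conversions by visits rather than by messages sent gives the per-visitor rate behind the e-card headline.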

Spam filtering software

• The fraction of spam delivered into user inboxes depends on the spam filtering software used
  – Combination of site filtering (e.g., blacklists) and content filtering (e.g., spamassassin)
• Difficult to generalize, but we can use our test accounts for specific services

Fraction of spam sent that was delivered to inboxes

Effects of Blacklisting (CBL Feed)

Unused

Effective

Other filtering

Response rates by country

Two orders of magnitude

No large aberrations based on email topic

The spammer’s bottom line

• Recall that we tracked the contents of shopping carts
• Using the prices on the actual site, we can estimate the value of the purchases
  – 28 purchases for $2,731 over 25 days, or $100/day ($140 active)
• We only interposed on a fraction of the workers
  – Connected to approx 1.5% of workers
  – Back-of-the-envelope (be very careful)
    » $7-10k/day for all, or ~$3M/year
    » With a 50% affiliate commission, $1.5M/year revenue
    » Not enough to be profitable unless spammer = botnet owner
• For self-propagation
  – Roughly 3-9k new bots/day
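The back-of-the-envelope extrapolation above can be reproduced as follows. This is only a sketch of the arithmetic: it assumes revenue scales linearly with the fraction of workers observed (the slide itself says to be very careful here), and the function name is illustrative.

```python
def extrapolate(observed_daily_revenue, fraction_of_workers, commission):
    """Scale observed daily sales to the whole botnet.

    Returns (botnet-wide daily sales, yearly sales, yearly affiliate revenue).
    """
    daily = observed_daily_revenue / fraction_of_workers
    yearly = daily * 365
    return daily, yearly, yearly * commission


if __name__ == "__main__":
    print(f"observed: ${2731 / 25:.2f}/day over the whole 25-day period")
    for label, per_day in (("all days", 100.0), ("active days", 140.0)):
        daily, yearly, affiliate = extrapolate(per_day, 0.015, 0.5)
        print(f"{label}: ${daily:,.0f}/day, ${yearly:,.0f}/year, "
              f"${affiliate:,.0f}/year to the affiliate")
```

The two inputs bracket the slide's $7-10k/day range, and the midpoint lands near the quoted ~$3M/year in sales and $1.5M/year in commission.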


We’re on the cusp…

• This is a wide open area with huge impact potential
• We have tremendous momentum and experience here
• Over several years we’ve brokered the commercial partnerships necessary to do this work (plus fed advice)
• Key agreements in UC: active purchasing experiments

[Figure: Active Research Partnerships; Active Data Providers]

Going forward…

• Epidemiology
  – Characterizing value chain for different scams
    » Spammers, botnets, fast flux, affiliates, processing, fulfillment
  – Mining social network of underground providers
  – Analyzing market enablers (cost structure and characteristics)
    » E.g., mules, domain registration, traffic selling, de-CAPTCHA
  – Mapping monetization via financial credential honeytokens
  – Characterization of phishing defense effectiveness
  – Nation-state vs e-crime infrastructure
• Defenses
  – Botnet-driven spam filtering
  – Proactive URL blocking via on-line learning
  – Proactive phishing defense via machine vision

Click Trajectory project

• 10,000 foot idea:
  – We’ve gone deep into one spam campaign
  – Like to understand the relationship between all the elements of the value chain involved across the spam industry
• Value-chain characterization
  – Front end (visible via network)
    » Spamming groups
    » Botnets (& hosters)
    » Fast flux networks (& hosters/registrars)
    » Affiliate programs (& hosters)
  – Back end
    » Payment processing
    » Fulfillment

Unraveling front end value chain

• Expanding honeyfarm to host all major botnets (safely)
  – Log C&C and spam traffic; additional reversing too
  – All URLs tagged and stored in database
• 1st and 3rd party spam feeds and bad URL feeds (many)
  – URLs into same database (with source tag)
• Crawl all pages, referrers and metadata (DNS, whois)
• Database allows direct association of
  – Distinct scams (Web page matching and text matching)
  – Distinct botnets (via source tag)
  – Distinct fast flux networks (mapped during crawl)
  – Distinct affiliate programs (via both cookies and templates; also partner infiltrating affiliate programs to validate)
  – Have IP, DNS and registrar data for everything…

Unraveling back end of value chain

• Purchasing wide range of spam-advertised products (note: actual purchasing not using any NSF money)
  – Watches
  – Herbal, Pharma (via partner)
• Cluster purchases based on
  – Merchant and processor
  – Packaging (postmark, forensic analysis of paper)
  – Artifacts of manufacturing process (e.g., FT-NIR on drugs, analysis of movement similarity for watches)

Crawling underground social networks

• Underground criminals have implicit social network
  – Who offers which services, who partners with whom, etc...
  – Use multiple pseudo-identities, but significant structure still can be reconstructed manually
• Goal: build social network via crawling/datamining
  – Identifiers (ICQ, phone, etc)
  – Web page content, linkage on forum sites (who referenced whom, etc)

CAPTCHA solving analysis

• Webmail based spam
  – Web bots hard to filter; launder reputation of Web mail provider
  – But bots must solve CAPTCHA to create account; key enabler
• De-CAPTCHA services ($2/1k solved, 33% margin)
• Study: purchase solving from range of such services
• Key questions
  – Human vs vision-based solving (via error variation)
  – For humans
    » Native language (language primes)
    » Size of operation (via queuing)
  – For computers
    » Accuracy variation, differential pricing
    » Capacity

Mule recruitment

• “Mules” are used to launder money or goods (remailers)
  – Recruited via spam
• Building classifier that identifies mule spam automatically; cluster based on e-mail content and site
• Engage in automated conversation with e-mail sender
• Goal:
  – Infer size of mule operation, turn-over, level of sophistication, changes in demand, etc.

Traffic selling

• On-line underground market for click traffic (parallel to Google/Yahoo)
  – For direction to particular scams (e.g. pharma, counterfeits, etc)
  – For use in click fraud/PTC scams
• Active purchasing of traffic streams
  – Characterize traffic streams themselves
    » Real people, country of origin, time on site, click through, etc
    » Survey of subset of people (why are you here)
  – Differential pricing for different click streams

Financial honeytokens

• Range of scams that steal financial credentials
• Question: do they share monetization infrastructure?
  – Money mules, wire cashout, layering via purchase, carding, trading, etc
• Methodology:
  – Purposely “lose” financial credentials
    » Infostealing malware, phishing site, on open market
  – See how accounts are monetized
    » Fingerprinting test transactions
    » Merchant for large transfers
• Exploring solo version and via partnership with financial services company

Scam domain registration

• Web-based crime is built on cheap and easy domain registration, but little understood
• We now have full feed for .com, .net and .org (others)
• Look at pattern of use for scam domains (ala w/Storm)
  – Time to use, length of use, registrar agility, etc
  – Different between FF domains and hosting domains
• Mining registrant records
  – Either identify template or tie into social network

Phishing defense value

• We have three kinds of phishing defenses
  – Spam filtering: stops subset from getting known e-mail lures
  – Toolbars: stop subset from clicking on a known phishing site
  – Takedown: stop everyone from reaching known phishing site
• But… how much do they each matter (i.e., to the phisher) and which is worth additional investment?
• Dataset
  – Categorize phish e-mail and send through current filters
  – Track current toolbar blacklists
  – Track site lifetime (i.e. takedown)
  – Estimating click through (Taylor webalizer trick, DNS caching)

Assessing Attacks By Nation-States

• AirJaldi is the ISP for the Nation of Tibet
  – 20,000 users in wireless deployment in Dharamsala (nation-in-exile)
  – Maintains Tibetan nation’s web presence (San Jose)
• At both locations we’ve deployed Bro monitors
• Goal: can we observe attacks originating for nation-state purposes rather than cybercrime?
  – San Jose location has “control”: AirJaldi has non-Tibetan customers too, can partition address space
  – Control for Dharamsala deployment harder, but working on it …
  – Initial data captured prior to GhostNet story indeed exhibits GhostNet infections = direct subversion from China
• Meta-question: how much similarity between e-crime/nation-state methods/infrastructure?

Proactive phishing defense

• Virtually all anti-phishing defenses are reactive
• Proactive defense via browser-based logo identification
  – Phishing campaigns all use logos or variations as trust cues
• SIFT feature matching invariant (rotation, shearing, scale)
• Query brand provider (ala SPF for domains) on recognized logo: is IP address authorized to display?
• Delay notification until user attempts to enter data

Warning: you are attempting to enter data into a site that is not authorized to use the Bank of America trademark. It is likely that this is a scam

Proactive detection of malicious web sites

Predict what is safe without committing to risky actions

• Safe URL?
• Web exploit?
• Spam-advertised site?
• Phishing site?

URL = Uniform Resource Locator

http://www.cs.mcgill.ca/~icml2009/abstracts.html
http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll
http://fblight.com
http://mail.ru

Joint work w/Lawrence Saul

Problem in a Nutshell

• URL features to identify malicious Web sites
• Different classes of URLs
  – Benign, spam, phishing, exploits, scams...
  – For now, distinguish benign vs. malicious

facebook.com fblight.com

Live URL Classification System

[Figure: Label, Example, Hypothesis]

Feature vector construction

http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll

WHOIS registration: 3/25/2009
Hosted from 208.78.240.0/22
IP hosted in San Mateo
Connection speed: T1
Has DNS PTR record? Yes
Registrant “Chad”
...

[ _ _ … 0 0 0 1 1 1 … 1 0 1 1 …]
Real-valued Host-based Lexical

60+ features 1.8 million 1.1 million
GROWING
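The lexical part of such a feature vector can be sketched as a bag of words over the URL's hostname and path. The tokenization below (split the hostname on dots, split the path on a handful of delimiters, prefix each token with its origin) is an assumption rather than the exact scheme used in this work, and the host-based features (WHOIS, IP prefix, location, PTR record) are omitted because they need external lookups.

```python
import re
from urllib.parse import urlparse

PATH_DELIMITERS = re.compile(r"[/?.=\-_&]+")


def lexical_features(url):
    """Return the set of binary lexical features present in a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    features = set()
    for token in host.split("."):
        if token:
            features.add("host:" + token)
    for token in PATH_DELIMITERS.split(parsed.path):
        if token:
            features.add("path:" + token)
    return features


if __name__ == "__main__":
    for url in (
        "http://www.cs.mcgill.ca/~icml2009/abstracts.html",
        "http://www.bfuduuioo1fp.mobi/ws/ebayisapi.dll",
    ):
        print(sorted(lexical_features(url)))
```

Each distinct token becomes one dimension of a sparse binary vector, so the feature space keeps growing as new URLs arrive.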


Which online algorithms?

99% accuracy w/on-line classifier

Confidence-Weighted

LR w/ SGD

Perceptron
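Of the algorithms listed, the perceptron is the simplest to show. The sketch below is a plain mistake-driven online perceptron over sparse binary features, with +1 meaning malicious and -1 meaning benign. It is not the confidence-weighted learner and makes no claim about the reported accuracy. The class and feature names are illustrative.

```python
class OnlinePerceptron:
    """Mistake-driven linear classifier over sets of binary features."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Process one labeled URL; return True if the model was wrong."""
        if self.predict(features) == label:
            return False
        # On a mistake, move every active weight toward the true label.
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + label
        self.bias += label
        return True


if __name__ == "__main__":
    stream = [
        ({"host:mobi", "path:ebayisapi"}, 1),
        ({"host:ca", "path:abstracts"}, -1),
    ]
    model = OnlinePerceptron()
    for features, label in stream:
        model.update(features, label)
    print([model.predict(features) for features, _ in stream])
```

Because the model updates one example at a time and only touches the features present in that example, it needs no retraining pass when new URLs or new features appear.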

Meta-points on URL classification

• Two big practical issues for using machine learning
  – Much work doesn’t scale to large-scale problems
  – Batch SVM-type strategies adapt slowly and don’t work well in practice (adversary just changes from day-to-day)
• We’ve been working closely with a large Web-mail provider on this project
  – Scales to their problem size
  – Online update adapts quickly
  – Performs better than their current strategy (they have reimplemented our scheme and tested w/live data)

Bot-based spam filter generation

• Observations
  – Modest number of bots send most spam
  – Virtually all bots use templates with simple rules to describe polymorphism
  – Templates+dictionaries ≈ regex describing spam to be generated
  – If we can extract or infer these from the botnets, we have a perfect filter for all the spam generated by the botnet
  – Very specific filters, extremely low FP risk

http://www.marshal.com/trace/spam_statistics.asp

[Figure annotations: random letters and numbers; phrases from a dictionary]

• Fully automated algorithm
• Almost perfect in testing (~0 false positives, very few false negatives)
• Exploring live testing
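The observation that templates plus dictionaries amount to a regex can be illustrated directly. The sketch below is not the automated algorithm described on the slide: it assumes the template and its dictionaries are already in hand, handles only the %^Fname^% dictionary macro, escapes all other text literally, and turns each macro into an alternation over that dictionary's entries. The function name is illustrative.

```python
import re

FIELD_MACRO = re.compile(r"%\^F(\w+)\^%")


def template_to_regex(template, dictionaries):
    """Compile a template plus its dictionaries into a matching regex."""
    parts = []
    position = 0
    for match in FIELD_MACRO.finditer(template):
        # Literal text between macros must match exactly.
        parts.append(re.escape(template[position:match.start()]))
        entries = dictionaries[match.group(1)]
        parts.append("(?:" + "|".join(re.escape(e) for e in entries) + ")")
        position = match.end()
    parts.append(re.escape(template[position:]))
    return re.compile("".join(parts))


if __name__ == "__main__":
    template = "Subject: Say hello to bluepill!\n%^Fpharma_links^%"
    dictionaries = {"pharma_links": ["spammerdomain1.com", "spammerdomain2.com"]}
    spam_filter = template_to_regex(template, dictionaries)
    message = "Subject: Say hello to bluepill!\nspammerdomain2.com"
    print(spam_filter.fullmatch(message) is not None)
```

A filter built this way matches only text the template could have produced, which is where the low false-positive risk comes from.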

Summary

• We think that the economic structures underlying e-crime are far weaker than their technical vulnerabilities

• Quantitative empirical data is key both for driving technical innovations and policy

• We think we’re uniquely positioned to do this work

Questions?

Yahoo!

Collaborative Center for Internet Epidemiology and Defenses

http://ccied.org
