BOTNET JUDO Fighting Spam with Itself By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson,...

BOTNET JUDO: Fighting Spam with Itself

By:

Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage

Presentation by:

Heath Carroll

The Origins of Spam


Presentation Overview
• Abstract: What was the intent of the paper?
• Introduction: Current problems faced and methods used to combat them
• Background: Definitions of botnet, regular expression, and template-based spam
• Approach: How the authors dealt with this problem

Abstract
• Botnet Judo: Fighting Spam with Itself, or ‘Botnet Host Quarantine: What’d We Learn?’
• Examination of a controlled, isolated botnet host
• Quick generation of precise and accurate spam filters with ~0 false positives

Introduction: Botnets
• Definition: a botnet is a collection of software agents, or robots, that run autonomously and automatically. The term is most commonly associated with malicious software, but it can also refer to a network of computers using distributed computing software. (en.wikipedia.org/wiki/Botnet)
• Example: DDoS attack against Blue Security, May 2, 2006

Botnets (cont’d)
• Common uses of botnets:
  – Denial-of-service attacks
  – Adware
  – Spyware
  – Email spam (template-based, image-based, etc.)
  – Click fraud
  – Internet access number replacement
  – Fast flux (rapidly changing the IP addresses that a DNS name resolves to)

SPAM!!
– Template-based spam
  • The botnet fills in a template to produce massive amounts of highly varied spam
  • Harder to filter on content at first, because the message makeup varies
    – Requires defenders to collect ‘suspect’ spam in order to build an effective content-based filter
  • Harder to filter on sender, because of the massive number of sending hosts
    – Requires defenders to rely on alternative methods to combat the botnet
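To make the template idea concrete, here is a minimal Python sketch of template expansion. The template text, the `{macro}` syntax, the `NAMES` dictionary, and the `MACROS` table are hypothetical illustrations, not the template format of any real botnet.

```python
import random
import re
import string

# Hypothetical template: {name} draws from a small dictionary, while
# {order_id} and {domain} are filled with random characters. The {macro}
# syntax is an assumption for illustration only.
TEMPLATE = "Dear {name}, your order {order_id} ships today. Visit {domain}.example"

NAMES = ["Alice", "Bob", "Carol"]  # hypothetical dictionary


def random_chars(rng: random.Random, n: int) -> str:
    """Return n random lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(n))


# Each macro name maps to a function that produces one fresh value for it.
MACROS = {
    "name": lambda rng: rng.choice(NAMES),
    "order_id": lambda rng: str(rng.randint(10000, 99999)),
    "domain": lambda rng: random_chars(rng, 6),
}


def expand(template: str, rng: random.Random) -> str:
    """Replace every {macro} in the template with a freshly generated value."""
    return re.sub(r"\{(\w+)\}", lambda m: MACROS[m.group(1)](rng), template)


if __name__ == "__main__":
    rng = random.Random(2009)
    for _ in range(3):
        print(expand(TEMPLATE, rng))
```

Every instance shares the fixed text but differs in the macro values, which is what defeats exact-match content filters and forces defenders to collect many samples.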

SPAM!!
• Preventative measures:
  – Anti-virus software
  – Passive OS fingerprinting
  – Network-based approaches (null routing)
  – Spam filtering
  – Directed study
• The last two are covered by this paper

Anti-spam!!
• Basically two different approaches:
– Content-based:
  • Filtering based on established heuristics and learning algorithms that target specific message features
  • Can be highly effective (especially against targeted botnets)
  • Labor-intensive to maintain, since the basic technique can be countered by chaff and poisoning attacks
  • Hard to keep the filter’s false positive rate low
  • Blacklisting URLs can also be effective, but needs large, up-to-date whitelists to avoid poisoning
    – Doesn’t help if the spam doesn’t contain URLs
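The URL-blacklisting point can be sketched in a few lines of Python. This is a toy model, assuming hosts are pulled from `http(s)://` URLs with a simplified regex; the `UrlBlacklist` class and its methods are hypothetical names. It shows why the whitelist matters: hosts listed there are never blacklisted even when spammers embed them as poison, and a message without URLs is never flagged.

```python
from __future__ import annotations

import re

# Hostname after http:// or https:// (simplified: ignores ports, userinfo, IDNs).
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def hosts_in(text: str) -> set[str]:
    """Return the lowercase hostnames of all http(s) URLs in the text."""
    return {h.lower() for h in URL_HOST.findall(text)}


class UrlBlacklist:
    """Toy URL blacklist that never lists whitelisted hosts."""

    def __init__(self, whitelist: set[str]) -> None:
        self.whitelist = {h.lower() for h in whitelist}
        self.blacklist: set[str] = set()

    def learn_from_spam(self, text: str) -> None:
        # Spammers may embed legitimate URLs as "poison"; subtracting the
        # whitelist keeps those hosts from ever being blacklisted.
        self.blacklist |= hosts_in(text) - self.whitelist

    def is_spam(self, text: str) -> bool:
        # A message with no URLs can never be flagged by this filter.
        return bool(hosts_in(text) & self.blacklist)
```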

Anti-Spam!! (cont’d)
– Sender-based
  • Focuses on the spam delivery system
  • Assumes a sender of spam is likely to keep sending spam, and is not likely to send legitimate messages
  • Works by blacklisting offending senders after the fact
  • Doesn’t work against the newest spam
  • Botnets are an effective workaround, since the controller distributes his spam over a large number of hosts
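A toy simulation shows why "after the fact" blacklisting struggles against botnets. It assumes (hypothetically) that a host is blacklisted immediately after its first spam is observed, so each new host always gets at least one message through.

```python
from __future__ import annotations


def blocked_count(sender_ips: list[str]) -> int:
    """Count messages blocked by a sender blacklist that only learns after the fact.

    Toy model (an assumption): a host is blacklisted right after its first
    spam message is observed, so that first message always gets through.
    """
    blacklist: set[str] = set()
    blocked = 0
    for ip in sender_ips:
        if ip in blacklist:
            blocked += 1
        else:
            blacklist.add(ip)  # listed only after it has already sent spam
    return blocked


# One host sending 100 messages vs. a 100-host botnet sending one each.
single_host = ["192.0.2.1"] * 100
botnet = [f"198.51.100.{i}" for i in range(100)]
print(blocked_count(single_host))  # 99
print(blocked_count(botnet))       # 0
```

The same 100 messages are almost entirely blocked from one host but not at all when spread across 100 hosts.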

Anti-Spam!! (cont’d)
• Template-based spam filtering:
– Suspected botnet-generated spam is examined and deconstructed into a regular expression (RE)
– Works very well against static botnets, but requires many instances of suspected spam to deconstruct
– Useless if the controller changes the template used by the bots
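As an illustration, here is a hand-written RE for one hypothetical template (the template and signature are made up for this sketch). Fixed text is matched literally, and each macro becomes an alternation or a bounded character class. It matches every instance of that template but nothing from an edited template.

```python
import re

# Hand-built signature for a hypothetical template:
#   "Dear {name}, your order {order_id} ships today. Visit {domain}.example"
SIGNATURE = re.compile(
    r"Dear (?:Alice|Bob|Carol), your order \d{5} ships today\. "
    r"Visit [a-z0-9]{6}\.example"
)

instances = [
    "Dear Bob, your order 48213 ships today. Visit x7k2qa.example",
    "Dear Carol, your order 90031 ships today. Visit m0p9zz.example",
]
# The same campaign after the controller edits the template.
changed = "Hello Bob! Order 48213 has shipped. See x7k2qa.example"

print([bool(SIGNATURE.fullmatch(m)) for m in instances])  # [True, True]
print(bool(SIGNATURE.fullmatch(changed)))                 # False
```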

Regular Expressions

Regular Expressions (cont’d)

• Review:

JUDO!!

• Generates regular expression signatures to thwart spam
• Operates by examining the output of quarantined bot hosts
• Uses a template inference algorithm to generate a set of signatures that matches all previously seen messages

JUDO!! (cont’d)

1. Header filtering
2. Anchor identification
3. Macro classification
   • Dictionary
   • Micro-anchor
   • Noise
4. Special tokens
5. Signature update
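The five steps are the authors'. The code below is only a heavily simplified Python sketch of steps 2 and 3 under stated assumptions: messages are split on whitespace, an anchor is any token of the first message found in the same order in every message, and a gap between anchors is a dictionary macro when it takes at most `DICT_MAX_FRACTION` (a hypothetical threshold) distinct values, otherwise a noise macro (character class plus length range). Header filtering, micro-anchors, special tokens, and signature updates are left out, and the function names are illustrative, not the paper's.

```python
import re

# Hypothetical threshold (not from the paper): a gap whose values are at most
# this fraction distinct across the training set is treated as a dictionary.
DICT_MAX_FRACTION = 0.5


def find_anchors(token_lists):
    """Pick tokens of the first message that occur, in the same order, in
    every message. Returns the anchors and each anchor's index per message."""
    anchors, positions = [], [[] for _ in token_lists]
    starts = [0] * len(token_lists)
    for tok in token_lists[0]:
        found = []
        for toks, start in zip(token_lists, starts):
            try:
                found.append(toks.index(tok, start))
            except ValueError:
                break  # missing from this message after the previous anchor
        else:
            anchors.append(tok)
            for i, idx in enumerate(found):
                positions[i].append(idx)
                starts[i] = idx + 1
    return anchors, positions


def gap_regex(values, n_messages):
    """Turn one gap's observed values (one per message) into a regex piece.
    Each value carries its trailing space, so empty gaps need no special case."""
    spaced = [v + " " if v else "" for v in values]
    distinct = sorted(set(spaced))
    if len(distinct) <= max(1, int(n_messages * DICT_MAX_FRACTION)):
        # Dictionary macro (or fixed text): alternation of the observed values.
        return "(?:" + "|".join(re.escape(v) for v in distinct) + ")"
    # Noise macro: character class of what was seen, bounded by length.
    parts = set()
    for v in spaced:
        for ch in v:
            if ch.isascii() and ch.isdigit():
                parts.add("0-9")
            elif ch.isascii() and ch.isalpha():
                parts.add("A-Za-z")
            else:
                parts.add(re.escape(ch))
    lo, hi = min(map(len, spaced)), max(map(len, spaced))
    return "[" + "".join(sorted(parts)) + "]{%d,%d}" % (lo, hi)


def infer_signature(messages):
    """Build one regex matching every training message (needs >= 1 message)."""
    token_lists = [m.split() for m in messages]
    anchors, positions = find_anchors(token_lists)
    pieces = []
    for j in range(len(anchors) + 1):
        # Gap j lies between anchor j-1 and anchor j (or a message edge).
        gap_values = []
        for toks, pos in zip(token_lists, positions):
            lo = pos[j - 1] + 1 if j > 0 else 0
            hi = pos[j] if j < len(anchors) else len(toks)
            gap_values.append(" ".join(toks[lo:hi]))
        pieces.append(gap_regex(gap_values, len(messages)))
        if j < len(anchors):
            pieces.append(re.escape(anchors[j] + " "))
    return re.compile("".join(pieces))


def matches(signature, message):
    # Normalize whitespace and add the trailing space the pieces expect.
    return signature.fullmatch(" ".join(message.split()) + " ") is not None
```

For example, training on four messages of the form `Dear Alice your order 48213 ships today` (names drawn from `Alice`/`Bob`, random 5-digit numbers) yields a signature that matches a new `Dear Bob ...` instance but not one addressed to `Carol`, because `Carol` never appeared in the dictionary gap.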

Second Chance and Pre-clustering

Judo - Second Chance Mechanism

• Used to mitigate the effects of a small training buffer
• If a message fails to match an existing signature:
  – It is re-checked using only the anchors
  – If it matches, the signature is updated
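The second-chance check can be sketched as a three-way decision in Python. The signature and anchor list are assumed inputs (hand-written in the test), and the function names are hypothetical. An `"update"` result stands for the point where Judo would fold the message into the signature; the update itself is not shown.

```python
import re


def anchors_in_order(anchors, message):
    """True if every anchor occurs in the message, in the given order."""
    pattern = ".*?".join(re.escape(a) for a in anchors)
    return re.search(pattern, message, re.DOTALL) is not None


def check_with_second_chance(message, signature, anchors):
    """Return "match", "update", or "miss" for one incoming message.

    "update" means the full signature failed but the anchors alone matched,
    so the message should be folded into the signature (for example, by
    re-running template inference with the message added).
    """
    if signature.fullmatch(message):
        return "match"
    # Guard: an empty anchor list would otherwise accept every message.
    if anchors and anchors_in_order(anchors, message):
        return "update"
    return "miss"
```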

Judo - Pre-clustering

• Used to mitigate the effects of overly large training buffers (which may mix messages from several templates)
  – Skeleton signatures are used to sort incoming messages before Judo runs on them
  – Similar to the second chance mechanism, but with a larger allowable anchor size
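A minimal Python sketch of the sorting step, assuming a skeleton signature is simply an ordered list of anchors. The skeletons in the test are hand-written; how Judo derives them is not shown. Each message goes to the first skeleton it matches, and inference would then run separately on each bucket.

```python
import re


def skeleton_matches(skeleton, message):
    """A skeleton signature here is just an ordered list of anchors."""
    pattern = ".*?".join(re.escape(a) for a in skeleton)
    return re.search(pattern, message, re.DOTALL) is not None


def pre_cluster(messages, skeletons):
    """Sort messages into one bucket per skeleton before inference runs.

    Each message goes to the first skeleton it matches; messages that match
    none are returned separately.
    """
    buckets = {i: [] for i in range(len(skeletons))}
    unmatched = []
    for msg in messages:
        for i, skeleton in enumerate(skeletons):
            if skeleton_matches(skeleton, msg):
                buckets[i].append(msg)
                break
        else:
            unmatched.append(msg)
    return buckets, unmatched
```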

Experimental Results

• Requirements of a good spam filter:
  – Safe: does not classify legitimate mail as spam
    • Low false positive rate
  – Effective: correctly identifies the targeted class of spam
    • Low false negative rate
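The two requirements translate into two rates. This Python sketch assumes each signature is a compiled regex describing a whole message (hence `fullmatch`), and that a message is flagged if any signature matches it.

```python
import re


def false_positive_rate(signatures, legitimate):
    """Fraction of legitimate messages matched by any signature (lower is safer)."""
    hits = sum(any(s.fullmatch(m) for s in signatures) for m in legitimate)
    return hits / len(legitimate)


def false_negative_rate(signatures, spam):
    """Fraction of targeted spam matched by no signature (lower is more effective)."""
    misses = sum(not any(s.fullmatch(m) for s in signatures) for m in spam)
    return misses / len(spam)
```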

Experimental Results (cont’d)

• Testing: 4 tiers
– Signature safety
  • Signatures from the other 3 tiers were run against legitimate mail ‘corpora’ to assess the false positive rate
  • To prevent age bias, the signatures were tested only against the subject and body of the corpora messages


– Controlled single-template inference
  • Generated 5000 spam instances per template from a ‘Storm’ bot, using templates obtained through reverse engineering
    – 1000 for signature generation
    – 4000 for testing the false negative rate
  • Done for each of 10,676 templates (53,380,000 messages)

• Results:

• Also, at k = 1000 the false positive rate was 0% for all signatures
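The 53,380,000-message total for the single-template tier follows directly from the per-template split:

```latex
$$
10{,}676\ \text{templates} \times (1000 + 4000)\ \tfrac{\text{messages}}{\text{template}}
= 10{,}676 \times 5000 = 53{,}380{,}000\ \text{messages}
$$
```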


– Controlled multi-template inference
  • Spam used for testing was generated during the Botlab project at the University of Washington
  • 4 bots used: 1 each from the Mega-D, Pushido, Rustock, and Srizbi botnets
  • The first million messages from each were split into training and testing sets, then Judo was run chronologically on each test message
    – A message counts as a true match if it matches a signature generated from previous test messages
    – Otherwise it is counted as a false negative
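The chronological scoring rule can be sketched in Python as a replay loop. Several details are assumptions: `infer` is a placeholder for Judo's inference (the test uses a trivial exact-match stand-in), unmatched messages are assumed to become training data, and a fresh signature is assumed to be built each time `k` such messages accumulate. The slide does not spell out the authors' exact update policy.

```python
from __future__ import annotations

import re
from typing import Callable


def chronological_eval(
    messages: list[str],
    infer: Callable[[list[str]], re.Pattern[str]],
    k: int,
) -> tuple[int, int]:
    """Replay messages in arrival order; return (true matches, false negatives)."""
    signatures: list[re.Pattern[str]] = []
    buffer: list[str] = []
    matched = missed = 0
    for msg in messages:
        # Only signatures built from earlier messages may claim this one.
        if any(sig.fullmatch(msg) for sig in signatures):
            matched += 1
            continue
        missed += 1           # counted as a false negative
        buffer.append(msg)    # assumed: kept as training data
        if len(buffer) >= k:  # enough examples: build a new signature
            signatures.append(infer(buffer))
            buffer = []
    return matched, missed


def exact_match_infer(buffer: list[str]) -> re.Pattern[str]:
    """Stand-in for Judo: matches only the exact buffered messages."""
    return re.compile("|".join(re.escape(m) for m in buffer))
```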


• Results:

• The only false positives came from the Rustock bot tests


– Real-world deployment:
  • 2 × Xarvester + 2 × Mega-D + 4 × Rustock + 6 × Gheg = 14 bots
  • Messages generated:
  • Ran the test as in the multi-template runs


• Results:

– Worst case: Rustock was again the only source of false positives, at 1 in 12,500 messages; all other bots produced 0 false positives on the corpora


• Efficiency: since the goal of the project was an accurate RE generator, efficiency was not a priority
  – Initial RE generation with a buffer size of 50 and 6000-character messages takes about 2 s on an average desktop (circa 2009)
  – Signature updates take ~50-100 ms

Response Time
• Depends on the output rate of the bot(s) generating the spam
• May be complicated by the existence of multiple bots or templates
• Bots used in this experiment generated > 100 spam messages per minute
  – Since results are acceptable from k ≥ 500, generating a working signature should take only a few minutes
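As a rough worked estimate, assume the k training messages come from bots sending at a combined rate r, and ignore the ~2 s inference time. The wait to fill the buffer is then:

```latex
$$
t \approx \frac{k}{r} = \frac{500\ \text{messages}}{100\ \text{messages/min}} = 5\ \text{min}
$$
```

Because the bots sent more than 100 messages per minute, the actual wait is under 5 minutes.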

Overview
• ‘Judo’ is essentially a learning spam filter
  – Content-based
  – Requires training to produce effective signatures
  – Safe and effective (both greater than 99.75%)
• Controlled tests show exceptional results
• Simulated real-world tests show promise, but Judo could be worked around by bots that randomly generate new templates

Any Questions?
