Introduction to Data Mining: Anomaly Detection (ukang/courses/20S-DM/L...)

U Kang, Seoul National University


Page 2

In This Lecture

Motivation of anomaly detection

Graph structure based method

Random walk based method

Page 3

Outline

Overview

Graph Structure Based Method

Random Walk Based Method

Page 5

Data Mining

Data mining: find patterns and anomalies

To spot anomalies, we have to discover patterns

Large datasets reveal patterns/anomalies that may be invisible otherwise…

Page 6

Anomaly Detection

Anomaly detection

Find suspicious data points which deviate significantly from normal data

Anomaly detection in graphs

Find “strange” nodes in a graph

Page 7

Anomaly Detection

Applications

Network intrusion detection: find suspicious attackers (e.g. DDoS attackers, spammers)

Call network: find heavy telemarketers

Social network: spot people adding friends indiscriminately in a “popularity contest”

Credit card fraud

(the list continues...)

Page 8

Anomaly Detection

More Applications

Campaign donation irregularity

Extremely cross-disciplinary authors in an author-paper graph

Electronic auction fraud

Page 9

Plan

We will look at two methods for anomaly detection in graphs

Graph Structure Based Method

Random Walk Based Method

Page 10

Outline

Overview

Graph Structure Based Method

Random Walk Based Method

L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010

Page 11

Problem Definition

Given: a weighted and unlabeled graph,

Q1: how can we spot strange, abnormal, extreme nodes?

Q2: how can we explain why the spotted nodes are anomalous?

Page 12

OddBall: approach

For each node

Extract “egonet” (= 1-step neighborhood)

Extract features (#edges, total weight, etc.)

Features that could yield “laws”

Features fast to compute and interpret

Detect patterns: regularities in the features

Detect anomalies: nodes that deviate significantly from the patterns

Page 13

What is Odd?

Page 14

Main Idea

For each egonet, extract features

Find “rules” in features

Anomalies deviate significantly from the rules
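The main idea can be sketched as follows, assuming the “rule” is a power law y ≈ C·x^θ between two egonet features, which is the form reported in the OddBall paper. The function names and toy numbers are hypothetical, and the score only follows the spirit of the paper's out-of-line score (a ratio term times a log-distance term). It is not the authors' code.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log c + theta * log x (needs x, y > 0)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    theta = sxy / sxx
    c = math.exp(my - theta * mx)
    return c, theta

def deviation_score(x, y, c, theta):
    """Grows as the observed y moves away from the fitted value c * x**theta."""
    expected = c * x ** theta
    ratio = max(y, expected) / min(y, expected)
    return ratio * math.log(abs(y - expected) + 1)

# Toy feature pairs lying exactly on y = 2 * x**2 (made-up numbers).
xs = [1, 2, 4, 8]
ys = [2, 8, 32, 128]
c, theta = fit_power_law(xs, ys)
on_rule = deviation_score(4, 32, c, theta)   # a point that follows the rule
off_rule = deviation_score(4, 64, c, theta)  # a point with twice the expected y
print(c, theta, on_rule, off_rule)
```

A point that follows the rule scores near zero, while a point far from the fitted line scores higher, which is the sense in which anomalies “deviate significantly from the rules”.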

Page 15

Which Features?

N_i: number of neighbors (degree) of ego i

E_i: number of edges in egonet i

W_i: total weight of egonet i

λ_{w,i}: principal eigenvalue of the weighted adjacency matrix of egonet i
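The four features above can be computed as in this minimal sketch. It assumes an undirected graph stored as a symmetric dictionary of dictionaries with positive weights; the function name, the toy graph, and the use of plain power iteration for the eigenvalue are illustrative choices, not something the slides prescribe.

```python
import math
from itertools import combinations

def egonet_features(adj, ego, iters=100):
    """Return (N, E, W, lam) for the 1-step egonet of `ego`.

    adj maps node -> {neighbor: weight} and must be symmetric.
    """
    nodes = [ego] + sorted(adj[ego])
    # Each undirected edge inside the egonet, counted once.
    edges = {(u, v): adj[u][v] for u, v in combinations(nodes, 2) if v in adj[u]}
    n_i = len(nodes) - 1          # number of neighbors of the ego
    e_i = len(edges)              # number of edges in the egonet
    w_i = sum(edges.values())     # total edge weight in the egonet

    # Power iteration: for a unit vector x, ||A x|| approaches the
    # principal eigenvalue of the nonnegative symmetric matrix A.
    index = {u: k for k, u in enumerate(nodes)}
    x = [1.0 / math.sqrt(len(nodes))] * len(nodes)
    lam = 0.0
    for _ in range(iters):
        y = [0.0] * len(nodes)
        for (u, v), w in edges.items():
            y[index[u]] += w * x[index[v]]
            y[index[v]] += w * x[index[u]]
        lam = math.sqrt(sum(t * t for t in y))
        if lam == 0.0:            # isolated ego: no edges at all
            break
        x = [t / lam for t in y]
    return n_i, e_i, w_i, lam

# Hypothetical toy graph: a triangle a-b-c plus a tail a-d-e.
graph = {
    "a": {"b": 1.0, "c": 2.0, "d": 1.0},
    "b": {"a": 1.0, "c": 3.0},
    "c": {"a": 2.0, "b": 3.0},
    "d": {"a": 1.0, "e": 5.0},
    "e": {"d": 5.0},
}
print(egonet_features(graph, "a"))
```

The edge d-e is excluded from the egonet of a because e is not a neighbor of a.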

Page 16

Why Principal Eigenvalue?

Page 17

OddBall: pattern #1

Page 18

OddBall: pattern #2

Page 19

OddBall: pattern #3

Page 20

OddBall: anomaly detection

(e.g. LOF)

Page 21

OddBall: datasets

Page 22

OddBall at work (Posts)

Page 23

OddBall at work (FEC)

Page 24

OddBall at work (DBLP)

Page 25

Outline

Overview

Graph Structure Based Method

Random Walk Based Method

J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood Formation and Anomaly Detection in Bipartite Graphs. ICDM, 2005

Page 26

Anomalies in Bipartite Graphs

Page 27

Examples of Bipartite Graphs

Publication network

Author-paper

P2P network

User-file

Recommendation

User-product

Stock market

Stock-trader

Page 28

1) Neighborhood Formation

Main idea

Compute the Random Walk with Restart score from query node q

Steady state probability = relevance
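A minimal sketch of Random Walk with Restart by power iteration, assuming the graph is a dictionary of weighted neighbor dictionaries. The function name, the restart probability of 0.15, the fixed iteration count, and the toy author-paper graph are all illustrative assumptions.

```python
def rwr(adj, query, restart=0.15, iters=100):
    """Approximate steady-state probabilities of a walk restarting at `query`.

    adj maps node -> {neighbor: weight}; every neighbor must also be a key.
    """
    p = {u: 1.0 if u == query else 0.0 for u in adj}
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[query] = restart                  # mass that jumps back to the query
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0:
                # Node without edges: send its mass back to the query.
                nxt[query] += (1 - restart) * p[u]
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * p[u] * w / total
        p = nxt
    return p

# Hypothetical bipartite author-paper graph: a1-p1, a2-p1, a2-p2, a3-p2.
graph = {
    "a1": {"p1": 1.0},
    "a2": {"p1": 1.0, "p2": 1.0},
    "a3": {"p2": 1.0},
    "p1": {"a1": 1.0, "a2": 1.0},
    "p2": {"a2": 1.0, "a3": 1.0},
}
scores = rwr(graph, "a1")
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Relative to the query a1, the co-author a2 receives a higher relevance score than a3, who is only reachable through a second paper.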

Page 29

1) Neighborhood Formation

Exact Neighborhood Formation (NF)

Exact RWR score

Approximate NF

Partition the original graph into pieces by METIS

Compute similarities only on the partition containing the query node
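The approximate variant can be sketched as below, assuming a partition label per node is already available. In the lecture's setting the labels would come from METIS; here `part` is a hand-written, hypothetical stand-in, and `rwr` and `approximate_nf` are illustrative names.

```python
def rwr(adj, query, restart=0.15, iters=100):
    """Random Walk with Restart by power iteration (restart value is an assumption)."""
    p = {u: 1.0 if u == query else 0.0 for u in adj}
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[query] = restart
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0:
                nxt[query] += (1 - restart) * p[u]
            else:
                for v, w in nbrs.items():
                    nxt[v] += (1 - restart) * p[u] * w / total
        p = nxt
    return p

def approximate_nf(adj, part, query, **kwargs):
    """Run RWR only on the subgraph induced by the query's partition."""
    keep = {u for u in adj if part[u] == part[query]}
    sub = {u: {v: w for v, w in adj[u].items() if v in keep} for u in keep}
    return rwr(sub, query, **kwargs)

# Two small author-paper clusters joined by the single edge a2-p2.
graph = {
    "a1": {"p1": 1.0},
    "a2": {"p1": 1.0, "p2": 1.0},
    "a3": {"p2": 1.0},
    "a4": {"p2": 1.0},
    "p1": {"a1": 1.0, "a2": 1.0},
    "p2": {"a2": 1.0, "a3": 1.0, "a4": 1.0},
}
part = {"a1": 0, "a2": 0, "p1": 0, "a3": 1, "a4": 1, "p2": 1}  # stand-in labels
approx = approximate_nf(graph, part, "a1")
print(sorted(approx))
```

Nodes outside the query's partition receive no score at all, which is the price paid for working on a smaller graph.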

Page 30

2) Anomaly Detection

Main idea: to compute the anomaly score of a node t

Compute pairwise “relevance” scores for the neighbors of t

Compute the mean of the relevance scores; a low mean suggests that t links otherwise unrelated nodes and is likely anomalous
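The two steps can be sketched as follows, using RWR scores as the pairwise relevance. The names `rwr`, `normality_score`, and `bipartite`, the restart value, and the toy graph are hypothetical; the sketch reads a low mean as a sign of an anomaly.

```python
def rwr(adj, query, restart=0.15, iters=100):
    """Random Walk with Restart by power iteration (restart value is an assumption)."""
    p = {u: 1.0 if u == query else 0.0 for u in adj}
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[query] = restart
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0:
                nxt[query] += (1 - restart) * p[u]
            else:
                for v, w in nbrs.items():
                    nxt[v] += (1 - restart) * p[u] * w / total
        p = nxt
    return p

def normality_score(adj, t, **kwargs):
    """Mean pairwise RWR relevance among the neighbors of t (None if fewer than 2)."""
    nbrs = sorted(adj[t])
    if len(nbrs) < 2:
        return None
    relevance = {u: rwr(adj, u, **kwargs) for u in nbrs}
    pairs = [(u, v) for u in nbrs for v in nbrs if u != v]
    return sum(relevance[u][v] for u, v in pairs) / len(pairs)

def bipartite(edges):
    """Build a symmetric unit-weight adjacency dict from (author, paper) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj

# Two author communities, each with three joint papers, plus one paper "x"
# written by one author from each community.
edges = [(a, p) for p in ("pA1", "pA2", "pA3") for a in ("a1", "a2")]
edges += [(b, p) for p in ("pB1", "pB2", "pB3") for b in ("b1", "b2")]
edges += [("a1", "x"), ("b1", "x")]
graph = bipartite(edges)

within = normality_score(graph, "pA1")   # paper inside one community
bridge = normality_score(graph, "x")     # paper bridging the two communities
print(within, bridge)
```

The bridging paper x gets the lower mean relevance because its two authors are otherwise unconnected, so it would rank as more anomalous than pA1.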

Page 31

Experiment

Datasets:

DBLP Conf-Auth

DBLP Author-Paper

IMDB movie-actor

Questions:

Q1) What are the discoveries?

Q2) Anomaly detection quality?

Page 32

1) NF discovery

Page 33

2) Anomaly Detection Quality

Setting: inject 100 random nodes that connect to high-degree nodes

Page 34

What You Need to Know

Anomaly detection

Find suspicious data points which deviate significantly from normal data

Anomaly detection in graphs

Graph Structure Based Method

Random Walk Based Method

Neighborhood Formation (NF)

Anomaly detection using NF

Page 35

Questions?