Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf

Introduction to Anomaly Detection Chao Lan Presented at the summer camp of RAMPE II: Cybersecurity and Internet of Things, University of Wyoming, 2018.

Page 1:

Introduction to Anomaly Detection

Chao Lan

Presented at the summer camp of RAMPE II: Cybersecurity and Internet of Things, University of Wyoming, 2018.

Page 2:

Outline

Background

Learning-based Detection Approaches

Evaluation Metrics

Challenges

Page 3:

Outline

Background

- what is anomaly detection and what are its applications?
- why do we need computers to help with anomaly detection?
- why do we want machine learning to help design detection rules?

Learning-based Detection Approaches

Evaluation Metrics

Challenges

Page 4:

What is anomaly detection?

“Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior.”

Chandola et al. Anomaly detection: A survey. ACM Computing Surveys, 2009.

Page 5:

http://www.svcl.ucsd.edu/projects/anomaly/

Page 6:

Network Anomaly Detection – Do We Know What to Detect? 2013

Page 7:

Fraud Prevention with Neo4j: A 5-Minute Overview, 2017

Page 15:

Fujitsu Develops Traffic-Video-Analysis Technology Based on Image Recognition and Machine Learning, 2016.

Page 16:

Early detection of at-risk students using machine learning based on LMS log data. 2017.

Page 18:

Exercise: how can we teach a computer to detect spam?

Page 19:

An Example Spam Email on Google Lottery

Page 20:

Let me design and program some “rules” into the computer!

Page 21:

Rule 1: An email containing “lottery” is spam.

Page 22:

What about this warning email?

Page 23:

Rule 2: An email containing “million” is spam.

Page 24:

What about this UW email?
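
The two handcrafted rules can be sketched in a few lines, which makes their weakness easy to see. This is only an illustration: the email texts are invented (the slides show the real emails as images), and `is_spam_by_rules` is a hypothetical helper name.

```python
# Handcrafted keyword rules in the spirit of Rule 1 and Rule 2.
# The keywords come from the slides; the emails are made-up examples.
SPAM_KEYWORDS = ("lottery", "million")

def is_spam_by_rules(email_text):
    """Flag an email as spam if it contains any handcrafted keyword."""
    text = email_text.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)

spam_email = "You won the Google lottery! Claim your prize now."
warning_email = "Warning: beware of fake lottery emails."
campus_email = "The university received a grant of one million dollars."

for email in (spam_email, warning_email, campus_email):
    print(is_spam_by_rules(email), "-", email)
```

All three emails are flagged, although only the first one is actually spam: keyword rules raise false alarms on legitimate emails that happen to contain the same words.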

Page 27:

ML Solution: learn detection rules from example emails.

spam

normal

Page 28:

Quiz

Q1: what are the applications of anomaly detection?

Q2: why do we need computers to help detect anomalies?

Q3: what’s wrong with handcrafted detection rules?

Page 29:

Quiz

Q1: what are the applications of anomaly detection?

A1: surveillance, cyber-security, fraudulent-transaction detection, health care, education, etc.

Q2: why do we need computers to help detect anomalies?

A2: the massive amount of data makes manual detection inefficient (or impossible)

Q3: what’s wrong with handcrafted detection rules?

A3: they are hard to design (they need domain knowledge) and hard to generalize

Page 30:

Outline

Background

Learning-based Detection Approaches

- preliminary: data representation and visualization
- six common anomaly detection approaches

Evaluation Metrics

Challenges

Page 31:

Preliminary 1: Data Representation

An example email is often represented by a vector (feature vector).

x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T

The example vector above is called the “bag-of-words” feature representation of a document.
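
A minimal sketch of how such a vector could be built. The vocabulary is the one on the slide; the email text is a made-up example, `bag_of_words` is a hypothetical helper, and the sketch uses the 0/1 (word present or absent) variant that the slide's vector suggests.

```python
# Vocabulary taken from the slide; the email text is invented.
VOCABULARY = ["google", "lottery", "cat", "email", "transport", "panda", "million"]

def bag_of_words(text, vocabulary=VOCABULARY):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text."""
    words = set(text.lower().split())
    return [1 if word in words else 0 for word in vocabulary]

email = "Google lottery winner reply to this email to claim one million dollars"
x = bag_of_words(email)
print(x)  # [1, 1, 0, 1, 0, 0, 1]
```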

Page 32:

Concepts: Feature, Label, Instance

Each element in the vector is a feature/attribute.

x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T

Page 33:

Concepts: Feature, Label, Instance

The target variable we want to detect is the label (different tasks have different labels).

spam

normal

Page 34:

Concepts: Feature, Label, Instance

In summary, an example email (or an instance) is a pair of a feature vector and a label.

x1 = [1, 1, 0, 1, 0, 0, 1, ..]T, spam

x2 = [1, 0, 1, 1, 0, 1, 0, ..]T, ham

This is the most common representation of an example. There are, of course, more complicated representations.

Page 35:

Other Examples of Feature Vector Representation

Image data represented as a vector.

Page 36:

Other Examples of Feature Vector Representation

Student data represented as a vector.

x = [# Steal, # Lie/Cheat, # Behavior Pro, # Peer Rej, ...]T = [0, 1, 2, 1, ...]T

Page 37:

We will repeatedly see example & label notations.

Page 38:

Preliminary 2: Data Visualization

An example is a vector in a high-dimensional space (feature space).

For easier interpretation, we often visualize examples in a 2D space.

x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T

(figure: examples plotted in 2D, with axes feature 1 and feature 2)

Page 39:

Two Common Strategies to Get 2D Space

1. Select two features from the pool (feature selection)

2. Project all features onto two new features (feature transformation)

x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T

(figure: 2D plot with axes feature 1 and feature 2)

Page 40:

Feature Projection

We can project all features onto a new feature using a projective vector w.

The projection onto the new feature is obtained by the inner product between w and x.

wT * x = [0.3, -1.2, 0.8, 0.23] * [1, 1, 0, 1]T = 0.3 - 1.2 + 0 + 0.23 = -0.67 (new feature)
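
The same inner product, written out as a small sketch. The numbers are the ones on the slide; `project` is a hypothetical helper name.

```python
def project(w, x):
    """Inner product w^T x: the value of the new (projected) feature."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

w = [0.3, -1.2, 0.8, 0.23]  # projective vector from the slide
x = [1, 1, 0, 1]            # feature vector from the slide

new_feature = project(w, x)
print(round(new_feature, 2))  # -0.67
```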

Page 41:

Feature Projection

To get two new features, we need two projective vectors w1 and w2.

feature 1 = w1T * x

feature 2 = w2T * x

Page 42:

Get Projective Vectors using PCA

Principal Component Analysis (PCA) is commonly used to get projective vectors.

(figure: data with principal directions w1 and w2; source: https://qiita.com/bmj0114/items/db9145a707cb6ed13201)
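
To make the idea concrete, here is a small sketch that finds w1 and w2 for 2-D data only, using the closed-form principal axis of the 2x2 covariance matrix. This is an illustrative shortcut, not the general PCA procedure: real feature vectors have many more dimensions and would normally be handled with a linear-algebra library. The data points and the name `pca_directions_2d` are assumptions made for the example.

```python
import math

def pca_directions_2d(points):
    """Return unit vectors (w1, w2) for 2-D points.

    w1 is the direction of largest variance; w2 is perpendicular to it.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mean_x) ** 2 for p in points) / n
    c = sum((p[1] - mean_y) ** 2 for p in points) / n
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Angle of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    w1 = (math.cos(theta), math.sin(theta))
    w2 = (-math.sin(theta), math.cos(theta))
    return w1, w2

# Made-up points lying roughly along the line y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
w1, w2 = pca_directions_2d(points)
print(w1, w2)
```

For these points w1 comes out close to the diagonal direction, which is where the data varies the most.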

Page 43:

We will repeatedly see data distribution in 2D feature space (by PCA).

Page 44:

Quiz

Recap: to design a spam email detection model, we can define the label as

- y = 1 for spam, y = 0 for ham

Q1: to design a fraudulent-transaction detection model, how should we define the label?

Q2: to design an at-risk student detection model, how should we define the label?

Page 45:

Quiz

Recap: to design a spam email detection model, we can define the label as

- y = 1 for spam, y = 0 for ham

Q1: to design a fraudulent-transaction detection model, how should we define the label?

A1: y = 1 for fraud, y = 0 for a normal transaction

Q2: to design an at-risk student detection model, how should we define the label?

A2: y = 1 for an at-risk student, y = 0 for a normal student

Page 46:

Outline

Background

Learning-based Detection Approaches

- preliminary: data representation and visualization
- six common anomaly detection approaches

Evaluation Metrics

Open Challenges

Page 47:

Learning-based Anomaly Detection Approaches

Classification-based

Clustering-based

Support Vector Data Descriptor (SVDD)

Statistics-based

Neighborhood-based

Spectral-based

Page 48:

1. Classification-based Approach

Learn a detection model to classify emails into spam and ham (i.e. normal email).

(diagram: email → model f → spam / ham)

Page 49:

How to learn model f?

(diagram: email → model f → spam / ham)

Step 1. Construct a model f with some unknown parameters.

Step 2. Estimate the parameters from data.

Page 50:

Example: learn a linear regression model

Step 1. Construct a linear regression model

f(x) = w0 + w1 * x·1 + w2 * x·2

- x·1 and x·2 are two features of example x (e.g. words “google” and “cat”)

- w0, w1, w2 are unknown parameters (w0 is called bias)

Page 51:

Example: learn a linear regression model

Step 2. Estimate w0, w1, w2 from examples x1, x2, x3, …, xn by solving

- xi is the ith example (e.g. the ith email)

- yi is the label of xi, and yi= 0 (ham) or 1 (spam)

Page 52:

Example: learn a linear regression model

The solution is where
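
The formulas on these two slides did not survive extraction. As a hedged reconstruction (an assumption, not necessarily what the slides showed), the standard least-squares formulation for a linear model $f(x) = w_0 + w_1 x_{\cdot 1} + w_2 x_{\cdot 2}$ and its closed-form solution would read:

```latex
\min_{w_0, w_1, w_2} \; \sum_{i=1}^{n} \bigl( w_0 + w_1 x_{i1} + w_2 x_{i2} - y_i \bigr)^2,
\qquad
\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}
```

Here $X$ is assumed to be the $n \times 3$ matrix whose $i$th row is $[1, x_{i1}, x_{i2}]$, $\mathbf{y} = [y_1, \dots, y_n]^\top$, $\mathbf{w} = [w_0, w_1, w_2]^\top$, and $X^\top X$ is assumed to be invertible.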

Page 53:

Example: apply model to classify email

A new email x = [x·1,x·2]T is first input to the model.

The result is then thresholded (by a proper value such as 0.5).

Many models can directly output 0 and 1, so we do not need to threshold their outputs.

Page 54:

Quiz

If the model has

- w0=0.5, w1=−0.1, w2=0.1

are the following emails spam or ham?

- x1 = [x·1,x·2]T = [1, 0]T

- x2 = [x·1,x·2]T = [0, 1]T

- x3 = [x·1,x·2]T = [1, 1]T

- x4 = [x·1,x·2]T = [0, 0]T

Recall: the detection rule is y=1 (spam) if f(x) > 0.5 and y=0 (ham) otherwise.

Page 58:

Quiz

If the model has

- w0=0.5, w1=−0.1, w2=0.1

are the following emails spam or ham?

- x1 = [x·1,x·2]T = [1, 0]T is ham, because f(x) = 0.5 - 0.1*1 + 0.1*0 = 0.4 < 0.5

- x2 = [x·1,x·2]T = [0, 1]T is spam, because f(x) = 0.5 - 0.1*0 + 0.1*1 = 0.6 > 0.5

- x3 = [x·1,x·2]T = [1, 1]T is ham, because f(x) = 0.5 - 0.1*1 + 0.1*1 = 0.5 ≤ 0.5

- x4 = [x·1,x·2]T = [0, 0]T is ham, because f(x) = 0.5 - 0.1*0 + 0.1*0 = 0.5 ≤ 0.5

Recall: the detection rule is y=1 (spam) if f(x) > 0.5 and y=0 (ham) otherwise.
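
The quiz can be checked with a few lines of code. This is a sketch under the assumption that the model is f(x) = w0 + w1*x·1 + w2*x·2 with a threshold of 0.5, as the worked answers suggest; `f` and `classify` are hypothetical helper names.

```python
from fractions import Fraction

# Weights from the quiz. Fraction keeps 0.5, -0.1 and 0.1 exact, so the
# comparison with the threshold is not disturbed by float rounding.
W0, W1, W2 = Fraction(1, 2), Fraction(-1, 10), Fraction(1, 10)
THRESHOLD = Fraction(1, 2)

def f(x):
    """Linear model f(x) = w0 + w1 * x.1 + w2 * x.2."""
    return W0 + W1 * x[0] + W2 * x[1]

def classify(x):
    """Return 'spam' (y=1) if f(x) > threshold, else 'ham' (y=0)."""
    return "spam" if f(x) > THRESHOLD else "ham"

emails = {"x1": (1, 0), "x2": (0, 1), "x3": (1, 1), "x4": (0, 0)}
for name, x in emails.items():
    print(name, float(f(x)), classify(x))
```

Exact fractions are used because x3 and x4 land exactly on the threshold, where ordinary floating-point arithmetic could tip the comparison either way.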

Page 60:

2. Clustering-based Approach

Group examples into clusters. Assume those far from their cluster centers are more likely to be anomalies.

Page 61:

Detection based on Anomalous Score

The algorithm outputs an anomalous score for each example, which indicates how likely the example is to be an anomaly. We can then threshold the scores to get the final detection.

(figure: three examples with anomalous scores a.s. = 0.8, a.s. = 0.4, a.s. = 0.1)

Page 62:

How to cluster examples?

K-means is one of the most common clustering algorithms.

- choose a number of clusters k (e.g. k=3)
- initialize k cluster centers (randomly)
- repeat until convergence
    - assign every example to its nearest cluster (the one with the nearest cluster center)
    - update each cluster center to the mean of its member examples
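
The steps above can be sketched as follows. This is a minimal illustration, not a production implementation: the data points are invented, and the initial centers are passed in explicitly (instead of being drawn at random) so that the run is reproducible. The distance to the assigned cluster center is then used as the anomalous score.

```python
import math

def kmeans(points, centers, max_iters=100):
    """Plain k-means; `centers` holds the k initial cluster centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (5, 5)]
centers, labels = kmeans(points, centers=[(0, 0), (9, 9)])

# Anomalous score: distance from each point to its own cluster center.
scores = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
most_anomalous = points[scores.index(max(scores))]
print(most_anomalous)  # (5, 5)
```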

Page 63:

A Demo of K-means Clustering Algorithm

Page 64:

Quiz (apply K-means Clustering with k=2)

Which example has the highest anomalous score? Which has the lowest?

(figure: scatter plot with marked examples x1, x2, x3)

Page 65:

Quiz (apply K-means Clustering with k=2)

A: first, the k-means clustering result is roughly as follows

(figure: the two clusters, with marked examples x1, x2, x3)

Page 66:

Quiz (apply K-means Clustering with k=2)

A: based on the clustering result, x1 is the farthest from its cluster center, so it has the highest anomalous score, and x3 is the closest to its cluster center, so it has the lowest score.

(figure: the two clusters, with marked examples x1, x2, x3)

Page 68:

3. Support Vector Data Descriptor (SVDD)

Learn a (smallest) normal region that encompasses all normal examples. Assume whatever falls outside the region is an anomaly.

Page 69:

Mathematical Model of One-Class SVM

First, assume a sphere with radius R encompasses all normal examples.

- the distance from a normal example to the center of the normal region is less than R

minimize … s.t. …

Page 70:

Mathematical Model of One-Class SVM

Then, find the center and smallest radius of such a sphere.

- find the sphere center and minimize the sphere radius

min … s.t. …
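
The optimization problem itself was lost in extraction. Assuming the slide follows the usual hard-margin SVDD formulation, and writing $\mathbf{c}$ (a symbol chosen here, not taken from the slide) for the sphere center, it can be sketched as:

```latex
\min_{R,\, \mathbf{c}} \; R^2
\quad \text{s.t.} \quad
\lVert \mathbf{x}_i - \mathbf{c} \rVert^2 \le R^2, \qquad i = 1, \dots, n
```

Practical SVDD variants usually add slack variables so that a few normal examples may fall outside the sphere; that extension is not shown here.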

Page 71:

Quiz

If [symbol] is a normal example, which examples will be detected as anomalies by SVDD?

(figure: normal examples and candidate examples A, B, C)

Page 72:

Quiz

A: the normal region roughly looks like the one below. B and C are outside it, so they are anomalies.

(figure: normal region with examples A, B, C)

Page 74:

4. Statistics-based Approach

Estimate a distribution over examples. Assume those drawn from the distribution with lower probability are more likely to be anomalies.

Page 75:

Exercise

What are the probabilities that a student attends class 1, 2, or 3 times?

Student    # Attendance
John       3
Nancy      2
Sam        2
Richard    1
Lily       3

p(x=3) =

p(x=2) =

p(x=1) =

Page 78:

Exercise

We can estimate probabilities by counting frequencies.

Student    # Attendance
John       3
Nancy      2
Sam        2
Richard    1
Lily       3

p(x=3) = 2 / 5 = 0.4

p(x=2) = 2 / 5 = 0.4

p(x=1) = 1 / 5 = 0.2

Page 79:

Exercise

Student    # Attendance
John       3
Nancy      2
Sam        2
Richard    1
Lily       3

p(x=3) = 2 / 5 = 0.4

p(x=2) = 2 / 5 = 0.4

p(x=1) = 1 / 5 = 0.2

Richard is more likely to be an abnormal (at-risk) student because he attends class 1 time, and p(x=1)=0.2 is smaller than the other probabilities.
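
The counting in this exercise can be sketched in a few lines. The attendance numbers are the ones from the table; the variable names are chosen for the example.

```python
from collections import Counter

# Attendance counts from the exercise table.
attendance = {"John": 3, "Nancy": 2, "Sam": 2, "Richard": 1, "Lily": 3}

counts = Counter(attendance.values())
n = len(attendance)
# Estimate p(x = v) by the relative frequency of value v.
prob = {value: count / n for value, count in counts.items()}

# The student whose attendance value has the lowest estimated probability.
most_suspicious = min(attendance, key=lambda name: prob[attendance[name]])
print(prob[3], prob[2], prob[1], most_suspicious)  # 0.4 0.4 0.2 Richard
```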

Page 80:

Quiz

Which student is most likely at-risk according to the statistics-based approach?

- let x be # peer rejection

Student  John  Lily  Sam  Nancy  Green  Susan  Peter  Rose  Jack  Lucy
x        0     1     0    2      1      0      3      1     2     0

Page 81:

Quiz

Which student is most likely at-risk according to the statistics-based approach?

- let x be # peer rejection

Student  John  Lily  Sam  Nancy  Green  Susan  Peter  Rose  Jack  Lucy
x        0     1     0    2      1      0      3      1     2     0

p(x=0) = 4/10 = 0.4

p(x=1) = 3/10 = 0.3

p(x=2) = 2/10 = 0.2

p(x=3) = 1/10 = 0.1, the lowest probability. Peter has x=3, so he is most likely at-risk.

Page 83:

5. Neighborhood-based Approach

Assume examples far from their neighbors are more likely to be anomalies.

Page 84:

Example: 2-nearest neighbor based approach

Only consider the two nearest neighbors of each example.

(figure: points A, B, C, D with distances A-B = 1, A-C = 1, B-C = 1, A-D = 2, C-D = 2)

Page 85:

Example: 2-nearest neighbor based approach

The total distance from A to its two nearest neighbors (B, C) is 1 + 1 = 2.

Example    Distance
A          2
B
C
D

Page 86:

Example: 2-nearest neighbor based approach

The total distance from B to its two nearest neighbors (A, C) is 1 + 1 = 2.

Example    Distance
A          2
B          2
C
D

Page 87:

Example: 2-nearest neighbor based approach

The total distance from C to its two nearest neighbors (A, B) is 1 + 1 = 2.

Example    Distance
A          2
B          2
C          2
D

Page 88:

Example: 2-nearest neighbor based approach

The total distance from D to its two nearest neighbors (A, C) is 2 + 2 = 4.

Example    Distance
A          2
B          2
C          2
D          4
Page 89:

Example: 2-nearest neighbor based approach

D is more likely to be an anomaly because it has the largest total distance to its two nearest neighbors.

Example    Distance
A          2
B          2
C          2
D          4
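
The same scoring can be sketched in code. The coordinates below are hypothetical (they do not reproduce the exact distances in the figure); they only mimic the situation where A, B and C are close together and D lies farther away. `knn_scores` is a name chosen for the example.

```python
import math

def knn_scores(points, k=2):
    """Anomalous score = total distance to the k nearest other points."""
    scores = {}
    for name, p in points.items():
        dists = sorted(math.dist(p, q) for other, q in points.items() if other != name)
        scores[name] = sum(dists[:k])
    return scores

# Hypothetical coordinates: A, B, C are mutually close, D is farther away.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (3.0, 3.0)}
scores = knn_scores(points, k=2)
print(max(scores, key=scores.get))  # D
```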

Page 90:

Quiz

Which example is most likely an anomaly based on the 2-nearest neighbor approach?

(figure: points A, B, C, D with edge lengths 1, 0.5, 1, 1, 1.5)

Example    Distance
A
B
C
D

Page 91:

Quiz

A: B is most likely an anomaly.

(figure: points A, B, C, D with edge lengths 1, 0.5, 1, 1, 1.5)

Example    Distance
A          2
B          2.5
C          1.5
D          1.5

Page 93:

6. Spectral-based Approach

Assume normal examples lie in a low-dimensional feature space, so they can be well reconstructed from that space. Anomalies cannot.

Example

Project the original feature vector into 2D space and reconstruct it.

original feature vector: (1, 1, 0, 1)
2D projection: (3.2, 0.2)
reconstruction: (0.9, 1.1, 0.1, 0.9)

Projection can be done by taking the inner product of the feature vector with a projective vector.

Example

Reconstruction error can be used as an anomaly score.

$(1, 1, 0, 1) - (0.9, 1.1, 0.1, 0.9) = (0.1, -0.1, -0.1, 0.1)$

$\text{error} = 0.1^2 + (-0.1)^2 + (-0.1)^2 + 0.1^2 = 0.04$
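
The error computation on this slide can be checked with a minimal Python sketch. The function name `reconstruction_error` is chosen here for illustration; the two vectors are the ones shown on the slide.

```python
def reconstruction_error(x, x_hat):
    """Sum of squared differences between a vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

x = [1, 1, 0, 1]              # original feature vector
x_hat = [0.9, 1.1, 0.1, 0.9]  # vector reconstructed from the 2D space

error = reconstruction_error(x, x_hat)
print(round(error, 4))  # 0.04
```

The result is rounded only because floating-point arithmetic leaves a tiny residue on values such as 0.1.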

Find Low-Dimensional Space using PCA

Principal Component Analysis (PCA) is commonly used to get projective vectors.

[Figure: data points with principal directions w1 and w2. Source: https://qiita.com/bmj0114/items/db9145a707cb6ed13201]

Example Result of PCA-based Approach

Abnormal network traffic flows have higher reconstruction errors.
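
A PCA-based detector can be sketched in pure Python for 2D data. This is only an illustration under stated assumptions: the six points are hypothetical (five lie on a line, one does not), the data is projected onto a single principal direction, and that direction is found by power iteration on the covariance matrix rather than by a library routine. It is not the method used to produce the network traffic result above.

```python
import math

def top_principal_direction(centered, iters=100):
    """Power iteration on the 2x2 covariance matrix; returns a unit vector."""
    n = len(centered)
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    w = (1.0, 0.0)
    for _ in range(iters):
        wx = cxx * w[0] + cxy * w[1]
        wy = cxy * w[0] + cyy * w[1]
        norm = math.hypot(wx, wy)
        w = (wx / norm, wy / norm)
    return w

def pca_errors(points):
    """Reconstruction error of each point after projecting onto one direction."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    w = top_principal_direction(centered)
    errors = []
    for x, y in centered:
        z = x * w[0] + y * w[1]      # projection: inner product with w
        rx, ry = z * w[0], z * w[1]  # reconstruction from the 1D space
        errors.append((x - rx) ** 2 + (y - ry) ** 2)
    return errors

# Hypothetical data: the last point does not follow the trend of the others.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (3, 0)]
errors = pca_errors(points)
suspect = errors.index(max(errors))
print(suspect, [round(e, 3) for e in errors])
```

The point with the largest reconstruction error is the one that does not fit the low-dimensional structure of the rest of the data.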

Outline

Background

Learning-based Detection Approaches

Evaluation Metrics
- detection error
- f1-score and AUC score

Challenges

Detection Error

The detection error of a model is the fraction of its mis-detected examples.

- e.g. mis-detect a normal example as anomaly

- e.g. mis-detect an anomaly as normal

Example: if there are 100 testing examples, and 10 of them are mis-detected, the detection error is 10/100 = 0.1.
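
The same arithmetic can be written as a small Python sketch. The label lists are hypothetical and are constructed so that 10 of 100 test examples are mis-detected (5 anomalies missed and 5 normal examples flagged); the function name `detection_error` is an illustrative choice.

```python
def detection_error(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical test set: 1 = anomaly, 0 = normal.
y_true = [1] * 10 + [0] * 90
# The first 5 anomalies are missed, and 5 normal examples are flagged as anomalies.
y_pred = [0] * 5 + [1] * 5 + [1] * 5 + [0] * 85

print(detection_error(y_true, y_pred))  # 0.1
```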

[Figure: a test set of 10 spam emails and 990 ham emails, with labels "normal" and "spam".]

What is the detection error of this model?

Confusion Matrix

                     actual positive (spam)   actual negative (ham)
predicted positive   True Positive (TP)       False Positive (FP)
predicted negative   False Negative (FN)      True Negative (TN)

Precision, Recall, F1-Score

Precision: the fraction of predicted positives that are truly positive, $TP / (TP + FP)$.

Recall: the fraction of actual positives that are predicted positive, $TP / (TP + FN)$.

F1-Score: the harmonic mean of precision and recall, $2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$.

Exercise

What is the confusion matrix of the detection model?

[Figure: 10 spam emails and 990 ham emails, with labels "normal" and "spam".]

                       actual pos (spam)   actual neg (ham)
predicted pos (spam)   TP = ?              FP = ?
predicted neg (ham)    FN = ?              TN = ?

Exercise

What is the confusion matrix of the detection model?

[Figure: 10 spam emails and 990 ham emails, with labels "normal" and "spam".]

                       actual pos (spam)   actual neg (ham)
predicted pos (spam)   TP = 0              FP = 0
predicted neg (ham)    FN = 10             TN = 990

Exercise

What are the precision, recall and f1-score?

                       actual pos (spam)   actual neg (ham)
predicted pos (spam)   TP = 0              FP = 0
predicted neg (ham)    FN = 10             TN = 990

Precision =

Recall =

F1-Score =

Exercise

What are the precision, recall and f1-score?

                       actual pos (spam)   actual neg (ham)
predicted pos (spam)   TP = 0              FP = 0
predicted neg (ham)    FN = 10             TN = 990

$\text{Precision} = \frac{TP}{TP + FP} = \frac{0}{0}$

$\text{Recall} = \frac{TP}{TP + FN} = \frac{0}{0 + 10}$

$\text{F1-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = ?$
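
The exercise can be worked through in a short Python sketch. The function name `precision_recall_f1` and the choice to return `None` for an undefined ratio are assumptions made for illustration; some libraries instead report 0 in this situation (for example, scikit-learn exposes a `zero_division` parameter for it).

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1); a value is None when it is undefined."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Confusion matrix from the exercise: every email is predicted as ham.
tp, fp, fn, tn = 0, 0, 10, 990

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
error = (fp + fn) / (tp + fp + fn + tn)
print(precision, recall, f1, error)  # None 0.0 None 0.01
```

The detection error is only 0.01 even though the model catches no spam at all, while recall is 0 and precision and F1 are undefined. This is the kind of flaw in detection error that precision, recall and F1 expose.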

Detection by Thresholding Anomaly Score

Many anomaly detection models output anomaly scores, and detection results are obtained by thresholding these scores.

Example   Anomaly Score   Threshold 0.5   Detection Result (1 = anomaly, 0 = normal)
A         0.8             0.8 > 0.5       1
B         0.3             0.3 < 0.5       0
C         0.6             0.6 > 0.5       1
D         0.2             0.2 < 0.5       0

[Figure: confusion matrix (TP, FP, FN, TN) and F1 score.]

Exercise

What are the detection results based on the following thresholds?

Example   Anomaly Score   Result (Threshold 0.5)   Result (Threshold 0.7)   Result (Threshold 0.25)
A         0.8             1
B         0.3             0
C         0.6             1
D         0.2             0

Exercise

Different thresholds can give different detection results, and thus different TP and FP.

Example   Anomaly Score   Result (Threshold 0.5)   Result (Threshold 0.7)   Result (Threshold 0.25)
A         0.8             1                        1                        1
B         0.3             0                        0                        1
C         0.6             1                        0                        1
D         0.2             0                        0                        0
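
The thresholding exercise can be reproduced with a minimal Python sketch. The scores are the ones from the table; the function name `detect` and the strict comparison `score > threshold` are illustrative assumptions.

```python
def detect(scores, threshold):
    """Label an example 1 (anomaly) if its score exceeds the threshold, else 0."""
    return {name: int(score > threshold) for name, score in scores.items()}

scores = {"A": 0.8, "B": 0.3, "C": 0.6, "D": 0.2}

for threshold in (0.5, 0.7, 0.25):
    print(threshold, detect(scores, threshold))
```

Raising the threshold flags fewer examples as anomalies, and lowering it flags more.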

ROC Curve

The ROC curve of a model shows its performance (true positive rate against false positive rate) under different thresholds.

Each point is the result of one threshold.

Area Under Curve (AUC) Score

The AUC score is the area under the ROC curve. A good model has a higher AUC score.
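
A ROC curve and its AUC score can be sketched in a few lines of Python. The scores reuse the values for examples A to D from the thresholding slides, but the true labels are hypothetical (the slides do not give them), and the sketch assumes all scores are distinct. The function names `roc_points` and `auc` are illustrative.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit examples from highest to lowest score; each step is one threshold.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [0.8, 0.3, 0.6, 0.2]  # anomaly scores of A, B, C, D
labels = [1, 1, 0, 0]          # hypothetical ground truth: 1 = anomaly

points = roc_points(scores, labels)
print(points, auc(points))
```

With these hypothetical labels the anomaly B scores lower than the normal example C, so the curve falls short of the top-left corner and the AUC is below 1.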

Summary

There are many metrics to evaluate the detection performance of a model.

Detection error is the most common metric but has many flaws.

The confusion matrix gives four numbers, which are hard to compare across models.

F1-score is a more robust measure but is based on a single threshold.

AUC score is the most robust of these measures; it integrates results over many thresholds.

Outline

Background

Learning-based Detection Approaches

Evaluation Metrics

Challenges

Challenges in Anomaly Detection

Contextual Anomaly Detection

Collective Anomaly Detection

Other Technical Challenges

Contextual Anomaly

Collective Anomaly

Exercise: Any Anomaly?

A customer is shopping on Amazon:

- object 1: steel ball bearings

- object 2: black powder/charcoal

- object 3: battery connectors

- …

A customer who bought the above items together could be a bomb-maker!


Other Technical Challenges

Hard to find a normal region.

Attackers may disguise anomalies.

Normal behavior may evolve over time.

Notion of anomaly is problem-dependent.

Not enough labeled data (especially, anomalous data).


Q & A?