Outlier Selection and One-Class Classification by Jeroen Janssens
Transcript of Outlier Selection and One-Class Classification by Jeroen Janssens
Outlier Selection and One-Class Classification
Jeroen Janssens @jeroenhjanssens
Overview
• Anomalies and outliers
• Stochastic Outlier Selection
• Experiments and results
Anomalies and outliers
Definition (Anomaly)
An anomaly is an observation or event that deviates qualitatively from what is considered to be normal, according to a domain expert.
Detecting anomalies is important
• Expensive
• Dangerous
• Mess up your model
Human anomaly detection may suffer from
• Fatigue
• Information overload
• Emotional bias
Computers work with numbers

[Figure: scatter plot of height vs. width for apples and oranges]

Observations become data points:

width  height  label
8.4    7.3     apple
6.7    7.1     orange
8.0    6.8     apple
7.4    7.2     apple
9.6    9.2     orange
From anomaly to outlier

[Figure: rate of turn vs. speed over ground; one anomalous vessel stands out among the normal vessels]
Definition (Outlier)
An outlier is a data point that deviates quantitatively from the majority of the data points, according to an outlier-selection algorithm.
Three standard deviations

[Figure: Gaussian with mean μ and ±1σ, ±2σ, ±3σ marked; data points more than 3σ from the mean are selected as outliers, the rest as inliers]
Euler diagrams: from observations to classification outcomes

(a) X: the set of real-world observations.
(b) A ⊆ X: observations labeled by the expert as anomalous; X ∖ A: labeled as normal.
(c) D ⊆ X: observations recorded as data points; X ∖ D: unrecorded.
(d) CA = D ∩ A: anomalies represented as data points; CN = D ∖ A: normalities represented as data points.
(e) CO: data points classified by the algorithm as outliers; CI = D ∖ CO: classified as inliers.
(f) Outcomes: hits H = CA ∩ CO, misses M = CA ∩ CI, false alarms FA = CN ∩ CO, correct rejects CR = CN ∩ CI.
Confusion matrix

The expert labels the observation as an anomaly (CA) or a normality (CN); the algorithm classifies the data point as an outlier (CO) or an inlier (CI):

              Anomaly (CA)   Normality (CN)
Outlier (CO)  hit (H)        false alarm (FA)
Inlier (CI)   miss (M)       correct reject (CR)
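The confusion matrix above is just set intersections over the expert's labels and the algorithm's classifications. A minimal sketch (not from the talk; the six numbered data points and their labels are hypothetical):

```python
def confusion(CA, CN, CO, CI):
    # Confusion outcomes as set intersections, in the slide's notation:
    # hits H = CA ∩ CO, false alarms FA = CN ∩ CO,
    # misses M = CA ∩ CI, correct rejects CR = CN ∩ CI.
    return {
        "hits": CA & CO,
        "false_alarms": CN & CO,
        "misses": CA & CI,
        "correct_rejects": CN & CI,
    }

# Hypothetical data points 1..6: expert labels vs. algorithm output.
CA, CN = {5, 6}, {1, 2, 3, 4}   # expert: anomalies / normalities
CO, CI = {4, 6}, {1, 2, 3, 5}   # algorithm: outliers / inliers
out = confusion(CA, CN, CO, CI)
```

Point 6 is a hit, point 4 a false alarm, point 5 a miss, and the rest correct rejects.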
[Figure: data set (data points) → labels by the expert (anomaly / normality) → classifications by the algorithm (inlier / outlier) → outcome (hit, false alarm, miss, correct reject)]
Stochastic Outlier Selection
Stochastic Outlier Selection
• Unsupervised outlier selection algorithm
• Employs concept of affinity
• Computes outlier probabilities
• One parameter: perplexity
• Inspired by t-SNE
A data point is selected as an outlier when all the other data points have insufficient affinity with it.
From input to output

[Figure: SOS pipeline] SOS transforms the input matrix X (n data points, m features) into a dissimilarity matrix D, then an affinity matrix A, then a binding probability matrix B (all n × n), and finally a vector Φ of n outlier probabilities (subsections 4.2.1, 4.2.2, 4.2.3, and 4.2.4–6).
Demo: SOS on the command-line
(see http://sos.jeroenjanssens.com)
From feature matrix to dissimilarity matrix

[Figure: six data points x1…x6 plotted by their first feature x1 and second feature x2; e.g., d2,6 ≡ d6,2 = 5.056. The feature matrix X becomes the dissimilarity matrix D.]

Equation 2.1:

d_ij = sqrt( Σ_{k=1}^{m} (x_jk − x_ik)² )
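Equation 2.1 is the pairwise Euclidean distance. A minimal sketch in Python (not the talk's code; the feature matrix X below is a hypothetical example):

```python
import math

def dissimilarity_matrix(X):
    # Equation 2.1: d_ij = sqrt(sum_k (x_jk - x_ik)^2)
    n, m = len(X), len(X[0])
    return [[math.sqrt(sum((X[j][k] - X[i][k]) ** 2 for k in range(m)))
             for j in range(n)]
            for i in range(n)]

# Hypothetical feature matrix: two features per data point.
X = [[8.4, 7.3], [6.7, 7.1], [8.0, 6.8], [7.4, 7.2], [9.6, 9.2]]
D = dissimilarity_matrix(X)
```

D is symmetric with a zero diagonal, since d_ij ≡ d_ji and d_ii = 0.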
Affinity between data points

[Figure: affinity a_ij as a function of dissimilarity d_ij, for σ_i² = 0.1, 1, and 10. The dissimilarity matrix D becomes the affinity matrix A.]

Equation 4.2:

a_ij = exp( −d_ij² / (2σ_i²) )   if i ≠ j
a_ij = 0                         if i = j
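Equation 4.2 as a sketch, assuming the dissimilarity matrix D and the per-point variances σ_i² are already given (SOS actually sets the variances adaptively via the perplexity parameter, shown later):

```python
import math

def affinity_matrix(D, sigma2):
    # Equation 4.2: a_ij = exp(-d_ij^2 / (2 sigma_i^2)) for i != j, a_ii = 0.
    # sigma2[i] is the variance sigma_i^2 of data point x_i.
    n = len(D)
    return [[0.0 if i == j else math.exp(-D[i][j] ** 2 / (2.0 * sigma2[i]))
             for j in range(n)]
            for i in range(n)]

# Hypothetical dissimilarities and per-point variances.
D = [[0.0, 2.0, 5.0],
     [2.0, 0.0, 4.0],
     [5.0, 4.0, 0.0]]
A = affinity_matrix(D, sigma2=[1.0, 1.0, 10.0])
```

Note that A need not be symmetric: a_ij uses σ_i², while a_ji uses σ_j².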
Smooth neighborhoods

[Figure: each data point x_i has its own variance σ_i², giving every point a smooth, adaptively sized neighborhood in the two-feature space]
From affinity to binding probability

[Figure: the affinity matrix A becomes the binding probability matrix B]

Equations 4.5 and 4.6:

b_ij = p(i → j ∈ E_G) ∝ a_ij
b_ij = a_ij / Σ_{k=1}^{n} a_ik
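Equation 4.6 is a row normalization. A minimal sketch (the affinity matrix below is hypothetical):

```python
def binding_matrix(A):
    # Equation 4.6: b_ij = a_ij / sum_k a_ik.
    # Each row of B becomes a probability distribution over the
    # outgoing edges of one vertex in the stochastic neighbor graph.
    return [[a_ij / sum(row) for a_ij in row] for row in A]

# Hypothetical affinity matrix (zero diagonal, as in Equation 4.2).
A = [[0.0, 0.8, 0.2],
     [0.5, 0.0, 0.5],
     [0.9, 0.1, 0.0]]
B = binding_matrix(A)
```

Every row of B sums to one, which is exactly what lets each vertex pick one outgoing edge at random.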
Binding probabilities

[Figure: four directed graphs (a)–(d) over vertices v1…v6, each showing the binding probabilities b_ij out of one data point as edge weights, next to the corresponding row of the matrix B]
Stochastic Neighbor Graph

G = (V, E_G)

p(G) = ∏_{i→j ∈ E_G} b_ij

C_O|G = {x_i ∈ X | deg⁻_G(v_i) = 0}
      = {x_i ∈ X | ∄ v_j ∈ V : j → i ∈ E_G}
      = {x_i ∈ X | ∀ v_j ∈ V : j → i ∉ E_G}
[Figure: three sampled SNGs over v1…v6]

(Ga) p(Ga) = 3.931 · 10⁻⁴, C_O|Ga = {x1, x4, x6}
(Gb) p(Gb) = 4.562 · 10⁻⁵, C_O|Gb = {x5, x6}
(Gc) p(Gc) = 5.950 · 10⁻⁷, C_O|Gc = {x1, x3}
Set of all SNGs

[Figure: probability mass and cumulative probability mass over the set 𝔾 of all SNGs, with Ga, Gb, and Gc marked, for perplexity h = 4.0, 4.5, and 5.0]
Approximating outlier probabilities by sampling SNGs

[Figure: estimated outlier probabilities of x1…x6 converging over sampling iterations 10⁰ to 10⁶]

p(x_i ∈ C_O) = lim_{S→∞} (1/S) Σ_{s=1}^{S} 1{x_i ∈ C_O | G⁽ˢ⁾},   G⁽ˢ⁾ ∼ p(G)
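The sampling estimator above can be sketched as follows (not the talk's implementation; the binding probability matrix is hypothetical):

```python
import random

def sample_outlier_probabilities(B, samples=20000, seed=0):
    # Monte Carlo estimate of p(x_i in C_O): in each sampled SNG every
    # vertex i picks exactly one outgoing edge, choosing target j with
    # probability b_ij; x_i counts as an outlier in that graph when its
    # in-degree is zero.
    rng = random.Random(seed)
    n = len(B)
    vertices = list(range(n))
    counts = [0] * n
    for _ in range(samples):
        in_degree = [0] * n
        for i in range(n):
            j = rng.choices(vertices, weights=B[i])[0]
            in_degree[j] += 1
        for i in range(n):
            if in_degree[i] == 0:
                counts[i] += 1
    return [c / samples for c in counts]

# Hypothetical binding probability matrix (each row sums to 1).
B = [[0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4],
     [0.5, 0.5, 0.0]]
estimates = sample_outlier_probabilities(B)
```

As the number of samples grows, the estimates approach the closed-form values ∏_{j≠i}(1 − b_ji) derived later.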
Demo: Sampling SNGs in CoffeeScript and D3 (see http://sos.jeroenjanssens.com)
Computing outlier probabilities through marginalisation

p(x_i ∈ C_O) = Σ_{G ∈ 𝔾} 1{x_i ∈ C_O | G} · p(G)
             = Σ_{G ∈ 𝔾} 1{x_i ∈ C_O | G} · ∏_{q→r ∈ E_G} b_qr

|𝔾| = (n − 1)ⁿ, so explicit marginalisation quickly becomes intractable.
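For a tiny n the marginalisation can be carried out exactly by enumerating all (n − 1)ⁿ graphs; a sketch (hypothetical B, for illustration only):

```python
from itertools import product

def exact_outlier_probabilities(B):
    # Marginalise over all (n-1)^n SNGs: each vertex picks one outgoing
    # edge (never to itself); p(G) is the product of the chosen binding
    # probabilities, and x_i is an outlier in G when its in-degree is 0.
    n = len(B)
    probs = [0.0] * n
    choices = [[j for j in range(n) if j != i] for i in range(n)]
    for targets in product(*choices):
        pG = 1.0
        for i, j in enumerate(targets):
            pG *= B[i][j]
        in_degree = [0] * n
        for j in targets:
            in_degree[j] += 1
        for i in range(n):
            if in_degree[i] == 0:
                probs[i] += pG
    return probs

# Hypothetical binding probability matrix (each row sums to 1).
B = [[0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4],
     [0.5, 0.5, 0.0]]
phi = exact_outlier_probabilities(B)
```

Because each vertex samples its edge independently, this brute-force sum agrees exactly with the closed form ∏_{j≠i}(1 − b_ji) shown next.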
Computing outlier probabilities in closed form

[Figure: from the binding probability matrix B directly to the outlier probability vector Φ]

Equation 4.14:

p(x_i ∈ C_O) = ∏_{j≠i} (1 − b_ji)
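Equation 4.14 in code: the probability that no other point binds to x_i (a sketch with a hypothetical B):

```python
def outlier_probabilities(B):
    # Equation 4.14: p(x_i in C_O) = prod_{j != i} (1 - b_ji).
    # x_i is an outlier exactly when no other point binds to it, and the
    # binding choices of the other points are independent.
    n = len(B)
    probs = []
    for i in range(n):
        p = 1.0
        for j in range(n):
            if j != i:
                p *= 1.0 - B[j][i]   # note the transpose: column i of B
        probs.append(p)
    return probs

# Hypothetical binding probability matrix (each row sums to 1).
B = [[0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4],
     [0.5, 0.5, 0.0]]
phi = outlier_probabilities(B)
```

This replaces the (n − 1)ⁿ-term marginalisation with an O(n²) product.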
x_i ∈ C_O | G ⟺ deg⁻_G(v_i) = 0                         (1)
p(x_i ∈ C_O) = E_G[ 1{deg⁻_G(v_i) = 0} ]                 (2)
p(x_i ∈ C_O) = E_G[ ∏_{j≠i} 1{j → i ∉ E_G} ]             (3)
p(x_i ∈ C_O) = E_G[ ∏_{j≠i} (1 − 1{j → i ∈ E_G}) ]       (4)
p(x_i ∈ C_O) = ∏_{j≠i} (1 − E_G[ 1{j → i ∈ E_G} ])       (5)
p(x_i ∈ C_O) = ∏_{j≠i} (1 − p(j → i ∈ E_G))              (6)
p(x_i ∈ C_O) = ∏_{j≠i} (1 − b_ji)                         (7)

Step (5) moves the expectation inside the product, which is valid because each vertex samples its outgoing edge independently.
Selecting outliers

[Figure: six data points; those with outlier probability above the threshold are selected as outliers, the rest as inliers]

f(x) = outlier   if p(x ∈ C_O) > θ
f(x) = inlier    if p(x ∈ C_O) ≤ θ
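The final classification step is a simple threshold on the outlier probabilities; a sketch (the probabilities below are hypothetical):

```python
def select_outliers(phi, theta):
    # f(x) = outlier if p(x in C_O) > theta, inlier otherwise.
    return ["outlier" if p > theta else "inlier" for p in phi]

# Hypothetical outlier probabilities for six data points.
phi = [0.30, 0.25, 0.28, 0.31, 0.27, 0.98]
labels = select_outliers(phi, theta=0.5)
```

Because θ is a probability threshold, it has the same interpretation on every dataset, unlike a raw outlier score.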
Adaptive variances via the perplexity parameter
[Figure: perplexity h(b_i) as a function of the variance σ²_i for points x1–x6; the level h = 4.5 is reached at a different variance for each point.]
$$h(b_i) = 2^{H(b_i)}, \qquad H(b_i) = -\sum_{\substack{j=1 \\ j \neq i}}^{n} b_{ij} \log_2(b_{ij})$$
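The perplexity can be read as a smooth count of effective neighbours; a short sketch (not the talk's code) that computes h(b_i) for one row of binding probabilities:

```python
import numpy as np

def perplexity(b_i):
    """h(b_i) = 2**H(b_i), with H the base-2 Shannon entropy of the
    binding probabilities of one point (zero entries contribute nothing)."""
    b = b_i[b_i > 0]
    return 2.0 ** (-np.sum(b * np.log2(b)))

# a uniform distribution over k neighbours has perplexity exactly k
print(perplexity(np.full(5, 0.2)))  # 5.0 (up to floating-point error)
```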
Continuous binary search
[Figure: left, the adaptive variances σ²_i found for points x1–x6; right, the perplexity h(b_i) converging to the target over ten binary search iterations.]
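Because the perplexity increases monotonically with the variance, each σ²_i can be found by bisection. A self-contained sketch under assumed Gaussian affinities (the distances and the target h = 4.5 are made up):

```python
import numpy as np

def binding_probs(dists_i, var):
    """Gaussian affinities of one point to the others, normalised."""
    a = np.exp(-dists_i ** 2 / (2.0 * var))
    return a / a.sum()

def perplexity(b):
    """h(b) = 2**H(b) with H the base-2 entropy of the probabilities."""
    b = b[b > 0]
    return 2.0 ** (-np.sum(b * np.log2(b)))

def find_variance(dists_i, h_target, tol=1e-5, max_iter=100):
    """Continuous binary search: perplexity grows monotonically with the
    variance, so bisect until h(b_i) is within tol of the target."""
    lo, hi = 1e-10, 1e10
    var = 1.0
    for _ in range(max_iter):
        h = perplexity(binding_probs(dists_i, var))
        if abs(h - h_target) < tol:
            break
        if h > h_target:
            hi = var
        else:
            lo = var
        var = (lo + hi) / 2.0
    return var

# hypothetical distances from x_i to five other points
d = np.array([1.0, 1.5, 2.0, 4.0, 8.0])
var = find_variance(d, h_target=4.5)
print(perplexity(binding_probs(d, var)))  # approximately 4.5
```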
Perplexity influences outlier probabilities
[Figure: outlier probability of points x1–x6 as a function of the perplexity h.]
Experiments and results
[Figure: outlier scores of KNNDD (k = 10), LOF (k = 10), LOCI, SOS (h = 10), and LSOD on the Banana, Densities, and Ring datasets (panels A–J).]
One-class datasets
[Figure: the Iris flower data set D_M plotted by sepal length and petal width, with classes C1 = Setosa, C2 = Versicolor, and C3 = Virginica.]

Three one-class datasets are constructed from it:

• D1: C_A = Versicolor ∪ Virginica, C_N = Setosa
• D2: C_A = Setosa ∪ Virginica, C_N = Versicolor
• D3: C_A = Setosa ∪ Versicolor, C_N = Virginica
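Constructing such a one-class dataset amounts to relabelling: one class becomes the normality, the union of the others the anomaly. A minimal sketch with a made-up label vector (not the actual Iris data):

```python
import numpy as np

# made-up class labels for a multi-class data set D_M
# (0 = Setosa, 1 = Versicolor, 2 = Virginica)
y = np.array([0, 0, 1, 1, 2, 2])

def one_class_split(y, normal_class):
    """The chosen class becomes the normality C_N (True); the union of
    the remaining classes becomes the anomaly C_A (False)."""
    return y == normal_class

# D1: C_N = Setosa, C_A = Versicolor U Virginica
is_normal_d1 = one_class_split(y, 0)
print(is_normal_d1)  # [ True  True False False False False]
```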
Weighted AUC
[Figure: left, outlier scores of anomalous (C_A, treated as C_O) and normal (C_N, treated as C_I) examples, where the threshold θ′ = 0.57 separates hits, false alarms, misses, and correct rejects; right, the resulting ROC curve of hit rate against false alarm rate.]
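The AUC itself has a rank interpretation: it is the probability that a random anomaly receives a higher outlier score than a random normality. A plain (unweighted) AUC sketch with made-up scores; the weighting used in the talk is not reproduced here:

```python
import numpy as np

def auc(scores_anom, scores_norm):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (anomaly, normality) pairs ranked correctly (ties count half)."""
    total = 0.0
    for a in scores_anom:
        for n in scores_norm:
            total += 1.0 if a > n else (0.5 if a == n else 0.0)
    return total / (len(scores_anom) * len(scores_norm))

# hypothetical outlier scores for anomalous (C_A) and normal (C_N) examples
c_a = np.array([0.9, 0.8, 0.4])
c_n = np.array([0.3, 0.2, 0.6])
print(auc(c_a, c_n))  # 8/9, approximately 0.889
```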
Real-world datasets
[Figure: weighted AUC of SOS, KNNDD, LOF, LOCI, and LSOD on 18 real-world datasets: Iris, Ecoli, Breast W. Original, Delft Pump, Waveform Generator, Wine Recognition, Colon Gene, BioMed, Vehicle Silhouettes, Glass Identification, Boston Housing, Heart Disease, Arrhythmia, Haberman's Survival, Breast W. New, Liver Disorders, SPECTF Heart, and Hepatitis.]
Synthetic datasets
Data set  Determines λ                          λstart   step size   λend
(a)       Radius of ring                        5        0.1         2
(b)       Cardinality of cluster and ring       100      5           5
(c)       Distance between clusters             4        0.1         0
(d)       Cardinality of one cluster and ring   0        5           45
(e)       Density of one cluster and ring       1        0.05        0
(f)       Radius of square ring                 2        0.05        0.8
(g)       Radius of square ring                 2        0.05        0.8
[Figure: the seven synthetic datasets (a)–(g) shown at λstart, λinter, and λend, with anomalies and normalities marked.]
Results on synthetic datasets
[Figure: AUC of SOS, KNNDD, LOF, LOCI, and LSOD as a function of λ on the synthetic datasets (a)–(g).]
SOS performs significantly better
[Figure: critical difference diagrams ranking SOS, KNNDD, LOF, LOCI, and LSOD at p < .05 and p < .01; SOS is ranked first in both.]
Conclusion

• Outlier selection can support the detection of anomalies

• SOS is an intuitive, probabilistic algorithm for selecting outliers

• SOS performs very well

• No free lunch
Outlier Selection and One-Class Classification

Jeroen Janssens @jeroenhjanssens