Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process...
Transcript of Outlier Detection with Dirichlet Process Mixtures · Outlier Detection with Dirichlet Process...
Outlier Detection with Dirichlet Process Mixtures
Matthew S. Shotwell1 Elizabeth H. Slate2
Vanderbilt University1
Medical University of South Carolina2
August 3, 2011
Dirichlet Process Mixture (DPM)
yi|θi ∼ L(θi; yi) i = 1 .. nθi ∼ GG ∼ DP (α,G0)
I DP is a distribution over distributions
I G is discrete ⇒ P (θj = θk) > 0
I if θj = θk, then yj and yk are clustered
Product Partition Model (PPM)
yi|zi = k, φk ∼ L(φk; yi) i = 1 .. nφk ∼ G0(φk) k = 1 .. r
P (z) ∝r∏
k=1
αΓ(nk)
I z is the data partition parameter
I estimating z ‘partitions’, or ‘clusters’ the data
I if zj = zk, then yj and yk are clustered
I [Hartigan, 1990]
Outlier Detection Using PartitioningSteps:
1. set a “small” cluster threshold (e.g. 1% of n)2. estimate the data partition (i.e. cluster the data)3. “small” clusters are considered outlying
I an outlier partition contains one or more small outlier clusters
Histogram
Den
sity
0 5 10
0.0
0.1
0.2
0.3
Quantifying Evidence to Detect Outliers: Questions
If the data partition (z) is estimated, and outlier clustersare discovered, how much evidence suggests that theseclusters are truely different from the others?
Can the partition estimate be restricted such that aminimum level of evidence is required to identify outlierclusters? Yes!
A Criterion for Outlier Detection: Setup
Consider an outlier partition zo (n = 10):
zo = [1, 1, 1, 1, 1, 1, 1, 1, 2, 3]
zm1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 3]
zm2 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 1]
zm3 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
zm4 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I zm· are formed by merging the outliers in zo.
I outlier detection is a decision between zo and zm·.
I denote the collection zm· as Mo
A Criterion for Outlier Detection: The Trick
zo is favored if, for all zm ∈Mo
P (zo|y) > P (zm|y)
P (y|zo)P (y|zm)
>P (zm)
P (zo)
BFom >P (zm)
P (zo)
BFom >1
ανβom
I ν is the number of clusters merged to arrive at zm
I βom (a ratio involving Γ(·)) is always ≥ 1 for zm ∈Mo
I to favor zo, BFom must exceed 1αν
I BFom must increase 1/α fold for each outlier
A Criterion for Outlier Detection: How to Fix α
I set the criteria by fixing α
I use Jeffrey’s scale of evidence for Bayes factors
I [Efron and Gous, 2001]
Evidence for zo1/α < 1 negative
1 ≤ 1/α < 3 barely worth a mention3 ≤ 1/α < 20 positive
20 ≤ 1/α < 150 strong150 ≤ 1/α very strong
A Criterion for Outlier Detection: Nice Properties
MAP partition estimates automatically satisfy thecriterion for fixed α. Hence, no special or novelcomputational methods are required.
Because the DPM accommodates any data likelihood,outlier detection with Dirichlet process mixtures ispossible with any statistical model that specifies alikelihood function.
Microarray Time Series in Cell Cycle Synchronized YeastI [Spellman et al., 1998]I 66 minute period, 2 cycles
G1 Phase
Time (min)
−2
−1
0
1
2
0 20 40 60 80 100 120
Norm
aliz
ed E
xpre
ssio
n
Microarray Time Series in Cell Cycle Synchronized Yeast
α = 1150
G1 Phase
Time (min)
Nor
mal
ized
Exp
ress
ion
−2
−10
12
20 60 100
n=107 n=81
20 60 100
n=20 n=6
20 60 100
n=16
n=11 n=4 n=4 n=6
−2−10
12n=10
−2−1
012 n=5 n=1 n=2 n=4 n=1
n=3 n=4 n=2 n=2
−2−1
012n=1
−2−1
01
2 n=1
20 60 100
n=2 n=2
20 60 100
n=1 n=1
MAP Estimation for z
I Agglomeration [Ward, 1963]
I Polya Urn Gibbs Sampler [MacEachern, 1994]
I Split-Merge Sampler [Jain and Neal, 2004]
I SUGS [Wang and Dunson, 2010]
I sampling is overkill for MAP estimationI we proposed a stochastic algorithm:
I consists of ‘Explode’ and ‘Merge’ stepsI consistent for the MAP estimateI avoids complexity of samplingI facilitates parallel search of partition space
I R package profdpm
Outlier Detection with Finite Mixtures
I [Fraley and Raftery, 2002]
I select z that maximizes the BIC
I requires BFom > nρ2ν
I i.e. BFom must increase nρ2 fold for each outlier
I DPM outlier detection is generally more conservative.
Efron, B. and Gous, A. (2001).
Scales of evidence for model selection: Fisher versus jeffreys.In Lahiri, P., editor, Model Selection: Lecture Notes–Monograph Series, volume 38, pages 208–246.Institute of Mathematical Statistics, Beachwood, OH.
Fraley, C. and Raftery, A. E. (2002).
Model-based clustering, descriminant analysis, and density estimation.Journal of the American Statistical Association, 97:611–631.
Hartigan, J. A. (1990).
Partition models.Communications in Statistics, Theory and Methods, 19:2745–2756.
Jain, S. and Neal, R. M. (2004).
A split-merge markov chain monte carlo procedure for the dirichlet process mixture model.Journal of Computational and Graphical Statistics, 13(1):158–182.
MacEachern, S. N. (1994).
Estimating normal means with a conjugate style dirichlet process prior.Communications in Statistics B, 23:727–741.
Spellman, P. T., Sherlock, G., Zhang, M. Q. Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein,
D., and Futcher, B. (1998).Comprehensive identification of cell cycle-regulated genes of the yeast saccharomyces cerevisiae bymicroarray hybridization.Molecular Biology of the Cell, 9:3273–3297.
Wang, L. and Dunson, D. B. (2010).
Fast bayesian inference in dirichlet process mixture models.Journal of Computational and Graphical Statistics, In Press.
Ward, J. H. (1963).
Hierarchical grouping to optimize an objective function.Journal of the American Statistical Association, 58(301):236–244.