Presenter: Lin, Shu-Han. Authors: Jeen-Shing Wang, Jen-Chieh Chiang
Intelligent Database Systems Lab
N.Y.U.S.T. I.M.
A cluster validity measure with a hybrid parameter search method
for the support vector clustering algorithm
Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang
PR (2008)
Outline
Introduction of SVC
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
SVC
SVC is derived from SVMs
SVMs are a supervised classification technique
  Fast convergence
  Good generalization performance
  Robustness to noise
SVC is an unsupervised approach
1. Data points are mapped to a high-dimensional feature space using a Gaussian kernel.
2. Look for the smallest sphere that encloses the data.
3. Map the sphere back to data space to form a set of contours.
4. The contours are treated as the cluster boundaries.
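The four steps above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: step 2 (the optimization that finds the minimal sphere) is not solved here, and uniform weights `beta_j = 1/N` stand in for the optimized multipliers, so the sphere is an enclosing one rather than the smallest one. The function names, the segment-sampling rule for assigning clusters, and the `samples` and `tol` parameters are assumptions for illustration.

```python
import math
from itertools import combinations

def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2)."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))

def make_sphere_distance(data, q):
    """Return r2(x): squared feature-space distance from x to the sphere center.

    ASSUMPTION: uniform weights beta_j = 1/N replace the optimized multipliers,
    so the center is the feature-space mean, not the minimal-sphere center.
    """
    beta = 1.0 / len(data)
    const = beta * beta * sum(gaussian_kernel(a, b, q) for a in data for b in data)

    def r2(x):
        cross = beta * sum(gaussian_kernel(p, x, q) for p in data)
        return gaussian_kernel(x, x, q) - 2.0 * cross + const

    return r2

def label_clusters(data, q, samples=9, tol=1e-12):
    """Group points whose connecting segment stays inside the sphere."""
    r2 = make_sphere_distance(data, q)
    radius2 = max(r2(p) for p in data)  # a sphere that encloses every point
    parent = list(range(len(data)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(data)), 2):
        a, b = data[i], data[j]
        inside = True
        for s in range(1, samples + 1):
            t = s / (samples + 1)
            y = [u + t * (v - u) for u, v in zip(a, b)]
            if r2(y) > radius2 + tol:  # segment leaves the sphere
                inside = False
                break
        if inside:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(data))]
```

With two well-separated groups of points and a moderate `q`, the points of each group share a label while the two groups get different labels. A full SVC would replace the uniform weights with the solution of the dual problem.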
SVC - Sphere Analysis
To find the minimal enclosing sphere with soft margin:
To solve this problem, introduce the Lagrangian function:
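For reference, a sketch of the standard soft-margin formulation from the SVC literature, which these slides are assumed to follow. Here Φ denotes the kernel-induced map, R the sphere radius, ξ_j the slack variables, and β_j, μ_j ≥ 0 the Lagrange multipliers; apart from a and C, these symbol names are assumptions.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C\sum_j \xi_j
\quad\text{s.t.}\quad \|\Phi(x_j)-a\|^2 \le R^2+\xi_j,\qquad \xi_j \ge 0

L = R^2 - \sum_j \bigl(R^2+\xi_j-\|\Phi(x_j)-a\|^2\bigr)\beta_j
    - \sum_j \xi_j\mu_j + C\sum_j \xi_j
```

Setting the derivatives of L with respect to R, a, and ξ_j to zero gives Σ_j β_j = 1, a = Σ_j β_j Φ(x_j), and β_j = C − μ_j.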
Karush-Kuhn-Tucker complementarity:
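In the standard SVC formulation, which these slides are assumed to follow, the complementarity conditions read as below. Here Φ is the kernel-induced map, R the sphere radius, a the center, ξ_j the slack variables, and β_j, μ_j the Lagrange multipliers (symbol names other than a are assumptions).

```latex
\xi_j\,\mu_j = 0,
\qquad
\bigl(R^2+\xi_j-\|\Phi(x_j)-a\|^2\bigr)\,\beta_j = 0
```

A point with ξ_j > 0 therefore has μ_j = 0 and lies outside the sphere, while a point with β_j = 0 lies inside or on the sphere.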
SVC - Sphere Analysis
To find the minimal enclosing sphere with soft margin:
C: controls how many outliers are allowed
Wolfe dual optimization problem:
Bounded SV: outlier
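A sketch of the Wolfe dual in its standard SVC form, assumed to match the slide. K is the kernel and β_j are the Lagrange multipliers; both symbol names are assumptions.

```latex
\max_{\beta}\; W = \sum_j \beta_j K(x_j,x_j) - \sum_{i,j}\beta_i\beta_j K(x_i,x_j)
\quad\text{s.t.}\quad 0 \le \beta_j \le C,\qquad \sum_j \beta_j = 1
```

In this form, points with 0 < β_j < C are support vectors on the sphere surface, and points with β_j = C are the bounded support vectors that lie outside the sphere.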
SVC - Sphere Analysis
The distance (similarity) between the image of x and the sphere center a:
q: controls the number of clusters and the smoothness/tightness of the cluster boundaries.
Mercer kernel: Gaussian
Gaussian function:
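For reference, the Gaussian kernel and the resulting kernel distance in their standard SVC form, assumed to match the slide. The multipliers β_j and the notation R²(x) are assumed symbol names.

```latex
K(x_i,x_j) = \exp\bigl(-q\,\|x_i-x_j\|^2\bigr)

R^2(x) = \|\Phi(x)-a\|^2
       = K(x,x) - 2\sum_j \beta_j K(x_j,x) + \sum_{i,j}\beta_i\beta_j K(x_i,x_j)
```

Since K(x,x) = 1 for the Gaussian kernel, R²(x) depends on x only through the weighted sum of kernel values, which is why a larger q produces tighter contours.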
Motivation
Drawbacks of cluster validation measures
Compactness
  Different densities or sizes
  Decreases monotonically as the # of clusters increases
Separation
  Irregular cluster structures
Motivation
Their previous study can handle
  Different sizes
  Different densities
  Arbitrary shapes
But…
Objectives – A cluster validity method and a parameter search algorithm for SVC
Automatically determine the two parameters:
  q: increasing q leads to an increasing # of clusters
  C: regulates the existence of outliers and overlapping clusters
Identify the optimal cluster structure
Methodology – Idea
q is related to the densities of the clusters
Each cluster structure corresponds to an interval of q
Identifying the optimal structure is equivalent to finding the largest interval
N = 64, max # of clusters = √N = 8
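The idea of picking the structure with the largest q interval can be illustrated with a minimal sketch. It assumes the cluster count has already been evaluated at a sorted list of q values and, as a simplification, treats "same cluster structure" as "same number of clusters". The function name and the input layout are hypothetical.

```python
def largest_constant_interval(samples):
    """Find the widest run of q values over which the cluster count is constant.

    samples: list of (q, n_clusters) pairs sorted by q (assumed input layout).
    Returns (q_low, q_high, n_clusters), or None for an empty list.
    """
    best = None
    start = 0
    for i in range(1, len(samples) + 1):
        # A run ends at the end of the list or when the cluster count changes.
        if i == len(samples) or samples[i][1] != samples[start][1]:
            q_low, q_high = samples[start][0], samples[i - 1][0]
            if best is None or (q_high - q_low) > (best[1] - best[0]):
                best = (q_low, q_high, samples[start][1])
            start = i
    return best
```

For example, if two clusters persist over the widest range of sampled q values, that range and the count 2 are returned.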
Methodology – Problem
How to locate the overall search range of q
How to detect outliers/noise
How to identify the largest interval
Methodology – Locate range of q
Lower bound
Upper bound: employ K-means to obtain clusters, then compute the variance vi of each cluster
N
Sort the clusters in ascending order of size
n = 3: use the variances of the 3 largest clusters
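The variance step can be sketched as follows, assuming the K-means labels are already available. The slide does not show how the variances are turned into the upper bound of q, so that step is left out. The function names are hypothetical, and variance is taken here to mean the mean squared Euclidean distance to the cluster centroid, which is an assumption.

```python
def cluster_variance(cluster):
    """Mean squared Euclidean distance of the points to their centroid."""
    dim = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    total = sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in cluster)
    return total / len(cluster)

def variances_of_largest_clusters(points, labels, n=3):
    """Return the variances of the n largest clusters, in ascending size order.

    points: list of coordinate tuples; labels: cluster label per point
    (for example from K-means, assumed to be computed elsewhere).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    ordered = sorted(clusters.values(), key=len)  # ascending cluster size
    return [cluster_variance(c) for c in ordered[-n:]]
```

Small clusters are ignored on purpose: only the n biggest clusters contribute a variance.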
Methodology – Outlier Detection
Set q = qmax, the tightest value of q
outlier, singleton
This yields Copt; remove these outliers
Methodology – The largest interval
qopt
Fibonacci search: locate the interval where the cluster structure is the same
Bisection search
n: number of iterations
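A minimal sketch of how a bisection search could narrow down the boundary of such an interval. It assumes the cluster count does not decrease as q grows, and it does not implement the Fibonacci search. The function name and the `count_clusters` callback are hypothetical; in practice the callback would run SVC at the given q.

```python
def bisect_boundary(count_clusters, q_low, q_high, iterations=20):
    """Bracket the q value at which the cluster count first changes.

    count_clusters: callable q -> number of clusters (hypothetical callback).
    Returns (q_low, q_high) bracketing the change, or None if the count
    is the same at both ends of the starting range.
    """
    base = count_clusters(q_low)
    if count_clusters(q_high) == base:
        return None
    for _ in range(iterations):
        mid = 0.5 * (q_low + q_high)
        if count_clusters(mid) == base:
            q_low = mid   # structure unchanged: boundary is to the right
        else:
            q_high = mid  # structure changed: boundary is to the left
    return q_low, q_high
```

After n iterations the bracket is 2^n times narrower than the starting range, so the number of iterations sets the precision of the boundary.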
Methodology – Overview
1. Locate the range of q
2. Outlier detection
3. Identify the largest interval
Experiments - Benchmark and Artificial Examples
Experiments - Outlier
Copt
Experiments
Conclusions
A new measure: inspired by observations of q
Determines the optimal cluster structure with its corresponding range of q and C
Comments
Advantage: inspired by observation of the parameters
Drawback: …
Application: SVC; DBSCAN (MinPts / Eps)