Support Vector Clustering Algorithm


Transcript of Support Vector Clustering Algorithm

Page 1: Support Vector Clustering Algorithm

Support Vector Clustering Algorithm

Presentation by: Jialiang Wu

Page 2: Support Vector Clustering Algorithm

Reference paper and code website

• Support Vector Clustering by Asa Ben-Hur, David Horn, Hava T. Siegelmann, and Vladimir Vapnik.

• www.cs.tau.ac.il/~borens/course/ml/cluster.html by Elhanan Borenstein, Ofer, and Orit.

Page 3: Support Vector Clustering Algorithm

Clustering

A clustering algorithm groups data according to the distance between points.

• Points that are close to each other are allocated to the same cluster.

• Clustering is most effective if the data has some geometric structure.

• Outliers may cause an unjustified increase in cluster size or a faulty clustering.

Page 4: Support Vector Clustering Algorithm

Support Vector Machine (SVM)

• An SVM maps the data from data space to a higher-dimensional feature space through a suitable nonlinear mapping.

• With a suitable mapping, data from two categories can be separated by a hyperplane in the feature space.

Page 5: Support Vector Clustering Algorithm

Support Vector Machine (SVM)

Main idea:

1. Much of the geometry of the data in the embedding space (relative positions) is contained in all pairwise inner products. We can work in that space by specifying an inner product function between points in it. An explicit mapping is not necessary.

2. In many cases, the inner product has a simple kernel representation and can therefore be evaluated easily.
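The two points above can be checked numerically. The sketch below is only an illustration: it uses a degree-2 polynomial kernel in two dimensions (chosen here because its explicit map is short to write down, not because the slides use it), and the function names are hypothetical. It shows that the kernel value equals the inner product of the explicitly mapped points, and that a feature-space distance can be computed from kernel values alone.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel K(x, y) = (x . y)^2 (illustrative choice)."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def explicit_map(x):
    """Explicit feature map for poly_kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def inner(u, v):
    """Ordinary inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def feature_distance_sq(x, y, kernel):
    """Squared distance between the images of x and y, using only kernel values:
    ||Phi(x) - Phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

x, y = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly_kernel(x, y)
via_mapping = inner(explicit_map(x), explicit_map(y))
dist_sq_kernel = feature_distance_sq(x, y, poly_kernel)
dist_sq_explicit = sum((a - b) ** 2 for a, b in zip(explicit_map(x), explicit_map(y)))
```

Both routes give the same numbers, but the kernel route never constructs the mapped vectors, which is what makes high-dimensional (or infinite-dimensional) feature spaces practical.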

Page 6: Support Vector Clustering Algorithm

Support Vector Clustering (SVC)

• SVC maps data from data space to a higher-dimensional feature space using a Gaussian kernel.

• In feature space we look for the smallest sphere that encloses the image of the data.

• When the sphere is mapped back to data space, it forms a set of contours, which enclose the data points.
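For reference, the three bullets above can be written out as follows, using the notation of the reference paper by Ben-Hur et al. Here Φ is the feature map, a is the centre of the sphere, R its radius, ξ_j are slack variables, and β_j are the Lagrange multipliers of the dual problem; none of these symbols appear on the slides themselves.

```latex
% Gaussian kernel with width parameter q
K(x_i, x_j) = e^{-q \, \lVert x_i - x_j \rVert^2}

% Smallest enclosing sphere in feature space (with slack \xi_j \ge 0)
\min_{R,\, a,\, \xi} \; R^2 + C \sum_j \xi_j
\quad \text{subject to} \quad
\lVert \Phi(x_j) - a \rVert^2 \le R^2 + \xi_j

% Squared distance of the image of x from the sphere centre
R^2(x) = K(x, x) - 2 \sum_j \beta_j K(x_j, x)
         + \sum_{i,j} \beta_i \beta_j K(x_i, x_j)

% Contours in data space that enclose the points
\{\, x \;:\; R(x) = R \,\}
```

Because R²(x) depends on x only through kernel evaluations, the contours can be traced in data space without ever computing Φ explicitly.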

Page 7: Support Vector Clustering Algorithm

Support Vector Clustering (SVC)

• The clustering level is controlled by two parameters:

1) q, the width parameter of the Gaussian kernel: as q increases, the number of disconnected contours increases, and so does the number of clusters.

2) C, the soft margin constant, which allows the sphere in feature space not to enclose all points.
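The effect of q can be reproduced with a small, self-contained sketch. This is not the code from the referenced website: the dual problem is solved with a simple pairwise update chosen here for brevity, the function names are hypothetical, the input points are assumed to be distinct, C is assumed to be larger than 1/n, and the special handling of points left outside the sphere when C < 1 is omitted. Two points are placed in the same cluster when every sampled point on the segment between them maps inside the sphere.

```python
import math

def gaussian_kernel(x, y, q):
    """K(x, y) = exp(-q * ||x - y||^2), with q the width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve_dual(K, C, iters=20000, tol=1e-10):
    """Minimise beta^T K beta subject to sum(beta) = 1 and 0 <= beta_j <= C
    by repeatedly shifting weight between the most violating pair."""
    n = len(K)
    beta = [1.0 / n] * n  # feasible start, assuming C >= 1/n
    for _ in range(iters):
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        up = [i for i in range(n) if beta[i] < C - 1e-12]   # may gain weight
        down = [j for j in range(n) if beta[j] > 1e-12]     # may lose weight
        if not up or not down:
            break
        i = min(up, key=lambda k: grad[k])
        j = max(down, key=lambda k: grad[k])
        if grad[j] - grad[i] < tol:
            break
        curvature = 2.0 * (K[i][i] + K[j][j] - 2.0 * K[i][j])
        step = min((grad[j] - grad[i]) / curvature, beta[j], C - beta[i])
        beta[i] += step
        beta[j] -= step
    return beta

def svc_clusters(points, q, C=1.0, samples=10):
    """Return one cluster label per point for the given q and C."""
    n = len(points)
    K = [[gaussian_kernel(a, b, q) for b in points] for a in points]
    beta = solve_dual(K, C)
    const = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))

    def r2(x):
        # Squared feature-space distance of the image of x from the sphere centre.
        weighted = sum(b * gaussian_kernel(p, x, q) for b, p in zip(beta, points))
        return 1.0 - 2.0 * weighted + const

    # Sphere radius: largest distance among points whose beta is below the bound C.
    radius2 = max(r2(p) for p, b in zip(points, beta) if b < C - 1e-8)

    def connected(a, b):
        # Sample interior points of the segment a-b; all must lie inside the sphere.
        for k in range(1, samples + 1):
            t = k / (samples + 1)
            y = [u + t * (v - u) for u, v in zip(a, b)]
            if r2(y) > radius2 + 1e-9:
                return False
        return True

    # Connected components of the "connected" relation, found by depth-first search.
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and connected(points[i], points[j]):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Two small square blobs, far apart relative to their size.
blob_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
blob_b = [(5.0, 0.0), (5.5, 0.0), (5.0, 0.5), (5.5, 0.5)]
points = blob_a + blob_b

labels_small_q = svc_clusters(points, q=0.01)  # wide kernel
labels_large_q = svc_clusters(points, q=1.0)   # narrow kernel
```

On this toy dataset the wide kernel (q = 0.01) yields a single contour around all eight points, while the narrow kernel (q = 1) splits them into the two blobs, which matches the behaviour described for q above.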

Page 8: Support Vector Clustering Algorithm

Clustering controlled by q

Page 9: Support Vector Clustering Algorithm

Cross dataset: q = 0.5, C = 1

Page 10: Support Vector Clustering Algorithm

Cross dataset: as q grows...

Page 11: Support Vector Clustering Algorithm

Cross dataset: as q grows, the number of clusters increases

Page 12: Support Vector Clustering Algorithm

Circle with noise: 30 noise points, q = 2, C = 1

Page 13: Support Vector Clustering Algorithm

Circle with noise: 30 noise points, q = 2, C = 1

Page 14: Support Vector Clustering Algorithm

Circle with noise: 30 noise points, q = 10, C = 1

Page 15: Support Vector Clustering Algorithm

Circle with noise: 30 noise points, q = 10, C = 1

Page 16: Support Vector Clustering Algorithm

Circle with noise: 100 noise points, q = 2, C = 1

Page 17: Support Vector Clustering Algorithm

Circle with noise: 100 noise points, q = 2, C = 1

Page 18: Support Vector Clustering Algorithm

Conclusions

• Points located close to one another tend to be allocated to the same cluster.

• The number of clusters increases as q grows.

• The appropriate q depends considerably on the specific sample points (scaling, range, scatter, etc.); no single q is always appropriate. A drill-down search over q for each dataset is one solution, but it is very time-consuming.

• When the samples represent a relatively large number of classes, SVC is less efficient.
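One way to organise the search over q mentioned above is to start from the initial value suggested in the reference paper, q = 1 / max ||x_i - x_j||², at which the data form a single cluster, and then increase q step by step. The sketch below computes that starting value; the geometric schedule (doubling by default) and the function names are assumptions made for illustration, not something the slides or the paper prescribe.

```python
from itertools import combinations

def initial_q(points):
    """Starting width: 1 / (largest squared pairwise distance in the data)."""
    max_sq = max(
        sum((a - b) ** 2 for a, b in zip(p, r))
        for p, r in combinations(points, 2)
    )
    return 1.0 / max_sq

def q_schedule(points, steps=5, factor=2.0):
    """Candidate q values to try in order (geometric growth is an assumption)."""
    q0 = initial_q(points)
    return [q0 * factor ** k for k in range(steps)]

sample_points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
candidates = q_schedule(sample_points)
```

Each candidate q requires a full clustering run, which is why the search is expensive; because the starting value is tied to the largest pairwise distance, it adapts to the scale of each dataset.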

Page 19: Support Vector Clustering Algorithm

My work in progress

• Theoretical exploration:

To find out whether there is a restriction we can impose on the inner product such that the figure mapped back into data space is connected (that is, has only one component).

• Importance

Page 20: Support Vector Clustering Algorithm

Q & A