Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Mikhail Bilenko, Sugato Basu, Raymond J. Mooney. ICML 2004. Presented by Xin Li.


Page 1: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Mikhail Bilenko, Sugato Basu, Raymond J. Mooney

ICML 2004

Presented by Xin Li

Page 2: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Semi-Supervised Clustering

K=4

Page 3: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Semi-Supervised Clustering

Page 4: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Semi-Supervised Clustering

Page 5: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

How to exploit supervision in clustering

Incorporate supervision as constraints

Learn a distance metric using supervision

Integrate these two approaches

Page 6: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

K-means Clustering

X = {x1,x2,…}

L = {l1,l2,…,lk}

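Euclidean distance: $\|x_i - \mu_{l_i}\| = \sqrt{(x_i - \mu_{l_i})^T (x_i - \mu_{l_i})}$, where $\mu_{l_i}$ is the centroid of the cluster to which $x_i$ is assigned

Minimizing: $\sum_{x_i \in X} \|x_i - \mu_{l_i}\|^2$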
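The K-means procedure summarized on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function names, the random initialization, and the stopping rule are assumptions chosen for brevity.

```python
import random

def squared_euclidean(x, mu):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda h: squared_euclidean(x, centroids[h]))
            for x in points
        ]
        # Update step: each centroid becomes the mean of its points.
        for h in range(k):
            members = [x for x, l in zip(points, new_labels) if l == h]
            if members:  # keep the old centroid if a cluster is empty
                centroids[h] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

def kmeans_objective(points, labels, centroids):
    """Total squared distance of every point to its assigned centroid."""
    return sum(squared_euclidean(x, centroids[l]) for x, l in zip(points, labels))
```

Calling `kmeans(points, 2)` on two well-separated groups of points should place each group in its own cluster; `kmeans_objective` then reports the quantity being minimized.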

Page 7: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Clustering with constraints

Pairwise constraints:

M: must-link pairs; (xi, xj) should be in the same cluster

C: cannot-link pairs; (xi, xj) should be in different clusters
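To make the two constraint types concrete, the sketch below counts how many of them a given cluster assignment violates. The representation is an assumption made for illustration: constraints are stored as pairs of point indices, and `labels[i]` is the cluster of point `i`.

```python
def count_violations(labels, must_link, cannot_link):
    """Return (violated must-links, violated cannot-links) for an assignment."""
    # A must-link pair is violated when its points land in different clusters.
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its points land in the same cluster.
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml, cl
```

A constrained clustering objective can then add a cost for each violation to the usual K-means dispersion term.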

Page 8: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Learning a pairwise distance metric

Binary classification: (xi, xj) → 0/1

M: positive examples; (xi, xj) are in the same cluster

C: negative examples; (xi, xj) are in different clusters

Apply the learned distance metric in clustering

Metric learning and clustering are disjoint steps

Page 9: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Unsupervised Clustering with Metric Learning

Maximizing the complete data log-likelihood under generalized K-means

Learn a distance metric that optimizes a quality function
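As a rough illustration of a parameterized distance inside K-means, the sketch below assumes a diagonal metric matrix A, stored as a list of positive weights, and a per-point cost of the form $\|x - \mu\|_A^2 - \log\det A$. Both the diagonal restriction and the function names are assumptions made here for brevity; the slide itself does not show the formula.

```python
import math

def mahalanobis_sq_diag(x, mu, a_diag):
    """(x - mu)^T A (x - mu) for a diagonal A given as a list of weights."""
    return sum(a * (xi - mi) ** 2 for a, xi, mi in zip(a_diag, x, mu))

def generalized_kmeans_cost(x, mu, a_diag):
    """Per-point cost: squared distance under A minus log det(A).

    For a diagonal A, log det(A) is the sum of the logs of its weights.
    """
    log_det = sum(math.log(a) for a in a_diag)
    return mahalanobis_sq_diag(x, mu, a_diag) - log_det
```

With all weights equal to 1 the distance reduces to squared Euclidean distance and the log-determinant term vanishes. The log-determinant term keeps the weights from shrinking toward zero, which would otherwise trivially minimize the distance.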

Page 10: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Integrating Constraints and Metric Learning

Combining the previous two equations leads to the following objective function, which minimizes cluster dispersion under the learned metrics while reducing constraint violations.

Page 11: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Penalty for violating constraints

The penalty for violating a must-link constraint between distant points should be higher than that between nearby points.

The penalty for violating a cannot-link constraint between nearby points should be higher than that between distant points.
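One simple way to obtain penalties with this behavior is sketched below. It assumes a single diagonal metric and uses the squared distance itself as the must-link penalty, and the gap to the largest squared distance in the data set as the cannot-link penalty. These particular forms and names are illustrative assumptions, not formulas taken from the slide.

```python
def sq_dist(x, y, a_diag):
    """Squared distance between x and y under a diagonal metric."""
    return sum(a * (xi - yi) ** 2 for a, xi, yi in zip(a_diag, x, y))

def max_sq_dist(points, a_diag):
    """Largest squared pairwise distance in the data set under the metric."""
    return max(sq_dist(p, q, a_diag) for p in points for q in points)

def must_link_penalty(xi, xj, a_diag):
    """Grows with distance: separating far-apart must-link points costs more."""
    return sq_dist(xi, xj, a_diag)

def cannot_link_penalty(xi, xj, a_diag, max_sq):
    """Shrinks with distance: merging nearby cannot-link points costs more."""
    return max_sq - sq_dist(xi, xj, a_diag)
```

Because `max_sq` is taken over the whole data set, the cannot-link penalty stays non-negative for any pair of points in it.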

Page 12: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

MPCK-MEANS Algorithm

Constraints are utilized during cluster initialization and when assigning points to clusters.

The distance metric is adapted by re-estimating the weights in matrices Ah.

Page 13: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Initialization

An initial guess of the clusters.

Assign each point x to one of K clusters in a way that satisfies the constraints.

Compute the centroid of each cluster.
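One plausible way to build an initial guess that respects must-link constraints is to group points connected by chains of must-links and take the centroid of each group. The sketch below does only that, using a small union-find structure. It is an assumption about how initialization could work: it ignores cannot-link constraints and does not decide which K groups to keep.

```python
def must_link_neighborhoods(n_points, must_link):
    """Group point indices connected through chains of must-link pairs."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in must_link:
        parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def centroid(points, members):
    """Mean of the points whose indices are listed in members."""
    dim = len(points[0])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
```

Each resulting group can seed one cluster, so every must-link constraint inside a group is satisfied from the start.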

Page 14: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

E-step

Every point x is assigned to the cluster that minimizes the sum of the distance of x to the cluster centroid according to the local metric and the cost of any constraint violations incurred by the cluster assignment.
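The assignment rule can be sketched as a greedy pass over the points. This simplified version makes several assumptions for brevity: it uses plain squared Euclidean distance instead of a learned local metric, charges a constant weight `w` per violated constraint instead of a distance-dependent penalty, and visits points in index order.

```python
def sq_euclid(x, mu):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def e_step(points, centroids, must_link, cannot_link, w=1.0):
    """Greedy assignment: distance to centroid plus constraint-violation cost."""
    k = len(centroids)
    labels = [None] * len(points)  # None means "not assigned yet"
    for i, x in enumerate(points):
        def cost(h):
            c = sq_euclid(x, centroids[h])
            for a, b in must_link:
                if i in (a, b):
                    other = b if a == i else a
                    # Violated if the partner already sits in another cluster.
                    if labels[other] is not None and labels[other] != h:
                        c += w
            for a, b in cannot_link:
                if i in (a, b):
                    other = b if a == i else a
                    # Violated if the partner already sits in this cluster.
                    if labels[other] is not None and labels[other] == h:
                        c += w
            return c
        labels[i] = min(range(k), key=cost)
    return labels
```

Because each point only sees the assignments made before it, the result depends on the visiting order; a randomized order is a common choice for this kind of greedy pass.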

Page 15: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

M-Step

Set $\partial J / \partial A_h = 0$, where $J$ is the objective function

Update metrics:
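For intuition about the metric update, consider the simplest case: a single diagonal metric and no constraint terms. Minimizing $\sum_d a_d s_d - N \sum_d \log a_d$, where $s_d$ is the within-cluster scatter along dimension $d$ and $N$ is the number of points, gives $a_d = N / s_d$. The sketch below implements only this reduced case; the names and the small `eps` guard are assumptions, and the full update would also include terms from violated constraints.

```python
def update_centroids(points, labels, k):
    """Each centroid is the mean of the points assigned to it."""
    dim = len(points[0])
    centroids = []
    for h in range(k):
        members = [x for x, l in zip(points, labels) if l == h]
        if not members:
            raise ValueError("cluster %d is empty" % h)
        centroids.append([sum(x[d] for x in members) / len(members)
                          for d in range(dim)])
    return centroids

def update_diagonal_metric(points, labels, centroids, eps=1e-12):
    """Unconstrained diagonal update: a_d = N / (scatter along dimension d)."""
    dim = len(points[0])
    n = len(points)
    weights = []
    for d in range(dim):
        scatter = sum((x[d] - centroids[l][d]) ** 2
                      for x, l in zip(points, labels))
        weights.append(n / (scatter + eps))  # eps avoids division by zero
    return weights
```

Dimensions along which clusters are tight receive large weights, and dimensions with high within-cluster scatter are down-weighted.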

Page 16: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Experimental Setting

Page 17: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Single Metric, Diagonal Matrix A

Page 18: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Single Metric, Diagonal Matrix A

Page 19: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Multiple Metrics, Full Matrix A

Page 20: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Multiple Metrics, Full Matrix A

Page 21: Integrating Constraints and Metric Learning in Semi-Supervised Clustering

Conclusion and Discussion

This paper has presented MPCK-MEANS, a new approach to semi-supervised clustering.

Supervision and metric learning are helpful in clustering, and multiple distance metrics are not necessary in most cases.

Question 1: If we have supervision in clustering, why not use it in the same way as in a typical classification task?

Question 2: If there is an infinite number of classes, can we gain from supervision on only some of them?